surfacecleaningimpute
Cleaningscalar · returns varchar

IMPUTE

Statistical imputation for missing values (distribution-aware)

Per-row — runs once for each row.

cleaningllmtext

Syntax

IMPUTE({{ value }}, '{{ context }}')
IMPUTE({{ value }}, '{{ context }}', '{{ field_name }}')
{{ value }} IMPUTED FROM '{{ context }}'

Arguments

nametypedescription
valueVARCHAR
contextVARCHARStatistical context: group values, similar records, distribution info
field_name(optional)VARCHARField name to impute (e.g., first_name, age)

About

Statistical imputation for missing values — fills gaps by reasoning about what would be distribution-consistent for the field given its context. More careful than FILL, which just picks the most likely guess. Useful when you're preparing data for analysis or modeling and need missing values replaced without biasing downstream statistics. Pass an optional `field_name` to anchor what kind of field is being imputed. For simple best-guess fills, FILL is cheaper and usually good enough. Reach for IMPUTE when you need the imputation to respect distributional reasoning — age columns, numeric ranges, categorical frequencies.

Examples

First name imputed from context

SELECT
  impute ('unknown', 'name: John Smith', 'first_name')

Nearby rabbit holes

same domain
Climb back to The Looking Glass