Cleaningscalar · returns varchar

IMPUTE

Statistical imputation for missing values (distribution-aware)

Per-row — runs once for each row.

cleaningllmtext

Syntax

IMPUTE({{ value }}, '{{ context }}')

IMPUTE({{ value }}, '{{ context }}', '{{ field_name }}')

{{ value }} IMPUTED FROM '{{ context }}'

Arguments

name	type	description
value	VARCHAR	—
context	VARCHAR	Statistical context: group values, similar records, distribution info
field_name(optional)	VARCHAR	Field name to impute (e.g., first_name, age)

About

Statistical imputation for missing values — fills gaps by reasoning about what would be distribution-consistent for the field given its context. More careful than FILL, which just picks the most likely guess. Useful when you're preparing data for analysis or modeling and need missing values replaced without biasing downstream statistics. Pass an optional `field_name` to anchor what kind of field is being imputed. For simple best-guess fills, FILL is cheaper and usually good enough. Reach for IMPUTE when you need the imputation to respect distributional reasoning — age columns, numeric ranges, categorical frequencies.

Examples

First name imputed from context

SELECT
  impute ('unknown', 'name: John Smith', 'first_name')

Nearby rabbit holes

same domain

scalar

ANONYMIZE

Remove or mask personally identifiable information from text

cleaningllmtext

scalar

AS_SMART

Type-cast messy real-world values that trip up standard CAST

cleaningllmtext

scalar

CANONICAL

Return the canonical/official form of a value (auto-detects entity type)

cleaningllmtext

scalar

CLEAN_YEAR

Extracts 4-digit year from messy text, returns -1 if undetermined

cleaningllmdate

scalar

CLEAN_YEAR_LLM

LLM-backed year extraction (escape hatch for CLEAN_YEAR)

cleaningllmllm-escape-hatchdate

aggregate

COALESCE_SMART

Pick the best non-null value from a group (quality-aware COALESCE)

cleaningllmtext

Climb back to The Looking Glass

surface↓cleaning↓impute

Cleaningscalar · returns varchar

IMPUTE

Statistical imputation for missing values (distribution-aware)

Per-row — runs once for each row.

cleaningllmtext

Syntax

IMPUTE({{ value }}, '{{ context }}')

IMPUTE({{ value }}, '{{ context }}', '{{ field_name }}')

{{ value }} IMPUTED FROM '{{ context }}'

Arguments

name	type	description
value	VARCHAR	—
context	VARCHAR	Statistical context: group values, similar records, distribution info
field_name(optional)	VARCHAR	Field name to impute (e.g., first_name, age)

About

Examples

First name imputed from context

SELECT
  impute ('unknown', 'name: John Smith', 'first_name')

Nearby rabbit holes

same domain

scalar

ANONYMIZE

Remove or mask personally identifiable information from text

cleaningllmtext

scalar

AS_SMART

Type-cast messy real-world values that trip up standard CAST

cleaningllmtext

scalar

CANONICAL

Return the canonical/official form of a value (auto-detects entity type)

cleaningllmtext

scalar

CLEAN_YEAR

Extracts 4-digit year from messy text, returns -1 if undetermined

cleaningllmdate

scalar

CLEAN_YEAR_LLM

LLM-backed year extraction (escape hatch for CLEAN_YEAR)

cleaningllmllm-escape-hatchdate

aggregate

COALESCE_SMART

Pick the best non-null value from a group (quality-aware COALESCE)

cleaningllmtext

Climb back to The Looking Glass