surfacecleaningnormalize
Cleaningscalar · returns varchar

NORMALIZE

Standardize a value to its canonical form for a given type

Per-row — runs once for each row.

cleaningllmtext

Syntax

NORMALIZE({{ value }}, '{{ type }}')
NORMALIZE({{ value }}, '{{ type }}', '{{ context }}')
{{ value }} NORMALIZE AS '{{ type }}'

Arguments

nametypedescription
valueVARCHAR
typeVARCHAREntity type: company, address, name, phone, email, country, state
context(optional)VARCHAROptional context for disambiguation

About

Standardize a value to its canonical form for a known type. Handles variations, abbreviations, typos, and different representations of the same entity. Pass the type as the second argument (`email`, `phone`, `country`, `company`, etc.) to steer normalization. Examples: `US` / `USA` / `United States` / `u.s.a.` → `United States` `john.smith+tag@Gmail.COM` → `john.smith@gmail.com` `apple inc` / `Apple, Inc.` / `AAPL` → `Apple Inc.` Useful for preparing data for fuzzy joins, grouping, or cross-system matching. Prefer CANONICAL when the type is unknown and you want the operator to detect it for you.

Examples

Company name normalized

SELECT
  normalize('MICROSOFT CORPORATION', 'company')

Email lowercased

SELECT
  normalize('john.smith@GMAIL.COM', 'email')

State abbreviation expanded or kept

SELECT
  normalize('CA', 'state')

Nearby rabbit holes

same domain
Climb back to The Looking Glass