Classification · scalar · returns double

QUALITY

Assess data quality (0.0-1.0)

Per-row — runs once for each row.

classification · llm · text

Syntax

QUALITY({{ value }})
QUALITY({{ value }}, '{{ expected_type }}')

Arguments

| name | type | description |
| --- | --- | --- |
| value | VARCHAR | |
| expected_type (optional) | VARCHAR | Expected type hint: address, name, phone, email, etc. |

About

Assess the data quality of a value. Returns a score from 0.0 (garbage) to 1.0 (perfect) based on completeness, validity, and formatting.

Backend: deterministic scoring with four components:

- presence: is the value non-empty
- length-appropriateness: not garbage-short, not absurdly long
- type-validity: when a type hint is given, uses the same pure-Python validators as LOOKS_LIKE (email-validator, phonenumbers, dateparser, usaddress, etc.)
- cleanliness: whitespace, casing, noise characters

When `expected_type` is not provided, the score uses only the generic, non-type components, so "asdfghjkl" scores low and "John Smith" scores moderately.

For LLM-style nuanced quality judgment (e.g., free-form copy evaluation, policy compliance scoring), use QUALITY_LLM; see quality_single_llm.cascade.yaml.
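To make the four-component idea concrete, here is a minimal Python sketch of how such a score could be assembled. It is not the actual backend: the function names, the equal weighting, the length bounds, and the regex-based email check are all assumptions made for illustration (the real type checks are described as using libraries such as email-validator and phonenumbers). It also makes no attempt to detect gibberish such as "asdfghjkl".

```python
import re

# Hypothetical stand-in for a real email validator; deliberately simplistic.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def presence(value):
    """1.0 if the value contains any non-whitespace text, else 0.0."""
    return 1.0 if value and value.strip() else 0.0


def length_score(value, min_len=2, max_len=200):
    """1.0 if the trimmed length is within the (assumed) bounds, else 0.0."""
    n = len(value.strip())
    return 1.0 if min_len <= n <= max_len else 0.0


def cleanliness(value):
    """Start at 1.0 and subtract a fixed penalty for each kind of mess found."""
    score = 1.0
    if value != value.strip():
        score -= 0.25  # leading or trailing whitespace
    if "  " in value:
        score -= 0.25  # repeated spaces
    if re.search(r"[^\w\s@.+,'\-]", value):
        score -= 0.25  # characters outside a small allowed set
    return max(score, 0.0)


def type_validity(value, expected_type):
    """Return a 0.0/1.0 validity score, or None for types this sketch ignores."""
    if expected_type == "email":
        return 1.0 if _EMAIL_RE.match(value.strip()) else 0.0
    return None


def quality_sketch(value, expected_type=None):
    """Average the applicable components (equal weights are an assumption)."""
    if value is None or presence(value) == 0.0:
        return 0.0
    parts = [presence(value), length_score(value), cleanliness(value)]
    if expected_type is not None:
        validity = type_validity(value, expected_type)
        if validity is not None:
            parts.append(validity)
    return sum(parts) / len(parts)
```

In this sketch, passing a type hint adds a fourth component to the average, so a malformed value such as `quality_sketch("j@g", "email")` is pulled below a well-formed address, while an empty string short-circuits to 0.0.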

Examples

Perfect email should score high

SELECT
  quality ('john.smith@gmail.com')

Malformed email should score low

SELECT
  quality ('j@g')

Empty value should score near zero

SELECT
  quality ('')

Clean name scores high when hinted

SELECT
  quality ('John Smith', 'name')
