Similarity · scalar · returns boolean

SOUNDS_LIKE

Check if two words sound similar (phonetically)

Per-row — runs once for each row.

similarity · deterministic · text

Syntax

{{ text }} SOUNDS_LIKE {{ reference }}

Arguments

name       type     description
text       VARCHAR  First text
reference  VARCHAR  Reference text

About

Phonetic similarity matching: "do these sound alike?"

Backend: deterministic phonetic coding via the `jellyfish` library (metaphone, soundex, nysiis) combined with Jaro-Winkler character similarity. No LLM, no GPU, no zoo; pure Python.

Decision rule (tested against 23 common English name pairs):

1. Metaphone codes match → True (strong phonetic signal)
2. Jaro-Winkler >= 0.88 alone → True (catches character-similar variants with minor phonetic drift, such as Bryan/Brian)
3. Soundex AND nysiis agree AND Jaro-Winkler >= 0.75 → True (cross-validation for cases metaphone alone misses)
4. Otherwise → False

Handles canonical variants: Smith/Smyth, Jon/John, Catherine/Kathryn, Philip/Phillip, Stephen/Steven, Sean/Shawn, McDonald/MacDonald, Geoffrey/Jeffrey. Avoids false positives on similar-looking but distinct names such as Williams/Wilson, Johnson/Jackson, Harris/Harper.

For LLM-style judgment with contextual reasoning (e.g., name transliterations across scripts, rare or ambiguous cases), use SOUNDS_LIKE_LLM (see sounds_like_llm.cascade.yaml).

SQL Usage:

SELECT * FROM customers WHERE name SOUNDS_LIKE 'Smith';
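The four-step decision rule above can be sketched in Python roughly as follows. This is an illustrative reconstruction from the description, not the actual backend source. The names `PhoneticFeatures`, `decide`, `features`, and the two threshold constants are hypothetical. The sketch also assumes a `jellyfish` version that exposes `jaro_winkler_similarity`, and it does no case or whitespace normalization, since the description does not say whether the backend does.

```python
from dataclasses import dataclass

# Thresholds taken from the decision rule in the text.
JW_STRONG = 0.88  # rule 2: Jaro-Winkler alone is enough
JW_CROSS = 0.75   # rule 3: Jaro-Winkler floor when soundex and nysiis agree


@dataclass(frozen=True)
class PhoneticFeatures:
    """Pre-computed signals for one (text, reference) pair (hypothetical type)."""
    metaphone_match: bool
    soundex_match: bool
    nysiis_match: bool
    jaro_winkler: float


def decide(f: PhoneticFeatures) -> bool:
    """Apply the four rules in order; the first rule that fires wins."""
    if f.metaphone_match:                       # rule 1
        return True
    if f.jaro_winkler >= JW_STRONG:             # rule 2
        return True
    if f.soundex_match and f.nysiis_match and f.jaro_winkler >= JW_CROSS:
        return True                             # rule 3
    return False                                # rule 4


def features(text: str, reference: str) -> PhoneticFeatures:
    """Compute the signals with jellyfish (third-party, imported lazily)."""
    import jellyfish  # assumption: a release providing jaro_winkler_similarity

    return PhoneticFeatures(
        metaphone_match=jellyfish.metaphone(text) == jellyfish.metaphone(reference),
        soundex_match=jellyfish.soundex(text) == jellyfish.soundex(reference),
        nysiis_match=jellyfish.nysiis(text) == jellyfish.nysiis(reference),
        jaro_winkler=jellyfish.jaro_winkler_similarity(text, reference),
    )


def sounds_like(text: str, reference: str) -> bool:
    """Hypothetical per-row entry point mirroring `text SOUNDS_LIKE reference`."""
    return decide(features(text, reference))
```

Keeping `decide` separate from the `jellyfish` calls means the thresholds can be checked against hand-built feature values without installing the library.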

Examples

Phonetically similar names match

SELECT
  sounds_like ('Smith', 'Smyth')

Catherine/Kathryn spelling variants match

SELECT
  sounds_like ('Catherine', 'Kathryn')

Jon/John variants match

SELECT
  sounds_like ('Jon', 'John')

Mc/Mac prefix variants match

SELECT
  sounds_like ('McDonald', 'MacDonald')

Distinct names do not match

SELECT
  sounds_like ('Smith', 'Jones')

Similar-looking but distinct surnames do not match

SELECT
  sounds_like ('Williams', 'Wilson')
