Entity Resolution · aggregate · returns varchar

DEDUPE_LLM

LLM-backed deduplication (escape hatch for DEDUPE)

Per-group — reads the whole group in one call.

entity-resolution · llm · llm-escape-hatch · scales-large · json

Syntax

DEDUPE_LLM({{ texts }})
DEDUPE_LLM({{ texts }}, '{{ criteria }}')

Arguments

name      type     description
texts     JSON
criteria  VARCHAR

About

LLM-backed escape hatch for semantic deduplication. Use it when the canonical embedding-threshold path (DEDUPE) is insufficient because the deduplication requires world knowledge or nuanced judgment: for example, knowing that "IBM" and "International Business Machines" are the same entity, or handling abbreviations and acronyms that don't cluster well in embedding space. For routine fuzzy deduplication (typos, variant spellings, name forms), prefer DEDUPE, which is 100-1000x faster and scales to 10K+ values.

Examples

LLM escape hatch dedups abbreviations with expanded forms

WITH
  test_data AS (
    SELECT
      *
    FROM
      (
        VALUES
          ('IBM'),
          ('International Business Machines'),
          ('Apple Inc.'),
          ('Apple Computer')
      ) AS t (name)
  )
SELECT
  DEDUPE_LLM (name)
FROM
  test_data
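
Passing explicit criteria (sketch)

The Syntax section lists a two-argument form that the example above does not exercise. The query below is a minimal sketch of that form over the same test data. The criteria string is a hypothetical illustration, not wording taken from this reference, and the shape of the returned value is not specified here, so no expected output is shown.

```sql
-- Sketch only: the criteria text below is an assumed example value.
WITH
  test_data AS (
    SELECT
      *
    FROM
      (
        VALUES
          ('IBM'),
          ('International Business Machines'),
          ('Apple Inc.'),
          ('Apple Computer')
      ) AS t (name)
  )
SELECT
  -- Second argument is the optional criteria (VARCHAR) from the Arguments table.
  DEDUPE_LLM (
    name,
    'Treat abbreviations and full legal names of the same company as duplicates'
  ) AS deduped
FROM
  test_data
```

Because the whole group is read in one call, this form is probably best kept for small groups where the judgment in the criteria matters more than throughput.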
