Similarity · scalar · returns json

MATCH

Fuzzy cross-match two string arrays via bge-m3 embeddings

Per-row — runs once for each row.

similarity · llm · json

Arguments

name	type	description
left	JSON	JSON array of strings (left side)
right	JSON	JSON array of strings (right side)
threshold (optional)	DOUBLE	Minimum cosine similarity a pair must exceed to be returned
top_k (optional)	INTEGER	—

About

Fuzzy / semantic cross-match between two collections of text. Takes two JSON arrays of strings (the "left" and "right" sides), embeds them both with bge-m3 on the zoo GPU, computes a cosine similarity across the cross-product, and returns the (left, right) pairs whose similarity exceeds the supplied threshold.

Use it for:
• Entity resolution across two CRM snapshots
• Dedup / linkage of customer records
• Fuzzy product catalog merging
• Retrieval-augmented generation candidate lookup

The table-macro form is auto-generated by the registry: any cascade with `returns_columns` gets a `<name>_rows(...)` variant that can be used with FROM / LATERAL.

Typical SQL shape:

SELECT
  *
FROM
  semantic_match_rows (
    (SELECT ARRAY_AGG(company_name) FROM salesforce),
    (SELECT ARRAY_AGG(name) FROM hubspot),
    threshold => 0.7,
    top_k => 5
  )
WHERE
  score > 0.8
ORDER BY
  score DESC;

Each row returned is (left_idx, right_idx, left_value, right_value, score). The caller can join back to the original tables by index to recover any columns beyond the matched text.
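The pipeline described above (embed both sides, score the cross-product with cosine similarity, keep pairs above the threshold) can be sketched in plain Python. This is an illustration of the idea only, not the registry's implementation. Several things here are assumptions: `cross_match`, `toy_embed`, and `TOY_VECTORS` are hypothetical names, the hand-written 2-D vectors stand in for real bge-m3 embeddings, indices are taken to be zero-based, and `top_k` is taken to cap the number of matches kept per left item. The page does not confirm either of the last two points.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_match(left, right, embed, threshold=0.0, top_k=None):
    """Return (left_idx, right_idx, left_value, right_value, score) tuples.

    ASSUMPTIONS (not confirmed by the function's documentation):
    indices are zero-based, and top_k limits matches per left item.
    """
    left_vecs = [embed(s) for s in left]
    right_vecs = [embed(s) for s in right]
    rows = []
    for i, lv in enumerate(left_vecs):
        # Score this left item against every right item (one row of the cross-product).
        scored = [(cosine(lv, rv), j) for j, rv in enumerate(right_vecs)]
        # Keep only pairs whose similarity strictly exceeds the threshold.
        kept = [(s, j) for s, j in scored if s > threshold]
        # Best score first; ties broken by right index for determinism.
        kept.sort(key=lambda t: (-t[0], t[1]))
        if top_k is not None:
            kept = kept[:top_k]
        rows.extend((i, j, left[i], right[j], s) for s, j in kept)
    return rows


# Toy stand-in for bge-m3: hand-written vectors, NOT real embeddings.
TOY_VECTORS = {
    "ACME Corporation": [1.0, 0.0],
    "Globex Industries": [0.0, 1.0],
    "acme corp": [0.9, 0.1],
    "Globex Inc": [0.2, 0.8],
}


def toy_embed(text):
    return TOY_VECTORS[text]


rows = cross_match(
    ["ACME Corporation", "Globex Industries"],
    ["acme corp", "Globex Inc"],
    toy_embed,
    threshold=0.5,
    top_k=1,
)
for row in rows:
    print(row)
```

With these toy vectors the sketch pairs "ACME Corporation" with "acme corp" and "Globex Industries" with "Globex Inc". The index columns are what make a join back to the source rows possible, provided the arrays were built in a known order.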

Examples

Fuzzy match picks the best pair from two company lists

SELECT
  left_value
FROM
  semantic_match_rows (
    JSON('["ACME Corporation","Globex Industries"]'),
    JSON('["acme corp","Globex Inc"]'),
    0.5,
    1
  )
ORDER BY
  score DESC
LIMIT
  1;

Nearby rabbit holes

same domain

• ALIGNS (scalar): How strongly text supports a specific message or stance (0.0-1.0). Tags: similarity, llm, text
• IMAGE_MATCHES (scalar): Check if an image semantically matches a text query (cross-modal)
• IMAGE_SIMILARITY (scalar): Cross-modal cosine similarity between an image and a text query
• MATCH_PAIR (scalar): Checks whether two values match under a relationship
• MEANS (scalar): Returns TRUE if text semantically matches the criterion (cross-encoder). Tags: similarity, reranker, specialist-zoo, text
• MEANS_LLM (scalar): LLM-backed boolean match (escape hatch for MEANS when encoder-based matching is insufficient). Tags: similarity, llm, llm-escape-hatch, text

