Similarity · scalar · returns double

SIMILAR_TO

Cosine similarity between two texts (0.0 to 1.0) via bge-m3

Per-row: runs once for each row.

similarity · embedding-model · specialist-zoo · scales-large · text

Syntax

{{ text1 }} SIMILAR_TO {{ text2 }}
SIMILAR_TO({{ text1 }}, {{ text2 }})

Arguments

name   type     description
text1  VARCHAR  First text
text2  VARCHAR  Second text

About

Cosine similarity between two texts. Returns a score from 0.0 to 1.0 (higher = more similar), which makes it useful in WHERE clauses and for cross-row filtering.

Backend: specialist zoo bge-m3 embeddings. Both texts are embedded in a single batched call and the dot product of the two vectors is returned directly. Because bge-m3 embeddings are L2-normalized, that dot product equals the cosine similarity. This cascade was already embedding-backed before the specialist zoo refactor; the refactor only simplifies it to a single python_data cell that calls `specialist_embed` explicitly. Behavior is unchanged.

SQL Usage:

SELECT * FROM products WHERE description SIMILAR_TO 'sustainable' > 0.7;

SELECT c1.name, c2.name, c1.description SIMILAR_TO c2.description AS similarity
FROM companies c1, companies c2
WHERE c1.id < c2.id
  AND c1.description SIMILAR_TO c2.description > 0.8;

Warning: be careful with CROSS JOINs. SIMILAR_TO runs once per row, so comparing N rows against M rows can cost N×M embedding calls (a 1,000 × 1,000 join means 1,000,000 calls). Use LIMIT to avoid N×M embedding costs:

WHERE a SIMILAR_TO b > 0.8 LIMIT 100   -- Good
WHERE a SIMILAR_TO b > 0.8             -- Dangerous (N×M calls)
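
A minimal sketch of why returning the dot product is enough, in plain Python. Random vectors stand in for bge-m3 embeddings (the sketch does not call `specialist_embed` or any model, and the vector width of 16 is arbitrary). After both vectors are L2-normalized, their dot product matches the full cosine formula computed on the raw vectors. For arbitrary vectors like these, cosine similarity can fall anywhere from -1.0 to 1.0.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Full cosine similarity: dot product divided by both vector norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical stand-ins for two text embeddings; real vectors come from bge-m3.
rng = random.Random(0)
raw1 = [rng.gauss(0.0, 1.0) for _ in range(16)]
raw2 = [rng.gauss(0.0, 1.0) for _ in range(16)]

# Normalize once, as the embedding step does; then a dot product is the cosine.
e1, e2 = l2_normalize(raw1), l2_normalize(raw2)
score_dot = dot(e1, e2)
score_cos = cosine(raw1, raw2)

print(f"dot of normalized vectors: {score_dot:.6f}")
print(f"full cosine formula:       {score_cos:.6f}")
```

With unit-length vectors, the division by the two norms in the full formula becomes a division by 1, so each comparison reduces to a single dot product.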

Examples

Similar topics receive a strong cosine similarity score

SELECT
  similar_to (
    'machine learning algorithms',
    'artificial intelligence'
  )
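
Because SIMILAR_TO runs once per row, the CROSS JOIN warning under About comes down to counting rows. The sketch below uses hypothetical helper names and counts evaluations for the two join shapes shown there, assuming one batched embedding call per evaluated row and assuming the engine applies `c1.id < c2.id` before evaluating SIMILAR_TO. It does not model how LIMIT interacts with the query planner.

```python
# Hypothetical helpers for estimating SIMILAR_TO cost; not part of any real API.

def cross_join_calls(n_rows: int, m_rows: int) -> int:
    """Evaluations when each of n_rows is paired with each of m_rows."""
    return n_rows * m_rows


def self_join_calls(n_rows: int) -> int:
    """Evaluations for a self-join filtered by c1.id < c2.id.

    Each unordered pair of distinct rows is kept once. Assumes the id
    filter is applied before SIMILAR_TO is evaluated.
    """
    return n_rows * (n_rows - 1) // 2


def texts_embedded(calls: int) -> int:
    """Each evaluation embeds two texts in one batched call."""
    return 2 * calls


n = 1_000
print(cross_join_calls(n, n))  # 1000000
print(self_join_calls(n))      # 499500
print(texts_embedded(self_join_calls(n)))  # 999000
```

For 1,000 rows, the unfiltered cross join costs 1,000,000 evaluations and the id-filtered self-join costs 499,500, roughly half; either way the cost grows quadratically with table size.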
