Clustering · aggregate · returns json

CLUSTER

Assigns values to K semantic clusters via GPU embeddings + k-means/HDBSCAN

Per-group — reads the whole group in one call.

clustering · embedding-model · specialist-zoo · scales-large · json

Syntax

MEANING({{ values }})
MEANING({{ values }}, {{ num_clusters }})
MEANING({{ values }}, {{ num_clusters }}, {{ criterion }})

Arguments

name                   type      description
values                 JSON
num_clusters           INTEGER
criterion (optional)   VARCHAR

About

Semantic clustering: assigns each value in a collection to one of K semantic clusters. Used by GROUP BY MEANING(col, K, [criterion]). Backend: specialist zoo.

Step 1 embeds all values with bge-m3 on the GPU and runs k-means (or HDBSCAN if K is not given).

Step 2 asks a fast LLM to name the clusters using only the K representative values, not the full collection, so the context-window cost is O(K) rather than O(N). This lets MEANING scale to 100K+ rows without hitting context limits.

The output shape is preserved from the old LLM-only cascade:

{ "clusters": [{"id": 0, "name": "…", "description": "…"}, …], "assignments": {"<value>": <cluster_id>, …} }
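The two steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the backend's implementation: the 2-D `toy_vectors` are made-up stand-ins for bge-m3 embeddings, `fake_namer` is a hypothetical placeholder for the LLM naming call, the farthest-point seeding is a choice made here for determinism, and the HDBSCAN path is not covered.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(axis) / len(members)
                                for axis in zip(*members)]
    return labels, centroids


def cluster_values(values, vectors, k, name_clusters):
    """Step 1: cluster the vectors. Step 2: name clusters from representatives."""
    labels, centroids = kmeans([vectors[v] for v in values], k)
    reps = {}
    for j in range(k):
        members = [v for v, lab in zip(values, labels) if lab == j]
        if members:
            # Representative = the member closest to its cluster centroid.
            reps[j] = min(members, key=lambda v: dist(vectors[v], centroids[j]))
    # The namer sees only the representatives (at most K items), never all N values.
    named = name_clusters(list(reps.values()))
    clusters = [{"id": j, "name": name, "description": desc}
                for j, (name, desc) in zip(reps, named)]
    return {"clusters": clusters, "assignments": dict(zip(values, labels))}


# Made-up 2-D vectors standing in for real embeddings (assumption).
toy_vectors = {
    "New York": (0.0, 0.1), "Chicago": (0.1, 0.0),
    "Paris": (5.0, 6.0), "Berlin": (5.2, 6.1),
    "Tokyo": (10.0, 0.1), "Seoul": (10.1, 0.0),
}


def fake_namer(reps):
    """Hypothetical stand-in for the LLM call: one (name, description) per representative."""
    return [(f"cluster near {r}", f"values similar to {r}") for r in reps]


result = cluster_values(list(toy_vectors), toy_vectors, 3, fake_namer)
print(result["assignments"])
```

The point of the sketch is the shape of step 2: the naming function receives a list whose length is bounded by K, regardless of how many values were clustered in step 1.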

Examples

Clusters cities by geographic region

WITH
  test_data AS (
    SELECT
      *
    FROM
      (
        VALUES
          ('New York'),
          ('Los Angeles'),
          ('Chicago'),
          ('Paris'),
          ('London'),
          ('Berlin'),
          ('Tokyo'),
          ('Beijing'),
          ('Seoul')
      ) AS t (city)
  )
SELECT
  MEANING(city, 3, 'geographic region')
FROM
  test_data
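
Because the function returns JSON in the documented shape, a client will usually want to turn it back into named groups. The Python sketch below shows one way to do that. The `raw` payload is a hypothetical result written by hand to match the documented shape; its cluster names and descriptions are invented for illustration and are not actual output of the query above. The helper name `group_by_cluster` is likewise an assumption.

```python
import json
from collections import defaultdict

# Hypothetical payload in the documented output shape (not real query output).
raw = """
{
  "clusters": [
    {"id": 0, "name": "North America", "description": "Cities in the United States"},
    {"id": 1, "name": "Europe", "description": "Cities in Europe"},
    {"id": 2, "name": "East Asia", "description": "Cities in East Asia"}
  ],
  "assignments": {
    "New York": 0, "Los Angeles": 0, "Chicago": 0,
    "Paris": 1, "London": 1, "Berlin": 1,
    "Tokyo": 2, "Beijing": 2, "Seoul": 2
  }
}
"""


def group_by_cluster(payload):
    """Invert the value -> cluster_id map into cluster name -> list of values."""
    result = json.loads(payload)
    names = {c["id"]: c["name"] for c in result["clusters"]}
    groups = defaultdict(list)
    for value, cluster_id in result["assignments"].items():
        groups[names[cluster_id]].append(value)
    return dict(groups)


groups = group_by_cluster(raw)
for name, members in groups.items():
    print(name, members)
```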