Classification dimension · returns VARCHAR

TOPICS

Extract topics from a text collection and assign each text to a topic

Per-row classifier — stable across GROUP BY.

classification · embedding-model · specialist-zoo · scales-large · text

Syntax

TOPICS({{ text }})
TOPICS({{ text }}, {{ num_topics }})
TOPICS({{ text }}, {{ num_topics }}, '{{ focus }}')
{{ text }} TOPICS {{ num_topics }}
{{ text }} TOPICS INTO {{ num_topics }}

Arguments

| name | type | description |
| --- | --- | --- |
| text | VARCHAR | The text to assign a topic to |
| num_topics | INTEGER | Number of topics to extract from the collection |
| focus | VARCHAR | Optional focus hint guiding topic extraction |

About

Semantic topic dimension — extracts N topics from a collection of texts and assigns each text to the most relevant topic. Used in GROUP BY clauses to bucket mixed-content corpora by emergent topic.

Backend: specialist zoo — a BERTopic-lite hybrid:

1. Embed all texts with bge-m3
2. Cluster into num_topics clusters via k-means on the GPU
3. Pick the centroid-closest text as each cluster's representative
4. A small LLM call names each cluster from its representative only
5. Build the mapping {text: assigned_topic_name}, keyed by text value

This removes the scale bottleneck that plagued the LLM-only version: the LLM only ever sees K representatives (one per cluster), never the full N texts. O(N) embedding + O(K) naming, linear scaling.

For LLM-only topic extraction (context-window-bound, but potentially better on small collections with nuanced topic structure), use TOPICS_LLM — see topics_dimension_llm.cascade.yaml.
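The five backend steps can be sketched in Python. This is a minimal, self-contained illustration, not the actual backend: embeddings are stubbed with deterministic pseudo-vectors (the real pipeline uses bge-m3), k-means runs on the CPU, and the small-LLM naming call is replaced with a placeholder; `embed`, `kmeans`, and `topics` are hypothetical names.

```python
import math
import random
import zlib

def embed(text):
    # Stub embedder: deterministic pseudo-vector seeded from the text.
    # The real backend embeds with bge-m3; any dense vector works here.
    rng = random.Random(zlib.crc32(text.encode()))
    return [rng.random() for _ in range(8)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=20):
    # Plain Lloyd's k-means (the real backend runs this step on the GPU).
    rng = random.Random(0)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels, centroids

def topics(texts, num_topics):
    # Steps 1-5: embed, cluster, pick representatives, name, map.
    vecs = [embed(t) for t in texts]
    labels, cents = kmeans(vecs, num_topics)
    names = {}
    for c in range(num_topics):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Representative = the text whose vector is closest to the centroid.
        rep = min(idx, key=lambda i: dist(vecs[i], cents[c]))
        # Real pipeline: one small LLM call names the cluster from texts[rep].
        names[c] = "topic: " + texts[rep]
    return {texts[i]: names[labels[i]] for i in range(len(texts))}
```

Note the cost structure the About section claims: `embed` runs once per text (O(N)) while the naming placeholder runs once per cluster (O(K)), so the expensive LLM step never sees the full collection.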

Examples

Topics extracted

SELECT
  TOPICS('Machine learning and neural networks')
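A fuller sketch of the GROUP BY bucketing described in About, using a hypothetical `reviews` table and `review_text` column:

```sql
SELECT
  TOPICS(review_text, 5) AS topic,  -- 5 emergent topics across the corpus
  COUNT(*) AS n
FROM reviews                        -- hypothetical table
GROUP BY topic
ORDER BY n DESC
```

The three-argument form adds a focus hint, e.g. `TOPICS(review_text, 5, 'complaints')` (hypothetical focus value).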
