Classification dimension · returns VARCHAR

TOPICS

Extract topics from a text collection and assign each text to a topic

Per-row classifier — stable across GROUP BY.

classification · embedding-model · specialist-zoo · scales-large · text

Syntax

TOPICS({{ text }})
TOPICS({{ text }}, {{ num_topics }})
TOPICS({{ text }}, {{ num_topics }}, '{{ focus }}')
{{ text }} TOPICS {{ num_topics }}
{{ text }} TOPICS INTO {{ num_topics }}

Arguments

| name | type | description |
| --- | --- | --- |
| text | VARCHAR | The text to assign a topic to |
| num_topics | INTEGER | Number of topics to extract from the collection |
| focus | VARCHAR | Optional focus hint guiding topic extraction |

About

Semantic topic dimension — extracts N topics from a collection of texts and assigns each text to the most relevant topic. Used in GROUP BY clauses to bucket mixed-content corpora by emergent topic.

Backend: specialist zoo — a BERTopic-lite hybrid:

1. Embed all texts with bge-m3
2. Cluster into num_topics clusters via k-means on the GPU
3. Pick the centroid-closest text as each cluster's representative
4. A small LLM call names each cluster from its representative only
5. Build the mapping {text: assigned_topic_name}, keyed by text value

This removes the scale bottleneck that plagued the LLM-only version: the LLM only ever sees K representatives (one per cluster), never the full N texts. O(N) embedding + O(K) naming, linear scaling.

For LLM-only topic extraction (context-window-bound, but potentially better on small collections with nuanced topic structure), use TOPICS_LLM — see topics_dimension_llm.cascade.yaml.
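The five backend steps can be sketched in Python. This is a minimal, self-contained illustration, not the actual backend: embeddings are stubbed with deterministic pseudo-vectors (the real pipeline uses bge-m3), k-means runs on the CPU, and the small-LLM naming call is replaced with a placeholder; `embed`, `kmeans`, and `topics` are hypothetical names.

```python
import math
import random
import zlib

def embed(text):
    # Stub embedder: deterministic pseudo-vector seeded from the text.
    # The real backend embeds with bge-m3; any dense vector works here.
    rng = random.Random(zlib.crc32(text.encode()))
    return [rng.random() for _ in range(8)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=20):
    # Plain Lloyd's k-means (the real backend runs this step on the GPU).
    rng = random.Random(0)
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels, centroids

def topics(texts, num_topics):
    # Steps 1-5: embed, cluster, pick representatives, name, map.
    vecs = [embed(t) for t in texts]
    labels, cents = kmeans(vecs, num_topics)
    names = {}
    for c in range(num_topics):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Representative = the text whose vector is closest to the centroid.
        rep = min(idx, key=lambda i: dist(vecs[i], cents[c]))
        # Real pipeline: one small LLM call names the cluster from texts[rep].
        names[c] = "topic: " + texts[rep]
    return {texts[i]: names[labels[i]] for i in range(len(texts))}
```

Note the cost structure the About section claims: `embed` runs once per text (O(N)) while the naming placeholder runs once per cluster (O(K)), so the expensive LLM step never sees the full collection.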

Examples

Topics extracted

SELECT
  TOPICS('Machine learning and neural networks')
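A fuller sketch of the GROUP BY bucketing described in About, using a hypothetical `reviews` table and `review_text` column:

```sql
SELECT
  TOPICS(review_text, 5) AS topic,  -- 5 emergent topics across the corpus
  COUNT(*) AS n
FROM reviews                        -- hypothetical table
GROUP BY topic
ORDER BY n DESC
```

The three-argument form adds a focus hint, e.g. `TOPICS(review_text, 5, 'complaints')` (hypothetical focus value).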
