THEMES
Summarization · aggregate · returns JSON


Extract N main topics from texts (embed centrality + LLM naming)

Per-group — reads the whole group in one call.

summarization · llm · scales-large · json

Syntax

TOPICS({{ texts }})
TOPICS({{ texts }}, {{ num_topics }})
THEMES({{ texts }})
THEMES({{ texts }}, {{ num_topics }})

Arguments

name        type     description
texts       JSON     the text collection to extract topics from
num_topics  INTEGER  number of topics to return (optional)

About

Topic extraction — extracts N main topics from a text collection. Returns a JSON array of topic name strings. Backend: hybrid.

For collections up to 30 texts, the LLM reads all of them directly. For larger collections, the cascade embeds every text with bge-m3, finds the 30 most-central texts (those closest to the collection centroid), and passes only those to the LLM. This caps the LLM prompt at O(30) texts regardless of collection size, so THEMES scales to 100K+ rows without hitting context window limits.

Pure clustering (e.g. BERTopic-style HDBSCAN + c-TF-IDF) was considered and rejected for Phase 0 because topic *naming* is where LLMs genuinely add value — a small, focused LLM call over 30 representatives is cheaper and more readable than running HDBSCAN + keyword extraction and getting mechanical labels like "ml_neural_learning". For small collections where no scaling is needed, this is effectively identical to the old LLM version; the specialist refactor is about unbounding the collection sizes the function can handle.
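The centroid-based selection step above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `select_representatives` name, the `MAX_LLM_TEXTS` constant, and the assumption that embeddings arrive as plain lists of floats are all hypothetical; in the real cascade the vectors would come from bge-m3.

```python
MAX_LLM_TEXTS = 30  # illustrative cap; mirrors the O(30) prompt bound above

def select_representatives(vectors, k=MAX_LLM_TEXTS):
    """Return indices of the k vectors closest to the collection centroid.

    Small collections (len <= k) pass through untouched, matching the
    "LLM reads all of them directly" branch described in the docs.
    """
    if len(vectors) <= k:
        return list(range(len(vectors)))

    # Mean of all embeddings = collection centroid.
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    # Squared Euclidean distance to the centroid (monotonic in distance,
    # so the ranking is the same without the sqrt).
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, centroid))

    ranked = sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))
    # Keep the k most-central texts, in their original order,
    # before handing them to the LLM for naming.
    return sorted(ranked[:k])
```

Only the selected indices' texts would then be placed in the LLM prompt, which is what keeps the prompt size constant as the collection grows.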

Examples

Extracts themes from AI/healthcare articles

WITH
  test_data AS (
    SELECT
      *
    FROM
      (
        VALUES
          ('Machine learning is transforming healthcare'),
          ('AI models can detect cancer early'),
          ('Deep learning improves medical imaging'),
          ('Neural networks assist in diagnosis'),
          ('Healthcare AI reduces costs')
      ) AS t (article)
  )
SELECT
  THEMES (article, 3)
FROM
  test_data
