Classification dimension · returns VARCHAR

LANGUAGE

Detect language for grouping (xlm-roberta-lid)

Per-row classifier — stable across GROUP BY.

classification · hybrid · specialist-zoo · text

Syntax

LANGUAGE({{ text }})
{{ text }} BY LANGUAGE

Arguments

name | type    | description
text | VARCHAR | Text whose language is detected

About

Dimension operator that buckets text by detected language, useful for multilingual data analysis.

Backend: the specialist zoo model `papluca/xlm-roberta-base-language-detection`, served via /language. It supports 20+ languages with high accuracy: en, es, fr, de, it, pt, nl, ru, zh, ja, ko, ar, hi, vi, tr, th, pl, sw, ur, bg, el.

Returns ISO 639-1 language codes (two letters).

For LLM-style detection that covers more languages or finer-grained dialects, use LANGUAGE_LLM (see language_dimension_llm.cascade.yaml).
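Because LANGUAGE is a dimension operator, its typical use is as a grouping key. The query below is a minimal sketch, assuming LANGUAGE can appear in GROUP BY like an ordinary scalar expression (which the "stable across GROUP BY" note suggests). The table `support_tickets` and its `body` column are hypothetical, used only for illustration.

```sql
-- Hypothetical table: support_tickets(id INTEGER, body VARCHAR)
-- Count rows per detected language (ISO 639-1 code, e.g. 'en', 'fr').
SELECT
  LANGUAGE(body) AS lang,
  COUNT(*)       AS tickets
FROM support_tickets
GROUP BY LANGUAGE(body)
ORDER BY tickets DESC;
```

Grouping on the same expression that is selected keeps each bucket label consistent with the rows counted under it.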

Examples

English detected

SELECT
  LANGUAGE('Hello, how are you today?')  -- returns 'en'

French detected

SELECT
  LANGUAGE('Bonjour, comment allez-vous?')  -- returns 'fr'

Spanish detected

SELECT
  LANGUAGE('Hola, como estas?')  -- returns 'es'
