Pipeline · returns table

STATS

Compute comprehensive column profiles for chart planning

Table-in, table-out — composes downstream of SELECTs.

pipeline · llm · pipeline-composable · text

Syntax

THEN STATS
THEN STATS {{ columns }}
THEN STATS({{ columns }})

Arguments

name                 type      description
columns (optional)   VARCHAR
_table               TABLE

About

PIPELINE cascade for comprehensive column profiling. Analyzes ALL columns (not just numeric) to help with chart planning and data exploration. Uses pure Python/pandas, with no LLM calls.

Returns for each column:

- dtype: Data type (numeric, string, datetime, boolean)
- role: Suggested role (dimension vs measure) based on type/cardinality
- non_null, null_count: Count statistics
- distinct: Cardinality (unique value count)
- top_values: Most common values with counts
- sample_values: Example values showing data format
- min, max, mean, std, p25, p50, p75: Numeric stats (for numeric columns)

Usage:

  SELECT * FROM sales THEN STATS
  SELECT * FROM sales THEN STATS('revenue, quantity')
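To make the per-column output more concrete, here is a minimal pandas sketch of how such a profile could be computed. It is not the cascade's actual implementation: the helper name profile_columns, the top_n parameter, the comma-separated parsing of columns, and the simplified role rule (numeric means measure, everything else means dimension) are all assumptions made for illustration.

```python
import pandas as pd


def profile_columns(df, columns=None, top_n=3):
    """Return one profile row per column of df (hypothetical helper)."""
    # Assumption: columns is a comma-separated string, as in STATS('revenue, quantity').
    if columns:
        wanted = [c.strip() for c in columns.split(",") if c.strip()]
    else:
        wanted = list(df.columns)

    rows = []
    for name in wanted:
        s = df[name]
        non_null = s.dropna()

        # Check boolean before numeric: pandas counts bool as a numeric dtype.
        if pd.api.types.is_bool_dtype(s):
            dtype = "boolean"
        elif pd.api.types.is_numeric_dtype(s):
            dtype = "numeric"
        elif pd.api.types.is_datetime64_any_dtype(s):
            dtype = "datetime"
        else:
            dtype = "string"

        row = {
            "column": name,
            "dtype": dtype,
            # Assumed heuristic; the real role rule also considers cardinality.
            "role": "measure" if dtype == "numeric" else "dimension",
            "non_null": int(s.notna().sum()),
            "null_count": int(s.isna().sum()),
            "distinct": int(s.nunique()),
            "top_values": non_null.value_counts().head(top_n).to_dict(),
            "sample_values": non_null.head(top_n).tolist(),
        }

        # Numeric summary statistics are only filled in for numeric columns.
        if dtype == "numeric":
            row.update(
                min=float(non_null.min()),
                max=float(non_null.max()),
                mean=float(non_null.mean()),
                std=float(non_null.std()),
                p25=float(non_null.quantile(0.25)),
                p50=float(non_null.quantile(0.50)),
                p75=float(non_null.quantile(0.75)),
            )
        rows.append(row)

    return pd.DataFrame(rows)


sales = pd.DataFrame({"name": ["A", "B", "C"], "amount": [10, 25, 7]})
profile = profile_columns(sales)
amount_only = profile_columns(sales, "amount")
```

In this sketch the result has one row per input column, so non-numeric rows simply carry missing values in the numeric statistic fields.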

Examples

Profiles numeric columns with summary statistics

SELECT
  *
FROM
  (
    VALUES
      ('A', 10),
      ('B', 25),
      ('C', 7)
  ) AS t (name, amount)
THEN STATS
THEN PYTHON (
    'result = pd.DataFrame({"amount_mean":[float(df.loc[df["column"] == "amount", "mean"].iloc[0])]})'
  )
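
The PYTHON step in this example can be tried outside the pipeline with a stand-in table. The sketch below assumes, based only on the expression above, that the step sees the STATS output as df with one row per profiled column (fields "column" and "mean"), that pd is pandas, and that result becomes the step's output table. The hand-built frame is illustrative and is not real STATS output.

```python
import pandas as pd

# Stand-in for the table STATS would pass downstream (assumed shape).
df = pd.DataFrame(
    {
        "column": ["name", "amount"],
        "dtype": ["string", "numeric"],
        "mean": [None, 14.0],  # (10 + 25 + 7) / 3 for the amount column
    }
)

# Same expression as in the PYTHON step: pick the "amount" row, read its mean.
amount_mean = float(df.loc[df["column"] == "amount", "mean"].iloc[0])
result = pd.DataFrame({"amount_mean": [amount_mean]})
```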
