Pipeline · returns table

DEDUPE

Remove duplicate rows

Table-in, table-out — composes downstream of SELECTs.

pipeline · llm · pipeline-composable · text

Syntax

THEN DEDUPE

THEN DEDUPE {{ columns }}

THEN DEDUPE({{ columns }})

Arguments

name	type	description
columns (optional)	VARCHAR	—
_table	TABLE	—

About

PIPELINE cascade for removing duplicate rows. Uses pure Python/pandas and makes no LLM calls.

Usage:

SELECT * FROM contacts THEN DEDUPE
SELECT * FROM contacts THEN DEDUPE('email')
SELECT * FROM contacts THEN DEDUPE('email, phone')
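The behaviour described here can be pictured with a small dependency-free Python sketch. It is not the operator's actual implementation, which this page says is built on pandas. The `dedupe` function and the sample `contacts` rows are hypothetical. The sketch assumes that `columns` is a comma-separated string of column names, that omitting it compares whole rows, and that the first occurrence of each duplicate is the one kept.

```python
from typing import Optional

Row = dict[str, object]


def dedupe(rows: list[Row], columns: Optional[str] = None) -> list[Row]:
    """Hypothetical stand-in for DEDUPE: keep the first row seen per key."""
    # Assumption: `columns` looks like 'email' or 'email, phone'.
    keys = None
    if columns:
        keys = [name.strip() for name in columns.split(",") if name.strip()]

    seen = set()
    result = []
    for row in rows:
        # With no columns given, every column takes part in the comparison.
        names = keys if keys is not None else sorted(row)
        key = tuple((name, row[name]) for name in names)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


contacts = [
    {"email": "john@example.com", "name": "John"},
    {"email": "john@example.com", "name": "John"},
    {"email": "john@example.com", "name": "Johnny"},
    {"email": "jane@example.com", "name": "Jane"},
]

all_columns = dedupe(contacts)        # only the exact repeat is dropped
by_email = dedupe(contacts, "email")  # one row per distinct email
print(len(all_columns), len(by_email))
```

Under these assumptions, comparing whole rows keeps "Johnny" because the name differs, while keying on `email` alone collapses all three John rows into the first one.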

Examples

Removes exact duplicate rows while preserving unique records

SELECT
  *
FROM
  (
    VALUES
      ('john@example.com', 'John'),
      ('john@example.com', 'John'),
      ('jane@example.com', 'Jane')
  ) AS t (email, name)
THEN DEDUPE
THEN PYTHON ('result = pd.DataFrame({"row_count":[len(df)]})')

Nearby rabbit holes

Same domain (pipeline)

name	description
ADD_STYLES	Apply theme styling to a chart specification
ANALYZE	Analyze query results with LLM based on a prompt
ENRICH	Add LLM-computed columns to query results
FILTER	Filter query results using LLM-based semantic matching
GROUP	Group by column and aggregate another
INVESTIGATE	Investigative analysis - explores related data to answer questions

