Pipeline · returns table

DEDUPE

Remove duplicate rows

Table-in, table-out: composes downstream of SELECTs.

pipeline · llm · pipeline-composable · text

Syntax

THEN DEDUPE
THEN DEDUPE {{ columns }}
THEN DEDUPE({{ columns }})

Arguments

name                type      description
columns (optional)  VARCHAR
_table              TABLE

About

PIPELINE cascade for removing duplicate rows. Uses pure Python/pandas - no LLM calls.

Usage:

SELECT * FROM contacts THEN DEDUPE
SELECT * FROM contacts THEN DEDUPE('email')
SELECT * FROM contacts THEN DEDUPE('email, phone')
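The three usage forms can be approximated in plain pandas. This is a minimal sketch, not the actual implementation: the helper name dedupe, the comma-splitting of the columns argument, and the keep-first behavior are all assumptions made for illustration.

```python
import pandas as pd


def dedupe(df, columns=None):
    """Hypothetical stand-in for the DEDUPE step (name and behavior assumed)."""
    if columns:
        # Assumption: a value such as 'email, phone' is a comma-separated list.
        subset = [c.strip() for c in columns.split(",") if c.strip()]
    else:
        # No columns given: compare entire rows.
        subset = None
    # Assumption: the first occurrence of each duplicate group is kept.
    return df.drop_duplicates(subset=subset).reset_index(drop=True)


contacts = pd.DataFrame(
    {
        "email": ["john@example.com", "john@example.com", "jane@example.com"],
        "phone": ["555-0100", "555-0199", "555-0111"],
    }
)

all_cols = dedupe(contacts)                  # like THEN DEDUPE
by_email = dedupe(contacts, "email")         # like THEN DEDUPE('email')
by_both = dedupe(contacts, "email, phone")   # like THEN DEDUPE('email, phone')
```

With this sample data, only the 'email' form drops a row, because the two john@example.com rows differ in phone.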

Examples

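Removes exact duplicate rows while preserving unique records. The trailing PYTHON step counts the rows that remain, so the three input rows produce a row_count of 2.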

SELECT
  *
FROM
  (
    VALUES
      ('john@example.com', 'John'),
      ('john@example.com', 'John'),
      ('jane@example.com', 'Jane')
  ) AS t (email, name)
THEN DEDUPE
THEN PYTHON ('result = pd.DataFrame({"row_count":[len(df)]})')
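For comparison, the same example can be reproduced outside the pipeline with pandas alone. This sketch assumes the PYTHON step receives the deduplicated table as df and returns whatever is bound to result; those names come from the query above, but how the pipeline passes them is not documented here.

```python
import pandas as pd

# The three VALUES rows from the query, with the same column names.
df = pd.DataFrame(
    [
        ("john@example.com", "John"),
        ("john@example.com", "John"),
        ("jane@example.com", "Jane"),
    ],
    columns=["email", "name"],
)

# Stand-in for THEN DEDUPE with no arguments: drop fully identical rows.
df = df.drop_duplicates()

# The body of the PYTHON step, unchanged.
result = pd.DataFrame({"row_count": [len(df)]})
print(result)
```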
