Extraction · scalar · returns table

CRAWL_BATCH

Crawl a website and extract structured data from each page (via Firecrawl)

Per-row: runs once for each input row.

extraction · external-api · url

Arguments

name	type	description
url	VARCHAR	Starting URL or domain
schema_json	VARCHAR	JSON Schema for per-page extraction; if no schema is provided, each page's markdown is returned instead
limit (optional)	INTEGER	Maximum number of pages to crawl
prompt (optional)	VARCHAR	Natural-language guidance for the extraction
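To make the argument shapes concrete, here is a minimal Python sketch that assembles CRAWL_BATCH-style arguments. The helper name `build_crawl_args`, the example schema fields (`title`, `price`), and the rule that `limit` must be positive are assumptions for illustration, not part of the documented interface. The main point it shows is that `schema_json` is a VARCHAR, so the JSON Schema is serialized to a string before it is passed.

```python
import json
from typing import Optional

# Hypothetical example schema; the field names are illustrative only.
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title"],
}


def build_crawl_args(
    url: str,
    schema: Optional[dict] = None,
    limit: Optional[int] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Collect CRAWL_BATCH-style arguments (hypothetical helper, not a real binding)."""
    if not url:
        raise ValueError("url is required")
    # Assumption: a page limit only makes sense as a positive integer.
    if limit is not None and limit <= 0:
        raise ValueError("limit must be a positive integer")
    args = {"url": url}
    # schema_json is a VARCHAR, so the schema dict is serialized to a JSON string.
    if schema is not None:
        args["schema_json"] = json.dumps(schema)
    if limit is not None:
        args["limit"] = limit
    if prompt is not None:
        args["prompt"] = prompt
    return args


args = build_crawl_args("https://example.com", EXAMPLE_SCHEMA, limit=25)
print(args["schema_json"])
```

Omitting `schema` leaves `schema_json` unset in this sketch; whether the real function expects NULL or an empty string in that case is not stated here.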

About

Batch web crawl with optional structured extraction. This function is the backend cascade for the LARS CRAWL INTO syntax. It first discovers URLs via /map, then for each page either:
- extracts structured data (if a schema is provided), or
- returns the page's markdown content (if no schema is provided).
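The cascade described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the three callables (`map_urls`, `scrape_markdown`, `extract_fields`) are hypothetical stand-ins for the Firecrawl calls, the output column names (`url`, `data`, `markdown`) are invented for illustration, and `limit` is assumed to truncate the list of discovered URLs. No real Firecrawl client or endpoint signature is used.

```python
from typing import Callable, Optional


def crawl_batch(
    url: str,
    map_urls: Callable[[str], list],
    scrape_markdown: Callable[[str], str],
    extract_fields: Callable[[str, str, Optional[str]], dict],
    schema_json: Optional[str] = None,
    limit: Optional[int] = None,
    prompt: Optional[str] = None,
) -> list:
    """Hypothetical sketch of the documented cascade; returns one row per page."""
    # Step 1: discover page URLs (stand-in for the /map call).
    urls = map_urls(url)
    # Assumption: limit caps how many discovered pages are processed.
    if limit is not None:
        urls = urls[:limit]
    rows = []
    for page_url in urls:
        if schema_json:
            # Schema provided: extract structured fields from this page.
            data = extract_fields(page_url, schema_json, prompt)
            rows.append({"url": page_url, "data": data})
        else:
            # No schema: return the page's markdown content instead.
            rows.append({"url": page_url, "markdown": scrape_markdown(page_url)})
    return rows


# Usage with fake stand-ins, so the sketch runs without network access.
def fake_map(root):
    return [root + "/a", root + "/b", root + "/c"]


def fake_markdown(page_url):
    return "# " + page_url


def fake_extract(page_url, schema_json, prompt):
    return {"source": page_url, "prompt": prompt}


markdown_rows = crawl_batch(
    "https://example.com", fake_map, fake_markdown, fake_extract, limit=2
)
structured_rows = crawl_batch(
    "https://example.com", fake_map, fake_markdown, fake_extract,
    schema_json='{"type": "object"}', prompt="prices only",
)
print(markdown_rows)
print(structured_rows)
```

Treating an empty string the same as a missing schema (`if schema_json:`) is a design choice of this sketch; the real function may distinguish NULL from an empty string.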

Nearby rabbit holes

same domain

scalar

EXTRACT

Extract specific information from unstructured text (zero-shot NER)

extraction · nli · specialist-zoo · text

scalar

EXTRACT_LLM

LLM-backed extraction (escape hatch for EXTRACTS)

extraction · llm · llm-escape-hatch · text

scalar

EXTRACT_STRUCTURED

Extract structured fields from text per a user-supplied schema

extraction · llm · text

aggregate

MERGE_TIMELINES

Merge multiple timelines into a single chronological sequence

extraction · llm · json

scalar

PARSE

Extract information from text using natural-language instructions

extraction · llm · text

scalar

PARSE

Parse, validate, or transform patterned strings using plain-English instructions

extraction · llm · text · pipeline-composable
