Extraction · scalar · returns table

REFRESH_CRAWL

Re-crawl a stored web-table using its saved configuration

Per-row — runs once for each row.

extraction · external-api · url

Arguments

name	type	description
table_name	VARCHAR	Name of the table to refresh (must have been created with LARS CRAWL)

About

Re-crawls a previously created web table using its stored configuration. This function is the backend cascade behind the LARS REFRESH syntax: it reads the table's crawl config from _lars_crawl_registry, then re-executes the crawl with the original parameters.
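The lookup-then-recrawl flow described above can be sketched in Python. This is a minimal illustration and not the LARS implementation: it models the registry with an in-memory SQLite table, reuses only the column names that appear in the example on this page (table_name, url, schema_json, options_json), and replaces the real crawl with a hypothetical stub named fake_crawl. The function names, the error type, and the shape of the returned rows are all assumptions.

```python
import json
import sqlite3


def seed_registry(conn):
    """Create a stand-in registry table and store one saved crawl config."""
    conn.execute(
        "CREATE TABLE _lars_crawl_registry ("
        "table_name TEXT PRIMARY KEY, url TEXT, "
        "schema_json TEXT, options_json TEXT)"
    )
    conn.execute(
        "INSERT INTO _lars_crawl_registry VALUES (?, ?, ?, ?)",
        (
            "test_refresh_crawl",
            "https://example.com",
            json.dumps({"title": "string"}),
            json.dumps({"limit": 1, "prompt": "Extract the page title"}),
        ),
    )


def fake_crawl(url, schema, options):
    """Hypothetical stub: returns placeholder rows instead of crawling."""
    limit = options.get("limit", 1)
    empty_row = {field: None for field in schema}
    return [{**empty_row, "source_url": url} for _ in range(limit)]


def refresh_crawl(conn, table_name, crawl=fake_crawl):
    """Look up the saved config for table_name and re-run the crawl with it."""
    row = conn.execute(
        "SELECT url, schema_json, options_json "
        "FROM _lars_crawl_registry WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        # Assumed behaviour: a table with no registry entry cannot be refreshed.
        raise LookupError(f"no crawl config registered for {table_name!r}")
    url, schema_json, options_json = row
    return crawl(url, json.loads(schema_json), json.loads(options_json))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed_registry(conn)
    print(refresh_crawl(conn, "test_refresh_crawl"))
```

The point of the sketch is that the caller passes only the table name: the URL, schema, and options all come from the registry row.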

Examples

Refresh reuses the crawl settings stored in the crawl registry, even when the registry was seeded outside the current UDF session.

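-- Step 1: seed _lars_crawl_registry with one saved crawl configuration,
-- using Python run through the python_data skill.
SELECT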
  json_extract_string (
    skill_json (
      'python_data',
      json_object(
        'code',
        'from lars.db_adapter import get_db_adapter; db = get_db_adapter(); db.execute("""CREATE OR REPLACE TABLE _lars_crawl_registry AS SELECT * FROM (VALUES (''test_refresh_crawl'', ''https://example.com'', ''{\"title\":\"string\"}'', ''{\"limit\":1,\"prompt\":\"Extract the page title\"}'')) AS t(table_name, url, schema_json, options_json)"""); result = "ok"'
      )
    ),
    '$[0].result'
  );

-- Step 2: refresh the table by name; the URL, schema, and options come from the registry row.
SELECT
  *
FROM
  refresh_crawl ('test_refresh_crawl')
LIMIT
  1

Nearby rabbit holes

same domain

name	kind	description	tags
CRAWL_BATCH	scalar	Crawl a website and extract structured data from each page (via Firecrawl)	extraction, external-api, url
EXTRACT	scalar	Extract specific information from unstructured text (zero-shot NER)	extraction, nli, specialist-zoo, text
EXTRACT_LLM	scalar	LLM-backed extraction (escape hatch for EXTRACTS)	extraction, llm, llm-escape-hatch, text
EXTRACT_STRUCTURED	scalar	Extract structured fields from text per a user-supplied schema	extraction, llm, text
MERGE_TIMELINES	aggregate	Merge multiple timelines into unified chronological sequence	extraction, llm, json
PARSE	scalar	Extract information from text using natural-language instructions	extraction, llm, text

