Speech-to-text — turns an audio column into a text column via the
specialist zoo's Whisper large-v3-turbo. Once transcribed, every
other text operator (MEANS, ABOUT, EXTRACTS, SENTIMENT, CLASSIFY,
SUMMARIZE, MEANING, THEMES, …) composes naturally on top.
Accepts any of:
- https:// or http:// URL
- Local path visible to the gateway
- data:audio/...;base64,... URI
- Raw base64-encoded audio bytes
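
As an illustration of how the four input forms above can be told apart, here is a minimal Python sketch. The function name `classify_audio_input` and the detection order are assumptions made for this example, not the gateway's actual logic. In particular, the base64-versus-path check is a heuristic: a short path made up only of base64 characters would be misread as base64.

```python
import base64
import binascii


def classify_audio_input(value: str) -> str:
    """Guess which of the four accepted input forms a string is.

    Hypothetical helper: the real gateway may detect inputs differently.
    """
    v = value.strip()
    lower = v.lower()
    if lower.startswith(("http://", "https://")):
        return "url"
    if lower.startswith("data:audio/"):
        return "data_uri"
    try:
        # Strict decoding rejects characters outside the base64 alphabet
        # (for example the "." in a file extension) and bad padding.
        base64.b64decode(v, validate=True)
        if v:
            return "base64"
    except (binascii.Error, ValueError):
        pass
    # Anything else is treated as a path the gateway is expected to resolve.
    return "local_path"
```

For example, `classify_audio_input("/var/audio/call.wav")` falls through to `"local_path"` because the `.` is not a valid base64 character.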
Handles arbitrary-length audio via the gateway's chunked inference
path (30s windows, beam batching), so hour-long call recordings or
meeting recordings work out of the box.
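
To give a feel for the 30s windowing, the sketch below computes window boundaries for a recording of a given length. It assumes non-overlapping windows and a shorter final window. The gateway's actual chunking (overlap, padding, how beams are batched) is not described here, and `window_bounds` is a hypothetical name.

```python
def window_bounds(total_seconds: float, window_seconds: float = 30.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Assumption: windows do not overlap and the last one may be shorter.
    """
    if total_seconds < 0 or window_seconds <= 0:
        raise ValueError("duration must be >= 0 and window size must be > 0")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + window_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

Under these assumptions an hour-long recording (3600 s) splits into 120 windows, and a 65 s clip into three, the last covering 60 s to 65 s.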
For LLM-style audio Q&A (e.g., "what did the caller ask about?"
answered without returning the raw transcript), run an LLM cell on top
of the TRANSCRIBE output. This cascade performs only the speech-to-text
conversion.