[Enhancement] Add Haystack pipeline component for audio transcription #710

@deepgram-robot

Summary

Add a Haystack-compatible pipeline component (AudioTranscriber or DeepgramTranscriber) that integrates Deepgram STT into Haystack's pipeline architecture, enabling audio transcription as a step in Haystack RAG and document processing pipelines.

Problem it solves

Haystack is a leading open-source framework for building RAG (Retrieval-Augmented Generation) applications and document processing pipelines. Developers building audio-aware RAG systems — where meeting recordings, podcasts, or call center audio need to be transcribed and indexed — need a native Haystack component that fits into the pipeline abstraction. Without it, developers must write custom glue code to convert Deepgram transcriptions into Haystack Document objects. AssemblyAI has a Haystack integration (assemblyai-haystack); Deepgram does not, despite having superior STT accuracy.

Proposed API

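from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.utils import Secret

# Proposed package; it does not exist yet
from deepgram_haystack import DeepgramTranscriber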

# As a Haystack pipeline component
transcriber = DeepgramTranscriber(
    api_key=Secret.from_env_var("DEEPGRAM_API_KEY"),
    model="nova-3",
    smart_format=True,
    diarize=True,
)

# In a Haystack pipeline
pipeline = Pipeline()
pipeline.add_component("transcriber", transcriber)
pipeline.add_component("splitter", DocumentSplitter())
pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())

pipeline.connect("transcriber", "splitter")
pipeline.connect("splitter", "embedder")

# Transcribe audio and process
result = pipeline.run({"transcriber": {"sources": ["meeting.mp3"]}})
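
To make the input/output contract of the proposed component concrete, here is a minimal, stdlib-only sketch of how `run` might map `sources` to a `documents` output. Everything in it is an assumption for illustration: `Doc` is a stand-in for Haystack's `Document`, `TranscriberSketch` and `is_url` are hypothetical names, and the two injected callables are placeholders for the real Deepgram calls. In an actual implementation the class would presumably be decorated with Haystack's `@component` and `run` with `@component.output_types(...)`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse


@dataclass
class Doc:
    """Stand-in for haystack.Document so the sketch runs without Haystack."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def is_url(source: str) -> bool:
    """Treat http(s) sources as remote URLs; anything else as a local path."""
    return urlparse(source).scheme in ("http", "https")


class TranscriberSketch:
    """Hypothetical shape of the component: sources in, {"documents": [...]} out."""

    def __init__(
        self,
        transcribe_url: Callable[[str], str],
        transcribe_file: Callable[[str], str],
    ) -> None:
        # Both callables are placeholders for the real Deepgram requests.
        self._transcribe_url = transcribe_url
        self._transcribe_file = transcribe_file

    def run(self, sources: List[str]) -> Dict[str, List[Doc]]:
        documents = []
        for source in sources:
            # Route each source to the URL or file backend.
            backend = self._transcribe_url if is_url(source) else self._transcribe_file
            documents.append(Doc(content=backend(source), meta={"source": source}))
        return {"documents": documents}


# Usage with fake backends instead of network calls
sketch = TranscriberSketch(
    transcribe_url=lambda s: f"url transcript of {s}",
    transcribe_file=lambda s: f"file transcript of {s}",
)
result = sketch.run(["meeting.mp3", "https://example.com/call.wav"])
```

Injecting the backends keeps the routing logic testable without an API key; whether the real component should expose such a seam is a design choice left open here.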

Acceptance criteria

  • Implements the Haystack @component protocol with proper input/output types
  • Converts Deepgram transcriptions to Haystack Document objects with metadata (speaker, timestamps, confidence)
  • Supports both file paths and URLs as input sources
  • Supports key STT parameters (model, language, diarize, smart_format, topics, sentiment)
  • Publishable as deepgram-haystack on PyPI
  • Documented with usage example
  • Compatible with existing API
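
The metadata criterion above (speaker, timestamps, confidence) could be met in several ways. The sketch below shows one option: grouping word-level results into one document per speaker turn. It assumes a response shaped like Deepgram's pre-recorded output (`results.channels[0].alternatives[0].words`, with a `speaker` field when diarization is on); `Doc` is a stand-in for Haystack's `Document`, and `words_to_speaker_docs` is a hypothetical helper, not part of any existing SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for haystack.Document so the sketch runs without Haystack."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def words_to_speaker_docs(response: Dict[str, Any], source: str) -> List[Doc]:
    """Group consecutive words by speaker and emit one Doc per speaker turn."""
    # Assumed response layout; only the first channel and alternative are used.
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    docs: List[Doc] = []
    turn: List[Dict[str, Any]] = []

    def flush() -> None:
        # Emit the buffered turn as a Doc, then reset the buffer.
        if not turn:
            return
        docs.append(Doc(
            content=" ".join(w.get("punctuated_word", w["word"]) for w in turn),
            meta={
                "source": source,
                "speaker": turn[0].get("speaker"),
                "start": turn[0]["start"],
                "end": turn[-1]["end"],
                # Mean word confidence over the turn
                "confidence": sum(w["confidence"] for w in turn) / len(turn),
            },
        ))
        turn.clear()

    for word in words:
        # A change of speaker closes the current turn.
        if turn and word.get("speaker") != turn[0].get("speaker"):
            flush()
        turn.append(word)
    flush()
    return docs


# Hand-written sample data, not real API output
sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "hello", "punctuated_word": "Hello,", "start": 0.0, "end": 0.4,
     "confidence": 0.9, "speaker": 0},
    {"word": "team", "punctuated_word": "team.", "start": 0.5, "end": 0.9,
     "confidence": 0.7, "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi.", "start": 1.2, "end": 1.5,
     "confidence": 0.8, "speaker": 1},
]}]}]}}
docs = words_to_speaker_docs(sample, "meeting.mp3")
```

One document per speaker turn keeps speaker attribution intact when a downstream `DocumentSplitter` chunks the text; emitting a single document per source is the simpler alternative when diarization is off.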

Raised by the DX intelligence system.
