Quickstart
The quickest way to try Realtime transcription is via the web portal — no code required.
Using the Realtime API
The Realtime API streams audio over a WebSocket connection and returns transcript results as you speak. Unlike the Batch API, results arrive continuously — within milliseconds of the spoken words.
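Under the hood, the client opens a WebSocket and sends a StartRecognition message before streaming audio. Below is a hedged sketch of that first message's shape, with field names mirroring the config used later in this quickstart; treat the exact schema (and any endpoint URL) as an assumption and use the official client libraries, which handle this for you.

```python
import json

# Illustrative sketch of the first message a Realtime session sends.
# Field names mirror the SDK config in this quickstart; the precise
# schema is an assumption, so prefer the official clients in practice.
start_recognition = {
    "message": "StartRecognition",
    "audio_format": {
        "type": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 16000,
    },
    "transcription_config": {
        "language": "en",
        "max_delay": 0.7,
    },
}

# Serialize to JSON, as it would be sent over the WebSocket
payload = json.dumps(start_recognition)
print(payload)
```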
1. Create an API key
Create an API key in the portal, which you'll use to securely access the API. Store the key as a managed secret.
Enterprise customers may need to speak to Support to get their API keys.
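One simple way to keep the key out of source code is to read it from an environment variable. A minimal sketch, assuming the variable name SPEECHMATICS_API_KEY (illustrative, not an official convention):

```python
import os

# Read the key from the environment instead of hard-coding it.
# The variable name here is illustrative, not an official convention.
api_key = os.environ.get("SPEECHMATICS_API_KEY", "")
if not api_key:
    print("Set SPEECHMATICS_API_KEY before running the examples below.")
```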
2. Install the library
Install using pip:
pip install speechmatics-rt pyaudio
pyaudio is required for microphone input in this quickstart.
Install using npm:
npm install @speechmatics/real-time-client @speechmatics/auth
This quickstart uses sox for microphone input. Install it with brew install sox (macOS) or apt install sox (Linux).
3. Run the example
Replace YOUR_API_KEY with your key, then run the script.
import asyncio

from speechmatics.rt import (
    AudioEncoding, AudioFormat, AuthenticationError,
    Microphone, ServerMessageType, TranscriptResult,
    TranscriptionConfig, AsyncClient,
)

API_KEY = "YOUR_API_KEY"

# Set up config and format for transcription
audio_format = AudioFormat(
    encoding=AudioEncoding.PCM_S16LE,
    sample_rate=16000,
    chunk_size=4096,
)

config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
)

async def main():
    # Set up microphone
    mic = Microphone(
        sample_rate=audio_format.sample_rate,
        chunk_size=audio_format.chunk_size,
    )
    if not mic.start():
        print("Microphone not started - please install PyAudio")
        return

    try:
        async with AsyncClient(api_key=API_KEY) as client:
            # Handle ADD_TRANSCRIPT messages (final results)
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                if final := TranscriptResult.from_message(msg).metadata.transcript:
                    print(f"[Final]: {final}")

            try:
                # Begin transcribing
                await client.start_session(
                    transcription_config=config,
                    audio_format=audio_format,
                )
                # Stream microphone audio until interrupted
                while True:
                    await client.send_audio(
                        await mic.read(chunk_size=audio_format.chunk_size)
                    )
            except KeyboardInterrupt:
                pass
            finally:
                mic.stop()
    except AuthenticationError as e:
        print(f"Auth error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
Press Ctrl+C to stop.
import { spawn } from "node:child_process";

import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();

const audio_format = {
  type: "raw",
  encoding: "pcm_s16le",
  sample_rate: 44100,
};

async function transcribe() {
  client.addEventListener("receiveMessage", ({ data }) => {
    if (data.message === "AddTranscript") {
      const transcript = data.metadata?.transcript;
      if (transcript) console.log(`[Final]: ${transcript}`);
    } else if (data.message === "Error") {
      console.error(`Error [${data.type}]: ${data.reason}`);
      process.exit(1);
    }
  });

  // Exchange the long-lived API key for a short-lived JWT
  const jwt = await createSpeechmaticsJWT({ type: "rt", apiKey, ttl: 60 });

  await client.start(jwt, {
    transcription_config: {
      language: "en",
      max_delay: 0.7,
    },
    audio_format,
  });

  const recorder = spawn("sox", [
    "-d", // default audio device (mic)
    "-q", // quiet
    "-r", String(audio_format.sample_rate), // sample rate
    "-e", "signed-integer", // match pcm_s16le
    "-b", "16", // match pcm_s16le
    "-c", "1", // mono
    "-t", "raw", // raw PCM output
    "-", // pipe to stdout
  ]);

  recorder.stdout.on("data", (chunk) => client.sendAudio(chunk));
  recorder.stderr.on("data", (d) => console.error(`sox: ${d}`));

  process.on("SIGINT", () => {
    recorder.kill();
    client.stopRecognition({ noTimeout: true });
  });
}

transcribe().catch((err) => {
  console.error(err);
  process.exit(1);
});
Speak into your microphone. You should see output like:
[Final]: Hello, welcome to Speechmatics.
[Final]: This is a real-time transcription example.
Press Ctrl+C to stop.
Understanding the output
The API returns two types of transcript results: Finals and Partials.
Finals represent the best transcription for a span of audio and are never updated once emitted.
Partials are emitted immediately as audio arrives and may be revised as more context is processed.
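The interaction between the two result types can be illustrated with a small simulation: each Partial for a span of audio supersedes the previous one, until the Final for that span freezes the text. The event stream below is made up for illustration; the SDK delivers real ones.

```python
def render(events):
    """Fold a stream of (kind, text) transcript events into final lines.

    Each Partial overwrites the previous Partial; a Final freezes the
    text for that span and starts a fresh in-progress line.
    """
    finals, current_partial = [], ""
    for kind, text in events:
        if kind == "partial":
            current_partial = text  # revised in place as context grows
        elif kind == "final":
            finals.append(text)     # never updated once emitted
            current_partial = ""
    return finals, current_partial

# Illustrative event stream, matching the sample output in this guide
events = [
    ("partial", "Hello"),
    ("partial", "Hello welcome to"),
    ("final", "Hello, welcome to Speechmatics."),
]
print(render(events))
```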
Receiving Finals and Partials
To receive Partials as well as Finals, enable them in the transcription config and register an additional handler:
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
    enable_partials=True,
)

async with AsyncClient(api_key=API_KEY) as client:
    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def handle_partials(msg):
        if partial := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Partial]: {partial}")

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def handle_finals(msg):
        if final := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Final]: {final}")
await client.start(jwt, {
  transcription_config: {
    language: "en",
    max_delay: 0.7,
    enable_partials: true,
  },
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    console.log(`[Final]: ${data.metadata.transcript}`);
  } else if (data.message === "AddPartialTranscript") {
    console.log(`[Partial]: ${data.metadata.transcript}\r`);
  }
});
With both handlers registered, you'll see partials arrive first, followed by the final result:
[Partial]: Hello
[Partial]: Hello welcome to
[Final]: Hello, welcome to Speechmatics.
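Because consecutive Partials supersede each other, a common console trick is to redraw the in-progress line with a carriage return and only commit Finals on their own line. A sketch of that formatting, assuming a plain terminal (the helper names are illustrative, not part of the SDK):

```python
import sys

def format_partial(text, width=60):
    # Carriage return redraws the in-progress line; padding erases
    # any longer text left over from the previous partial.
    return f"\r[Partial]: {text:<{width}}"

def format_final(text, width=60):
    # Clear the partial line, then commit the final on its own line.
    return "\r" + " " * (width + 12) + f"\r[Final]: {text}\n"

for chunk in ("Hello", "Hello welcome to"):
    sys.stdout.write(format_partial(chunk))
sys.stdout.write(format_final("Hello, welcome to Speechmatics."))
```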
Next steps
Now that you have Realtime transcription working, explore these features to build more powerful applications.