Quickstart
The quickest way to try Realtime transcription is via the web portal — no code required.
Using the Realtime API
The Realtime API streams audio over a WebSocket connection and returns transcript results as you speak. Unlike the Batch API, results arrive continuously — within milliseconds of the spoken words.
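Under the hood, the client opens a WebSocket and sends a StartRecognition message before streaming audio. Below is a hedged sketch of that first message's shape, with field names mirroring the config used later in this quickstart; treat the exact schema (and any endpoint URL) as an assumption and use the official client libraries, which handle this for you.

```python
import json

# Illustrative sketch of the first message a Realtime session sends.
# Field names mirror the SDK config in this quickstart; the precise
# schema is an assumption, so prefer the official clients in practice.
start_recognition = {
    "message": "StartRecognition",
    "audio_format": {
        "type": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 16000,
    },
    "transcription_config": {
        "language": "en",
        "max_delay": 0.7,
    },
}

# Serialize to JSON, as it would be sent over the WebSocket
payload = json.dumps(start_recognition)
print(payload)
```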
1. Create an API key
Create an API key in the portal, which you'll use to securely access the API. Store the key as a managed secret.
Enterprise customers may need to speak to Support to get their API keys.
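One simple way to keep the key out of source code is to read it from an environment variable. A minimal sketch, assuming the variable name SPEECHMATICS_API_KEY (illustrative, not an official convention):

```python
import os

# Read the key from the environment instead of hard-coding it.
# The variable name here is illustrative, not an official convention.
api_key = os.environ.get("SPEECHMATICS_API_KEY", "")
if not api_key:
    print("Set SPEECHMATICS_API_KEY before running the examples below.")
```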
2. Install the library
Install using pip:
pip install speechmatics-rt pyaudio
pyaudio is required for microphone input in this quickstart.
Install using npm:
npm install @speechmatics/real-time-client @speechmatics/auth
This quickstart uses sox for microphone input. Install it with brew install sox (macOS) or apt install sox (Linux).
3. Run the example
Replace YOUR_API_KEY with your key, then run the script.
import asyncio

from speechmatics.rt import (
    AudioEncoding, AudioFormat, AuthenticationError,
    Microphone, ServerMessageType, TranscriptResult,
    TranscriptionConfig, AsyncClient,
)

API_KEY = "YOUR_API_KEY"

# Set up config and format for transcription
audio_format = AudioFormat(
    encoding=AudioEncoding.PCM_S16LE,
    sample_rate=16000,
    chunk_size=4096,
)

config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
)

async def main():
    # Set up microphone
    mic = Microphone(
        sample_rate=audio_format.sample_rate,
        chunk_size=audio_format.chunk_size,
    )
    if not mic.start():
        print("Microphone not started - please install PyAudio")
        return

    try:
        async with AsyncClient(api_key=API_KEY) as client:
            # Handle ADD_TRANSCRIPT messages (final results)
            @client.on(ServerMessageType.ADD_TRANSCRIPT)
            def handle_finals(msg):
                if final := TranscriptResult.from_message(msg).metadata.transcript:
                    print(f"[Final]: {final}")

            try:
                # Begin transcribing
                await client.start_session(
                    transcription_config=config,
                    audio_format=audio_format,
                )
                # Stream microphone audio until interrupted
                while True:
                    await client.send_audio(
                        await mic.read(chunk_size=audio_format.chunk_size)
                    )
            except KeyboardInterrupt:
                pass
            finally:
                mic.stop()
    except AuthenticationError as e:
        print(f"Auth error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
Press Ctrl+C to stop.
import { spawn } from "node:child_process";

import { createSpeechmaticsJWT } from "@speechmatics/auth";
import { RealtimeClient } from "@speechmatics/real-time-client";

const apiKey = "YOUR_API_KEY";
const client = new RealtimeClient();

const audio_format = {
  type: "raw",
  encoding: "pcm_s16le",
  sample_rate: 44100,
};

async function transcribe() {
  client.addEventListener("receiveMessage", ({ data }) => {
    if (data.message === "AddTranscript") {
      const transcript = data.metadata?.transcript;
      if (transcript) console.log(`[Final]: ${transcript}`);
    } else if (data.message === "Error") {
      console.error(`Error [${data.type}]: ${data.reason}`);
      process.exit(1);
    }
  });

  // Exchange the long-lived API key for a short-lived JWT
  const jwt = await createSpeechmaticsJWT({ type: "rt", apiKey, ttl: 60 });

  await client.start(jwt, {
    transcription_config: {
      language: "en",
      max_delay: 0.7,
    },
    audio_format,
  });

  const recorder = spawn("sox", [
    "-d", // default audio device (mic)
    "-q", // quiet
    "-r", String(audio_format.sample_rate), // sample rate
    "-e", "signed-integer", // match pcm_s16le
    "-b", "16", // match pcm_s16le
    "-c", "1", // mono
    "-t", "raw", // raw PCM output
    "-", // pipe to stdout
  ]);

  recorder.stdout.on("data", (chunk) => client.sendAudio(chunk));
  recorder.stderr.on("data", (d) => console.error(`sox: ${d}`));

  process.on("SIGINT", () => {
    recorder.kill();
    client.stopRecognition({ noTimeout: true });
  });
}

transcribe().catch((err) => {
  console.error(err);
  process.exit(1);
});
Speak into your microphone. You should see output like:
[Final]: Hello, welcome to Speechmatics.
[Final]: This is a real-time transcription example.
Press Ctrl+C to stop.
Understanding the output
The API returns two types of transcript results: Finals and Partials.
Finals represent the best transcription for a span of audio and are never updated once emitted.
Partials are emitted immediately as audio arrives and may be revised as more context is processed.
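The interaction between the two result types can be illustrated with a small simulation: each Partial for a span of audio supersedes the previous one, until the Final for that span freezes the text. The event stream below is made up for illustration; the SDK delivers real ones.

```python
def render(events):
    """Fold a stream of (kind, text) transcript events into final lines.

    Each Partial overwrites the previous Partial; a Final freezes the
    text for that span and starts a fresh in-progress line.
    """
    finals, current_partial = [], ""
    for kind, text in events:
        if kind == "partial":
            current_partial = text  # revised in place as context grows
        elif kind == "final":
            finals.append(text)     # never updated once emitted
            current_partial = ""
    return finals, current_partial

# Illustrative event stream, matching the sample output in this guide
events = [
    ("partial", "Hello"),
    ("partial", "Hello welcome to"),
    ("final", "Hello, welcome to Speechmatics."),
]
print(render(events))
```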
Receiving Finals and Partials
To receive Partials as well as Finals, enable them in the transcription config and register an additional handler:
config = TranscriptionConfig(
    language="en",
    max_delay=0.7,
    enable_partials=True,
)

async with AsyncClient(api_key=API_KEY) as client:
    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def handle_partials(msg):
        if partial := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Partial]: {partial}")

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def handle_finals(msg):
        if final := TranscriptResult.from_message(msg).metadata.transcript:
            print(f"[Final]: {final}")
await client.start(jwt, {
  transcription_config: {
    language: "en",
    max_delay: 0.7,
    enable_partials: true,
  },
});

client.addEventListener("receiveMessage", ({ data }) => {
  if (data.message === "AddTranscript") {
    console.log(`[Final]: ${data.metadata.transcript}`);
  } else if (data.message === "AddPartialTranscript") {
    console.log(`[Partial]: ${data.metadata.transcript}\r`);
  }
});
With both handlers registered, you'll see partials arrive first, followed by the final result:
[Partial]: Hello
[Partial]: Hello welcome to
[Final]: Hello, welcome to Speechmatics.
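Because consecutive Partials supersede each other, a common console trick is to redraw the in-progress line with a carriage return and only commit Finals on their own line. A sketch of that formatting, assuming a plain terminal (the helper names are illustrative, not part of the SDK):

```python
import sys

def format_partial(text, width=60):
    # Carriage return redraws the in-progress line; padding erases
    # any longer text left over from the previous partial.
    return f"\r[Partial]: {text:<{width}}"

def format_final(text, width=60):
    # Clear the partial line, then commit the final on its own line.
    return "\r" + " " * (width + 12) + f"\r[Final]: {text}\n"

for chunk in ("Hello", "Hello welcome to"):
    sys.stdout.write(format_partial(chunk))
sys.stdout.write(format_final("Hello, welcome to Speechmatics."))
```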
Next steps
Now that you have Realtime transcription working, explore these features to build more powerful applications.