
Web library

How to use the Soniox Speech-to-Text Web Library to transcribe microphone audio in your web application.

Transcribe audio directly in your web application

Transcribing audio in a web application is a common use case — whether you're building live captioning, searchable audio interfaces, or voice-powered tools. To make this easy, Soniox provides a lightweight Web SDK that allows you to stream audio from the browser and receive real-time transcriptions with minimal setup.

The Soniox Web Library handles:

  • Capturing audio from the user's microphone
  • Streaming it to the Soniox WebSocket API
  • Receiving and displaying transcription results in real time
  • Optional features such as speaker diarization

The library is framework-agnostic and works with plain JavaScript, as well as modern frontend frameworks like React or Vue.


Installation

Install via your preferred package manager:

npm install @soniox/speech-to-text-web

Or use the module directly from a CDN:

<script type="module">
  import { RecordTranscribe } from 'https://unpkg.com/@soniox/speech-to-text-web?module';
 
  const recordTranscribe = new RecordTranscribe({ ... });
  // ...
</script>

Starting the transcription

To transcribe microphone audio, create an instance of the RecordTranscribe class and call the start() method.

import { RecordTranscribe } from '@soniox/speech-to-text-web';
 
const recordTranscribe = new RecordTranscribe({
  apiKey: '<SONIOX_API_KEY|TEMPORARY_API_KEY>',
});
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  onPartialResult: (result) => {
    console.log(result.tokens);
  },
  onError: (status, message) => {
    console.error(status, message);
  },
});

Parameters

apiKey (required, string | function)

A static SONIOX_API_KEY string, or an async function that returns a temporary API key.

model (required, string)

The transcription model to use, e.g. "stt-rt-preview".
Use the GET /models endpoint to retrieve the list of available models.

languageHints (optional, Array<string>)

Hints to guide transcription toward specific languages.
See supported languages for a list of available ISO language codes.

context (optional, string)

Domain-specific terms or phrases that improve recognition accuracy.
Max length: 10,000 characters.

enableSpeakerDiarization (optional, boolean)

Enables automatic speaker separation.

enableLanguageIdentification (optional, boolean)

Enables automatic language detection at the token level.

onStarted (optional, function)

Called when transcription starts.

onFinished (optional, function)

Called when transcription finishes.

onPartialResult (optional, function)

Called when partial results are received.

onError (optional, function)

Called when an error occurs.

stream (optional, MediaStream)

A custom audio stream source to transcribe instead of the microphone.
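Taken together, the optional parameters above can be combined into a single options object for start(). The values below are illustrative only, and the callbacks are minimal placeholders:

```javascript
// Illustrative start() options; every field is documented above.
const startOptions = {
  model: 'stt-rt-preview',
  languageHints: ['en', 'es'],                   // bias recognition toward these languages
  context: 'Soniox, diarization, WebSocket API', // domain terms (max 10,000 characters)
  enableSpeakerDiarization: true,                // label tokens by speaker
  enableLanguageIdentification: true,            // label tokens by detected language
  onPartialResult: (result) => console.log(result.tokens),
  onError: (status, message) => console.error(status, message),
};
```

Pass the object to recordTranscribe.start(startOptions), exactly as in the earlier example.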


Stopping the transcription

Call stop() for a graceful exit that waits for final results, or cancel() for an immediate stop, e.g. on component unmount.

recordTranscribe.stop();   // Waits for final results
recordTranscribe.cancel(); // Stops immediately
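As a sketch of that split, the helper below wires a user-facing stop control to stop() and page teardown to cancel(). wireStopHandlers is an assumed name, not part of the SDK:

```javascript
// Hypothetical helper: hook up graceful vs. abrupt stops for an existing
// RecordTranscribe instance. `stopButton` is any element with addEventListener.
function wireStopHandlers(recordTranscribe, stopButton) {
  // User clicked "stop": exit gracefully and wait for final results.
  stopButton.addEventListener('click', () => recordTranscribe.stop());
  // Page is being torn down: stop immediately, discarding pending results.
  window.addEventListener('pagehide', () => recordTranscribe.cancel());
}
```

In a framework component, the same cancel() call belongs in the unmount/cleanup hook.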

Using temporary API keys

You can defer API key generation until after the user initiates transcription:

const recordTranscribe = new RecordTranscribe({
  apiKey: async () => {
    const res = await fetch('/api/get-temporary-api-key', { method: 'POST' });
    const { apiKey } = await res.json();
    return apiKey;
  },
});

Audio captured while the key is fetched and the WebSocket connection is established is buffered, so no audio is lost.


Event callbacks

Callbacks can be passed to either the constructor or the start method:

// Constructor-level
new RecordTranscribe({
  onPartialResult: (result) => console.log(result.tokens),
});
 
// Method-level
recordTranscribe.start({
  onPartialResult: (result) => console.log(result.tokens),
});

See the full list of supported callbacks in the GitHub README.


Transcribing custom audio streams

To transcribe audio from sources like an <audio> or <video> element:

const audioElement = new Audio('https://example.com/audio.mp3');
audioElement.crossOrigin = 'anonymous';
 
const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(audioElement);
const destination = audioCtx.createMediaStreamDestination();
 
source.connect(destination);
source.connect(audioCtx.destination);
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  stream: destination.stream,
  onFinished: () => audioElement.pause(),
});
 
audioElement.play();

You are responsible for managing the audio stream lifecycle.
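One way to manage that lifecycle is to wrap the routing above in a small helper that hands the caller both the stream and the AudioContext, so the context can be closed when transcription ends. buildElementStream is an assumed name, and the injectable constructor parameter exists only to make the sketch easy to exercise outside a browser:

```javascript
// Hypothetical helper: route an HTMLMediaElement into a MediaStream
// suitable for the `stream` option of start().
function buildElementStream(mediaElement, AudioCtx = globalThis.AudioContext) {
  const audioCtx = new AudioCtx();
  const source = audioCtx.createMediaElementSource(mediaElement);
  const destination = audioCtx.createMediaStreamDestination();
  source.connect(destination);          // feed the stream sent for transcription
  source.connect(audioCtx.destination); // keep local playback audible
  return { audioCtx, stream: destination.stream };
}
```

Call audioCtx.close() once transcription finishes to release the audio resources.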

