
Speech-to-Text

POST /workstations/:workstation_id/audio/rstt

The Real-Time Speech-to-Text (RSTT) endpoint returns a live streaming URL that listens for voice audio from the Workstation's virtual speakers and delivers the text transcribed by a speech-to-text model. The transcription is streamed to the client in real time via Server-Sent Events (SSE). As audio is detected, it is transcribed and sent as either a 'partial' or a 'final' event.

Example usage with JavaScript:

const id = 'HvcqZjmeoPtP';
const url = `https://api.agentstation.ai/v1/workstations/${id}/audio/rstt`;

// Get SSE URL from AgentStation API
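const response = await fetch(url, {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <your_token>'
  }
});
// Fail early if the API did not return a stream URL
if (!response.ok) {
  throw new Error(`RSTT request failed with status ${response.status}`);
}
const stream = await response.json();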
// stream.url example: https://stream.agentstation.ai/v1/rstt/sse/cxBMzXSFUSXf

// Connect to SSE stream
const eventSource = new EventSource(stream.url);

// Handle partial transcriptions (in-progress)
eventSource.addEventListener('partial', (event) => {
  const partialTranscription = event.data;
  console.log('Partial:', partialTranscription);
});

// Handle final transcriptions (completed)
eventSource.addEventListener('final', (event) => {
  const finalTranscription = event.data;
  console.log('Final:', finalTranscription);
  // Final event indicates completion, so close the connection
  eventSource.close();
});

eventSource.onerror = (error) => {
  console.error('EventSource failed:', error);
  eventSource.close();
};
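
If the caller only needs the completed transcription, the listeners above can be wrapped in a Promise. The following is a minimal sketch, not part of the AgentStation API: `waitForFinal` and its `onPartial` callback are hypothetical names, and it assumes, as the example does, that the first 'final' event ends the session.

```javascript
// Hypothetical helper, not part of the AgentStation API.
// `source` is any EventSource-like object exposing addEventListener() and close().
function waitForFinal(source, onPartial = () => {}) {
  return new Promise((resolve, reject) => {
    // Forward in-progress text to the optional callback.
    source.addEventListener('partial', (event) => onPartial(event.data));

    // Mirror the example above: stop listening after the first final event.
    source.addEventListener('final', (event) => {
      source.close();
      resolve(event.data);
    });

    // Close the connection and surface the failure to the caller.
    source.addEventListener('error', (error) => {
      source.close();
      reject(error);
    });
  });
}
```

With the stream URL from the example, this could be used as `const text = await waitForFinal(new EventSource(stream.url), (p) => console.log('Partial:', p));`.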

Example SSE stream events:

event: partial
data: "Hello wo"

event: partial
data: "Hello world, how"

event: final
data: "Hello world, how are you?"
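
For clients without a built-in EventSource, the wire format above can be parsed by hand. The sketch below is a simplified parser that handles only the `event:` and `data:` fields shown in the sample; `parseSseEvents` and `decodeTranscript` are hypothetical names. It also assumes the quotes in the sample are literal, meaning each payload is a JSON-encoded string, which this page does not confirm.

```javascript
// Parse raw SSE text into { event, data } objects.
// Simplified: only the `event:` and `data:` fields are handled.
function parseSseEvents(text) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = 'message'; // SSE default when no `event:` field is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        event = line.slice('event:'.length).trim();
      } else if (line.startsWith('data:')) {
        // Drop the single optional space that follows the colon.
        dataLines.push(line.slice('data:'.length).replace(/^ /, ''));
      }
    }
    // Blocks without any data field are ignored.
    if (dataLines.length > 0) {
      events.push({ event, data: dataLines.join('\n') });
    }
  }
  return events;
}

// Assumption: the payload is a JSON string such as "Hello wo".
// Falls back to the raw text if it is not a JSON-encoded string.
function decodeTranscript(data) {
  try {
    const parsed = JSON.parse(data);
    return typeof parsed === 'string' ? parsed : data;
  } catch {
    return data;
  }
}
```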

The SSE stream will have the following headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
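
A client that reads the stream manually may want to confirm the Content-Type before parsing. This is a minimal sketch with a hypothetical helper name, `isEventStream`; it assumes the server might append parameters such as a charset, which this page does not state.

```javascript
// Hypothetical helper: returns true if the response looks like an SSE stream.
// `headers` is anything with a get(name) method, such as the Headers object
// on a fetch Response.
function isEventStream(headers) {
  const contentType = headers.get('content-type') || '';
  // Ignore parameters such as "; charset=utf-8" and letter case.
  return contentType.split(';')[0].trim().toLowerCase() === 'text/event-stream';
}
```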

Request

Responses

Successfully retrieved SSE stream URL