API Endpoints Guide

You can transcribe input audio or video using one of these APIs:

Synchronous API — for quick, immediate results.
Asynchronous API — for longer jobs with job IDs and polling.
OpenAI-Compatible API — for seamless integration with OpenAI clients and SDKs.

At the moment, our API cannot add subtitles to video or generate subtitle files (.SRT / .ASS). These outputs are available only through our web, mobile, and desktop applications.

Speech-to-Text Models

We currently support these AI models:

Formal — model-01052025
Conversational — model-23012026

Pick one with the stt_model query parameter on the upload endpoint.
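As a minimal sketch, the model choice can be wrapped in a small JavaScript helper that builds the upload URL. The endpoint and model IDs come from this guide; the names STT_MODELS and buildUploadUrl are hypothetical and exist only for illustration.

```javascript
// Model IDs as listed in this guide; the map keys are illustrative labels.
const STT_MODELS = new Map([
  ["formal", "model-01052025"],
  ["conversational", "model-23012026"],
]);

const UPLOAD_ENDPOINT = "https://api.hispeech.ai/api/v1/transcriptions/upload";

// Hypothetical helper: returns the upload URL and adds stt_model
// only when a model label is passed (otherwise the default model applies).
function buildUploadUrl(modelLabel) {
  const url = new URL(UPLOAD_ENDPOINT);
  if (modelLabel !== undefined) {
    const modelId = STT_MODELS.get(modelLabel);
    if (!modelId) throw new Error(`Unknown model label: ${modelLabel}`);
    url.searchParams.set("stt_model", modelId);
  }
  return url.toString();
}

console.log(buildUploadUrl("conversational"));
// prints https://api.hispeech.ai/api/v1/transcriptions/upload?stt_model=model-23012026
```

Calling buildUploadUrl() with no argument returns the bare upload URL, which matches omitting stt_model.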

Formal model

model-01052025 (released 01 Nov 2025, default)

Best for formal speech and subtitles. In some cases, it renders informal speech in a formal register.

Transcribe with the Formal model:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload?stt_model=model-01052025' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'wait_for_result="true"'

Formal is the default model, so you can also omit stt_model and call .../transcriptions/upload directly.

Response Example:

{
    "job_id": "a91f17d0-5d9a-11f1-8d6f-215414e3f380",
    "error": null,
    "success": true,
    "secure_mode": false,
    "return_word_timestamps": false,
    "status": 200,
    "transcription": "Խոսում է, գրում է, պատմում է, ասում է։",
    "audio_duration": 3
}

Conversational model

model-23012026 (released 23 Jan 2026)

Best for conversational (non-dialectal) speech. Transcribes exactly what it hears.

Transcribe with the Conversational model:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload?stt_model=model-23012026' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'wait_for_result="true"'

Response Example:

{
    "job_id": "84379d20-5d9a-11f1-a2d3-8a2446801cc3",
    "error": null,
    "success": true,
    "secure_mode": false,
    "return_word_timestamps": false,
    "status": 200,
    "transcription": "Խոսում ա գրում ա, պատմում ա, ասում ա։",
    "audio_duration": 3
}

Replace the following placeholders with your own values:
• YOUR_API_KEY
• /path/to/audio_or_video.mp3

Synchronous API

Description

Send a request and wait for the transcription result.

Good for short audio files or immediate results.

Works with every speech-to-text model — see the Models section. Pick one with the stt_model query parameter; if you omit it, the default model is used.

Speech-to-Text

Our API offers two distinct modes to handle your data, giving you control over privacy and storage.

Standard Mode

In Standard Mode:
• Both audio files and their corresponding transcriptions are securely stored on our servers.
• Your data remains available until you explicitly delete it via the API or your dashboard.

This mode is ideal for long-term access.

How to use Standard Mode

Just make a request:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'wait_for_result="true"'

Response Example:

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "error": null,
  "success": true,
  "secure_mode": false,
  "transcription": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով"
}
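The same synchronous request can be sketched in JavaScript for Node.js 18 or later, which ships global fetch, FormData, and Blob. The function names transcribeSync and extractTranscription are hypothetical, and the assumption that a failed request carries "success": false is not confirmed by this guide.

```javascript
const SYNC_UPLOAD_URL = "https://api.hispeech.ai/api/v1/transcriptions/upload";

// Hypothetical helper: pulls the text out of a response body.
// Assumes failures are signalled by success === false (or the string "false").
function extractTranscription(body) {
  if (body.success === false || body.success === "false") {
    throw new Error(`Transcription failed: ${body.error}`);
  }
  return body.transcription;
}

// Uploads a file and waits for the result in the same request.
async function transcribeSync(filePath, apiKey) {
  const { readFile } = await import("node:fs/promises");
  const { basename } = await import("node:path");

  const form = new FormData();
  form.append("file", new Blob([await readFile(filePath)]), basename(filePath));
  form.append("wait_for_result", "true");

  const res = await fetch(SYNC_UPLOAD_URL, {
    method: "POST",
    headers: { "x-auth-token": apiKey },
    body: form,
  });
  return extractTranscription(await res.json());
}

// Usage (not executed here):
// transcribeSync("/path/to/audio_or_video.mp3", "YOUR_API_KEY").then(console.log);
```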

Optional: delete the transcription and audio:

curl --location --request DELETE 'https://api.hispeech.ai/api/v1/transcriptions' \
--header 'x-auth-token: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{ "transcription_id": "JOB_ID" }'

Response Example:

{
  "success": true,
  "message": "transcription deleted successfully"
}
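For reference, the delete call could look like this in JavaScript (Node.js 18+). This is a sketch only: buildDeleteRequest and deleteTranscription are hypothetical names, while the URL, headers, and the transcription_id field are taken from the cURL example above.

```javascript
const TRANSCRIPTIONS_URL = "https://api.hispeech.ai/api/v1/transcriptions";

// Hypothetical helper: assembles the fetch arguments for a delete call.
function buildDeleteRequest(jobId, apiKey) {
  return {
    url: TRANSCRIPTIONS_URL,
    options: {
      method: "DELETE",
      headers: {
        "x-auth-token": apiKey,
        "Content-Type": "application/json",
      },
      // The job ID returned by the upload endpoint goes into transcription_id.
      body: JSON.stringify({ transcription_id: jobId }),
    },
  };
}

// Sends the delete request and returns the parsed JSON response.
async function deleteTranscription(jobId, apiKey) {
  const { url, options } = buildDeleteRequest(jobId, apiKey);
  const res = await fetch(url, options);
  return res.json();
}
```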

Secure Mode

Secure Mode is designed with privacy-first principles. When it is enabled, no data is ever stored: neither the audio nor the transcription is saved on our servers or your device.

Use Secure Mode to ensure maximum privacy — no history is kept, and all data is processed temporarily without being saved anywhere.

How to use Secure Mode

Just make a request:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'secure_mode="true"' \
--form 'wait_for_result="true"'

Note: by default secure_mode=false.

Response Example:

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "error": null,
  "success": true,
  "secure_mode": true,
  "ttl": "900",
  "transcription": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով"
}

Supported audio/video formats:

.mp4 .mov .mkv .webm .3gp .hevc .m4v .mp3 .aac .m4a .wav .flac .ogg .opus .aiff .aif .aifc .alac .wma .asf .adpcm .snd .gsm .amr .awb .dss .ds2
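A client can check the file extension against this list before uploading. The sketch below is one way to do that in JavaScript; isSupportedFile is a hypothetical helper, and treating extensions as case-insensitive is an assumption, since the guide does not say how the server matches them.

```javascript
// Extensions copied from the supported formats list in this guide.
const SUPPORTED_EXTENSIONS = new Set(
  (".mp4 .mov .mkv .webm .3gp .hevc .m4v .mp3 .aac .m4a .wav .flac .ogg " +
    ".opus .aiff .aif .aifc .alac .wma .asf .adpcm .snd .gsm .amr .awb .dss .ds2"
  ).split(" ")
);

// Hypothetical helper: true when the file name ends with a listed extension.
// Lowercasing the extension is an assumption made for convenience.
function isSupportedFile(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}
```

This check only inspects the name; the server still decides whether the file content can be processed.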

Replace the following placeholders with your own values:
• YOUR_API_KEY
• /path/to/audio_or_video.mp3

Asynchronous API

Description

Submit a job and receive a job ID immediately. Check the job status later to retrieve the results.

Best for long audio files or batch processing.

Works with every speech-to-text model — see the Models section. Pick one with the stt_model query parameter; if you omit it, the default model is used.

Speech-to-Text

Our API offers two distinct modes to handle your data, giving you control over privacy and storage.

Standard Mode

This mode is ideal for long-term access.

How to use Standard Mode

Step 1. Submit a transcription job:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'wait_for_result="false"'

Response Example:

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "error": null,
  "success": true,
  "secure_mode": false
}

Step 2. Check job status and get transcription:

Use the job_id to check the job status and retrieve the transcription. Call this endpoint periodically until the job is completed. You are limited to 60 calls per hour.

Best practice: check the status every 60 seconds.

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/transcription-status/JOB_ID' \
--header 'x-auth-token: YOUR_API_KEY'

Response Example (processing):

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "status": "processing",
  "secure_mode": false
}

Response Example (completed transcription):

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "status": "completed",
  "audio_duration": 10,
  "title": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով",
  "transcription": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով։ Հայսփիչը ընդունակ է շատ արագ սղագրել հայերեն բանավոր խոսքը: Ես հասկանում եմ տարբեր բարբառներ և ակցենտներ:",
  "secure_mode": false
}
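The polling step can be sketched as a small JavaScript loop (Node.js 18+). The names fetchStatus and waitForTranscription are hypothetical. The defaults mirror the guidance above (one check every 60 seconds, at most 60 checks), and treating any status other than "processing" or "completed" as an error is an assumption, because this guide documents only those two values.

```javascript
const STATUS_URL =
  "https://api.hispeech.ai/api/v1/transcriptions/transcription-status/";

// Fetches the current job state from the status endpoint.
async function fetchStatus(jobId, apiKey) {
  const res = await fetch(STATUS_URL + encodeURIComponent(jobId), {
    headers: { "x-auth-token": apiKey },
  });
  return res.json();
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the job is completed. getStatus and wait are injectable
// so the loop can be exercised without network access or real delays.
async function waitForTranscription(
  jobId,
  apiKey,
  { intervalMs = 60_000, maxAttempts = 60, getStatus = fetchStatus, wait = sleep } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getStatus(jobId, apiKey);
    if (job.status === "completed") return job;
    if (job.status !== "processing") {
      // Assumption: any other status means the job will not complete.
      throw new Error(`Unexpected job status: ${job.status}`);
    }
    if (attempt < maxAttempts) await wait(intervalMs);
  }
  throw new Error(`Job ${jobId} still processing after ${maxAttempts} checks`);
}

// Usage (not executed here):
// const job = await waitForTranscription("JOB_ID", "YOUR_API_KEY");
// console.log(job.transcription);
```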

Step 3 (optional). Delete the transcription and audio:

curl --location --request DELETE 'https://api.hispeech.ai/api/v1/transcriptions' \
--header 'x-auth-token: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{ "transcription_id": "JOB_ID" }'

Response Example:

{
  "success": true,
  "message": "transcription deleted successfully"
}

Secure Mode

Secure Mode is designed with privacy-first principles. When enabled:
1. No data is ever stored — neither audio nor transcription is saved on our servers or your device.
2. The transcription is temporarily held in memory and will be permanently deleted after it is accessed once through the "Check job status and get transcription" API (see below).
3. If not accessed, the transcription is automatically deleted after a configurable TTL (Time-To-Live), which defaults to 900 seconds (15 minutes).
• TTL can be set between 1 and 36000 seconds (up to 10 hours).
• After the TTL expires, the transcription will be automatically deleted and cannot be retrieved.

Use Secure Mode to ensure maximum privacy — no history is kept, and all data is processed temporarily without being saved anywhere.
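The TTL limits above can be checked on the client before a job is submitted. The following JavaScript sketch builds the form fields for an asynchronous Secure Mode upload; secureModeFields is a hypothetical helper, and requiring a whole number of seconds is an assumption that this guide does not state.

```javascript
// Limits as stated in this guide (seconds).
const TTL_MIN = 1;
const TTL_MAX = 36000;
const TTL_DEFAULT = 900;

// Hypothetical helper: returns the form fields for a Secure Mode job.
// Rejects TTL values outside the documented 1..36000 range.
function secureModeFields(ttl = TTL_DEFAULT) {
  if (!Number.isInteger(ttl) || ttl < TTL_MIN || ttl > TTL_MAX) {
    throw new RangeError(`ttl must be an integer from ${TTL_MIN} to ${TTL_MAX}`);
  }
  return {
    wait_for_result: "false",
    secure_mode: "true",
    ttl: String(ttl),
  };
}

console.log(secureModeFields(3600));
```

Each returned key can be appended to a multipart form alongside the file field.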

How to use Secure Mode

Step 1. Submit a transcription job:

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/upload' \
--header 'x-auth-token: YOUR_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"' \
--form 'wait_for_result="false"' \
--form 'secure_mode="true"' \
--form 'ttl="900"'

Note: secure_mode and ttl are optional parameters. By default: secure_mode=false and ttl=900 seconds. When secure_mode is false, the ttl value is ignored.

Response Example:

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "error": null,
  "success": true,
  "secure_mode": true,
  "ttl": "900"
}

Step 2. Check job status and get transcription:

Use the job_id to check the job status and retrieve the transcription. Call this endpoint periodically until the job is completed. You are limited to 60 calls per hour.

Best practice: check the status every 60 seconds.

curl --location 'https://api.hispeech.ai/api/v1/transcriptions/transcription-status/JOB_ID' \
--header 'x-auth-token: YOUR_API_KEY'

Response Example (processing):

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "status": "processing",
  "secure_mode": true
}

Response Example (completed transcription):

{
  "job_id": "c9bc1860-2279-11f0-8545-24e1ac00d517",
  "status": "completed",
  "audio_duration": 10,
  "title": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով",
  "transcription": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով։ Հայսփիչը ընդունակ է շատ արագ սղագրել հայերեն բանավոր խոսքը: Ես հասկանում եմ տարբեր բարբառներ և ակցենտներ:",
  "secure_mode": true
}

Supported audio/video formats:

.mp4 .mov .mkv .webm .3gp .hevc .m4v .mp3 .aac .m4a .wav .flac .ogg .opus .aiff .aif .aifc .alac .wma .asf .adpcm .snd .gsm .amr .awb .dss .ds2

Replace the following placeholders with your own values:
• YOUR_API_KEY
• JOB_ID
• /path/to/audio_or_video.mp3

OpenAI-Compatible API

Description

Compatible with the official OpenAI API structure and SDKs, making integration easy.

Ideal if you already use OpenAI clients or libraries.

OpenAI-compatible tools

The OpenAI-compatible API currently uses only the default model, listed in the Models section. Model selection via stt_model is not available on this endpoint yet.

You can integrate our API into OpenAI-compatible tools by setting:

• Base URL: "https://api.hispeech.ai/api"
• API key: your Hispeech API key, sent as Authorization: Bearer YOUR_HISPEECH_API_KEY

The equivalent cURL request:

curl --location 'https://api.hispeech.ai/api/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_HISPEECH_API_KEY' \
--form 'file=@"/path/to/audio_or_video.mp3"'

Response:

{
  "duration": 7,
  "language": "auto",
  "segments": [
    {
      "id": 0,
      "start": 0,
      "end": 7,
      "text": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով"
    }
  ],
  "task": "transcribe",
  "text": "Ողջույն, ես սղագրվել եմ հայսփիչի միջոցով"
}
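To illustrate the response shape, the sketch below turns the segments array into readable lines in JavaScript. formatSegments is a hypothetical helper; it assumes only the fields shown in the example response (start, end, text).

```javascript
// Hypothetical helper: one "[start - end] text" line per segment.
function formatSegments(response) {
  return response.segments.map(
    (segment) => `[${segment.start}s - ${segment.end}s] ${segment.text}`
  );
}

// Example with a body shaped like the response above.
const exampleResponse = {
  duration: 7,
  language: "auto",
  segments: [{ id: 0, start: 0, end: 7, text: "Ողջույն" }],
  task: "transcribe",
  text: "Ողջույն",
};

console.log(formatSegments(exampleResponse).join("\n"));
```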

OpenAI NodeJS library ("openai")

Here is a basic example of using the OpenAI Node.js library ("openai") to transcribe audio with the Hispeech API:

// Node.js example: transcribe audio through the OpenAI-compatible endpoint
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "YOUR_HISPEECH_API_KEY", // replace with your Hispeech API key
  baseURL: "https://api.hispeech.ai/api/v1",
});

async function transcribeAudio(filePath) {
  try {
    const audioFile = fs.createReadStream(filePath);
    const response = await openai.audio.transcriptions.create({
      file: audioFile,
      model: "whisper-1", // kept for compatibility; the default Hispeech model is used
    });
    console.log("Transcription result:\n", response);
  } catch (error) {
    console.error("Error during transcription:");
    if (error instanceof OpenAI.APIError) {
      // The API answered with an error status
      console.error("Status:", error.status);
      console.error("Message:", error.message);
    } else {
      // Network failure, missing file, or another local error
      console.error(error.message);
    }
  }
}

// Example usage
transcribeAudio("/path/to/audio_or_video.mp3");

For the Node.js library, the base URL is "https://api.hispeech.ai/api/v1" (it ends with "v1").

Supported audio/video formats:

.mp4 .mov .mkv .webm .3gp .hevc .m4v .mp3 .aac .m4a .wav .flac .ogg .opus .aiff .aif .aifc .alac .wma .asf .adpcm .snd .gsm .amr .awb .dss .ds2

Replace the following placeholders with your own values:
• YOUR_HISPEECH_API_KEY
• /path/to/audio_or_video.mp3