Transcription Contents

Version 1


The transcript of an audio recording. Part of a Transcription.

Transcription
      ↪ Transcription Contents
          ↪ Segment
              ↪ Word

Fields

Name                   Type       Notes
version                Integer    Version of this transcription object. Different versions can have a different structure.
language               String     Detected language of the audio.
text                   String     The full transcript of the audio, without regard for timestamps or speakers.
language_probability   Double     Confidence in the detected language. Between 0 (low) and 1 (high).
segments               [Segment]  List of all segments.
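As an illustration of the top-level fields, here is a minimal Python sketch that loads a Transcription Contents object. It assumes the object is delivered as JSON with keys named exactly like the fields above; this page does not specify a serialization, and the class name `TranscriptionContents` and the example values are hypothetical. Because different versions can have a different structure, the sketch rejects anything other than version 1 instead of guessing.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class TranscriptionContents:
    """Top-level fields of a version 1 Transcription Contents object (hypothetical class)."""

    version: int
    language: str
    text: str
    language_probability: float
    # Segments are left as raw dicts here; their shape is described under Segment.
    segments: list[dict[str, Any]]

    @classmethod
    def from_json(cls, payload: str) -> "TranscriptionContents":
        # ASSUMPTION: JSON encoding with keys matching the documented field names.
        data = json.loads(payload)
        # Other versions may be structured differently, so reject them rather than guess.
        if data["version"] != 1:
            raise ValueError(f"unsupported version: {data['version']}")
        prob = float(data["language_probability"])
        # The documented range is 0 (low) to 1 (high).
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"language_probability out of range: {prob}")
        return cls(
            version=data["version"],
            language=data["language"],
            text=data["text"],
            language_probability=prob,
            segments=data["segments"],
        )


if __name__ == "__main__":
    # Example payload with invented values.
    raw = (
        '{"version": 1, "language": "en", "text": "Hello there.",'
        ' "language_probability": 0.97, "segments": []}'
    )
    contents = TranscriptionContents.from_json(raw)
    print(contents.language, contents.language_probability)  # en 0.97
```

Checking `version` first keeps a consumer from silently misreading a future structure; the range check on `language_probability` simply enforces the bounds stated in the table.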

Segment

A segment is a contiguous portion of the audio, typically corresponding to a short phrase or sentence, along with its associated start and end timestamps.

Name         Type     Notes
speaker_id   String   ID of the speaker:
                      - 0: Mono recording; speakers could not be extracted.
                      - 11: Channel 1 speaker
                      - 12: Channel 2 speaker
                      - 13: Channel 3 speaker
                      - NN: Channel N speaker
start_ms     Long     Start time of this segment, in milliseconds.
end_ms       Long     End time of this segment, in milliseconds.
text         String   Text of this segment.
word         [Word]   List of the words in this segment.
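The following Python sketch shows how the segment fields might be consumed to produce a time-ordered, speaker-labelled transcript. The class and function names (`Segment`, `format_timestamp`, `speaker_transcript`) are hypothetical. The channel decoding is an assumption: it treats a `speaker_id` as "1" followed by the channel number, which is inferred only from the listed values 11, 12 and 13; the general "NN" form is not described further on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Segment:
    """One segment of a transcript (hypothetical class mirroring the fields above)."""

    speaker_id: str
    start_ms: int
    end_ms: int
    text: str
    # The field is named "word" (singular); entries are left as raw Word dicts.
    word: list[dict[str, Any]] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

    def channel(self) -> Optional[int]:
        """Return the channel number, or None for a mono recording ("0").

        ASSUMPTION: IDs are "1" followed by the channel number, inferred from
        the documented examples 11, 12 and 13.
        """
        sid = self.speaker_id
        if sid == "0":
            return None
        if len(sid) >= 2 and sid.startswith("1") and sid[1:].isdigit():
            return int(sid[1:])
        raise ValueError(f"unrecognised speaker_id: {sid!r}")


def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as mm:ss.mmm."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def speaker_transcript(segments: list[Segment]) -> list[str]:
    """Render segments in start-time order as '[start-end] Speaker: text' lines."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_ms):
        ch = seg.channel()
        who = "Mono" if ch is None else f"Channel {ch}"
        span = f"{format_timestamp(seg.start_ms)}-{format_timestamp(seg.end_ms)}"
        lines.append(f"[{span}] {who}: {seg.text}")
    return lines


if __name__ == "__main__":
    # Invented example segments from a two-channel recording.
    segs = [
        Segment("12", 1500, 3200, "Hi, thanks for calling."),
        Segment("11", 0, 1400, "Hello?"),
    ]
    for line in speaker_transcript(segs):
        print(line)
    # [00:00.000-00:01.400] Channel 1: Hello?
    # [00:01.500-00:03.200] Channel 2: Hi, thanks for calling.
```

Sorting by `start_ms` matters for multi-channel audio, where segments from different channels may interleave in time.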

Word

A word is a finer-grained unit within a segment: a single transcribed word from the audio, with its own start and end timestamps marking when that word was spoken.

Name          Type     Notes
start_ms      Long     Start time of this word, in milliseconds.
end_ms        Long     End time of this word, in milliseconds.
text          String   Text of this word.
probability   Double   Confidence in the detected text. Between 0 (low) and 1 (high).
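One common use of the per-word `probability` is flagging words that may need human review. The Python sketch below does this; `Word`, `mark_uncertain` and the 0.5 default threshold are hypothetical choices, not part of the format. It strips each word's text before joining, on the assumption that word text might carry surrounding spaces; this page does not say whether it does.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Word:
    """One entry of a segment's word list (hypothetical class)."""

    start_ms: int
    end_ms: int
    text: str
    probability: float

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Word":
        # ASSUMPTION: dict keys match the documented field names.
        return cls(
            start_ms=int(d["start_ms"]),
            end_ms=int(d["end_ms"]),
            text=d["text"],
            probability=float(d["probability"]),
        )


def mark_uncertain(words: list[Word], threshold: float = 0.5) -> str:
    """Join the words, wrapping any with probability below `threshold` as [?word?].

    The 0.5 default is an arbitrary, hypothetical cut-off. Text is stripped
    before joining in case individual words carry surrounding spaces.
    """
    parts = []
    for w in words:
        token = w.text.strip()
        parts.append(f"[?{token}?]" if w.probability < threshold else token)
    return " ".join(parts)


if __name__ == "__main__":
    # Invented example data.
    raw_words = [
        {"start_ms": 0, "end_ms": 300, "text": "Hello", "probability": 0.98},
        {"start_ms": 320, "end_ms": 700, "text": " wrld", "probability": 0.31},
    ]
    words = [Word.from_dict(d) for d in raw_words]
    print(mark_uncertain(words))  # Hello [?wrld?]
```

Because each word keeps its own `start_ms` and `end_ms`, a review tool could also jump straight to the flagged word in the audio rather than replaying the whole segment.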