Transcription Contents

Version 1


The transcript of an audio recording. Part of a Transcription.

Transcription
      ↪ Transcription Contents
          ↪ TranscriptionSegment
              ↪ TranscriptionWord

Fields

| Name | Type | Notes |
|------|------|-------|
| version | Integer | Version of this transcription object. Different versions can have a different structure. |
| language | String | Detected language of the audio. |
| text | String | The full transcript of the audio, without regard for timestamps or speakers. |
| languageProbability | Double | Confidence in the detected language. Between 0 (low) and 1 (high). |
| segments | [TranscriptionSegment] | List of all segments. |
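
As a rough illustration of how the objects nest, the sketch below models the three types as Python dataclasses and parses one payload. It assumes the object is delivered as JSON with keys named exactly like the fields documented on this page, which the page does not state. The sample payload and the `parse_contents` helper are made up for this example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptionWord:
    startMs: int
    endMs: int
    text: str
    probability: float


@dataclass
class TranscriptionSegment:
    speakerId: str
    startMs: int
    endMs: int
    text: str
    word: List[TranscriptionWord]  # field name as documented


@dataclass
class TranscriptionContents:
    version: int
    language: str
    text: str
    languageProbability: float
    segments: List[TranscriptionSegment]


def parse_contents(raw: str) -> TranscriptionContents:
    """Hypothetical helper: build the nested objects from a JSON string."""
    data = json.loads(raw)
    segments = [
        TranscriptionSegment(
            speakerId=s["speakerId"],
            startMs=s["startMs"],
            endMs=s["endMs"],
            text=s["text"],
            word=[TranscriptionWord(**w) for w in s.get("word", [])],
        )
        for s in data["segments"]
    ]
    return TranscriptionContents(
        version=data["version"],
        language=data["language"],
        text=data["text"],
        languageProbability=data["languageProbability"],
        segments=segments,
    )


# Invented sample payload, for illustration only.
SAMPLE_JSON = """
{"version": 1, "language": "en", "text": "Hello there.",
 "languageProbability": 0.98,
 "segments": [{"speakerId": "11", "startMs": 0, "endMs": 900,
               "text": "Hello there.",
               "word": [{"startMs": 0, "endMs": 400, "text": "Hello", "probability": 0.99},
                        {"startMs": 450, "endMs": 900, "text": "there.", "probability": 0.95}]}]}
"""

contents = parse_contents(SAMPLE_JSON)
```

Because the structure can differ between versions, a consumer would probably want to check `contents.version` before relying on any particular field.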

TranscriptionSegment

A segment is a contiguous portion of the audio, typically corresponding to a short phrase or sentence, along with its associated start and end timestamps.

| Name | Type | Notes |
|------|------|-------|
| speakerId | String | ID of the speaker. 0: mono recording, unable to extract speakers. 11: channel 1 speaker. 12: channel 2 speaker. 13: channel 3 speaker. NN: channel N speaker. |
| startMs | Long | Start time of this segment, in milliseconds. |
| endMs | Long | End time of this segment, in milliseconds. |
| text | String | Text of this segment. |
| word | [TranscriptionWord] | List of words. |
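
The speakerId values can be turned into channel numbers and used to group segment text per speaker. The sketch below works on plain dictionaries keyed by the field names above. It assumes that the channel IDs are a leading "1" followed by the channel number, which fits the documented 11, 12 and 13 but is only a guess at what "NN" means in general. Both helper names are hypothetical.

```python
from typing import Dict, List, Optional


def channel_of(speaker_id: str) -> Optional[int]:
    """Return the channel number, or None for a mono recording ("0").

    Assumption: channel IDs are "1" followed by the channel number.
    """
    if speaker_id == "0":
        return None
    if len(speaker_id) >= 2 and speaker_id[0] == "1" and speaker_id[1:].isdecimal():
        return int(speaker_id[1:])
    raise ValueError(f"unrecognized speakerId: {speaker_id!r}")


def text_by_speaker(segments: List[dict]) -> Dict[str, str]:
    """Join the text of all segments per speakerId, in segment order."""
    grouped: Dict[str, List[str]] = {}
    for seg in segments:
        grouped.setdefault(seg["speakerId"], []).append(seg["text"])
    return {speaker: " ".join(parts) for speaker, parts in grouped.items()}
```

The duration of a segment is simply `endMs - startMs`, in milliseconds.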

TranscriptionWord

A word is a finer-grained unit within a segment. Each word represents a single transcribed word from the audio, along with its own start and end timestamps (denoting when that word was spoken in the audio).

| Name | Type | Notes |
|------|------|-------|
| startMs | Long | Start time of this word, in milliseconds. |
| endMs | Long | End time of this word, in milliseconds. |
| text | String | Text of this word. |
| probability | Double | Confidence in the detected text. Between 0 (low) and 1 (high). |
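
One possible use of the per-word fields is flagging words that may need manual review. The sketch below takes a list of word dictionaries with the fields above and returns the ones whose probability falls under a cutoff. The 0.5 default is an arbitrary value chosen for illustration, not a recommendation from this documentation, and both function names are hypothetical.

```python
from typing import List


def low_confidence_words(words: List[dict], threshold: float = 0.5) -> List[dict]:
    """Return the words whose probability is below the (arbitrary) threshold."""
    return [w for w in words if w["probability"] < threshold]


def word_duration_ms(word: dict) -> int:
    """Length of time the word was spoken, in milliseconds."""
    return word["endMs"] - word["startMs"]
```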