Transcription Contents

Version 1

The transcript of an audio recording. Part of a Transcription.

Transcription
      ↪ Transcription Contents
          ↪ Segment
              ↪ Word

Fields

Name	Type	Notes
version	`Integer`	Version of this transcription object. Different versions can have a different structure.
language	`String`	Detected language of the audio.
text	`String`	The full transcript of the audio, without regard for timestamps or speakers.
language_probability	`Double`	Confidence in the detected language. Between 0 (low) and 1 (high).
segments	`[Segment]`	List of all segments.
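
The nesting described above can be mirrored with a few small data classes. The sketch below is illustrative only: it assumes the object arrives as JSON using the field names from the tables on this page, which the page itself does not state. The class names, `parse_contents`, and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    start_ms: int
    end_ms: int
    text: str
    probability: float


@dataclass
class Segment:
    speaker_id: str
    start_ms: int
    end_ms: int
    text: str
    word: List[Word]


@dataclass
class TranscriptionContents:
    version: int
    language: str
    text: str
    language_probability: float
    segments: List[Segment]


def parse_contents(raw: str) -> TranscriptionContents:
    """Build a TranscriptionContents from a JSON string (JSON encoding is an assumption)."""
    data = json.loads(raw)
    segments = [
        Segment(
            speaker_id=s["speaker_id"],
            start_ms=s["start_ms"],
            end_ms=s["end_ms"],
            text=s["text"],
            word=[Word(**w) for w in s["word"]],
        )
        for s in data["segments"]
    ]
    return TranscriptionContents(
        version=data["version"],
        language=data["language"],
        text=data["text"],
        language_probability=data["language_probability"],
        segments=segments,
    )


# Made-up sample payload, for illustration only.
SAMPLE = json.dumps({
    "version": 1,
    "language": "en",
    "text": "Hello there.",
    "language_probability": 0.97,
    "segments": [{
        "speaker_id": "0",
        "start_ms": 0,
        "end_ms": 900,
        "text": "Hello there.",
        "word": [
            {"start_ms": 0, "end_ms": 400, "text": "Hello", "probability": 0.99},
            {"start_ms": 450, "end_ms": 900, "text": "there", "probability": 0.95},
        ],
    }],
})

contents = parse_contents(SAMPLE)
print(contents.language, len(contents.segments))
```

Because different versions can have a different structure, a real consumer would probably check `version` before assuming this layout.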

Segment

A segment is a contiguous portion of the audio, typically corresponding to a short phrase or sentence, along with its associated start and end timestamps.

Name	Type	Notes
speaker_id	`String`	ID of the speaker. `0`: mono recording, unable to extract speakers; `11`: channel 1 speaker; `12`: channel 2 speaker; `13`: channel 3 speaker; `NN`: channel N speaker.
start_ms	`Long`	Start time of this segment, in milliseconds.
end_ms	`Long`	End time of this segment, in milliseconds.
text	`String`	Text of this segment.
word	`[Word]`	List of words in this segment.
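
As one possible use of the segment fields, the sketch below sums the duration of each segment (`end_ms - start_ms`) per `speaker_id`. It assumes segments are available as plain dictionaries keyed by the field names above. `talk_time_ms` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def talk_time_ms(segments: Iterable[Mapping]) -> Dict[str, int]:
    """Sum segment durations (end_ms - start_ms) per speaker_id."""
    totals: Dict[str, int] = defaultdict(int)
    for seg in segments:
        totals[seg["speaker_id"]] += seg["end_ms"] - seg["start_ms"]
    return dict(totals)


# Made-up segments, for illustration only.
segments = [
    {"speaker_id": "11", "start_ms": 0, "end_ms": 1200, "text": "Hello.", "word": []},
    {"speaker_id": "12", "start_ms": 1300, "end_ms": 2000, "text": "Hi.", "word": []},
    {"speaker_id": "11", "start_ms": 2100, "end_ms": 3000, "text": "How are you?", "word": []},
]

print(talk_time_ms(segments))  # {'11': 2100, '12': 700}
```

For a mono recording every segment carries `speaker_id` `0`, so the result collapses to a single total.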

Word

A word is a finer-grained unit within a segment: a single transcribed word from the audio, with its own start and end timestamps marking when it was spoken.

Name	Type	Notes
start_ms	`Long`	Start time of this word, in milliseconds.
end_ms	`Long`	End time of this word, in milliseconds.
text	`String`	Text of this word.
probability	`Double`	Confidence in the detected text. Between 0 (low) and 1 (high).
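
The per-word `probability` can be used to flag words that may need manual review. The following is a minimal sketch, assuming words are plain dictionaries with the fields above. `low_confidence_words` and the 0.5 default threshold are arbitrary illustrative choices, not values defined by this page.

```python
from typing import Iterable, List, Mapping


def low_confidence_words(words: Iterable[Mapping], threshold: float = 0.5) -> List[Mapping]:
    """Return the words whose probability is strictly below threshold."""
    return [w for w in words if w["probability"] < threshold]


# Made-up words, for illustration only.
words = [
    {"start_ms": 0, "end_ms": 300, "text": "Hello", "probability": 0.98},
    {"start_ms": 320, "end_ms": 650, "text": "their", "probability": 0.41},
]

flagged = low_confidence_words(words)
print([w["text"] for w in flagged])  # ['their']
```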
