Transcription Contents
Version 1
The transcript of an audio recording. Part of a Transcription.
Fields
Name | Type | Notes
---|---|---
version | | Version of this transcription object. Different versions can have a different structure.
language | | Detected language of the audio.
text | | The full transcript of the audio, without regard for timestamps or speakers.
language_probability | | Confidence in the detected language. Between 0 (low) and 1 (high).
segments | | List of all segments.
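
As a rough illustration of how these top-level fields might be consumed, the sketch below checks the version before trusting the rest of the object and then compares `language_probability` against a threshold. It assumes the object arrives as JSON, that `version` is serialized as the integer 1, and that `language` is a short code such as "en"; none of this is stated above. The helper `language_is_reliable` and the 0.5 threshold are hypothetical.

```python
import json

SUPPORTED_VERSION = 1  # assumption: "Version 1" is serialized as the integer 1


def language_is_reliable(contents: dict, threshold: float = 0.5) -> bool:
    """Return True when language_probability meets a caller-chosen threshold."""
    # Different versions can have a different structure, so check the version first.
    if contents["version"] != SUPPORTED_VERSION:
        raise ValueError(f"unsupported version: {contents['version']!r}")
    return contents["language_probability"] >= threshold


# Hypothetical payload: field names follow the table above, values are made up.
example = json.loads("""
{
  "version": 1,
  "language": "en",
  "text": "Hello there.",
  "language_probability": 0.98,
  "segments": []
}
""")

print(example["language"], language_is_reliable(example))
```

Checking `version` first is a defensive choice: because the structure can differ between versions, reading other fields before that check could fail in confusing ways.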
Segment
A segment is a contiguous portion of the audio, typically corresponding to a short phrase or sentence, along with its associated start and end timestamps.
Name | Type | Notes
---|---|---
speaker_id | | ID of the speaker.
start_ms | | Start time of this segment, in milliseconds.
end_ms | | End time of this segment, in milliseconds.
text | | Text of this segment.
word | | List of words.
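
Because each segment carries a speaker and a start and end time, one common use is totalling how long each speaker talked. The sketch below does that under some assumptions: segments are plain dictionaries decoded from JSON, `speaker_id` is a hashable value (strings here), and timestamps are integers. The helper `talk_time_ms` and the sample data are hypothetical.

```python
from collections import defaultdict


def talk_time_ms(segments: list[dict]) -> dict:
    """Sum segment durations (end_ms - start_ms) per speaker_id."""
    totals = defaultdict(int)
    for segment in segments:
        totals[segment["speaker_id"]] += segment["end_ms"] - segment["start_ms"]
    return dict(totals)


# Hypothetical segments: field names follow the table above, values are made up.
segments = [
    {"speaker_id": "A", "start_ms": 0, "end_ms": 1500, "text": "Hi.", "word": []},
    {"speaker_id": "B", "start_ms": 1500, "end_ms": 4000, "text": "Hello, how are you?", "word": []},
    {"speaker_id": "A", "start_ms": 4000, "end_ms": 5200, "text": "Fine.", "word": []},
]

print(talk_time_ms(segments))  # {'A': 2700, 'B': 2500}
```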
Word
A word is a finer-grained unit within a segment. Each word represents a single transcribed word from the audio, along with its own start and end timestamps (denoting when that word was spoken in the audio).
Name | Type | Notes
---|---|---
start_ms | | Start time of this word, in milliseconds.
end_ms | | End time of this word, in milliseconds.
text | | Text of this word.
probability | | Confidence in the detected text. Between 0 (low) and 1 (high).
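
Word-level `probability` values can be used to flag parts of a transcript that may need review. The sketch below collects the words in a segment whose confidence falls below a threshold. It assumes a segment is a dictionary whose `word` field holds a list of word dictionaries; the helper `low_confidence_words`, the 0.6 threshold, and the sample data are hypothetical.

```python
def low_confidence_words(segment: dict, threshold: float = 0.6) -> list[dict]:
    """Return the words in a segment whose probability is below the threshold."""
    return [word for word in segment["word"] if word["probability"] < threshold]


# Hypothetical segment: field names follow the tables above, values are made up.
segment = {
    "speaker_id": "A",
    "start_ms": 0,
    "end_ms": 1200,
    "text": "Meet at noon.",
    "word": [
        {"start_ms": 0, "end_ms": 300, "text": "Meet", "probability": 0.99},
        {"start_ms": 300, "end_ms": 500, "text": "at", "probability": 0.41},
        {"start_ms": 500, "end_ms": 1200, "text": "noon.", "probability": 0.95},
    ],
}

for word in low_confidence_words(segment):
    print(f'{word["start_ms"]}-{word["end_ms"]} ms: {word["text"]!r} ({word["probability"]})')
```

A suitable threshold depends on the audio and the use case, so it is left as a parameter rather than fixed.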