Hive’s speech-to-text model outputs a full transcript of the input, along with a timestamp and confidence score for each predicted word and punctuation mark. An example response:
{
"id": "24c3bde0-5b07-11ed-8204-4b1fd39434b3",
"code": 200,
"project_id": 41564,
"user_id": 3121654,
"created_on": "2022-11-02T23:36:21.562Z",
"status": [
{
"status": {
"code": "0",
"message": "SUCCESS"
},
"response": {
"input": {
"id": "24c3bde0-5b07-11ed-8204-4b1fd39434b3",
"created_on": "2022-11-02T23:36:15.806Z",
"user_id": 3121654,
"project_id": 41564,
"charge": 0.10200000000000001,
"model": "multilingual_v1",
"model_version": 2,
"model_type": "TRANSCRIPTION",
"hash": "76d67994b63bb5991ad90e4bdeba66bf",
"media": {
"url": null,
"type": "AUDIO",
"mime_type": "m4a",
"mimetype": "audio/m4a",
"duration": 33.144479,
"filename": "Building.m4a"
}
},
"output": [
{
"transcript": "And so the Woy Thing bot is that, like, I think, I think the Alpha logic looks like you have, like, each time sap ple have a series of for it. Okay. And this is if you have a major. Right, right. And you speak the into a into a language model athen. No, do put that into a actually. Okay, I say, Yeah, because I was wondering why, like, you know, Tet times, for example, like to be kind of the biggest thing, you know, B er, like.",
"words": [
{
"time": 1.52,
"alternatives": [
{
"text": "And",
"score": 0.23571264642017406
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 1.66,
"alternatives": [
{
"text": "so",
"score": 0.8547682215857079
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 1.78,
"alternatives": [
{
"text": "the",
"score": 0.24715620931406074
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 1.92,
"alternatives": [
{
"text": "Woy",
"score": 0.0882212337357475
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.04,
"alternatives": [
{
"text": "Thing",
"score": 0.20552547462905607
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.2,
"alternatives": [
{
"text": "bot",
"score": 0.17888841354435747
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.4,
"alternatives": [
{
"text": "is",
"score": 0.5683327300151447
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.54,
"alternatives": [
{
"text": "that",
"score": 0.08846855855205872
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.66,
"alternatives": [
{
"text": ",",
"score": 0.5431774317653615
}
],
"type": "punctuation",
"meta": {}
},
{
"time": 2.72,
"alternatives": [
{
"text": "like",
"score": 0.36523307802689764
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 2.84,
"alternatives": [
{
"text": ",",
"score": 0.7326579822975169
}
],
"type": "punctuation",
"meta": {}
},
{
"time": 2.92,
"alternatives": [
{
"text": "I",
"score": 0.7160323656880226
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 3.02,
"alternatives": [
{
"text": "think",
"score": 0.21625709657570108
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 3.14,
"alternatives": [
{
"text": ",",
"score": 0.5415581650841741
}
],
"type": "punctuation",
"meta": {}
},
{
"time": 3.86,
"alternatives": [
{
"text": "I",
"score": 0.9967535198656756
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 3.94,
"alternatives": [
{
"text": "think",
"score": 0.9966784771549189
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 27.82,
"alternatives": [
{
"text": ".",
"score": 0.7616554849091626
}
],
"type": "punctuation",
"meta": {}
}
]
}
]
}
}
],
"from_cache": false
}
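The transcript sits several levels deep in this response, under `status`, then `response`, then `output`. Below is a minimal Python sketch of reaching it, assuming the response body has already been parsed into a dict shaped like the example above. The helper name `get_outputs` is hypothetical and not part of any Hive client library, the `sample` dict is a trimmed stand-in for a real response, and treating any status code other than `"0"` as a failure is an assumption based only on this example.

```python
from typing import Any


def get_outputs(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect every `output` entry from the successful items in `status`.

    Hypothetical helper: assumes the nesting shown in the example response
    (status -> response -> output).
    """
    outputs: list[dict[str, Any]] = []
    for item in response.get("status", []):
        # In the example the code is the string "0" with message "SUCCESS";
        # skipping every other code is an assumption.
        if item.get("status", {}).get("code") != "0":
            continue
        outputs.extend(item.get("response", {}).get("output", []))
    return outputs


# Trimmed stand-in for the example response above (illustrative only).
sample = {
    "status": [
        {
            "status": {"code": "0", "message": "SUCCESS"},
            "response": {
                "output": [{"transcript": "And so the", "words": []}]
            },
        }
    ]
}

transcripts = [out["transcript"] for out in get_outputs(sample)]
print(transcripts)
```

Using `.get` with empty defaults keeps the helper from raising if a level is missing, at the cost of silently returning an empty list; stricter code might prefer to raise instead.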
Name | Description |
---|---|
transcript | Full transcript of the entire video or audio clip. |
time | Timestamp, in seconds, of each predicted word or punctuation mark in the transcript. |
type | Either pronunciation (the predicted character string is a word) or punctuation (the predicted character string is a punctuation mark). |
text | Predicted character string at that timestamp. |
score | Confidence score for the predicted character string. |
alternatives | List of alternative predictions at each timestamp, each with its own text and score. |
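
To illustrate how these fields fit together, the sketch below flattens a `words` list into simple records, flags low-confidence words, and joins the tokens back into text. It is written under several assumptions: the names `Token`, `flatten_words`, `low_confidence`, and `join_tokens` are hypothetical, the 0.3 threshold is arbitrary, and taking the first entry of `alternatives` as the prediction is a guess based on the example response, where every list has exactly one entry.

```python
from typing import NamedTuple


class Token(NamedTuple):
    time: float   # timestamp in seconds
    text: str     # predicted word or punctuation mark
    score: float  # confidence score
    kind: str     # "pronunciation" or "punctuation"


def flatten_words(words: list[dict]) -> list[Token]:
    """Turn raw `words` entries into flat Token records."""
    tokens = []
    for word in words:
        if not word.get("alternatives"):
            continue  # nothing predicted at this timestamp
        best = word["alternatives"][0]  # assumption: first alternative is the prediction
        tokens.append(Token(word["time"], best["text"], best["score"], word["type"]))
    return tokens


def low_confidence(tokens: list[Token], threshold: float = 0.3) -> list[Token]:
    """Return spoken words (not punctuation) scoring below the threshold."""
    return [t for t in tokens if t.kind == "pronunciation" and t.score < threshold]


def join_tokens(tokens: list[Token]) -> str:
    """Join tokens with spaces, attaching punctuation to the preceding word."""
    text = ""
    for t in tokens:
        if t.kind == "punctuation" or not text:
            text += t.text
        else:
            text += " " + t.text
    return text


# Three entries adapted from the example response (scores rounded).
words = [
    {"time": 1.66, "alternatives": [{"text": "so", "score": 0.855}],
     "type": "pronunciation", "meta": {}},
    {"time": 1.92, "alternatives": [{"text": "Woy", "score": 0.088}],
     "type": "pronunciation", "meta": {}},
    {"time": 2.66, "alternatives": [{"text": ",", "score": 0.543}],
     "type": "punctuation", "meta": {}},
]

tokens = flatten_words(words)
print(join_tokens(tokens))                        # so Woy,
print([t.text for t in low_confidence(tokens)])   # ['Woy']
```

Flagging words such as "Woy" by score is one way to route uncertain spans of a transcript to manual review; the right threshold depends on the audio and would need tuning.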