Hive’s speech moderation model outputs a transcript, followed by a set of classifications, timestamps, and character indexes for each sentence in the transcript.
{
"id": "5ce602f0-5b07-11ed-80f8-138a369a2201",
"code": 200,
"project_id": 41565,
"user_id": 3121654,
"created_on": "2022-11-02T23:37:53.725Z",
"status": [
{
"status": {
"code": "0",
"message": "SUCCESS"
},
"response": {
"input": {
"id": "5ce602f0-5b07-11ed-80f8-138a369a2201",
"created_on": "2022-11-02T23:37:49.983Z",
"user_id": 3121654,
"project_id": 41565,
"charge": 0.10200000000000001,
"model": "multilingual_v1",
"model_version": 2,
"model_type": "TRANSCRIPTION",
"hash": "0f2701494d8d099e32076b97602af706",
"media": {
"url": null,
"filename": "Building.m4a",
"type": "AUDIO",
"mime_type": "m4a",
"mimetype": "audio/m4a",
"duration": 33.144479
}
},
"custom_classes": [],
"text_filters": [],
"pii_entities": [],
"language": "EN",
"moderated_classes": [
"sexual",
"violence",
"hate",
"bullying"
],
"output": [
{
"transcript": "And so the Woy Thing bot is that, like, I think, I think the Alpha logic looks like you have, like, each time sap ple have a series of for it. Okay. And this is if you have a major. Right, right. And you speak the into a into a language model athen. No, do put that into a actually. Okay, I say, Yeah, because I was wondering why, like, you know, Tet times, for example, like to be kind of the biggest thing, you know, B er, like.",
"classifications": [
{
"classes": [
{
"class": "sexual",
"score": 0
},
{
"class": "hate",
"score": 0
},
{
"class": "violence",
"score": 0
},
{
"class": "bullying",
"score": 0
}
],
"text": "And so the Woy Thing bot is that, like, I think, I think the Alpha logic looks like you have, like, each time sap ple have a series of for it.",
"custom_classes": [],
"text_filters": [],
"pii_entities": [],
"start_timestamp": 1.52,
"end_timestamp": 8.42,
"start_char_index": 0,
"end_char_index": 142
},
{
"classes": [
{
"class": "sexual",
"score": 0
},
{
"class": "hate",
"score": 0
},
{
"class": "violence",
"score": 0
},
{
"class": "bullying",
"score": 0
}
],
"text": "Okay, I say, Yeah, because I was wondering why, like, you know, Tet times, for example, like to be kind of the biggest thing, you know, B er, like.",
"custom_classes": [],
"text_filters": [],
"pii_entities": [],
"start_timestamp": 18.02,
"end_timestamp": 27.819999999999997,
"start_char_index": 283,
"end_char_index": 430
}
],
"words": [
{
"time": 1.52,
"alternatives": [
{
"text": "And",
"score": 0.23571264642017406
}
],
"type": "pronunciation",
"meta": {}
},
{
"time": 27.82,
"alternatives": [
{
"text": ".",
"score": 0.7616554849091626
}
],
"type": "punctuation",
"meta": {}
}
]
}
]
}
}
],
"from_cache": false
}
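The transcript and classifications sit several levels deep in the response, under `status[].response.output[]`. The following is a minimal Python sketch of one way to reach them, not an official client. The helper name `get_outputs` and the trimmed-down `sample_response` are hypothetical, and treating status code `"0"` as success is an assumption based only on the sample above.

```python
import json


def get_outputs(response: dict) -> list:
    """Collect every entry of `output` from a parsed response body."""
    outputs = []
    for entry in response.get("status", []):
        # Assumption: code "0" signals success, as in the sample response.
        if entry.get("status", {}).get("code") != "0":
            continue
        outputs.extend(entry.get("response", {}).get("output", []))
    return outputs


# Hypothetical, trimmed-down stand-in for a real response body.
sample_response = json.loads("""
{
  "status": [
    {
      "status": {"code": "0", "message": "SUCCESS"},
      "response": {
        "language": "EN",
        "output": [
          {"transcript": "Hello there.", "classifications": [], "words": []}
        ]
      }
    }
  ]
}
""")

for output in get_outputs(sample_response):
    print(output["transcript"])
```

Using `.get()` with defaults keeps the helper from raising if a key is absent; stricter validation may be preferable in production code.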
Name | Description |
---|---|
transcript | Transcript of the entire video or audio clip. |
words[j].time | Timestamp in seconds for each predicted word or punctuation in the transcript. |
words[j].type | pronunciation: the predicted character string is a word. punctuation: the predicted character string is a punctuation mark. |
words[j].alternatives[k].text | Predicted character string at that timestamp. |
words[j].alternatives[k].score | Confidence score for the predicted character string. |
words[j].alternatives | List of alternative predictions for the word or punctuation mark at each timestamp. |
classifications[i].classes | List of dictionaries, one per output class. Each dictionary contains the class name and its score. Scores range from 0 to 3, with 3 being the most severe. |
classifications[i].classes[k].class | Name of the predicted class. |
classifications[i].classes[k].score | Score of the predicted class. |
classifications[i].start_char_index | Index in the transcript of the first character of the classified sentence. |
classifications[i].end_char_index | Index in the transcript where the classified sentence ends. |
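As an illustration of how these fields might be combined, the sketch below lists the sentences whose score for any class meets a threshold, together with their timestamps. The function names, the threshold value, and the `sample_output` data are all hypothetical. The slicing helper assumes `end_char_index` is exclusive, which matches the sample response, where `end_char_index - start_char_index` equals the length of `text`.

```python
def flagged_sentences(output: dict, threshold: int = 1) -> list:
    """Return sentences with at least one class scoring >= threshold."""
    flagged = []
    for sentence in output.get("classifications", []):
        hits = {
            c["class"]: c["score"]
            for c in sentence.get("classes", [])
            if c["score"] >= threshold
        }
        if hits:
            flagged.append({
                "text": sentence["text"],
                "start": sentence["start_timestamp"],
                "end": sentence["end_timestamp"],
                "classes": hits,
            })
    return flagged


def sentence_span(transcript: str, sentence: dict) -> str:
    """Slice a sentence out of the transcript by its character indexes.

    Assumption: end_char_index is exclusive.
    """
    return transcript[sentence["start_char_index"]:sentence["end_char_index"]]


# Hypothetical data shaped like one element of `output`.
sample_output = {
    "transcript": "Nice to meet you. I will hurt you.",
    "classifications": [
        {
            "classes": [{"class": "violence", "score": 0},
                        {"class": "bullying", "score": 0}],
            "text": "Nice to meet you.",
            "start_timestamp": 0.5, "end_timestamp": 1.9,
            "start_char_index": 0, "end_char_index": 17,
        },
        {
            "classes": [{"class": "violence", "score": 2},
                        {"class": "bullying", "score": 1}],
            "text": "I will hurt you.",
            "start_timestamp": 2.1, "end_timestamp": 3.4,
            "start_char_index": 18, "end_char_index": 34,
        },
    ],
}

for item in flagged_sentences(sample_output, threshold=2):
    print(f'{item["start"]:.2f}s to {item["end"]:.2f}s: {item["text"]} {item["classes"]}')
```

The appropriate threshold depends on the moderation policy; since scores run from 0 to 3, a threshold of 1 flags anything that is not scored as benign.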