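Hive’s speech moderation model returns a transcript of the audio, followed by a set of classifications, timestamps, and character indexes for each sentence in that transcript. The response also lists every predicted word and punctuation mark with its timestamp and confidence score.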

{
  "response": {
        "input": "...",
        "output": [
          {
            "transcript": "Hello, this is a test for the speech text model. Thank you.",
            "classifications": [
              {
                "classes": [
                  {
                    "class": "sexual",
                    "score": 0
                  },
                  {
                    "class": "hate",
                    "score": 0
                  },
                  {
                    "class": "violence",
                    "score": 0
                  },
                  {
                    "class": "bullying",
                    "score": 0
                  }
                ],
                "text": "Hello, this is a test for the speech text model.",
                "custom_classes": [],
                "text_filters": [],
                "pii_entities": [],
                "start_timestamp": 1.92,
                "end_timestamp": 5.76,
                "start_char_index": 0,
                "end_char_index": 48
              },
              {
                "classes": [
                  {
                    "class": "sexual",
                    "score": 0
                  },
                  {
                    "class": "hate",
                    "score": 0
                  },
                  {
                    "class": "violence",
                    "score": 0
                  },
                  {
                    "class": "bullying",
                    "score": 0
                  }
                ],
                "text": "Thank you.",
                "custom_classes": [],
                "text_filters": [],
                "pii_entities": [],
                "start_timestamp": 6.72,
                "end_timestamp": 7.12,
                "start_char_index": 49,
                "end_char_index": 59
              }
            ],
            "words": [
              {
                "time": 1.92,
                "alternatives": [
                  {
                    "text": "Hello",
                    "score": 0.9104309678077698
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 1.92,
                "alternatives": [
                  {
                    "text": ",",
                    "score": 0.4422236680984497
                  }
                ],
                "type": "punctuation",
                "meta": {}
              },
              {
                "time": 2.88,
                "alternatives": [
                  {
                    "text": "this",
                    "score": 0.9942981600761414
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 3.2,
                "alternatives": [
                  {
                    "text": "is",
                    "score": 0.9997261166572571
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 3.44,
                "alternatives": [
                  {
                    "text": "a",
                    "score": 0.9971330165863037
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 3.6,
                "alternatives": [
                  {
                    "text": "test",
                    "score": 0.9948557615280151
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 4,
                "alternatives": [
                  {
                    "text": "for",
                    "score": 0.9994066953659058
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 4.32,
                "alternatives": [
                  {
                    "text": "the",
                    "score": 0.9988841414451599
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 4.56,
                "alternatives": [
                  {
                    "text": "speech",
                    "score": 0.9517272710800171
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 5.12,
                "alternatives": [
                  {
                    "text": "text",
                    "score": 0.9351105093955994
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 5.68,
                "alternatives": [
                  {
                    "text": "model",
                    "score": 0.9877890944480896
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 5.68,
                "alternatives": [
                  {
                    "text": ".",
                    "score": 0.689773678779602
                  }
                ],
                "type": "punctuation",
                "meta": {}
              },
              {
                "time": 6.72,
                "alternatives": [
                  {
                    "text": "Thank",
                    "score": 0.9980304837226868
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 7.04,
                "alternatives": [
                  {
                    "text": "you",
                    "score": 0.9963014125823975
                  }
                ],
                "type": "pronunciation",
                "meta": {}
              },
              {
                "time": 7.04,
                "alternatives": [
                  {
                    "text": ".",
                    "score": 0.8989142179489136
                  }
                ],
                "type": "punctuation",
                "meta": {}
              }
            ]
          }
        ]
      }
    }
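
As a rough illustration of how a client might consume this response, the sketch below walks the per-sentence classifications and collects any sentence whose class score meets a threshold. The function name `flagged_sentences`, the `threshold` parameter, and its default of 1 are assumptions made for this example and are not part of Hive's API. The sample data is a trimmed copy of the response above.

```python
from typing import Any, Dict, List

# Trimmed copy of the example response above: two sentences, all scores 0.
ZERO_CLASSES = ["sexual", "hate", "violence", "bullying"]
EXAMPLE: Dict[str, Any] = {
    "response": {
        "input": "...",
        "output": [
            {
                "transcript": "Hello, this is a test for the speech text model. Thank you.",
                "classifications": [
                    {
                        "classes": [{"class": n, "score": 0} for n in ZERO_CLASSES],
                        "text": "Hello, this is a test for the speech text model.",
                        "start_timestamp": 1.92,
                        "end_timestamp": 5.76,
                    },
                    {
                        "classes": [{"class": n, "score": 0} for n in ZERO_CLASSES],
                        "text": "Thank you.",
                        "start_timestamp": 6.72,
                        "end_timestamp": 7.12,
                    },
                ],
            }
        ],
    }
}


def flagged_sentences(body: Dict[str, Any], threshold: int = 1) -> List[Dict[str, Any]]:
    """Return sentences that have at least one class score >= threshold.

    `threshold` is a hypothetical cutoff chosen by the caller; scores
    range from 0 to 3, with 3 being the most severe.
    """
    flagged = []
    for output in body["response"]["output"]:
        for sentence in output["classifications"]:
            # Keep only the classes that reach the threshold.
            hits = {
                c["class"]: c["score"]
                for c in sentence["classes"]
                if c["score"] >= threshold
            }
            if hits:
                flagged.append(
                    {
                        "text": sentence["text"],
                        "start": sentence["start_timestamp"],
                        "end": sentence["end_timestamp"],
                        "classes": hits,
                    }
                )
    return flagged


# Every score in the example is 0, so nothing is flagged.
print(flagged_sentences(EXAMPLE))
```

Returning the sentence timestamps alongside the matching classes lets a caller jump to the relevant part of the audio without re-parsing the response.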

Name

Description

transcript

Full transcript of the entire video or audio clip.

words[j].time

Timestamp, in seconds, of each predicted word or punctuation mark in the transcript.

words[j].type

pronunciation: The predicted character string is a word.
punctuation: The predicted character string is a punctuation mark.

words[j].alternatives[k].text

Predicted character string at that timestamp.

words[j].alternatives[k].score

Confidence score for the predicted character string.

words[j].alternatives

List of alternative predictions for the word or punctuation mark at each timestamp.

classifications[i].classes

List of dictionaries, one per output class. Each dictionary contains the class name and its score. Scores range from 0 to 3, with 3 being the most severe.

classifications[i].classes[m].class

Name of the predicted class.

classifications[i].classes[m].score

Severity score of the predicted class, from 0 to 3.

classifications[i].start_char_index

Index in the transcript of the first character of the sentence.

classifications[i].end_char_index

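Index in the transcript where the sentence ends. In the example above, this is one past the sentence's last character (48 for a 48-character sentence starting at index 0).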
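
The fields above can be combined in a few lines of client code. The sketch below is illustrative only. It assumes, based on the example response, that `end_char_index` is exclusive, so that slicing the transcript from `start_char_index` to `end_char_index` reproduces the sentence text. The helper names `sentence_slices` and `low_confidence_words`, and the `min_score` cutoff, are hypothetical. The sample data is a trimmed subset of the example response.

```python
from typing import Any, Dict, List, Tuple

# Trimmed subset of one entry of response.output from the example above.
OUTPUT: Dict[str, Any] = {
    "transcript": "Hello, this is a test for the speech text model. Thank you.",
    "classifications": [
        {
            "text": "Hello, this is a test for the speech text model.",
            "start_char_index": 0,
            "end_char_index": 48,
        },
        {"text": "Thank you.", "start_char_index": 49, "end_char_index": 59},
    ],
    "words": [
        {"time": 1.92, "type": "pronunciation", "meta": {},
         "alternatives": [{"text": "Hello", "score": 0.9104309678077698}]},
        {"time": 1.92, "type": "punctuation", "meta": {},
         "alternatives": [{"text": ",", "score": 0.4422236680984497}]},
        {"time": 4.56, "type": "pronunciation", "meta": {},
         "alternatives": [{"text": "speech", "score": 0.9517272710800171}]},
        {"time": 5.12, "type": "pronunciation", "meta": {},
         "alternatives": [{"text": "text", "score": 0.9351105093955994}]},
    ],
}


def sentence_slices(output: Dict[str, Any]) -> List[str]:
    """Slice the transcript using each sentence's character indexes.

    Assumes end_char_index is exclusive, as the example response suggests.
    """
    transcript = output["transcript"]
    return [
        transcript[c["start_char_index"]:c["end_char_index"]]
        for c in output["classifications"]
    ]


def low_confidence_words(output: Dict[str, Any], min_score: float) -> List[Tuple[float, str]]:
    """Return (time, text) for spoken words whose best score is below min_score."""
    result = []
    for word in output["words"]:
        # Skip punctuation marks and entries with no predictions.
        if word["type"] != "pronunciation" or not word["alternatives"]:
            continue
        # Pick the highest-scoring alternative rather than relying on list order.
        best = max(word["alternatives"], key=lambda alt: alt["score"])
        if best["score"] < min_score:
            result.append((word["time"], best["text"]))
    return result


# Each slice matches the sentence's "text" field in this example.
print(sentence_slices(OUTPUT))
# Words the model was less sure about, using a hypothetical cutoff of 0.94.
print(low_confidence_words(OUTPUT, min_score=0.94))
```

Selecting the best alternative with `max` avoids assuming that the first entry in `alternatives` is always the top prediction, which the example does not confirm.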