Hive’s Text Moderation response format is an instantiation of the general classification response with additional fields to support the pattern-matching algorithms and the optional splitting of larger text inputs into sentence chunks.

Pattern-matching algorithm response:
The pattern-matching results for profanity are returned in the text_filters array; pattern-matching results for PII are returned in the pii_entities array. Each pattern match is returned as an object containing the matched substring in the value field, the start and end indices of the match in the start_index and end_index fields, respectively, and the match type (profanity, email address, phone number, etc.) in the type field.
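Because each match carries character offsets into the original input, the matches can be applied directly to the text, for example to redact them. Below is a minimal sketch assuming the response has already been parsed into a Python dict; the redact helper and the placeholder input are illustrative, not part of Hive's SDK.

```python
# Hypothetical parsed response, in the documented shape. The input is a
# 110-character placeholder string standing in for the real text.
response = {
    "input": "x" * 110,
    "text_filters": [
        {"value": "ASS", "start_index": 107, "end_index": 110, "type": "profanity"}
    ],
    "pii_entities": [
        {"value": "[email protected]", "start_index": 38, "end_index": 57,
         "type": "Email Address"}
    ],
}

def redact(text, matches):
    # Replace each matched span with asterisks. Working right-to-left
    # keeps earlier offsets valid as the string is rebuilt.
    for m in sorted(matches, key=lambda m: m["start_index"], reverse=True):
        span = m["end_index"] - m["start_index"]
        text = text[:m["start_index"]] + "*" * span + text[m["end_index"]:]
    return text

clean = redact(response["input"],
               response["text_filters"] + response["pii_entities"])
```

Since every match is expressed as offsets into the same input string, profanity and PII matches can be combined into one pass, as shown above.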

Deep Learning model classifications:
The classified language is returned in the language field. Based on the classified language, and on the currently supported text moderation model classes, the moderated_classes array indicates which classes have been moderated. If the classified language is "UNSUPPORTED", the moderated_classes array will be empty. The output array contains the deep learning model results for each supported class. For longer text inputs that were split into sentence chunks, each object in the output array holds the model results for one sentence chunk. The model results follow the general classification format.
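When an input is split into chunks, a common consumption pattern is to collapse the per-chunk results into a single worst-case score per class. A minimal sketch, assuming a parsed response dict; the worst_scores helper is hypothetical:

```python
# Hypothetical parsed response with one output chunk, in the documented shape.
response = {
    "language": "EN",
    "moderated_classes": ["sexual", "hate", "violence", "bullying", "spam"],
    "output": [
        {"start_char_index": 0, "end_char_index": 110,
         "classes": [{"class": "spam", "score": 3},
                     {"class": "sexual", "score": 2}]},
    ],
}

def worst_scores(response):
    # An "UNSUPPORTED" language means moderated_classes is empty and no
    # deep learning scores apply, so return nothing to act on.
    if response["language"] == "UNSUPPORTED":
        return {}
    # Take the maximum score seen for each class across all chunks.
    scores = {}
    for chunk in response["output"]:
        for c in chunk["classes"]:
            scores[c["class"]] = max(scores.get(c["class"], 0), c["score"])
    return scores
```

Taking the maximum across chunks is conservative: a single severe sentence flags the whole input even if the rest is benign.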

"response": {
      "input": "..."
      "custom_classes": [],
      "text_filters": [
        {
          "value": "ASS",
          "start_index": 107,
          "end_index": 110,
          "type": "profanity"
        }
      ],
      "pii_entities": [
        {
          "value": "[email protected]",
          "start_index": 38,
          "end_index": 57,
          "type": "Email Address"
        },
        {
          "value": " 617-768-2274.",
          "start_index": 80,
          "end_index": 94,
          "type": "U.S. Phone Number"
        }
      ],
      "language": "EN",
      "moderated_classes": [
        "sexual",
        "hate",
        "violence",
        "bullying",
        "spam"
      ],
      "output": [
        {
          "time": 0,
          "start_char_index": 0,
          "end_char_index": 110,
          "classes": [
            {
              "class": "spam",
              "score": 3
            },
            {
              "class": "sexual",
              "score": 2
            },
            {
              "class": "hate",
              "score": 0
            },
            {
              "class": "violence",
              "score": 0
            },
            {
              "class": "bullying",
              "score": 0
            }
          ]
        }
      ]
    }

Name: Description

classes: List of dictionaries of all output classes. Each dictionary contains the class name and its score. Scores range from 0 to 3, with 3 being the most severe.

class: Name of the predicted class.

score: Score of the predicted class.

start_char_index: Index of the first character of the input processed in this output object.

end_char_index: Index of the last character of the input processed in this output object.
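Because scores range from 0 to 3 with 3 the most severe, a simple way to act on the response is to flag any class that meets a chosen severity threshold. A minimal sketch; the flagged_classes helper and the threshold value are illustrative assumptions, not part of Hive's API:

```python
def flagged_classes(output_chunks, threshold=2):
    # Collect every class whose score meets the threshold in any chunk.
    # With the 0-3 scale, threshold=2 treats scores 2 and 3 as actionable.
    return sorted({c["class"]
                   for chunk in output_chunks
                   for c in chunk["classes"]
                   if c["score"] >= threshold})

# Hypothetical output array with a single chunk, in the documented shape.
chunks = [{"classes": [{"class": "spam", "score": 3},
                       {"class": "sexual", "score": 2},
                       {"class": "hate", "score": 0}]}]
```

Raising the threshold to 3 restricts flagging to the most severe matches only.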