Hive’s Text Moderation response format is an instantiation of the general classification response with additional fields to support the pattern-matching algorithms and the optional splitting of larger text inputs into sentence chunks.
Pattern-matching algorithm response:
The pattern-matching results for profanity are returned in the text_filters array. Similarly, the pattern-matching results for PII are returned in the pii_entities array. Each pattern match is an object that gives the matched substring in the value field, the start and end indices of the match in the start_index and end_index fields, respectively, and the kind of match in the type field (profanity, email, phone number, etc.).
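As a minimal sketch of reading these fields, the Python below gathers every pattern match from `text_filters` and `pii_entities`. The helper name `collect_pattern_matches` and the trimmed `sample` payload are illustrative assumptions, not part of Hive's API; the nesting under `status` and `response` follows the sample response in this document.

```python
from typing import Any


def collect_pattern_matches(api_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Gather profanity and PII pattern matches from every status entry."""
    matches: list[dict[str, Any]] = []
    for entry in api_response.get("status", []):
        body = entry.get("response", {})
        # Field names follow the sample response; missing lists are treated as empty.
        for source in ("text_filters", "pii_entities"):
            for match in body.get(source, []):
                matches.append(
                    {
                        "source": source,
                        "type": match.get("type"),
                        "value": match.get("value"),
                        "start_index": match.get("start_index"),
                        "end_index": match.get("end_index"),
                    }
                )
    return matches


# Trimmed, illustrative payload shaped like the sample response.
sample = {
    "status": [
        {
            "response": {
                "text_filters": [
                    {"value": "PORN", "start_index": 276, "end_index": 280, "type": "profanity"}
                ],
                "pii_entities": [
                    {
                        "value": "617-768-2274",
                        "start_index": 152,
                        "end_index": 164,
                        "type": "U.S. Phone Number",
                    }
                ],
            }
        }
    ]
}

for m in collect_pattern_matches(sample):
    print(m["source"], m["type"], m["start_index"], m["end_index"])
```

Keeping the source list name alongside each match makes it easy to route profanity and PII hits to different handling later.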
Deep Learning model classifications:
The detected language is returned in the language field. The moderated_classes field lists the classes that were actually moderated, which depends on the detected language and on the classes the text moderation models currently support. If the detected language is "UNSUPPORTED", the moderated_classes array will be empty. The output array contains the deep learning model results for each supported class.
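Because the set of moderated classes varies with the detected language, a caller may want to confirm that the classes it relies on were covered before trusting the scores. The following is a rough sketch of such a check; `missing_coverage` and the two small payloads are hypothetical names and data used only for illustration.

```python
from typing import Any, Iterable


def missing_coverage(body: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required classes that were not moderated for this input."""
    moderated = set(body.get("moderated_classes", []))
    # Per the description above, an "UNSUPPORTED" language comes with an empty
    # moderated_classes array, so every required class is reported as missing.
    return [name for name in required if name not in moderated]


english = {"language": "EN", "moderated_classes": ["sexual", "hate", "violence"]}
unsupported = {"language": "UNSUPPORTED", "moderated_classes": []}

print(missing_coverage(english, ["hate", "spam"]))      # ['spam']
print(missing_coverage(unsupported, ["hate", "spam"]))  # ['hate', 'spam']
```

A non-empty result could be used to send the text to manual review instead of treating the absence of a score as a clean result.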
Note: We are aware of an issue where start_index and end_index may be offset or misaligned relative to the text input in some cases. This can occur if the text input is significantly distorted with non-alphabetic characters, if pattern matching occurs on a subword, or if many characters are repeated on both ends of the text input. We are working to optimize our solution to this issue.
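One way to guard against misaligned offsets on the client side is to verify each reported span against the original text before using it. The sketch below is an assumption-laden illustration, not Hive's recommended fix: `locate_match` is a hypothetical helper, it assumes `end_index` is exclusive (as the sample response suggests, for example `PORN` at 276 to 280), and it assumes `value` is a case-insensitive copy of the matched substring, which may not hold for heavily distorted input.

```python
from typing import Optional, Tuple


def locate_match(
    text: str, value: str, start: int, end: int, window: int = 20
) -> Optional[Tuple[int, int]]:
    """Return a (start, end) span in `text` that actually contains `value`.

    Trusts the reported span when it matches; otherwise searches a small
    window around it. Returns None if the value cannot be found nearby.
    """
    # Assumes lowercasing preserves string length, which holds for ASCII text.
    haystack = text.lower()
    needle = value.lower()
    if haystack[start:end] == needle:
        return start, end
    lo = max(0, start - window)
    hi = min(len(text), end + window)
    found = haystack.find(needle, lo, hi)
    if found == -1:
        return None
    return found, found + len(needle)


text = "call me at 617-768-2274 today"
print(locate_match(text, "617-768-2274", 11, 23))  # reported span is correct
print(locate_match(text, "617-768-2274", 13, 25))  # reported span is shifted by 2
print(locate_match(text, "XYZ", 0, 3))             # value not present nearby
```

When the helper returns `None`, the safest option is usually to fall back to the `value` string alone rather than slicing the input at the reported indices.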
{
"id": "19fa9f60-5b06-11ed-8fc4-659b38d4c6a6",
"code": 200,
"project_id": 41392,
"user_id": 3121654,
"created_on": "2022-11-02T23:28:48.546Z",
"status": [
{
"status": {
"code": "0",
"message": "SUCCESS"
},
"response": {
"input": {
"hash": "18320b1e0a343a0ddf7d595f6e64b683",
"inference_client_version": "6.0.13",
"model": "...",
"model_type": "TEXT_CLASSIFICATION",
"model_version": 1,
"text": "...",
"id": "19fa9f60-5b06-11ed-8fc4-659b38d4c6a6",
"created_on": "2022-11-02T23:28:48.214Z",
"user_id": 3121654,
"project_id": 41392,
"charge": 0.003
},
"custom_classes": [
{
"value": "HO",
"start_index": 139,
"end_index": 141,
"class": "badwords_subwords"
},
{
"value": "HO",
"start_index": 178,
"end_index": 180,
"class": "badwords_subwords"
},
{
"value": "HIV",
"start_index": 226,
"end_index": 229,
"class": "badwords_subwords"
},
{
"value": "PORN",
"start_index": 276,
"end_index": 280,
"class": "badwords_no_subwords"
},
{
"value": "PORN",
"start_index": 276,
"end_index": 280,
"class": "badwords_subwords"
},
{
"value": "KILL",
"start_index": 285,
"end_index": 289,
"class": "badwords_no_subwords"
},
{
"value": "KILL",
"start_index": 285,
"end_index": 289,
"class": "badwords_subwords"
},
{
"value": "NIGGER",
"start_index": 301,
"end_index": 307,
"class": "badwords_no_subwords"
},
{
"value": "NIGGER",
"start_index": 301,
"end_index": 307,
"class": "badwords_subwords"
}
],
"text_filters": [
{
"value": "ASSHOLE",
"start_index": 16,
"end_index": 23,
"type": "profanity"
},
{
"value": "PORN",
"start_index": 276,
"end_index": 280,
"type": "profanity"
},
{
"value": "NIGGER",
"start_index": 301,
"end_index": 307,
"type": "profanity"
}
],
"pii_entities": [
{
"value": "[email protected]",
"start_index": 38,
"end_index": 62,
"type": "Email Address"
},
{
"value": "123 YERBA BUENA LN, SAN FRANCISCO, CA 94103",
"start_index": 81,
"end_index": 124,
"type": "U.S. Mailing Address"
},
{
"value": "617-768-2274",
"start_index": 152,
"end_index": 164,
"type": "U.S. Phone Number"
},
{
"value": "+91-92342-43234",
"start_index": 190,
"end_index": 205,
"type": "International Phone Number"
}
],
"urls": [
{
"value": "thehive.ai/projects/99999/settings",
"base_domain": "thehive.ai",
"start_index": 215,
"end_index": 257
}
],
"language": "EN",
"moderated_classes": [
"sexual",
"hate",
"violence",
"bullying",
"spam",
"promotions",
"gibberish",
"child_exploitation",
"phone_number"
],
"output": [
{
"time": 0,
"start_char_index": 0,
"end_char_index": 307,
"classes": [
{
"class": "spam",
"score": 3
},
{
"class": "sexual",
"score": 1
},
{
"class": "hate",
"score": 3
},
{
"class": "violence",
"score": 3
},
{
"class": "bullying",
"score": 3
},
{
"class": "promotions",
"score": 3
},
{
"class": "gibberish",
"score": 0
},
{
"class": "child_exploitation",
"score": 0
},
{
"class": "phone_number",
"score": 3
}
]
}
]
}
}
],
"from_cache": false
}
Name | Description |
---|---|
classes | List of dictionaries of all output classes. Each dictionary contains the class name and the score. The scores range from 0 to 3 with 3 being the most severe. |
class | Name of predicted class. |
score | Score of predicted class. |
start_char_index | Index in the input text of the first character processed. |
end_char_index | Index in the input text of the last character processed. |
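
To illustrate how the fields in this table fit together, the sketch below walks the `output` array and keeps, for each chunk, the classes whose score meets a chosen cutoff. The function name `flag_chunks`, the default `threshold` of 2, and the trimmed `body` payload are assumptions made for the example; an appropriate cutoff depends on the application.

```python
from typing import Any


def flag_chunks(body: dict[str, Any], threshold: int = 2) -> list[dict[str, Any]]:
    """List chunks that have at least one class scoring at or above `threshold`."""
    flagged: list[dict[str, Any]] = []
    for chunk in body.get("output", []):
        # Scores range from 0 to 3, with 3 being the most severe.
        hits = {
            item["class"]: item["score"]
            for item in chunk.get("classes", [])
            if item["score"] >= threshold
        }
        if hits:
            flagged.append(
                {
                    "start_char_index": chunk.get("start_char_index"),
                    "end_char_index": chunk.get("end_char_index"),
                    "classes": hits,
                }
            )
    return flagged


# Trimmed, illustrative payload shaped like the "response" object in the sample.
body = {
    "output": [
        {
            "time": 0,
            "start_char_index": 0,
            "end_char_index": 307,
            "classes": [
                {"class": "spam", "score": 3},
                {"class": "sexual", "score": 1},
                {"class": "gibberish", "score": 0},
            ],
        }
    ]
}

for chunk in flag_chunks(body):
    print(chunk["start_char_index"], chunk["end_char_index"], chunk["classes"])
```

Returning the character indices with each flagged chunk lets a caller point reviewers at the relevant part of a long input.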