Text Moderation Explanations is a new feature that explains why our text moderation tools assigned a particular score to a given text string. The API takes three inputs: the text string, its class label, and the score it was assigned (the "severity" field in the example response). The output is a text string explaining why the input text received that score for that class.
An example JSON response is shown below.
"response": {
  "Input": {
    "args": {
      "class": "bullying",
      "severity": "2"
    },
    "text": "You are the worst person I have ever met"
  },
  "output": [
    {
      "text": "This is a bullying 2 because it is a disparaging statement against an individual without profanity"
    }
  ]
}
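
The response above can be read programmatically. The following is a minimal Python sketch, not an official client: the helper name summarize_explanation is hypothetical, and the sample is wrapped in outer braces (an assumption) so that it parses as a standalone JSON document. It relies only on the field names visible in the example, including the capitalized "Input" key and the string-valued "severity".

```python
import json

# Sample payload based on the example response, wrapped in outer braces
# (an assumption) so it parses as a standalone JSON document.
SAMPLE = """
{
  "response": {
    "Input": {
      "args": {"class": "bullying", "severity": "2"},
      "text": "You are the worst person I have ever met"
    },
    "output": [
      {"text": "This is a bullying 2 because it is a disparaging statement against an individual without profanity"}
    ]
  }
}
"""


def summarize_explanation(raw: str) -> dict:
    """Hypothetical helper: collect the inputs and explanation(s) from a response."""
    response = json.loads(raw)["response"]
    request = response["Input"]  # this key is capitalized in the example
    return {
        "text": request["text"],
        "class": request["args"]["class"],
        # severity is a string in the example; convert it for numeric comparisons
        "severity": int(request["args"]["severity"]),
        # "output" is a list, so keep every explanation it contains
        "explanations": [item["text"] for item in response["output"]],
    }


if __name__ == "__main__":
    summary = summarize_explanation(SAMPLE)
    print(summary["class"], summary["severity"])
    for explanation in summary["explanations"]:
        print(explanation)
```

Because "output" is an array, the sketch keeps all entries rather than assuming there is exactly one explanation per request.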