Text Moderation Explanations

A guide to our Text Moderation Explanations API

Overview

Text Moderation Explanations is a new feature that explains why our Text Moderation model assigned a given severity score to a text string. The API takes three inputs: the text string, the class it was scored under, and the severity score it received for that class. It returns a text string explaining why the input text received that score for that class.

Supported Languages

We currently support the following languages for this feature:

  • English
  • Hindi
  • Spanish
  • Portuguese
  • French
  • Arabic
  • Italian
  • German

If you are unsure whether your language is supported, or would like to request an additional language, please reach out to our sales team ([email protected]).

Request Format

Below are the input fields for a Text Moderation Explanations request.

class: The class that Text Moderation assigned to the original input text. Must be one of “sexual”, “bullying”, “hate”, or “violence”.
severity: The severity score that Text Moderation assigned to the input text for that class. It is an integer from 0 (benign) to 3 (most severe), inclusive.
text: The input text string whose severity (relative to its class) should be explained. The maximum length is 1024 characters.
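The limits above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the function name validate_explanation_input is hypothetical, and it assumes the 1024-character limit matches what the shell counts as a character in the current locale.

```shell
# Hypothetical helper (not part of the API): checks the three inputs against
# the limits listed in this guide. Prints a reason to stderr and returns 1
# when an input is out of range.
validate_explanation_input() {
  local class="$1" severity="$2" text="$3"

  # Only the four classes named in this guide are accepted.
  case "$class" in
    sexual|bullying|hate|violence) ;;
    *) echo "unsupported class: $class" >&2; return 1 ;;
  esac

  # Severity must be a single integer from 0 to 3, inclusive.
  case "$severity" in
    [0-3]) ;;
    *) echo "severity must be an integer from 0 to 3, got: $severity" >&2; return 1 ;;
  esac

  # ${#text} is the length as counted by the shell; some shells and locales
  # count bytes rather than characters (assumption: close enough for a pre-check).
  if [ "${#text}" -gt 1024 ]; then
    echo "text is ${#text} characters; the maximum is 1024" >&2
    return 1
  fi
}

# Usage: validate first, then continue with the request.
validate_explanation_input "bullying" 2 "You are the worst person I have ever met" \
  && echo "input ok"
```

A check like this only catches inputs that are out of range before they are sent; the API remains the authority on what it accepts.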

Here is an example cURL request in this format. The input text is sent as text_data, and class and severity are passed together as a JSON object in options:

curl -X POST "https://api.thehive.ai/api/v2/task/sync" \
-H "Authorization: Token koYDZUYPYDuwnkb7iDLBY9UnHas32Xtt" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d 'text_data=You are the worst person I have ever met' \
-d 'options={"class": "bullying", "severity": "2"}'
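For scripted use, the same request can be wrapped in a small shell function. This is a sketch under stated assumptions: the names build_options, explain_text, and the HIVE_API_KEY environment variable are hypothetical, and the options JSON mirrors the example above, including severity as a quoted string. It uses curl's --data-urlencode so that special characters in the text are encoded in the form body.

```shell
# Assumption: the API token is stored in HIVE_API_KEY (a hypothetical variable
# name), so the token does not have to be written into the script.

# Builds the options JSON in the same shape as the example request,
# with severity passed as a quoted string.
build_options() {
  printf '{"class": "%s", "severity": "%s"}' "$1" "$2"
}

# Sends one request. Arguments: CLASS SEVERITY TEXT
explain_text() {
  if [ -z "${HIVE_API_KEY:-}" ]; then
    echo "HIVE_API_KEY is not set" >&2
    return 1
  fi
  # --data-urlencode URL-encodes each value and sends the body as
  # application/x-www-form-urlencoded, matching the example request.
  curl -sS -X POST "https://api.thehive.ai/api/v2/task/sync" \
    -H "Authorization: Token $HIVE_API_KEY" \
    --data-urlencode "text_data=$3" \
    --data-urlencode "options=$(build_options "$1" "$2")"
}

# Usage (performs a network call, so it is left commented out):
# explain_text bullying 2 "You are the worst person I have ever met"
```

Keeping the JSON construction in its own function makes it possible to inspect the options value without sending a request.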

Response

Below is the output field of a Text Moderation Explanations response.

text: The output text string, which explains why the input text received its assigned severity.

For an annotated sample response, please refer to our API Reference page.