Text Moderation Explanations
A guide to our Text Moderation Explanations API
Overview
Text Moderation Explanations is a new feature that explains why our Text Moderation model assigned a given severity score to a text string. The API takes three inputs: the text string, its class label, and the severity score it was assigned. The output is a text string explaining why the input text received that score for that class.
Supported Languages
We currently support the following languages for this feature:
- English
- Hindi
- Spanish
- Portuguese
- French
- Arabic
- Italian
- German
If you are unsure whether your language is supported, or would like to request an additional language, please reach out to our sales team ([email protected]).
Request Format
Below are the input fields for a Text Moderation Explanations request.
class
: The class that Text Moderation assigned to the original input text. The possible classes are “sexual”, “bullying”, “hate”, and “violence”.
severity
: The original score that Text Moderation assigned to the input text. It is an integer value ranging from 0 (benign) to 3 (most severe), inclusive.
text
: The input text string whose severity (relative to its class) the user would like explained. The maximum length is 1024 characters.
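The field constraints above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API itself: the helper name `validate_explanation_input` and the constant names are hypothetical, and it assumes the 1024-character limit is counted the way Python's `len` counts characters.

```python
# Documented limits for a Text Moderation Explanations request.
ALLOWED_CLASSES = {"sexual", "bullying", "hate", "violence"}
MAX_TEXT_LENGTH = 1024  # assumption: counted as Python string length


def validate_explanation_input(text, class_name, severity):
    """Hypothetical helper: return a list of problems (empty if the input looks valid)."""
    problems = []
    if class_name not in ALLOWED_CLASSES:
        problems.append(f"unknown class: {class_name!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(severity, bool) or not isinstance(severity, int) or not 0 <= severity <= 3:
        problems.append("severity must be an integer from 0 to 3")
    if len(text) > MAX_TEXT_LENGTH:
        problems.append(f"text exceeds {MAX_TEXT_LENGTH} characters")
    return problems
```

An empty list means the input fits the documented limits; it does not guarantee that the API will accept the request.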
Here is an example cURL request in this format:
curl -X POST "https://api.thehive.ai/api/v2/task/sync" \
-H "Authorization: Token koYDZUYPYDuwnkb7iDLBY9UnHas32Xtt" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d 'text_data=You are the worst person I have ever met' \
-d 'options={"class": "bullying", "severity": "2"}'
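For readers working in Python, the same request could be assembled with the standard library along these lines. This is a sketch under assumptions: the function name `build_explanation_request` and the `YOUR_API_TOKEN` placeholder are hypothetical, and the severity is sent as a string only because the cURL example does so. Unlike `curl -d`, `urlencode` percent-encodes the form values, which is the usual encoding for this content type.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.thehive.ai/api/v2/task/sync"


def build_explanation_request(api_token, text, class_name, severity):
    """Hypothetical helper: build (but do not send) a request mirroring the cURL example."""
    # The cURL example passes severity as a string inside the options JSON.
    options = json.dumps({"class": class_name, "severity": str(severity)})
    body = urllib.parse.urlencode({"text_data": text, "options": options}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_explanation_request(
    "YOUR_API_TOKEN",  # placeholder, replace with a real token
    "You are the worst person I have ever met",
    "bullying",
    2,
)
```

Sending the request is then a matter of passing `request` to `urllib.request.urlopen` and reading the response body.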
Response
Below are the output fields for a response.
text
: The output text string: an explanation of why the input text received its assigned severity.
For an annotated sample response, please refer to our API Reference page.