Detailed Class Descriptions – Text Moderation

Overview

This page gives a more comprehensive description of Hive’s Text Moderation classes. If you need more detail or have further questions about certain types of content after reading the main Text Moderation page, look here. We’ll describe each model head as clearly as possible and provide examples of what it covers, including multi-level classes where applicable.

All platforms have different moderation requirements and risk sensitivities, so we recommend that you consult these descriptions carefully as you decide which classes to moderate (and at what severity).

📘

NOTE:

To determine which classes and severity levels cover specific types of text, it may be helpful to search this page (Ctrl/Cmd + F) with relevant terms. You can also use the sidebar on the right to navigate to classes of interest.

General Notes

Before looking at subject-matter breakdowns for each class, it’s helpful to understand the following:

  1. Hive’s text classifier is multi-headed. Classifications from each model head (e.g., sexual, hate, violence, etc.) are made independently and returned together. If a message scores highly in multiple classes, it meets our definition of each class.
  2. Some model heads define four classes that capture different severity levels. In the API response, these multi-level classes are represented by severity scores spanning integer values from 0 (benign) to 3 (most severe). Other model heads – spam, child exploitation, promotions, and phone numbers – are binary. For binary model heads, content will be classified as either Level 3 or Level 0. In these cases, Level 3 simply means that the relevant content is present; it is not intended to be an indicator of severity.
  3. Just because a message is classified as Level 0 for one model head does not mean the content is clean. For this reason, we will describe the Level 0 classes with a non-exhaustive list of subject matter that is not captured by the higher severity classes but that may help distinguish between what is and is not flagged (e.g., borderline content, content captured by other classes).
  4. Usernames that use intelligible words and phrases (including character replacements) generally follow the same rules as messages and other text strings.
  5. The lists given in these subject matter breakdowns are not exhaustive, and the context around how a particular word or phrase is used can have a significant impact on the classification. We’ll provide descriptive examples and rules around contextual factors where relevant, but you can also use our web demo to get a rough impression for other cases of interest.
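Putting these notes together, a minimal sketch of consuming a multi-headed response might look like the following. The response shape, class names, and thresholds here are illustrative assumptions rather than the exact API schema; consult the API reference for actual field names.

```python
# Hypothetical example: interpreting per-head severity scores from a
# text moderation response. The dictionary keys and response structure
# below are assumptions for illustration, not the exact API schema.

# Per-platform policy: the minimum severity level (0-3) at which each
# model head should trigger moderation. Binary heads (e.g.,
# child_exploitation) only ever return 0 or 3, so any threshold works.
POLICY = {
    "sexual": 2,              # tolerate Level 1 (pet names, relationship talk)
    "hate": 1,                # flag even "controversial" content
    "violence": 2,
    "bullying": 2,
    "child_exploitation": 3,  # binary head: 3 simply means "present"
}

def flagged_heads(scores: dict, policy: dict) -> list:
    """Return the model heads whose severity meets or exceeds the
    platform's per-head threshold. Heads classify independently, so a
    single message can be flagged under several classes at once."""
    return [head for head, level in scores.items()
            if level >= policy.get(head, 4)]  # 4 = never flag unknown heads

# Example message scored highly by several independent heads at once.
scores = {"sexual": 0, "hate": 3, "violence": 2, "bullying": 3,
          "child_exploitation": 0}
print(flagged_heads(scores, POLICY))  # ['hate', 'violence', 'bullying']
```

Because each head is thresholded separately, a platform can moderate aggressively on one class (e.g., hate at Level 1) while tolerating borderline content in another.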

Sexual Model Head

This is the main model head for flagging sexual messages and text content like solicitation, descriptions of sexual activity, and mentions of pornography. Slang that has a sexual connotation and certain emojis will also be flagged if there is enough context to infer sexual meaning.

Here’s a breakdown of subject matter captured by each class/severity level.

Level 3: Sexually Explicit

Captures text that explicitly references sexual activity, genitalia, and pornography. Text will also be classified as Level 3 if it explicitly refers to illegal or non-consensual activity. Specifically, Level 3 flags the following cases:

-Phrases that explicitly reference sexual intercourse or sexual activity, or that indirectly reference it with high confidence, including:

  • Sexual intercourse or penetration of any kind
  • Oral sex
  • Anal sex
  • Orgasm or sexual bodily fluids

-Phrases that explicitly reference or describe an individual masturbating, including:

  • Fingering
  • Jerking off and similar terms
  • Orgasm and sexual bodily fluids (see above)
  • Other references to masturbation e.g., brb gonna go stroke my wood
  • Note: Mentioning masturbation in an informational context without reference to a particular individual or act is classified as Level 1

-Phrases that mention or discuss pornography, porn websites, porn actors, or cam sites

-Phrases that mention or describe the use of sex toys by a person, including:

  • Dildos, fleshlights, vibrators, buttplugs, etc.
  • Common objects used as sex toys if there is context to conclude sexual intent
  • Note: Sex toys that are simply described without referring to individual use are classified as Level 1.

-Phrases that explicitly suggest sexual desire or turn-ons, including:

  • “Horny” and similar words
  • “Hard,” “wet,” and similar words, if context is sufficient to infer sexual meaning
  • “MILF” and variants

-Phrases that include explicit mentions of genitals, sexual body parts, and sexual bodily fluids (including slang and synonyms)

  • This includes emojis commonly used to substitute for words describing body parts and sexual acts
  • “Butt” and synonyms are classified as Level 3 if used in the context of anal sex or penetration, typically Level 2 otherwise.
  • “Dick,” “pussy,” “asshole,” etc. are classified as Level 1 if used as an insult or in non-sexual contexts. Similarly, if “balls” is used as a colloquialism for courage, this would also be classified as Level 1 (see Sexual Level 1 description for examples)

-Phrases that use “whore” or “slut” to describe a person; the context does not necessarily need to be sexual

-Phrases that explicitly describe, condone, promote, or threaten non-consensual or unlawful sexual activity with enough context to infer sexual meaning, including:

  • Rape and molestation
  • Incest
  • Pedophilia
  • Prostitution
  • Bestiality

Level 2: Sexual

This class captures text that references sexual content, like nudity, and suggestive messages that are not necessarily explicit or that use tamer language. Specifically, the Level 2 class will flag:

-Phrases that explicitly mention nudity, but do not reference genitalia, breasts, etc., including:

  • References to being nude or undressing
  • Requesting or mentioning sending or receiving nude photos (note: the reference to nude photos must be explicit)
  • References to strippers or strip clubs

-Phrases that reference foreplay or kissing, licking, or sucking non-genital body parts (feet, face, neck, inner thighs, etc.)

-Phrases that implicitly suggest sexual desire or sexual intent ("turned on," "aroused," etc.)

-Phrases that comment suggestively on frequently sexualized body parts of a particular person (e.g., usually accompanied by names or pronouns like “your,” “her,” “his,” etc.).

  • This always includes “butt,” “ass,” and relevant synonyms and emojis, but can also include legs or lips if the accompanying language is suggestive enough, e.g., your legs are so sexy
  • Note: phrases that do the same for genitals and breasts are classified as Level 3

-Phrases that reference lingerie or revealing underwear (jockstraps, thongs, panties, etc.) if mentioned in connection with a wearer or human subject (usually accompanied by names or pronouns)

  • If lingerie or underwear is simply described with no clear reference to a person (e.g., a description of a product or a general comment), this would instead be classified as Level 1, e.g., that store sells really comfortable bras

-Phrases that use “ho,” “hoe,” or “thot” to refer to a person or group of people

Level 1: Potentially Sexual

This class captures references to sexuality and relationships that are not suggestive or explicit, like pet names, relationship status, and non-sexual compliments on appearance. Depending on context, Level 1 can also include non-sexual uses of potentially sexual words (e.g., profanity, insults), as well as sexual subject matter discussed in an informational, medical, or educational setting.

📘

USAGE:

Most of the content flagged by the Level 1 class would likely be considered benign by adults, but may not be appropriate for children.

Specifically, the Level 1 class will flag:

-Phrases that mention affectionate activities besides sex, masturbation, and foreplay, including:

  • Kissing, making out
  • Cuddling
  • Hooking up if there is not enough context to tell that the phrase refers to sex
  • Note: If “hook up” or similar phrases clearly refer to sex based on context, this would instead be classified as Level 3. For example: we hooked up last night but the condom broke

-Phrases that include innocent or non-sexual flirting. This includes:

  • Compliments on a person’s physical appearance or non-sexualized attributes, e.g., your dimples are really cute
  • Lovey or sweet messages between romantic partners
  • Phrases that include pet names in a flirtatious way ("cutie," "babe," "sweetie")
  • Note: “Sexy” will be classified as at least Level 1, even when used to describe objects or non-sexual actions, e.g., that was a sexy shot

-Phrases that discuss level of sexual experience or relationship status

-Phrases that mention sexuality or sexual orientation in a casual, non-suggestive way

-Non-sexual uses of words that would otherwise be sexual, including:

  • “Dick,” “cock,” “pussy,” and similar terms when used as an insult or to convey weakness or disrespect
  • “Fucker” or “motherfucker” when used as an insult or to describe a person in a non-sexual context
  • “Fuck me/him/her/them” if used without additional/sexual context
  • “Balls” when used as slang for courage
  • “Slut” or “whore” when used to convey attention-seeking behavior or fondness for non-human and non-sexual objects e.g., i hate all these self absorbed social media whores, i’m such a slut for chipotles queso
  • “Porn” when used in non-sexual contexts, like describing other types of photos/videos (e.g., Earth porn, food porn)

-Potentially sexual subject matter if discussed in an informational, educational, or medical context, including:

  • Medical procedures and conditions, STDs, certain medications (e.g., Viagra, Plan B), certain types of contraception (e.g., condoms)
  • Discussions of human anatomy, sex, or reproductive health in medical contexts
  • Discussion of strip clubs, pornography, and sex workers in an informational or news context, e.g., Porn stars made over $1.2B in pay in 2019
  • Discussion of sperm banks, sperm donations, and animal sperm in breeding contexts
  • Discussion of sex crimes or illegal sex acts in a news context if discussed purely matter-of-fact and without actual description of sexual activity, e.g., sexual assault has become a growing problem on college campuses

-Sexual terms used as a figure of speech and not directed towards a human subject, e.g., I got absolutely railed by that exam

-Instances where the speaker is:

  • Rejecting a sexual advance
  • Rejecting, expressing disapproval of, or expressing non-consent to any mentioned sexual subject matter, e.g., i’m not gonna send nudes lol

Level 0: Non-Sexual

Messages that do not contain sexual or suggestive content or language are classified as Level 0 (benign). This includes some cases where potentially sexual words are used as a figure of speech, in other contexts, or when there is not enough context to infer sexual meaning or intent. For clarity, content that is not flagged as Level 1, Level 2, or Level 3 includes:

  • Phrases that mention hugging or non-romantic or parental affection
  • “Breast” when referring to food (e.g., turkey breast, chicken breast)
  • Internal reproductive organs (e.g., uterus, ovaries)
  • Pregnancy
  • The word “sex” when used in the context of gender, e.g., my sex is female
  • “Beautiful,” “hot,” “cute” etc. used to describe or compliment objects, things, or oneself e.g., I look HOT today
  • Insults or figures of speech using (non-sexual body) parts
  • “Fuck” used as profanity or as an insult without sexual meaning (this is flagged as profanity and/or bullying instead)
  • “Naked” or “nude” when not describing a person or used metaphorically, e.g., I feel naked without my headphones
  • “Suck” used descriptively or as an insult, e.g., You suck, their pizza fucking sucks, suck it!

Hate Model Head

This is our main model for flagging hate speech directed at minorities and protected groups, including slang, slurs, and hateful use of emojis (e.g., different skin tones) depending on context.

🚧

NOTE:

Hateful or discriminatory language targeted at a specific individual or a specific group of people (e.g., a group of friends or classmates) is typically flagged as bullying instead of or in addition to hate.

Level 3: Hate Speech

This class flags language that is overtly discriminatory, threatening, directed, and/or violent. Slurs, demeaning terms, comparisons of minority groups to animals, and hateful ideology are all categorized as Level 3. The scope of content covered by Level 3 aligns roughly with the legal definition of hate speech against protected groups (including racial minorities, religious minorities, LGBT+ people, and women), although slurs or demeaning terms directed at white people are also covered. Immigration status and nationality are also considered sensitive. Specifically, the Level 3 class will flag:

  • Phrases that call for or justify violence against a particular group
  • Phrases that describe a particular group as physically or morally inferior
  • Phrases that describe a particular group as criminals
  • Phrases that refer to a particular group as animals, sub-human, or non-human. Also applies to words like “trash,” “scum,” and “savages”
  • Slurs of any kind, including variants, intentional misspellings, and character substitutions
  • Support or promotion of hateful ideology or hate groups, including:
  1. Content that speaks positively about symbols, slogans, gestures, groups, policies, or individuals that are explicitly hateful (Nazism, the Ku Klux Klan and other white power groups, hateful policies or atrocities toward protected groups)
  2. Content that denies or makes light of well-documented atrocities or violent events against minority groups or individuals (Holocaust or other genocide denial, hate crimes, lynchings, etc.)

Level 2: Hateful

This class flags language that is a bit softer than hate speech but might exclude, silence, intimidate, or dissuade individuals from participating or feeling safe in a discussion or online community based on race, gender, sexuality, or religion. Specifically, the Level 2 class flags:

  • Phrases that perpetuate negative stereotypes, negative descriptions of different cultural practices, and references to specific physical traits used to stereotype groups
  • Statements that target protected groups on the basis of religious or moral beliefs, e.g., homosexuality is a sin
  • Certain slurs (e.g., "gay," "retarded") when used as casual insults rather than to target someone in the relevant protected group
  • Denying an individual’s gender identity (references to trans, non-binary, genderfluid etc. identities must be explicit)
  • Advocating for the removal of a protected group’s civil rights or legal protections
  • Promoting, condoning, or justifying exclusion, discrimination, or inequality on the basis of protected characteristics, e.g., it’s wrong for women to be the breadwinner, that should be a man’s job
  • Advocating violence against or destruction of religious texts, religious symbols, or places of worship
  • Language that is not explicitly hate speech, but degrades or implies lesser status of a protected group
  • Language that denounces, rejects, or criticizes the slurs flagged by Level 3 (if the message includes the word itself)

Level 1: Controversial

Generally, this class flags language around protected groups that might imply prejudice, bring up negative connotations, or provoke controversy or conflict. Content flagged as Level 1 would also generally be considered inappropriate for children. Specifically, Level 1 flags:

  • Statements challenging the validity of minority status, e.g., asians might as well be white

  • Neutral or negative references to hateful ideology, hate symbols, slogans, gestures, groups, or policies that are understood as hateful, e.g., Police are the 21st century KKK. Note: Statements that express support are instead classified as Level 3 (see above)

  • Controversial topics mentioned in connection with a particular group (even in an informational manner), including crime, politics, voting, morality, intellect, educational achievement, work ethic, health, incarceration, privilege, reproductive health, colonization, cultural or religious attire, censorship & free speech, and civil rights. For example: gay men are 30 times more likely to get AIDS, Blacks make up half the prison population, white privilege doesn’t exist

  • Statements that defend against negative stereotypes or denounce slurs

  • Statements that imply preference for some groups over others, e.g., I’m not attracted to Asians tbh

  • Statements that reference discrimination or inequities faced by a particular group, e.g., he’s only being attacked because he’s black

  • Controversial or vulgar terms related to a protected group when used in a non-hateful context and/or not directed toward an individual. For example, using “gay” or “retarded” with a negative connotation in references to things, actions, places. E.g., math is gay, that movie was retarded

  • Colloquial, humorous, or race-neutral uses of the spelling “nigga.” Context is very important – if any version of the word is used as a slur, or in a racialized context, it is automatically classified as Level 3.

  • Statements that suggest or imply affiliation with or support for a hate group, even if not serious, e.g., I’ll be partying with Hitler in hell

Level 0: Not hateful or controversial

Messages that do not contain hateful, offensive, or potentially controversial content related to protected groups or identity are classified as Level 0. To be clear, the following would not be flagged by Level 1, Level 2, or Level 3:

  • Identity, race, religious affiliation, etc. used in neutral or descriptive contexts, e.g., I’m black, my friend Jason is Jewish
  • Informational statements about protected groups in non-controversial contexts, e.g., Muslims celebrate Ramadan this month
  • Profanity that has a gendered undertone (“slut,” “bitch,” “whore”) when no other hateful content is present. Note: these words may instead be flagged as bullying or sexual, depending on context

Violence Model Head

This model head identifies text content that mentions violence, including threats toward an individual or group, encouraging or calling for violence, descriptions of past violence, self-harm, and other topics.

Here’s a breakdown of subject matter captured by each class/severity level:

Level 3: Serious Threats

This class flags violent threats that are explicitly malicious, intentional, realistic, and also involve severe physical or sexual violence. Level 3 covers threats where the speaker is directly threatening the violence or issuing a command or direct call for the violence. Actions that are severe enough to be flagged as Level 3 are:

  • Stabbing
  • Beating
  • Torture
  • Shooting
  • Kidnapping
  • Rape
  • Hanging
  • Killing
  • Breaking bones

This is not meant to be an exhaustive list. Generally, threats where the action is severe enough to cause serious bodily injury or death will be flagged.

This applies to threats of future violence [“if i see you around here again, i’m gonna do X”], descriptions of past violence by the speaker [“i did Y last time he crossed me”], and calls for violence in the imperative [“Z that guy!”].

🚧

NOTE:

To be flagged as Level 3, the threat needs to be perceivable as serious, deliberate and realistic. If context clearly indicates that violent language is used as a joke, exaggeration, or to be dramatic, this will be flagged as Level 1 instead (examples below).

Other cases that are captured by Level 3 include:

  • Threatening a person’s pets or animals. This generally applies only to threats against animals that would have a direct negative impact on a person or group
  • Threatening buildings or property when the act would have a direct negative impact on a person and carry serious risk of bodily harm
  • Threats in the form of a question
  • Violent threats in which humans are referred to as animals (e.g., “cockroaches,” “pigs,” “monkeys”)

Level 2: Incitement

This class captures cases that are not direct threats from the speaker, but nonetheless encourage, provoke, or support serious violence. The main point of differentiation from content captured by Level 3 is the use of hypothetical rather than direct language. For example, “I will do X” is classified as Level 3, but “I might do X,” “I want to do X,” or “Someone should do X” would be classified under Level 2. In other words, language that indicates a desire for violence or possible violence falls under Level 2 (incitement) rather than Level 3 (direct threat).

To be clear, this captures cases where:

  • The speaker is indicating possible violence they might or want to commit in the future
  • The speaker is calling for the violent action to be committed by others

Examples of hypothetical or equivocating language that might support a classification of Level 2 instead of Level 3 are actions phrased as “should,” “would,” “could,” “maybe,” “might,” “want to,” “would like to,” and “needs to.”

Other cases that are captured by Level 2 include:

  • Content that incites or encourages destruction of or significant physical or financial damage to public property (streets, public buildings, monuments, or public spaces) or valuable personal assets (homes, businesses, cars), e.g., we need to just burn down the supreme court, someone should throw bricks through all their windows and loot their shit
  • Note: the difference between threats involving property in Level 3 and threats involving property in Level 2 is the explicit mention or very strongly implied threat of death or injury to people. For example, this can depend on the act (e.g., “bomb” or “blow up” versus “burn”).
  • Threats of self-harm, or encouraging another person to harm themselves
  • Calls for serious violence that can only be carried out by large-scale actors (i.e., a government or military) and are unlikely to be effected by the speaker

Level 1: Potentially violent, violent language, and neutral descriptions

This class generally captures violent language that is not clearly used as a direct threat or incitement, or isn’t severe enough to result in death or severe injury. This usually depends on context and other language used in the text. In many cases, content flagged by this class may still not be appropriate for children. Cases that are flagged as Level 1 include:

  • Denouncing, rejecting, discouraging, or criticizing serious violence, e.g., it’s absolutely unacceptable that the police continue to shoot unarmed people
  • Suggesting or calling for a stop to serious violence or accountability for serious violence, e.g., They burned down the entire building! Come on someone needs to be held responsible for this
  • Objective, neutral descriptions of violent acts or events without support or encouragement, such as in the context of reporting, journalism, or recounting a story. For example, The armed suspect shot the victim 10 times, since then, she’s been receiving death threats
  • Phrases involving legal use of guns or gun rights, such as target shooting, hunting as a sport, reasonable self-defense, use of guns in military or policing contexts, and mentioning guns or use of guns without additional evidence of illegal use or violent intent
  • Phrases that refer to abortion or terminating a pregnancy as “killing babies” or other violent language
  • Calls for capital punishment or the death penalty in a clearly legal context (e.g., explicit references to a judge, court, or judicial system). If capital punishment or methods of execution are called for without clear reference to a judicial system or legal bodies, this would be Level 2 instead
  • Threats or descriptions of violence that are not severe enough to cause serious injury in the context of the text (punching, kicking, slapping, fighting, etc.). Note: Other language in the text that suggests a more serious threat can bump these up to Level 3. For example, ima kick your ass would be Level 1, but ima kick your teeth in is easily Level 3 (specific, serious).
  • Threats that involve damaging or destroying personal belongings like phones, bicycles, computers, clothing, etc. For example: I swear I’ll break your laptop in half if you keep playing games all day. Remember: Threats against more substantial property assets like homes, cars, and businesses are classified as at least Level 2 because they likely involve risk of physical harm
  • Non-actionable statements that wish death on someone by non-violent means, e.g., he needs to get covid and die already
  • Violent language directed at an unspecified subject, usually indicated by “it,” “that,” “those,” “them,” etc. For example: burn it down baby!, i’m happy to let them just bleed out, kill it!. In this case, there’s not enough context for the message to be clearly understood as a threat or call for violence. But if other language clearly directs the violence towards a person or group (e.g., use of profanity to describe human subjects), the text will be classified as Level 2 or 3.
  • Jokes or seemingly ironic statements that mention violence of any kind

Level 0: Benign

This class captures all other content with no violent or threatening language, or language that could have a violent meaning but is used in a non-threatening way. For clarity, this includes:

  • Common sayings, slang, and figures of speech. As some examples: I have so much time to kill these days, They left me hanging for weeks, He finally dropped the bomb and broke up with his girlfriend, can you shoot them an email please etc.
  • Referring to violence in non-realistic, non-threatening, or sanctioned contexts like video games and sports, e.g., bro I was on fire that game, 25 kills 2 deaths, did you watch the UFC fight last night?
  • Referring to accidental or unintentional injuries like car crashes, bike accidents, etc.
  • Mentions of deaths due to non-violent causes

Bullying Model Head

This model head generally captures insults and toxicity aimed at particular individuals or groups of people. For this purpose, language intended to frighten, harm, intimidate, mock, or cause distress is considered bullying. This can include harassing someone or threatening violence, cursing at someone, or disparaging their physical appearance, intellect, capability, personality, etc.

📘

NOTE:

Depending on the phrases used, overlap between bullying and other model heads is common, typically Hate and Violence.

Here’s a breakdown of subject matter captured by each class/severity level:

Level 3: Severe bullying and toxicity

This class captures:

  • Slurs and profane name calling directed at a specific person or at a group of people, including variants, abbreviations, and misspellings. This includes race-based slurs, gendered insults, slurs related to intellectual disability, etc.
  • Encouraging suicide or severe self-harm (e.g., cutting). This includes the abbreviation “kys” = "kill yourself" if used in this context
  • Severe threats of physical or sexual violence towards a person or group. To be classified as Level 3, the threat must be targeted, serious, explicitly malicious, and realistic (see description of Violence Level 3 above). Phrases need to be explicitly targeted to specific individuals or groups to be classified as Level 3. If violent language is obviously used as a joke, exaggeration, or figure of speech, or if there’s not enough context to tell the language is likely a threat, this would be Level 0.

Level 2: Bullying

This class captures messages that aren’t as egregious as Level 3, but could still be considered harmful, insulting, or threatening. Specifically, the Level 2 class flags:

  • Aggressive or derogatory cursing towards an individual or group. The main difference between Level 3 and Level 2 content is whether or not the individual is being called a profane word. Typically, Level 2 captures instances where profanity is used as a verb or action statement or to amplify the effect of another, non-profane statement. For example, "over here, fuckface" would be Level 3, whereas "fuck you" or "you guys are fkin idiots" would be Level 2
  • Non-profane insults directed at a specific individual or group of people. This captures both negative descriptors (e.g., "ugly," "stupid," "freak," "pathetic," "loser,") and insults that refer to people as animals or objects (e.g., "you're garbage," "that guy is such a snake," "you smell like a pig").
  • Encouraging non-severe self-harm, e.g., you should slap yourself
  • Non-severe threats of violence towards specific individuals or groups, including encouraging the same. Generally this includes punching, slapping, kicking, etc. For these purposes, threats where the action would not cause bleeding, severe bodily injury, loss of consciousness, or death are considered non-severe.
  • Attempts to silence someone or exclude them from a group or conversation. For example, telling someone to shut up, telling someone to get out, fuck off, etc. are always classified as Level 2. Note: If these statements also include profane insults, slurs, etc., they would instead be classified as Level 3.
  • Disparaging statements where the bullying language cannot be pinned down to a particular word or phrase, but instead refers holistically to someone's physical appearance, intellect, competence, personality, etc. with the intent to bully or hurt. For example: yuck no one would ever want to hang out with you

Level 1: Possible bullying, non-bullying profanity

This class generally captures content that uses potentially bullying language in non-bullying contexts. Generally, either the language used or the way that language is used is not serious enough to flag the text as Level 2 or Level 3, but still might be considered inappropriate or controversial. Specifically, this class captures:

  • Referring to people with profanity in a non-bullying context or profanity that's not directed at specific people, e.g., bitches be doing anything for the gram, damn he a badass mofo
  • Using the exact spelling "nigga" in a neutral, colloquial, or positive context [Note: ALWAYS Level 3 if used negatively or spelled with an "r"]
  • Certain non-profane words (e.g., "ugly," "stupid," "freak," "pathetic," "loser,") when used alone, or when there is not enough context to tell that the word is intended to describe a person
  • Mentions that the speaker is seriously considering suicide or self-harm. Note that similar language used as an exaggeration or to be dramatic is NOT flagged.
  • Threats that are not directed at a particular person or group of people, even if severe (this would be flagged as Violence instead)
  • Neutral, observational statements that describe past bullying, e.g., he just called her a slut and left
  • Neutral phrases or descriptions that may be interpreted as harmful by the reader, but are not clearly intended to be hurtful, e.g., I wouldn't say you're hot, but you're definitely not ugly either
  • Self-deprecating statements, even if using profanity, e.g., i'm a basic bitch what can I say
  • Statements that negate bullying or defend someone from bullying, e.g., you are NOT ugly
  • Playful teasing that may include bullying language (e.g., "booo you suck," "omg stop I hate you," "lame") but is harmless in context.

Level 0: Benign

This class flags all other content with no bullying-related language, or bullying-relevant language used in a non-bullying context. To clarify, content not captured by the Level 3, Level 2, and Level 1 classes includes:

  • Profanity that is not related to, used to describe, or directed at people (this includes sexual language).
  • Associations between people and animals in neutral or positive context such as a pet name or nickname, or as a positive attribute. E.g., he was a cheetah on the track today
  • Bullying-relevant language like violence or self-harm used as a figure of speech, exaggeration, or common saying
  • Mocking, degrading, or criticizing hate groups
  • Neutral, non-bullying statements and observations (e.g., "you're being rude right now")

Child Exploitation Model Head

This class flags content that mentions or explicitly alludes to child sexual exploitation, including child pornography (both soliciting and distributing material), child sexual abuse, and solicitations or descriptions of sexual activity with or by children under the age of 18.

Level 3: Child Exploitation

Specifically, this includes:

  • Messages that solicit, advertise, or attempt to distribute child pornography. This includes the acronym “CP” if used in the context of sending or receiving links, or alongside other sexual language. References to trading links (“STR”, “S2R” = send to receive), buying links, and distribution platforms (e.g., Dropbox, MEGA) are also flagged if context indicates they are likely to refer to child pornography (e.g., also mentions kids, teens, CP, or age ranges below 18)
  • Messages that solicit or describe sexual activity with underaged individuals or child sexual abuse of any kind. This includes sexualized touching, and non-touch abuse like stripping, voyeurism, photographing pornography, etc.
  • Users who are identifiably underaged (e.g., I’m 14 looking for X) soliciting underaged pornography or sex. This also applies to identifiably underaged users offering to distribute child pornography (e.g., nude photos of themselves or friends)
  • Sexual roleplay or fantasies involving underaged children

Level 0: No Child Exploitation

All other content that does not contain the above is flagged as Level 0. This includes:

  • Cases where there is not enough context to conclude that the message is referring to child sexual exploitation or child pornography. For example, Looking to buy Mega links with no other language associating “Mega links” with child pornography is not enough to trigger a Level 3 classification. Similarly, descriptors like “young” are not sufficient without additional context to infer that “young” means under 18.
  • Sexual content or mentions of pornography involving adults. In the absence of specific language about age, children, teens, etc., we presume that sexual content and messages refer to legal, of-age conduct.

Promotions Model Head

This class captures content that advertises or promotes a product, service, account, or event when the text also includes a suggested action (e.g., reposting, donating) or an attempt to redirect to another application or platform (e.g., links, contact information).

Level 3: Promotion

Types of messages flagged by the promotions class include:

  • Messages that advertise a giveaway or free service (or spam and phishing attempts disguised as a giveaway or free service), e.g., $5 special on custom videos on my site: https://example.com, Feeling depressed? call our free hotline on (000)-000-0000
  • Requests for follows or subscriptions, e.g., Want to learn more about Bitcoin? Follow me on Twitter for more awesome content!, I'm on twitter now! Find me at https://example.xyz
  • Soliciting donations, e.g., Send $5 to my Venmo for a chance to win custom-made stuff! https://t.co/CO6RT
  • Promoting events and activities with a call to action (e.g., to repost, attend, meet, or join). For example, Please Repost! Join us at Decker Plaza for the memorial at 10:00pm

Level 0: No Promotion

Messages that do not promote or advertise a product, service, account, or event, or that do not attempt to redirect to another platform or prompt the reader to act, are classified as Level 0 (no promotion). For clarity, this includes:

  • Mentioning follower or subscriber count on a social platform or website without a request to follow/subscribe, e.g., I have 10K followers on Twitter
  • Expressing gratitude for past follows, subscriptions, or donations, e.g., Thanks so much for subscribing to my channel!
  • Mentioning products, services, accounts, or events in casual conversation, e.g., I don't know dude, if you don’t wanna suck at cooking just watch Gordon Ramsay’s youtube videos
  • Jokes, sarcasm, or ironic statements, e.g., For every 10000$ you donate to us, 1$ will go to the homeless people! if you ask where the rest of the money is going? then you are just a right-wing bigot!

Gibberish Model Head

This model head flags keyboard spam and messages that have no comprehensible meaning.

Level 3: Gibberish

This class flags messages where the entire message is incomprehensible. Examples would be strings along the lines of grljwbrg, dfoibhnlfadknbsdfg.

Level 0: No Gibberish

Messages that convey an intelligible meaning through words and phrases are not classified as gibberish, including acronyms, emojis, character replacements, and non-Latin text. For clarity, each of the following counts as a word or phrase:

  • Any string that contains an English word with four letters or more, even if part of a larger string, e.g., qpwelcome-1po (avoids nearly all user names)
  • Recognizable acronyms and slang (e.g., lmfao, stg)
  • Intentional pattern repetition of short base words (e.g., "hahahahaha" and foreign language variants, "pewpewpewpew", "lololololol"). If the base unit being repeated is not a word (e.g., "asd"), then it will be flagged as gibberish
  • Elongated words or acronyms that use extra letters (e.g., "ooooooooof", "lmaooooooo", "wtffff", "no waaaaay")
  • Emojis and stylized text
  • Non-Latin text (e.g., Arabic script, Japanese characters)
  • Brand names
  • Text that contains numbers only
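
The rules above can be loosely approximated in code. The sketch below is a hypothetical heuristic for illustration only: Hive's actual gibberish detection is a trained classifier, and the word list and base-unit list here are stand-in assumptions, not part of the product.

```python
import re

# Stand-in word/slang lists for illustration; a real system would use a
# full dictionary. These names and contents are assumptions, not Hive's.
COMMON_WORDS = {"welcome", "hello", "thanks"}
SLANG = {"lmfao", "lmao", "stg", "lol", "oof", "wtf"}
REPEAT_BASES = {"ha", "lol", "pew"}  # word-like units that repeat ("hahaha")

def looks_intelligible(message: str) -> bool:
    """Rough approximation of the Level 0 (no gibberish) criteria."""
    text = message.lower()
    # Text that contains numbers only is not gibberish
    if text.strip().isdigit():
        return True
    # An embedded English word of four or more letters, e.g., "qpwelcome-1po"
    if any(word in text for word in COMMON_WORDS):
        return True
    # Recognizable acronyms/slang, including elongated forms ("wtffff")
    collapsed = re.sub(r"(.)\1+", r"\1", text)  # collapse repeated letters
    if text in SLANG or collapsed in SLANG:
        return True
    # Intentional repetition of a short word-like base ("hahahahaha");
    # repetition of a non-word base ("asdasdasd") stays gibberish
    for base in REPEAT_BASES:
        if (text.startswith(base) and set(text) <= set(base)
                and len(text) >= 2 * len(base)):
            return True
    return False
```

A string like grljwbrg fails every branch and would be treated as gibberish, while qpwelcome-1po passes on the embedded-word rule.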

Phone Number Model Head

This model head detects phone numbers in message strings, including international formats.

Level 3: Phone Number

This class flags phone numbers, using context and syntax to distinguish them from other strings of numbers.

Level 0: No Phone Number

All other messages that do not contain a number indicated to be, or recognizable as, a phone number are classified as Level 0 (no phone number).
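
To see why context matters and syntax alone is not enough, consider a naive, purely syntactic matcher. The sketch below is a hypothetical illustration, not Hive's detector: a regex like this fires on any phone-shaped digit run, including order numbers and IDs, which is exactly the ambiguity the model's contextual signals are meant to resolve.

```python
import re

# Purely syntactic phone-number pattern -- illustration only.
PHONE_LIKE = re.compile(
    r"(?:\+?\d{1,3}[\s.-]?)?"   # optional country code, e.g. +44
    r"(?:\(\d{2,4}\)[\s.-]?)?"  # optional area code, e.g. (020)
    r"\d{3}[\s.-]?\d{3,4}"      # local digits, e.g. 555-0123
)

def contains_phone_like_string(message: str) -> bool:
    """True if the message contains any phone-shaped digit sequence,
    with no regard for the surrounding context."""
    return PHONE_LIKE.search(message) is not None
```

Note that a message like "order total came to 5551234 points" matches this pattern even though no phone number is present; a context-aware model can use the surrounding words to classify it as Level 0.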

Spam Model Head

Level 3: Spam

This class flags text that contains a link intended to redirect other users to a different site or platform. This includes links and link shorteners, email addresses, and phone numbers. Links to popular and reputable websites, such as news organizations and publications, YouTube, large social platforms, Wikipedia, etc. are not flagged as spam.

Level 0: No Spam

This class is flagged when the text does not contain such a link or includes a link to a popular and reputable website. Other types of messages typically associated with spam content may be flagged by the Promotions model (e.g., advertising a service, platform, or account) or the Gibberish model (unintelligible messages).
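
Putting the binary and multi-level heads together, a platform might act on the scores along the following lines. This is a hypothetical sketch: the response shape, field names, and thresholds below are assumptions for illustration, not Hive's actual API schema (consult the API reference for that).

```python
# Binary heads return only 0 or 3; multi-level heads return severities 0-3.
# Head names here mirror the sections above but the exact keys are assumed.
BINARY_HEADS = {"spam", "promotions", "phone_number",
                "child_exploitation", "gibberish"}

def should_flag(scores: dict[str, int], min_severity: int = 2) -> bool:
    """Flag a message if any binary head fires (Level 3), or any
    multi-level head meets the platform's chosen severity threshold."""
    for head, level in scores.items():
        if head in BINARY_HEADS:
            if level == 3:           # for binary heads, 3 means "present"
                return True
        elif level >= min_severity:  # severity threshold for 0-3 heads
            return True
    return False

# Example: spam fires as a binary head, so the message is flagged even
# though the bullying severity (Level 1) is below the threshold.
example_scores = {"spam": 3, "bullying": 1, "violence": 0}
```

Because each head is classified independently, a platform can tune `min_severity` per head in practice; a single global threshold is used here only to keep the sketch short.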