OCR (Text Recognition) Moderation

OCR Content Moderation

Hive offers an end-to-end API that moderates text within images: it first extracts all text from the image, then passes that text through our text moderation models. The response returns the location of each piece of text in the image, confidence scores for the extracted text across our moderated classes, and any profanity or personally identifiable information (PII) found in the extracted text. Example use cases include detecting phone numbers in profile pictures and hateful or sexual text in memes. Our OCR moderation endpoints currently support the following languages:

  • English
  • Spanish
  • French
  • German
  • Italian
  • Mandarin
  • Russian
  • Portuguese
  • Arabic
  • Korean
  • Japanese
  • Hindi (coming soon)
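
The response described above has three parts: text localization, per-class confidence scores, and profanity/PII matches. As a rough illustration of how a client might hold those parts and act on them, here is a minimal sketch. Every name in it (`TextRegion`, `OcrModerationResult`, `should_review`, the field names, the box layout, and the `threshold` parameter) is hypothetical and does not mirror the actual response schema; consult the API reference for the real field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextRegion:
    """One piece of extracted text and where it sits in the image (hypothetical layout)."""
    text: str
    box: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels


@dataclass
class OcrModerationResult:
    """Hypothetical client-side container for the three parts of the response."""
    regions: List[TextRegion] = field(default_factory=list)
    class_scores: Dict[str, float] = field(default_factory=dict)
    flagged_terms: List[str] = field(default_factory=list)  # profanity / PII matches


def should_review(result: OcrModerationResult, threshold: float) -> bool:
    """Return True if any profanity/PII match exists or any class score reaches the threshold."""
    if result.flagged_terms:
        return True
    return any(score >= threshold for score in result.class_scores.values())


# Example with made-up values: a profile picture containing a phone number.
example = OcrModerationResult(
    regions=[TextRegion(text="call 555-0100", box=(10, 20, 200, 48))],
    class_scores={"sexual": 0.0, "hate": 0.0},
    flagged_terms=["555-0100"],
)
needs_review = should_review(example, threshold=0.5)
```

The threshold is left as a required argument because the right cutoff depends on your own moderation policy.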

If we detect text in a language that our OCR models can read but that is not listed above (e.g. Thai), we will return the detected text, but the scores for all moderation classes will be 0.
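
Because text in an unsupported language comes back with text present but every class at 0, a client may want to route such results to a fallback (for example, human review) instead of treating them as clean. The helper below is a minimal sketch of that check. The function name and its inputs are hypothetical: it assumes you have already pulled the extracted text and a class-to-score mapping out of the response. All-zero scores can also occur for benign text in a supported language, so this is a routing heuristic, not a language detector.

```python
from typing import Mapping


def looks_unmoderated(extracted_text: str, class_scores: Mapping[str, float]) -> bool:
    """Heuristic: text was extracted, yet every moderation class score is 0.

    An empty score mapping is also treated as "no moderation signal".
    """
    has_text = bool(extracted_text.strip())
    all_zero = all(score == 0 for score in class_scores.values())
    return has_text and all_zero


# Example with made-up values: Thai text returned with zeroed scores.
route_to_fallback = looks_unmoderated("สวัสดี", {"sexual": 0, "hate": 0})
```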

Note: our OCR models are optimized for images with 150 words or fewer. If an image contains more words than that, we recommend splitting it into multiple segments and submitting them separately.
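
One way to split a text-heavy image is to cut it into horizontal bands. The sketch below only computes the crop boxes. The function name, the choice of horizontal bands, and the `overlap` parameter are assumptions made for illustration, not a documented recommendation. The overlap is there so that a line of text cut by a band boundary still appears whole in one of the two neighboring bands, provided the overlap is at least one line tall.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def segment_boxes(width: int, height: int, n_segments: int, overlap: int = 0) -> List[Box]:
    """Split an image of the given size into n horizontal bands.

    Each band except the last extends `overlap` pixels into the next band.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    if overlap < 0:
        raise ValueError("overlap must not be negative")
    boxes: List[Box] = []
    for i in range(n_segments):
        top = (i * height) // n_segments
        # Extend downward by the overlap, but never past the image edge.
        bottom = min(height, ((i + 1) * height) // n_segments + overlap)
        boxes.append((0, top, width, bottom))
    return boxes


# An 800x1200 image cut into three bands with a 40-pixel overlap.
boxes = segment_boxes(800, 1200, n_segments=3, overlap=40)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the boxes can be passed to it directly before submitting each cropped segment as its own request.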

For in-depth information on our OCR technology and moderated classes, please refer to the Text Recognition (OCR) page and Text Content Moderation Classes page. To try our demo, please refer to: https://hivemoderation.com/text-moderation.