Audio Moderation & Speech to Text

Overview

Hive offers a suite of audio products that moderate and extract information from your audio and video streams. Given a clip, our models can perform the following tasks:

transcription: extract all speech
speech moderation: detect undesirable speech
audio sound moderation: detect undesirable sounds
audio sound classification: identify different kinds of sounds
language classification: determine which language was spoken

Our customers use these products for applications ranging from auditing social media videos, streams, and chat rooms to brand-mention analysis on TV programs. See the sections below for more information on each product.

Speech to Text Transcription

Hive's Speech-to-Text API ingests an audio stream and returns each spoken word, along with a confidence score and timestamp for that word. It also returns a fully punctuated transcript of the entire clip. If you work with multiple languages, we also offer automatic language detection: pass in any audio clip and we will identify the spoken language and transcribe it in that language.

We currently support the following languages:

English
Spanish
Portuguese
French
Hindi
German
Italian

We are actively working on adding new languages. If you are interested in a language that is not currently supported, please contact [email protected].

Audio Moderation

Hive offers a speech moderation API built on top of our speech-to-text and text moderation models. Through a single endpoint, Hive Speech Moderation returns all the information provided by the Speech-to-Text transcription endpoint (i.e., a confidence score and timestamp for each detected word, as well as a punctuated transcript across all detected words), plus scores for our supported moderation classes.

We currently support the following languages and their corresponding moderation classes:

Language	Sexual	Violence	Hate	Bullying
English	Model	Model	Model	Model
Spanish	Model	Model	Model	Model
Portuguese	Model	Model	Model	Model
French	Model	Model	Model	Model
Hindi	Model	Model	Model	Model
German	Model	Model	Model	Model

We are actively working on adding more languages for this API suite. If you are interested in a language that is not currently supported or is not on the roadmap, please contact [email protected]. For more details about our text moderation models, please refer to the text content moderation docs page.

Sound Moderation and Classification

We are currently building a set of sound and audio-event classes to help you moderate and classify your content. Sound moderation is not automatically included as part of audio moderation; it is a separate product that gives you more insight into your content. For any questions about pricing, please contact our sales team at [email protected].

Today, we cover the following classes:

sexual noises
yelling/shouting (coming 2024 Q4)

Request Format

The request format for this API includes a field for the media being submitted, either as a local file path or as a URL. For more information about submitting a task, see our API reference guides to synchronous and asynchronous submissions.

# Submit a task with media referenced by URL.
# This is a sync example; see the API reference for async.
# Note: the demo URL below points to an image; substitute your own audio or video URL.
curl --request POST \
  --url https://api.thehive.ai/api/v2/task/sync \
  --header 'accept: application/json' \
  --header 'authorization: token <API_KEY>' \
  --form 'url=http://hive-public.s3.amazonaws.com/demo_request/gun1.jpg'

# Submit a task with a local media file.
# This is a sync example; see the API reference for async.
curl --request POST \
  --url https://api.thehive.ai/api/v2/task/sync \
  --header 'Authorization: Token <token>' \
  --form 'media=@"<absolute/path/to/file>"'
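The two requests above differ only in the form field that carries the media. As a minimal sketch, the choice between them can be wrapped in a small shell helper. The function names `hive_form_arg` and `hive_submit_sync` and the environment variable `HIVE_API_KEY` are hypothetical conveniences for this illustration, not part of Hive's API. The endpoint and headers are copied from the URL example above.

```shell
# Hypothetical helper (not part of Hive's API): pick the form field for a
# media reference. http(s) URLs go in `url`; anything else is treated as a
# local file path and uploaded through `media`.
hive_form_arg() {
  case "$1" in
    http://*|https://*) printf 'url=%s\n' "$1" ;;
    *)                  printf 'media=@"%s"\n' "$1" ;;
  esac
}

# Hypothetical wrapper around the sync endpoint shown above.
# Assumes the API key is exported as HIVE_API_KEY (an assumed variable name).
hive_submit_sync() {
  curl --request POST \
    --url https://api.thehive.ai/api/v2/task/sync \
    --header 'accept: application/json' \
    --header "authorization: token ${HIVE_API_KEY}" \
    --form "$(hive_form_arg "$1")"
}

# Example call (performs a real request, so it is left commented out):
# hive_submit_sync /absolute/path/to/clip.wav
```

Keeping the field selection in its own function makes it possible to check the request shape without touching the network.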

Supported File Types

Video Formats:
mp4
webm
avi
mkv
wmv
mov

Audio Formats:
flac
mp3
ogg
wav
m4a
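
A client can screen files against the lists above before uploading. The sketch below assumes that checking the file extension is an acceptable first filter. It does not inspect the actual container or codec, and `is_supported_media` is a hypothetical helper name, not something Hive provides.

```shell
# Hypothetical pre-flight check: succeed only if the file name ends in one of
# the extensions listed above. The comparison ignores letter case.
is_supported_media() {
  case "$1" in
    *.*) ;;            # name contains a dot, so there is an extension to test
    *)   return 1 ;;   # no extension at all
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp4|webm|avi|mkv|wmv|mov) return 0 ;;   # video formats
    flac|mp3|ogg|wav|m4a)     return 0 ;;   # audio formats
    *)                        return 1 ;;
  esac
}

# Example: skip unsupported files before calling the API.
# is_supported_media "$file" || echo "unsupported file type: $file" >&2
```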