Frequently Asked Questions (FAQ)

Answers to some common questions about getting started with Hive

Here are some answers to questions we frequently get from prospective or new customers. If you need more information, you can navigate through our documentation and API Reference for details. If any questions remain, feel free to reach out to [email protected] or [email protected] and we'd be happy to help.

General

Q: How do we get started with Hive? First steps?

A: For guidance on how to access projects, submit tasks, and view model responses and results, please visit our API reference page. You can use the sidebar to navigate through each topic. You can also check out our more complete customer walkthroughs here.
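As a quick illustration, the snippet below sketches a synchronous task submission using Python's requests library. The endpoint URL, header format, and field names here are assumptions based on a typical setup; please confirm the exact values for your project in the API reference.

```python
# Minimal sketch of a synchronous task submission.
# The endpoint path and header format are assumptions; confirm them in the API reference.
import requests

API_KEY = "your-project-api-key"  # issued per project in the Hive dashboard

response = requests.post(
    "https://api.thehive.ai/api/v2/task/sync",            # assumed sync endpoint
    headers={"Authorization": f"Token {API_KEY}"},
    data={"url": "https://example.com/image-to-moderate.jpg"},
)
response.raise_for_status()
print(response.json())  # model results are returned directly in the response
```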

Q: Which types of media can Hive classify/moderate?

A: Our models currently support moderation of text, images, text within images (OCR), audio, and video content.

Q: How are Hive’s models priced?

A: We generally charge by volume. For example, visual models are priced based on the number of images or frames of video submitted as tasks to our API. However, each model has its own corresponding rate. Please feel free to reach out to [email protected] with any questions or for more information about pricing tiers.

Q: Does Hive remove content or ban users from our platform?

A: No. The Hive API simply returns classification metadata based on our model outputs. Our results tell customers whether material across various classes (e.g., sexuality, violence, hate) is present in user-generated content based on a confidence score (for visual models) or a severity level (for text models). Our customers then use those results to inform appropriate enforcement actions - such as automatically tagging or removing NSFW content - based on their own policies and practices.

Q: How can we add a different project or different model to our suite?

A: You can contact [email protected]. We are happy to help add new models to your existing capabilities.

Hive APIs

Q: Should we submit our tasks to the API synchronously or asynchronously?

A: Synchronous submission is optimized for real-time moderation needs. If you have a continuous flow of tasks with low latency requirements and need responses quickly, you'll want to use synchronous submission - model results will be returned directly in the API response message. To get a sense of our API response times, please see the latency benchmarks in the table below.

If you are submitting large volumes of tasks concurrently, tasks that reference large files (e.g., long videos or audio files), or tasks for a project that uses Hive manual review, asynchronous submission is preferred.
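For asynchronous submission, the request typically includes a callback URL where results are delivered once processing completes. The sketch below illustrates the idea; the endpoint path and field names are assumptions, so check the API reference for your project's exact values.

```python
# Sketch of an asynchronous submission: the task is accepted immediately and
# results are POSTed to your callback URL once processing finishes.
# The endpoint path and field names are assumptions; see the API reference.
import requests

API_KEY = "your-project-api-key"

response = requests.post(
    "https://api.thehive.ai/api/v2/task/async",            # assumed async endpoint
    headers={"Authorization": f"Token {API_KEY}"},
    data={
        "url": "https://example.com/long-video.mp4",
        "callback_url": "https://your-platform.example.com/hive-results",
    },
)
response.raise_for_status()
print(response.json())  # typically contains a task ID to reconcile with the callback
```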

Q: What kinds of latency can we expect from the Hive API?

A: Generally, it depends on the model, the size and/or type of the file submitted, download time, and whether the task was submitted synchronously or asynchronously. The table below has some benchmarking information to give you a sense of our response times.

| Text | p50 | p90 |
| --- | --- | --- |
| Typical Message | ~60ms | ~100ms |
| Max Length Input | ~250ms | ~450ms |

| Visual | p50 | p90 |
| --- | --- | --- |
| Single Frame (Image) | ~250ms | ~400ms |
| 60 Second Video | ~6 seconds | ~13 seconds |
| 5 Minute Video | ~1 minute | <1.5 minutes |
| 1 Hour Video | ~3 minutes | <3.5 minutes |

| Audio | p50 | p90 |
| --- | --- | --- |
| 30s clip (sync) | ~6 seconds | ~8 seconds |
| 60s clip (sync) | ~8 seconds | ~11 seconds |
| 1 hour clip (async) | ~15 minutes | ~20 minutes |

Q: Does Hive support batch processing?

A: Currently, Hive APIs do not support batch processing, though our team may add this functionality in the future. For now, Hive APIs achieve high concurrency in task processing with generous default rate limits of 25-50 tasks per second, depending on the model. At full utilization, this yields 2-4 million completed tasks per day! While this is sufficient for most customers, we are happy to increase rate limits for customers that process or are looking to process higher volumes.

Additionally, you can upload batch submissions via the project dashboard using the Hive UI, either by directly uploading local media files (for small submissions of less than 100 tasks), or by submitting a CSV file with URLs to hosted files (for submissions of 100-50,000 tasks).
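If you are orchestrating large submissions yourself, throttling on the client side helps you stay under your rate limit. The sketch below shows one generic approach (not a Hive-specific API): it caps in-flight requests with a semaphore and paces submissions against an assumed 25 tasks-per-second budget; submit_task is a placeholder for your actual submission call.

```python
# Generic client-side throttling sketch: caps in-flight requests and paces
# submissions to stay under an assumed 25 tasks/second rate limit.
import asyncio

RATE_LIMIT_PER_SECOND = 25   # assumed default; confirm your project's limit with Hive
MAX_IN_FLIGHT = 25

async def submit_task(url: str) -> dict:
    # Placeholder: swap in your real submission call (e.g., via aiohttp).
    await asyncio.sleep(0.1)
    return {"url": url, "status": "submitted"}

async def submit_all(urls: list[str]) -> list[dict]:
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def worker(index: int, url: str) -> dict:
        # Stagger start times so submissions never exceed the per-second budget.
        await asyncio.sleep(index / RATE_LIMIT_PER_SECOND)
        async with semaphore:
            return await submit_task(url)

    return await asyncio.gather(*(worker(i, u) for i, u in enumerate(urls)))

# Example usage:
# results = asyncio.run(submit_all(["https://example.com/a.jpg", "https://example.com/b.jpg"]))
```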

Q: What happens to our data after tasks are processed by Hive APIs?

A: Our default policy is to retain customer data for 14 days. We can increase or decrease our retention time upon request according to your needs (including not retaining your data at all). Please note that increasing retention time may incur additional charges.

Q: Can we access multiple models with a single task submission/API call?

A: Currently, we are able to combine visual moderation and contextual scene classification into a single API endpoint, enabling customers to access both model outputs with a single task. In other cases, however, classification across multiple formats requires separate tasks to be submitted to the API for each model.

Q: How can we report false positives and false negatives or give other feedback?

A: You can send us feedback directly through the API. If you have the task ID, you can send us text feedback (e.g., “false positive”) and, optionally, provide us with a link to the corresponding file or the file itself for review. You can also send feedback via email to [email protected] or [email protected].

Visual Moderation

Q: Can Hive moderate live streams?

A: Yes, our classification and moderation models support processing live RTMP and HLS streams. Our customers typically sample and extract frames from these streams, then submit those frames as tasks to our visual moderation models. Alternatively, you can capture short clips (30 seconds or less) and send those as tasks to both our visual and audio moderation APIs.
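As an illustration, the sketch below samples one frame per second from a stream using ffmpeg and hands each frame to a placeholder submission function. The stream URL is hypothetical, and submit_frame stands in for your actual visual moderation call.

```python
# Sketch: sample one frame per second from a live stream with ffmpeg, then
# submit each frame to visual moderation. The stream URL is hypothetical.
import glob
import subprocess

STREAM_URL = "rtmp://streams.example.com/live/stream-key"  # hypothetical stream

# Capture 30 seconds of the stream at 1 frame per second.
subprocess.run(
    [
        "ffmpeg", "-i", STREAM_URL,
        "-vf", "fps=1",        # sample at 1 FPS
        "-t", "30",            # stop after 30 seconds
        "frame_%04d.jpg",
    ],
    check=True,
)

def submit_frame(path: str) -> None:
    # Placeholder: submit the frame to the visual moderation API (sync submission).
    pass

for frame_path in sorted(glob.glob("frame_*.jpg")):
    submit_frame(frame_path)
```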

Q: How does Hive moderate videos?

A: We partition videos into representative frames (one frame per second of video) and pass those frames as input to our visual moderation model. The API then returns a model response object containing an analysis of each frame. A video can also be submitted as a separate task to our audio moderation model via the API to classify its audio track.

Q: What is the video sampling rate? Is the sampling rate adjustable?

A: Our default, recommended sampling rate for video content is one frame per second (1 FPS). While we can adjust the sampling rate to any desired rate, we find that higher sampling rates do not necessarily improve predictions from our models. Additionally, higher sampling rates may result in higher latencies and incur additional costs.

Q: How should we interpret and use Hive’s scores for visual moderation?

A: Hive’s visual model incorporates a set of submodels, called model heads, each configured to identify a different type of sensitive subject matter (called classes). Extreme examples include nudity and graphic sexual content, gore, and hate symbols; other model heads capture more benign imagery such as guns and knives, swimwear, and smoking. A full list of model heads, with descriptions of the types of content we classify, is available here.

The visual model response includes predictions from each model head as a confidence score for each class, which correlates with our certainty that the class is reflected in the content. This gives you the flexibility to design moderation logic using the scores returned by the API based on which types of content are or are not allowed on your platform (e.g., flagging content that scores highly in undesirable classes).

Most of our customers choose to flag, restrict, or remove user-generated content that scores 0.9 or higher in NSFW classes, and will suspend or ban repeat offenders or users that submit particularly graphic content. You can also choose to simply tag content according to our classification results, restrict content to certain classes of users (e.g., age 18+), or take other actions based on your policies.
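As a simple illustration, the sketch below applies a 0.9 threshold to a handful of classes and maps the result to an enforcement action. The class names and the shape of the scores mapping are illustrative rather than the exact response structure; see the Visual Moderation API Guide for the real schema.

```python
# Sketch of simple moderation logic over visual model scores.
# Class names and the scores mapping are illustrative, not the exact API schema.
NSFW_CLASSES = ["general_nsfw", "gore", "hate_symbols"]  # illustrative class names
THRESHOLD = 0.9

def enforcement_action(scores: dict[str, float]) -> str:
    flagged = [c for c in NSFW_CLASSES if scores.get(c, 0.0) >= THRESHOLD]
    if flagged:
        return f"remove (flagged classes: {', '.join(flagged)})"
    return "allow"

print(enforcement_action({"general_nsfw": 0.97, "gore": 0.02}))  # -> remove (...)
print(enforcement_action({"general_nsfw": 0.10, "gore": 0.01}))  # -> allow
```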

For a more complete description as well as code samples, please see our Visual Moderation API Guide.

Q: What thresholds does Hive recommend for visual moderation?

A: It depends on the model head, your datasets, and your own policies and risk tolerance. For visual models, we recommend a starting threshold of 0.9 for new projects. You can then adjust your confidence-score threshold based on model performance on a natural distribution of your data. Alternatively, recall-focused customers may want to check the clean class (i.e., not_x) against a low threshold such as 0.1 and flag content whenever the clean score falls below it.

We are also available to run evaluations with your data in order to recommend thresholds, and you can reach out to your point of contact at Hive or our API team for more information on best practices.

Q: Where can we view processed video and image results?

A: You can view the results of submitted tasks directly in the UI by selecting 'View data' in the project dashboard. Then, hover over the desired task and select 'View Task Details'. This page will include details about your task (e.g., Task ID, Callback Metadata, Completed On). To view the results of the task, you can then click on the 'Results' tab to see predictions for the task as well as labeled input images.

Text-Based Moderation

Q: Can we increase the character limit for text moderation?

A: No. For now, text classification/moderation tasks are limited to 1024-character submissions, as longer submissions may introduce bias into model outputs. You can, however, split longer text content into segments (e.g., based on punctuation or spacing) and submit each segment as its own task to the Hive API.
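One way to do this is to break the text on whitespace so that each segment stays under the 1024-character limit, as in the sketch below. The splitting helper is illustrative; you may prefer to segment on sentence boundaries instead.

```python
# Sketch: split long text into segments of at most 1024 characters, breaking
# on whitespace where possible, so each segment can be submitted as its own task.
MAX_CHARS = 1024

def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    segments = []
    while len(text) > max_chars:
        # Prefer to break at the last whitespace before the limit.
        cut = text.rfind(" ", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        segments.append(text[:cut].strip())
        text = text[cut:].lstrip()
    if text:
        segments.append(text)
    return segments

segments = split_text("a very long user comment that exceeds the limit ...")
# Submit each segment as its own task, keeping the order so results can be rejoined.
```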

Q: What thresholds does Hive recommend for text/audio moderation?

A: To decide whether to moderate at level 1, 2, or 3, we strongly encourage you to consult the descriptive examples of each severity level listed in our text moderation documentation. Generally, text that scores a 1 references a sensitive topic but may still be benign. Deciding whether to moderate at level 2 or level 3 will depend on your community guidelines and sensitivity around controversial topics. As a rough example, text that includes slurs or hate speech would be classified as a 3, while text that simply references a negative stereotype would be classified as a 2. You are also free to moderate different classes at different thresholds.
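As an illustration, the sketch below maps severity levels to enforcement actions using per-class thresholds. The class names, thresholds, and actions are example policy choices, not Hive recommendations.

```python
# Sketch: map text moderation severity levels to enforcement actions.
# Per-class thresholds and actions are illustrative policy choices.
SEVERITY_THRESHOLDS = {"hate": 2, "violence": 3, "sexual": 2}  # moderate at or above

def action_for(class_name: str, severity: int) -> str:
    threshold = SEVERITY_THRESHOLDS.get(class_name)
    if threshold is not None and severity >= threshold:
        return "remove"
    if severity >= 1:
        return "review"   # sensitive but possibly benign; queue for human review
    return "allow"

print(action_for("hate", 3))      # -> remove
print(action_for("violence", 2))  # -> review
```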

If you need help determining which threshold to use, feel free to contact [email protected] or [email protected] with more information about your platform and your needs.