Hive Moderation 11B Vision Language Model (VLM)
About
🔑 model key: hive/moderation-11b-vision-language-model
Our Moderation 11B Vision Language Model is built on Llama 3.2 11B Vision Instruct and enhanced with Hive's proprietary data, expanding the capabilities of our established moderation suite.
Designed to handle complex contexts and edge cases, this model combines state-of-the-art multimodal understanding with specialized moderation expertise, making it ideal for identifying and addressing nuanced content across text and images.
How to Get Started
Authentication is required to use these models. You’ll need an API Key, which can be created in the left sidebar.
Follow these steps to generate your key:
- Click ‘API Keys’ in the sidebar.
- Click ‘+’ to create a new key scoped to your organization. The same key can be used with any "Playground available" model.
⚠️ Important: Keep your API Key secure. Do not expose it in client-side environments like browsers or mobile apps.
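One common way to keep the key out of your source code is to read it from an environment variable. A minimal sketch (the variable name HIVE_API_KEY is just an illustrative choice, not an official convention):

import os
from openai import OpenAI

# Read the key from the environment instead of hard-coding it
# (HIVE_API_KEY is an example name, not an official convention)
client = OpenAI(
    base_url="https://api.thehive.ai/api/v3/",
    api_key=os.environ["HIVE_API_KEY"]
)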
Querying Moderation 11B Vision Language Model
Hive AI offers an OpenAI-compatible REST API for querying LLMs and multimodal LLMs. You can call it in two ways:
- Using the OpenAI SDK
- Directly invoking the REST API
When you call this API, the model generates tokens until either the maximum number of output tokens is reached or the model's end-of-sequence (EOS) token is generated.
Note: Some fields, such as top_k, are supported via the REST API but are not supported by the OpenAI SDK.
Using the OpenAI SDK
from openai import OpenAI

# Configure the client with Hive's base URL and your API key
client = OpenAI(
    base_url="https://api.thehive.ai/api/v3/",  # Hive AI's endpoint
    api_key="<YOUR API KEY>"  # Replace with your API key
)

def get_completion(prompt, model="hive/moderation-11b-vision-language-model"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg"
                        }
                    }
                ]
            }
        ],
        temperature=0.7,
        max_tokens=1000
    )
    # Extract the response content
    return response.choices[0].message.content

print(get_completion("What's in the image?"))
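Since generation stops at either the max_tokens limit or the EOS token (see above), it can be useful to inspect finish_reason and the token counts on the returned choice to see why a response ended. A small sketch reusing the client defined above (the exact finish_reason strings are whatever the API returns; the sample response later on this page shows "finish"):

response = client.chat.completions.create(
    model="hive/moderation-11b-vision-language-model",
    messages=[{"role": "user", "content": [{"type": "text", "text": "What's in the image?"}]}],
    max_tokens=50  # deliberately small, to make hitting the token cap likely
)
choice = response.choices[0]
# Why the model stopped: EOS token vs. hitting the max_tokens cap
print("finish_reason:", choice.finish_reason)
# Token accounting for the request and response
print("completion_tokens:", response.usage.completion_tokens)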
const OpenAI = require('openai');

// Configure the client with Hive's base URL and your API key
const openai = new OpenAI({
  apiKey: '<YOUR API KEY>',
  baseURL: 'https://api.thehive.ai/api/v3/'
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'hive/moderation-11b-vision-language-model',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: "What's in this image?" },
          {
            type: 'image_url',
            image_url: {
              url: 'https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg'
            }
          }
        ]
      }
    ]
  });
  console.log(completion.choices[0]?.message?.content);
}

main();
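The examples above reference a hosted image. If the endpoint follows the OpenAI convention for image_url, you may also be able to send a local file inline as a base64 data URL; this is an assumption based on OpenAI-compatible behavior, not something confirmed by the documentation above. A sketch in Python:

import base64
from openai import OpenAI

client = OpenAI(base_url="https://api.thehive.ai/api/v3/", api_key="<YOUR API KEY>")

# Encode a local file as a data URL (assumes the endpoint accepts
# OpenAI-style base64 data URLs -- unverified for Hive)
with open("local-image.jpeg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hive/moderation-11b-vision-language-model",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        ]
    }]
)
print(response.choices[0].message.content)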
Directly invoking the REST API
All LLMs and Multimodal LLMs support this basic chat completions REST API.
curl --location --request POST 'https://api.thehive.ai/api/v3/chat/completions' \
--header 'authorization: Bearer <API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "model": "hive/moderation-11b-vision-language-model",
    "max_tokens": 1000,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you describe what is in this photo?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg"
                    }
                }
            ]
        }
    ]
}'
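As noted earlier, parameters such as top_k are only available through the REST API. A minimal Python sketch using the requests library to pass sampling parameters directly (the top_k value of 40 is purely illustrative; the default is 0):

import requests

# Call the REST endpoint directly so REST-only fields like top_k can be set
resp = requests.post(
    "https://api.thehive.ai/api/v3/chat/completions",
    headers={
        "authorization": "Bearer <API_KEY>",
        "Content-Type": "application/json"
    },
    json={
        "model": "hive/moderation-11b-vision-language-model",
        "max_tokens": 1000,
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": 40,  # supported via REST, not via the OpenAI SDK
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Can you describe what is in this photo?"},
                    {"type": "image_url", "image_url": {"url": "https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg"}}
                ]
            }
        ]
    }
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])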
After making a request, you'll receive a JSON response containing the generated message. Here's a sample output:
{
    "id": "12345-be90-11ef-8469-67890",
    "object": "chat.completion",
    "model": "hive/moderation-11b-vision-language-model",
    "created": 1734671183762,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The image depicts a soccer ball situated on a lush green field, with a stadium in the background. The soccer ball is positioned in the foreground, featuring a white base with vibrant blue, red, and green accents. It rests on a well-manicured grass pitch, which is neatly trimmed and maintained.\n\nIn the background, a large stadium is visible, characterized by its tiered seating and metal framework. The sky above is a brilliant blue, dotted with a few wispy clouds. The overall atmosphere suggests a daytime setting, likely during a match or practice session. The presence of the soccer ball and the stadium implies that the image was taken at a sports venue, possibly during a professional or amateur game."
            },
            "finish_reason": "finish"
        }
    ],
    "usage": {
        "prompt_tokens": 6420,
        "completion_tokens": 142,
        "total_tokens": 6562
    }
}
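Given a decoded response body shaped like the sample above (for example, data = resp.json() from the REST sketch earlier), the fields you will usually want are:

# data = resp.json() from the REST call shown earlier
choice = data["choices"][0]
print(choice["message"]["content"])   # the generated text
print(choice["finish_reason"])        # why generation stopped
print(data["usage"]["total_tokens"])  # prompt + completion token count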
Schema
Below are definitions of the relevant input and output fields. Some fields have default values that are assigned if you do not set a value yourself.
Input
| Field | Type | Definition |
| --- | --- | --- |
| messages | array of objects | Required. A structured array containing the conversation history. Each object includes a role and content. |
| model | string | Required. The name of the model to call. |
| role | string | The role of the participant in the conversation. Must be system, user, or assistant. |
| content | string OR array of objects | Your content string. If an array, each object must have a type and corresponding data, as shown in the examples above. |
| text | string | Referenced inside content arrays, containing the text message to be sent. |
| image_url | object | Contains the image URL when type is image_url. Includes the field url. |
| max_tokens | int | Limits the number of tokens in the output. Default: 2048. Range: [1 to 2048]. |
| temperature | float | Controls randomness in the output. Lower values make output more deterministic. Default: 0.7. Range: [0 to 1]. |
| top_p | float | Nucleus sampling parameter to limit the probability space of token selection. Default: 0.95. Range: [0 to 1]. |
| top_k | int | Limits token sampling to the top K most probable tokens. Default: 0. Range: [0 to 2147483647]. |
Output
| Field | Type | Definition |
| --- | --- | --- |
| id | string | The ID of the submitted task. |
| model | string | The name of the model used. |
| created | int | The timestamp (in epoch milliseconds) when the task was created. |
| choices | array of objects | Contains the model's responses. Each object includes the index, message, and finish_reason. |
| usage | object | Contains token usage information for the request and response. |
Example messages array with multiple roles
When submitting messages to the model, you can assign different roles to your content:
- system
- user
- assistant
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are a helpful assistant that provides detailed descriptions of images and answers user queries concisely."
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Can you describe this image in detail?"
},
{
"type": "image_url",
"image_url": {
"url": "https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg"
}
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "The image depicts:"
}
]
}
]
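Because the messages array carries the full conversation history, a multi-turn exchange works by appending each assistant reply before sending the next user turn. A minimal sketch using the Python client from earlier (the prompts are illustrative):

messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant that describes images concisely."}]},
    {"role": "user", "content": [
        {"type": "text", "text": "Can you describe this image in detail?"},
        {"type": "image_url", "image_url": {"url": "https://d24edro6ichpbm.thehive.ai/example-images/vlm-example-image.jpeg"}}
    ]}
]
response = client.chat.completions.create(
    model="hive/moderation-11b-vision-language-model", messages=messages, max_tokens=1000
)
reply = response.choices[0].message.content

# Append the assistant's reply, then ask a follow-up in the same conversation
messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
messages.append({"role": "user", "content": [{"type": "text", "text": "Is anything in it a moderation concern?"}]})
followup = client.chat.completions.create(
    model="hive/moderation-11b-vision-language-model", messages=messages, max_tokens=1000
)
print(followup.choices[0].message.content)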
Common Errors
Each model has a default starting rate limit of 5 requests per second. If you submit requests faster than the rate limit, you may see the error below.
To request a higher rate limit, please contact [email protected].
{
    "status_code": 429,
    "message": "Too Many Requests"
}
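A common way to stay under the rate limit is to retry 429 responses with exponential backoff. A sketch of that pattern against the REST endpoint (illustrative, not an official Hive recipe):

import time
import requests

def post_with_backoff(payload, api_key, max_retries=5):
    """Retry on HTTP 429 with exponential backoff (illustrative, not official)."""
    url = "https://api.thehive.ai/api/v3/chat/completions"
    headers = {"authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Back off 1s, 2s, 4s, ... before retrying
        time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")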
A positive Organization Credit balance is required to continue using Hive Models. Once you run out of credits, requests will fail with the following error:
{
    "status_code": 405,
    "message": "Your Organization is currently paused. Please check your account balance, our terms and conditions, or contact [email protected] for more information."
}