Image Captioning
Overview
Hive’s Image Captioning API generates natural-language descriptions for images. For every input image, the model outputs a short text string describing what the image shows. These captions can be up to 32 tokens (around 64 characters) long. The model also accepts a question about the image as an optional input (e.g., "what color is the cat?" or "how many cats are in the image?"), and the answer to that question is included in the model response.
This API has many possible applications, one of which is generating alt text. Alt text is an HTML attribute that contains a short text description to be displayed in place of an image when that image fails to load. It is also crucial for web accessibility — screen readers and other related tools use it to describe visual content to blind and low-vision users. This API provides a quick solution for the many images across the web that are missing this key attribute.
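As a minimal sketch of the alt-text use case, the snippet below takes a caption string (such as one produced by this API) and builds an `<img>` tag around it. The function name `img_tag_with_alt` and the sample caption are hypothetical and only for illustration. The escaping relies on Python's standard `html.escape`.

```python
from html import escape

def img_tag_with_alt(src: str, caption: str) -> str:
    """Build an <img> tag whose alt attribute holds the caption."""
    # Escape both values so quotes or angle brackets cannot break the attributes
    alt = escape(caption.strip(), quote=True)
    return f'<img src="{escape(src, quote=True)}" alt="{alt}">'

# Hypothetical caption, not actual API output
print(img_tag_with_alt("cat.jpg", 'a cat sitting on a "welcome" mat'))
```

Escaping matters here because a caption is free text and may contain characters that are significant in HTML.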

Request Format
The request format for this API includes the optional question field, which lets you ask a question about the input image. The question must be fewer than 32 tokens (around 64 characters) long, or the task will fail. To submit a request with no question, either leave the field empty or omit it from the request entirely.
In full, a request to our Image Captioning API has the following format:
curl --location 'https://api.thehive.ai/api/v1/task/async' \
--header 'Authorization: Token <YOUR_TOKEN>' \
--form 'media=@"<YOUR_PATH>"' \
--form 'callback_url=<SAMPLE_URL>' \
--form 'options={"question":"who is in the picture?"}'
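
import requests

headers = {
    'Authorization': 'Token <YOUR_TOKEN>',
}
data = {
    'callback_url': '<SAMPLE_URL>',
    # options is a JSON string; the question field inside it is optional
    'options': '{"question":"how old is the kid?"}',
}
# Open the image in binary mode and close it once the upload finishes
with open('happy-kid.jpg', 'rb') as image:
    files = {'media': image}
    response = requests.post('https://api.thehive.ai/api/v1/task/async', headers=headers, files=files, data=data)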
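The optional-question rules in this section can be wrapped in a small helper, sketched below. The helper name `build_form_data` and the constant `MAX_QUESTION_CHARS` are hypothetical. The 64-character check is only a rough client-side stand-in for the 32-token limit, since the API counts tokens rather than characters. Omitting `options` altogether is assumed here to be equivalent to leaving the question out.

```python
import json
from typing import Optional

# Assumption: ~64 characters approximates the 32-token limit;
# the API counts tokens, so this is only a rough guard.
MAX_QUESTION_CHARS = 64

def build_form_data(callback_url: str, question: Optional[str] = None) -> dict:
    """Return form fields for a request, omitting `options` when no question is given."""
    data = {"callback_url": callback_url}
    question = (question or "").strip()
    if question:
        if len(question) >= MAX_QUESTION_CHARS:
            raise ValueError("question is likely too long (limit is under 32 tokens)")
        # json.dumps handles quoting and escaping of the question text
        data["options"] = json.dumps({"question": question})
    return data

print(build_form_data("<SAMPLE_URL>", "how old is the kid?"))
```

The returned dictionary can be passed as the `data` argument of `requests.post` in place of a hand-written JSON string.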
Response
The output of our Image Captioning API contains the image's caption as a text string. If a question about the image was included in the API call, the response contains an answer instead of a caption. To see an annotated example of an API response object for this model, you can visit our API reference page.
Supported file formats
Image Formats:
jpg
png
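A quick client-side check before uploading might look like the sketch below. The function name `has_supported_extension` is hypothetical, and treating extensions case-insensitively is an assumption. Only the two extensions listed above are accepted, because other spellings such as `.jpeg` are not mentioned in this section.

```python
from pathlib import Path

# Extensions taken from the list above; matching them case-insensitively is an assumption
SUPPORTED_EXTENSIONS = {".jpg", ".png"}

def has_supported_extension(path: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(has_supported_extension("happy-kid.jpg"))
print(has_supported_extension("clip.gif"))
```

This inspects the file name only, not the file contents, so it catches obvious mistakes but does not validate the image itself.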