Image Captioning


Hive’s Image Captioning API generates natural-language descriptions for images. For every image input, the model outputs a short text string that describes what is shown in that image. These captions can be up to 32 tokens (around 64 characters) in length. The model also accepts an optional question about the image (e.g., "what color is the cat?" or "how many cats are in the image?"); when a question is provided, the answer to it is included in the model response.

This API has many possible applications, one of which is generating alt text. Alt text is an HTML attribute that contains a short text description to be displayed in place of an image when that image fails to load. It is also crucial for web accessibility — screen readers and other related tools use it to describe visual content to blind and low-vision users. This API provides a quick solution for the many images across the web that are missing this key attribute.
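As a rough illustration of the alt-text use case, the sketch below turns a caption string into an HTML img element. The helper name img_tag and the sample caption are hypothetical, and the code assumes the caption has already been extracted from the API response; it is not part of Hive's API.

```python
from html import escape


def img_tag(src: str, caption: str) -> str:
    """Return an <img> element that uses the caption as its alt text.

    Hypothetical helper: `caption` is assumed to be the text string
    already pulled out of the captioning response.
    """
    # Collapse newlines and repeated spaces so the attribute stays on one line.
    alt = " ".join(caption.split())
    # Escape quotes and angle brackets so the caption cannot break the markup.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


if __name__ == "__main__":
    print(img_tag("cat.jpg", 'a cat sitting on a "sofa"'))
```

Escaping the caption matters because generated text may contain quotes or other characters that would otherwise end the attribute early.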


Request Format

The request format for this API includes an optional question field for asking a question about the input image. The question must be less than 32 tokens (around 64 characters) in length, or the task will fail. To submit a request with no question, either leave the field blank or omit it from the request entirely.
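The rules above can be sketched as a small client-side helper that builds the options form field. This is a minimal sketch under stated assumptions: the name build_options is hypothetical, and because the model's tokenizer is not described here, the check uses the approximate 64-character figure as a stand-in for the 32-token limit, so it may reject or accept questions that the API would treat differently.

```python
import json

# Assumption: ~64 characters approximates the 32-token limit described above.
MAX_QUESTION_CHARS = 64


def build_options(question=None):
    """Return the form fields that carry the optional question.

    Hypothetical helper. An empty dict means "send no question",
    which corresponds to leaving the field out of the request.
    """
    if question is None or not question.strip():
        return {}
    question = question.strip()
    # Heuristic pre-check only; the real limit is measured in tokens.
    if len(question) >= MAX_QUESTION_CHARS:
        raise ValueError(
            f"question is {len(question)} characters; "
            f"keep it under roughly {MAX_QUESTION_CHARS}"
        )
    # The options field is sent as a JSON-encoded string.
    return {"options": json.dumps({"question": question})}


if __name__ == "__main__":
    print(build_options("how many cats are in the image?"))
    print(build_options(""))
```

The returned dict can be merged into the form data of the request, so a request without a question carries no options field at all.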

In full, a request to our Image Captioning API has the following format:

curl --location '' \
--header 'Authorization: Token <YOUR_TOKEN>' \
--form 'media=@"<YOUR_PATH>"' \
--form 'callback_url=<SAMPLE_URL>' \
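--form 'options={"question":"who is in the picture?"}'

import requests

# API token for the project
headers = {
    'Authorization': 'Token <YOUR_TOKEN>',
}

# Image to caption, uploaded as multipart form data
files = {
    'media': open('happy-kid.jpg', 'rb'),
}

# Optional fields; 'options' is a JSON-encoded string
data = {
    'callback_url': '<SAMPLE_URL>',
    'options': '{"question":"how old is the kid?"}',
}

# The endpoint URL is left blank here, as in the curl example above
response = requests.post('', headers=headers, files=files, data=data)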


The output of our Image Captioning API contains the image's caption as a text string. If a question about the image was included in the API call, the response contains an answer instead of a caption. To see an annotated example of an API response object for this model, you can visit our API reference page.

Supported file formats

Image Formats: