Short Captions


Hive’s Short Caption API generates natural-language image descriptions of up to 32 tokens (around 64 characters) in length. For every image input, the model outputs a short text string that describes what is shown in that image. The model also accepts a question about the image as an optional input (e.g., "what color is the cat?" or "how many cats are in the image?"), and the answer to that question is returned in the model response.

Request Format

The request format for this API includes the question field, an optional field for asking a question about the input image. The question must be less than 32 tokens (around 64 characters) in length; otherwise the task will fail. To submit a request with no question, either leave the field empty or omit it from the request entirely.
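The rules above for the question field can be sketched as a small helper that builds the form fields for a request. This is only an illustration: build_form_data and MAX_QUESTION_CHARS are hypothetical names, and the 64-character check is a rough stand-in for the 32-token limit, since the tokenizer itself is not described here.

```python
import json

# Assumption: 64 characters only approximates the 32-token limit mentioned
# above; the service's actual tokenizer may count differently.
MAX_QUESTION_CHARS = 64


def build_form_data(question=None, callback_url=None):
    """Build form fields for a request, leaving out the question if it is empty."""
    data = {}
    if callback_url:
        data["callback_url"] = callback_url
    if question and question.strip():
        question = question.strip()
        # Reject questions that are probably over the limit before sending.
        if len(question) >= MAX_QUESTION_CHARS:
            raise ValueError("question is likely too long (limit is about 64 characters)")
        # The question travels as a JSON string inside the options field.
        data["options"] = json.dumps({"question": question})
    return data
```

The returned dictionary can be passed as the data argument of requests.post alongside the headers and the media file. A blank or missing question produces no options field at all, which matches the "leave the field out" option described above.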

In full, a request to our Short Caption API has the following format:

curl --location '' \
--header 'Authorization: Token <YOUR_TOKEN>' \
--form 'media=@"<YOUR_PATH>"' \
--form 'callback_url=<SAMPLE_URL>' \
--form 'options={"question":"who is in the picture?"}'

import requests

headers = {
    'Authorization': 'Token <YOUR_TOKEN>',
}

files = {
    'media': open('happy-kid.jpg', 'rb'),
}

data = {
    'callback_url': '<SAMPLE_URL>',
    'options': '{"question":"how old is the kid?"}',
}

# The endpoint URL goes in the first argument.
response = requests.post('', headers=headers, files=files, data=data)


The output of our Short Caption API contains the image's caption as a text string. If a question about the image was included in the API call, the response contains an answer instead of a caption. To see an annotated example of an API response object for this model, you can visit our API reference page.

Supported file formats

Image Formats: