Long Captions

Overview

Hive’s Long Caption API generates natural-language descriptions of images, with an adjustable length limit of up to 512 tokens. For every input image, the model outputs a text string describing what the image shows. The model also accepts an optional question about the image (e.g., "What color is the cat?" or "How many cats are in the image?"), and the answer to that question is returned in the model response.

Request Format

The request format for this API includes an optional question field for asking a question about the input image. The question must be shorter than 1024 characters; otherwise the task will fail. To submit a request without a question, either leave the field empty or omit it from the request entirely.

Unlike the Short Caption API, which has a fixed maximum number of tokens, the Long Caption API lets you set your own token limit through the optional max_tokens field. This field accepts any integer from 1 to 512. If you do not supply a value for max_tokens, the default maximum of 512 is used.
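The limits described above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of Hive's API: the helper name build_options and its constants are hypothetical, and it only assumes the documented rules (a question shorter than 1024 characters, max_tokens an integer from 1 to 512, both fields optional).

```python
import json

# Limits taken from the request format described above.
MAX_QUESTION_CHARS = 1024   # the question must be shorter than this
MIN_TOKENS, MAX_TOKENS = 1, 512


def build_options(question=None, max_tokens=None):
    """Hypothetical helper: return the JSON string for the 'options' form field.

    Both arguments are optional. An empty or missing question is left out,
    and a missing max_tokens lets the API fall back to its default of 512.
    """
    options = {}

    if question:
        if len(question) >= MAX_QUESTION_CHARS:
            raise ValueError("question must be shorter than 1024 characters")
        options["question"] = question

    if max_tokens is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
            raise TypeError("max_tokens must be an integer")
        if not MIN_TOKENS <= max_tokens <= MAX_TOKENS:
            raise ValueError("max_tokens must be between 1 and 512")
        options["max_tokens"] = max_tokens

    return json.dumps(options)
```

For example, build_options("who is in the picture?", 12) produces a string that can be passed as the options form field, while build_options() yields an empty JSON object for a plain caption request.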

In full, a request to our Image Captioning API has the following format:

curl --location 'https://api.thehive.ai/api/v1/task/async' \
--header 'Authorization: Token <YOUR_TOKEN>' \
--form 'media=@"<YOUR_PATH>"' \
--form 'callback_url=<SAMPLE_URL>' \
--form 'options={"question":"who is in the picture?", "max_tokens":12}'

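import json

import requests

headers = {
    'Authorization': 'Token <YOUR_TOKEN>',
}

data = {
    'callback_url': '<SAMPLE_URL>',
    # options is sent as a JSON-encoded string; question and max_tokens are both optional
    'options': json.dumps({'question': 'how old is the kid?'}),
}

# Open the image in binary mode and close it once the upload has finished
with open('happy-kid.jpg', 'rb') as image:
    files = {'media': image}
    response = requests.post('https://api.thehive.ai/api/v1/task/async', headers=headers, files=files, data=data)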

Response

The output of our Long Caption API contains the image's caption as a text string. If a question about the image was included in the API call, the response contains an answer instead of a caption. To see an annotated example of an API response object for this model, you can visit our API reference page.

Supported file formats

Image Formats:
jpg
png
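
A quick client-side check against the list above can catch unsupported files before upload. This is a minimal sketch under stated assumptions: the helper name is_supported_image is hypothetical, and it judges the format only by file extension, using exactly the two extensions listed here. Whether the API accepts other spellings such as .jpeg is not stated in this document, so this check treats them as unsupported.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".png"}


def is_supported_image(path):
    """Hypothetical helper: True if the file extension is a listed image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```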