Hive’s Long Caption API generates natural-language descriptions of images, with an adjustable length limit of up to 512 tokens. For each input image, the model outputs a text string that describes what is shown in that image. The model also accepts a question about the image as an optional input (e.g., "What color is the cat?" or "How many cats are in the image?"), and the answer to that question is included in the model response.

The output of our Long Caption API contains the image's caption as a text string in the response field. If a question about the image was included in the API call, the response field instead contains the answer to that question. In other words, responses to API calls that included a question do not contain a caption, only the question and answer.

An example JSON response with no question and no optional max_tokens limit is shown below:

{
  "status": [
    {
      "status": {
        "code": "0",
        "message": "SUCCESS"
      },
      "_version": 2,
      "response": {
        "response": "The image features a young boy standing in a field of tall grass, with his arms outstretched. He appears to be enjoying the open space and the surrounding environment. The boy is wearing an orange and white striped shirt, which adds a pop of color to the scene. The field is vast, with the grass reaching up to the boy's waist, creating a sense of freedom and adventure. The boy's outstretched arms suggest that he is embracing the moment and the beauty of the natural surroundings."
      }
    }
  ]
}

Here is an example JSON response for the same image with a max_tokens limit of 10:

{
  "status": [
    {
      "status": {
        "code": "0",
        "message": "SUCCESS"
      },
      "_version": 2,
      "response": {
        "response": "The image features a young boy standing in a field"
      }
    }
  ]
}
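
Reading the caption out of a response like the ones above can be sketched in Python as follows. This is a minimal sketch, not an official client: it assumes the layout shown in these examples (a top-level "status" list whose entries each carry a nested "status" object and a "response" object) and that a code of "0" signals success. The function name extract_captions is hypothetical and not part of the API.

```python
import json


def extract_captions(payload: dict) -> list[str]:
    """Return the caption text from each successful entry in a response."""
    captions = []
    for entry in payload.get("status", []):
        # Assumption: code "0" means success, as in the examples above.
        if entry.get("status", {}).get("code") != "0":
            continue
        captions.append(entry["response"]["response"])
    return captions


# Sample body copied from the max_tokens example above.
raw = """
{
  "status": [
    {
      "status": {"code": "0", "message": "SUCCESS"},
      "_version": 2,
      "response": {
        "response": "The image features a young boy standing in a field"
      }
    }
  ]
}
"""

print(extract_captions(json.loads(raw)))
```

Entries whose code is not "0" are skipped rather than raising an error. That is a design choice for this sketch; a real client may prefer to surface the failure message instead.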

An example JSON response to the question "What sign is in this picture other than the stop sign?" for an image of a stop sign:

{
  "status": [
    {
      "status": {
        "code": "0",
        "message": "SUCCESS"
      },
      "_version": 2,
      "response": {
        "question": "what sign is in this picture other than the stop sign?",
        "response": "In addition to the stop sign, there is a blue sign with a white arrow pointing to the right in the picture."
      }
    }
  ]
}
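
Because the inner response object carries a question field only when a question was sent, client code can tell the two response shapes apart by checking for that key. The following Python sketch assumes the layout of the example above. The CaptionResult type and the parse_entry function are hypothetical names used for illustration, not part of the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionResult:
    text: str                # caption, or the answer when a question was asked
    question: Optional[str]  # None for plain caption responses

    @property
    def is_answer(self) -> bool:
        return self.question is not None


def parse_entry(entry: dict) -> CaptionResult:
    """Convert one entry of the top-level "status" list into a CaptionResult."""
    body = entry["response"]
    # Assumption: "question" is present only when a question was submitted.
    return CaptionResult(text=body["response"], question=body.get("question"))


# Entry copied from the question example above.
entry = {
    "status": {"code": "0", "message": "SUCCESS"},
    "_version": 2,
    "response": {
        "question": "what sign is in this picture other than the stop sign?",
        "response": "In addition to the stop sign, there is a blue sign with "
                    "a white arrow pointing to the right in the picture.",
    },
}

result = parse_entry(entry)
print("answer" if result.is_answer else "caption", "->", result.text)
```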