Hive’s Image Captioning APIs generate natural-language descriptions of images. For each input image, the model outputs a text string describing what the image shows. The models also accept a question about the image as an optional input. We offer two models: a Short Caption model, which limits each description to 32 tokens, and a Long Caption model, which allows up to 512 tokens. For example JSON responses, see each model's individual page, linked above.
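
As a rough illustration of how a client might choose between the two models, the sketch below picks the smaller model whenever the desired caption length fits within its token limit. Only the 32 and 512 token limits come from the description above. The `CaptionRequest` class, its field names, and the `"short"`/`"long"` labels are hypothetical and are not Hive's actual request schema; consult the individual model pages for the real request format.

```python
from dataclasses import dataclass
from typing import Optional

# Token limits as stated in the overview.
SHORT_CAPTION_MAX_TOKENS = 32
LONG_CAPTION_MAX_TOKENS = 512


@dataclass
class CaptionRequest:
    """Hypothetical container for one captioning call (not Hive's real schema)."""
    model: str                      # "short" or "long" (illustrative labels only)
    image_url: str
    question: Optional[str] = None  # optional question about the image


def choose_model(desired_tokens: int) -> str:
    """Return the smallest model whose token limit covers the desired length."""
    if desired_tokens <= 0:
        raise ValueError("desired_tokens must be positive")
    if desired_tokens <= SHORT_CAPTION_MAX_TOKENS:
        return "short"
    if desired_tokens <= LONG_CAPTION_MAX_TOKENS:
        return "long"
    raise ValueError(
        f"no caption model supports more than {LONG_CAPTION_MAX_TOKENS} tokens"
    )


def build_request(image_url: str, desired_tokens: int,
                  question: Optional[str] = None) -> CaptionRequest:
    """Assemble a request description; no network call is made here."""
    return CaptionRequest(
        model=choose_model(desired_tokens),
        image_url=image_url,
        question=question,
    )
```

For instance, `build_request("https://example.com/cat.jpg", 20)` would select the short model, while asking for 200 tokens would select the long one. Anything above 512 tokens raises an error, since neither model supports it.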