Image Captioning

Overview

Hive’s Image Captioning APIs generate natural-language descriptions of images. For each input image, the model outputs a text string that describes what the image shows. These models also accept a question about the image as an optional input. We offer two models: a Short Caption model, which limits each description to 32 tokens, and a Long Caption model, which allows up to 512 tokens. For more details on each model, please see its individual page.

These APIs have many possible applications, one of which is generating alt text. Alt text is a short text description, set through the alt attribute of an HTML image element, that is displayed in place of an image when that image fails to load. It is also crucial for web accessibility: screen readers and related tools use it to describe visual content to blind and low-vision users. These APIs, particularly our Short Caption API, offer a quick way to fill in this attribute for the many images across the web that are missing it.
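
As a rough illustration of the alt text use case, the sketch below turns a caption into an HTML image tag. It assumes the caption has already been obtained and is available as a plain string; the function name img_tag_with_alt, the file name, and the sample caption are hypothetical placeholders, not part of Hive's API or actual model output.

```python
from html import escape


def img_tag_with_alt(src: str, caption: str) -> str:
    """Build an <img> tag that uses a generated caption as its alt text.

    Hypothetical helper for illustration only; it does not call any API.
    """
    # Collapse runs of whitespace (including newlines) into single spaces
    # so the caption fits cleanly inside one attribute value.
    alt = " ".join(caption.split())
    # escape(..., quote=True) replaces &, <, >, " and ' with HTML entities,
    # so quotes inside the caption cannot terminate the attribute early.
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt, quote=True)}">'


# Placeholder caption standing in for a model-generated description.
caption = 'A dog catching a "frisbee" in a park'
print(img_tag_with_alt("dog.jpg", caption))
```

Escaping the caption matters because generated text may contain quotation marks or angle brackets, which would otherwise produce malformed markup.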