Video Captioning
Overview
Hive’s Video Caption API generates natural-language descriptions for videos shorter than 30 seconds. For every video input, the model outputs a text string that describes what is shown. The model samples 16 frames from each video and uses them to generate a single caption that describes not only the subject and scenery, but also movements and actions. The resulting captions are long and descriptive, with a maximum length of 1024 tokens.
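Because the API only accepts videos shorter than 30 seconds, it can be convenient to check a local file's duration before submitting it. The following is a minimal sketch, not part of the Hive API: it assumes ffprobe (from FFmpeg) is installed, and the function names video_duration_seconds and is_under_limit are hypothetical helpers introduced here for illustration.

```shell
# Assumption: ffprobe (part of FFmpeg) is available on PATH.
# Prints the container duration of the given file in seconds (e.g. 12.480000).
video_duration_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: succeeds (exit status 0) only when the argument is a
# plain non-negative number strictly below the 30-second limit stated above.
is_under_limit() {
  awk -v d="$1" 'BEGIN { exit !(d ~ /^[0-9]+(\.[0-9]+)?$/ && d + 0 < 30) }'
}
```

A possible usage: is_under_limit "$(video_duration_seconds clip.mp4)" && echo "ok to submit". A duration that ffprobe cannot determine is treated as over the limit.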
Request Format
This API allows you to submit videos either as binary files or as publicly available URLs. Here is an example of each submission method:
# Submit a task using a publicly accessible media URL
curl --request POST \
--url https://api.thehive.ai/api/v2/task/sync \
--header 'accept: application/json' \
--header 'authorization: token <API_KEY>' \
--form 'url=http://public_url.mp4'
# Submit a task using a local media file
curl --request POST \
--url https://api.thehive.ai/api/v2/task/sync \
--header 'authorization: token <API_KEY>' \
--form 'media=@"<absolute/path/to/file>"'
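The two curl examples above differ only in the form field they send: url for a public URL, media for a local file. As a rough sketch, the choice can be wrapped in a small shell helper. The function names caption_form_arg and submit_caption_task and the HIVE_API_KEY environment variable are assumptions made for this example, not names defined by the Hive API.

```shell
# Hypothetical helper: build the --form argument from a single input.
# Inputs starting with http:// or https:// are sent as a URL;
# anything else is treated as a path to a local media file.
caption_form_arg() {
  case "$1" in
    http://*|https://*) printf 'url=%s' "$1" ;;
    *) printf 'media=@"%s"' "$1" ;;
  esac
}

# Hypothetical wrapper around the sync endpoint shown above.
# Assumption: the API key is stored in the HIVE_API_KEY environment variable.
submit_caption_task() {
  curl --silent --show-error --fail --request POST \
    --url https://api.thehive.ai/api/v2/task/sync \
    --header 'accept: application/json' \
    --header "authorization: token ${HIVE_API_KEY:?set HIVE_API_KEY first}" \
    --form "$(caption_form_arg "$1")"
}
```

With these defined, submit_caption_task /absolute/path/to/file.mp4 or submit_caption_task followed by a public video URL would send the same request as the corresponding example above and print the JSON response to standard output.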
Response
The output of our Video Caption API contains the video's caption as a text string. For an annotated example of the API response object for this model, see our API Reference.