Common Object Detection

Visual Detection Overview

Visual detection models localize each object of interest in an image by returning a bounding box around it, together with the type of the object, also referred to as its class. A detector can detect multiple objects of different classes in a single image. For each detection, the detector outputs a confidence score that is independent of any other detection.

The output object in Hive detection APIs lists each detected object, including:

The geometric description of the detected bounding box.
The predicted class for the detection.
For some models, the confidence score for the detection.
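
As an illustration of how a client might consume these three fields, here is a minimal Python sketch. The key names ("class", "box", "score") and the left/top/right/bottom box layout are assumptions made for this example, not the documented Hive response schema. Because only some models return a confidence score, the sketch treats the score as optional and keeps unscored detections when filtering.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    label: str                               # predicted class, e.g. "dog"
    box: Tuple[float, float, float, float]   # (left, top, right, bottom); assumed layout
    score: Optional[float] = None            # absent for models that return no score


def parse_detection(raw: dict) -> Detection:
    """Convert one raw detection dict (hypothetical key names) into a Detection."""
    b = raw["box"]
    return Detection(
        label=raw["class"],
        box=(b["left"], b["top"], b["right"], b["bottom"]),
        score=raw.get("score"),  # None when the model provides no score
    )


def keep_confident(detections: List[Detection], min_score: float) -> List[Detection]:
    """Keep detections scoring at least min_score; unscored detections are kept."""
    return [d for d in detections if d.score is None or d.score >= min_score]
```

Since each score is independent of the other detections, a per-detection threshold like this can be applied without renormalizing across the image. The threshold itself is a choice for the caller.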

When a video is submitted for processing, Hive’s backend splits the video into frames, runs the model on each frame, then merges the per-frame results into a single response for the entire video. The video output for a detector is similar to a list of detection output objects, but spread across multiple timestamps.
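
To make the shape of a video response more concrete, the Python sketch below groups per-frame results by class. It assumes a hypothetical structure in which each frame entry carries a "timestamp" and a "detections" list, and each detection has a "class" key. These names are illustrative only and may differ from the real response.

```python
from collections import defaultdict
from typing import Dict, List


def timestamps_by_class(frames: List[dict]) -> Dict[str, List[float]]:
    """Map each detected class to the sorted timestamps at which it appears.

    frames is assumed to look like:
        [{"timestamp": 0.0, "detections": [{"class": "dog"}, ...]}, ...]
    """
    seen = defaultdict(set)
    for frame in frames:
        for det in frame["detections"]:
            # A set avoids duplicates when a class appears several times in one frame.
            seen[det["class"]].add(frame["timestamp"])
    return {label: sorted(times) for label, times in seen.items()}
```

A caller could use the result to find, for example, every timestamp at which a "person" was detected, without re-scanning the per-frame lists.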

Classes

wine glass
bottle
baseball glove
baseball bat
banana
backpack
apple
train
vase
umbrella
tv
truck
traffic light
toothbrush
toilet
tie
tennis racket
teddy bear
surfboard
suitcase
stop sign
spoon
skis
skateboard
sink
remote
sheep
scissors
refrigerator
potted plant
pizza
person
parking meter
oven
mouse
motorcycle
microwave
laptop
knife
kite
keyboard
hot dog
horse
handbag
hair drier
frisbee
fork
fire hydrant
donut
elephant
dog
dining table
cup
cow
couch
clock
chair
cell phone
cat
carrot
car
cake
bus
broccoli
bowl
book
boat
bird
bicycle
bench
bed
airplane
bear
giraffe
orange
sandwich
snowboard
toaster
zebra
sports ball

Supported File Types

Image Formats:
gif
jpg
png
webp

Video Formats:
mp4
webm
avi
mkv
wmv
mov
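
A client can check a file against the formats listed above before submitting it. The Python sketch below does this by file extension only and does not inspect file contents. The function name media_kind is hypothetical, and the sets contain exactly the extensions listed in this section.

```python
from pathlib import Path
from typing import Optional

# Extensions exactly as listed under Supported File Types.
IMAGE_FORMATS = {"gif", "jpg", "png", "webp"}
VIDEO_FORMATS = {"mp4", "webm", "avi", "mkv", "wmv", "mov"}


def media_kind(filename: str) -> Optional[str]:
    """Return "image" or "video" for a listed extension, else None."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in IMAGE_FORMATS:
        return "image"
    if ext in VIDEO_FORMATS:
        return "video"
    return None
```

The comparison is case-insensitive, so "photo.JPG" is treated as an image. Spellings that are not listed, such as "jpeg", are rejected here because this section does not say whether they are accepted.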