Hive's text recognition model detects and transcribes each word in an image. It can also return text blocks: words that sit close together are grouped semantically and ordered in their natural reading order.

Hive’s OCR model response format is an instantiation of the detection response format. It outputs a confidence score for each detected object (word), an additional confidence score for the transcription of the characters in each detected word, and a separate block_text field containing the grouped text described above.

Word detections are returned in the bounding_poly list.

{
  "output": [
    {
      "block_text": "Hello world",
      "bounding_poly": [
        {
          "classes": [
            {
              "class": "Hello",
              "score": 0.21373027151957627
            }
          ],
          "dimensions": {
            "top": 13.345781326293945,
            "bottom": 50.08814239501953,
            "left": 27.43756866455078,
            "right": 146.09616088867188
          },
          "vertices": [
            {
              "x": 27.43756866455078,
              "y": 13.345781326293945
            },
            {
              "x": 146.09616088867188,
              "y": 13.345781326293945
            },
            {
              "x": 146.09616088867188,
              "y": 50.08814239501953
            },
            {
              "x": 27.43756866455078,
              "y": 50.08814239501953
            }
          ],
          "meta": {
            "score": 0.9999997615814209,
            "label": "text"
          }
        },
        {
          "classes": [
            {
              "class": "world",
              "score": 0.21373027151957627
            }
          ],
          "dimensions": {
            "top": 22.649433135986328,
            "bottom": 59.69649887084961,
            "left": 155.3162384033203,
            "right": 233.7376251220703
          },
          "vertices": [
            {
              "x": 155.3162384033203,
              "y": 22.649433135986328
            },
            {
              "x": 233.7376251220703,
              "y": 22.649433135986328
            },
            {
              "x": 233.7376251220703,
              "y": 59.69649887084961
            },
            {
              "x": 155.3162384033203,
              "y": 59.69649887084961
            }
          ],
          "meta": {
            "score": 0.9999957084655762,
            "label": "text"
          }
        }
      ],
      "time": 0
    }
  ]
}
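As a minimal sketch of reading this response, the Python below walks the output list and collects the block text plus one record per detected word. It assumes the response has already been decoded into a dict with exactly the shape of the sample above. The function name extract_words and the record keys (text, left, top, width, height) are hypothetical names chosen for illustration and are not part of Hive's API. The sample data is a trimmed copy of the response above, keeping only the fields the sketch reads.

```python
# Trimmed copy of the sample response above (only the fields used here).
SAMPLE_RESPONSE = {
    "output": [
        {
            "block_text": "Hello world",
            "bounding_poly": [
                {
                    "classes": [{"class": "Hello", "score": 0.21373027151957627}],
                    "dimensions": {
                        "top": 13.345781326293945,
                        "bottom": 50.08814239501953,
                        "left": 27.43756866455078,
                        "right": 146.09616088867188,
                    },
                },
                {
                    "classes": [{"class": "world", "score": 0.21373027151957627}],
                    "dimensions": {
                        "top": 22.649433135986328,
                        "bottom": 59.69649887084961,
                        "left": 155.3162384033203,
                        "right": 233.7376251220703,
                    },
                },
            ],
        }
    ]
}


def extract_words(response):
    """Return a list of (block_text, words) tuples, one per output entry.

    Hypothetical helper: assumes the response shape shown in the sample.
    """
    results = []
    for entry in response.get("output", []):
        words = []
        for detection in entry.get("bounding_poly", []):
            dims = detection["dimensions"]
            words.append(
                {
                    # The transcription lives in the first element of "classes".
                    "text": detection["classes"][0]["class"],
                    "left": dims["left"],
                    "top": dims["top"],
                    # Box size derived from the edge coordinates.
                    "width": dims["right"] - dims["left"],
                    "height": dims["bottom"] - dims["top"],
                }
            )
        results.append((entry.get("block_text", ""), words))
    return results


for block_text, words in extract_words(SAMPLE_RESPONSE):
    print("block:", block_text)
    for word in words:
        print(word["text"], word["left"], word["top"], word["width"], word["height"])
```

The sketch reads the axis-aligned dimensions object instead of the vertices list because, in the sample, the four vertices describe the same rectangle.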

The fields added to the detection response format that are specific to the OCR model are:

| Name | Description |
| --- | --- |
| classes.0.class | Contains the transcribed characters for the detected word. |
| classes.0.score | Contains the confidence score for the transcribed word. |
| meta.score | Contains the confidence score for the detected word, irrespective of the transcription of that word. |
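To make the dotted field names concrete, the sketch below reads all three fields from a single entry of the bounding_poly list. It assumes that classes.0.class denotes the class key of the first element of the classes list, which matches the sample response. The function name word_scores and the output keys are hypothetical and chosen only for this illustration.

```python
def word_scores(detection):
    """Collect the transcription and both confidence scores for one word.

    Hypothetical helper: `detection` is one element of "bounding_poly".
    """
    transcription = detection["classes"][0]  # classes.0
    return {
        "word": transcription["class"],                 # classes.0.class
        "transcription_score": transcription["score"],  # classes.0.score
        "detection_score": detection["meta"]["score"],  # meta.score
    }


# First detection from the sample response, trimmed to the relevant fields.
detection = {
    "classes": [{"class": "Hello", "score": 0.21373027151957627}],
    "meta": {"score": 0.9999997615814209, "label": "text"},
}

scores = word_scores(detection)
print(scores["word"], scores["transcription_score"], scores["detection_score"])
```

Keeping the two scores under separate names reflects that they answer different questions: meta.score covers whether a word was detected, and classes.0.score covers how confident the transcription of that word is.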

Hive returns only its high-confidence predictions to end users. The scores are provided as additional metadata, but users do not need to apply thresholds or discard predictions to obtain accurate model results.