Lucene Query Search

Hive Dashboard provides multiple ways of searching and filtering for tasks. The Lucene tab lets you write custom queries in Lucene syntax that are not possible through the rest of the UI.

Currently, search is limited to tasks completed within the last 30 days. Contact support if you need this window extended.
Tasks typically become searchable about 1 minute after completion.

Lucene Syntax

Lucene queries support advanced searches, such as ranges and combinations of multiple conditions, that can’t be done through the Quick Search UI.

Hive follows the standard Lucene query syntax:
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

See below for examples and a list of fields which are indexed by Hive Data.

Examples

Searching for tasks with at least 10 objects for a bounding_box job:
object_count:[10 TO *]

Searching for tasks where a class threshold falls in a specific range in a predict_classification project:
class_thresholds.general_nsfw:[0.7 TO 0.8]

Searching for a video where a logo appears at least 3 times in a predict_detection project:

class_counts.bmw:[3 TO *] AND media_length:[1 TO *]

Searching for an image where a logo appears exactly 3 times in a predict_detection project:

class_counts.bmw:3 AND media_length:0
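If you generate queries from a script instead of typing them, a few string helpers keep the syntax consistent. The Python sketch below rebuilds the four examples above. The function names (term, range_query, all_of) are hypothetical and not part of any Hive tool. It writes open-ended bounds as `*`, which is the standard Lucene form for an unbounded side of a range.

```python
# Hypothetical helpers for composing Lucene query strings for the Lucene tab.
# Assumption: "*" marks an open-ended bound, as in standard Lucene range syntax.

def term(field: str, value) -> str:
    """Exact-match clause for numeric or single-token values, e.g. media_length:0."""
    return f"{field}:{value}"


def range_query(field: str, lo=None, hi=None) -> str:
    """Inclusive range clause; a missing bound becomes the open-ended '*'."""
    lo_s = "*" if lo is None else str(lo)
    hi_s = "*" if hi is None else str(hi)
    return f"{field}:[{lo_s} TO {hi_s}]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND so that every condition must match."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    # At least 10 objects (bounding_box job).
    print(range_query("object_count", lo=10))
    # Class threshold between 0.7 and 0.8 (predict_classification).
    print(range_query("class_thresholds.general_nsfw", 0.7, 0.8))
    # Logo at least 3 times in a video (predict_detection).
    print(all_of(range_query("class_counts.bmw", lo=3), range_query("media_length", lo=1)))
    # Logo exactly 3 times in a still image (predict_detection).
    print(all_of(term("class_counts.bmw", 3), term("media_length", 0)))
```

Paste the printed strings into the Lucene tab. The helpers only format text; they do not validate field names.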

Fields

Below is a list of all fields Hive Data indexes for completed tasks.

callback_metadata

type: string

Metadata for the task uploaded by the user.

callback_url

type: string

The callback URL supplied by the user when the task was uploaded.

text_data

type: string

Metadata for the task uploaded by the user.

task_description

type: string

Metadata for the task uploaded by the user.

Also known as label_data or text.

image_url

type: string

The URL the task was uploaded with.

original_filename

type: string

The original filename, if the file was uploaded directly through the customer UI.
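Filenames often contain spaces, parentheses, or other characters that Lucene treats as syntax. Wrapping the value in double quotes avoids most of this, because inside a quoted phrase only the backslash and the double quote need escaping. Here is a minimal Python sketch; the `phrase` helper is hypothetical, and whether a quoted value matches exactly depends on how Hive analyzes the field, which this page does not specify.

```python
def phrase(field: str, value: str) -> str:
    """Build field:"value" for a Lucene query.

    Inside a quoted phrase, only backslash and double quote are special.
    Backslashes are escaped first so that the escapes added for quotes
    are not doubled.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    print(phrase("original_filename", "beach photo (1).jpg"))
    print(phrase("original_filename", 'clip "final".mp4'))
```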

media_length

type: number

The total duration of the media in seconds. If there are multiple media, it is the duration of the first one. For a still image, media_length is zero.

status

type: string

Populated only for the category, transcription, and predict_audio status formats. Stores the status as a string.

Note: For category tasks with cat_allow_multiple enabled, multiple categories are stored comma-separated.

upload_history_id

type: string

The ID of the CSV upload the task came from.

object_count

type: number

Depends on the status format:

category: the number of categories
bounding_box, rotated_bounding_box, and similar spatial formats: the number of boxes

For unlisted formats, this field is not used.

Note: object_count is zero for inconclusive tasks.

created_on

type: timestamp

Timestamp when the task was uploaded. Query it with either epoch milliseconds or a formatted date string.

finished_on

type: timestamp

Timestamp when the task finished. Query it with either epoch milliseconds or a formatted date string.
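Since created_on and finished_on accept epoch milliseconds, a date window can be converted into a range query. The Python sketch below uses integer timedelta division so that no float rounding enters the conversion. The helper names are hypothetical. The sketch requires timezone-aware datetimes because the page does not say which timezone Hive applies to formatted date strings. Keep the window within the last 30 days, because older tasks are not searchable.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # timedelta // timedelta yields an exact int, with no float round-off.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def finished_between(start: datetime, end: datetime) -> str:
    """Inclusive finished_on range query expressed in epoch milliseconds."""
    return f"finished_on:[{epoch_millis(start)} TO {epoch_millis(end)}]"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(finished_between(start, start + timedelta(days=1)))
    # finished_on:[1704067200000 TO 1704153600000]
```

For a rolling window, pass `datetime.now(timezone.utc) - timedelta(days=7)` as the start.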

is_inconclusive

type: boolean

Whether the task finished as inconclusive: true or false.
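object_count is also zero for inconclusive tasks, so `object_count:0` by itself returns both genuinely empty results and inconclusive ones. Adding an is_inconclusive clause separates them. The following is a small Python sketch with a hypothetical helper name:

```python
def empty_results_query(exclude_inconclusive: bool = True) -> str:
    """Query for tasks whose result contains no objects (e.g. no boxes).

    object_count is 0 for inconclusive tasks too, so by default they are
    filtered out, leaving only tasks that genuinely had nothing to label.
    """
    query = "object_count:0"
    if exclude_inconclusive:
        query += " AND is_inconclusive:false"
    return query


if __name__ == "__main__":
    print(empty_results_query())
```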

classes

type: list of strings

A list of classes that exist within the result.

Also stores labels from the bounding_box, point, and bounding_box_poly formats, and categories from the category format.

Note: Even classes with a score of zero are indexed in this field, so it’s useful only for checking whether a class exists at all.

Example structure as JSON:

"classes": [
  "general_not_nsfw_not_suggestive",
  "general_nsfw",
  "general_suggestive"
]

Example queries:

classes:"general_nsfw"

For multiple class lookup: classes:"general_nsfw" AND classes:"general_suggestive"
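When the list of classes comes from elsewhere, such as a project's class list, the multi-class query can be generated. The Python sketch below (hypothetical helper) joins quoted clauses with AND to require every class, or with OR to require at least one. It wraps OR groups in parentheses so they combine safely with other AND clauses. Zero-score classes are indexed too, so this checks presence only. Use class_thresholds to filter on scores.

```python
def classes_query(names, require_all: bool = True) -> str:
    """Match tasks whose `classes` list contains the given class names.

    require_all=True joins clauses with AND (every class present);
    False joins with OR (at least one present). Backslashes and double
    quotes in names are escaped for Lucene's quoted-phrase syntax.
    """
    if not names:
        raise ValueError("need at least one class name")
    clauses = []
    for name in names:
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'classes:"{escaped}"')
    joined = (" AND " if require_all else " OR ").join(clauses)
    # Parenthesize a multi-clause OR so it nests safely inside larger queries.
    return joined if require_all or len(clauses) == 1 else f"({joined})"


if __name__ == "__main__":
    print(classes_query(["general_nsfw", "general_suggestive"]))
    print(classes_query(["general_nsfw", "general_suggestive"], require_all=False))
```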

class_thresholds

type: flattened

A set of thresholds for each class. Structured as a mapping from the class name to a list of thresholds.

Note: Threshold values are rounded to 3 decimal places.

Example structure as JSON:

"class_thresholds": {
  "general_not_nsfw_not_suggestive": [0.123, 0.456],
  "general_nsfw": [0.234],
  "general_suggestive": [0]
}

Example queries:

For exact threshold values: class_thresholds.general_nsfw:"0.234"

For greater-than-or-equal queries: class_thresholds.general_nsfw:[0.1 TO *]

For range queries: class_thresholds.general_nsfw:[0.1 TO 0.2]
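Because thresholds are rounded to 3 decimal places before indexing, the edges of a range query shift by up to half a thousandth. For example, a raw score of 0.8004 is stored as 0.8 and still matches [0.7 TO 0.8]. The Python sketch below models this locally, under two labeled assumptions. First, Python's round() stands in for Hive's rounding; exact ties such as 0.xxx5 may be handled differently. Second, as with other multi-valued Lucene fields, a class matches if any value in its threshold list falls in the range.

```python
def indexed(score: float) -> float:
    """Model of the stored value: rounded to 3 decimal places.
    Assumption: round() approximates Hive's rounding; ties may differ."""
    return round(score, 3)


def range_matches(scores, lo: float, hi: float) -> bool:
    """Model of an inclusive class_thresholds range query [lo TO hi].
    Assumption: the task matches if ANY value in the class's list matches."""
    return any(lo <= indexed(s) <= hi for s in scores)


if __name__ == "__main__":
    for raw in (0.6994, 0.6996, 0.8004, 0.8006):
        print(raw, indexed(raw), range_matches([raw], 0.7, 0.8))
```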

class_counts

type: flattened

The number of occurrences of a class. Structured as a mapping from class name to the count.

Note: Even classes with a score of zero are counted.

Example structure as JSON:

"class_counts": {
  "general_not_nsfw_not_suggestive": 2,
  "general_nsfw": 2,
  "general_suggestive": 2
}

Example queries:

For exact counts: class_counts.general_nsfw:2

For greater-than-or-equal queries: class_counts.general_nsfw:[2 TO *]

For range queries: class_counts.general_nsfw:[1 TO 3]
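Fields of type flattened are queried with dotted paths: the key inside the JSON object is appended to the field name, as in class_counts.general_nsfw. The Python sketch below applies that mapping to a result's class_counts to produce one exact-count query per class. This could help, for example, to look for other tasks with the same counts as one you have inspected. The helper names are hypothetical. The dotted naming follows the examples on this page.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested JSON objects into dotted field paths (a.b.c -> value)."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def exact_count_queries(result: dict) -> list:
    """One exact-match query per class, built from the class_counts section."""
    counts = flatten({"class_counts": result["class_counts"]})
    return [f"{path}:{count}" for path, count in counts.items()]


if __name__ == "__main__":
    result = {
        "class_counts": {
            "general_not_nsfw_not_suggestive": 2,
            "general_nsfw": 2,
            "general_suggestive": 2,
        }
    }
    for query in exact_count_queries(result):
        print(query)
```

To require all counts at once, join the returned clauses with " AND ".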

total_passes

type: number

The number of passes the task goes through when multi_pass is enabled.