Viewing and Filtering Tasks - Lucene
Lucene Query Search
Hive Dashboard provides multiple ways to search and filter tasks. The Lucene tab lets you write custom queries in Lucene syntax for searches that the UI filters cannot express.
Currently, search is limited to tasks completed within the last 30 days. Contact support if you need a longer window.
Tasks typically become searchable about 1 minute after completion.
Lucene Syntax
Lucene queries support advanced searches, such as ranges and combinations of multiple conditions, that can't be done via the Quick Search UI.
Hive follows the standard Lucene query syntax:
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
See below for examples and a list of fields which are indexed by Hive Data.
Examples
Searching for tasks with at least 10 objects for a bounding_box job:
object_count:[10 TO *]
Searching for tasks with a class in a specific range in a predict_classification project:
class_thresholds.general_nsfw:[0.7 TO 0.8]
Searching for a video where a logo appears at least 3 times in a predict_detection project:
class_counts.bmw:[3 TO *] AND media_length:[1 TO *]
Searching for an image where a logo appears exactly 3 times in a predict_detection project:
class_counts.bmw:3 AND media_length:0
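When queries like the ones above are generated programmatically, a small helper can keep the range syntax consistent. The following is a minimal sketch, not part of any Hive SDK: the function names `range_clause` and `all_of` are hypothetical, and it uses `*`, the standard Lucene marker for an open-ended bound.

```python
from typing import Optional, Union

Number = Union[int, float]


def range_clause(field: str, low: Optional[Number] = None,
                 high: Optional[Number] = None) -> str:
    """Build an inclusive Lucene range clause; None means an open bound (*)."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return f"{field}:[{lo} TO {hi}]"


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)


# A video (media_length of at least 1) where the logo appears at least 3 times.
video_with_logo = all_of(
    range_clause("class_counts.bmw", low=3),
    range_clause("media_length", low=1),
)
print(video_with_logo)
```

The resulting string can be pasted into the Lucene tab as-is.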
Fields
Below is a list of all fields indexed in Hive Data for completed tasks.
callback_metadata
type: string
Metadata for the task uploaded by the user.
callback_url
type: string
Metadata for the task uploaded by the user.
text_data
type: string
Metadata for the task uploaded by the user.
task_description
type: string
Metadata for the task uploaded by the user.
Also known as label_data or text.
image_url
type: string
The URL the task was uploaded with.
original_filename
type: string
The filename if the file was uploaded directly from the customer UI.
media_length
type: number
The total duration of the media in seconds. If there are multiple media files, it is the duration of the first one. For a still image, media_length is zero.
status
type: string
Only written for category, transcription, and predict_audio status formats. Stores the status as a string.
Note: For category tasks with cat_allow_multiple enabled, the categories are comma-separated.
upload_history_id
type: string
The CSV upload id.
object_count
type: number
Depends on the status format:
- category: the number of categories
- bounding_box, rotated_bounding_box, and similar spatial formats: the number of boxes
For unlisted formats, this field is not used.
Note: object_count is zero for inconclusive tasks.
created_on
type: timestamp
Timestamp when the task was uploaded. Query it using epoch milliseconds or a formatted date string.
finished_on
type: timestamp
Timestamp when the task was finished. Query it using epoch milliseconds or a formatted date string.
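For the epoch-millisecond form of created_on and finished_on queries, the bounds have to be converted from calendar dates first. The sketch below shows one way to do that in Python; the helper names `to_epoch_millis` and `finished_between` are hypothetical, and the dates are arbitrary illustration values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole epoch milliseconds.

    Integer division of timedeltas avoids floating-point rounding.
    A naive datetime raises TypeError when subtracted from EPOCH.
    """
    return (dt - EPOCH) // timedelta(milliseconds=1)


def finished_between(start: datetime, end: datetime) -> str:
    """Build an inclusive range clause on finished_on."""
    return f"finished_on:[{to_epoch_millis(start)} TO {to_epoch_millis(end)}]"


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)
query = finished_between(start, end)
print(query)
```

The same pattern applies to created_on by changing the field name.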
is_inconclusive
type: boolean
Whether the task finished as inconclusive: true or false.
classes
type: list of strings
A list of classes that exist within the result.
Also stores labels from the bounding_box, point, and bounding_box_poly formats and categories from the category format.
Note: Even classes with a score of zero are indexed in this field, so it’s only useful for checking if the class exists at all.
Example structure as JSON:
"classes": [
"general_not_nsfw_not_suggestive",
"general_nsfw",
"general_suggestive"
]
Example queries:
classes:"general_nsfw"
For multiple class lookup: classes:"general_nsfw" AND classes:"general_suggestive"
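A multi-class lookup can also be assembled from a list of class names. This is a minimal sketch under the assumption that class names are passed as plain strings; `quote_term` and `has_all_classes` are hypothetical helpers, and the escaping follows the standard Lucene convention of a backslash before special characters inside a quoted term.

```python
def quote_term(value: str) -> str:
    """Wrap a value in straight double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def has_all_classes(*class_names: str) -> str:
    """Require every listed class to be present in the task result."""
    return " AND ".join(f"classes:{quote_term(name)}" for name in class_names)


query = has_all_classes("general_nsfw", "general_suggestive")
print(query)
```

Because classes with a score of zero are indexed too, a query like this only confirms that the classes exist in the result; filtering by score requires the class_thresholds field.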
class_thresholds
type: flattened
A set of thresholds for each class. Structured as a mapping from the class name to a list of thresholds.
Note: Values are rounded to 3 decimal places.
Example structure as JSON:
"class_thresholds": {
"general_not_nsfw_not_suggestive": [0.123, 0.456],
"general_nsfw": [0.234],
"general_suggestive": [0]
}
Example queries:
For exact threshold values: class_thresholds.general_nsfw:"0.234"
For GTE queries: class_thresholds.general_nsfw:[0.1 TO *]
For range queries: class_thresholds.general_nsfw:[0.1 TO 0.2]
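Since indexed thresholds carry 3 decimal places, it can help to round query bounds the same way before building a class_thresholds query. The following sketch assumes the bounds come from your own code as floats; `threshold_range` is a hypothetical helper, and `*` is the standard Lucene open bound.

```python
from typing import Optional


def threshold_range(class_name: str, low: float,
                    high: Optional[float] = None) -> str:
    """Build a class_thresholds range clause with bounds rounded to 3 decimals."""
    lo = round(low, 3)
    hi = "*" if high is None else round(high, 3)
    return f"class_thresholds.{class_name}:[{lo} TO {hi}]"


bounded = threshold_range("general_nsfw", 0.70049, 0.8)
open_ended = threshold_range("general_nsfw", 0.1)
print(bounded)
print(open_ended)
```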
class_counts
type: flattened
The number of occurrences of a class. Structured as a mapping from class name to the count.
Note: Even classes with a score of zero are counted.
Example structure as JSON:
"class_counts": {
"general_not_nsfw_not_suggestive": 2,
"general_nsfw": 2,
"general_suggestive": 2
}
Example queries:
For exact counts: class_counts.general_nsfw:2
For GTE queries: class_counts.general_nsfw:[1 TO *]
For range queries: class_counts.general_nsfw:[1 TO 3]
total_passes
type: number
The number of passes the task goes through when multi_pass is enabled.