Copyright Search

Overview

Hive's Copyright Search API identifies copies and variants of third-party IP using a comprehensive search index containing hundreds of thousands of hours of movies, TV shows, and broadcasts. For each query, the model returns any matches along with their similarity scores. The response also includes metadata such as the IMDb ID, content type (movie or TV show), title, relevant timestamps, and season and episode numbers (if applicable). This API enables digital platforms to flag content automatically so they can avoid hosting copyright-protected media.

When you submit a query, the Copyright Search API compares the query image or video against our entire search index of copyrighted material. These pairwise comparisons are conducted using the same image similarity model that underlies our other Intelligent Search Suite products.

Our copyright search index includes movies, TV shows, and broadcasts (including international content). We are working to add support for new content types, such as sports media, in the near future.

Image Similarity Model

Copyright Search detects both exact duplicates and modified versions. This includes manual image manipulations such as rotations and text overlays, as well as subtler augmentations such as added noise, filters, and other pixel-level changes.

Hive's image similarity model generates a similarity score, normalized between 0 and 1, for the query image and each match. A similarity score of 1.0 indicates an exact match between two images, while lower values indicate that the query image has been modified to some extent.

How Our Index Gets Updated

To keep our search index up to date, we continually add new content. We sweep for new titles once a month and update our index accordingly. We also add titles on request: if any titles you would like included are missing, contact your Hive sales representative and we will ensure that they are added to our search index.

Supported File Types

Image Formats:
jpg, png, webp, gif

Video Formats:
mp4, webm, avi, flv, mkv, mpg, wmv, mov
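A client may want to reject unsupported files before uploading them. The following is a minimal sketch of such a pre-check, assuming that the file extension is a good enough signal for your use case; the function name media_kind is hypothetical, it does not inspect file contents, and it accepts only the extensions listed above (for example, ".jpeg" is not listed, so it is not included).

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported File Types list.
IMAGE_EXTENSIONS = {"jpg", "png", "webp", "gif"}
VIDEO_EXTENSIONS = {"mp4", "webm", "avi", "flv", "mkv", "mpg", "wmv", "mov"}


def media_kind(filename: str) -> Optional[str]:
    """Return "image", "video", or None, based only on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None


print(media_kind("trailer.MP4"))  # video
print(media_kind("frame.png"))    # image
print(media_kind("notes.txt"))    # None
```

Lowercasing the extension keeps the check case-insensitive, so "trailer.MP4" and "trailer.mp4" are treated the same way.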

Response Fields

The response object, task.status.output, includes the following fields for each matching index file:

  • task_id: the unique identifier that was generated for the matching image or video when it was added to the index
  • metadata: identifying information for the piece of media
    - imdb_id: the unique ID of the piece of media in IMDb's catalog, if applicable
    - type: the type of content (movie or TV show)
    - name: the title of the media, e.g. lord_of_the_rings
    - season: the season number of the piece of media, if applicable
    - episode: the episode number of the media within that season, if applicable
  • matches: a list of the timestamps that were visually similar between the query file and the matching index file. Each item in the list includes three fields: query_timestamp, matching_timestamp, and similarity_score.

For example, an item in matches with query_timestamp: 0, matching_timestamp: 10, and similarity_score: 0.9 indicates that second 0 of the query video matched second 10 of the matching index video with a similarity score of 0.9.
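To make the field layout concrete, here is a minimal Python sketch that converts one result object into typed records. It assumes each result is a JSON object carrying the task_id, metadata, and matches fields described above; the exact nesting of results under task.status.output, the class and function names, and the example values are hypothetical and should be checked against the example responses in the API reference.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FrameMatch:
    query_timestamp: float     # second in the query video
    matching_timestamp: float  # second in the matching index video
    similarity_score: float    # normalized between 0 and 1


@dataclass
class IndexMatch:
    task_id: str
    name: Optional[str]
    imdb_id: Optional[str]
    matches: List[FrameMatch]


def parse_index_match(result: Dict[str, Any]) -> IndexMatch:
    """Convert one result object (hypothetical shape) into typed records."""
    metadata = result.get("metadata", {})
    frames = [
        FrameMatch(
            query_timestamp=m["query_timestamp"],
            matching_timestamp=m["matching_timestamp"],
            similarity_score=m["similarity_score"],
        )
        for m in result.get("matches", [])
    ]
    return IndexMatch(
        task_id=result["task_id"],
        name=metadata.get("name"),
        imdb_id=metadata.get("imdb_id"),  # None when not applicable
        matches=frames,
    )


# Hypothetical result object mirroring the example above; values are illustrative.
example = {
    "task_id": "example-task-id",
    "metadata": {"name": "lord_of_the_rings"},
    "matches": [
        {"query_timestamp": 0, "matching_timestamp": 10, "similarity_score": 0.9},
    ],
}

parsed = parse_index_match(example)
for fm in parsed.matches:
    # prints: query second 0 -> index second 10 (score 0.9)
    print(f"query second {fm.query_timestamp} -> index second "
          f"{fm.matching_timestamp} (score {fm.similarity_score})")
```

Optional metadata fields are read with .get() so that media without an IMDb ID, season, or episode still parses.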


Note that within a single match, each query_timestamp appears at most once, and we only return matches whose similarity_score is above a certain threshold.

For each query, we return up to five videos or images that we deem the closest matches. These matches are selected and ordered based on a combination of the similarity scores of the matching frames and the total number of frames that matched between the query and the matching index file.

Example API response objects are available at https://docs.thehive.ai/reference/copyright-search-1.

Using the Response

We recommend using the similarity_score values of the matching frames together with the returned query_timestamp values. Customers have found success with the following two workflows:

  • Overall Ratio: customers flag content when the number of frames in the query video that have a match, divided by the total number of frames extracted (typically the number of seconds in the query video), exceeds a chosen threshold.
  • Continuous Time Ranges: customers flag content when an extended run of consecutive seconds in the query video has frames that match frames in an index video.
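Both workflows reduce to simple computations over the query_timestamp values in one index file's matches list. The sketch below is one possible implementation, not Hive's method: it assumes one extracted frame per second with integer timestamps, and the function names, the optional min_score cutoff, and all threshold values are hypothetical and would need tuning for your platform.

```python
from typing import Iterable, Tuple

# (query_timestamp, similarity_score) pairs from one index file's "matches" list.
Match = Tuple[int, float]


def overall_ratio(matches: Iterable[Match], num_query_frames: int,
                  min_score: float = 0.0) -> float:
    """Fraction of extracted query frames with a match scoring >= min_score."""
    if num_query_frames <= 0:
        raise ValueError("num_query_frames must be positive")
    matched = {ts for ts, score in matches if score >= min_score}
    return len(matched) / num_query_frames


def longest_continuous_run(matches: Iterable[Match], min_score: float = 0.0,
                           step: int = 1) -> int:
    """Length of the longest run of consecutive matched query timestamps."""
    matched = sorted({ts for ts, score in matches if score >= min_score})
    longest = current = 0
    previous = None
    for ts in matched:
        # Extend the run if this timestamp directly follows the previous one.
        if previous is not None and ts - previous == step:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous = ts
    return longest


# Hypothetical matches for a 20-second query video (one frame per second):
# seconds 3 through 14 match at 0.9, and second 18 matches at 0.85.
example = [(t, 0.9) for t in range(3, 15)] + [(18, 0.85)]

RATIO_THRESHOLD = 0.5  # hypothetical; tune for your platform
RUN_THRESHOLD = 10     # hypothetical; seconds of consecutive matches

ratio = overall_ratio(example, num_query_frames=20)
run = longest_continuous_run(example)
flag = ratio > RATIO_THRESHOLD or run >= RUN_THRESHOLD

print(ratio)  # 0.65
print(run)    # 12
print(flag)   # True
```

The overall ratio catches content that is copied in scattered pieces, while the longest run catches a single sustained clip inside an otherwise original video; combining them with "or", as here, is just one possible policy.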