AutoML for Image Classification
Build custom deep learning models for image classification tasks
Hive AutoML for image classification can be used to identify various subjects, settings, and more in images and videos. The resulting labels can be used to quickly sort and tag large image libraries and to create custom content moderation classes not otherwise offered through our Visual Moderation API.
Creating a New Training
To start building your model, head to our AutoML platform and select the “Create New Project” button in the top right corner of the screen. You will be brought to a project setup page where you will be prompted to enter a project name and description. Below the project description field, click the “Upload CSV” button to upload your training dataset.
Your data is uploaded as a CSV file. One column, titled image_url, should contain the image data as URLs; every other column represents a model head (a classification category). The values in each head's column are the classes (possible classifications) within that head. An example of this formatting is shown below:
The uploaded data file must include image data as a column titled “image_url” or the training will result in an error.
In the example shown above, the image_url column contains links to various images of chihuahuas and muffins. There is one model head, subject_head, which contains two classes.
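The layout described above can be sketched with Python's standard csv module. The URLs and the class names chihuahua and muffin below are hypothetical placeholders, and build_training_csv is an illustrative helper, not part of the AutoML platform.

```python
import csv
import io

# Hypothetical rows mirroring the chihuahua-vs-muffin example; the URLs and
# class names ("chihuahua", "muffin") are placeholders, not real dataset values.
rows = [
    {"image_url": "https://example.com/images/0001.jpg", "subject_head": "chihuahua"},
    {"image_url": "https://example.com/images/0002.png", "subject_head": "muffin"},
]

def build_training_csv(rows, heads):
    """Return CSV text with an image_url column followed by one column per head."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["image_url", *heads])
    writer.writeheader()  # the header row is required
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = build_training_csv(rows, ["subject_head"])
print(csv_text)
```

Each additional head would simply be another column name passed in heads, with one class label per row.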
In order to be processed correctly, the data file must satisfy the following requirements:
- Dataset file must be in CSV format.
- CSV must have a header row (a row of column names above the actual data).
- The header row must contain a column for the image data titled image_url.
- Each image linked to in image_url must be in jpg, jpeg, or png format and no larger than 50MB. Any image that does not meet these requirements is considered invalid. If more than 5% of the training images are invalid, the training will fail.
- The total size of all image data cannot exceed 100GB.
- Images used for training cannot contain more than one frame (i.e., they cannot be GIFs or videos).
- Column names cannot include the - character. All other characters are allowed.
- Column names cannot include Python reserved words (keywords). The full list of these words is available in the Python documentation for the keyword module.
- Each CSV file cannot contain more than 10,000 rows.
- Each CSV file can contain up to 20 heads (column headers apart from image_url).
- For each head, there must be at least two classes. For example, a head named dog_breed cannot contain only the single class yorkie; it needs at least one other class as well.
- For each class, there must be at least 10 examples of that class label in the training data. This is the absolute minimum; we recommend including at least 100 examples per class for optimal model performance.
- Every class must appear in at least one row in the CSV.
If any of the above are not satisfied, the training will fail and return an error.
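Before uploading, it can help to check a CSV locally against the rules that can be verified from the file itself. The Python sketch below is a hypothetical pre-flight check, not Hive's own validator: it treats the jpg/jpeg/png rule as a URL-extension check, interprets the reserved-word rule as "a column name must not be exactly a Python keyword," and skips image size, frame count, and total dataset size, which would require downloading every image.

```python
import csv
import io
import keyword
from collections import Counter

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")
MAX_ROWS = 10_000
MAX_HEADS = 20
MIN_EXAMPLES_PER_CLASS = 10
MAX_INVALID_FRACTION = 0.05

def validate_training_csv(csv_text):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical local check covering only rules verifiable from the CSV text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if "image_url" not in columns:
        return ["missing required image_url column"]
    rows = list(reader)
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")

    heads = [c for c in columns if c != "image_url"]
    if len(heads) > MAX_HEADS:
        problems.append(f"{len(heads)} heads exceeds the {MAX_HEADS}-head limit")
    for name in columns:
        if "-" in name:
            problems.append(f"column {name!r} contains '-'")
        if keyword.iskeyword(name):
            problems.append(f"column {name!r} is a Python keyword")

    # Approximate the format rule by file extension, ignoring any query string.
    invalid = [r for r in rows
               if not (r["image_url"] or "").split("?")[0].lower().endswith(ALLOWED_EXTENSIONS)]
    if rows and len(invalid) / len(rows) > MAX_INVALID_FRACTION:
        problems.append(f"{len(invalid)} of {len(rows)} image URLs look invalid (>5%)")

    for head in heads:
        # Count class labels per head, skipping empty cells.
        counts = Counter(r[head] for r in rows if r[head])
        if len(counts) < 2:
            problems.append(f"head {head!r} has fewer than two classes")
        for label, n in counts.items():
            if n < MIN_EXAMPLES_PER_CLASS:
                problems.append(f"class {label!r} in head {head!r} has only {n} examples")
    return problems
```

Passing the text of your CSV to validate_training_csv returns a list of messages; an empty list only means none of these particular checks failed, not that the upload is guaranteed to succeed.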
For your test dataset, you can choose to either upload a separate test dataset or split off a random section of your training dataset to use instead. If you choose to upload a separate test dataset, this dataset must also satisfy all of the file requirements listed above and must contain the same heads and classes as your training dataset. If you choose to split off a section of your training dataset, you will be able to choose the percentage of that dataset that you would like to use for testing.
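Splitting off part of the training dataset amounts to a random hold-out by percentage. The sketch below is an illustrative plain random split (split_train_test is a hypothetical helper); the platform's actual splitting procedure isn't described here. Note that a plain random split can leave a rare class with fewer than the required examples on one side, which a stratified split (splitting each class separately) would avoid.

```python
import random

def split_train_test(rows, test_percent, seed=0):
    """Randomly hold out test_percent (0-100, exclusive) of rows as a test set."""
    if not 0 < test_percent < 100:
        raise ValueError("test_percent must be between 0 and 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(100), 20)
print(len(train), len(test))  # 80 20
```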
Evaluating Model Performance
After model training is complete, the project page shows several metrics to help you evaluate your model's performance. At the top of the page, select the head and, if desired, the class you would like to evaluate, then use the slider to set the confidence threshold. The page then shows the precision, recall, and balanced accuracy for that selection. Below these, you can view the precision/recall curve (P/R curve) and a confusion matrix showing how many predictions were correct and incorrect for each class.
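To make the threshold slider concrete, the sketch below computes precision, recall, and balanced accuracy for a single class at a given confidence threshold, using the standard one-vs-rest definitions (balanced accuracy as the mean of recall and specificity). This is an assumption about how the metrics are defined; the platform's exact computation isn't specified here. Sweeping the threshold and recording precision and recall at each value traces out a P/R curve.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Precision, recall, and balanced accuracy for one class at a threshold.

    scores: model confidence that each example belongs to the class.
    labels: True if the example actually belongs to the class.
    """
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold  # predict the class at or above the threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return precision, recall, balanced_accuracy

# Hypothetical scores and ground-truth labels for one class.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.1]
labels = [True, True, True, False, False, False]
for t in (0.2, 0.5, 0.85):
    p, r, ba = metrics_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} balanced_acc={ba:.2f}")
```

Raising the threshold generally trades recall for precision, which is the trade-off the slider lets you explore.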
If you would like to retrain your model based on these metrics, go back to your AutoML Training Projects page and start a new project.
Deploying a Model With Hive Data
When you’re happy with your model and ready to deploy it, select the project and click the “Create Deployment” button in the top right corner. The project’s status will shift to “Deploying.” The deployment may take a few minutes.
After the deployment status shows as “Complete,” the project page will have a “Deployment Details” button below the training and deployment statuses. Clicking this button opens the project on Hive Data, where you can upload tasks, view tasks, and access your API key as you would with any other Hive Data project. There is also an “Undeploy” button if you wish to deactivate your project at any point. Undeploying a model is not permanent; you can redeploy the project later if you choose to.
To begin using your custom-built API, click the “API Key” button in the top right of the Hive Data project page to copy your API key. For instructions on how to submit a task via API, either synchronously or asynchronously, see our API Reference documentation.
Image classification APIs built through our AutoML tool accept the following image formats:
Additionally, videos shorter than 90 seconds are supported in the following formats:
Note that not all of these formats are accepted as training data; some are accepted only as tasks submitted after the model is successfully deployed.