AutoML for Text Classification
Build custom deep learning models for text classification tasks
Overview
Hive AutoML for text classification can sort and tag text content by topic, tone, and more, making text libraries easier to search. Our AutoML tool can also be used to add new custom content moderation classes not otherwise offered through our Text Moderation API.
Creating a New Training
To start building your model, head to our AutoML platform and select the Create New Project button in the top right corner of the screen. You will be brought to a project setup page where you will be prompted to enter a project name and description. Below the project description field, click the Upload CSV button to upload your training dataset.

The AutoML Training Projects dashboard. The Create New Project button sits in the top right corner.
Dataset Upload
Your data will be uploaded as a CSV file. One column should contain the text data (titled text_data), and every other column represents a model head (a classification category). The values within each row of a given column represent the classes (possible classifications) within that head. An example of this formatting is shown below:
| satire_head | topic_head | text_data |
|---|---|---|
| no | entertainment | Aerosmith to ‘Peace Out’ after 50 years with farewell tour |
| yes | sports | Panicked Mel Kiper realizes he left NFL draft big board in Uber |
| no | health | Your pollen allergies are overwhelming? This might be why |
| yes | entertainment | Nostalgic woman hopes ‘Barbie’ movie lives up to girlhood body dysmorphia |
| yes | crime | Police feel bad after easily solving series of riddles serial killer obviously put a lot of work into |
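A dataset in this format can be produced with Python's standard csv module. The sketch below writes a few of the example rows above to a file; the file name is arbitrary, and only the text_data column name is required by the platform.

```python
import csv

# Example rows matching the table above: two model heads plus the
# required text_data column.
rows = [
    {"satire_head": "no", "topic_head": "entertainment",
     "text_data": "Aerosmith to 'Peace Out' after 50 years with farewell tour"},
    {"satire_head": "yes", "topic_head": "sports",
     "text_data": "Panicked Mel Kiper realizes he left NFL draft big board in Uber"},
    {"satire_head": "no", "topic_head": "health",
     "text_data": "Your pollen allergies are overwhelming? This might be why"},
]

with open("training_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["satire_head", "topic_head", "text_data"])
    writer.writeheader()   # the header row is required
    writer.writerows(rows)
```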
The uploaded data file must include text data as a column titled “text_data” or the training will result in an error.
In the example shown above, the data (text_data) consists of various news headlines. The two model heads are satire_head and topic_head. The classes within satire_head are yes and no, while the classes within topic_head are entertainment, sports, health, and crime.
In order to be processed correctly, the data file must satisfy the following requirements:
- The dataset file must be in CSV format.
- The CSV must have a header row (a row of column names above the actual data).
- The header row must contain a column for the text data titled text_data.
- Each piece of data in text_data must be less than 1024 characters long. All characters past that limit will not be processed.
- Column names cannot include the | or - characters. All other characters are allowed.
- Column names cannot include Python reserved words, or keywords. A full list of these words can be found here.
- Each CSV file cannot contain more than 10,000 rows.
- Each CSV file can contain up to 20 heads (column headers apart from text_data).
- For each head, there must be at least two classes. For example, the head “topic” can have the classes “sports,” “music,” and “movies” but cannot have just “sports.”
- Every class must appear in at least one row in the CSV.
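These requirements can be checked locally before uploading. The function below is a minimal pre-upload sketch (not part of the Hive platform) that flags the most common violations using only the standard library; the function name and message strings are our own.

```python
import csv
import keyword

MAX_ROWS = 10_000
MAX_HEADS = 20
MAX_TEXT_LEN = 1024  # characters at or past this point are not processed

def validate_training_csv(path: str) -> list:
    """Return a list of problems found; an empty list means no issues detected."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        if "text_data" not in columns:
            problems.append("missing required text_data column")
        heads = [c for c in columns if c != "text_data"]
        if len(heads) > MAX_HEADS:
            problems.append(f"too many heads: {len(heads)} > {MAX_HEADS}")
        for name in columns:
            if "|" in name or "-" in name:
                problems.append(f"illegal character in column name: {name!r}")
            if keyword.iskeyword(name):
                problems.append(f"column name is a Python keyword: {name!r}")
        classes = {h: set() for h in heads}
        n_rows = 0
        for row in reader:
            n_rows += 1
            if len(row.get("text_data") or "") >= MAX_TEXT_LEN:
                problems.append(f"row {n_rows}: text_data will be truncated")
            for h in heads:
                classes[h].add(row[h])
        if n_rows > MAX_ROWS:
            problems.append(f"too many rows: {n_rows} > {MAX_ROWS}")
        for h, cls in classes.items():
            if len(cls) < 2:
                problems.append(f"head {h!r} needs at least two classes")
    return problems
```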
Test Dataset
For your test dataset, you can choose to either upload a separate test dataset or split off a random section of your training dataset to use instead. If you choose to upload a separate test dataset, this dataset must also satisfy all of the file requirements listed above and must contain the same heads and classes as your training dataset. If you choose to split off a section of your training dataset, you will be able to choose the percentage of that dataset that you would like to use for testing.
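If you prefer to prepare a separate test dataset yourself rather than let the platform split one off, a simple local split can be done as below. This is an illustrative sketch: the file names and split fraction are our own choices, and note that a purely random split does not guarantee every class still appears in both files, so re-check both against the requirements above.

```python
import csv
import random

def split_csv(path: str, test_fraction: float = 0.2, seed: int = 0) -> None:
    """Randomly split one CSV into train.csv and test.csv, keeping the header in both."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # preserve the required header row
        rows = list(reader)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * test_fraction)
    for name, part in (("test.csv", rows[:cut]), ("train.csv", rows[cut:])):
        with open(name, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(part)
```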
Evaluating Model Performance
After model training is complete, the project page provides various metrics to help you evaluate the performance of your model. At the top of the page you can select the head and, if desired, the class that you would like to evaluate, and use the slider to control the confidence threshold. Once selected, you will see the precision, recall, and balanced accuracy. Below that, you can view the precision/recall curve (P/R curve) as well as a confusion matrix that shows how many predictions were correct and incorrect per class.
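To make these metrics concrete, here is how they are computed for a binary head at one confidence threshold. The counts below are hypothetical, not taken from any real project.

```python
# Hypothetical prediction counts for a binary head such as satire_head,
# evaluated at a single confidence threshold:
# tp/fp = true/false positives, fn/tn = false/true negatives.
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)            # of predicted positives, how many were right
recall = tp / (tp + fn)               # of actual positives, how many were found
specificity = tn / (tn + fp)          # recall of the negative class
balanced_accuracy = (recall + specificity) / 2

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"balanced_accuracy={balanced_accuracy:.3f}")
```

Raising the confidence threshold typically trades recall for precision, which is exactly the trade-off the P/R curve visualizes.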

Evaluation metrics for an AutoML project as shown after training has completed.
If you would like to retrain your model based on these metrics, go back to your AutoML Training Projects page and start a new project.
Deploying Model With Hive Data
When you’re happy with your model and ready to deploy it, select the project and click the “Create Deployment” button in the top right corner. The project’s status will shift to “Deploying.” The deployment may take a few minutes.

After a project has finished deploying, there will be a "View Deployment" button in the top right corner of the project page.
After the deploy status shows as “Complete,” the project page will have a “View Deployment” button in the top right corner. Clicking this button will open the project on Hive Data, where you will be able to upload tasks, view tasks, and access your API key as you would with any other Hive Data project.

An AutoML project as viewed in the Hive customer dashboard. The "API Key" button is on the top right.
To begin using your custom-built API, click on the “API Key” button on the top right of the Hive Data project page to copy your API Key. For instructions on how to submit a task via API, either synchronously or asynchronously, see our API Reference documentation.
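As a starting point, a synchronous submission can be sketched in Python with the standard library. The endpoint URL and form field name below are assumptions for illustration; confirm the exact request format, including the correct URL for your project, in the API Reference documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumed synchronous endpoint; verify against the API Reference.
SYNC_URL = "https://api.thehive.ai/api/v2/task/sync"

def classify_text(api_key: str, text: str) -> dict:
    """Submit one piece of text to the deployed model and return the parsed response."""
    data = urllib.parse.urlencode({"text_data": text}).encode()
    req = urllib.request.Request(
        SYNC_URL,
        data=data,
        headers={"Authorization": f"Token {api_key}"},  # key from the "API Key" button
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # per-head class predictions with confidence scores
```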