AutoML for Text Classification

Build custom deep learning models for text classification tasks


Hive AutoML for text classification can be used to sort and tag text content by topic, tone, and more in order to better search text libraries. Our AutoML tool can also be used to add new custom content moderation classes not otherwise offered through our Text Moderation API.

Creating a New Training

To start building your model, head to our AutoML platform and select the Create New Model button in the top right corner of the screen. You will be brought to a project setup page where you will be prompted to enter a project name and description. Below the model type field, add your training dataset either by clicking the Upload File button or the Select Dataset button, which allows you to choose from a list of datasets that you've already uploaded to the AutoML platform.

The AutoML Training Projects dashboard. The "Create New Model" button sits in the top right corner.

Dataset Upload

Your data must be uploaded as a CSV file in which one column (titled text_data) contains the text data and every other column represents a model head (classification category). The values within each of those columns are the classes (possible classifications) for that head. An example of this formatting is shown below:

```
text_data,satire_head,topic_head
Aerosmith to ‘Peace Out’ after 50 years with farewell tour,no,entertainment
Panicked Mel Kiper realizes he left NFL draft big board in Uber,yes,sports
Your pollen allergies are overwhelming? This might be why,no,health
Nostalgic woman hopes ‘Barbie’ movie lives up to girlhood body dysmorphia,yes,entertainment
Police feel bad after easily solving series of riddles serial killer obviously put a lot of work into,yes,crime
```


The uploaded data file must include text data as a column titled “text_data” or the training will result in an error.

In the example shown above, the text_data column contains various news headlines. The two model heads are satire_head and topic_head. The classes within satire_head are yes and no, while the classes within topic_head are entertainment, sports, health, and crime.

In order to be processed correctly, the data file must satisfy the following requirements:

  • Dataset file must be in CSV format.
  • CSV must use , as the delimiter. Other delimiters such as ; or | are invalid.
  • CSV must have a header row (a row of column names above the actual data).
  • The header row must contain a column for the text data titled text_data.
  • Each piece of data in text_data must be less than 1024 characters long. All characters past that will not be processed.
  • Column names cannot include the | or - characters. All other characters are allowed.
  • Column names cannot be Python reserved words (keywords). A full list of these words can be found here.
  • Each CSV file cannot contain more than 100,000 rows.
  • Each CSV file can contain up to 20 heads (column headers apart from text_data).
  • For each head, there must be at least two classes. For example, the head “topic” can have the classes “sports,” “music,” and “movies” but cannot have just “sports.”
  • Every class must appear in at least one row in the CSV.
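The requirements above can be checked locally before you upload. Below is a minimal sketch using only the Python standard library; validate_dataset is an illustrative helper, not part of the AutoML platform, and it covers the mechanically checkable rules (required text_data column, head count, forbidden characters, keyword names, row count, text length, and per-head class counts):

```python
import csv
import io
import keyword

MAX_ROWS = 100_000      # per-file row limit
MAX_HEADS = 20          # heads = columns other than text_data
MAX_TEXT_LEN = 1024     # characters of text_data that are processed

def validate_dataset(csv_text):
    """Return a list of problems found in an AutoML training CSV."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    if "text_data" not in header:
        problems.append("missing required 'text_data' column")
    heads = [h for h in header if h != "text_data"]
    if len(heads) > MAX_HEADS:
        problems.append(f"too many heads: {len(heads)} > {MAX_HEADS}")
    for name in header:
        if "|" in name or "-" in name:
            problems.append(f"column '{name}' contains a forbidden character")
        if keyword.iskeyword(name):
            problems.append(f"column '{name}' is a Python keyword")

    classes = {h: set() for h in heads}
    n_rows = 0
    for row in reader:
        n_rows += 1
        text = row.get("text_data") or ""
        if len(text) >= MAX_TEXT_LEN:
            problems.append(f"row {n_rows}: text_data exceeds 1024 characters")
        for h in heads:
            classes[h].add(row[h])
    if n_rows > MAX_ROWS:
        problems.append(f"too many rows: {n_rows} > {MAX_ROWS}")
    for h, cs in classes.items():
        if len(cs) < 2:
            problems.append(f"head '{h}' has fewer than two classes")
    return problems
```

For example, a file whose header is missing text_data, or whose only head contains a single class, will produce one problem entry per violated rule.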

Test Dataset

For your test dataset, you can choose to upload a separate test dataset, select one from your previously uploaded datasets, or split off a random section of your training dataset to use instead. If you choose to upload a separate test dataset, this dataset must also satisfy all of the file requirements listed above and must contain the same heads and classes as your training dataset. If you choose to split off a section of your training dataset, you will be able to choose the percentage of that dataset that you would like to use for testing.
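If you prefer to prepare a separate test file yourself rather than letting the platform split your training data, the split is straightforward to do locally. A minimal sketch (split_rows and the 20% default are illustrative choices, not platform behavior):

```python
import csv
import io
import random

def split_rows(csv_text, test_fraction=0.2, seed=0):
    """Randomly split a training CSV into train/test row lists.

    Both results keep the header row so each can be written back
    out as a valid dataset file on its own.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)   # fixed seed for a reproducible split
    rng.shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return [header] + rows[n_test:], [header] + rows[:n_test]
```

Note that a test file produced this way automatically satisfies the "same heads and classes" requirement for the heads, but a very small test fraction can drop rare classes entirely, so check class coverage after splitting.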

Evaluating Model Performance

After model training is complete, the project page provides several metrics to help you evaluate your model's performance. At the top of the page, select the head and, optionally, the class you would like to evaluate, and use the slider to set the confidence threshold. You will then see the precision, recall, and balanced accuracy at that threshold. Below that, you can view the precision/recall curve (P/R curve) as well as a confusion matrix that shows how many predictions were correct and incorrect per class.
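To make the threshold slider concrete: for a single class, each prediction is counted as positive when its confidence score meets the threshold, and the metrics follow from the resulting true/false positive and negative counts. A minimal sketch of that calculation (binary_metrics is an illustrative helper, not the platform's own code):

```python
def binary_metrics(scores, labels, threshold):
    """Precision, recall, and balanced accuracy for one class
    at a given confidence threshold.

    scores: model confidence per example; labels: True if the
    example truly belongs to the class.
    """
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, labels):
        predicted = score >= threshold   # the slider moves this cutoff
        if predicted and positive:
            tp += 1
        elif predicted and not positive:
            fp += 1
        elif not predicted and positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    balanced_accuracy = (recall + tnr) / 2
    return precision, recall, balanced_accuracy
```

Sweeping the threshold from 0 to 1 and plotting precision against recall at each step traces out the same P/R curve shown on the metrics page.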

Evaluation metrics for an AutoML project as shown after training has completed.

If you would like to retrain your model based on these metrics, go back to your AutoML Training Projects page and start a new project.

Deploying Model With Hive Data

When you’re happy with your model and ready to deploy it, select the project and click the “Create Deployment” button in the top right corner. The project’s status will shift to “Deploying.” The deployment may take a few minutes.

After the deploy status shows as “Complete,” you can view the deployment by clicking on the "Deployments" tab above the metrics. This will show a list of all deployments for this model.

The "Deployments" tab, showing two different model deployments and their statuses.

To view any deployment, hit the exit symbol to the left of its name. This will open the project on Hive Data, where you will be able to upload tasks, view tasks, and access your API key as you would with any other Hive Data project. There will also be a button to "Undeploy" your project, if you wish to deactivate it at any point. Undeploying a model is not permanent — you can redeploy the project if you later choose to.

An AutoML project as viewed in the Hive Data dashboard. The "API Key" button is on the top right.

To begin using your custom-built API, click on the “API Key” button on the top right of the Hive Data project page to copy your API Key. For instructions on how to submit a task via API, either synchronously or asynchronously, see our API Reference documentation.
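As a rough illustration of what a synchronous submission might look like, the sketch below builds a request with the standard library. The endpoint URL, payload shape, and Authorization header format here are assumptions for illustration only; the API Reference documentation is the authoritative source for the actual contract:

```python
import json
import urllib.request

# ASSUMPTION: endpoint and payload shape are placeholders for
# illustration -- consult the API Reference for the real values.
SYNC_URL = "https://api.thehive.ai/api/v2/task/sync"

def build_task_request(api_key, text):
    """Build (but do not send) a synchronous task submission request."""
    payload = json.dumps({"text_data": text}).encode("utf-8")
    return urllib.request.Request(
        SYNC_URL,
        data=payload,
        headers={
            "Authorization": f"Token {api_key}",   # key copied from the project page
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending would be: urllib.request.urlopen(build_task_request(key, text))
```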