AutoML for Text Classification
Build custom deep learning models for text classification tasks
Hive AutoML for text classification can be used to sort and tag text content by topic, tone, and more in order to better search text libraries. Our AutoML tool can also be used to add new custom content moderation classes not otherwise offered through our Text Moderation API.
Creating a New Training
To start building your model, head to our AutoML platform and select the "Create New Project" button in the top right corner of the screen. You will be brought to a project setup page where you will be prompted to enter a project name and description. Below the project description field, click the "Upload CSV" button to upload your training dataset.
Your data must be uploaded as a CSV file. One column, titled "text_data", should contain the text data, and every other column represents a model head (a classification category). The values within each row of a given column are the classes (possible classifications) within that head. An example of this formatting is shown below:
|satire_head|topic_head|text_data|
|---|---|---|
|no|entertainment|Aerosmith to ‘Peace Out’ after 50 years with farewell tour|
|yes|sports|Panicked Mel Kiper realizes he left NFL draft big board in Uber|
|no|health|Your pollen allergies are overwhelming? This might be why|
|yes|entertainment|Nostalgic woman hopes ‘Barbie’ movie lives up to girlhood body dysmorphia|
|yes|crime|Police feel bad after easily solving series of riddles serial killer obviously put a lot of work into|
The uploaded data file must include text data as a column titled “text_data” or the training will result in an error.
In the example shown above, the data in the "text_data" column includes various news headlines. The two model heads are "satire_head" and "topic_head". The classes within "satire_head" are "yes" and "no", while the classes within "topic_head" are "entertainment", "sports", "health", and "crime".
In order to be processed correctly, the data file must satisfy the following requirements:
- Dataset file must be in CSV format.
- CSV must have a header row (a row of column names above the actual data).
- The header row must contain a column for the text data titled "text_data".
- Each piece of data in the "text_data" column must be less than 1024 characters long. Any characters past that limit will not be processed.
- Column names cannot include the "-" character. All other characters are allowed.
- Column names cannot include Python reserved words (keywords). A full list of these words can be found in the Python documentation.
- Each CSV file cannot contain more than 10,000 rows.
- Each CSV file can contain up to 20 heads (column headers apart from "text_data").
- For each head, there must be at least two classes. For example, the head “topic” can have the classes “sports,” “music,” and “movies” but cannot have just “sports.”
- Every class must appear in at least one row in the CSV.
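A dataset can be checked against these requirements locally before uploading. The sketch below uses only the Python standard library; the function name and the truncation warning are our own additions for illustration, not part of the platform:

```python
import csv
import keyword

MAX_TEXT_LEN = 1024
MAX_ROWS = 10_000
MAX_HEADS = 20

def validate_training_csv(path):
    """Check a text-classification CSV against the documented upload rules."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        cols = reader.fieldnames or []
        assert "text_data" in cols, "CSV must have a 'text_data' column"
        heads = [c for c in cols if c != "text_data"]
        assert len(heads) <= MAX_HEADS, "at most 20 heads allowed"
        for h in heads:
            assert "-" not in h, f"head name {h!r} contains '-'"
            assert not keyword.iskeyword(h), f"head name {h!r} is a Python keyword"
        classes = {h: set() for h in heads}
        n_rows = 0
        for row in reader:
            n_rows += 1
            if len(row["text_data"]) >= MAX_TEXT_LEN:
                print(f"warning: row {n_rows} text will be truncated")
            for h in heads:
                classes[h].add(row[h])
        assert n_rows <= MAX_ROWS, "at most 10,000 rows allowed"
        for h, cs in classes.items():
            assert len(cs) >= 2, f"head {h!r} needs at least two classes"
    return n_rows, classes
```

Running this before upload surfaces formatting errors immediately instead of after a failed training run.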
For your test dataset, you can choose to either upload a separate test dataset or split off a random section of your training dataset to use instead. If you choose to upload a separate test dataset, this dataset must also satisfy all of the file requirements listed above and must contain the same heads and classes as your training dataset. If you choose to split off a section of your training dataset, you will be able to choose the percentage of that dataset that you would like to use for testing.
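If you would rather prepare the separate test file yourself, a random split like the one the platform performs can be sketched with the standard library (the file path, fraction, and seed here are illustrative, not platform defaults):

```python
import csv
import random

def split_csv(path, test_fraction=0.2, seed=0):
    """Randomly split a training CSV into train and test row lists.

    The header row is kept in both outputs so each can be written back
    out as a valid dataset file.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    random.Random(seed).shuffle(data)  # fixed seed for a reproducible split
    cut = int(len(data) * test_fraction)
    test, train = data[:cut], data[cut:]
    return [header] + train, [header] + test
```

Note that a separate test file produced this way must still meet every requirement listed above, including containing the same heads and classes as the training set.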
Evaluating Model Performance
After model training is complete, viewing the page for that project will provide various metrics in order to help you evaluate the performance of your model. At the top of the page you will be able to select the head and, if desired, the class that you would like to evaluate. Use the slider to control the confidence threshold. Once selected, you will see the precision, recall, and balanced accuracy. Below that, you can view the precision/recall curve (P/R curve) as well as a confusion matrix that shows how many predictions were correct and incorrect per class.
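For reference, the per-class precision and recall shown on the evaluation page follow the standard definitions, which can be computed from any set of predictions (the labels below are made up for illustration):

```python
def precision_recall(y_true, y_pred, positive):
    """Per-class precision and recall for one class treated as positive.

    precision = TP / (TP + FP): of everything predicted as the class,
    how much was correct. recall = TP / (TP + FN): of everything truly
    in the class, how much was found.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the confidence threshold generally trades recall for precision, which is the trade-off the P/R curve visualizes.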
If you would like to retrain your model based on these metrics, go back to your AutoML Training Projects page and start a new project.
Deploying Model With Hive Data
When you’re happy with your model and ready to deploy it, select the project and click the “Create Deployment” button in the top right corner. The project’s status will shift to “Deploying.” The deployment may take a few minutes.
After the deploy status shows as “Complete,” the project page will have a “View Deployment” button in the top right corner. Clicking this button will open the project on Hive Data, where you will be able to upload tasks, view tasks, and access your API key as you would with any other Hive Data project.
To begin using your custom-built API, click on the “API Key” button on the top right of the Hive Data project page to copy your API Key. For instructions on how to submit a task via API, either synchronously or asynchronously, see our API Reference documentation.
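As a rough sketch of what a synchronous submission might look like from Python, the snippet below uses only the standard library. The endpoint URL, header format, and payload field are assumptions for illustration; confirm the exact values against the API Reference documentation before use:

```python
import json
import urllib.request

API_KEY = "your-project-api-key"  # copied from the Hive Data project page

def classify_text(text):
    """Submit one text sample synchronously (endpoint/payload are assumed)."""
    req = urllib.request.Request(
        "https://api.thehive.ai/api/v2/task/sync",  # assumed endpoint; verify in the API Reference
        data=json.dumps({"text_data": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # model predictions per head
```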