Snapshots
A quick guide to creating and using snapshots
Overview
A snapshot is an immutable point-in-time version of a dataset that is used to train an AutoML model. Multiple snapshots can be created from the same dataset and multiple models can be trained from the same snapshot.
Create a Snapshot
To create a snapshot, open a dataset’s Snapshots tab and click the Create Snapshot button. The snapshot creation form requires a snapshot type, an input column, and one or more label columns (or, for LLMs, a prompt column and a completion column). The form also has optional inputs for splitting and filtering the snapshot.
Figure: The Create Snapshot button lies at the top right of a dataset's Snapshots tab.
For more information on snapshot creation and snapshot types, see the Snapshot Requirements section below.
Snapshot Requirements
Snapshot Type | Description | Supported Models |
---|---|---|
Text Classification | Snapshot that can be used to train any text classification model. | 1. Text Classification v2 2. Text Moderation v2 3. DeBERTa v3 4. Longformer v1 |
Image Classification | Snapshot that can be used to train any image classification model. | 1. Image Classification v2 2. Visual Moderation v2 |
Large Language Model | Snapshot that can be used to train any large language model. | 1. LLM Instruct 8B v3 2. LLM Instruct 70B v3 |
Backup | Snapshot that can be used to restore a dataset but NOT to train a model. | -- |
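As an illustration, the table above can be expressed as a small lookup. This is only a sketch: `SUPPORTED_MODELS` and `can_train` are hypothetical names chosen for this example and are not part of any documented API.

```python
# Hypothetical lookup transcribed from the table above (not an official API).
SUPPORTED_MODELS = {
    "Text Classification": [
        "Text Classification v2",
        "Text Moderation v2",
        "DeBERTa v3",
        "Longformer v1",
    ],
    "Image Classification": ["Image Classification v2", "Visual Moderation v2"],
    "Large Language Model": ["LLM Instruct 8B v3", "LLM Instruct 70B v3"],
    "Backup": [],  # backup snapshots restore a dataset but cannot train a model
}


def can_train(snapshot_type, model_name):
    """Return True if the table lists model_name for the given snapshot type."""
    return model_name in SUPPORTED_MODELS.get(snapshot_type, [])


print(can_train("Text Classification", "DeBERTa v3"))
print(can_train("Backup", "DeBERTa v3"))
```

A plain dictionary keeps the mapping easy to update when the supported model list changes.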
Text Classification
Column | Description | Requirements | Example |
---|---|---|---|
Text Input* | Text data that will be classified by the model using the Labels column | 1. Must be Text column type | “The movie was one of the best I’ve seen in the past year, hands down.” |
Labels* | Class labels that the model will assign to text inputs | 1. Must have between 2 and 20 unique values 2. Max 512 tokens per row | “positive” |
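The requirements above could be checked locally before creating a snapshot, roughly as follows. This is a minimal sketch under stated assumptions: `check_text_classification_rows` and `approx_token_count` are hypothetical helpers, whitespace splitting stands in for the platform's tokenizer (which is not described here), and the 512-token limit is applied to the label value because the table lists it in the Labels row.

```python
def approx_token_count(text):
    # Assumption: whitespace splitting as a rough proxy for the real tokenizer.
    return len(text.split())


def check_text_classification_rows(rows, min_classes=2, max_classes=20, max_tokens=512):
    """rows is a list of (text_input, label) pairs; returns a list of problems."""
    problems = []
    unique_labels = {label for _, label in rows}
    if not min_classes <= len(unique_labels) <= max_classes:
        problems.append(
            f"expected {min_classes} to {max_classes} unique labels, "
            f"found {len(unique_labels)}"
        )
    for index, (text, label) in enumerate(rows):
        if not isinstance(text, str):
            problems.append(f"row {index}: Text Input must be text")
        if approx_token_count(str(label)) > max_tokens:
            problems.append(f"row {index}: label exceeds {max_tokens} tokens")
    return problems


rows = [
    ("The movie was one of the best I've seen in the past year.", "positive"),
    ("I walked out halfway through.", "negative"),
]
print(check_text_classification_rows(rows))  # prints []
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a dataset in a single pass.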
Image Classification
Column | Description | Requirements | Example |
---|---|---|---|
Image Input* | Images that will be classified by the model using the Labels column | 1. Must be Image column type | https://www.link-to-example-images.com/my_dog |
Labels* | Class labels that the model will assign to image inputs | 1. Must have between 2 and 20 unique values 2. Max 512 tokens per row | "dog" |
Large Language Model
Column | Description | Requirements | Example |
---|---|---|---|
Prompt* | Text data that will be paired with the associated value in the Completion column to train the model to generate text output | 1. Must be Text column type | “What does Hive do?” |
Completion* | Data that is expected to be generated for the associated value in the Prompt column | 1. Max 4096 tokens per row | “Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations.” |
System Prompt | Guidance text that will be prefixed to each Prompt to give the model additional context on the desired output. | 1. Must be Text column type 2. Max 4096 tokens per row | “Please use a clear, concise, and professional tone to answer the following question.” |
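A per-row check for the LLM requirements above might look like the sketch below. `check_llm_row` and `approx_token_count` are hypothetical names, and whitespace splitting is only an assumed stand-in for the real tokenizer, so actual token counts on the platform may differ.

```python
MAX_TOKENS = 4096  # limit stated in the table for Completion and System Prompt


def approx_token_count(text):
    # Assumption: whitespace splitting as a rough proxy for the real tokenizer.
    return len(text.split())


def check_llm_row(prompt, completion, system_prompt=None):
    """Return a list of problems for one Prompt/Completion row."""
    problems = []
    # Prompt is required and must be text.
    if not isinstance(prompt, str) or not prompt:
        problems.append("Prompt is required and must be text")
    # Completion is required and limited to MAX_TOKENS tokens.
    if not isinstance(completion, str) or not completion:
        problems.append("Completion is required")
    elif approx_token_count(completion) > MAX_TOKENS:
        problems.append(f"Completion exceeds {MAX_TOKENS} tokens")
    # System Prompt is optional; when present it must be text within the limit.
    if system_prompt is not None:
        if not isinstance(system_prompt, str):
            problems.append("System Prompt must be text")
        elif approx_token_count(system_prompt) > MAX_TOKENS:
            problems.append(f"System Prompt exceeds {MAX_TOKENS} tokens")
    return problems


print(check_llm_row("What does Hive do?", "Hive provides cloud-based AI solutions."))
```

Because the System Prompt column is optional, the check only runs when a value is supplied.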
Backup
There are no validations or requirements for backup snapshots.