Snapshots

A quick guide to creating and using snapshots

Overview

A snapshot is an immutable point-in-time version of a dataset that is used to train an AutoML model. Multiple snapshots can be created from the same dataset, and multiple models can be trained from the same snapshot.
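The "immutable point-in-time" idea can be illustrated with a small Python sketch. This is not the platform's implementation: the `Snapshot` class and `take_snapshot` helper are hypothetical names used only to show that a snapshot is a frozen copy that later dataset edits do not change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot: a frozen copy of dataset rows."""
    dataset_name: str
    rows: tuple  # tuple of tuples, so the copied rows cannot be mutated


def take_snapshot(dataset_name, rows):
    # Copy every row into an immutable tuple at the moment of the call.
    return Snapshot(dataset_name, tuple(tuple(row) for row in rows))


dataset = [["great movie", "positive"], ["dull plot", "negative"]]
first = take_snapshot("reviews", dataset)

# Later edits to the dataset do not affect the earlier snapshot.
dataset.append(["fine, I guess", "neutral"])
second = take_snapshot("reviews", dataset)

print(len(first.rows), len(second.rows))  # prints 2 3
```

Both snapshots come from the same dataset, and either one could feed any number of training runs without changing.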

Create a Snapshot

Create a snapshot by navigating to a dataset’s Snapshots tab and clicking the Create Snapshot button. The snapshot creation form requires a snapshot type, an input column, and one or more label columns (or prompt and completion columns in the case of LLMs). Additionally, there are optional snapshot split and filtering inputs.

The Create Snapshot button lies at the top right of a dataset's Snapshots tab.

For more information on snapshot creation and snapshot types, see the Snapshot Requirements section below.

Snapshot Requirements

| Snapshot Type | Description | Supported Models |
| --- | --- | --- |
| Text Classification | Snapshot that can be used to train any text classification model. | 1. Text Classification v2; 2. Text Moderation v2; 3. DeBERTa v3; 4. Longformer v1 |
| Image Classification | Snapshot that can be used to train any image classification model. | 1. Image Classification v2; 2. Visual Moderation v2 |
| Large Language Model | Snapshot that can be used to train any large language model. | 1. LLM Instruct 8B v3; 2. LLM Instruct 70B v3 |
| Backup | Snapshot that can be used to restore a dataset but NOT to train a model. | -- |
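The mapping from snapshot type to supported models can be written as a lookup. This is a minimal sketch for reference only: the `SUPPORTED_MODELS` dictionary and `can_train` function are hypothetical names, not part of any platform API, and the model names are copied from the table above.

```python
# Snapshot types and the models each one can train, as listed in the table.
SUPPORTED_MODELS = {
    "Text Classification": [
        "Text Classification v2",
        "Text Moderation v2",
        "DeBERTa v3",
        "Longformer v1",
    ],
    "Image Classification": ["Image Classification v2", "Visual Moderation v2"],
    "Large Language Model": ["LLM Instruct 8B v3", "LLM Instruct 70B v3"],
    "Backup": [],  # backup snapshots restore a dataset; they train nothing
}


def can_train(snapshot_type, model_name):
    """Return True if the given snapshot type supports the given model."""
    return model_name in SUPPORTED_MODELS.get(snapshot_type, [])


print(can_train("Text Classification", "DeBERTa v3"))  # prints True
print(can_train("Backup", "DeBERTa v3"))               # prints False
```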

Text Classification

| Column | Description | Requirements | Example |
| --- | --- | --- | --- |
| Text Input* | Text data that will be classified by the model using the Labels column | 1. Must be Text column type | “The movie was one of the best I’ve seen in the past year, hands down.” |
| Labels* | Class labels that the model will assign to text inputs | 1. Must have between 2 and 20 unique values; 2. Max 512 tokens per row | “positive” |
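Before uploading, it may help to check data locally against the requirements above. The sketch below is an assumption-laden illustration, not the platform's validator: `validate_text_classification` is a hypothetical helper, rows are assumed to be `(text, label)` pairs, and the 512-token limit is not checked because it depends on the platform's tokenizer, which is not described here.

```python
def validate_text_classification(rows, min_labels=2, max_labels=20):
    """Return a list of problems found in (text, label) rows; empty if none."""
    problems = []
    # The text input must be text (a stand-in for the Text column type).
    for index, (text, _label) in enumerate(rows):
        if not isinstance(text, str):
            problems.append(f"row {index}: text input is not a string")
    # The label column must contain between 2 and 20 unique values.
    unique_labels = {label for _text, label in rows}
    if not min_labels <= len(unique_labels) <= max_labels:
        problems.append(
            f"found {len(unique_labels)} unique labels; "
            f"expected {min_labels} to {max_labels}"
        )
    return problems


rows = [
    ("The movie was one of the best I've seen in the past year.", "positive"),
    ("I walked out halfway through.", "negative"),
]
print(validate_text_classification(rows))  # prints []
```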

Image Classification

| Field | Description | Requirements | Example |
| --- | --- | --- | --- |
| Image Input* | Images that will be classified by the model using the Labels column | 1. Must be Image column type | https://www.link-to-example-images.com/my_dog |
| Labels* | Class labels that the model will assign to image inputs | 1. Must have between 2 and 20 unique values; 2. Max 512 tokens per row | “dog” |

Large Language Model

| Field | Description | Requirements | Example |
| --- | --- | --- | --- |
| Prompt* | Text data that will be paired with the associated value in the Completion column to train the model to generate text output | 1. Must be Text column type | “What does Hive do?” |
| Completion* | Data that is expected to be generated for the associated value in the Prompt column | 1. Max 4096 tokens per row | “Hive is the leading provider of cloud-based AI solutions to understand, search, and generate content, and is trusted by hundreds of the world's largest and most innovative organizations.” |
| System Prompt | Guidance text that will be prefixed to each Prompt to give the model additional context on the desired output. | 1. Must be Text column type; 2. Max 4096 tokens per row | “Please use a clear, concise, and professional tone to answer the following question.” |
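The relationship between the three fields can be sketched as follows. This is illustrative only: `build_training_pair` is a hypothetical helper, the dictionary keys are assumed names, and the separator placed between the system prompt and the prompt is an assumption, since the exact formatting the platform applies is not specified here. Token limits are not checked because they depend on the platform's tokenizer.

```python
def build_training_pair(row, separator="\n\n"):
    """Return (model_input, expected_output) for one dataset row.

    Assumption: the system prompt, when present, is joined to the prompt
    with a blank line. The platform's actual formatting may differ.
    """
    prompt = row.get("prompt")
    completion = row.get("completion")
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt is required and must be text")
    if not isinstance(completion, str) or not completion:
        raise ValueError("completion is required")
    system_prompt = row.get("system_prompt")  # optional field
    if system_prompt:
        model_input = f"{system_prompt}{separator}{prompt}"
    else:
        model_input = prompt
    return model_input, completion


row = {
    "system_prompt": "Please use a clear, concise, and professional tone.",
    "prompt": "What does Hive do?",
    "completion": "Hive provides cloud-based AI solutions.",
}
model_input, expected_output = build_training_pair(row)
print(model_input)
```

A row without a `system_prompt` key passes through with the prompt unchanged.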

Backup

There are no validations or requirements for backup snapshots.

