Snapshots

A quick guide to creating and using snapshots

Overview

A snapshot is a point-in-time export of a dataset which can be used to train models or create embeddings. After a snapshot is created, its contents cannot be changed. A single snapshot can be used to create multiple models or embeddings.

Create a Snapshot

To create a snapshot, go to the dataset detail page and click Create Snapshot.

The `Create Snapshot` button lies at the top right of the detail page for an individual dataset.

On the Create Snapshot form, the first field is the Snapshot Type. This field indicates how you plan to use the snapshot: it determines which models can be trained from the snapshot, or whether the snapshot can be used to create an embedding. Each type is explained in the table below:

| Snapshot Type | Description | Can Create |
| --- | --- | --- |
| No validations | Snapshot can be used to restore a previous state. | Version of your dataset |
| Text Classification | Snapshot can be used to train a model that categorizes text into multiple classes. | Text Classification model |
| Image Classification | Snapshot can be used to train a model that categorizes images into multiple classes. | Image Classification model |
| Large Language Model | Snapshot can be used to train a model that generates custom text from prompts. | Large Language Model |
| Embedding | Snapshot can be used to create an embedding. | Embedding |

Once the Snapshot Type is selected, additional fields may appear that are specific to the type.

🚧 Only datasets created from txt files can be used to create Embedding type snapshots.

Snapshot Requirements

The requirements for a snapshot depend on the type you select. For No validations and Embedding types, there are no additional requirements beyond the Dataset Requirements. For the other types, please see the tables below for a full list of snapshot requirements (required columns are marked with an *).

Text Classification

| Column | Description | Requirements | Example Row Value |
| --- | --- | --- | --- |
| Text Input* | This is the textual data that will be assigned a label or category in your Labels column. | 1. Must be a "Text" column type. 2. Should have more than 20 unique values. 3. Each row must contain at most 512 tokens. | "The movie was one of the best ones I've seen in the past year, hands down." |
| Labels* | These are the class labels your model will learn to predict. | 1. Must be a "Text" column type. 2. Must have between 2 and 20 unique values. 3. Each row must contain at most 512 tokens. | "Positive Sentiment" |
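
The Text Classification requirements above can be checked locally before a snapshot is created. The following is a minimal sketch, not part of the product: the function name `check_text_classification`, the row format (a list of dictionaries), and the column names `text` and `label` are all hypothetical. It also assumes that splitting on whitespace is an acceptable stand-in for token counting, since the actual tokenizer is not specified here.

```python
def check_text_classification(rows, text_col, label_col,
                              max_tokens=512, count_tokens=None):
    """Return (errors, warnings) for a list of row dictionaries.

    "Must" requirements are reported as errors, "Should" requirements
    as warnings.
    """
    if count_tokens is None:
        # Assumption: whitespace splitting approximates the real tokenizer.
        count_tokens = lambda value: len(value.split())

    errors, warnings = [], []
    texts = [row[text_col] for row in rows]
    labels = [row[label_col] for row in rows]

    # Text Input should have more than 20 unique values.
    if len(set(texts)) <= 20:
        warnings.append("Text Input should have more than 20 unique values")

    # Labels must have between 2 and 20 unique values.
    if not 2 <= len(set(labels)) <= 20:
        errors.append("Labels must have between 2 and 20 unique values")

    # Each row must contain at most max_tokens tokens in both columns.
    for index, row in enumerate(rows):
        for col in (text_col, label_col):
            if count_tokens(row[col]) > max_tokens:
                errors.append(f"row {index}: {col} exceeds {max_tokens} tokens")

    return errors, warnings


# Usage with 25 small, hypothetical rows and two labels.
rows = [
    {"text": f"review number {i}",
     "label": "Positive Sentiment" if i % 2 else "Negative Sentiment"}
    for i in range(25)
]
errors, warnings = check_text_classification(rows, "text", "label")
print(errors, warnings)
```

If the real tokenizer is available, it can be passed in through the `count_tokens` parameter in place of the whitespace approximation.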

Image Classification

| Field | Description | Requirements | Example Row Value |
| --- | --- | --- | --- |
| Image Input* | This is the image data that will be assigned a label or category in your Labels column. | 1. Must be an "Image" column type. 2. Should have more than 20 unique values. | https://exampleimage.com/image1 |
| Labels* | These are the class labels your model will learn to predict. | 1. Must be a "Text" column type. 2. Must have between 2 and 20 unique values. 3. Each row must contain at most 512 tokens. | "Dog" |

Large Language Model

| Field | Description | Requirements | Example Row Value |
| --- | --- | --- | --- |
| Prompt* | Example prompts provided to the model that, along with the associated completions in the Completion column, will teach the model to generate coherent text responses. | 1. Must be a "Text" column type. 2. Should have more than 20 unique values. 3. Each row must contain at most 4096 tokens. | How old is Le Corbusier? |
| Completion* | The text you expect to be generated as output for each prompt in the Prompt column. | 1. Must be a "Text" column type. 2. Each row must contain at most 4096 tokens. | Le Corbusier, born Charles-Édouard Jeanneret, was a Swiss-French architect... |
| System Prompt | Guidance text that will be prefixed to each prompt. It further informs the model what kind of output is desired. | 1. Must be a "Text" column type. 2. Each row must contain at most 4096 tokens. | Please answer in a formal, academic tone. |
| Roles | The roles that the customer and the model should assume for each prompt. | 1. Must be a "JSON" column type with the structure shown to the right. An additional example is shown below this table. 2. Each row must contain at most 4096 tokens. | { "user": "Bob", "model": "Computer" } |
| Prompt History | Past interactions between the model and the customer, which give additional context for the current prompt. | 1. Must be a "JSON" column type with the structure shown to the right. An additional example is shown below this table. 2. Each row must contain at most 4096 tokens. | [ { "content": "Who are some French architects?", "role": "Bob" }, { "content": "Here are some notable French architects throughout history: 1) Le Corbusier...", "role": "Computer" } ] |

Additional example of a Roles value:

{
  "user": "Harry Potter",
  "model": "Voldemort"
}

Additional example of a Prompt History value:

[
  // Arranged from oldest to newest.
  // The "role" must match the values in the "roles" column.
  // The newest interaction must come from the "model".
  {
    "content": "Who are you?",
    "role": "Harry Potter"
  },
  {
    "content": "You know who I am, he who must not be named.",
    "role": "Voldemort"
  }
]
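
The rules stated in the comments above (each "role" must match a value in the Roles column, and the newest interaction must come from the model) can be checked with a short script. This is a minimal sketch under stated assumptions: the function name `check_prompt_history` is hypothetical, and both values are assumed to arrive as JSON strings. The `//` comments in the example above are explanatory only and are left out of the data here, because Python's standard `json` parser rejects comments.

```python
import json


def check_prompt_history(roles_json, history_json):
    """Return a list of problems found in one row's Roles and Prompt History."""
    roles = json.loads(roles_json)
    history = json.loads(history_json)
    problems = []

    # The Roles value needs both a "user" and a "model" entry.
    missing = {"user", "model"} - set(roles)
    if missing:
        problems.append(f"roles is missing keys: {sorted(missing)}")
        return problems

    # Every interaction must use one of the two declared role names.
    allowed = {roles["user"], roles["model"]}
    for index, turn in enumerate(history):
        if turn.get("role") not in allowed:
            problems.append(f"interaction {index}: unknown role {turn.get('role')!r}")

    # The history is ordered oldest to newest, so the last entry is the newest.
    if history and history[-1].get("role") != roles["model"]:
        problems.append("newest interaction must come from the model role")

    return problems


# Usage with the example values from this page.
roles_json = '{"user": "Harry Potter", "model": "Voldemort"}'
history_json = json.dumps([
    {"content": "Who are you?", "role": "Harry Potter"},
    {"content": "You know who I am, he who must not be named.", "role": "Voldemort"},
])
problems = check_prompt_history(roles_json, history_json)
print(problems)
```

The script cannot confirm that interactions are truly in chronological order; it only checks the role names and which role the final entry uses.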
