Snapshots
A quick guide to creating and using snapshots
Overview
A snapshot is a point-in-time export of a dataset that can be used to train models or create embeddings. After a snapshot is created, its contents cannot be changed. A single snapshot can be used to create multiple models or embeddings.
Create a Snapshot
To create a snapshot, go to the dataset detail page and click Create Snapshot.
On the Create Snapshot form, the first field is the Snapshot Type. This field indicates how you plan to use the snapshot. The Snapshot Type determines which models can be trained from the snapshot, or whether the snapshot can be used to create an embedding. Each type is explained in the table below:
Snapshot Type | Description | Can Create |
---|---|---|
No validations | Snapshot can be used to restore a previous state. | Version of your dataset |
Text Classification | Snapshot can be used to train a model that categorizes text into multiple classes. | Text Classification model |
Image Classification | Snapshot can be used to train a model that categorizes images into multiple classes. | Image Classification model |
Large Language Model | Snapshot can be used to train a model that generates custom text from prompts. | Large Language Model |
Embedding | Snapshot can be used to create an embedding. | Embedding |
Once the Snapshot Type is selected, additional fields may appear that are specific to the type.
Only datasets created from txt files can be used to create Embedding type snapshots.
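The type table can be read as a lookup from snapshot type to what the snapshot can create. The sketch below encodes that lookup in Python, along with the txt-only restriction on Embedding snapshots. The names `SNAPSHOT_OUTPUTS`, `can_create`, and `embedding_allowed` are hypothetical and not part of any product API, and inferring the source file type from the filename extension is an assumption made for illustration.

```python
# Illustrative only: these names are hypothetical, not a product API.
SNAPSHOT_OUTPUTS = {
    "No validations": "Version of your dataset",
    "Text Classification": "Text Classification model",
    "Image Classification": "Image Classification model",
    "Large Language Model": "Large Language Model",
    "Embedding": "Embedding",
}


def can_create(snapshot_type: str) -> str:
    """Return what a snapshot of the given type can be used to create."""
    try:
        return SNAPSHOT_OUTPUTS[snapshot_type]
    except KeyError:
        raise ValueError(f"Unknown snapshot type: {snapshot_type!r}") from None


def embedding_allowed(source_filenames: list[str]) -> bool:
    """Check the rule that Embedding snapshots need a dataset built from txt files.

    Assumption: the source file type is inferred from the filename extension.
    """
    return bool(source_filenames) and all(
        name.lower().endswith(".txt") for name in source_filenames
    )
```

For example, `can_create("Text Classification")` returns the matching entry from the table, and `embedding_allowed(["data.csv"])` is false because the dataset was not created from txt files.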
Snapshot Requirements
The requirements for a snapshot depend on the type you select. For No validations and Embedding types, there are no additional requirements beyond the Dataset Requirements. For the other types, see the tables below for a full list of snapshot requirements (required columns are marked with an *).
Text Classification
Column | Description | Requirements | Example Row Value |
---|---|---|---|
Text Input* | This is the textual data that will be assigned a label or category in your Labels column. | 1. Must be a "Text" column type. 2. Should have more than 20 unique values. 3. Each row must contain at most 512 tokens. | "The movie was one of the best ones I've seen in the past year, hands down." |
Labels* | These are the class labels your model will learn to predict. | 1. Must be a "Text" column type. 2. Must have between 2 and 20 unique values. 3. Each row must contain at most 512 tokens. | "Positive Sentiment" |
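Before creating a Text Classification snapshot, it can help to check a dataset against the requirements in the table. The following is a minimal sketch under stated assumptions, not the platform's own validation: the function names are hypothetical, "Must" rules are treated as errors and "Should" rules as warnings, and tokens are approximated by whitespace splitting because the tokenizer is not specified here.

```python
# Illustrative sketch; function and constant names are hypothetical.
MAX_TOKENS = 512        # per-row limit from the table
MIN_UNIQUE_INPUTS = 20  # Text Input should have more than 20 unique values
LABEL_RANGE = (2, 20)   # Labels must have between 2 and 20 unique values


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the real tokenizer,
    # which is not specified in this guide.
    return len(text.split())


def check_text_classification(rows: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a list of (text_input, label) rows."""
    errors: list[str] = []
    warnings: list[str] = []

    for i, (text, label) in enumerate(rows):
        if count_tokens(text) > MAX_TOKENS:
            errors.append(f"row {i}: Text Input exceeds {MAX_TOKENS} tokens")
        if count_tokens(label) > MAX_TOKENS:
            errors.append(f"row {i}: Labels value exceeds {MAX_TOKENS} tokens")

    unique_labels = len({label for _, label in rows})
    if not LABEL_RANGE[0] <= unique_labels <= LABEL_RANGE[1]:
        errors.append(
            f"Labels has {unique_labels} unique values; "
            f"expected {LABEL_RANGE[0]} to {LABEL_RANGE[1]}"
        )

    unique_texts = len({text for text, _ in rows})
    if unique_texts <= MIN_UNIQUE_INPUTS:
        warnings.append(
            f"Text Input has {unique_texts} unique values; "
            f"more than {MIN_UNIQUE_INPUTS} is recommended"
        )

    return errors, warnings
```

A dataset passes this sketch when both returned lists are empty. The column type requirement ("Text") is not checked here, since it depends on how the dataset stores column types.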
Image Classification
Column | Description | Requirements | Example Row Value |
---|---|---|---|
Image Input* | This is the image data that will be assigned a label or category in your Labels column. | 1. Must be an "Image" column type. 2. Should have more than 20 unique values. | https://exampleimage.com/image1 |
Labels* | These are the class labels your model will learn to predict. | 1. Must be a "Text" column type. 2. Must have between 2 and 20 unique values. 3. Each row must contain at most 512 tokens. | "Dog" |
Large Language Model
Column | Description | Requirements | Example Row Value |
---|---|---|---|
Prompt* | Example prompts provided to the model that, along with the associated completions in the Completion column, will teach the model to generate coherent text responses. | 1. Must be a "Text" column type. 2. Should have more than 20 unique values. 3. Each row must contain at most 4096 tokens. | How old is Le Corbusier? |
Completion* | The text you expect to be generated as output for each prompt in the Prompt column. | 1. Must be a "Text" column type. 2. Each row must contain at most 4096 tokens. | Le Corbusier, born Charles-Édouard Jeanneret, was a Swiss-French architect... |
System Prompt | Guidance text that will be prefixed to each prompt. It helps to further inform the model what kind of output is desired. | 1. Must be a "Text" column type. 2. Each row must contain at most 4096 tokens. | Please answer in a formal, academic tone. |
Roles | The roles that the customer and the model assume for each prompt. | 1. Must be a "JSON" column type with the structure shown to the right. An additional example is shown below this table. 2. Each row must contain at most 4096 tokens. | { "user": "Bob", "model": "Computer" } |
Prompt History | Past interactions between the model and the customer that give additional context for the current prompt. | 1. Must be a "JSON" column type with the structure shown to the right. An additional example is shown below this table. 2. Each row must contain at most 4096 tokens. | [ { "content": "Who are some French architects?", "role": "Bob" }, { "content": "Here are some notable French architects throughout history: 1) Le Corbusier...", "role": "Computer" } ] |
Example Roles value:
{
  "user": "Harry Potter",
  "model": "Voldemort"
}
Example Prompt History value:
[
  // Arranged from oldest to newest.
  // The "role" must match the values in the "roles" column.
  // The newest interaction must come from the "model".
  {
    "content": "Who are you?",
    "role": "Harry Potter"
  },
  {
    "content": "You know who I am, he who must not be named.",
    "role": "Voldemort"
  }
]
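The rules noted in the Prompt History example (each "role" must match a value from the Roles column, and the newest interaction must come from the model) can be checked before a snapshot is created. The sketch below is one way to do that in Python. The function name `check_prompt_history` is hypothetical, and requiring exactly the keys shown in the examples is an assumption; ordering from oldest to newest cannot be verified without timestamps, so it is not checked.

```python
import json


def check_prompt_history(roles: dict, history: list) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: list[str] = []

    # Assumption: a Roles value has exactly the keys shown in the examples.
    if set(roles) != {"user", "model"}:
        return ['roles must have exactly the keys "user" and "model"']

    allowed = set(roles.values())
    for i, turn in enumerate(history):
        if set(turn) != {"content", "role"}:
            problems.append(f'entry {i}: expected keys "content" and "role"')
        elif turn["role"] not in allowed:
            problems.append(f"entry {i}: role {turn['role']!r} is not in the Roles value")

    # The newest (last) interaction must come from the model.
    if history and history[-1].get("role") != roles["model"]:
        problems.append("the newest entry must come from the model role")

    return problems


roles = json.loads('{"user": "Harry Potter", "model": "Voldemort"}')
history = json.loads("""
[
  {"content": "Who are you?", "role": "Harry Potter"},
  {"content": "You know who I am, he who must not be named.", "role": "Voldemort"}
]
""")
print(check_prompt_history(roles, history))  # prints []
```

Standard JSON does not allow comments, so the `//` annotations in the example would need to be removed before a value is parsed with `json.loads`.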