Embeddings

An introduction to embeddings and how to use them to enhance your LLMs

Overview

An embedding is a vectorized form of a dataset that can be used to look up text data. Embeddings can be queried directly to find text similar to a given input, or used to augment Large Language Models (LLMs).

Create an Embedding

To create an embedding, select the Create New Embedding button at the top right of the Embeddings dashboard page.

Note: Embeddings can only be created from datasets created from .txt files and from snapshots created using the Embedding snapshot type.

Once your embedding is activated, you are automatically taken to its detail page. There you can view all information associated with the embedding, including any deployments that have been augmented with it.

Embedding Options

An embedding is a collection of documents, where each document is an excerpt of the original dataset. When searching an embedding, the most relevant documents are returned.
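The lookup described above can be pictured with a minimal sketch. It is not the product's implementation: `toy_vector`, `cosine_similarity`, and `search` are hypothetical names, and a simple word-count vector stands in for a real embedding model, purely to show how documents are ranked by similarity to a query.

```python
import math
import re
from collections import Counter

def toy_vector(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(documents, query, top_k=2):
    """Return the top_k documents most similar to the query, best first."""
    query_vec = toy_vector(query)
    scored = [(cosine_similarity(toy_vector(doc), query_vec), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# Each document is an excerpt of the original dataset (made-up examples).
documents = [
    "A refund is processed within five business days.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order number.",
]
results = search(documents, "How do I request a refund?", top_k=2)
for doc in results:
    print(doc)
```

A real embedding model captures meaning rather than exact word overlap, but the ranking step works the same way: score every document against the query and return the highest-scoring ones.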

To turn a snapshot into an embedding, the data is broken into sections, and each section becomes one document. Several chunking options control how the data is split into these sections:

| Option | Description |
| --- | --- |
| Chunking Strategy | The type of semantic break in the document used to group related information together. For example, with the "Markdown" chunking strategy, chunks are recursively split on heading levels until each chunk is smaller than the defined "Chunk Size". |
| Chunk Size | The maximum size, in tokens, of each resulting document in an embedding. |
| Overlap | The number of characters that can be shared between documents. A higher overlap helps avoid documents that begin or end abruptly. |
| Trim Whitespace | Whether spaces, tabs, and other whitespace characters should be removed from each document. |

The form to create an embedding includes several settings that allow you to control the chunking strategy.
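As a rough illustration of how these chunking options interact, the sketch below recursively splits Markdown text on heading levels and falls back to overlapping fixed-size windows when a section is still too large. It rests on several assumptions that are not part of the product: the function names are hypothetical, chunk size is measured in characters rather than tokens to keep the example short, and trimming is taken to mean removing leading and trailing whitespace.

```python
import re

def split_on_headings(text, level):
    """Split text before each Markdown heading of exactly this level."""
    pattern = re.compile(rf"^(?=#{{{level}}} )", flags=re.MULTILINE)
    return [part for part in pattern.split(text) if part.strip()]

def window(text, chunk_size, overlap):
    """Fixed-size windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks, start = [], 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

def chunk_markdown(text, chunk_size, overlap=0, trim_whitespace=True,
                   level=1, max_level=6):
    """Recursively split on heading levels until each chunk fits chunk_size."""
    if len(text) <= chunk_size:
        chunks = [text]
    elif level > max_level:
        # No heading levels left to split on: fall back to overlapping windows.
        chunks = window(text, chunk_size, overlap)
    else:
        chunks = []
        for part in split_on_headings(text, level):
            chunks.extend(chunk_markdown(part, chunk_size, overlap,
                                         False, level + 1, max_level))
    if trim_whitespace:
        chunks = [chunk.strip() for chunk in chunks]
    return [chunk for chunk in chunks if chunk]

sample = (
    "# Guide\n"
    "Intro text.\n"
    "## Setup\n"
    "Install the tool.\n"
    "## Usage\n"
    "Run the tool with default settings.\n"
)
chunks = chunk_markdown(sample, chunk_size=40, overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

In this sample, the first two sections fit within 40 characters and become one document each. The final section is longer, so it is cut into two windows that share five characters, which shows why a larger overlap can make the boundary between documents less abrupt.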

Deactivating Your Embedding

If you no longer want to use your embedding, go to the page for that embedding and select the Deactivate button in the top right corner. Deactivating an embedding is a permanent action and cannot be undone. If you have any deployments associated with an embedding, you will not be able to deactivate that embedding until it is removed from those deployments.

