Embeddings
An introduction to embeddings and how to use them to enhance your LLMs
Overview
An embedding is a vectorized form of a dataset that can be used to look up text data. Embeddings can be queried directly to search for text similar to the provided input, or used to augment Large Language Models.
Create an Embedding
To create an embedding, click the Create New Embedding button on the Embeddings dashboard page.
Please note that embeddings can only be created from datasets created from txt files and snapshots created using the Embedding snapshot type.
Once your embedding is activated, you'll be automatically taken to the detail page for that embedding. Here you can view all information associated with your embedding, including any deployments it has been added to.
Embedding Options
An embedding is a collection of documents, where each document is an excerpt of the original dataset. When searching an embedding, the most relevant documents are returned.
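The idea of returning the most relevant documents can be illustrated with a minimal sketch. It is not the platform's implementation: a real embedding uses vectors produced by a model, whereas this example assumes simple word counts as a stand-in, and the names `toy_vector`, `cosine`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter

def toy_vector(text):
    """Stand-in for a real embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(documents, query, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = toy_vector(query)
    scored = [(cosine(toy_vector(d), q), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]

# Each string plays the role of one document (an excerpt of the dataset).
documents = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order number.",
]
results = search(documents, "how do I get a refund", top_k=1)
print(results)
```

Word counts only match exact words, so "refunds" and "refund" are treated as unrelated here. Model-based vectors are generally used precisely because they capture that kind of similarity.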
To turn a snapshot into an embedding, the data is broken down into sections, and each section becomes one document. Several chunking options control how the data is split into these sections:
Option | Description |
---|---|
Chunking Strategy | This option indicates which type of semantic break in the document should be used to group related information together. For example, with the "Markdown" chunking strategy, chunks are recursively split by heading level until each chunk is smaller than the defined "Chunk Size". |
Chunk Size | This determines the maximum size in tokens of each resulting document in an embedding. |
Overlap | This option indicates how many characters can be shared between consecutive documents. Higher overlaps help avoid documents that start or end abruptly. |
Trim Whitespace | This option indicates that spaces, tabs, and other whitespace characters should be removed from each document. |
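To make the interaction between Chunk Size, Overlap, and Trim Whitespace concrete, here is a rough sketch of a fixed-size chunker. It is an illustration under stated assumptions, not the product's algorithm: the function `chunk_text` is hypothetical, both sizes are measured in characters for simplicity (the table above measures Chunk Size in tokens), trimming is assumed to affect only leading and trailing whitespace, and no chunking strategy such as "Markdown" is applied.

```python
def chunk_text(text, chunk_size=40, overlap=10, trim_whitespace=True):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. Sizes are in
    characters here (a simplifying assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if trim_whitespace:
            # Assumption: only leading/trailing whitespace is removed.
            piece = piece.strip()
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "abcdefghij" * 3  # 30 characters, no whitespace
chunks = chunk_text(sample, chunk_size=10, overlap=2, trim_whitespace=False)
for c in chunks:
    print(repr(c))
```

With an overlap of 2, the last two characters of each chunk reappear at the start of the next one, which is how a larger overlap reduces the chance that a sentence is cut off at a chunk boundary.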
Deactivating Your Embedding
If you no longer want to use your embedding, go to the page for that embedding and select the Deactivate button in the top right corner. Deactivating an embedding is a permanent action and cannot be undone. If you have any deployments associated with an embedding, you will not be able to deactivate that embedding until it is removed from those deployments.
Updated 2 months ago