Large Language Model APIs
A guide to our large language models
Overview
Our large language models (LLMs) generate text in response to a given text prompt. The combined input prompt and output response can total up to 32,768 tokens (about 64,000 characters). The models produce writing across many genres and formats for a wide variety of use cases, including answering questions, writing stories, holding conversations, and programming in multiple programming languages.
Models
We offer a selection of Meta’s open-source Llama Instruct models from the 3.1 and 3.2 series on a self-serve basis, with additional models to follow in the near future.
Here are the differences between our current LLM offerings:
Model | Description |
---|---|
Llama 3.2 1B Instruct | A lightweight, multilingual, instruction-tuned, text-only model small enough to run on both edge and mobile devices. It excels at summarizing and rewriting inputs, as well as at instruction following. We also provide this model in one larger size (3B). |
Llama 3.2 3B Instruct | A lightweight, multilingual, instruction-tuned, text-only model small enough to run on both edge and mobile devices. It excels at summarizing and rewriting inputs, as well as at instruction following. We also provide this model in one smaller size (1B). |
Llama 3.1 8B Instruct | A multilingual, instruction-tuned, text-only model. Compared to other available open-source and closed chat models, the Llama 3.1 instruction-tuned models achieve higher scores on common industry benchmarks. We also provide this model in one larger size (70B). |
Llama 3.1 70B Instruct | A multilingual, instruction-tuned, text-only model. Compared to other available open-source and closed chat models, the Llama 3.1 instruction-tuned models achieve higher scores on common industry benchmarks. We also provide this model in one smaller size (8B). |
Request Format
Below are the input fields for an LLM cURL request. An asterisk (*) next to an input field indicates that it is required.
Input Field | Type | Definition |
---|---|---|
text_data* | string | The main text prompt, describing what the generated response should contain. |
callback_url* | string | When the task is completed, our servers send a callback with the result to this URL. |
system_prompt | string | A string that provides context for how the model should respond to all requests. |
prompt_history | list | The chat history in chronological order, where the last item is the most recent message. Each item must be a dictionary with the keys "content" and "role" (see the example after this table). |
top_p | float | A value between 0 and 1 that controls nucleus sampling. Lower values restrict sampling to the highest-probability tokens, producing less diverse responses; higher values allow more diverse responses. |
temperature | float | A value between 0 and 1. At 0, a repeated request should yield the same response; at 1, repeated requests should yield varied responses. |
max_tokens | int | The maximum number of tokens to generate for the model completion output. |
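For illustration, a prompt_history value following the "content"/"role" structure described above might look like the sketch below. The role names "user" and "assistant" are assumptions based on common chat-API conventions; consult the API reference for the exact values this endpoint expects.

```
[
    {"role": "user", "content": "Who wrote Hamlet?"},
    {"role": "assistant", "content": "Hamlet was written by William Shakespeare."},
    {"role": "user", "content": "When was it first performed?"}
]
```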
Here is an example cURL request using this format:
```
curl --location 'https://api.thehive.ai/api/v1/task/async' \
--header 'Authorization: Token <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "text_data": "<YOUR_PROMPT>",
    "callback_url": "<YOUR_CALLBACK_URL>",
    "options": {
        "system_prompt": "<YOUR_SYSTEM_PROMPT>",
        "prompt_history": <YOUR_PROMPT_HISTORY>,
        "top_p": <YOUR_TOP_P>,
        "temperature": <YOUR_TEMPERATURE>,
        "max_tokens": <YOUR_MAX_TOKENS>
    }
}'
```
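As a concrete sketch, here is the same request with illustrative values filled in. The prompt, sampling settings, callback URL, and role names are assumptions chosen for this example, not recommended defaults.

```
# Illustrative values only; role names follow common chat-API conventions.
curl --location 'https://api.thehive.ai/api/v1/task/async' \
--header 'Authorization: Token <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "text_data": "Summarize the plot of Hamlet in two sentences.",
    "callback_url": "https://example.com/hive-callback",
    "options": {
        "system_prompt": "You are a concise literary assistant.",
        "prompt_history": [
            {"role": "user", "content": "Who wrote Hamlet?"},
            {"role": "assistant", "content": "Hamlet was written by William Shakespeare."}
        ],
        "top_p": 0.9,
        "temperature": 0.7,
        "max_tokens": 256
    }
}'
```

Here, a temperature of 0.7 with top_p at 0.9 allows some variation between repeated requests while keeping responses focused; lowering both values makes the output more deterministic.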
Response
After you make an LLM cURL request, the task is processed asynchronously and the text response is delivered via a callback to the callback_url you provided. To see another example API request and response for these models, along with detailed information about the parameters, visit our API reference page.