Large Language Model APIs

A guide to our large language models

Overview

Our large language models (LLMs) produce text in response to a given text prompt. The input prompt and output response combined can be up to 32,768 tokens (about 64,000 characters). The models produce writing across many genres and formats for a wide variety of use cases, including answering questions, writing stories, holding conversations, and programming in multiple programming languages.

Models

We offer a variety of Meta’s open-source Llama Instruct models from the 3.1 and 3.2 series for self-serve use, with additional models to be added in the near future.

Here are the differences between our current LLM offerings:

Model: Llama 3.2 1B Instruct
Description: A lightweight, multilingual, instruction-tuned, text-only model that fits on both edge and mobile devices. It excels at summarizing or rewriting inputs, as well as at instruction following. We also provide this model in a 3B size.

Model: Llama 3.2 3B Instruct
Description: A lightweight, multilingual, instruction-tuned, text-only model that fits on both edge and mobile devices. It excels at summarizing or rewriting inputs, as well as at instruction following. We also provide this model in a 1B size.

Model: Llama 3.1 8B Instruct
Description: A multilingual, instruction-tuned, text-only model. The Llama 3.1 instruction-tuned text-only models achieve higher scores across common industry benchmarks than other available open-source and closed chat models. We also provide this model in a 70B size.

Model: Llama 3.1 70B Instruct
Description: A multilingual, instruction-tuned, text-only model. The Llama 3.1 instruction-tuned text-only models achieve higher scores across common industry benchmarks than other available open-source and closed chat models. We also provide this model in an 8B size.

Request Format

Below are the input fields for an LLM cURL request. An asterisk (*) next to an input field indicates that it is required.

text_data* (string): The main text prompt. It describes what the text response should include.

callback_url* (string): When the task is completed, we will send a callback from our servers to this URL.

system_prompt (string): A string that provides context for how the model should respond to all requests.

prompt_history (list): The chat history in chronological order, where the last item is the most recent turn. Each item needs to be a dictionary with the keys "content" and "role".

top_p (float): A value between 0 and 1. Lower values restrict sampling to the most likely tokens, making responses less diverse; higher values allow more varied responses.

temperature (float): A value between 0 and 1. At 0, a repeated request should yield the same response; at 1, repeated requests should yield different responses.

max_tokens (int): The maximum number of tokens the model may generate for the completion output.
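Putting the fields above together, here is a sketch in Python of how a request body might be assembled. The field names come from the table; the prompt text, callback URL, chat history, and parameter values are purely illustrative:

```python
import json

# Build the request body described in the table above. text_data and
# callback_url are required; the optional fields go inside the "options"
# object shown in the cURL example.
payload = {
    "text_data": "Summarize the plot of Hamlet in two sentences.",
    "callback_url": "https://example.com/hive-callback",  # illustrative URL
    "options": {
        "system_prompt": "You are a concise literary assistant.",
        # prompt_history is chronological: the last item is the most recent turn.
        "prompt_history": [
            {"role": "user", "content": "Who wrote Hamlet?"},
            {"role": "assistant", "content": "William Shakespeare."},
        ],
        "top_p": 0.9,        # float in [0, 1]
        "temperature": 0.7,  # 0 = repeatable, 1 = varied responses
        "max_tokens": 256,   # cap on completion length
    },
}

body = json.dumps(payload)
```

The serialized `body` is what you would pass as the `--data` argument of the cURL request.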

Here is an example request using cURL:

curl --location 'https://api.thehive.ai/api/v1/task/async' \
--header 'Authorization: Token <YOUR_API_KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "text_data": "<YOUR_PROMPT>",
    "callback_url": "<YOUR_CALLBACK_URL>",
    "options": {
        "system_prompt": "<YOUR_SYSTEM_PROMPT>",
        "prompt_history": <YOUR_PROMPT_HISTORY>,
        "top_p": <YOUR_TOP_P>,
        "temperature": <YOUR_TEMPERATURE>,
        "max_tokens": <YOUR_MAX_TOKENS>
    }
}'
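The same call can be made from other languages. This is a minimal sketch in Python using only the standard library; the endpoint URL and header format are taken from the cURL example above, while the `build_request` helper and its arguments are our own illustrative names:

```python
import json
import urllib.request

API_URL = "https://api.thehive.ai/api/v1/task/async"

def build_request(api_key, text_data, callback_url, options=None):
    """Assemble the async task request shown in the cURL example above."""
    body = {"text_data": text_data, "callback_url": callback_url}
    if options:
        body["options"] = options
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Token " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request requires a valid API key and a reachable callback URL:
# with urllib.request.urlopen(build_request("MY_KEY", "Hello", "https://example.com/cb")) as resp:
#     print(resp.status)
```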

Response

After making an LLM cURL request, you will receive a text response. To see another example API request and response for this model, along with detailed information about the parameters, you can visit our API reference page.
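Because results are delivered to your callback_url, you need an endpoint listening for them. Below is a minimal sketch of such a receiver using only the Python standard library; the exact fields of the callback payload are not shown in this guide (see the API reference page), so the handler simply stores whatever JSON body arrives:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # task results are appended here as callbacks arrive

class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts a POSTed task result and acknowledges it with a 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # The actual result fields are documented on the API reference page;
        # this sketch just records the raw JSON payload that was delivered.
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

# To listen for callbacks locally:
# HTTPServer(("", 8080), CallbackHandler).serve_forever()
```

In production this endpoint would run behind a publicly reachable HTTPS URL, which is what you supply as callback_url in the request.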