Chat with LLM
Chat with state-of-the-art LLMs using Delos.
About LLM Chat
The LLM Chat Service allows for conversational interactions with the AI, providing a user-friendly interface for message exchanges and responses. It offers several state-of-the-art models, such as GPT-4o, Llama-3, and more.
Large Language Model | Developer |
---|---|
gpt-3.5 (legacy) | OpenAI |
gpt-4o | OpenAI |
gpt-4o-mini | OpenAI |
command-r | Cohere |
command-r-plus | Cohere |
llama-3-70b-instruct | Meta |
mistral-large | Mistral AI |
mistral-small | Mistral AI |
claude-3.5-sonnet | Anthropic |
claude-3-haiku | Anthropic |
Key features of the DelosPlatform Chat Service:
- Natural language interaction
- Context-aware responses
- Low-latency AI communication
- Transparent cost tracking
Step-by-step Tutorial
In this guide:
- Section 1: Prerequisites.
- Section 2: Setup Delos Python Client.
- Section 3: Chat with LLM. Parameters and examples.
- Section 4: JSON mode and predefined output structures.
- Section 5: Chat streaming.
- Section 6: Chat beta, new endpoint.
- Section 7: Handle Errors.
1. Prerequisites
Before you begin, ensure you have:
- An active DelosPlatform account
- API key from the API keys dashboard
2. Setup Delos Python Client
The Delos Python client provides a convenient way to perform API requests.
2.1. Install Delos Python Client:
Get the Delos Python client through pip:

```bash
pip install delos
```
2.2. Authenticate Requests:
Initialize the client with your API key:
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
```
2.3. Call API:
You can start invoking any Delos endpoint. For example, let's try the /health endpoint to check the validity of your API key and the availability of the client services:
```python
response = delos_client.status_health()
print(response)
```
3. Chat with LLM
Here is an example of an LLM chat request using the Python client (/llm/chat endpoint):
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat(
    text="Hello, Delos!",
    model="mistral-small",
)
print(response)
```
A successful response will return the AI's reply:
{ "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148", "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af", "status_code": 200, "status": "success", "message": "Chat response received.", "data": { "answer": "Hello! How can I assist you today?" }, "timestamp": "2024-11-20T15:21:40.127776Z", "cost": "0.0023" }
3.1. Parameters:
Parameter | Description | Example |
---|---|---|
text | The text to send to the LLM. | "What is the capital of France?" |
model | Large Language Model to use. | mistral-large, gpt-4o, ... |
messages (optional) | List of previous messages. | [{"role":"assistant", "content":"Welcome! I am Delos."}] |
temperature (optional) | Randomness of the response. | 0.7 |
response_format (optional) | Choice to request a JSON-parsed response. | {"type":"json_object"} or None |
- The `model` parameter selects the Large Language Model to chat with.
- The `temperature` is a float (between `0` and `1`) that controls the randomness of LLM responses. The default value is `0.7`; the lower it is, the more deterministic the responses will be.
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat(
    text="Hello, Delos!",
    model="gpt-4o",
    temperature=0.7,
)
print(response)
```
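The optional `messages` parameter carries previous conversation turns, so the model can resolve follow-up questions in context. A minimal sketch, reusing the role/content message shape from the parameter table above:

```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

# Previous turns are passed via `messages`; the new user input goes in `text`.
response = delos_client.llm_chat(
    text="And what is its population?",
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
)
print(response)
```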
4. Specify output format
- The `response_format` parameter allows you to request JSON-parsed responses. For example:
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat(
    text="What is the capital city and GDP of Germany? Reply in JSON.",
    model="gpt-4o",
    response_format={"type": "json_object"},
)
print(response)
```
The AI's response will follow the JSON format. Please note that the more precise you are in your instructions and requirements, the better the response will align with your expectations. The response may be similar to the following:
{ "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148", "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af", "status_code": 200, "status": "success", "message": "Chat response received.", "data": { "answer": { "capital": "Berlin", "GDP": "approximately 4.2 trillion USD" } }, "timestamp": "2024-11-20T15:21:40.127776Z", "cost": "0.0023" }
5. Streaming
Chat streaming is available at a separate endpoint, `/llm/chat_stream`. Parameters are received and read in the same format as for the `/llm/chat` endpoint, but the response is handled through a streaming response that delivers the LLM output word by word. Here is an example of a chat streaming call using the Python client:
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat_stream(
    text="Hello, Delos!",
    model="mistral-small",
)
print(response)
```
A successful response will return the AI's reply, streamed word by word:
'0: ""\n\n' '0:"Hello"\n\n' '0:"!"\n\n' '0:" How"\n\n' '0:" can"\n\n' '0:" I"\n\n' '0:" assist"\n\n' '0:" you"\n\n' '0:" today"\n\n' '0:"?"\n\n'
When `request_usage=True`, a final request-details object is returned at the end of the stream, similar to:
'0:"Hi"\n\n' '0:"!"\n\n' '2': { 'id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'choices': [{'delta': {}, 'finish_reason': 'stop'}], 'request_id': 'ff4894ce-036e-412e-87c2-d40680863f31', 'response_id': 'dc1044e2-a8e8-44d5-a94d-307b2fe8c42e', 'status_code': 200, 'status': 'success', 'message': 'Chat response received.\n(No previous `messages` have been read.)', 'timestamp': '2025-02-19T09:05:10.063001+00:00', 'cost': '0.00023' }
5.1. Parameters:
The parameters to provide for `/llm/chat_stream` are the same as for `/llm/chat`, presented in the previous sections.
Parameter | Description | Example |
---|---|---|
text | The text to send to the LLM. | "What is the capital of France?" |
model | Large Language Model to use. | mistral-large, gpt-4o, ... |
messages (optional) | List of previous messages. | [{"role":"assistant", "content":"Welcome! I am Delos."}] |
temperature (optional) | Randomness of the response. | 0.7 |
response_format (optional) | Choice to request a JSON-parsed response. | {"type":"json_object"} or None |
request_usage (optional) | Whether to receive a details object at the end of the stream. | False |
- The `model` parameter selects the Large Language Model to chat with.
- The `temperature` is a float (between `0` and `1`) that controls the randomness of LLM responses. The default value is `0.7`; the lower it is, the more deterministic the responses will be.
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat_stream(
    text="Hello, Delos!",
    model="gpt-4o-mini",
    temperature=0.7,
)
print(response)
```
- The `response_format` parameter allows you to request JSON-parsed responses. For example:
```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

response = delos_client.llm_chat_stream(
    text="What is the capital city and GDP of Germany? Reply in JSON.",
    model="gpt-4o",
    response_format={"type": "json_object"},
)
print(response)
```
6. New endpoint
A new endpoint named `/llm/chat/beta` handles both regular chat requests and streaming requests, toggled via the boolean `stream` parameter. Parameters are received and read in the same format as for the `/llm/chat_stream` and `/llm/chat` endpoints.
6.1. Parameters:
The parameters to provide for `/llm/chat/beta` are the same as for `/llm/chat_stream` and `/llm/chat`, presented in the previous sections, plus the `stream` toggle.
Parameter | Description | Example |
---|---|---|
text | The text to send to the LLM. | "What is the capital of France?" |
model | Large Language Model to use. | mistral-large, gpt-4o, ... |
messages (optional) | List of previous messages. | [{"role":"assistant", "content":"Welcome! I am Delos."}] |
temperature (optional) | Randomness of the response. | 0.7 |
response_format (optional) | Choice to request a JSON-parsed response. | {"type":"json_object"} or None |
request_usage (optional) | Whether to receive a details object at the end of the stream. | False |
stream (optional) | Whether to stream the response. | False |
⚠️ Warning:
When using `response_format={"type":"json_object"}`, the Delos API does request a JSON response from the LLM. However, for the sake of speed, in this streaming mode the Delos API does not post-process the LLM response to ensure it is perfectly parseable. If guaranteed parseability is a requirement in your pipeline, we recommend using the LLM Chat (`/llm/chat`) endpoint with the `response_format` parameter.
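The Python client method for `/llm/chat/beta` is not shown in this guide; as a sketch, assuming a hypothetical `llm_chat_beta` method that mirrors the other calls, toggling `stream` would look like:

```python
from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")

# Hypothetical method name: the actual client call for /llm/chat/beta may
# differ. stream=True requests a streamed response; stream=False a regular one.
response = delos_client.llm_chat_beta(
    text="Hello, Delos!",
    model="gpt-4o",
    stream=True,
)
print(response)
```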
7. Handle Errors
Common errors include:
- Missing API key
- No text provided
- No model provided
Example error response:
{ "status_code": 422, "status": "error", "message": "Validation error", "error": { "error_code": "422", "error_message": "Validation failed for the input fields.", "details": "[{'loc': ('header', 'api_key'), 'msg': 'Field required', 'type': 'missing'}]" } }