Chat with LLM

Chat with state-of-the-art LLMs using Delos.

About LLM Chat

The LLM Chat Service allows for conversational interactions with the AI, providing a user-friendly interface for message exchanges and responses. It offers several state-of-the-art models, such as GPT-4o, Llama-3, and more.

Large Language Model | Developer
gpt-3.5 (legacy)     | OpenAI
gpt-4o               | OpenAI
gpt-4o-mini          | OpenAI
command-r            | Cohere
command-r-plus       | Cohere
llama-3-70b-instruct | Meta
mistral-large        | Mistral AI
mistral-small        | Mistral AI
claude-3.5-sonnet    | Anthropic
claude-3-haiku       | Anthropic

Key features of the Delos Platform Chat Service:

  • Natural language interaction
  • Context-aware responses
  • Low-latency AI communication
  • Transparent cost tracking

Step-by-step Tutorial

In this guide:

  • Section 1: Prerequisites.
  • Section 2: Set up the Delos Python Client.
  • Section 3: Chat with LLM. Parameters and examples.
  • Section 4: JSON mode and predefined output structures.
  • Section 5: Chat streaming.
  • Section 6: Chat beta, new endpoint.
  • Section 7: Handle Errors.

1. Prerequisites

Before you begin, ensure you have:

  • A valid Delos API key
  • A working Python environment with pip available

2. Set Up the Delos Python Client

Using the Delos Python client, you can perform API requests in a convenient way.

2.1. Install Delos Python Client:

Get the Delos Python client through pip:

pip install delos

2.2. Authenticate Requests:

Initialize the client with your API key:

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
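
Rather than hard-coding the key, you can load it from an environment variable. A minimal sketch using the standard library (DELOS_API_KEY is an arbitrary variable name chosen for this example):

import os

from delos import DelosClient

# DELOS_API_KEY is a hypothetical environment-variable name for this example
delos_client = DelosClient(api_key=os.environ["DELOS_API_KEY"])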

2.3. Call API:

You can now invoke any Delos endpoint. For example, try the /health endpoint to check the validity of your API key and the availability of the client services:

response = delos_client.status_health()
print(response)

3. Chat with LLM

Here is an example of an LLM chat request using the Python client (/llm/chat endpoint):

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat(
              text="Hello, Delos!",
              model="mistral-small"
            )
print(response)

A successful response will return the AI's reply:

{
  "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148",
  "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af",
  "status_code": 200,
  "status": "success",
  "message": "Chat response received.",
  "data": {
    "answer": "Hello! How can I assist you today?"
  },
  "timestamp": "2024-11-20T15:21:40.127776Z",
  "cost": "0.0023"
}
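
To use the reply programmatically, read the answer field from the data object. A minimal sketch, assuming the client returns the response as a Python dict mirroring the JSON above (the return type is an assumption):

# Assuming `response` is a dict shaped like the JSON payload above
answer = response["data"]["answer"]
print(answer)  # "Hello! How can I assist you today?"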

3.1. Parameters:

Parameter                  | Description                               | Example
text                       | The text to send to the LLM.              | "What is the capital of France?"
model                      | Large Language Model to use.              | mistral-large, gpt-4o, ...
messages (optional)        | List of previous messages.                | [{"role": "assistant", "content": "Welcome! I am Delos."}]
temperature (optional)     | Randomness of the response.               | 0.7
response_format (optional) | Choice to request a JSON-parsed response. | {"type": "json_object"} or None

  • The model parameter selects the Large Language Model to chat with.

  • The temperature is a float (between 0 and 1) that controls the randomness of LLM responses; the default value is 0.7. The lower it is, the more deterministic the responses will be.

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat(
                text="Hello, Delos!",
                model="gpt-4o",
                temperature=0.7
              )
print(response)
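
The optional messages parameter carries previous conversation turns, so the model can answer follow-up questions in context. A short sketch using the message format from the parameters table (the "user" role is assumed alongside the documented "assistant" role):

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat(
              text="And what is its population?",
              model="gpt-4o",
              # Earlier turns give the model context for the follow-up question
              messages=[
                  {"role": "user", "content": "What is the capital of France?"},
                  {"role": "assistant", "content": "The capital of France is Paris."},
              ],
            )
print(response)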

4. Specify output format

  • The response_format parameter lets you request JSON-parsed responses. For example:

from delos import DelosClient

delos_client = DelosClient(api_key="")
response = delos_client.llm_chat(
              text="What is the capital city and GDP of Germany?Reply in a JSON",
              model="gpt-4o",
              response_format={"type":"json_object"}
            )
print(response)

The AI's response will follow the JSON format. Please note that the more precise you are in your instructions and requirements, the better the response will align with your expectations. The response may be similar to the following:

{
  "request_id": "4fa2fb9d-d8ac-4995-8dd9-836323f11148",
  "response_id": "48b4a03a-e406-45e7-bf06-0d44b68f48af",
  "status_code": 200,
  "status": "success",
  "message": "Chat response received.",
  "data": {
    "answer": {
      "capital": "Berlin",
      "GDP": "approximately 4.2 trillion USD"
    }
  },
  "timestamp": "2024-11-20T15:21:40.127776Z",
  "cost": "0.0023"
}
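
Depending on how the client deserializes the payload, the answer may arrive as a parsed object or as a JSON string. A defensive sketch (the exact return type is an assumption):

import json

data = response["data"]["answer"]
# Parse explicitly in case the answer arrives as a JSON string
parsed = json.loads(data) if isinstance(data, str) else data
print(parsed["capital"], parsed["GDP"])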

5. Streaming

Chat streaming is available at a separate endpoint, /llm/chat_stream. Parameters are received and read in the same format as for the /llm/chat endpoint, but the response is handled through a Streaming Response that delivers the LLM reply word by word. Here is an example of a chat streaming call using the Python client:

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat_stream(
              text="Hello, Delos!",
              model="mistral-small"
            )
print(response)

A successful response will return the AI's reply, streamed word by word:

'0: ""\n\n'

'0:"Hello"\n\n'

'0:"!"\n\n'

'0:" How"\n\n'

'0:" can"\n\n'

'0:" I"\n\n'

'0:" assist"\n\n'

'0:" you"\n\n'

'0:" today"\n\n'

'0:"?"\n\n'

When request_usage=True, a final request details object is returned at the end of the stream, similar to:

'0:"Hi"\n\n'

'0:"!"\n\n'

'2': {
  'id': 'ff4894ce-036e-412e-87c2-d40680863f31',
  'choices': [{'delta': {}, 'finish_reason': 'stop'}],
  'request_id': 'ff4894ce-036e-412e-87c2-d40680863f31',
  'response_id': 'dc1044e2-a8e8-44d5-a94d-307b2fe8c42e',
  'status_code': 200,
  'status': 'success',
  'message': 'Chat response received.\n(No previous `messages` have been read.)',
  'timestamp': '2025-02-19T09:05:10.063001+00:00',
  'cost': '0.00023'
}

5.1. Parameters:

The parameters to provide for /llm/chat_stream are the same as for /llm/chat, presented in the previous sections.

Parameter                  | Description                               | Example
text                       | The text to send to the LLM.              | "What is the capital of France?"
model                      | Large Language Model to use.              | mistral-large, gpt-4o, ...
messages (optional)        | List of previous messages.                | [{"role": "assistant", "content": "Welcome! I am Delos."}]
temperature (optional)     | Randomness of the response.               | 0.7
response_format (optional) | Choice to request a JSON-parsed response. | {"type": "json_object"} or None
request_usage (optional)   | Whether to receive a details object at the end of the stream. | False

  • The model parameter selects the Large Language Model to chat with.

  • The temperature is a float (between 0 and 1) that controls the randomness of LLM responses; the default value is 0.7. The lower it is, the more deterministic the responses will be.

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat_stream(
              text="Hello, Delos!",
              model="gpt-4o-mini",
              temperature=0.7
            )
print(response)

  • The response_format parameter lets you request JSON-parsed responses. For example:

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat_stream(
              text="What is the capital city and GDP of Germany? Reply in a JSON",
              model="gpt-4o",
              response_format={"type":"json_object"}
            )
print(response)

6. New endpoint

A new endpoint, /llm/chat/beta, handles both regular chat requests and streaming requests, toggled via the boolean stream parameter. Parameters are received and read in the same format as for the /llm/chat_stream and /llm/chat endpoints.

6.1. Parameters:

The parameters to provide for /llm/chat/beta are the same as for /llm/chat_stream and /llm/chat, presented in the previous sections.

Parameter                  | Description                               | Example
text                       | The text to send to the LLM.              | "What is the capital of France?"
model                      | Large Language Model to use.              | mistral-large, gpt-4o, ...
messages (optional)        | List of previous messages.                | [{"role": "assistant", "content": "Welcome! I am Delos."}]
temperature (optional)     | Randomness of the response.               | 0.7
response_format (optional) | Choice to request a JSON-parsed response. | {"type": "json_object"} or None
request_usage (optional)   | Whether to receive a details object at the end of the stream. | False
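
Only the HTTP endpoint is documented here; assuming the Python client exposes it as a method named llm_chat_beta (a hypothetical name for illustration), a call might look like:

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
# `llm_chat_beta` is a hypothetical method name; stream=False requests a regular
# chat response, stream=True a streamed one
response = delos_client.llm_chat_beta(
              text="Hello, Delos!",
              model="gpt-4o-mini",
              stream=False,
            )
print(response)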

⚠️ Warning:

When response_format={"type": "json_object"} is set, the Delos API does request a JSON response from the LLM.

However, for the sake of speed, in this streaming mode, the Delos API does not process the LLM response to ensure it is perfectly parseable. If ensuring parseability is a requirement in your pipeline, we recommend using the LLM Chat endpoint with the response_format parameter.

7. Handle Errors

Common errors include:

  • Missing API key
  • No text provided
  • No model provided

Example error response:

{
  "status_code": 422,
  "status": "error",
  "message": "Validation error",
  "error": {
    "error_code": "422",
    "error_message": "Validation failed for the input fields.",
    "details": "[{'loc': ('header', 'api_key'), 'msg': 'Field required', 'type': 'missing'}]"
  }
}
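
A defensive pattern for surfacing such errors, assuming the client returns error payloads as dicts shaped like the example above rather than raising exceptions (an assumption about the client's behavior):

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
response = delos_client.llm_chat(text="Hello, Delos!", model="gpt-4o")

# Assuming error payloads mirror the JSON example above
if response.get("status") == "error":
    err = response.get("error", {})
    print(f"Request failed ({err.get('error_code')}): {err.get('error_message')}")
else:
    print(response["data"]["answer"])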