Files Index

Query and analyze indexes of files using Delos.

About Index

Indexes are sets of documents that are processed and analyzed together, allowing you to ask questions and retrieve information contained in the Index files.

Each index is identified by a unique index_uuid and a name of type string. You can find all the details in the API Reference, but for a quick overview, these are the Index properties:

| Index attribute | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index | "your-index-uuid" |
| name | Name of the index | "Financial Reports 2023" |
| status | Index is active or in countdown (scheduled for deletion) | active |
| vectorized | File contents are embedded and ready for querying | false |
| created_at | Creation timestamp | "2024-11-15T15:03:00.219676+00:00" |
| updated_at | Update timestamp | "2024-11-15T15:03:00.219681+00:00" |
| expires_at | Expiration date of the index, if scheduled for deletion | None |
| files | Files linked to the index, and storage | "3 files: 147086 bytes" |

Storage Limits

⚠️ Warning:

A total storage limit applies to all files across all indexes. Once it is reached, no new files can be uploaded to existing indexes and no new index can be created.

Existing indexes can still be queried or deleted, and files can be removed from them to free space. You can also manage storage through the Dashboard.

Supported File Formats

  • Documents: .pdf, .docx, .doc, .odt, .txt, .md
  • Spreadsheets: .xlsx, .xls, .ods, .csv
  • Presentations: .pptx, .ppt, .odp
  • Notebooks: .ipynb
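
Before uploading, it can help to filter out files whose extension is not in the list above. The following is a minimal local sketch, not part of the Delos client: the helper name `split_by_support` is hypothetical, and the case-insensitive comparison is an assumption about how the service treats extensions.

```python
from pathlib import Path

# Extensions copied from the list above, grouped by category.
SUPPORTED_EXTENSIONS = {
    "documents": {".pdf", ".docx", ".doc", ".odt", ".txt", ".md"},
    "spreadsheets": {".xlsx", ".xls", ".ods", ".csv"},
    "presentations": {".pptx", ".ppt", ".odp"},
    "notebooks": {".ipynb"},
}


def split_by_support(filepaths):
    """Return (supported, unsupported) lists, preserving input order."""
    allowed = set().union(*SUPPORTED_EXTENSIONS.values())
    supported, unsupported = [], []
    for path in filepaths:
        # Assumption: extensions are compared case-insensitively.
        if Path(path).suffix.lower() in allowed:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, rejected = split_by_support(["report.PDF", "notes.md", "photo.png"])
print(ok)        # ['report.PDF', 'notes.md']
print(rejected)  # ['photo.png']
```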

Step-by-step Tutorial

In this guide:

  • Section 1: Prerequisites.
  • Section 2: Setup Delos Python Client.
  • Section 3: Index operations. Parameters and examples.
    • Section 3.1: New Index + details: Create index and fetch index details.
    • Section 3.2: Modify Index: Rename an index, add files, delete files.
    • Section 3.3: Index Contents: Embedding & Querying: Embed an index, ask index.
    • Section 3.4: Index Tags: List index tags, update index tags, update files tags.
    • Section 3.5: Index Management: List all index, delete index, restore index.

1. Prerequisites

Before you begin, ensure you have:

2. Setup Delos Python Client

The Delos Python client gives you a convenient way to perform API requests.

2.1. Install Delos Python Client:

Install the Delos Python client with pip:

pip install delos

2.2. Authenticate Requests:

Initialize the client with your API key:

from delos import DelosClient

delos_client = DelosClient(api_key="your-delos-api-key")
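
To avoid hardcoding the key in source files, one option is to read it from an environment variable. This is a sketch under assumptions: `DELOS_API_KEY` is a hypothetical variable name chosen for this example, and `load_api_key` is a local helper, not part of the Delos client.

```python
import os


def load_api_key(env=os.environ, var_name="DELOS_API_KEY"):
    """Read the API key from an environment mapping, failing loudly if absent.

    DELOS_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (requires the delos package and a real key):
# from delos import DelosClient
# delos_client = DelosClient(api_key=load_api_key())
```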

2.3. Call API:

You can start invoking any Delos endpoints. For example, let's try the /health endpoint to check the validity of your API key and the availability of the client services:

response = delos_client.status_health()
print(response)

3. Index Operations

          

An Index names a group of documents that are analyzed and processed together. They may concern the same topic or share a common structure.

When you ask the Index a question, the Model processes these documents together and retrieves the most relevant information in order to answer it.

These are the Index requests available in Python delos_client:

| Group | Client method | Used for |
| --- | --- | --- |
| Index Management | .files_index_create | Create a new Index |
| Index Management | .files_index_delete | Delete a specific Index |
| Index Management | .files_index_restore | Restore a specific deleted Index |
| Index Contents | .files_index_files_add | Add files to an existing Index |
| Index Contents | .files_index_files_delete | Delete files from an existing Index |
| Index Contents | .files_index_rename | Rename an existing Index |
| Index Details | .files_index_list | List all Index |
| Index Details | .files_index_details | See the details of an Index |
| Index Querying | .files_index_embed | Embed the files in the specified Index |
| Index Querying | .files_index_ask | Query the files in a specific Index |
| Index Tags | .files_index_tags_get | Get Index tags |
| Index Tags | .files_index_tags_update | Modify Index tags |
| Index Tags | .files_index_files_tags_update | Modify Index files tags |

Let's create an Index to work with several files. We will first send the files to create the Index, then process them, and then the Index will be ready for our queries.

          

3.1. NEW INDEX + DETAILS

1. Create new Index:

To create a new index, which will be shared with your team:

response = delos_client.files_index_create(
    filepaths=['/path/to/document1.pdf', '/path/to/document2.docx'],
    name="my_new_index",
    read_images=False)
print(response)

Index creation makes an internal call to the /files_parse service in order to read the contents of all files. Here are all the parameters for the index creation request:

| Parameter | Description | Example |
| --- | --- | --- |
| name | Name for new index. | TestFiles |
| filepaths | List of paths to the files to be processed. | [/path/to/file1.pdf, /path/to/file2.docx] |
| read_images (optional) | Whether to scan images. Disabled by default. | False |

The read_images parameter enables or disables the scanning of images and graphic elements while the file contents are processed. By default it is disabled (read_images=False). Enabling it consumes more resources, since it requires more complex processing.

Expected response:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index created successfully.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "name": "TestFiles",
    "created_at": "2025-02-19T14:49:50.872575+00:00",
    "updated_at": "2025-02-19T14:49:50.872599+00:00",
    "vectorized": false,
    "status": "active",
    "files": {
      "9a32bca9a2ddcdb97535aa38": {
        "file_hash": "9a32bca9a2ddcdb97535aa38",
        "filename": "financial_reports_2023.docx",
        "size": 577590
      },
      "3b09696950cb4325931003fa3": {
        "file_hash": "3b09696950cb4325931003fa3",
        "filename": "results_2024_q1.pdf",
        "size": 577590
      }
    }
  },
  "error": null,
  "timestamp": "2025-02-19T12:54:08.628050Z",
  "cost": 0.0075
}

The index_uuid received when creating the Index lets you perform any further operation on this Index.
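
As an illustration, the index_uuid and the file hashes can be pulled out of the creation response for later calls. This sketch assumes the client returns the response as a parsed dictionary with the shape shown above; `created_index_summary` is a hypothetical helper name.

```python
def created_index_summary(response):
    """Return (index_uuid, {filename: file_hash}) from a create response."""
    if response.get("status") != "success":
        raise ValueError(f"Index creation failed: {response.get('error')}")
    data = response["data"]
    # In the create response, "files" is a mapping keyed by file hash.
    hashes_by_name = {
        info["filename"]: info["file_hash"] for info in data["files"].values()
    }
    return data["index_uuid"], hashes_by_name


# Trimmed copy of the expected response shown above.
sample_create = {
    "status": "success",
    "data": {
        "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
        "files": {
            "9a32bca9a2ddcdb97535aa38": {
                "file_hash": "9a32bca9a2ddcdb97535aa38",
                "filename": "financial_reports_2023.docx",
                "size": 577590,
            },
            "3b09696950cb4325931003fa3": {
                "file_hash": "3b09696950cb4325931003fa3",
                "filename": "results_2024_q1.pdf",
                "size": 577590,
            },
        },
    },
    "error": None,
}

index_uuid, hashes = created_index_summary(sample_create)
print(index_uuid)                     # 8400c6e1-a185-4960-bc8a-b24edc74411a
print(hashes["results_2024_q1.pdf"])  # 3b09696950cb4325931003fa3
```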

          

2. See Index details:

You can retrieve the details of your created index by providing the index_uuid:

response = delos_client.files_index_details(index_uuid=index_uuid)
print(response)

The index details request only requires the UUID of the index to query:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

The response will be similar to the following:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` (named `TestFiles`) details retrieved.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "name": "TestFiles",
    "vectorized": false,
    "status": "active",
    "expires_at": null,
    "created_at": "2025-02-19T14:49:50.872575+00:00",
    "updated_at": "2025-02-19T14:58:00.806729+00:00",
    "storage": {
      "size_bytes": 147086,
      "size_mb": 0.01,
      "num_files": 2
    },
    "files": [
      {
        "file_hash": "9a32bca9a2ddcdb97535aa38",
        "filename": "financial_reports_2023.docx",
        "size": 577590
      },
      {
        "file_hash": "3b09696950cb4325931003fa3",
        "filename": "results_2024_q1.pdf",
        "size": 577590
      }
    ]
  },
  "error": null,
  "timestamp": "2025-02-19T14:58:00.808005Z",
  "cost": 0.0
}

The details are handy for checking which files are in each index and the storage they use:

  • The index_uuid lets you perform operations on this index, such as instantly adding or removing files. It is a unique UUID that cannot be modified or customized, and it stays constant for each Index.

  • The name is modifiable, but it is also expected to be unique inside an organization.

  • The vectorized flag shows whether the Index contents are ready to be queried.

  • The Index status shows whether the Index is active or scheduled for deletion (countdown).

  • The expires_at date is set to 2h after the moment the deletion of the Index is requested (and is therefore only non-None for an Index with status=countdown).

  • The storage field shows the number of files and the size occupied in this index. Remember that storage is limited to 100 MB per organization (across all indexes). Your organization's storage is also manageable through the Dashboard.
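
The fields described above can be condensed into a quick overview. This is a sketch only: it assumes the details response is a parsed dictionary shaped like the example, that 1 MB means 1024 × 1024 bytes, and `index_overview` is a hypothetical helper name.

```python
def index_overview(details_response, limit_mb=100):
    """Summarize readiness, deletion state and quota share for one index."""
    data = details_response["data"]
    # Assumption: 1 MB = 1024 * 1024 bytes.
    size_mb = data["storage"]["size_bytes"] / (1024 * 1024)
    return {
        "ready_for_queries": bool(data["vectorized"]),
        "scheduled_for_deletion": data["status"] == "countdown",
        "share_of_quota_pct": round(100 * size_mb / limit_mb, 2),
    }


# Trimmed copy of the details response shown above.
sample_details = {
    "data": {
        "vectorized": False,
        "status": "active",
        "expires_at": None,
        "storage": {"size_bytes": 147086, "size_mb": 0.01, "num_files": 2},
    }
}

print(index_overview(sample_details))
```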

          

3.2. MODIFY INDEX

You may want to rename an index or modify the set of files that an index contains.

1. Rename Index

You can rename an index by using the /rename_index endpoint:

response = delos_client.files_index_rename(
    new_index_uuid,
    "New name",
)
print(response)

The index rename request expects:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| name | New name for the index. | Financial Reports 2023-24 |

Expected response:

{
  "request_id": "3b096969-50cb-4325-9313-d49e821090c6",
  "response_id": "45d5f45a-7f36-4ac7-af47-689ae5050597",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` name changed from 'TestFiles' to 'Financial Reports 2023-24'.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "old_name": "TestFiles",
    "new_name": "Financial Reports 2023-24",
    "status": "active",
    "updated_at": "2025-02-19T15:06:16.575203+00:00"
  },
  "error": null,
  "timestamp": "2025-02-19T12:54:08.628050Z",
  "cost": 0.0
}

          

2. Add new Files to Index

For adding new files, use the /add_files_to_index endpoint. You can choose to enable the read_images parameter if graphic contents are relevant to your processing (by default it is disabled):


response = delos_client.files_index_add_files(
    index_uuid="your-index-uuid",
    filepaths=["path/to/document3.pdf",
               "path/to/document4.txt"],
    read_images=True,
)
print(response)

The request to add files expects:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| filepaths | List of paths to the files to be processed. | [/path/to/file3.docx, /path/to/file4.pdf] |
| read_images (optional) | Whether to scan images. Disabled by default. | False |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Files added and processed successfully.",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "new_files": ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"],
    "processed_chunks": 9
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0183
}

          

3. Delete Files from Index

To delete one or more files from the index, provide their file hashes (they can be retrieved from the index details):

files_hashes = ["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"]

response = delos_client.files_index_delete_files(
    index_uuid="your-index-uuid",
    files_hashes=files_hashes
)
print(response)

To delete files, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| files_ids | List of file_ids of the files to be removed. | [ba7dd-b4dae0420a1fe-0b5e55c3970eb1-c7d27f] |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "File(s) deleted from index successfully",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "remaining_files": ["9a32b-ca9a2ddc-db9753-5aa38"]
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}

You can request the index details again to make sure the files were correctly added or removed (see section 3.1 for Index details).

You will also be able to see how much of your storage quota the files in the index take up. The quota is limited to 100 MB per organization (across all indexes).
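
Since deletion works on file hashes rather than filenames, a small lookup over the index details can bridge the two. This is a sketch, assuming the details response is a parsed dictionary whose "files" entry is a list as in section 3.1; `hashes_for_filenames` is a hypothetical helper name.

```python
def hashes_for_filenames(details_response, filenames):
    """Map filenames to file hashes using an index details response."""
    wanted = set(filenames)
    files = details_response["data"]["files"]
    found = {f["filename"]: f["file_hash"] for f in files if f["filename"] in wanted}
    missing = wanted - found.keys()
    if missing:
        raise KeyError(f"Not in index: {sorted(missing)}")
    # Keep the order in which the filenames were requested.
    return [found[name] for name in filenames]


# Trimmed copy of the details response from section 3.1.
details = {
    "data": {
        "files": [
            {"file_hash": "9a32bca9a2ddcdb97535aa38",
             "filename": "financial_reports_2023.docx", "size": 577590},
            {"file_hash": "3b09696950cb4325931003fa3",
             "filename": "results_2024_q1.pdf", "size": 577590},
        ]
    }
}

print(hashes_for_filenames(details, ["results_2024_q1.pdf"]))
# ['3b09696950cb4325931003fa3']
```

The resulting list can then be passed as the list of hashes in the delete request shown above.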

          

3.3. EMBEDDING & QUERYING

1. Embed Index

In order to perform vectorized searches, you need to embed the index. This operation will calculate the embeddings of files belonging to the index:

response = delos_client.files_index_embed(index_uuid="your-index-uuid")
print(response)

To embed an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "9bc55dfd-fe35-45ac-b639-fa4296f29058",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index `8400c6e1-a185-4960-bc8a-b24edc74411a` successfully vectorized.",
  "data": { "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a" },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.000139
}
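
Because the index details expose a vectorized flag, the embed call can be skipped when it is not needed. This is a sketch under assumptions: the client methods return parsed dictionaries shaped like the examples, `ensure_embedded` is a hypothetical helper, and `FakeClient` is a stand-in used only to run the example without network access.

```python
def ensure_embedded(client, index_uuid):
    """Embed the index only if its details report vectorized == False.

    Returns True if an embed request was sent, False if it was not needed.
    """
    details = client.files_index_details(index_uuid=index_uuid)
    if details["data"]["vectorized"]:
        return False
    result = client.files_index_embed(index_uuid=index_uuid)
    if result.get("status") != "success":
        raise RuntimeError(f"Embedding failed: {result.get('error')}")
    return True


class FakeClient:
    """Stand-in that mimics the two client methods used above."""

    def __init__(self):
        self.vectorized = False
        self.embed_calls = 0

    def files_index_details(self, index_uuid):
        return {"status": "success",
                "data": {"index_uuid": index_uuid, "vectorized": self.vectorized}}

    def files_index_embed(self, index_uuid):
        self.embed_calls += 1
        self.vectorized = True
        return {"status": "success", "data": {"index_uuid": index_uuid}}


fake = FakeClient()
print(ensure_embedded(fake, "your-index-uuid"))  # True
print(ensure_embedded(fake, "your-index-uuid"))  # False
```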

          

2. Query Index

Now that the index is vectorized, you can query it. The index returns an answer to the question based on the embeddings of the files belonging to it:

response = delos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
)

You can pass one or more file IDs in active_files to limit the files the Index analyzes when answering your question:

response = delos_client.files_index_ask(
    index_uuid=new_index_uuid,
    question="Where is located the bridge these articles mention?",
    output_language="en",
    active_files = [
      "26a79-e1e7233ef12c-763c4a0e6b3221dd-ba54357d4",
      "f66d2-345d7c64ea7e-4428f87d537927b5-67d8eba00"
    ]
)

Filtering through tags is also possible (see section 3.4 for more information on Index tags).

The parameters tags and active_files are complementary, and can be used together or separately to filter the files used in the search.

  • When they are not provided, all files in the index are considered.
  • When tags is provided, files that have at least one of these tags among their tags_user are considered.
  • When active_files is provided, the files with these file_id values are considered.
  • If both parameters are provided, only the files matching both criteria are considered.
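
The selection rules above can be modelled locally to make them concrete. This sketch only mirrors the documented behaviour and is not the service's implementation: the file dictionaries with `file_id` and `tags_user` keys are a hypothetical shape, and an empty list is treated the same as an omitted parameter (an assumption).

```python
def files_considered(files, tags=None, active_files=None):
    """Return the file_ids that pass the tags / active_files filters."""
    selected = []
    for f in files:
        # tags: keep files sharing at least one tag with tags_user.
        if tags and not set(tags) & set(f["tags_user"]):
            continue
        # active_files: keep only the explicitly listed file ids.
        if active_files and f["file_id"] not in active_files:
            continue
        selected.append(f["file_id"])
    return selected


index_files = [
    {"file_id": "a", "tags_user": ["cv"]},
    {"file_id": "b", "tags_user": ["results", "quarterly"]},
    {"file_id": "c", "tags_user": []},
]

print(files_considered(index_files))                              # ['a', 'b', 'c']
print(files_considered(index_files, tags=["cv", "results"]))      # ['a', 'b']
print(files_considered(index_files, active_files=["b", "c"]))     # ['b', 'c']
print(files_considered(index_files, tags=["cv", "results"],
                       active_files=["b", "c"]))                  # ['b']
```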

The response will contain an answer to the question, as well as the sources (index file and page) that contain the information on which the answer is based.

To query an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |
| question | Question on the Index files. | Where is located the bridge these articles mention? |
| output_language (optional) | Language for the response (default: the same used in question). | en |
| active_files (optional) | List of files within this Index to access for this question. Not providing it is equivalent to selecting all files. | ["efcb3-858b45a3ed-b306c9-ec78e3492f833ad1-8980"] |
| tags (optional) | List of Index files tags to subselect for this question. Not providing it is equivalent to selecting all tags. | ["financial", "research"] |

Expected response:

{
  "status": "success",
  "message": "Query processed successfully",
  "data": {
    "answer": "The article discusses the bridge of Brooklyn 'FILE:1 PAGE:2'.",
    "sources": {
      "1": "efcb3-858b45a3ed-b306c9-ec78e3492f833ad1-8980"
    }
  }
}
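
The answer text embeds markers such as 'FILE:1 PAGE:2' that point into the sources mapping. A possible way to resolve them is sketched below. It assumes the response is a parsed dictionary and that every marker follows the exact pattern seen in the example; `cited_sources` is a hypothetical helper name.

```python
import re

# Assumption: citation markers always look like "FILE:<n> PAGE:<m>".
CITATION = re.compile(r"FILE:(\d+) PAGE:(\d+)")


def cited_sources(ask_response):
    """Resolve FILE/PAGE markers in the answer against data['sources']."""
    data = ask_response["data"]
    citations = []
    for file_no, page in CITATION.findall(data["answer"]):
        citations.append({
            "source": data["sources"].get(file_no),  # None if the key is absent
            "page": int(page),
        })
    return citations


# Copy of the expected response shown above.
sample_answer = {
    "status": "success",
    "message": "Query processed successfully",
    "data": {
        "answer": "The article discusses the bridge of Brooklyn 'FILE:1 PAGE:2'.",
        "sources": {"1": "efcb3-858b45a3ed-b306c9-ec78e3492f833ad1-8980"},
    },
}

print(cited_sources(sample_answer))
# [{'source': 'efcb3-858b45a3ed-b306c9-ec78e3492f833ad1-8980', 'page': 2}]
```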

          

3.4. INDEX TAGS

An Index can be labelled with tags, which let you classify or group related files inside the index. This makes it possible to call ask_index on an index while restricting the query to documents that match at least one of the provided tags.

1. Get Index Tags

To see the tags enabled for an index, you can use the /get_index_tags endpoint:

response = delos_client.files_index_tags_get(index_uuid="your-index-uuid")
print(response)

These tags are the allowed options when new files are uploaded to the index: while the files are processed, they are added as suggested tags (tags_auto). They are considered tag suggestions and have no impact on index querying.

2. Update Index Tags

To update the list of tags enabled for an index, you can use the /update_index_tags endpoint:

response = delos_client.files_index_tags_update(
    index_uuid="your-index-uuid",
    tags=["financial", "research", "quarterly"]
)
print(response)

3. Update Index Files Tags

You can update the tags of specific files within an index by providing the file_id through the /update_index_files_tags endpoint:

response = delos_client.files_index_files_tags_update(
    index_uuid="your-index-uuid",
    files_ids=["ba7ddb4dae0420a1fe0b5e55c3970eb1c7d27f"],
    tags=["financial", "research", "quarterly"]
)
print(response)

Note that tags provided by the user (tags_user) through this endpoint are used as strict matching criteria during index queries through the ask_index endpoint. Tags added automatically during file processing (tags_auto), on the other hand, are considered tag suggestions and do not affect index queries.

For example, here is a query on a set of files in the index that also uses the tags parameter:

response = delos_client.files_index_ask(
    index_uuid="your-index-uuid",
    question="Where is located the bridge these articles mention?",
    output_language="en",
    active_files=[
      "26a79-e1e7233ef12c-763c4a0e6b3221dd-ba54357d4",
      "f66d2-345d7c64ea7e-4428f87d537927b5-67d8eba00"
    ],
    tags=["cv","results"]
)

That query will process the results obtained from the selected files that have at least one of the tags cv or results.

These two parameters are complementary, and can be used together or separately to filter the files used in the search.

          

3.5. INDEX MANAGEMENT

1. List All Index

You can list all the indexes in your team that are in active or countdown (scheduled for deletion) status:

response = delos_client.files_index_list()
print(response)

(This function does not take any parameters.)

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Retrieved 2 index.",
  "data": {
    "index": [
      {
        "index_uuid": "your-index-uuid",
        "name": "my_new_index",
        "status": "active",
        "vectorized": true,
        "created_at": "2024-11-15T15:03:00.219676+00:00",
        "updated_at": "2024-11-15T15:03:00.219681+00:00",
        "expires_at": "2024-11-16T15:03:00.219682+00:00",
        "storage": {
          "size_bytes": 147086,
          "size_mb": 0.01,
          "num_files": 2
        }
      },
      {
        "index_uuid": "another-index-uuid",
        "name": "2024 Sales results",
        "status": "active",
        "vectorized": false,
        "created_at": "2024-11-15T15:03:00.219676+00:00",
        "updated_at": "2024-11-15T15:03:00.219681+00:00",
        "expires_at": "2024-11-16T15:03:00.219682+00:00",
        "storage": {
          "size_bytes": 577590,
          "size_mb": 0.55,
          "num_files": 3
        }
      }
    ],
    "total_storage": {
      "bytes": 289172,
      "mb": 0.028,
      "limit_mb": 100,
      "usage_percentage": 2.8
    }
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}
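
The list response carries enough information for a quick storage report. This is a sketch, assuming the response is a parsed dictionary shaped like the example above; `storage_report` is a hypothetical helper name.

```python
def storage_report(list_response):
    """Summarize remaining quota and indexes that need attention."""
    data = list_response["data"]
    total = data["total_storage"]
    return {
        "remaining_mb": round(total["limit_mb"] - total["mb"], 3),
        "scheduled_for_deletion": [
            i["name"] for i in data["index"] if i["status"] == "countdown"
        ],
        "not_vectorized": [
            i["name"] for i in data["index"] if not i["vectorized"]
        ],
    }


# Trimmed copy of the list response shown above.
sample_list = {
    "data": {
        "index": [
            {"name": "my_new_index", "status": "active", "vectorized": True},
            {"name": "2024 Sales results", "status": "active", "vectorized": False},
        ],
        "total_storage": {"bytes": 289172, "mb": 0.028, "limit_mb": 100},
    }
}

report = storage_report(sample_list)
print(report["not_vectorized"])          # ['2024 Sales results']
print(report["scheduled_for_deletion"])  # []
```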

          

2. Delete an Index (⚠️ warning: delayed operation)

You can delete an index if you no longer need to access it. Unlike the other endpoints, which perform requests immediately, this endpoint applies a safety margin before taking effect: it deletes the index after 2h, giving you time to reverse the operation in case of error. An Index marked for deletion receives the status "countdown" instead of "active" once the expiry date is set.

response = delos_client.files_index_delete(new_index_uuid)
print(response)

To delete an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index marked for deletion",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "expires_at": "2025-02-19T17:41:52.180933+00:00"
  },
  "error": null,
  "timestamp": "2025-02-19T15:41:52.180933Z",
  "cost": 0.0
}
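
The expires_at value in this response makes it possible to compute how long the deletion can still be reversed. This is a sketch, assuming the response is a parsed dictionary and that expires_at is always an ISO 8601 string with an explicit "+00:00" style offset, as in the example; `time_until_deletion` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def time_until_deletion(delete_response, now=None):
    """Return a timedelta until the scheduled deletion (negative if past)."""
    expires_at = datetime.fromisoformat(delete_response["data"]["expires_at"])
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now


# Trimmed copy of the delete response shown above.
sample_delete = {
    "data": {
        "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
        "expires_at": "2025-02-19T17:41:52.180933+00:00",
    }
}

# One hour after the delete request was made:
one_hour_later = datetime(2025, 2, 19, 16, 41, 52, 180933, tzinfo=timezone.utc)
print(time_until_deletion(sample_delete, now=one_hour_later))  # 1:00:00
```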

          

3. Restore an Index scheduled for deletion

After an index is marked for deletion, but before the expiry date, you can restore it. This lets you revert the operation in case of error: it restores the "active" status and cancels the scheduled deletion. This is only possible within the 2h window (while index status=countdown).

response = delos_client.files_index_restore(new_index_uuid)
print(response)

To restore an index, the parameters are:

| Parameter | Description | Example |
| --- | --- | --- |
| index_uuid | Unique identifier for the index. | 1111-1111111-1111-1111 |

Expected response:

{
  "request_id": "2093f52f-51ac-41ea-8b78-02367906906a",
  "response_id": "dfb7440c-4b88-4cab-bfd4-0d554149ba8e",
  "status_code": 200,
  "status": "success",
  "message": "Index restored successfully",
  "data": {
    "index_uuid": "8400c6e1-a185-4960-bc8a-b24edc74411a",
    "status": "active"
  },
  "error": null,
  "timestamp": "2025-02-19T15:07:25.479522Z",
  "cost": 0.0
}