-Setup and configure the document store that is pertinent to the RAG
+Set up and configure the vector store used by the File Search
 pipeline:

 * Create a vector store from the document IDs you received after uploading your
@@ -8,23 +8,18 @@ pipeline:
 the cumulative size reaches 30 MB of documents given to upload to a vector store
 or the document count reaches 200 files in a batch, whichever limit is hit first.
 * [Deprecated] Attach the vector store to an OpenAI
-[Assistant](https://platform.openai.com/docs/api-reference/assistants). Use
+Assistant. Use
 parameters in the request body relevant to an Assistant to flesh out
 its configuration. Note that an assistant will only be created when you pass both
 "model" and "instructions" in the request body; otherwise, only a vector store will be
 created from the documents given.

-If any one of the LLM service interactions fail, all service resources are
-cleaned up. If an OpenAI vector Store is unable to be created, for example,
-all file(s) that were uploaded to OpenAI are removed from
-OpenAI. Failure can occur from OpenAI being down, or some parameter
-value being invalid. It can also fail due to document types not being
-accepted. This is especially true for PDFs that may not be parseable.
+If any step in the LLM service interaction fails, all previously created resources are cleaned up automatically. For example, if the vector store creation fails, any files already uploaded to OpenAI are removed. Failures can be caused by service downtime, invalid parameter values, or unsupported document types; the latter is especially common with PDFs that cannot be parsed.

-In the case of Openai, Vector store/assistant will be created asynchronously.
+The vector store (and the assistant, if one is requested) is created asynchronously.
 The immediate response from this endpoint is a `collection_job` object, which
 contains the collection job ID and status. Once the collection has
 been created, information about the collection will be returned to the user via
 the callback URL. If a callback URL is not provided, clients can check the
 `collection job info` endpoint with the `job_id` to retrieve
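The batching rule in this hunk says a batch closes once its cumulative size reaches 30 MB or it holds 200 files, whichever limit is hit first. The Python sketch below shows that rule as a minimal illustration under stated assumptions. It is not Kaapi's implementation. `Doc` and `batch_documents` are hypothetical names, and reading "30 MB" as 30 × 1024 × 1024 bytes is an assumption the docs do not settle.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "30 MB" is read as 30 MiB; the docs do not say which.
MAX_BATCH_BYTES = 30 * 1024 * 1024
MAX_BATCH_FILES = 200


@dataclass
class Doc:
    """Hypothetical stand-in for an uploaded document."""
    doc_id: str
    size_bytes: int


def batch_documents(
    docs: List[Doc],
    max_bytes: int = MAX_BATCH_BYTES,
    max_files: int = MAX_BATCH_FILES,
) -> List[List[Doc]]:
    """Group documents into vector-store upload batches (hypothetical helper).

    A batch is closed as soon as its cumulative size reaches max_bytes
    or its document count reaches max_files, whichever happens first.
    """
    batches: List[List[Doc]] = []
    current: List[Doc] = []
    current_bytes = 0
    for doc in docs:
        current.append(doc)
        current_bytes += doc.size_bytes
        if current_bytes >= max_bytes or len(current) >= max_files:
            batches.append(current)
            current, current_bytes = [], 0
    if current:  # flush the final, partially filled batch
        batches.append(current)
    return batches
```

With 450 one-byte documents this produces batches of 200, 200 and 50. With seven 10 MiB documents it produces batches of 3, 3 and 1, because the size limit is reached first. Under this reading, a single document larger than the size limit still forms a batch on its own.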
-Remove a collection from the platform. This is a two step process:
+Remove a collection from the platform.
+
+This is a two-step process:

 1. Delete all resources that were allocated: file(s), the Vector
 Store, and the Assistant.
 2. Delete the collection entry from the Kaapi database.

 No action is taken on the documents themselves: the contents of the
 documents that were a part of the collection remain unchanged, and those
-documents can still be accessed via the documents endpoints. The response from this
-endpoint will be a `collection_job` object which will contain the collection `job_id` and
-status. When you take the id returned and use the `collection job info` endpoint,
+documents can still be accessed via the documents endpoints. The endpoint returns the job ID and status of the collection delete operation. When you pass the returned job ID to the `collection job info` endpoint,
 the status is reported as successful once the job has succeeded.
 Additionally, if a `callback_url` was provided in the request body,
 you will receive a message indicating whether the deletion succeeded or failed.
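The Python sketch below shows the two-step delete described above. `CollectionRecord`, `delete_collection` and the two callbacks are hypothetical names. The callbacks stand in for whatever provider client and database layer Kaapi actually uses. The sketch also makes two choices the docs do not state: it stops before step 2 when a provider call fails, and it deletes resources in the order the docs list them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CollectionRecord:
    """Hypothetical view of what a collection entry tracks."""
    collection_id: str
    file_ids: List[str] = field(default_factory=list)
    vector_store_id: Optional[str] = None
    assistant_id: Optional[str] = None


def delete_collection(
    record: CollectionRecord,
    delete_remote: Callable[[str, str], None],
    delete_db_entry: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Two-step delete: provider resources first, then the database entry.

    If delete_remote raises, the exception propagates and the database
    entry is left in place. Documents themselves are never touched.
    """
    removed: List[Tuple[str, str]] = []
    # Step 1: release every resource allocated at the provider.
    for file_id in record.file_ids:
        delete_remote("file", file_id)
        removed.append(("file", file_id))
    if record.vector_store_id:
        delete_remote("vector_store", record.vector_store_id)
        removed.append(("vector_store", record.vector_store_id))
    if record.assistant_id:
        delete_remote("assistant", record.assistant_id)
        removed.append(("assistant", record.assistant_id))
    # Step 2: drop the collection entry only after remote cleanup succeeded.
    delete_db_entry(record.collection_id)
    return removed
```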
-Retrieve detailed information about a specific collection by its collection id. This endpoint returns the collection object including its project, organization, timestamps, and service-specific details.
+Retrieve detailed information about a specific collection by its collection ID.

 **Response Fields:**

-**Note:** While the API schema shows both `llm_service_id`/`llm_service_name` AND `knowledge_base_id`/`knowledge_base_provider`, the actual response will only include the fields relevant to what was created:
+**Note:** While the example response shows both `llm_service_id`/`llm_service_name` AND `knowledge_base_id`/`knowledge_base_provider`, the actual response will only include the fields relevant to what was created:

 - **If an Assistant was created** (with model + instructions): The response will only include `llm_service_id` and `llm_service_name`
 - **If only a Vector Store was created** (without model/instructions): The response will only include `knowledge_base_id` and `knowledge_base_provider`
@@ -11,4 +11,4 @@ Retrieve detailed information about a specific collection by its collection id.

 If the `include_docs` parameter is set to true, the response also includes the IDs of the documents associated with the collection. Note that the returned documents are stored not only by Kaapi but also by the vector store provider.

-Additionally, if you set the `include_url` parameter to true, a signed URL will be included in the response, which is a clickable link to access the retrieved document. If you don't set it to true, the URL will not be included in the response.
+Additionally, if you set the `include_url` parameter to true, the response includes a signed URL, a clickable link for accessing the retrieved document(s) of the collection. If you don't set it to true, no URL is included.
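The two flags above are request parameters. The Python sketch below shows a client building a request URL for this endpoint. Only the names `include_docs` and `include_url` come from the docs. The `/collections/{id}` route, the assumption that the flags are passed as query parameters, and the helper `collection_info_url` are all hypothetical, so check the API schema for the real values.

```python
from urllib.parse import quote, urlencode


def collection_info_url(
    base_url: str,
    collection_id: str,
    include_docs: bool = False,
    include_url: bool = False,
) -> str:
    """Build a request URL for the collection info endpoint (hypothetical route).

    Both flags are sent explicitly as lowercase "true"/"false" strings.
    """
    params = {
        "include_docs": str(include_docs).lower(),  # True -> "true"
        "include_url": str(include_url).lower(),
    }
    # Hypothetical path; the collection ID is percent-encoded for safety.
    path = f"/collections/{quote(collection_id, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```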
backend/app/api/docs/collections/job_info.md: 4 additions & 2 deletions
@@ -1,6 +1,8 @@
-Retrieve information about a collection job by the collection job ID. This endpoint provides detailed status and metadata for a specific collection job in Kaapi. It is especially useful for:
+Retrieve information about a collection job by the collection job ID.

-* Fetching the collection job object, including the collection job ID, the current status, and the associated collection details.
+This endpoint is especially useful for:
+
+* Fetching the collection job information, including the collection job ID, the current status, and the associated collection details.

 * Retrieving the associated collection details once a collection-creation job has finished successfully.
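When no callback URL is given, a client has to poll this endpoint until the job settles. The Python sketch below is a minimal polling loop. It assumes the job body exposes a `status` field whose terminal values are `successful` and `failed`. The docs only mention "successful", so the failure value, the default interval and timeout, and `wait_for_collection_job` itself are all assumptions. `fetch_job` is a caller-supplied function that performs the actual HTTP call.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_collection_job(
    fetch_job: Callable[[str], Dict],
    job_id: str,
    *,
    terminal_statuses: Iterable[str] = ("successful", "failed"),  # assumed values
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll a collection job until it reaches a terminal status.

    fetch_job should call the collection job info endpoint and return the
    decoded JSON body as a dict. Raises TimeoutError if the job is still
    running when timeout_s has elapsed.
    """
    terminal = {s.lower() for s in terminal_statuses}
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        status = str(job.get("status", "")).lower()
        if status in terminal:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting. The status comparison is case-insensitive because the docs do not show the exact casing.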
backend/app/api/docs/collections/list.md: 3 additions & 3 deletions
@@ -2,7 +2,7 @@ List all _active_ collections that have been created and are not deleted.

 **Response Fields:**

-**Note:** While the API schema shows both `llm_service_id`/`llm_service_name` AND `knowledge_base_id`/`knowledge_base_provider`, each collection in the response will only include the fields relevant to what was created:
+**Note:** While the example response shows both `llm_service_id`/`llm_service_name` AND `knowledge_base_id`/`knowledge_base_provider`, each collection in the response will only include the fields relevant to what was created:

--**If an Assistant was created** (with model + instructions): The response will only include `llm_service_id` and `llm_service_name` (e.g., `llm_service_name: "gpt-4o"` and the assistant ID)
--**If only a Vector Store was created** (without model/instructions): The response will only include `knowledge_base_id`and `knowledge_base_provider` (e.g., `knowledge_base_provider: "openai vector store"` and the vector store ID)
+- **If an Assistant was created** (with model + instructions): The response will only include `llm_service_id` (the assistant ID) and `llm_service_name` (e.g., `llm_service_name: "gpt-4o"`)
+- **If only a Vector Store was created** (without model/instructions): The response will only include `knowledge_base_id` (the vector store ID) and `knowledge_base_provider` (e.g., `knowledge_base_provider: "openai vector store"`)
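The field rule above lets a client tell assistant-backed collections apart from vector-store-only ones. The Python sketch below splits the list results accordingly. `partition_collections` and the third "unknown" bucket are hypothetical additions. The helper checks for a non-null value rather than mere key presence, because the example response may show every key.

```python
from typing import Dict, List, Tuple


def partition_collections(
    collections: List[Dict],
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Split list-endpoint results by the resource backing each collection.

    Follows the documented rule: assistant-backed collections carry
    llm_service_id/llm_service_name, vector-store-only collections carry
    knowledge_base_id/knowledge_base_provider.
    """
    assistants: List[Dict] = []
    vector_stores: List[Dict] = []
    unknown: List[Dict] = []  # neither ID set; unexpected per the docs
    for c in collections:
        if c.get("llm_service_id"):
            assistants.append(c)
        elif c.get("knowledge_base_id"):
            vector_stores.append(c)
        else:
            unknown.append(c)
    return assistants, vector_stores, unknown
```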
-document invisible. It does not delete the document from cloud storage
-or its information from the database.
+Delete the document.

-If the document is part of an active collection, those collections
-will be deleted using the collections delete interface. Noteably, this
-means all OpenAI Vector Store's and Assistant's to which this document
-belongs will be deleted.
+This makes the document invisible. It does not delete the document
+from cloud storage or its information from the database.
+
+If the document belongs to any active collections, those collections will also be deleted. This includes all associated knowledge bases, for example any OpenAI vector stores that were created through this platform with this document.
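The Python sketch below shows the cascade described above using in-memory objects. A document is hidden, and every active collection that contains it is deleted. `Document`, `Collection`, `delete_document` and the `delete_collection` callback are hypothetical. The callback stands in for the collections delete flow, which, per the docs, also removes the associated knowledge bases. Deleting the collections before hiding the document is an ordering assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Document:
    doc_id: str
    is_deleted: bool = False  # hidden from listings; file and DB row are kept


@dataclass
class Collection:
    collection_id: str
    doc_ids: Set[str] = field(default_factory=set)
    is_deleted: bool = False


def delete_document(
    doc: Document,
    collections: List[Collection],
    delete_collection: Callable[[Collection], None],
) -> List[str]:
    """Hide a document and delete every active collection that contains it.

    Returns the IDs of the collections handed to delete_collection.
    """
    affected = [c for c in collections if not c.is_deleted and doc.doc_id in c.doc_ids]
    for c in affected:
        delete_collection(c)  # hypothetical hook for the collections delete flow
    doc.is_deleted = True
    return [c.collection_id for c in affected]
```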
 This operation marks the document as deleted in the database while retaining its metadata. However, the actual file is
 permanently deleted from cloud storage (e.g., S3) and cannot be recovered. Only the database record remains for reference
 purposes.

-If the document is part of an active collection, those collections
-will be deleted using the collections delete interface. Noteably, this
-means all OpenAI Vector Store's and Assistant's to which this document
-belongs will be deleted.
+If the document belongs to any active collections, those collections will also be deleted. This includes all associated knowledge bases, for example any OpenAI vector stores that were created through this platform with this document.
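What sets this operation apart is that the file bytes are removed while the database record survives. The Python sketch below shows that split. A dict stands in for cloud storage such as S3. `DocumentRecord`, `object_key`, `deleted_at` and `permanently_delete` are hypothetical names. The collection cascade described above is left out so the sketch stays focused on storage versus metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class DocumentRecord:
    doc_id: str
    object_key: str  # hypothetical key of the file in cloud storage
    deleted_at: Optional[datetime] = None


def permanently_delete(
    record: DocumentRecord,
    storage: Dict[str, bytes],
    now: Optional[datetime] = None,
) -> DocumentRecord:
    """Remove the stored file for good but keep the database record."""
    storage.pop(record.object_key, None)  # file bytes are gone, no recovery
    record.deleted_at = now or datetime.now(timezone.utc)  # metadata retained
    return record
```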
backend/app/api/docs/documents/upload.md: 4 additions & 4 deletions
@@ -4,15 +4,15 @@ Upload a document to Kaapi.
 - If a target format is specified, a transformation job will also be created in the background to transform the document into the target format. The response will include both the uploaded document details and information about the transformation job.
 - If a callback URL is provided, you will receive a notification at that URL once the document transformation job is completed.

-### Supported Transformations
+### Supported Transformations:

-The following (source_format → target_format) transformations are supported:
+The following (source_format → target_format) transformations are supported for now:

 - pdf → markdown
 - zerox

-### Transformers
+### Transformers:

-Available transformer names and their implementations, default transformer is zerox:
+Available transformer names and their implementations (the default transformer is zerox for now):
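The transformation rules in this hunk amount to a lookup table plus a default. The Python sketch below shows how a request might be validated before the background job is created. The pdf → markdown pair and the zerox default come from the docs. The function name `plan_transformation`, its return shape, and raising `ValueError` for unsupported pairs or unknown transformers are assumptions.

```python
from typing import Dict, Optional

# Snapshot of the docs at this change; both sets are expected to grow.
SUPPORTED_TRANSFORMATIONS = {("pdf", "markdown")}
AVAILABLE_TRANSFORMERS = {"zerox"}
DEFAULT_TRANSFORMER = "zerox"


def plan_transformation(
    source_format: str,
    target_format: Optional[str] = None,
    transformer: Optional[str] = None,
) -> Optional[Dict[str, str]]:
    """Validate an upload's transformation request (hypothetical helper).

    Returns None when no target format is requested (plain upload),
    otherwise the source/target/transformer a background job would use.
    """
    if target_format is None:
        return None
    pair = (source_format.lower(), target_format.lower())
    if pair not in SUPPORTED_TRANSFORMATIONS:
        raise ValueError(f"unsupported transformation: {pair[0]} -> {pair[1]}")
    name = transformer or DEFAULT_TRANSFORMER  # fall back to the documented default
    if name not in AVAILABLE_TRANSFORMERS:
        raise ValueError(f"unknown transformer: {name}")
    return {"source_format": pair[0], "target_format": pair[1], "transformer": name}
```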