Unlike a common LLM playground, which helps you test an LLM on the information it has been trained on, the OAIM Sandbox works on the chunks retrieved from Oracle DB 23ai by similarity with the question provided, as in this example:

The playground can be used with or without the vector stores available, to check whether a pure LLM, configured as the completion model, is aware of the information you are looking for.

First of all, you can:
- **Enable History and Context**: when checked, every further question and answer is re-sent as context, helping the LLM produce better grounded answers;
- **Clear History**: this button clears the context, which is useful to understand the LLM behaviour after a long conversation.

## Chat Model

Depending on the configuration done in the **Configuration**/**Models** page, you can choose one of the **Chat models** listed. For each of them you can modify the most important hyper-parameters, such as:
- Temperature
- Maximum Tokens
- Top P
- Frequency penalty
- Presence penalty

To understand each of them, refer for example to this document: [Concepts for Generative AI](https://docs.oracle.com/en-us/iaas/Content/generative-ai/concepts.htm).
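
For illustration only (this is not the Sandbox's internal code), the same hyper-parameters map onto the usual options of an OpenAI-compatible chat completion call; the endpoint, API key and model name below are placeholders:

```python
from openai import OpenAI

# Placeholder endpoint, key and model name: adjust them to the chat model you configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-a-real-key")

response = client.chat.completions.create(
    model="my-configured-chat-model",   # hypothetical model name
    messages=[{"role": "user", "content": "What is Oracle DB 23ai?"}],
    temperature=0.2,        # lower values give more deterministic answers
    max_tokens=512,         # upper bound on the number of generated tokens
    top_p=0.9,              # nucleus sampling threshold
    frequency_penalty=0.0,  # discourages repeating the same tokens
    presence_penalty=0.0,   # encourages introducing new topics
)
print(response.choices[0].message.content)
```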
## RAG params

Clicking the **RAG** checkbox you can quickly turn on/off the knowledge base behind the chatbot, exploiting the Retrieval Augmented Generation pattern implemented in the Oracle AI Microservices Sandbox.

Then you can set:
- **Enable Re-Ranking**: *under development*;
- **Search Type**: it reflects the two options available in **Oracle DB 23ai**:
  - **Similarity search**
  - **Maximal Marginal Relevance**
- **Top K**: defines the number of nearest chunks, found by comparing the embedding vector derived from the question with the vectors associated with each chunk in the vector store. Keep in mind that a large number of chunks could fill the maximum context size accepted by the LLM, making the exceeding text useless (see the sketch after this list).
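
As a rough sketch of what the two search types and **Top K** correspond to, assuming a LangChain-style wrapper such as `OracleVS` (the connection details, table name and embedding model below are placeholders, not the Sandbox's actual code):

```python
# Minimal sketch: retrieve the Top K chunks with plain similarity search
# and with Maximal Marginal Relevance, using a LangChain OracleVS store.
import oracledb
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy

conn = oracledb.connect(user="vector_user", password="welcome1", dsn="localhost/FREEPDB1")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vs = OracleVS(
    client=conn,
    embedding_function=embeddings,
    table_name="MY_VECTOR_TABLE",            # placeholder vector store table
    distance_strategy=DistanceStrategy.COSINE,
)

question = "How do I turn on the RAG checkbox?"

# Similarity search: the Top K chunks closest to the question vector.
chunks = vs.similarity_search(question, k=4)

# Maximal Marginal Relevance: still Top K, but near-duplicate chunks are penalized
# so that more diverse chunks make it into the context.
diverse_chunks = vs.max_marginal_relevance_search(question, k=4)
```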
To search and select one of the vector store tables created in the DB and use it for RAG, you can filter the desired vector store by one parameter, or a combination of the parameters adopted in the chunking process:
- **Embedding Alias**
- **Embedding Model**
- **Chunk Size**
- **Chunk Overlap**
- **Distance Strategy**

As long as the following message is displayed, the final vector store has not yet been selected:

The **Reset RAG** button allows you to restart the selection of another vector store table.

Once you are satisfied with a specific configuration for your chatbot, the Sandbox allows you to `Download Settings` as they are, in **.json** format.

When using the split/embed functionality of the Sandbox, you can use OCI Object Storage as the source of the documents to be vectorized. This page describes how to configure the OAIM Sandbox to use it.
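
As a minimal sketch of what reading documents from a bucket looks like with the OCI Python SDK and the standard `~/.oci/config` credentials file (the bucket name below is a placeholder, and the exact way the Sandbox consumes these credentials may differ):

```python
# Minimal sketch: list the documents available in an Object Storage bucket
# using the default profile of the standard OCI configuration file.
import oci

config = oci.config.from_file()  # reads the [DEFAULT] profile from ~/.oci/config
object_storage = oci.object_storage.ObjectStorageClient(config)

namespace = object_storage.get_namespace().data
objects = object_storage.list_objects(namespace, "my-documents-bucket")  # placeholder bucket
for obj in objects.data.objects:
    print(obj.name)
```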
Generating a test dataset of Q&A pairs through an external LLM accelerates the massive test phase. The platform provides integration with a framework designed for this purpose, called Giskard, which also analyzes the documents to identify the high-level topics related to the generated Q&A pairs and includes them in the test dataset.

The generation phase is optional, but it is normally very welcome because it reduces the cost of a proof of concept, which would otherwise require a huge human effort.

Then, the questions are asked to the configured agent, collecting each answer provided; every answer is compared with the correct one by an LLM elected as judge, which classifies it and provides a justification for the positive or negative outcome, following the process described in the following picture.

## Test Framework page

From the left side menu you can access the page where, selecting **Generate new Test Dataset**, you can upload as many PDF documents as you want; contexts will be extracted from them and used to generate a defined number of Q&A pairs, as shown in the following snapshot:

You can choose any of the available models to perform the Q&A generation process, since you may want to use a high-profile, expensive model to generate the crucial dataset used to evaluate the RAG app, while eventually putting a cheaper LLM in production as the chat model. This phase not only generates the number of Q&A pairs you need, but also analyzes the documents provided, extracting a set of topics that help to classify the generated questions and to find the areas to be improved.

When the generation is over (it could take some time), as shown in the following snapshot:

you can:

* exclude a Q&A: clicking **Hide**, you will drop the question from the final dataset if you consider it not meaningful;
* modify the text of the **question** and the **Reference answer**: if you do not agree with them, you can update the raw generated text, taking into account the **Reference context**, which is fixed, like the **Metadata**.

After your updates, you can download the dataset and store it for the next test sessions.

Anyway, the generation process is optional. If you have already prepared a JSONL file with your Q&A pairs according to this schema:
* **id**: an alphanumeric unique id like "2f6d5ec5-4111-4ba3-9569-86a7bec8f971";
* **question**: the question text;
* **reference_answer**: an example of an answer considered correct;
* **reference_context**: the piece of document from which the question has been extracted;
* **conversation_history**: an empty array [], at the moment not evaluated;
* **metadata**: a nested JSON document with extra info used for analytics purposes; it must include the following fields (an example record is shown after this list):
  - **question_type** [simple|complex];
  - **seed_document_id** (numeric);
  - **topic**.
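
For illustration, a single line of such a JSONL file could look like the following (all values are made-up placeholders):

```json
{"id": "2f6d5ec5-4111-4ba3-9569-86a7bec8f971", "question": "Which database provides the vector store used by the Sandbox?", "reference_answer": "Oracle Database 23ai.", "reference_context": "The OAIM Sandbox works on the chunks retrieved from Oracle DB 23ai by similarity with the question provided.", "conversation_history": [], "metadata": {"question_type": "simple", "seed_document_id": 1, "topic": "Vector store"}}
```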
you can simply upload it as shown here:

If you need an example, you can generate just one Q&A, download it, and then add your own Q&A pairs to the test dataset.

At this point, whether you have generated a test dataset or you are using an existing one, you can run the overall test on the configuration currently selected on the left side:

The top part relates to the LLM you are going to use for chat generation, and it includes the most relevant hyper-parameters to use in the call. The lower part relates to the Vector Store used: apart from the **Embedding Model**, **Chunk Size**, **Chunk Overlap** and **Distance Strategy**, which are fixed and come from the **Split/Embed** process you have to perform beforehand, you can modify:
* **Top K**: how many of the chunks found nearest to the question should be included in the prompt's context;
* **Search Type**: it can be Similarity or Maximal Marginal Relevance. The first is the one commonly used, while the second relates to an Oracle DB 23ai feature that excludes overly similar chunks from the Top K, giving space in the list to different chunks that provide more relevant information.

At the end of the evaluation, an **Overall Correctness Score** will be provided:

Moreover, a correctness percentage by topic, the list of failures and the full list of evaluated Q&A pairs are provided. To each Q&A included in the test dataset, the following fields will be added:
* **agent_answer**: the actual answer provided by the RAG app;
* **correctness**: a true/false flag that evaluates whether the agent_answer matches the reference_answer;

The list of **Failures**, the **Correctness of each Q&A**, as well as a **Report**, can be downloaded and stored for future audit activities.

*In this way you can perform several tests using the same curated test dataset, generated or self-made, looking for the best-performing RAG configuration.*

An important key factor that influences the quality of the answers is the prompt provided to the LLM, which includes the context information. To customize it and test its effect, the **Prompts** menu entry offers a pre-configured list of prompt templates that can be customized and associated with the RAG system.

There are three options available:
- **Basic Example**: it is automatically paired with the no-RAG, pure LLM chatbot configuration;
- **RAG Example**: it is automatically paired when the RAG checkbox is set to True (an illustrative template is sketched after this list);
- **Custom**: it is applied to any RAG/no-RAG configuration.
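
As a purely illustrative sketch (the Sandbox's built-in templates may be worded differently), a RAG-style prompt template typically injects the retrieved chunks and the user question into placeholders, for example with LangChain's `PromptTemplate`:

```python
# Illustrative only: the wording of the Sandbox's "RAG Example" preset may differ.
from langchain_core.prompts import PromptTemplate

rag_template = PromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

# At query time, the Top K retrieved chunks would be concatenated into {context}.
prompt_text = rag_template.format(
    context="...retrieved chunks...",
    question="What is the OAIM Sandbox?",
)
print(prompt_text)
```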