### Install Sherpa
This artifact uses [Sherpa](https://github.com/Aggregate-Intellect/sherpa) for the use cases. Specifically, it uses a slightly customized version of Sherpa v0.4.0, which is included in the `sherpa` folder of this repository. You can install Sherpa from the source code in this repository with `pip` in editable mode.

> [!NOTE]
>
> The following step is optional, as it has already been configured in the `requirements.txt` file of each use case folder. However, if you experience any issues with the installation from the `requirements.txt` files, remove the first line of the `requirements.txt` file in the use case folder and run the following commands to install Sherpa.
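If you do need to remove that first line, a one-line shell edit is enough; note that the in-place flag differs between GNU `sed` (Linux) and BSD `sed` (macOS):

```bash
# GNU sed (Linux): drop the first line of requirements.txt in place
sed -i '1d' requirements.txt

# BSD sed (macOS): -i requires a (here empty) backup suffix
sed -i '' '1d' requirements.txt
```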
To install Sherpa from the source code, run the following commands from the top level of the repository:

```bash
cd sherpa/src
pip install -e .
cd ../..
```
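To confirm the editable install worked, a quick import check is usually enough; the import name below assumes the customized Sherpa still installs under the upstream `sherpa_ai` package name:

```bash
python -c "import sherpa_ai; print('Sherpa import OK')"
```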
### Install Dependencies
## Use cases
> [!NOTE]
>
> All the instructions for the use cases must be executed in the corresponding use case folder.
The following folders contain the material for each use case used in the paper:

* `human_eval` contains the material for the HumanEval benchmark for the **code generation** use case
* `clevr-human` contains the material for the Clevr-Human dataset for the **question answering** use case
* `state_based_modeling` contains the material for the **class name generation** use case
Please refer to the `README.md` in each folder for the details of the use case and how to run the experiments.
Each use case contains an `evaluation.ipynb` notebook with the steps to use the generated results to create the tables and figures in the paper.
## A Note on Other LLMs
The use cases in this repository are tested with the following LLMs: GPT-4o, GPT-4o-mini, Qwen/Qwen2.5-7B-Instruct-Turbo, and Meta-Llama-3.1-70B-Instruct-Turbo. However, you can use other LLMs through the wrappers from LangChain.
Some newer LLMs may require upgrading the LangChain version. For example, the recent `gpt-4.1-nano` requires updating `langchain_openai` with `pip install -U langchain_openai`. While newer versions of the dependencies may work, this repository is only tested with the specific versions listed in the requirements of each use case.
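As a minimal sketch of what swapping in another LLM looks like, the LangChain chat-model wrappers can be instantiated directly; the model name below is a placeholder, and how the wrapper is wired into each use case depends on that use case's code:

```python
from langchain_openai import ChatOpenAI

# Placeholder model name; any chat model exposed by the provider can be used,
# as long as the installed LangChain version supports it.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
print(llm.invoke("Reply with the single word: ready").content)

# Models hosted on TogetherAI can be wrapped similarly, e.g. with ChatTogether
# from the `langchain_together` package, using the TOGETHER_API_KEY variable.
```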
## Citation
If you found this repository useful, please consider citing the following paper:
# Sherpa for Clevr Human (The Question Answering Use Case)
> [!NOTE]
>
> The following commands assume you are in the `clevr-human` folder.
Designing a state machine for solving the question answering task in the Clevr-Human dataset.
## Organization
The use case is organized as follows:
* `clevr_qa` contains the code implementation for the use case. Specifically, it includes the implementation of the following approaches in the paper:
  * The folder `react` contains the implementation of the ReACT approach
  * The folder `routing` contains the implementation of the routing state machine approach
  * The folder `state_machine` contains the implementation of the planning state machine approach
```bash
# For conda
conda activate clevr
```

2. Install the requirements:
```bash
pip install -r requirements.txt
```
## Create Dataset
1. Download the dataset using the `download_datasets.sh` script, or follow these manual steps:
   1. Download CLEVR v1.0 (no images) from [the Clevr website](https://cs.stanford.edu/people/jcjohns/clevr/)
   2. Put the `CLEVR_val_scenes.json` file in the `data` folder
   3. Download the human-created questions from [here](https://cs.stanford.edu/people/jcjohns/iep/)
   4. Put the `CLEVR-Humans-val.json` file in the `data` folder
2. Run `python -m scripts.create_dataset` to create the dataset
3. This script will push the dataset to HuggingFace. Update the `--hg_dataset_name` argument to your dataset name when running the experiments. To push the dataset, you may need to log in to HuggingFace from the command line using the `huggingface-cli login` command. Please refer to this [link](https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication) for more details.
4. The processed dataset is also available on [HuggingFace](https://huggingface.co/datasets/Dogdays/clevr_subset)
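To verify that the processed dataset (your own upload or the published subset) is accessible before running the experiments, a quick check with the `datasets` library can be used; the dataset name below is the published subset:

```python
from datasets import load_dataset

# Replace with your own dataset name if you pushed one to HuggingFace.
ds = load_dataset("Dogdays/clevr_subset")
print(ds)  # prints the available splits and the number of examples
```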
## Setup the Environment Variables

Create a `.env` file and copy the content of `.env_template` to it. Then, set the `OPENAI_API_KEY` and `TOGETHER_API_KEY` variables to your OpenAI and TogetherAI API keys, respectively.
## Run Question Answering

Run the `python -m scripts.run_qa` command to run the question answering task. The command has several arguments to control the behavior of the script. Use the `--help` argument to see the available arguments:
* **-h, --help**: show this help message and exit
* **--dataset_name**: Name of the processed dataset on HuggingFace. Default is `Dogdays/clevr_subset`. You normally don't need to change this unless you have created your own dataset.
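For example, a run against the published dataset using only the arguments shown above (all other arguments keep their defaults) could look like:

```bash
python -m scripts.run_qa --dataset_name Dogdays/clevr_subset
```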
The evaluation steps are included in the `evaluation.ipynb` notebook to create the tables and figures in the paper.

```bash
jupyter notebook evaluation.ipynb
```
Execute all the cells in the notebook to generate the tables and figures.
## Troubleshooting
* If you encounter an `Unauthorized` error while accessing the uploaded dataset, make sure you have logged in to HuggingFace using the `huggingface-cli login` command.
* If you encounter a `BadRequestError` about the data type of the dataset while running the question answering task, compare the dataset you are using with the pre-processed dataset at this [link](https://huggingface.co/datasets/Dogdays/clevr_subset) and make sure it is in the same format.
* If you encounter a `ModuleNotFoundError`, make sure you have installed the requirements in the `requirements.txt` file and are currently in the `clevr-human` folder.
* Also make sure that you have set the environment variables in the `.env` file correctly, especially the `OPENAI_API_KEY` and `TOGETHER_API_KEY` variables.
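A quick way to check that the `.env` file is being picked up is to load it and look for the keys; this assumes `python-dotenv` is available in the environment (it is commonly pulled in via the requirements):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current folder
for key in ("OPENAI_API_KEY", "TOGETHER_API_KEY"):
    print(key, "set" if os.getenv(key) else "MISSING")
```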
# Sherpa for HumanEval (The Code Generation Use Case)
> [!NOTE]
>
> The following commands assume you are in the `human_eval` folder.
> [!Warning]
>
> While this use case can also be run using a virtual environment, it is highly recommended to run it in a Docker container or similar isolated environment because the experiments may **execute arbitrary code generated by LLMs**. Below we describe the steps using Jupyter lab in a Docker container. Use the virtual environment only if you are aware of the risks and have taken necessary precautions.
The use case is organized as follows:
* `run_programs.py`: The main script to run the code generation experiments
* `evaluation.ipynb`: The evaluation notebook to create tables and figures in the paper for the HumanEval use case
* `llm_coder` contains the code implementation for the use case. Specifically, it includes the implementation of the following approaches in the paper:
  * `agent_coder_improved` folder contains the implementation of the agent coder approach
  * `test_based_sm_with_feedback` folder contains the implementation of the test-based state machine approach
  * `coders/direct_prompt_coder.py` contains the implementation of the direct prompt approach
* `requirements.txt` contains the requirements to run the use case
* `Dockerfile` and `docker-compose.yml` contain the Docker configuration to run the use case in a Docker container
## Installation Preparation
1. First, download the code for the human_eval benchmark from: https://github.com/openai/human-eval
```bash
git clone https://github.com/openai/human-eval
```
   * **NOTE:** Make sure you place the `human-eval` project under the same folder as this README file, i.e., the `human_eval` folder of the repository.
2. Then, copy the `human_eval` folder from this repository to the `human_eval` folder you just cloned.
## Installation with Docker (Recommended)
1. Install [Docker](https://docs.docker.com/get-started/overview/) if you haven't already. You can follow the instructions on the Docker website for your operating system.
2. Build the Docker image using the provided Dockerfile and docker compose (note that the build context is the root directory of this repository, as specified by `..`).
3. Open the Jupyter lab in your browser at `http://localhost:8888`, and provide the value of the `token` from the terminal output. Then you will be able to run the experiments in the subsequent steps.
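A rough sketch of steps 2 and 3 with docker compose is shown below; the actual service definition lives in the provided `docker-compose.yml`, so treat the exact invocation as an assumption and prefer whatever that file documents:

```bash
# Build the image (the compose file sets the build context to the repository root)
docker compose build

# Start the container; the Jupyter lab URL with its token is printed in the logs
docker compose up
```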
## Installation with Virtual Environment

> [!Warning]
>
> This use case may run arbitrary code generated by LLMs. It is highly recommended to run it in a Docker container or similar isolated environment. Use the virtual environment only if you are aware of the risks and have taken necessary precautions.
1. Create a new virtual environment for this use case (not required, but recommended)

```bash
# For venv
python -m venv humaneval

# For conda
conda create -n humaneval python=3.12
```

Activate the virtual environment:

```bash
# For venv
source humaneval/bin/activate
# For conda
conda activate humaneval
```

2. Install the requirements:
```bash
pip install -r requirements.txt
```

3. Install the `human_eval` benchmark:
```bash
pip install -e human_eval
```
## Setup the Environment Variables

Create a `.env` file and copy the content of `.env_template` to it. Then, set the `OPENAI_API_KEY` and `TOGETHER_API_KEY` variables to your OpenAI and TogetherAI API keys, respectively.
## Run the Code Generation

First, open a new terminal in the Jupyter lab interface. Then, you can run the `run_programs.py` script to generate code using LLMs. This script accepts the following arguments:
* **-h, --help**: show this help message and exit
* **--llm_family**: Provider of the model to use. One of {openai, togetherai}
* **--llm_model**: Name of the LLM to use. The paper uses the following: gpt-4o (openai), gpt-4o-mini (openai), Qwen/Qwen2.5-7B-Instruct-Turbo (together), and Qwen/Qwen2.5-Coder-32B-Instruct (together). You can also use other LLMs from the two providers.
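For example, a single run with an OpenAI model using the arguments shown above (the remaining arguments are not listed in this excerpt and keep their defaults) might look like:

```bash
python run_programs.py --llm_family openai --llm_model gpt-4o
```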
The `results` folder contains the cached 3-run results of the approaches used in the paper.

The script will generate a `jsonl` file with the generated code and the number of LLM calls made to generate the code. Each line in the file will contain a JSON object.
### Evaluation

#### With Docker
The evaluation steps are included in the `evaluation.ipynb` notebook to create tables and figures in the paper for the HumanEval use case. You can open the notebook in Jupyter lab and execute all the cells to generate the tables and figures.
#### With virtual environment
```bash
jupyter notebook evaluation.ipynb
```

Execute all the cells in the notebook to generate the tables and figures.
# Sherpa for Modeling (The Class Name Generation Use Case)
> [!NOTE]
>
> The following commands assume you are in the `state_based_modeling` folder.
Designing state machines for solving the class name generation task in the Modeling dataset.
## Organization
The use case is organized as follows:
* `evaluation` contains the code implementation for the evaluation of the use case
* `modeling` contains the code implementation for the use case. Specifically, it includes the implementation of the following approaches in the paper:
  * The file `model_class.py` contains the implementation of the Inspect state machine
  * The file `model_class_mig.py` contains the implementation of the MIG state machine
* `ground_truth` contains the ground truth model for the dataset
```bash
conda activate modeling
```

2. Install the requirements:
```bash
pip install -r requirements.txt
```
To repeat the experiments in the paper, run each LLM three times with the same configuration.

The command will output a `txt` file for each class name problem, containing the class name generated by the LLM.
### Run the State Machine Approaches
Run `scripts.sm_main.py` to generate class names using the state machine approaches. It accepts the following arguments:
* **-h, --help**: show this help message and exit
* **--model_type**: Provider of the model to use. One of {openai, together}
* **--llm**: Name of the LLM to use. The paper uses the following: gpt-4o (openai), gpt-4o-mini (openai), Qwen/Qwen2.5-7B-Instruct-Turbo (together), and Meta-Llama-3.1-70B-Instruct-Turbo (together).
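For example, assuming the script is invoked as a module like the other scripts in this repository (an assumption; adjust to how `scripts/sm_main.py` is actually launched), a run with an OpenAI model might look like:

```bash
python -m scripts.sm_main --model_type openai --llm gpt-4o
```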