TestChain

LLM-powered multi-agent framework with code-assisted reasoning for text-to-testcase generation

The image below shows the architecture of TestChain.

The image below shows an example of the code-assisted reasoning.

Dataset

All datasets used in the experiment can be found here.


Install

Install the Python packages:

pip install -r requirements.txt

Run

Run Approaches

Entry point

main.py

Args

args	description
--config	Path of the config file
--mode	Run mode: `TestAgent` or `TestChain`
--prompt_type	Prompt type: `0-shot` (CodeT-TG), `1-shot` (Reflexion-TG), or `py_inter` (TestChain)
--api_key	OpenAI or DeepInfra API key
--base_url	API base URL: `https://api.openai.com/v1` for OpenAI, `https://api.deepinfra.com/v1/openai` for DeepInfra
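The argument table can be read as the following command-line interface. This is only a sketch under assumptions: the parser in main.py is not shown in this README, so `build_parser`, the `choices` lists, the `required` settings, and the `BASE_URLS` lookup are hypothetical and merely mirror the table.

```python
import argparse

# Base URLs copied from the table above; the dictionary itself is a hypothetical convenience.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "deepinfra": "https://api.deepinfra.com/v1/openai",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented arguments of main.py."""
    parser = argparse.ArgumentParser(description="Run a test-case generation approach")
    parser.add_argument("--config", required=True, help="Path of the config file")
    parser.add_argument("--mode", required=True, choices=["TestAgent", "TestChain"],
                        help="Run mode")
    parser.add_argument("--prompt_type", required=True,
                        choices=["0-shot", "1-shot", "py_inter"],
                        help="Prompt type")
    parser.add_argument("--api_key", required=True, help="OpenAI or DeepInfra API key")
    parser.add_argument("--base_url", required=True, help="API base URL")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real command line.
    args = build_parser().parse_args([
        "--config", "config/leetcode-hard/config-gpt4o.json",
        "--mode", "TestChain",
        "--prompt_type", "py_inter",
        "--api_key", "xxx",
        "--base_url", BASE_URLS["openai"],
    ])
    print(args.mode, args.prompt_type, args.base_url)
```

Restricting `--mode` and `--prompt_type` with `choices` makes a mistyped value fail at parse time rather than partway through a run.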

For example:

python main.py \
--config 'config/leetcode-hard/config-gpt4o.json' \
--mode 'TestChain' \
--prompt_type 'py_inter' \
--api_key 'xxx' \
--base_url 'xxx'

The run results are written to the directory result/leetcode-hard/gpt4o/TestChain_py_inter.
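The relationship between the arguments and the result directory can be sketched as follows. The naming rule is an assumption inferred from this single example (dataset folder and model name taken from the config path, followed by the mode and prompt type joined with an underscore); `result_dir_for` is a hypothetical helper, not a function from the repository.

```python
from pathlib import PurePosixPath


def result_dir_for(config_path: str, mode: str, prompt_type: str) -> str:
    """Guess the result directory for a run (assumed naming rule, see the text above)."""
    config = PurePosixPath(config_path)
    dataset = config.parent.name          # e.g. "leetcode-hard"
    stem = config.stem                    # e.g. "config-gpt4o"
    prefix = "config-"
    model = stem[len(prefix):] if stem.startswith(prefix) else stem
    return str(PurePosixPath("result") / dataset / model / f"{mode}_{prompt_type}")


if __name__ == "__main__":
    print(result_dir_for("config/leetcode-hard/config-gpt4o.json", "TestChain", "py_inter"))
    # prints result/leetcode-hard/gpt4o/TestChain_py_inter
```

The resulting path is the value that the counting scripts below take as `--base_dir`.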

Count Result

Entry point

count.py

Args

args	description
--base_dir	Result directory
--max_nums	Maximum number of assert statements retained for each problem, set to $10$ in our experiments

For example:

python count.py \
--base_dir 'result/leetcode-hard/gpt4o/TestChain_py_inter' \
--start 0 \
--end 39 \
--max_nums 10
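To illustrate what `--max_nums` controls, the sketch below keeps at most a fixed number of assert statements from a generated test. It assumes one assert per line and that the first statements are the ones retained; count.py may select them differently, and `keep_asserts` is a hypothetical helper.

```python
def keep_asserts(test_code: str, max_nums: int = 10) -> list[str]:
    """Return at most max_nums single-line assert statements, in their original order."""
    kept = []
    for line in test_code.splitlines():
        stmt = line.strip()
        if stmt.startswith("assert "):
            kept.append(stmt)
            if len(kept) == max_nums:
                break
    return kept


if __name__ == "__main__":
    # Fifteen generated asserts for a made-up function f; only the first ten survive.
    generated = "\n".join(f"assert f({i}) == {i * 2}" for i in range(15))
    for stmt in keep_asserts(generated, max_nums=10):
        print(stmt)
```

Capping the number of asserts keeps the per-problem counts comparable across approaches that generate different amounts of test code.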

Count Coverage

Entry point

count_coverage.py

Args

args	description
--dataset_path	Path of the dataset file
--base_dir	Result directory
--max_nums	Maximum number of assert statements retained for each problem, set to $10$ in our experiments
--time_limit	Maximum time in seconds for a single problem. Coverage is measured with Python's `sys.settrace`, which slows execution, so a more lenient time limit is needed. Set to $10$ in our experiments.

For example:

python count_coverage.py \
--dataset_path 'data/leetcode-hard-wo-examples.jsonl' \
--base_dir 'result/leetcode-hard/TestChain_py_inter' \
--start 0 \
--end 39 \
--max_nums 10 \
--time_limit 10
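The reason tracing needs a lenient time limit can be seen in a minimal line-coverage sketch built on `sys.settrace`: the interpreter calls the trace function for every executed line of the traced function, which adds overhead to each step. This is an illustration only, not the code in count_coverage.py; `executed_lines` and `sign` are hypothetical names.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args); return its result and the line offsets executed inside func."""
    code = func.__code__
    hits = set()

    def tracer(frame, event, arg):
        # Ignore frames that belong to other functions.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Store offsets relative to the "def" line so they are easy to read.
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore the previous trace function, even if func raises.
        sys.settrace(previous)
    return result, hits


def sign(x):
    if x < 0:
        return -1
    return 1


if __name__ == "__main__":
    # Each input exercises a different return statement of sign.
    for value in (-5, 5):
        result, hits = executed_lines(sign, value)
        print(value, result, sorted(hits))
```

Running a set of generated asserts under such a tracer and taking the union of the recorded lines gives a line-coverage figure for the solution under test.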

