```bash
git clone --recurse-submodules https://github.com/casys-kaist/LLMServingSim.git
cd LLMServingSim
```

Conda can be downloaded from the following link:
```bash
curl -O https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
bash Anaconda3-2024.06-1-Linux-x86_64.sh
```
Create the environment from the provided `environment.yml`:

```bash
conda env create -p ./env -f ./environment.yml
conda activate ./env
```

Alternatively, set up the environment manually:

```bash
conda create -n env_name python=3.9
conda activate env_name
conda install conda-forge::libprotobuf=3.6.1
conda install conda-forge::cmake=3.15
conda install cctbx202208::boost-cpp=1.74.0
pip install -r requirements.txt
```

Common issues can occur while building ASTRA-Sim. If an error regarding the version of protoc occurs, see the troubleshooting section at the end of this document.
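Before building, it can help to confirm that the pinned toolchain versions are actually in the active environment. This is an optional sanity check, not part of the original setup:

```bash
# Optional sanity check (illustrative): confirm the pinned packages installed
conda list | grep -E 'libprotobuf|cmake|boost-cpp'
python --version   # the manual setup above pins Python 3.9
```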
Then build ASTRA-Sim:

```bash
cd astra-sim
./build/astra_analytical/build.sh
```
Next, install the Chakra graph frontend and PolyMath:

```bash
cd extern/graph_frontend/chakra
pip install .
cd ../../../../execution_engine/polymath
pip install .
cd ../..
```

## Config & Dataset Path
- Network config path: `astra-sim/inputs/network/analytical/{config_name}.json`
- NPU config path: `execution_engine/codelets_src/codelets/examples/genesys/configs/{config_name}.json`
- Dataset path: `astra-sim/dataset/{dataset_name}.tsv`
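To see which configurations and datasets ship with the repository, you can simply list the directories above; the actual file names may differ from the `{config_name}`/`{dataset_name}` placeholders:

```bash
# List the bundled network configs, NPU configs, and datasets (paths from above)
ls astra-sim/inputs/network/analytical/
ls execution_engine/codelets_src/codelets/examples/genesys/configs/
ls astra-sim/dataset/
```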
## Test Run
```bash
python3 -u main.py --model_name 'gpt3-6.7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'
python3 -u main.py --model_name 'llama-7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'
```

| Parameters | Supporting Options | Default Value | Notes |
|---|---|---|---|
| model_name | 'gpt2', 'gpt3-125m', 'gpt3-350m', 'gpt3-760m', 'gpt3-1.3b', 'gpt3-2.7b', 'gpt3-6.7b', 'gpt3-13b', 'gpt3-30b', 'gpt3-175b', 'opt-125m', 'opt-350m', 'opt-1.3b', 'opt-2.7b', 'opt-6.7b', 'opt-13b', 'opt-30b', 'opt-66b', 'opt-175b', 'llama-7b', 'llama-30b', 'llama-70b' | 'gpt2' | |
| npu_num | Integer | 16 | |
| max_batch | Integer | 0 | 0: no limit |
| batch_delay | Integer | 0 | |
| scheduling | 'none', 'orca' | 'orca' | |
| parallel | 'pipeline', 'tensor', 'hybrid' | 'hybrid' | |
| npu_group | Integer | 1 | |
| npu_mem | Integer | 40 | |
| kv_manage | 'max', 'pow2', 'oracle', 'vllm' | 'vllm' | |
| block_size | Integer | 8 | |
| pim_type | 'none', 'local', 'pool' | 'none' | |
| sub_batch | Flag | False | Sub-batch Scheduling On/Off |
| dataset | Dataset Path | None | None: manually add requests in main.py |
| network | JSON File Name | None | None: follows the convention "fully_connected_{network_dim}d_{number_of_NPUs}d.json" |
| output | Output TSV Path | None | None: no TSV output, only stdout |
| gen | Flag | False | Skip initiation phase On/Off |
| fast_run | Flag | False | Skip all compilation and force to use cached trace for fast simulation |
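As a sketch of how these parameters combine, the command below simulates a larger multi-NPU run; the specific values and the output file name are illustrative, not recommended settings:

```bash
# Illustrative run: 16 NPUs split into 4 groups, hybrid parallelism,
# ORCA scheduling, vLLM-style KV-cache management, results written to TSV
# (the output prefix 'llama7b_16npu' is a hypothetical name)
python3 -u main.py --model_name 'llama-7b' --npu_num 16 --npu_group 4 \
    --npu_mem 40 --parallel 'hybrid' --scheduling 'orca' --kv_manage 'vllm' \
    --dataset 'dataset/share-gpt-req100-rate10.tsv' --output 'llama7b_16npu'
```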
In all outputs, the unit of throughput is tokens/second, and the unit of simulation time is milliseconds.
The standard output shows which requests are being processed in each iteration of the simulator and displays the measured throughput at regular intervals. Additionally, it provides a summary of throughput and simulation time at the end.
`{output_filename}-throughput.tsv` contains the prompt and generation throughput at each interval.
`{output_filename}-simulation-time.tsv` contains the simulation time of each component.
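For example, a run with `--output` set produces the two files described above; the output prefix used here is hypothetical:

```bash
# Run with TSV output enabled ('gpt2_test' is a hypothetical output prefix)
python3 -u main.py --model_name 'gpt2' --npu_num 1 --npu_group 1 \
    --dataset 'dataset/share-gpt-req100-rate10.tsv' --output 'gpt2_test'
# Inspect the per-interval throughput and per-component simulation time
column -t gpt2_test-throughput.tsv | head
column -t gpt2_test-simulation-time.tsv
```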
## Evaluation

Run the evaluation scripts individually:

```bash
cd evaluation
./evaluation1.sh
./evaluation2.sh
...
./evaluation5.sh
```

Or run them all at once:

```bash
./evaluation_all.sh
```

For detailed information about the evaluation, please refer to the README file in the evaluation folder.
## Troubleshooting

If your error is similar to the following, you can use the solution below.
```
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
   17 | #error This file was generated by an older version of protoc which is
      |  ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
   18 | #error incompatible with your Protocol Buffer headers. Please
      |  ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
   19 | #error regenerate this file with a newer version of protoc.
      |  ^~~~~
```

This method explicitly sets the conda environment for CMake to use.
1. **Activate the Conda Environment:** First, activate the desired conda environment.

   ```bash
   conda activate your_env_name
   ```

2. **Set the `CMAKE_PREFIX_PATH` Environment Variable:** Add the path of the activated conda environment to the `CMAKE_PREFIX_PATH` environment variable.

   ```bash
   export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
   ```
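After setting the variable, you can sanity-check that CMake will now resolve tools from the conda environment. This check is illustrative, not part of the original instructions:

```bash
# Illustrative check: the environment prefix should appear first in the path
echo $CMAKE_PREFIX_PATH
# Tools should resolve to $CONDA_PREFIX/bin (protoc only if it was installed
# into this environment, e.g. via the libprotobuf package above)
which cmake && cmake --version
which protoc && protoc --version
```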
Alternatively, you can make the setting persistent with conda activation/deactivation scripts:

1. **Activate the Conda Environment:** First, activate the conda environment you want to modify.

   ```bash
   conda activate your_env_name
   ```

2. **Navigate to the Environment's Activation Script Directory:** The activation scripts are located in the `etc/conda/activate.d` directory within your conda environment. If this directory does not exist, create it along with the deactivation directory.

   ```bash
   mkdir -p $CONDA_PREFIX/etc/conda/activate.d
   mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
   ```
3. **Create and Edit the Activation Script:** Create a script named `set_cmake_prefix.sh` to set the `CMAKE_PREFIX_PATH` when the environment is activated.

   ```bash
   nano $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh
   ```

   Add the following content to this file:

   ```bash
   #!/bin/bash
   export OLD_CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH
   export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
   ```
4. **Create and Edit the Deactivation Script:** Create a script named `unset_cmake_prefix.sh` to reset the `CMAKE_PREFIX_PATH` when the environment is deactivated.

   ```bash
   nano $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
   ```

   Add the following content to this file:

   ```bash
   #!/bin/bash
   export CMAKE_PREFIX_PATH=$OLD_CMAKE_PREFIX_PATH
   unset OLD_CMAKE_PREFIX_PATH
   ```
5. **Set Script Permissions:** Ensure the scripts are executable.

   ```bash
   chmod +x $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh
   chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
   ```
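To confirm the scripts take effect, re-activate the environment; this check is illustrative and reuses the placeholder `your_env_name` from the steps above:

```bash
# Re-activate so the new activate.d/deactivate.d scripts run
conda deactivate
conda activate your_env_name
# CMAKE_PREFIX_PATH should now begin with the environment's prefix
echo $CMAKE_PREFIX_PATH
```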