old quick start interconnects
1- Install CODES, ROSS and DUMPI using the installation instructions available at:
https://xgitlab.cels.anl.gov/codes/codes/wikis/installation
2- Download and untar the AMG (216 ranks), Multigrid (125 ranks) and Crystal Router (100 ranks) traces using the following:
wget https://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/AMG/df_AMG_n216_dumpi.tar.gz
wget https://portal.nersc.gov/project/CAL/doe-miniapps-mpi-traces/CrystalRouter/100.tar.gz
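The downloaded archives still have to be unpacked. A minimal sketch, assuming the archives are gzip-compressed tarballs sitting in the current directory; the helper name extract_traces and the destination directory are illustrative choices, so pick a destination that matches the --workload_file paths you pass later:

```shell
# Hypothetical helper: unpack each given .tar.gz archive into a destination directory.
# Usage: extract_traces DEST ARCHIVE...
extract_traces() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for archive in "$@"; do
        if [ -f "$archive" ]; then
            # -x extract, -z gunzip, -f archive file, -C unpack inside DEST
            tar -xzf "$archive" -C "$dest" || return 1
        else
            echo "skipping $archive: file not found" >&2
        fi
    done
}
```

For example: extract_traces df_traces df_AMG_n216_dumpi.tar.gz 100.tar.gz (df_traces is an assumed destination, not a required name).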
The AMG and Crystal Router traces use point-to-point operations only. Multigrid uses both point-to-point and collective operations.
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --num_net_traces=216 --lp-io-dir=dragonfly-amg --lp-io-use-suffix=1 --workload_file=df_traces/AMG/df_AMG_n216_dumpi/dumpi-2014.03.03.14.55.23- --workload_type=dumpi -- ../src/network-workloads/conf/dragonfly-custom/modelnet-test-dragonfly-theta.conf
Data files with network statistics will be generated in the dragonfly-amg-xxx directory (xxx is the suffix appended to the --lp-io-dir name).
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --num_net_traces=216 --lp-io-dir=fattree-amg --lp-io-use-suffix=1 --workload_file=../../df_traces/AMG/df_AMG_n216_dumpi/dumpi-2014.03.03.14.55.23- --workload_type=dumpi -- ../src/network-workloads/conf/modelnet-mpi-fattree-summit-k36-n3564.conf
After both runs, the data files with network statistics will be in the dragonfly-amg-xxx and fattree-amg-xxx directories (xxx is the suffix).
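Because the suffix differs from run to run, it can help to look up the newest output directory for a given --lp-io-dir value. A small sketch, assuming the directories are created in the current working directory and are named with the prefix followed by a hyphen; latest_run_dir is an illustrative helper name:

```shell
# Hypothetical helper: print the most recently modified directory whose
# name starts with PREFIX- (for example dragonfly-amg-xxx).
# Usage: latest_run_dir PREFIX
latest_run_dir() {
    # -d lists the directories themselves, -t sorts newest first
    ls -dt "$1"-* 2>/dev/null | head -n 1
}

latest_run_dir dragonfly-amg
latest_run_dir fattree-amg
```

Each call prints nothing when no matching directory exists yet.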
Build and install Cortex using the instructions at:
https://xgitlab.cels.anl.gov/codes/codes/wikis/codes-cortex-install
Re-configure and build CODES with Cortex.
Now run the Multigrid trace with 125 ranks.
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --debug_cols=1 --num_net_traces=125 --lp-io-dir=fattree-mg --lp-io-use-suffix=1 --workload_file=../../df_traces/Multigrid/MultiGrid_C_n125_dumpi/dumpi-2014.03.06.23.48.13- --workload_type=dumpi -- ../src/network-workloads/conf/modelnet-mpi-fattree-summit-k36-n3564.conf
This should generate an additional file named avg-all-reduce-time in the fattree-mg directory, containing the average time, per rank, to complete an MPI_Allreduce operation.
You will need a workloads file with the number of ranks in each job and path to the DUMPI traces. You will also need an allocation file with the list of ranks to be assigned to each job.
An example workloads file can be found at: codes/src/network-workloads/conf/workloads.conf -- Please modify the paths in the workload file according to the location of the DUMPI traces.
An example allocation file can be found at: codes/src/network-workloads/conf/allocation-cont.conf
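As an illustration of the two input files, the sketch below writes a workloads file and a contiguous allocation file for two jobs. The layout is an assumption based on the description above (one job per line: the workloads file gives a rank count followed by a trace path, the allocation file gives the node IDs on which that job's ranks are placed, separated by spaces), so check it against the example files shipped with CODES. The trace paths are the ones used in the commands on this page; the output names my-workloads.conf and my-allocation.conf and the helper node_range are made up for the example.

```shell
# Hypothetical helper: print COUNT consecutive node IDs starting at FIRST on one line.
# Usage: node_range FIRST COUNT
node_range() {
    awk -v first="$1" -v count="$2" 'BEGIN {
        for (i = 0; i < count; i++)
            printf "%d%s", first + i, (i < count - 1 ? " " : "\n")
    }'
}

# Assumed layout: "<number of ranks> <path prefix of the DUMPI trace>" per job.
cat > my-workloads.conf <<'EOF'
216 df_traces/AMG/df_AMG_n216_dumpi/dumpi-2014.03.03.14.55.23-
125 df_traces/Multigrid/MultiGrid_C_n125_dumpi/dumpi-2014.03.06.23.48.13-
EOF

# Assumed layout: one line of node IDs per job, in the same job order.
# Contiguous placement: job 1 on nodes 0-215, job 2 on nodes 216-340.
{
    node_range 0 216
    node_range 216 125
} > my-allocation.conf
```

The two files would then be passed with --workload_conf_file and --alloc_file, as in the commands on this page.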
Note: this simulation run will take several minutes.
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --lp-io-dir=fattree-multi --lp-io-use-suffix=1 --workload_conf_file=../src/network-workloads/conf/workloads.conf --alloc_file=../src/network-workloads/conf/allocation-cont.conf --workload_type=dumpi -- ../src/network-workloads/conf/modelnet-mpi-fattree-summit-k36-n3564.conf
Use the random node allocation file (allocation-random.conf) with the dragonfly network model. Random node placement on a fat tree can generate extensive contention over the network.
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --lp-io-dir=dragonfly-multi --lp-io-use-suffix=1 --workload_conf_file=../src/network-workloads/conf/workloads.conf --alloc_file=../src/network-workloads/conf/allocation-random.conf --workload_type=dumpi -- ../src/network-workloads/conf/dragonfly-custom/modelnet-test-dragonfly-theta.conf
To generate your own config files with different job allocations, follow the instructions at: scripts/allocation_gen/README.txt
Run the above simulations in parallel with --sync=3 (ROSS optimistic mode).
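ROSS is MPI-based, so a parallel run is normally started through an MPI launcher. The sketch below only prints such a command for the fat tree AMG run so that it can be checked before launching. The launcher name mpirun and the process count are assumptions (use whatever your MPI installation provides); the remaining flags are copied from the sequential command earlier on this page with --sync changed to 3. parallel_amg_cmd is an illustrative helper name.

```shell
# Hypothetical helper: print (not run) a parallel fat tree AMG command.
# Usage: parallel_amg_cmd NUM_PROCS
parallel_amg_cmd() {
    echo mpirun -np "$1" \
        ./src/network-workloads/model-net-mpi-replay \
        --disable_compute=1 --sync=3 --num_net_traces=216 \
        --lp-io-dir=fattree-amg --lp-io-use-suffix=1 \
        --workload_file=../../df_traces/AMG/df_AMG_n216_dumpi/dumpi-2014.03.03.14.55.23- \
        --workload_type=dumpi -- \
        ../src/network-workloads/conf/modelnet-mpi-fattree-summit-k36-n3564.conf
}

# Print the command for 4 MPI processes; 4 is an arbitrary example value.
parallel_amg_cmd 4
```

Paste the printed line into the shell from the build directory to launch the run.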
Generate background network traffic with the Multigrid application trace.
./src/network-workloads/model-net-mpi-replay --disable_compute=1 --sync=1 --lp-io-dir=dragonfly-multi --lp-io-use-suffix=1 --workload_conf_file=../src/network-workloads/conf/workloads.conf --alloc_file=../src/network-workloads/conf/allocation-synthetic.conf --workload_type=dumpi -- ../src/network-workloads/conf/dragonfly-custom/modelnet-test-dragonfly-theta.conf
The default interval between background traffic messages is 100,000 ns. Try --mean_interval=50000 or --mean_interval=25000, which should increase the amount of background traffic by 2x and 4x respectively: the smaller the mean interval, the more frequently messages are generated.
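The relation between the mean interval and the traffic volume can be checked with a little arithmetic. A sketch, assuming the message rate is inversely proportional to the mean interval (an assumption about the traffic generator, not something confirmed by the CODES sources); traffic_factor is an illustrative helper name:

```shell
# Default mean interval between background messages, in ns (stated above).
DEFAULT_INTERVAL_NS=100000

# Print the traffic factor relative to the default for a given mean interval,
# assuming the message rate is inversely proportional to the interval.
# Usage: traffic_factor MEAN_INTERVAL_NS
traffic_factor() {
    awk -v d="$DEFAULT_INTERVAL_NS" -v m="$1" 'BEGIN { print d / m }'
}

traffic_factor 50000    # prints 2
traffic_factor 25000    # prints 4
traffic_factor 500000   # prints 0.2, i.e. less traffic than the default
```

Under this assumption a 2x increase corresponds to 50,000 ns and a 4x increase to 25,000 ns, while values above 100,000 ns reduce the background traffic.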