
embryo-labs/Efficient-Disk-Learned-Index

Efficient Learned Indexes on Disk

This repository provides a microbenchmark and a YCSB benchmark for evaluating learned indexes on disk.
For details, see our SIGMOD 2024 paper: Making In-Memory Learned Indexes Efficient on Disk.

Supported indexes

The indexes are included as git submodules. Pass the --recursive option to git clone, or run git submodule init && git submodule update in an existing local clone.
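The two ways of fetching the submodules can be sketched as shell helpers. The repository URL below is an assumption inferred from the repository name shown on this page, and the function names are hypothetical; adjust both to your setup.

```shell
# Assumed URL, inferred from the repository name; not stated in this README.
REPO_URL="https://github.com/embryo-labs/Efficient-Disk-Learned-Index.git"

# Option 1: fresh clone that also fetches the index submodules.
# The optional first argument is the target directory.
clone_with_submodules() {
    git clone --recursive "$REPO_URL" "${1:-Efficient-Disk-Learned-Index}"
}

# Option 2: fetch the submodules inside an existing clone.
# `--init` covers `git submodule init`; `--recursive` also handles nested submodules.
init_submodules() {
    git -C "${1:-.}" submodule update --init --recursive
}
```

For example, `init_submodules path/to/clone` can be run after a plain `git clone` that omitted --recursive.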

Microbenchmark

  • RS
  • PGM-index
  • Compressed PGM
  • PGM_Disk
  • Cpr_PGM_Disk
  • CprLeCo_PGM_Disk
  • Zone Map
  • LeCo-Zonemap
  • LeCo-Zonemap-Disk

YCSB Benchmark

  • Disk-based PGM
  • Disk-based B+tree
  • Hybrid_B+tree_PGM-Disk
  • Hybrid_B+tree_LeCo-Disk
  • Hybrid_B+tree_RS
  • Hybrid_B+tree_PGM
  • Hybrid_ALEX_PGM-Disk
  • Hybrid_ALEX_LeCo-Disk
  • Hybrid_ALEX_RS
  • Hybrid_ALEX_PGM

Running the benchmark

Prepare datasets

Put the datasets in the datasets directory. You can use the SOSD datasets or any other dataset, as long as the data is laid out as follows:

key0
key1
key2
...

Run Microbenchmark

Make sure that the datasets referenced in the corresponding scripts are already stored in ./datasets/. The parameters for each experiment are set in the corresponding sub-script.

bash RunOnSingleDisk.sh
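Since the scripts expect their datasets to be present, a small pre-flight check can catch a missing file before a long run starts. This is a minimal sketch: check_datasets is a hypothetical helper, not part of the repository, and the dataset names in the usage comment are placeholders.

```shell
# Return 0 only if every named dataset exists as a non-empty file in the directory.
check_datasets() {
    dir="$1"; shift
    missing=0
    for name in "$@"; do
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty dataset: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (placeholder names; use the ones referenced in the scripts):
# check_datasets ./datasets dataset_a dataset_b && bash RunOnSingleDisk.sh
```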

Run YCSB Benchmark

  1. Use the scripts from index-microbench to generate the YCSB default workloads. In workload_config.inp, set the workload name and monoint. The workloads used in our paper are:
    • Common Settings:
      recordcount=150000000
      operationcount=10000000
      requestdistribution=uniform
      # all other settings are the same as in the YCSB default workloads
    • Read-Only:
      readproportion=1
      updateproportion=0
      scanproportion=0
      insertproportion=0
    • Write-Only:
      readproportion=0
      updateproportion=0
      scanproportion=0
      insertproportion=1
    • Balanced:
      readproportion=0.5
      updateproportion=0
      scanproportion=0
      insertproportion=0.5
  2. Move the generated txn files in workloads (e.g., txn_monoint_workloadc) to ./ycsb_workloads/uniform_150/ in this repo.
  3. Generate the YCSB workloads on existing datasets:
bash scripts/build_benchmark.sh
bash PrepareYCSB.sh
  4. [optional] Modify the index parameters in scripts/params_cc (for the write-only workload), scripts/params_c (for the read-only workload), and scripts/params_aa (for the balanced workload).
  5. Run the single-threaded YCSB benchmark:
bash RunYCSBOnSingleDisk.sh
  6. Run the multi-threaded YCSB benchmark. The number of threads is set in RunMultiThreadedYCSB.sh:
bash RunMultiThreadedYCSB.sh
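The workload settings from step 1 can be written out as YCSB-style property fragments. In this sketch a read-only workload issues only reads (readproportion=1), a write-only workload issues only inserts (insertproportion=1), and the balanced workload splits them evenly. The helper name, output directory, and file names are hypothetical, and the fragments contain only the properties listed in this README; a complete spec would take everything else from the YCSB defaults.

```shell
# Write the common settings plus the given read and insert proportions
# to the file named by the first argument.
write_workload_spec() {
    out="$1"; read_prop="$2"; insert_prop="$3"
    cat > "$out" <<EOF
recordcount=150000000
operationcount=10000000
requestdistribution=uniform
readproportion=$read_prop
updateproportion=0
scanproportion=0
insertproportion=$insert_prop
EOF
}

# Hypothetical output location and file names.
mkdir -p workload_spec_sketch
write_workload_spec workload_spec_sketch/read_only  1   0
write_workload_spec workload_spec_sketch/write_only 0   1
write_workload_spec workload_spec_sketch/balanced   0.5 0.5
```

Keeping the shared settings in one function makes it harder for the three workloads to drift apart when recordcount or operationcount is changed.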

About

[SIGMOD’24] Source code for the paper: Making In-Memory Learned Indexes Efficient on Disk
