
Dataset collector to collect Sphinx-level metadata of mixnet flows on Nym in version `nym-binaries-v2023.5-rolo` for our IEEE TDSC article "Shift Your Shape: Correlating and Defending Mixnet Flows Based on Their Shapes".


KULeuven-COSIC/shift-your-shape_flow-metadata-collector


Flow Metadata Collector

This repository enables you to collect metadata datasets of mixnet flows at the Sphinx packet level in the way we did for our paper "Shift Your Shape: Correlating and Defending Mixnet Flows Based on Their Shapes". Please see our primary repository, shift-your-shape_correlating-and-defending-mixnet-flows-based-on-their-shapes, for more information.

Usage

Run the Bash scripts and Jupyter Notebooks provided in this repository in ascending order. Doing so yields two new folders per experimental setting for which you collect data: the raw dataset and its ready version. The latter is suitable for use with the classifiers and instrumentation scripts we provide in shift-your-shape_correlating-and-defending-mixnet-flows-based-on-their-shapes.

Before attempting to adjust this repository to your setting, please read our paper first in order to understand the terminology we use and the setup we create.

This flow metadata collector assumes that the data collection infrastructure is created on the public cloud provider Hetzner. The scripts call Hetzner's CLI tool hcloud for various instance provisioning steps. If you intend to use a different cloud provider, please replace these Hetzner-specific parts with equivalents for your cloud vendor. Instance provisioning is, however, the only cloud-provider-specific part of these scripts. All other steps on the way to obtaining usable datasets for Nym in version nym-binaries-v2023.5-rolo are regular Bash or Python commands.
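
To make the provider-specific boundary concrete, the following minimal sketch builds (but does not run) an `hcloud server create` command line for a single instance. The function name `build_create_command`, the server type `cx22`, the image `ubuntu-22.04`, and the instance name `gwreq-1` are illustrative assumptions and are not taken from this repository's scripts, which remain the authoritative reference.

```python
import shlex


def build_create_command(name, server_type, image):
    """Return the argument list for creating one Hetzner cloud server.

    Only this function would need replacing for a different cloud vendor.
    """
    return [
        "hcloud", "server", "create",
        "--name", name,
        "--type", server_type,
        "--image", image,
    ]


# Placeholder values (assumptions): adjust them to your own project.
command = build_create_command("gwreq-1", "cx22", "ubuntu-22.04")

# Print the command for review instead of executing it.
print(shlex.join(command))
```

Keeping the vendor-specific call in one place like this is one possible way to swap in a different provider's CLI without touching the remaining steps.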

We assume that you run:

  • Scripts *_local_* on your local machine (e.g., work laptop),
  • Scripts *_on-gwreq_* manually on each of the provisioned gwreq-* instances,
  • Scripts *_on-endpoint_* manually on each of the provisioned endpt-* instances,
  • Scripts and Jupyter Notebooks *_on-jupyter_* manually on the jupyter instance provisioned with script A_local_spawn-jupyter-cloud-server.sh.
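
The naming convention above can be read mechanically. As a minimal sketch, the helper below (the hypothetical `run_location`, not part of this repository) maps a file name to the place where it is meant to run, using only the markers listed above and file names mentioned in this README. The plain lexicographic sort is an assumption about what "ascending order" means; where the `A_` script belongs relative to the numbered ones is best checked against the scripts themselves.

```python
# Location markers taken from the script naming convention described above.
LOCATIONS = {
    "_local_": "local machine",
    "_on-gwreq_": "each gwreq-* instance",
    "_on-endpoint_": "each endpt-* instance",
    "_on-jupyter_": "jupyter instance",
}


def run_location(filename):
    """Return where a script is meant to run, or None if no marker matches."""
    for marker, location in LOCATIONS.items():
        if marker in filename:
            return location
    return None


# File names mentioned in this README, deliberately out of order.
names = [
    "09_on-jupyter_check-raw-dataset.sh",
    "02_local_create-gwreq.sh",
    "A_local_spawn-jupyter-cloud-server.sh",
    "01_local_create-3-gwreq-12-endpoint.sh",
]

# Lexicographic sort (assumption): digits sort before uppercase letters.
for name in sorted(names):
    print(f"{name}: {run_location(name)}")
```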

Thus, in its current configuration and unless changed by you, this data collector assumes 1 local machine, 3 gwreq-* instances, 12 endpt-* instances, and 1 jupyter instance.
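
As a rough illustration of this default layout, the sketch below enumerates the cloud instances involved. The function `instance_names` and the numeric suffixes are assumptions made for illustration; the actual instance names are determined by the provisioning scripts.

```python
def instance_names(num_gwreq=3, num_endpt=12):
    """List instance names for the default layout: gwreq-*, endpt-*, jupyter.

    The numeric suffixes are an assumption made for illustration.
    """
    names = [f"gwreq-{i}" for i in range(1, num_gwreq + 1)]
    names += [f"endpt-{i}" for i in range(1, num_endpt + 1)]
    names.append("jupyter")
    return names


defaults = instance_names()
print(len(defaults))  # 3 + 12 + 1 = 16 cloud instances, plus your local machine
```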

This repository serves both as an archive of the relevant runs of these scripts for our research project and as a tool for others to collect datasets in a similar manner. Thus, before use, you have to adjust at least the following parts to your own setting:

  • Change the number of gwreq-* and endpt-* instances in 01_local_create-3-gwreq-12-endpoint.sh if you intend to deviate from the 3 gwreq-* and 12 endpt-* instances that this repository creates by default,
  • Adjust the various variables at the top of each Bash script and Jupyter Notebook to match the paths in your setup and targeted experiment (typically signified as MODIFY BEGIN and MODIFY END comment blocks),
  • Adjust the various Nym owner details (e.g., mixcorr_private_gateway_owner in 02_local_create-gwreq.sh) such that the created gwreq-* instances are properly accepted by your endpt-* nodes,
  • Tailor the large number of checks conducted by script 09_on-jupyter_check-raw-dataset.sh on each raw dataset to the specifics of that respective dataset.
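
When reviewing which variables need adjusting, it can help to list everything between the MODIFY BEGIN and MODIFY END comments of a script. The following minimal sketch does this for a text string. The helper `modify_blocks`, the example script content, and its variable names are hypothetical, and the exact marker formatting in the real scripts may differ from what is matched here.

```python
def modify_blocks(script_text):
    """Return the lines found between MODIFY BEGIN and MODIFY END markers."""
    blocks, current = [], None
    for line in script_text.splitlines():
        if "MODIFY BEGIN" in line:
            current = []
        elif "MODIFY END" in line:
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return blocks


# Hypothetical script content; variable names and paths are made up.
example = """#!/usr/bin/env bash
# MODIFY BEGIN
DATASET_DIR="/path/to/dataset"
EXPERIMENT="exp01"
# MODIFY END
echo "Using ${DATASET_DIR}"
"""

for block in modify_blocks(example):
    print("\n".join(block))
```

To apply it to a real file, read the file's text first (for example with `pathlib.Path(...).read_text()`) and pass the result to `modify_blocks`.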

For inspiration and reference, we publish the relevant scripts of this repository and their accompanying logs for the three datasets we collected on the live Nym mixnet for our research project. Read through them if you are unsure of how a specific piece of these scripts and Jupyter Notebooks works.

Licensing

This repository is licensed under GPLv3.

However, the following individual files are licensed under Apache-2.0:
