Mano

Mano is a Python library and CLI tool that helps you write applications that interact with the Beiwe Research Platform.

Mano is developed and maintained by Onnela Lab, at the Harvard T.H. Chan School of Public Health, and is part of The Beiwe Platform. It is distributed under the open source BSD-3-Clause license.

Features:

  • Download data generated by the participants in your study.
  • Programmatically request lists of the studies you are authorized on from your Beiwe server.
  • Request additional information: the list of participants in your study, and the device settings currently enabled on your study.

Important

Some features of Mano may not be available if the server is running an older version of the Beiwe Backend. Please let your System Administrator know if you run into any issues. The Beiwe Platform provides extensive documentation for sysadmins to handle any issues they may encounter in the upgrade process. Resources can be found on the Beiwe Backend Wiki, and sysadmins can follow announcements in the Beiwe Backend issue tracker.

If you encounter any other issues, please report them here.

Under Development

  • Backfill has been substantially expanded
  • Automatic calculation of current file hashes
  • Improved error messaging and logging
  • A collection of CLI tools for common tasks
  • Data compression support and tooling
  • Removal of the need for site passwords

Table of contents

  1. Requirements and Compatibility
  2. Installation
  3. Credential Setup
  4. Setup Your Keyring Object and Access Your Data
  5. Access Study Information
  6. Downloading Participant Data

Requirements and Compatibility

...

Click here if you encounter issues related to Old SSL Library Compatibility . . .


Beiwe servers require modern, secure TLS encryption, so old versions of SSL/TLS libraries provided by your operating system or development/build environment may cause issues. If you encounter these issues you can install one of the Miniconda Python distributions, which bundles a more up to date version of OpenSSL.


Installation

To install Mano, just use Python's standard package manager, pip:

pip install mano

Click Here For Developer Setup Instructions . . .


For developers of Mano:

  • Clone the repository with git clone git@github.com:onnela-lab/mano.git
  • cd into the cloned repository folder in your terminal
  • Ensure you are on the develop branch
  • Set up your python virtual environment running under Python 3.12 or higher

In your virtual environment run these commands:

pip install ".[dev]"    # installs all development dependencies.
pip uninstall mano      # removes Mano as an _installed_ package so you only operate on local code.
mypy --install-types    # Let mypy install any missing typing dependencies.

To confirm your development environment is set up correctly, check that importing mano in a Python shell loads the local version using this code snippet. Run from the root of the repo it should succeed; run from anywhere else it should raise an ImportError, or fail with the assertion message below if it finds an installed version of Mano.

import mano
from os.path import abspath, relpath
manopath = relpath(mano.__file__)
assert manopath == "mano/__init__.py", f"Mano was loaded from {abspath(mano.__file__)}, not local code." 

Before you make any changes you should run tests.

  • This is very easy, just run pytest from the root of the repository.
  • If anything fails on the develop or main branches please report an issue on GitHub.

Finally, make sure you can run the Mano CLI tool without installing the package. From the root of the repo, run this command; it should print the help message.

python -m mano

Credential Setup

Mano requires special credentials from your Beiwe Platform website, plus a name so that it can identify them. These credentials can be generated by any user account on your Beiwe website with access to the study in question.

The base credentials have a JSON/Python dictionary structure like this. Start by copying this snippet into a JSON or Python file and changing the top-level key (onnela-lab-server-example-credentials here) to a short, useful name of your choice. You may need to use this identifier string in your code.

{
    "onnela-lab-server-example-credentials": {
        "URL": "...",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "..."
    }
}
  • URL should be the base url (the login page) for the Beiwe Platform website you log in to.
  • ACCESS_KEY and SECRET_KEY are 64-character alphanumeric credentials that you generate on the Beiwe website:
    • Log in, go to Manage Credentials in the section on the upper-right of the page
    • Scroll down to the API Credentials section, follow the directions to generate credentials.
    • It is a good idea to use the naming feature to label them the same as you choose here.
    • Any credentials you generate are valid for all studies on which you are an authorized user.
  • Copy and paste your generated credentials into the appropriate keys.
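
Before going further, it can be worth sanity-checking the entry you just filled in. The following is a minimal sketch, not part of Mano: check_credentials is a hypothetical helper, the URL is a placeholder, and the 64-character alphanumeric rule is taken from the description above rather than from the server's actual validation.

```python
import string

REQUIRED_KEYS = ("URL", "ACCESS_KEY", "SECRET_KEY")
ALLOWED_CHARS = set(string.ascii_letters + string.digits)


def check_credentials(entry: dict) -> list[str]:
    """Return a list of problems found; an empty list means the entry looks sane."""
    problems = []
    for key in REQUIRED_KEYS:
        if not entry.get(key):
            problems.append(f"missing or empty value for {key}")
    for key in ("ACCESS_KEY", "SECRET_KEY"):
        value = entry.get(key, "")
        # Assumption from the text above: keys are 64 alphanumeric characters.
        if value and (len(value) != 64 or not set(value) <= ALLOWED_CHARS):
            problems.append(f"{key} is not a 64-character alphanumeric string")
    return problems


example = {
    "onnela-lab-server-example-credentials": {
        "URL": "https://beiwe.example.org",  # placeholder, not a real server
        "ACCESS_KEY": "a" * 64,              # placeholder, not a real key
        "SECRET_KEY": "b" * 64,              # placeholder, not a real key
    }
}
for name, entry in example.items():
    print(name, check_credentials(entry) or "looks OK")
```

A check like this only catches copy-and-paste mistakes such as a truncated key; it cannot tell you whether the server will accept the credentials.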

This is your "Keyring"

The keyring is just a dictionary with the credentials and any encryption secrets you need to access and manage your data. Mano has multiple ways to load your keyring, including from an encrypted file. Read more in the sections below.

Click Here for instructions on loading credentials from a json file . . .

You can use Python's built-in json module to load your credentials file into a dictionary.

(Unencrypted files are better than pasting the Keyring dictionary directly into your code, but only because you can then exclude them from, for example, a github repository. They are not secure. Only use unencrypted credentials on a trusted system with whole-drive encryption and restricted access.)

import json
with open("path/to/my_credentials_file.json", "r") as f:
    keyring = json.load(f)

Click Here for instructions on setting up a secured credentials file with Mano . . .


When you installed Mano it also installed a utility called cryptease. It provides a crypt.py CLI utility for easy file encryption and decryption. (Mano also uses this tool internally when you use the encryption feature.)

[!Note] "crypt.py" looks like a file name, but it is actually an executable command. You may have a common but very old CLI program named simply "crypt" installed on your system, if so you should not confuse them.

You can use crypt.py to encrypt any file.

  • If you created a python file, start by copying credentials into a simple JSON file.
    • we will name ours my_credentials_file.json for this example.
  • Enter the following command in your terminal, replacing my_credentials_file.json with your own file.
  • The output path entered here, ~/.nrg-keyring.enc, is the default location where Mano will look for encrypted credentials.

[!Note]

This task prompts you to enter a password in your terminal, but no output will be shown as you type it. This is normal behavior for password prompts in terminals. The backspace key still works as normal. You will be prompted to enter the password a second time in confirmation.

$ crypt.py --encrypt my_credentials_file.json --output-file ~/.nrg-keyring.enc

You can then check the file content and test the password you provided with this command:

$ crypt.py --decrypt ~/.nrg-keyring.enc  # this command accepts an optional --output-file argument too

This will print the decrypted content of the file to the terminal (assuming the password is correct). If this succeeds you should delete the unencrypted copy of your credentials file.

We strongly recommend recording your Credentials File Passphrase in a password manager.


Click Here to view instructions for storing study encryption keys . . .


[!Warning] It is Strongly Recommended that you use the encrypted credentials feature of Mano as described in the previous collapsible section. If you do not, your Encryption Passphrase will be stored in plain text, rendering it pointless.

If you have encryption passphrases configured for your study data in the Beiwe Platform, you can encrypt downloaded data files at rest. This requires that your keyring file contains a SECRETS section with the study encryption passphrases.

Your credentials should look like this when you have encryption passphrases configured:

{
    "onnela-lab-server-example-credentials": {
        "URL": "...",
        "ACCESS_KEY": "...",
        "SECRET_KEY": "...",
        "SECRETS": {
            "Study Name": "encryption_passphrase_for_this_study"
        }
    }
}

The key in the SECRETS dictionary should identify the specific study from your Beiwe Platform server. (You may need to include this string in your code.)
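
As an illustration of how that key is used, here is a small sketch of a lookup that fails with a readable message when the study name is missing. get_study_passphrase is a hypothetical helper, not a Mano function, and it assumes the keyring you pass in is the inner dictionary (the one holding URL, ACCESS_KEY, SECRET_KEY, and SECRETS).

```python
def get_study_passphrase(keyring: dict, study_name: str) -> str:
    """Look up a study's encryption passphrase in the SECRETS section."""
    secrets = keyring.get("SECRETS", {})
    if study_name not in secrets:
        known = ", ".join(sorted(secrets)) or "none"
        raise KeyError(
            f"No passphrase stored for study {study_name!r}; known studies: {known}"
        )
    return secrets[study_name]


# Placeholder keyring with the structure shown above.
keyring = {
    "URL": "...",
    "ACCESS_KEY": "...",
    "SECRET_KEY": "...",
    "SECRETS": {"Study Name": "encryption_passphrase_for_this_study"},
}
print(get_study_passphrase(keyring, "Study Name"))
```

This sketch uses an exact, case-sensitive dictionary lookup; whether Mano itself requires an exact match is not established here.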

TODO: does this study name key need to be an exact match?

[!Note] Do not confuse Encryption Passphrase and Credentials File Passphrase.

  • Do Not use the same value for both passphrases.
  • We Strongly Recommend using a password manager to store a secure backup of these values. If you lose them you will irretrievably lose access to your data and will have to reconfigure and re-download all your data.

Click Here to view instructions for using environment variables for your credentials . . .


Environment Variables are useful for automated scripts, but must use slightly different names because there may be generic environment variables using identical names on the system.

  • for URL use BEIWE_URL
  • for ACCESS_KEY use BEIWE_ACCESS_KEY
  • for SECRET_KEY use BEIWE_SECRET_KEY

To load your Keyring from data in environment variables, just pass a Python None to mano.load_keyring():

from mano import load_keyring
keyring = load_keyring(None)
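
If an automated script fails at this step, it can help to check which of the three variables are actually set. The sketch below uses only the standard library and is not how Mano reads the environment internally; missing_beiwe_env_vars and keyring_from_env are hypothetical helpers that simply mirror the name mapping listed above.

```python
import os

# Environment variable name -> keyring key, as listed above.
ENV_TO_KEYRING = {
    "BEIWE_URL": "URL",
    "BEIWE_ACCESS_KEY": "ACCESS_KEY",
    "BEIWE_SECRET_KEY": "SECRET_KEY",
}


def missing_beiwe_env_vars(environ=os.environ) -> list[str]:
    """Return the names of any required variables that are unset or empty."""
    return [name for name in ENV_TO_KEYRING if not environ.get(name)]


def keyring_from_env(environ=os.environ) -> dict:
    """Build a keyring-shaped dictionary by hand from the environment."""
    missing = missing_beiwe_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for name, key in ENV_TO_KEYRING.items()}


print("Missing:", missing_beiwe_env_vars() or "nothing")
```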

(There are notes on using an encrypted keyring with environment variables in the next section.)

Click Here for instructions on using a secure keyring . . .

Secured credentials are stored in encrypted files and loaded using the mano.load_keyring() function. Mano will look for your encrypted credentials file in the default location (~/.nrg-keyring.enc) unless you provide a different path.

from mano import load_keyring

# load a file from the default location
keyring = load_keyring("onnela-lab-server-example-credentials")
# load a file from a custom location
keyring = load_keyring(
    "onnela-lab-server-example-credentials", "/path/to/my_credentials_file.enc"
)

If you are using an encrypted Keyring and do not provide its passphrase programmatically, Mano will prompt you to type it in directly.

There are two other mechanisms to provide it:

  • Setting an environment variable NRG_KEYRING_PASS where Mano is running.

    • in most Unix-style terminals you can run:
    export NRG_KEYRING_PASS="my_credential_file_passphrase"
    • in Windows Powershell it is:
    $env:NRG_KEYRING_PASS="my_credential_file_passphrase"
  • As a keyword argument to the load_keyring() function provide a password parameter:

    from mano import load_keyring
    keyring = load_keyring(
        "onnela-lab-server-example-credentials", password="my_credential_file_passphrase"
    )
  • We recommend against placing the decryption key as text in your code, or in any file that gets committed to a source control system like Git.

  • It's tough to know where to store that final variable containing a master decryption key securely. If you are on your own computer we recommend using the full drive encryption capability of your operating system to secure it while at rest, and typing in the passphrase when you need it.

TODO: test how this behaves inside Jupyter Notebooks

[!Important] Calling mano.load_keyring() for an encrypted keyring without supplying the passphrase, in a context where there is no terminal to prompt for user input, will usually cause your code to hang.
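
One way to avoid that hang in scripts is to resolve the passphrase yourself and fail fast when no terminal is attached. This is a sketch under stated assumptions: get_keyring_passphrase is a hypothetical helper, not part of Mano, and it only relies on the NRG_KEYRING_PASS variable described above plus the standard library.

```python
import os
import sys
from getpass import getpass


def get_keyring_passphrase(environ=os.environ, stdin=None) -> str:
    """Return the passphrase from NRG_KEYRING_PASS, or prompt if a terminal exists."""
    passphrase = environ.get("NRG_KEYRING_PASS")
    if passphrase:
        return passphrase
    stdin = sys.stdin if stdin is None else stdin
    # Without an interactive terminal a prompt would block forever, so stop here.
    if stdin is None or not stdin.isatty():
        raise RuntimeError(
            "NRG_KEYRING_PASS is not set and no terminal is available to prompt."
        )
    return getpass("Credentials file passphrase: ")
```

The returned value could then be handed to load_keyring() through the password keyword argument shown earlier.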

Access Study Information

With your Keyring loaded you can query the server for information about studies you have access to, participants in those studies, and the study's data stream configuration.

from mano import fetch_accessible_studies, fetch_study_device_settings, fetch_users_in_study

# print out a list of studies you have access to
for study_name, study_id in fetch_accessible_studies(keyring):  
    print(f"{study_name}: {study_id}")

# (for this example we will just use the last study the loop found)
for participant_id in fetch_users_in_study(keyring, study_id):
    print(participant_id)

# and print out the device settings for that study
for setting in fetch_study_device_settings(keyring, study_id):
    print(setting)

You can find all the data streams available on the Beiwe Platform by importing ALL_DATA_STREAMS from mano.constants. To get any specific data stream import the DataStream class.

from pprint import pprint
from mano.constants import DataStream, ALL_DATA_STREAMS

print("All the underlying data stream strings:")
pprint(ALL_DATA_STREAMS)  # pprint takes the object first; its second argument is an output stream

print("An example specific data stream:")
print("-", DataStream.ACCELEROMETER)  # etc.

The full list of data stream options on the DataStream class is ACCELEROMETER, AUDIO_RECORDING, ANDROID_LOG_FILE, BLUETOOTH, CALL_LOG, DEVICEMOTION, GPS, GYRO, IDENTIFIERS, IOS_LOG_FILE, MAGNETOMETER, POWER_STATE, PROXIMITY, REACHABILITY, SURVEY_ANSWERS, SURVEY_TIMINGS, TEXTS_LOG, and WIFI

Downloading Participant Data

With your Keyring loaded, you can download collected data from your Beiwe server and extract it to your filesystem using the mano.sync module.

Backfill

The primary tool you should use to download data is the backfill function in the mano.sync module.

The real world is messy. Data collection may be inconsistent, interrupted, or disordered for many reasons. The backfill function works with the data you already downloaded and the Beiwe Data Access API to ensure you have all of your data in one spot, without duplicates, with minimal overhead, and with useful feedback when things go wrong.

Click Here for a description of how backfill works . . .


With a little bit of information about where your data is stored, backfill will:

  • Scan all the Beiwe data files in the participant's data folder
  • Generate file hashes and assemble the required data structure
    • It does this whether the data is compressed, encrypted, neither, or both.
  • Query the server with parameters it needs to provide only new and updated data.
  • Download that data in batches to avoid network issues.
    • Using the compressed data-download endpoints to run efficiently and quickly.
  • Extract (and optionally decompress) those files where they need to go.
  • And it accepts all the extra parameters you would normally provide to the download function.
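
The first two steps (scan, then hash) can be pictured with the following standard-library sketch. It is an illustration only: hash_existing_files is a hypothetical helper, SHA-256 is an assumed algorithm chosen for the example, and the real backfill function may hash differently and also handles compressed and encrypted files, which this sketch does not.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_existing_files(participant_folder: str) -> dict[str, str]:
    """Map each file's path (relative to the folder) to a hex digest of its bytes."""
    root = Path(participant_folder)
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()  # assumption: the algorithm is illustrative
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large data files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[path.relative_to(root).as_posix()] = digest.hexdigest()
    return hashes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "gps").mkdir()
        (Path(folder) / "gps" / "example.csv").write_bytes(b"abc")
        for name, digest in hash_existing_files(folder).items():
            print(name, digest[:12])
```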

You can use Python's date or datetime objects, or ISO 8601 strings ("2020-02-20"), for the time parameters. Note that the Beiwe Platform server expects UTC times, but will convert non-UTC times appropriately. (Timezone-naive datetime objects and strings will be treated as UTC.)
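
To make the timezone rule concrete, the sketch below normalizes each accepted input type to a UTC datetime, treating naive values as UTC. to_utc is a hypothetical helper written for this illustration; it mirrors the behavior described above and is not Mano's or the server's code.

```python
from datetime import date, datetime, timedelta, timezone


def to_utc(value) -> datetime:
    """Convert a date, datetime, or ISO 8601 string to a timezone-aware UTC datetime."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    elif isinstance(value, date) and not isinstance(value, datetime):
        # A plain date becomes midnight at the start of that day.
        value = datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)  # naive values are taken as UTC
    return value.astimezone(timezone.utc)          # aware values are converted


utc_minus_five = timezone(timedelta(hours=-5))
print(to_utc("2020-02-20"))
print(to_utc(date(2020, 2, 20)))
print(to_utc(datetime(2020, 2, 20, 7, 0, tzinfo=utc_minus_five)))
```

If your study's local time matters for where a day starts and ends, passing timezone-aware datetime objects makes the intended range explicit.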

A simple usage of backfill looks like this:

from datetime import date
from mano import fetch_users_in_study, load_keyring
from mano.sync import backfill

keyring = {...}  # load your keyring however you like
study_id = "your_studys_id_string"

# get an up-to-date list of participants from the server (list of strings)
participants = list(fetch_users_in_study(keyring, study_id))

# set a start date, usually the start of your study. (end_date is an optional keyword parameter)
start_date = date(2020, 2, 20)

for participant_id in participants:
    # This folder may have the participant's data in it, in the structure Beiwe data downloads
    # use. The full path should contain the study id and then the participant id(s) for that study.
    participants_output_folder = f"/path/to/{study_id}/{participant_id}/"
    
    # Backfill will scan all relevant data files it finds
    # Note that this invocation will decompress the files, but it doesn't have to
    backfill(
        keyring, study_id, participant_id, participants_output_folder, start_date
    )

Note

Backfill supports the same lock and passphrase parameters for encrypting downloaded data files as the download function described below.

Download

If you need direct control of your download operation you can use the download function in the mano.sync module.

Warning

Naive use of the download function will result in a query for all data for the specified study, which can be many gigabytes. It will be loaded directly into memory before being written to disk. (There's a reason we have backfill!)

Note

The msync.download function returns a Python Standard Library zipfile.ZipFile object from which you extract files. This Zip file is not actually compressed, it is just a convenient container for the data files. Data files will themselves be compressed based on the compressed parameter you provide to download.
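
The idea of a Zip file used purely as a container can be seen with the standard library alone. The sketch below builds a small in-memory archive with no Zip-level compression and inspects it; the file name and contents are made-up placeholders and nothing here talks to a Beiwe server.

```python
import io
import zipfile

# Build an uncompressed ("stored") archive in memory, holding one placeholder file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("participant1/gps/example.csv.zst", b"placeholder bytes")

# Re-open it for reading and check how each member was stored.
archive = zipfile.ZipFile(buffer)
for info in archive.infolist():
    stored = info.compress_type == zipfile.ZIP_STORED
    print(info.filename, info.file_size, "stored" if stored else "compressed")
```

Running the same loop over the object returned by download is a quick way to preview what an archive contains before calling extractall().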

A simple usage of download that only queries a few data streams over a limited time range looks like this:

from zipfile import ZipFile        # this is part of the Python Standard Library
from mano.sync import download
from mano.constants import DataStream

keyring = {...}  # load your keyring however you like

study_id = "your_studys_id_string"
participant_id = "your_participant_id_string"
output_folder = f"/path/to/that/{study_id}/"

# select data streams using the DataStream class (names as listed above)
data_streams = [DataStream.ACCELEROMETER, DataStream.IOS_LOG_FILE, DataStream.GPS]

# and let's set the time ranges using strings
time_start = '2015-10-01T00:00:00'
time_end = '2015-12-01T00:00:00'

# Download returns a `zipfile.ZipFile` object containing the requested data files.
zf: ZipFile = download(
    keyring,
    study_id,
    participant_id,
    data_streams=data_streams,
    time_start=time_start,
    time_end=time_end,
    compressed=True,  # we will use the CLI to manage the file compression later
)

# we will just use the `extractall()` method on that Zipfile
zf.extractall(output_folder)  # It does exactly what it says

By default msync.download attempts to download all of the data for the specified participant, which could end up being prohibitively large. For this reason, the msync.download function exposes parameters for data_streams, time_start, and time_end. By using these parameters you can limit your download operation to those constraints.

import logging
from mano import sync as msync

logging.basicConfig(level=logging.INFO)

output_folder = '/tmp/beiwe-data'  # set this to a real folder location

zf = msync.download(keyring, study_id, participant_id, data_streams=['identifiers'])

zf.extractall(output_folder)

Click Here for some further details about compressed downloads . . .


Mano can download the raw, compressed version of your data, and we recommend that you use the feature. On average the ZSTD-compressed files are one-fifth the size of uncompressed files.

However, the compressed Data Access API endpoint is not available on older versions of the server software, so the compressed parameter of the download function defaults to False. With the default, download uses the uncompressed endpoints; set compressed=True to use the compressed ones.

Backfill always uses the compressed download endpoint, because we discovered some bugs and only sufficiently up-to-date servers work correctly with it. On outdated servers this causes 404 errors.

These compressed source files each individually use a newer compression technology called ZSTD ("Zstandard", .zst) but are still bundled together in an uncompressed Zip archive.

ZSTD is crazy fast, hundreds of megabytes per second on even decade-old hardware, both to compress and decompress (unless you crank up the settings). You can use the mano CLI tool (documented later) to compress and decompress Beiwe data files.

A Known Issue: At time of testing (mid 2025) we found that almost all 3rd party tools claiming to support .zst files simply don't work correctly. We use a trivial invocation of the reference and only implementation of ZSTD to create these files, and a few tools work just fine. We filed bug reports where we could. We apologize for the inconvenience and hope this gets resolved.


Encrypting Data Files At Rest

You can pass the ZipFile object to save() if you wish to encrypt data stream files. You can also pass these parameters to backfill() and it will use them internally.

from zipfile import ZipFile
from mano.constants import DataStream
from mano.sync import backfill, download, save

lock_and_download_streams = [DataStream.GPS, DataStream.AUDIO_RECORDING]

# grab your encryption key out of the keyring before it is needed
data_encryption_key = keyring["SECRETS"]["Study Name"]


# for backfill to handle automatically:
backfill(
    keyring,
    study_id,
    participant_id,
    output_folder,
    passphrase=data_encryption_key,
    lock=lock_and_download_streams,
    data_streams=lock_and_download_streams,
)


# or manually using the download + save pattern:
zf: ZipFile = download(
    keyring,
    study_id,
    [participant_id],
    data_streams=lock_and_download_streams,
)

# save is imported directly from mano.sync above
save(
    zf,
    participant_id,
    output_folder,
    lock=lock_and_download_streams,
    passphrase=data_encryption_key,
)

File Management Tools and the CLI

Mano includes some utility functionality for managing the data files you download. These tools have both Python functions you can call in your code, and a command line interface (CLI) for use in your terminal.

To invoke the CLI commands just run mano in your terminal (after activating the appropriate Python environment) and read the help output. Here are some examples:

Compress and Decompress Commands

Decompress any .zst files to their original content:

# Basic decompression is very easy, just hand it the folder. It will traverse into subfolders.
$ mano decompress ./study_folder/
$ mano decompress ./participant_folder/

# there are 2 options you can provide, --delete-zst, --overwrite

# delete the .zst files after decompressing:
$ mano decompress ./folder/ --delete-zst
# overwrite existing files instead of erroring with:
$ mano decompress ./folder/ --overwrite

Compress Files

Compression is virtually identical to decompression, but operates only on uncompressed files.

# subfolder traversal works the same way, just hand it the folder and it goes to work
$ mano compress ./study_folder/
$ mano compress ./participant_folder/

# Compress has 3 options

# Select specific compression level (0-22, default is 2)
# (higher levels are VERY slow but can save you about 30% more space)
$ mano compress ./folder/ -8

# Compress and delete the original uncompressed files with --delete-original:
$ mano compress ./folder/ --delete-original

# and like for decompress you can overwrite existing files with:
$ mano compress ./folder/ --overwrite
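
Before running compress or decompress on a large tree, it can be useful to see how many files of each kind are there. The following sketch is a read-only preview built on the standard library; split_by_compression is a hypothetical helper, not part of Mano, and it classifies files purely by the .zst extension.

```python
import tempfile
from pathlib import Path


def split_by_compression(folder):
    """Walk the folder recursively and separate .zst files from everything else."""
    compressed, uncompressed = [], []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            (compressed if path.suffix == ".zst" else uncompressed).append(path)
    return compressed, uncompressed


if __name__ == "__main__":
    # Demonstrate on a throwaway folder with one file of each kind.
    with tempfile.TemporaryDirectory() as folder:
        (Path(folder) / "gps").mkdir()
        (Path(folder) / "gps" / "a.csv").write_text("placeholder")
        (Path(folder) / "gps" / "b.csv.zst").write_bytes(b"placeholder")
        compressed, uncompressed = split_by_compression(folder)
        print(len(compressed), "compressed;", len(uncompressed), "uncompressed")
```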

Global Options

You can also use these options with any command:

  • provide -y or --yes: Skip all confirmation prompts
  • --mt[#]: Set the number of threads
    • --mt4 uses 4 threads
    • --mt0 uses all available CPU cores
      • (this is the default behavior), it is usually best to leave as is

Examples of python file management functions

The Python functions called by the above CLI commands are located in the mano.file_management module. They mirror these CLI commands.

from mano.file_management import decompress_zst_files, compress_to_zst_files

target_folder = "./study_folder/"

# by default these are multithreaded across all available CPU cores
decompress_zst_files(target_folder, delete_zst=True, overwrite=False)
compress_to_zst_files(target_folder, compression_level=5, delete_original=False)

# to set the number of concurrent files to operate on, set the multithreading_count attribute on the
# global settings object before calling the functions
from mano.file_management import GlobalSettings
GlobalSettings.multithreading_count = 4  # use 4 threads
