Mano is a Python library and CLI tool that helps you write applications that interact with the Beiwe Research Platform.
Mano is developed and maintained by Onnela Lab, at the Harvard T.H. Chan School of Public Health, and is part of The Beiwe Platform. It is distributed under the open source BSD-3-Clause license.
- Download data generated by the participants in your study.
- Programmatically request lists of the studies you are authorized on from your Beiwe server.
- Request additional information: the list of participants in your study, and the device settings currently enabled on your study.
Important
Some features of Mano may not be available if the server is running an older version of the Beiwe Backend. Please let your System Administrator know if you run into any issues. The Beiwe Platform provides extensive documentation to help sysadmins handle any issues they encounter in the upgrade process. Resources can be found on the Beiwe Backend Wiki, and sysadmins can follow announcements posted in the backend issues.
If you encounter any other issues, Please Report Them Here.
- Backfill has been substantially expanded
  - Automatic calculation of current file hashes
  - Improved error messaging and logging
- A collection of CLI tools for common tasks
- Data compression support and tooling
- Removal of the need for site passwords
- Requirements and Compatibility
- Installation
- Credential Setup
- Setup Your Keyring Object and Access Your Data
- Access Study Information
- Downloading Participant Data
- Please Report Any Issues You Encounter
- Mano is compatible with modern versions of Python [at time of writing this means 3.11+]
- Mano's CI runs on macOS (Unix), Ubuntu (Linux), and Windows.
Beiwe servers require modern, secure TLS encryption, so old versions of SSL/TLS libraries provided by your operating system or development/build environment may cause issues. If you encounter these issues you can install one of the Miniconda Python distributions, which bundles a more up to date version of OpenSSL.
To install Mano, use Python's standard package manager, pip:
pip install mano
For developers of Mano:
- Clone the repository with git clone git@github.com:onnela-lab/mano.git
- cd into the cloned repository folder in your terminal
- Ensure you are on the develop branch
- Set up your Python virtual environment running under Python 3.12 or higher
In your virtual environment run these commands:
pip install ".[dev]" # installs all development dependencies.
pip uninstall mano # removes Mano as an _installed_ package so you only operate on local code.
mypy --install-types  # Let mypy install any missing typing dependencies.
To confirm your development environment is set up correctly, check that importing mano in a
Python shell loads the local version with this code snippet. When run from the root of the repo it
should succeed; when run from anywhere else it should raise an ImportError, or show the
AssertionError message below if it finds an installed version of Mano.
import mano
from os.path import abspath, relpath
manopath = relpath(mano.__file__)
assert manopath == "mano/__init__.py", f"Mano was loaded from {abspath(mano.__file__)}, not local code."
Before you make any changes you should run tests.
- This is very easy, just run pytest from the root of the repository.
- If anything fails on the develop or main branches please report an issue on GitHub.
And finally, make sure you can run the Mano CLI Tool without installing the package, from the root of the repo by running this command. It should print the help message.
python -m mano
Mano requires special credentials from your Beiwe Platform website, plus a name so that it can identify them. These credentials can be generated by any user account on your Beiwe website with access to the study in question.
The base credentials have a JSON/Python dictionary structure like this. Start by copying this snippet into a JSON or Python file and changing the top-level key (onnela-lab-server-example-credentials below) to a short, useful name of your choice. You may need to use this identifier string in your code.
{
"onnela-lab-server-example-credentials": {
"URL": "...",
"ACCESS_KEY": "...",
"SECRET_KEY": "..."
}
}
- URL should be the base URL (the login page) for the Beiwe Platform website you log in to.
- ACCESS_KEY and SECRET_KEY are 64-character alphanumeric credentials that you generate on the Beiwe website:
  - Log in, go to Manage Credentials in the section on the upper-right of the page
  - Scroll down to the API Credentials section, follow the directions to generate credentials.
  - It is a good idea to use the naming feature to label them the same as you choose here.
  - Any credentials you generate are valid for all studies on which you are an authorized user.
  - Copy and paste your generated credentials into the appropriate keys.
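Before using a credentials file, the structure described above can be sanity-checked with a few lines of standard-library Python. This is a minimal sketch and not part of Mano: the helper name check_credentials and the placeholder values are hypothetical, and the 64-character alphanumeric rule is simply taken from the description above.

```python
import json
import string

REQUIRED_KEYS = ("URL", "ACCESS_KEY", "SECRET_KEY")
ALPHANUMERIC = set(string.ascii_letters + string.digits)


def check_credentials(credentials: dict) -> list:
    """Return a list of problems found in one credentials entry (empty if none).

    Hypothetical helper for illustration; Mano does not provide this function.
    """
    problems = []
    # Every entry needs all three keys, each with a non-empty value.
    for key in REQUIRED_KEYS:
        if not credentials.get(key):
            problems.append(f"missing or empty {key}")
    # The two generated keys are described as 64-character alphanumeric strings.
    for key in ("ACCESS_KEY", "SECRET_KEY"):
        value = credentials.get(key) or ""
        if value and (len(value) != 64 or not set(value) <= ALPHANUMERIC):
            problems.append(f"{key} is not a 64-character alphanumeric string")
    return problems


# Example with placeholder values (not real credentials).
example_file_content = json.dumps({
    "my-study-server": {
        "URL": "https://beiwe.example.org",
        "ACCESS_KEY": "a" * 64,
        "SECRET_KEY": "too-short",
    }
})
for name, entry in json.loads(example_file_content).items():
    print(name, check_credentials(entry))
```

Running a check like this right after pasting in your keys catches truncated copy-and-paste mistakes before they surface as authentication errors.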
The keyring is just a dictionary with the credentials and any encryption secrets you need to access and manage your data. Mano has multiple ways to load your keyring, including from an encrypted file. Read more in the sections below.
You can use Python's built-in json module to load your credentials file into a dictionary.
(Unencrypted files are better than pasting the Keyring dictionary directly into your code, but only because you can then exclude them from, for example, a github repository. They are not secure. Only use unencrypted credentials on a trusted system with whole-drive encryption and restricted access.)
import json
with open("path/to/my_credentials_file.json", "r") as f:
    keyring = json.load(f)
When you installed Mano it also installed a utility called cryptease. It provides a crypt.py
CLI utility for easy file encryption and decryption. (Mano also uses this tool internally when you
use the encryption feature.)
[!Note] "crypt.py" looks like a file name, but it is actually an executable command. You may have a common but very old CLI program named simply "crypt" installed on your system; if so, do not confuse the two.
You can use crypt.py to encrypt any file.
- If you created a Python file, start by copying your credentials into a simple JSON file.
  - We will name ours my_credentials_file.json for this example.
- Enter the following command in your terminal. Replace my_credentials_file.json with your own file.
- The output path entered here, ~/.nrg-keyring.enc, is the default location where Mano will look for encrypted credentials.
[!Note]
This task prompts you to enter a password in your terminal, but no output will be shown as you type it. This is normal behavior for password prompts in terminals. The backspace key still works as normal. You will be prompted to enter the password a second time to confirm it.
$ crypt.py --encrypt my_credentials_file.json --output-file ~/.nrg-keyring.enc
You can then check the file content and test the password you provided with this command:
$ crypt.py --decrypt ~/.nrg-keyring.enc  # this command accepts an optional --output-file argument too
This will print the decrypted content of the file to the terminal (assuming the password is correct). If this succeeds you should delete the unencrypted copy of your credentials file.
We strongly recommend recording your Credentials File Passphrase in a password manager.
[!Warning] It is Strongly Recommended that you use the encrypted credentials feature of Mano as described in the previous collapsible section. If you do not, your Encryption Passphrase will be stored in plain text, rendering it pointless.
If you have encryption passphrases configured for your study data in the Beiwe Platform, you can
encrypt downloaded data files at rest. This requires that your keyring file contains a SECRETS
section with the study encryption passphrases.
Your credentials should look like this when you have encryption passphrases configured:
{
"onnela-lab-server-example-credentials": {
"URL": "...",
"ACCESS_KEY": "...",
"SECRET_KEY": "...",
"SECRETS": {
"Study Name": "encryption_passphrase_for_this_study"
}
}
}
The key in the SECRETS dictionary should identify the specific study from your Beiwe Platform
server. (You may need to include this string in your code.)
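To illustrate how the SECRETS section is meant to be read, here is a minimal sketch of a lookup with a helpful error message. The helper name get_study_passphrase is hypothetical and not part of Mano. The sketch assumes the keyring is the inner dictionary shown above and that the study name must match exactly, which may not reflect how Mano itself matches names.

```python
def get_study_passphrase(keyring: dict, study_name: str) -> str:
    """Look up the encryption passphrase for one study in a loaded keyring.

    Hypothetical helper; assumes an exact match on the study name key.
    """
    secrets = keyring.get("SECRETS", {})
    if study_name not in secrets:
        # List what is configured so a typo in the study name is easy to spot.
        known = ", ".join(sorted(secrets)) or "none"
        raise KeyError(f"No passphrase for {study_name!r}; configured studies: {known}")
    return secrets[study_name]


# Placeholder keyring with the same shape as the example above.
keyring = {
    "URL": "...",
    "ACCESS_KEY": "...",
    "SECRET_KEY": "...",
    "SECRETS": {"Study Name": "encryption_passphrase_for_this_study"},
}
print(get_study_passphrase(keyring, "Study Name"))
```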
TODO: does this study name key need to be an exact match?
[!Note] Do not confuse Encryption Passphrase and Credentials File Passphrase.
- Do Not use the same value for both passphrases.
- We Strongly Recommend using a password manager to store a secure backup of these values. If you lose them you will irretrievably lose access to your data, and will have to reconfigure and re-download all your data.
Environment Variables are useful for automated scripts, but must use slightly different names because there may be generic environment variables using identical names on the system.
- for URL use BEIWE_URL
- for ACCESS_KEY use BEIWE_ACCESS_KEY
- for SECRET_KEY use BEIWE_SECRET_KEY
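In an automated script it can help to confirm that all three variables are set before anything tries to use them. The following is a small standard-library sketch; the helper name missing_beiwe_variables is hypothetical and not part of Mano, and only the three variable names come from the list above.

```python
import os

# Mapping from keyring key to the environment variable name listed above.
ENV_NAMES = {
    "URL": "BEIWE_URL",
    "ACCESS_KEY": "BEIWE_ACCESS_KEY",
    "SECRET_KEY": "BEIWE_SECRET_KEY",
}


def missing_beiwe_variables(environ=os.environ) -> list:
    """Return the names of any required variables that are unset or empty."""
    return [name for name in ENV_NAMES.values() if not environ.get(name)]


missing = missing_beiwe_variables()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
else:
    print("All Beiwe environment variables are set.")
```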
To load your Keyring from data in environment variables, just pass a Python None to
mano.load_keyring():
from mano import load_keyring
keyring = load_keyring(None)
Secured credentials are stored in encrypted files and loaded using the mano.load_keyring() function.
Mano will look for your encrypted credentials file in the default location
(~/.nrg-keyring.enc) unless you provide a different path.
from mano import load_keyring
# load a file from the default location
keyring = load_keyring("onnela-lab-server-example-credentials")
# load a file from a custom location
keyring = load_keyring(
    "onnela-lab-server-example-credentials", "/path/to/my_credentials_file.enc"
)
If you are using an encrypted Keyring and do not provide its passphrase programmatically, Mano will prompt you to type it in directly.
There are two other mechanisms to provide it:
- Setting an environment variable NRG_KEYRING_PASS where Mano is running.
  - in most Unix-style terminals you can run:
    export NRG_KEYRING_PASS="my_credential_file_passphrase"
  - in Windows PowerShell it is:
    $env:NRG_KEYRING_PASS="my_credential_file_passphrase"
- As a keyword argument to the load_keyring() function, provide a password parameter:
    from mano import load_keyring
    keyring = load_keyring(
        "onnela-lab-server-example-credentials", password="my_credential_file_passphrase"
    )
- We recommend against placing the decryption key as text in your code, or in any file that gets committed to a source control system like Git.
- It's tough to know where to store that final variable containing a master decryption key securely. If you are on your own computer we recommend using the full drive encryption capability of your operating system to secure it while at rest, and typing in the passphrase when you need it.
TODO: test how this behaves inside Jupyter Notebooks
[!Important] Calling mano.keyring in a context where there is no terminal to prompt for user input will usually cause your code to hang.
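One way to avoid a hanging prompt in unattended scripts is to resolve the passphrase yourself and fail fast when no terminal is available. The sketch below uses only the standard library. The helper name resolve_keyring_passphrase is hypothetical and not part of Mano; only the NRG_KEYRING_PASS variable name comes from this document.

```python
import getpass
import os
import sys


def resolve_keyring_passphrase(environ=os.environ, interactive=None) -> str:
    """Return the passphrase from NRG_KEYRING_PASS, or prompt for it if a terminal exists.

    Hypothetical helper for illustration; not part of Mano.
    """
    passphrase = environ.get("NRG_KEYRING_PASS")
    if passphrase:
        return passphrase
    if interactive is None:
        # stdin can be None (for example under some GUI launchers), so check that first.
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        # Fail fast instead of waiting forever on a prompt nobody can answer.
        raise RuntimeError("NRG_KEYRING_PASS is not set and no terminal is available")
    return getpass.getpass("Credentials File Passphrase: ")
```

The returned value could then be handed to load_keyring() through the password keyword argument described above.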
With your Keyring loaded you can query the server for information about studies you have access to, participants in those studies, and the study's data stream configuration.
from mano import fetch_accessible_studies, fetch_study_device_settings, fetch_users_in_study
# print out a list of studies you have access to
for study_name, study_id in fetch_accessible_studies(keyring):
    print(f"{study_name}: {study_id}")
# (for this example we will just use the last study the loop found)
for participant_id in fetch_users_in_study(keyring, study_id):
    print(participant_id)
# and print out the device settings for that study
for setting in fetch_study_device_settings(keyring, study_id):
    print(setting)
You can find all the data streams available on the Beiwe Platform by importing ALL_DATA_STREAMS
from mano.constants. To get any specific data stream import the DataStream class.
from pprint import pprint
from mano.constants import DataStream, ALL_DATA_STREAMS
print("All the underlying data stream strings:")
pprint(ALL_DATA_STREAMS)
print("An example specific data stream:")
print("-", DataStream.ACCELEROMETER)  # etc.
The full list of data stream options on the DataStream class is ACCELEROMETER,
AUDIO_RECORDING, ANDROID_LOG_FILE, BLUETOOTH, CALL_LOG, DEVICEMOTION, GPS, GYRO,
IDENTIFIERS, IOS_LOG_FILE, MAGNETOMETER, POWER_STATE, PROXIMITY, REACHABILITY,
SURVEY_ANSWERS, SURVEY_TIMINGS, TEXTS_LOG, and WIFI.
With your Keyring loaded, you can download collected data from your Beiwe server and extract it to
your filesystem using the mano.sync module.
The primary tool you should use to download data is the backfill function in the mano.sync
module.
The real world is messy. Data collection may be inconsistent, interrupted, or disordered for
many reasons. The backfill function works with the data you already downloaded and the Beiwe Data
Access API to ensure you have all of your data in one spot, without duplicates, with minimal
overhead, and with useful feedback when things go wrong.
With a little bit of information about where your data is stored, backfill will:
- Scan all the Beiwe data files in the participant's data folder
- Generate file hashes and assemble the required data structure
- It does this whether the data is compressed, encrypted, neither, or both.
- Query the server with parameters it needs to provide only new and updated data.
- Download that data in batches to avoid network issues.
- Using the compressed data-download endpoints to run efficiently and quickly.
- Extract (and optionally decompress) those files where they need to go.
- And it accepts all the extra parameters you would normally provide to the download function.
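To make the "scan and hash" steps above more concrete, here is a minimal standard-library sketch of walking a folder and hashing every file in it. It is an illustration of the general technique only and is not Mano's implementation: the function name hash_files is hypothetical, and SHA-256 is an arbitrary choice because the hash that backfill uses is not stated here.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(folder: str) -> dict:
    """Map each file's path (relative to folder) to a hex digest of its content.

    SHA-256 is an assumption made for this illustration.
    """
    root = Path(folder)
    hashes = {}
    # rglob("*") descends into every subfolder; directories themselves are skipped.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            hashes[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


# Small demonstration on a throwaway folder with one made-up file.
with tempfile.TemporaryDirectory() as demo_folder:
    stream_folder = Path(demo_folder) / "gps"
    stream_folder.mkdir()
    (stream_folder / "example.csv").write_bytes(b"timestamp,latitude,longitude\n")
    for relative_path, digest in hash_files(demo_folder).items():
        print(relative_path, digest)
```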
You can use Python's date or datetime objects, or ISO 8601 strings ("2020-02-20"), for the time
parameters. Note that the Beiwe Platform server expects UTC times, but will convert non-UTC times
appropriately. (Timezone-naive datetime objects and strings will be treated as UTC.)
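The convention described above (naive values are treated as UTC, aware values are converted) can be sketched with the standard library as follows. This illustrates the stated rule only; it is not Mano's or the server's code, and the function name to_utc is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def to_utc(value) -> datetime:
    """Normalize a date, datetime, or ISO 8601 string to a timezone-aware UTC datetime."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    elif isinstance(value, date) and not isinstance(value, datetime):
        # A plain date becomes midnight at the start of that day.
        value = datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        # Timezone-naive values are treated as already being in UTC.
        return value.replace(tzinfo=timezone.utc)
    # Timezone-aware values are converted to UTC.
    return value.astimezone(timezone.utc)


eastern = timezone(timedelta(hours=-5))
print(to_utc("2020-02-20"))
print(to_utc(date(2020, 2, 20)))
print(to_utc(datetime(2020, 2, 20, 5, 0, tzinfo=eastern)))
```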
A simple usage of backfill looks like this:
from datetime import date
from mano import fetch_users_in_study, load_keyring
from mano.sync import backfill
keyring = {...}  # load your keyring however you like
study_id = "your_studys_id_string"
# get an up-to-date list of participants from the server (list of strings)
participants = list(fetch_users_in_study(keyring, study_id))
# set a start date, usually the start of your study. (end_date is an optional keyword parameter)
start_date = date(2020, 2, 20)
for participant_id in participants:
    # This folder may have the participant's data in it, in the structure Beiwe data downloads
    # use. The full path should contain the study id and then the participant id(s) for that study.
    participants_output_folder = f"/path/to/{study_id}/{participant_id}/"
    # Backfill will scan all relevant data files it finds
    # Note that this invocation will decompress the files, but it doesn't have to
    backfill(
        keyring, study_id, participant_id, participants_output_folder, start_date
    )
Note
Backfill supports the same lock and passphrase parameters for encrypting downloaded data files as the download function described below.
If you need direct control of your download operation, you can use the
download function in the mano.sync module.
Warning
Naive use of the download function will result in a query for all data for the specified
study, which can be many gigabytes. It will be loaded directly into memory before being written to
disk. (There's a reason we have backfill!)
Note
The msync.download function returns a Python Standard Library zipfile.ZipFile object from which you extract files. This Zip
file is not actually compressed, it is just a convenient container for the data files. Data
files will themselves be compressed based on the compressed parameter you provide to download.
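If you have not worked with zipfile.ZipFile objects before, the sketch below shows the standard-library calls involved: listing members, checking that they are stored uncompressed, and extracting them. It builds a small archive in memory to stand in for a download result, so it does not contact a server. The member name is a made-up placeholder, not a real Beiwe file path.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZIP_STORED, ZipFile

# Build a small uncompressed (ZIP_STORED) archive in memory as a stand-in for a download result.
buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_STORED) as archive:
    archive.writestr("participant_id/gps/example.csv", "timestamp,latitude,longitude\n")

zf = ZipFile(buffer)

# List the members before writing anything to disk.
for info in zf.infolist():
    print(info.filename, info.file_size, info.compress_type == ZIP_STORED)

# Extract everything into a temporary folder and collect the resulting relative paths.
with tempfile.TemporaryDirectory() as output_folder:
    zf.extractall(output_folder)
    extracted = sorted(
        p.relative_to(output_folder).as_posix()
        for p in Path(output_folder).rglob("*")
        if p.is_file()
    )
print(extracted)
```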
A simple usage of download that only queries a few data streams over a limited time range looks like this:
from zipfile import ZipFile  # this is part of the Python Standard Library
from mano.sync import download
from mano.constants import DataStream
keyring = {...}  # load your keyring however you like
study_id = "your_studys_id_string"
participant_id = "your_participant_id_string"  # placeholder for one participant's id
output_folder = f"/path/to/that/{study_id}/"
# select data streams using the DataStream class
data_streams = [DataStream.ACCELEROMETER, DataStream.IOS_LOG_FILE, DataStream.GPS]
# and let's set the time ranges using strings
time_start = '2015-10-01T00:00:00'
time_end = '2015-12-01T00:00:00'
# Download returns a `zipfile.ZipFile` object containing the requested data files.
zf: ZipFile = download(
    keyring,
    study_id,
    participant_id,
    data_streams=data_streams,
    time_start=time_start,
    time_end=time_end,
    compressed=True,  # we will use the CLI to manage the file compression later
)
# we will just use the `extractall()` method on that ZipFile
zf.extractall(output_folder)  # It does exactly what it says
By default msync.download attempts to download all of the data for the specified participant,
which could end up being prohibitively large. For this reason, the msync.download function exposes
parameters for data_streams, time_start, and time_end. By using these parameters you can
limit your download operation to those constraints.
Note
The msync.download function returns a Python Standard Library zipfile.ZipFile object from which you extract files.
import logging
from mano import sync as msync
logging.basicConfig(level=logging.INFO)
output_folder = '/tmp/beiwe-data' # set this to a real folder location
zf = msync.download(keyring, study_id, participant_id, data_streams=['identifiers'])
zf.extractall(output_folder)
Mano can download the raw, compressed version of your data, and we recommend that you use this feature. On average the ZSTD-compressed files are one-fifth the size of uncompressed files.
However, because the compressed Data Access API endpoint is not available on older versions of the
server software, the compressed parameter of the download function is set to False by default.
The compressed download endpoint is used only when it is set to True.
Backfill always uses the compressed download endpoint, because we discovered some bugs and
only sufficiently up-to-date servers work correctly with it. On outdated servers this causes
404 errors.
These compressed source files each individually use a newer compression technology called ZSTD
("Zstandard", .zst) but are still bundled together in an uncompressed Zip archive.
ZSTD is crazy fast, hundreds of megabytes per second on even decade-old hardware, both to compress
and decompress (unless you crank up the settings). You can use the mano CLI tool (documented
later) to compress and decompress Beiwe data files.
A Known Issue: At time of testing (mid 2025) we found that almost all 3rd party tools
claiming to support .zst files simply don't work correctly. We use a trivial invocation of the
reference and only implementation of ZSTD to create these
files, and a few tools work just fine. We filed bug reports where we could, we apologize for the
inconvenience and hope this gets resolved.
You can pass the ZipFile object to save() if you wish to encrypt data stream files.
You can also pass these parameters to backfill() and it will use them internally.
from zipfile import ZipFile
from mano.constants import DataStream
from mano.sync import backfill, download, save
lock_and_download_streams = [DataStream.GPS, DataStream.AUDIO_RECORDING]
# grab your encryption key out of the keyring
data_encryption_key = keyring["SECRETS"]["Study Name"]
# for backfill to handle automatically:
backfill(
    keyring,
    study_id,
    participant_id,
    output_folder,
    passphrase=data_encryption_key,
    lock=lock_and_download_streams,
    data_streams=lock_and_download_streams,
)
# or manually using the download + save pattern:
zf: ZipFile = download(
    keyring,
    study_id,
    [participant_id],
    data_streams=lock_and_download_streams,
)
save(
    zf,
    participant_id,
    output_folder,
    lock=lock_and_download_streams,
    passphrase=data_encryption_key,
)
Mano includes some utility functionality for managing the data files you download. These tools have both Python functions you can call in your code, and a command line interface (CLI) for use in your terminal.
To invoke the CLI commands, run mano in your terminal (after activating the appropriate Python
environment) and read the help output. Here are some examples.
Decompress any .zst files to their original content:
# Basic decompression is very easy, just hand it the folder. It will traverse into subfolders.
$ mano decompress ./study_folder/
$ mano decompress ./participant_folder/
# there are 2 options you can provide, --delete-zst, --overwrite
# delete the .zst files after decompressing:
$ mano decompress ./folder/ --delete-zst
# overwrite existing files instead of erroring with:
$ mano decompress ./folder/ --overwrite
Compression is virtually identical to decompression, but operates only on uncompressed files.
# subfolder traversal works the same way, just hand it the folder and it goes to work
$ mano compress ./study_folder/
$ mano compress ./participant_folder/
# Compress has 3 options
# Select specific compression level (0-22, default is 2)
# (higher levels are VERY slow but can save you about 30% more space)
$ mano compress ./folder/ -8
# Compress and delete the original uncompressed files with --delete-original:
$ mano compress ./folder/ --delete-original
# and, as with decompress, you can overwrite existing files with:
$ mano compress ./folder/ --overwrite
You can also use these options with any command:
- -y or --yes: Skip all confirmation prompts
- --mt[#]: Set the number of threads
  - --mt4 uses 4 threads
  - --mt0 uses all available CPU cores (this is the default behavior); it is usually best to leave it as is
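If you want to drive the CLI from a script, for example in a scheduled job, the options above can be assembled into an argument list for Python's subprocess module. This is a minimal sketch under the assumption that the flags behave as documented here. The helper names build_mano_command and run_mano are hypothetical and not part of Mano.

```python
import subprocess
from typing import Optional


def build_mano_command(action: str, folder: str, *, yes: bool = False, threads: Optional[int] = None) -> list:
    """Assemble the argument list for a `mano compress` or `mano decompress` call."""
    if action not in ("compress", "decompress"):
        raise ValueError(f"unsupported action: {action}")
    command = ["mano", action, folder]
    if yes:
        command.append("--yes")  # skip confirmation prompts, needed for unattended runs
    if threads is not None:
        command.append(f"--mt{threads}")  # for example --mt4 for 4 threads
    return command


def run_mano(command: list) -> None:
    """Run the command; requires Mano to be installed in the active environment."""
    subprocess.run(command, check=True)


command = build_mano_command("decompress", "./study_folder/", yes=True, threads=4)
print(" ".join(command))
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with folder names that contain spaces.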
The Python functions called by the above CLI commands are located in the mano.file_management
module. They mirror these CLI commands.
from mano.file_management import decompress_zst_files, compress_to_zst_files
target_folder = "./study_folder/"
# by default these are multithreaded across all available CPU cores
decompress_zst_files(target_folder, delete_zst=True, overwrite=False)
compress_to_zst_files(target_folder, compression_level=5, delete_original=False)
# to set the number of concurrent files to operate on, set the multithreading_count attribute on the
# global settings object before calling the functions
from mano.file_management import GlobalSettings
GlobalSettings.multithreading_count = 4 # use 4 threads