This package was created to convert WRDS SAS data to modern data formats. This package has three major functions, one for each of three popular data formats.
wrds_update(): Imports WRDS SAS data into a PostgreSQL database.wrds_update_pq(): Converts WRDS SAS data to parquet files.wrds_update_csv(): Converts WRDS SAS data to gzipped CSV files.
This package was primarily designed to handle WRDS data, but some support is provided for importing a local SAS file (*.sas7dbat) into a PostgreSQL database.
Functions prefixed with wrds_ are designed to pull data from WRDS via SSH.
For local SAS datasets, use functions that accept fpath (e.g., sas_to_pandas() and related helpers).
The software uses Python 3 and depends on SQLAlchemy and Paramiko. Some helper functions return Pandas DataFrames; Pandas is optional unless those functions are used.
To access WRDS non-interactively (e.g., from Python scripts), you must use SSH public-key authentication.
WRDS provides a dedicated SSH endpoint for key-based authentication:
wrds-cloud-sshkey.wharton.upenn.edu
WRDS supports modern SSH key types. We recommend ed25519:
ssh-keygen -t ed25519 -C "your_wrds_id@wrds"
Accept the default location (~/.ssh/id_ed25519).
You may use a passphrase if your SSH agent is running. For unattended jobs (cron / CI), an empty passphrase may be required.
Copy your public key to the WRDS SSH-key host:
cat ~/.ssh/id_ed25519.pub | \
ssh your_wrds_id@wrds-cloud-sshkey.wharton.upenn.edu \
"mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
If ~/.ssh does not exist on WRDS, the command above will create it.
Add an entry to ~/.ssh/config:
Host wrds
HostName wrds-cloud-sshkey.wharton.upenn.edu
User your_wrds_id
IdentityFile ~/.ssh/id_ed25519
IdentitiesOnly yes
You can now connect with:
ssh wrds
This configuration is also used automatically by paramiko, enabling
password-less access from Python.
If SSH still prompts for a password, run:
ssh -vvv wrds
and confirm that publickey appears in the list of authentication methods.
wrds2pg uses paramiko to execute SAS code on WRDS via SSH.
Password-based authentication will not work in unattended scripts.
For the wrds_update() function, you should have write access to a PostgreSQL database to store the data.
Environment variables that the code can use include:
PGDATABASE: The name of the PostgreSQL database you use.PGUSER: Your username on the PostgreSQL database.PGHOST: Where the PostgreSQL database is to be found (this will belocalhostif it's on the same machine as you're running the code on)WRDS_ID: Your WRDS ID.DATA_DIR: The local repository for parquet files.CSV_DIR: The local repository for compressed CSV files.
You can set these environment variables in (say) ~/.zprofile:
export PGHOST="localhost"
export PGDATABASE="crsp"
export WRDS_ID="iangow"
export PGUSER="igow"Two arguments table_name and schema are required.
Set WRDS_ID using either wrds_id=your_wrds_id in the function call or the environment variable WRDS_ID.
The wrds_update() function will use the environment variables PGHOST, PGDATABASE, and PGUSER if you have set them.
Otherwise, you need to provide values as arguments to wrds_udpate().
The default for PGPORT is 5432.
To tailor your request, specify the following arguments:
fix_missing: set toTrueto fix missing values. This addresses special missing values, which SAS'sPROC EXPORTdumps as strings. The default isFalse.fix_cr: set toTrueto fix characters. Default value isFalse.drop: specify columns to be dropped using SAS syntax (e.g.,drop="id name"will drop columnsidandname).obs: specify the maximum number of observations to download (e.g.,obs=10will import the first 10 rows from the table on WRDS).rename: rename columns (e.g.,rename="fee=mngt_fee"renamesfeetomngt_fee).force: set toTrueto force update. Default value isFalse.
The software can also upload a local SAS file to PostgreSQL.
You need to have local SAS in order to use this function.
Use fpath to specify the path to the file to be imported.
This software is available from PyPI.
To install of wrds2pg from there:
pip3 install wrds2pgTo install the development version wrds2pg from Github:
sudo -H pip3 install git+https://github.com/iangow/wrds2pg --upgradeExample usage:
from wrds2pg import wrds_update
# 1. Download crsp.mcti from wrds and upload to pg as crps.mcti
# Simplest version
wrds_update(table_name="mcti", schema="crsp")
# Tailored arguments
wrds_update(table_name="mcti", schema="crsp", host=your_pghost,
dbname=your_pg_database,
fix_missing=True, fix_cr=True, drop="b30ret b30ind", obs=10,
rename="caldt=calendar_date", force=True)Author: Ian Gow, iandgow@gmail.com Contributors: Jingyu Zhang, jingyu.zhang@chicagobooth.edu, Evan Jo.