Commit 166b090

Merge pull request #25 from SAFEHR-data/jeremy/pseudon
Pseudonymisation and FTPS
2 parents e352c33 + b4c7405 commit 166b090

24 files changed, +1856 -164 lines changed

.dockerignore

Lines changed: 0 additions & 3 deletions
This file was deleted.

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -11,3 +11,6 @@ wheels/
 
 # IDEs
 .idea/
+
+# settings files (should not be in the source tree anyway, but just in case)
+*.env

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ repos:
         args: [--config-file=pyproject.toml]
         additional_dependencies:
           [
+            "pandas-stubs",
             "types-psycopg2",
             "types-pika"
           ]

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.11
+3.13

Dockerfile

Lines changed: 14 additions & 4 deletions
@@ -1,10 +1,20 @@
-FROM python:3.14-slim-bookworm
+FROM python:3.13-slim-bookworm AS waveform_base
 LABEL authors="Stephen Thompson, Jeremy Stein"
+# Cron is really small. For the sake of not having to reinstall it all the time,
+# put it on both images even though we only need it on exporter.
+RUN export DEBIAN_FRONTEND=noninteractive && \
+    apt-get update && \
+    apt-get install --yes --no-install-recommends cron && \
+    apt-get autoremove --yes && apt-get clean --yes && rm -rf /var/lib/apt/lists/*
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-WORKDIR /app
 ARG UVCACHE=/root/.cache/uv
-COPY pyproject.toml uv.lock* /app/
+COPY PIXL /PIXL
+WORKDIR /app
+COPY waveform-controller/pyproject.toml waveform-controller/uv.lock /app/
 RUN --mount=type=cache,target=${UVCACHE} uv pip install --system .
-COPY . /app/
+COPY waveform-controller/. /app/
 RUN --mount=type=cache,target=${UVCACHE} uv pip install --system .
+FROM waveform_base AS waveform_controller
 CMD ["emap-extract-waveform"]
+FROM waveform_base AS waveform_exporter
+ENTRYPOINT ["/app/exporter-scripts/entrypoint.sh"]

README.md

Lines changed: 42 additions & 6 deletions
@@ -35,23 +35,42 @@ emap docker up -d
 
 ## 2 Install and deploy waveform controller using docker
 
-Configuration, copy the configuration file to the config directory and edit
-as necessary. Remove the comment telling you not to put secrets in it.
+Create a root directory for your installation of the waveform-controller project,
+separate to the Emap project root.
 
+### Expected top-level dir structure
 ```
-cp settings.env.EXAMPLE config/settings.env
+├── PIXL
+├── config
+├── waveform-controller
+└── waveform-export
 ```
+
+### Instructions for achieving this structure
+
+Clone this repo (`waveform-controller`) and [PIXL](https://github.com/SAFEHR-data/PIXL),
+both inside your root directory.
+
+Set up the config files as follows:
+```
+mkdir config
+cp waveform-controller/config.EXAMPLE/controller.env.EXAMPLE config/controller.env
+cp waveform-controller/config.EXAMPLE/exporter.env.EXAMPLE config/exporter.env
+cp waveform-controller/config.EXAMPLE/hasher.env.EXAMPLE config/hasher.env
+```
+From the new config files, remove the comments telling you not to put secrets in them, as instructed.
+
 If it doesn't already exist you should create a directory named
 `waveform-export` in the parent directory to store the saved waveform
 messages.
 
 ```
-mkdir ../waveform-export
+mkdir waveform-export
 ```
 
-Build and start the controller with docker
+Build and start the controller and exporter with docker
 ```
-cd ../waveform-controller
+cd waveform-controller
 docker compose build
 docker compose up -d
 ```
@@ -67,5 +86,22 @@ Each row of the csv will contain
 
 `csn, mrn, units, samplingRate, observationTime, waveformData`
 
+## Perform a parquet conversion (including de-id)
+At the time of writing, the cron pipeline is not set up. This section shows
+how to perform an ad-hoc de-id.
+```
+docker compose run waveform-controller emap-csv-pseudon --csv /waveform-export/original-csv/my_original_csv.csv
+```
+
+## Perform an export
+At the time of writing, the cron pipeline is not set up. This section shows
+how to perform an ad-hoc FTPS upload.
+
+Exported files must be under the WAVEFORM_PSEUDONYMISED_PARQUET directory.
+Files passed in must be given relative to this directory:
+```
+docker compose run --entrypoint "" waveform-exporter emap-send-ftps my_pseudonymised_file.parquet
+```
+
 # Developing
 See [developing docs](docs/develop.md)
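
Review note on the README rule that exported files must sit under the WAVEFORM_PSEUDONYMISED_PARQUET directory and be passed relative to it. The sketch below shows one way such a rule could be enforced; it is not taken from this commit. The function name `resolve_export_path` and the example base directory are hypothetical, and how `emap-send-ftps` actually validates its argument is not visible in this diff.

```python
from pathlib import Path


def resolve_export_path(base_dir: str, relative_name: str) -> Path:
    """Resolve relative_name under base_dir, refusing anything that escapes it.

    Hypothetical helper: base_dir stands in for the value of
    WAVEFORM_PSEUDONYMISED_PARQUET.
    """
    base = Path(base_dir).resolve()
    candidate = Path(relative_name)
    # The README asks for paths relative to the base directory.
    if candidate.is_absolute():
        raise ValueError(f"expected a path relative to {base}, got {relative_name!r}")
    resolved = (base / candidate).resolve()
    # Reject names such as "../other.parquet" that climb out of the base directory.
    if not resolved.is_relative_to(base):
        raise ValueError(f"{relative_name!r} resolves outside {base}")
    return resolved


if __name__ == "__main__":
    # The base directory here is an assumed example value.
    print(resolve_export_path("/waveform-export/pseudonymised", "my_pseudonymised_file.parquet"))
```

Resolving before comparing means `..` segments are collapsed first, so the check looks at where the path really points and not at how it is spelled.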

config.EXAMPLE/controller.env.EXAMPLE

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # This is an EXAMPLE file, do not put real secrets in here.
-# Copy it to ./config/settings.env and then DELETE THIS COMMENT.
+# Copy it to ../config/controller.env and then DELETE THIS COMMENT.
 UDS_DBNAME="fakeuds"
 UDS_USERNAME="inform_user"
 UDS_PASSWORD="inform"

config.EXAMPLE/exporter.env.EXAMPLE

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# This is an EXAMPLE file, do not put real secrets in here.
+# Copy it to ../config/exporter.env and then DELETE THIS COMMENT.
+# When does the exporter run
+EXPORTER_CRON_SCHEDULE="14 5 * * *"
+FTPS_HOST=myftps.example.com
+FTPS_PORT=990
+FTPS_USERNAME=
+FTPS_PASSWORD=
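
Review note on `EXPORTER_CRON_SCHEDULE`: the value uses the standard five-field crontab layout. The sketch below only labels the fields so the schedule is easier to read. The helper `describe_cron` is hypothetical and is not part of this commit, and how the exporter entrypoint consumes the variable is not shown in this diff.

```python
# Field order of a standard five-field crontab schedule.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(schedule: str) -> dict[str, str]:
    """Label each field of a five-field cron schedule (hypothetical helper)."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    print(describe_cron("14 5 * * *"))
    # {'minute': '14', 'hour': '5', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Read this way, the example value runs the job at 05:14 every day, in whatever timezone the cron daemon in the container uses.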

config.EXAMPLE/hasher.env.EXAMPLE

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# This is an EXAMPLE file, do not put real secrets in here.
+# Copy it to ../config/hasher.env and then DELETE THIS COMMENT.
+HASHER_API_AZ_CLIENT_ID=
+HASHER_API_AZ_CLIENT_PASSWORD=
+HASHER_API_AZ_TENANT_ID=
+HASHER_API_AZ_KEY_VAULT_NAME=
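
Review note on the EXAMPLE env files: each one asks for its header comment to be deleted after copying, and several keys ship with empty values. The sketch below is a possible pre-flight check, not something this commit contains. The function `check_env_text` is hypothetical, and it assumes the files hold only blank lines, `#` comments, and simple `KEY=VALUE` lines, as the three examples here do.

```python
def check_env_text(text: str) -> list[str]:
    """Return human-readable problems found in the text of an env file.

    Hypothetical helper: flags leftover EXAMPLE comments and empty values.
    """
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The example files ask for this header comment to be removed.
            if "EXAMPLE file" in line or "DELETE THIS COMMENT" in line:
                problems.append(f"line {number}: example comment still present")
            continue
        key, sep, value = line.partition("=")
        if not sep:
            problems.append(f"line {number}: not a KEY=VALUE line")
        elif not value.strip().strip('"'):
            problems.append(f"line {number}: {key} has no value")
    return problems


if __name__ == "__main__":
    sample = "# Copy it to ../config/hasher.env and then DELETE THIS COMMENT.\nHASHER_API_AZ_CLIENT_ID=\n"
    for problem in check_env_text(sample):
        print(problem)
```

Running it over the text of each copied file in `config/` before `docker compose up` would surface an unfilled credential early, before a container fails on it at runtime.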

config/.gitignore

Lines changed: 0 additions & 2 deletions
This file was deleted.
