docs(guides): Crypt4GH_proTES and SPE deployment Tutorial #31
Conversation
Reviewer's Guide

Adds two new administrator-facing documentation guides describing how to set up and use secure processing environments for sensitive data: one focused on Crypt4GH-based encryption workflows orchestrated via proTES/Funnel, and another outlining an SPE architecture in de.NBI Cloud using WESkit, Slurm, MinIO, and LS Login.

Sequence diagram for Crypt4GH encryption workflow via proTES and Funnel:

```mermaid
sequenceDiagram
actor Admin
participant proTES as proTES_gateway
participant TES as TES_Funnel_server
participant Worker as Funnel_worker_node
participant Storage as Local_storage
participant DB as BoltDB_database
Admin->>proTES: POST task1_keygen
proTES->>TES: create_task(task1_keygen.json)
TES->>DB: store_task_definition
TES->>Worker: schedule_task(task1_keygen)
Worker->>Worker: start_container(crypt4gh_tutorial)
Worker->>Storage: write_keys(sender_sk, sender_pk, recipient_sk, recipient_pk, recipient_pk_copy)
Worker-->>TES: report_task_status(COMPLETED)
TES-->>proTES: task_status(COMPLETED)
proTES-->>Admin: task1 result locations
Admin->>proTES: POST task2_encrypt_file
proTES->>TES: create_task(task2_encrypt_file.json)
TES->>Worker: schedule_task(task2_encrypt_file)
Worker->>Storage: read_keys(sender_sk, recipient_pk)
Worker->>Worker: download_file_and_encrypt
Worker->>Storage: write_encrypted_file_and_size
Worker-->>TES: report_task_status(COMPLETED)
Admin->>proTES: POST task3_decrypt_and_write_size
proTES->>TES: create_task(task3_decrypt_and_write_size.json)
TES->>Worker: schedule_task(task3_decrypt_and_write_size)
Worker->>Storage: read_encrypted_file_and_recipient_sk
Worker->>Worker: decrypt_and_compute_md5sum
Worker->>Storage: write_decrypted_md5sum
Worker-->>TES: report_task_status(COMPLETED)
TES-->>proTES: task_status(COMPLETED)
proTES-->>Admin: final_result_locations
```
Flow diagram for Crypt4GH key generation, encryption, and decryption pipeline:

```mermaid
flowchart LR
A[Start_tutorial] --> B[task1_keygen_generate_crypt4gh_keypairs]
B --> B1[Sender_sk_pk_and_recipient_sk_pk_written_to_storage]
B1 --> C[task2_encrypt_file_download_logo_and_record_size]
C --> C1[Write_plain_size_file_to_storage]
C1 --> C2[Encrypt_size_file_with_sender_sk_and_recipient_pk]
C2 --> C3[Store_encrypted_c4gh_file_in_encrypted_directory]
C3 --> D[Transfer_encrypted_file_to_secure_environment]
D --> E[task3_decrypt_and_write_size_read_encrypted_file_and_recipient_sk]
E --> F[Decrypt_and_compute_md5sum]
F --> G[Write_md5sum_file_to_decrypted_directory]
G --> H[End_pipeline]
```
Hey - I've found 13 issues, and left some high level feedback:
- In the `crypt4gh_to_protes` tutorial, the third task (`task3_decrypt_and_write_size.json`) never actually decrypts the `.c4gh` file and instead runs `md5sum` on a non-existent `united_kingdom_logo_size.txt`; please revise the executor command and output paths so they match the described decryption workflow and previous task outputs.
- The final `curl` example in `crypt4gh_to_protes.md` posts `task4_decrypt_and_write_size.json`, but the document defines `task3_decrypt_and_write_size.json`; align the file name and task number to avoid confusion when users follow the tutorial.
- Both new guides contain several typos and placeholders (e.g. `enrypted`, `recomend`, `impemented`, `Authentification`, `???` for redirect URLs, and `accessable`); a focused pass to correct spelling and replace placeholders with concrete values or explicit instructions will make the tutorials significantly clearer and easier to follow.
## Individual Comments
### Comment 1
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:4` </location>
<code_context>
+# Sensitive data encryption and processing using Crypt4GH and proTES
+## Description
+This tutorial presents processing of encrypted sensitive data using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel). Each job is triggered via [proTES](https://github.com/elixir-cloud-aai/proTES), a stable and scalable GA4GH [TES](https://github.com/ga4gh/task-execution-schemas) gateway.
+A [Crypt4GH](https://crypt4gh.readthedocs.io/) enrypted dataset is decrypted using the private key of the recipient. We recomend to store the key in a secure location and use it only for decryption. Transfer of the private key to the TES instance could be done using signed URL.
+
+
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling and article usage in the Crypt4GH description sentence.
Please fix the typos (`enrypted` → `encrypted`, `recomend` → `recommend`) and update the last clause to read: “using a signed URL.”
```suggestion
A [Crypt4GH](https://crypt4gh.readthedocs.io/) encrypted dataset is decrypted using the private key of the recipient. We recommend to store the key in a secure location and use it only for decryption. Transfer of the private key to the TES instance could be done using a signed URL.
```
</issue_to_address>
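For readers following the review, a minimal sketch of the Crypt4GH round trip discussed in this tutorial may help; file names are illustrative, and the `encrypt` invocation mirrors the executor command quoted later in this review (the `crypt4gh` Python CLI is assumed).

```bash
# Generate key pairs for both parties (illustrative file names).
crypt4gh-keygen --sk sender.sec --pk sender.pub
crypt4gh-keygen --sk recipient.sec --pk recipient.pub

# Encrypt with the sender's secret key for the recipient's public key.
crypt4gh encrypt --sk sender.sec --recipient_pk recipient.pub < data.txt > data.txt.c4gh

# Decrypt with the recipient's secret key, as the tutorial describes.
crypt4gh decrypt --sk recipient.sec < data.txt.c4gh > data.decrypted.txt
```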
### Comment 2
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:8` </location>
<code_context>
+
+
+We assume two parties that need to exchange information: a sender and a recipient. For each party, we generate a `Crypt4GH` key pair and send the recipient’s public key to the sender. All keys are generated inside a container (Task 1) and exported to the configured storage via TES outputs.
+For simplicity, we download a file that was presented [elsewhere](https://github.com/vschnei/elixir-cloud-demos) and assume it contains sensitive information. We encrypt this data (Task 2) using the sender’s private key and the recipient’s public key with `Crypt4GH`, producing a `*.c4gh` encrypted file. Once the encrypted file is transfered to to a secure processing environment, it can be then decrypted (Task 3) using the recipient’s secret key.
+
+
</code_context>
<issue_to_address>
**issue (typo):** Correct spelling and duplicated word in the sentence about transferring the encrypted file.
Change "transfered" to "transferred" and remove the duplicate "to" in that sentence.
```suggestion
For simplicity, we download a file that was presented [elsewhere](https://github.com/vschnei/elixir-cloud-demos) and assume it contains sensitive information. We encrypt this data (Task 2) using the sender’s private key and the recipient’s public key with `Crypt4GH`, producing a `*.c4gh` encrypted file. Once the encrypted file is transferred to a secure processing environment, it can be then decrypted (Task 3) using the recipient’s secret key.
```
</issue_to_address>
### Comment 3
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:266` </location>
<code_context>
+
+Finally, all task and scheduler data is stored in the embedded database. For further configuration options, see the [Funnel documentation](https://ohsu-comp-bio.github.io/funnel/).
+
+Our pipeleine allows to use proTES, a robust and scalable [Global Alliance for Genomics and Health(GA4GH)](https://www.ga4gh.org/) [Task Execution Service (TES) API](https://github.com/ga4gh/task-execution-schemas) gateway. It enables the federation of tasks across on-premise, hybrid, and multi-cloud environments composed of TES nodes. See official [proTES](https://github.com/elixir-cloud-aai/proTES) documentation for deployment and configuration.
+
+
</code_context>
<issue_to_address>
**suggestion (typo):** Fix typo and improve grammar in the proTES pipeline description.
Change "pipeleine" to "pipeline" and rephrase "allows to use proTES" to "allows the use of proTES" or "allows you to use proTES". Also add a space in "Health(GA4GH)" → "Health (GA4GH)" for readability.
```suggestion
Our pipeline allows the use of proTES, a robust and scalable [Global Alliance for Genomics and Health (GA4GH)](https://www.ga4gh.org/) [Task Execution Service (TES) API](https://github.com/ga4gh/task-execution-schemas) gateway. It enables the federation of tasks across on-premise, hybrid, and multi-cloud environments composed of TES nodes. See official [proTES](https://github.com/elixir-cloud-aai/proTES) documentation for deployment and configuration.
```
</issue_to_address>
### Comment 4
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:5` </location>
<code_context>
+
+This tutorial presents the implementation of a SPE in the de.NBI Cloud (ELIXIR-DE) using ELIXIR and open-source services.
+
+The aim of this tutorial is the need for a Secure Processing Environment (SPE) to analyse large volumes of sensitive data generated by biomedical and clinical research. Easy and secure access to such environments accelerates research and enables participation by researchers with limited resources.
+
+Users of an SPE can run workflows on sensitive data, without ever gaining access to the actual data. The data is processed securely and the user can only access the results of the workflows.
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in the sentence describing the aim of the tutorial.
The sentence “The aim of this tutorial is the need for a Secure Processing Environment (SPE)” reads awkwardly. Please rephrase, e.g. “The aim of this tutorial is to address the need for a Secure Processing Environment (SPE)…” or “…is to describe a Secure Processing Environment (SPE)…”.
```suggestion
The aim of this tutorial is to address the need for a Secure Processing Environment (SPE) to analyse large volumes of sensitive data generated by biomedical and clinical research. Easy and secure access to such environments accelerates research and enables participation by researchers with limited resources.
```
</issue_to_address>
### Comment 5
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:32` </location>
<code_context>
+### Execution
+WESkit allows execution of [Snakemake](https://snakemake.readthedocs.io/en/v7.32.3/) and [Nextflow](https://www.nextflow.io/docs/latest/) workflows by sending a request to the compute infrastructure (Cloud/Cluster). Find details in the [WESkit docs](https://gitlab.com/one-touch-pipeline/weskit/documentation).
+
+A Slurm cluster can be deployed with little effort using [BiBiGrid](https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/), a framework for creating and managing cloud clusters. BiBiGrid uses Ansible to configure cloud images and setup an on-demand SLURM cluster. Alternatively use any other Slurm deployment.
+
+Access to the SPE must be restricted due to national restrictions and laws. Collaborators and foreign researchers need to obtain permission from the Identity Provider to use the SPE. A permission allows them to authenticate at the Identity Provider site and request workflow execution via WESkit on the SLURM cluster.
</code_context>
<issue_to_address>
**nitpick (typo):** Use the correct verb form for "set up".
Here "setup" is used as a verb; please change it to "set up" ("...and set up an on-demand SLURM cluster").
```suggestion
A Slurm cluster can be deployed with little effort using [BiBiGrid](https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/), a framework for creating and managing cloud clusters. BiBiGrid uses Ansible to configure cloud images and set up an on-demand SLURM cluster. Alternatively use any other Slurm deployment.
```
</issue_to_address>
### Comment 6
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:37` </location>
<code_context>
+Access to the SPE must be restricted due to national restrictions and laws. Collaborators and foreign researchers need to obtain permission from the Identity Provider to use the SPE. A permission allows them to authenticate at the Identity Provider site and request workflow execution via WESkit on the SLURM cluster.
+
+### Results
+Finally, results are stored in a storage that is mounted into the cluster and an interface that is only accessable via LS-Login. Sensitive data is not managed by WESkit or accessible in the result storage.
+
+## Step 1: WESkit
</code_context>
<issue_to_address>
**issue (typo):** Correct the spelling of "accessible".
```suggestion
Finally, results are stored in a storage that is mounted into the cluster and an interface that is only accessible via LS-Login. Sensitive data is not managed by WESkit or accessible in the result storage.
```
</issue_to_address>
### Comment 7
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:56` </location>
<code_context>
+
+The SPE uses MinIO/S3 to provide researchers access to non-sensitive results data. Depending on the environment, there are several options available on how to [deploy MinIO](https://github.com/minio/minio?tab=readme-ov-file). To configure OpenID please refer to the [MinIO OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html).
+
+In this scenario we create a bucket "results" in MinIO and allow all authorized user to access MinIO with read-access on the results data.
+
+Note: Minio as storage provider removes it's open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
</code_context>
<issue_to_address>
**nitpick (typo):** Use plural for "users" when referring to all authorized users.
Here this should read "all authorized users" to match the plural subject.
```suggestion
In this scenario we create a bucket "results" in MinIO and allow all authorized users to access MinIO with read-access on the results data.
```
</issue_to_address>
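To make the bucket setup in this comment concrete, here is a hedged sketch using the MinIO client; the `spe` alias and the group name are assumptions, not values from the guide.

```bash
# Create the "results" bucket and grant read-only access to authorized users
# (alias and group are hypothetical).
mc mb spe/results
mc admin policy attach spe readonly --group authorized-users
```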
### Comment 8
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:60-62` </location>
<code_context>
+
+### Results crawler
+
+To make the non-sensitive results available in, a crawler continuously checks for new results and copies them to MinIO. This can be impemented as a shell script running as a cron job.
+
+A simple example script is given below:
</code_context>
<issue_to_address>
**issue (typo):** Remove an extra word and fix a typo in the results crawler description.
Please remove the extra "in" after "available" and correct "impemented" to "implemented".
```suggestion
### Results crawler
To make the non-sensitive results available, a crawler continuously checks for new results and copies them to MinIO. This can be implemented as a shell script running as a cron job.
```
</issue_to_address>
### Comment 9
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:94` </location>
<code_context>
+done
+```
+
+This scripts regulary checks the WESkit results folder. WESkit logs information about a workflow execution in the file `log.json`, once the workflow execution finished. The scripts checks if the `log.json` file exists and in case uploads then the result files `results.csv` into the S3 bucket. Uploaded run-directories are tagged with a `upload_token` file to prevent redundant uploads.
+
+## Step 3: User Interface
</code_context>
<issue_to_address>
**suggestion (typo):** Fix subject-verb agreement, spelling, and word order in the description of the crawler behavior.
Suggested wording: "This script regularly checks the WESkit results folder... The script checks if the `log.json` file exists and, if so, uploads the result file `results.csv` to the S3 bucket." This fixes “scripts” → “script”, “regulary” → “regularly”, and “scripts checks” → “script checks”, and smooths the phrasing around uploading the result files.
```suggestion
This script regularly checks the WESkit results folder. WESkit logs information about a workflow execution in the file `log.json` once the workflow execution has finished. The script checks if the `log.json` file exists and, if so, uploads the result file `results.csv` to the S3 bucket. Uploaded run-directories are tagged with an `upload_token` file to prevent redundant uploads.
```
</issue_to_address>
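A hedged sketch of the crawler loop this comment describes follows; the run-directory layout and the `mc` alias are assumptions rather than values from the guide.

```bash
# Hypothetical crawler pass over WESkit run directories (paths and alias assumed).
for run in /data/weskit/runs/*/; do
  # Upload once the run has finished (log.json present) and was not uploaded yet.
  if [ -f "${run}log.json" ] && [ ! -f "${run}upload_token" ]; then
    mc cp "${run}results.csv" spe/results/ && touch "${run}upload_token"
  fi
done
```

Scheduled via cron (e.g. `*/5 * * * * /usr/local/bin/results_crawler.sh`), this matches the tag-with-`upload_token` pattern described above.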
### Comment 10
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:58` </location>
<code_context>
+
+In this scenario we create a bucket "results" in MinIO and allow all authorized user to access MinIO with read-access on the results data.
+
+Note: Minio as storage provider removes it's open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
+
+### Results crawler
</code_context>
<issue_to_address>
**issue (typo):** Correct the possessive "its" and consider adjusting phrasing about MinIO.
Change "it's" to the possessive "its" ("its open source license"). You could also rephrase to something like: "MinIO as a storage provider has removed its open source license" for clarity.
```suggestion
Note: MinIO as a storage provider has removed its open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
```
</issue_to_address>
### Comment 11
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:98` </location>
<code_context>
+
+## Step 3: User Interface
+
+To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a light weight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
+
+## Step 4: Authentification and Authorization
</code_context>
<issue_to_address>
**nitpick (typo):** Use "lightweight" as a single word.
In the sentence "It offers a light weight web application...", update "light weight" to "lightweight".
```suggestion
To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a lightweight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
```
</issue_to_address>
### Comment 12
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:100` </location>
<code_context>
+
+To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a light weight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
+
+## Step 4: Authentification and Authorization
+
+Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) to for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
</code_context>
<issue_to_address>
**issue (typo):** Correct the spelling of "Authentication" in the section title.
Use “Authentication” rather than “Authentification” in the title.
```suggestion
## Step 4: Authentication and Authorization
```
</issue_to_address>
### Comment 13
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:102` </location>
<code_context>
+
+## Step 4: Authentification and Authorization
+
+Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) to for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
+
+In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs:
</code_context>
<issue_to_address>
**issue (typo):** Remove the extra "to" and fix pluralization in the LS-Login description.
In the phrase "to for OIDC integration," one "to" is redundant; "for OIDC integration" is sufficient. Also, change "detailed instruction" to "detailed instructions" to match the intended plural meaning.
Suggested implementation:
```
Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
```
```
LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
```
</issue_to_address>
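Since Comments 12 and 13 both touch the OIDC wiring, a hedged sketch of the environment-variable route for MinIO follows; the issuer URL, client credentials, and redirect URL are placeholders, not values from the guides.

```bash
# Hypothetical MinIO OIDC configuration via environment variables (placeholder values).
export MINIO_IDENTITY_OPENID_CONFIG_URL="https://<ls-login-issuer>/.well-known/openid-configuration"
export MINIO_IDENTITY_OPENID_CLIENT_ID="<client-id>"
export MINIO_IDENTITY_OPENID_CLIENT_SECRET="<client-secret>"
export MINIO_IDENTITY_OPENID_REDIRECT_URI="https://minio.example.org/oauth_callback"
```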
Signed-off-by: schneiva <valentin.schneider-lunitz@charite.de>

Force-pushed from f094a17 to 7186cc2.
svedziok left a comment:
looks good
@@ -0,0 +1,464 @@

# Setting up Crypt4GH encryption/decryption in Funnel

This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway.
Please describe a use case here and give a short summary on how this use case will be implemented with this tutorial.
Pull request overview
Adds administrator-focused documentation for setting up secure processing of sensitive data within ELIXIR/de.NBI Cloud environments, including SPE deployment and Crypt4GH-based encrypted workflows executed via proTES/Funnel.
Changes:
- Adds an SPE administrator guide covering WESkit + Slurm, S3/MinIO results handling, and LS Login (OIDC) integration.
- Adds a Crypt4GH + proTES/Funnel tutorial including key generation and TES task examples.
- Updates MkDocs navigation to expose the new administrator guides.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| mkdocs.yml | Adds two new admin guide entries to the documentation navigation. |
| docs/guides/guide-admin/sensitive_data_analysis.md | New SPE deployment tutorial for sensitive data analysis (WESkit/Slurm + S3/MinIO + LS Login). |
| docs/guides/guide-admin/crypt4gh_to_protes.md | New Crypt4GH + proTES/Funnel tutorial with example TES tasks and security notes. |
### Task 2: Encrypt a File

This task retrieves a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:

```json
{
  "name": "Encrypt file with crypt4gh",
  "description": "Retrieve a file, record its size, and encrypt it using data holder and researcher keys",
  "inputs": [
    {
      "name": "data_holder_sk",
      "description": "data_holder secret key",
      "url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
      "path": "/inputs/keys/data_holder/data_holder.sec",
      "type": "FILE"
    },
    {
      "name": "researcher_pk",
      "description": "researcher public key",
      "url": "file:///tmp/funnel-storage/keys/researcher/researcher.pub",
      "path": "/inputs/keys/researcher/researcher.pub",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "encrypted_file",
      "description": "Encrypted file",
      "url": "file:///tmp/funnel-storage/encrypted/united_kingdom_logo_size.txt.c4gh",
      "path": "/outputs/encrypted/united_kingdom_logo_size.txt.c4gh",
      "type": "FILE"
    },
    {
      "name": "size_file",
      "description": "Text file containing original file size",
      "url": "file:///tmp/funnel-storage/raw/united_kingdom_logo_size.txt",
      "path": "/outputs/raw/united_kingdom_logo_size.txt",
      "type": "FILE"
    }
  ],
  "executors": [
    {
      "image": "quay.io/grbot/crypt4gh-tutorial",
      "command": [
        "/bin/bash",
        "-c",
        "curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/data_holder/data_holder.sec --recipient_pk /inputs/keys/researcher/researcher.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
      ],
```
Copilot AI, Jan 23, 2026:
Task 2 is titled/introduced as "Encrypt a File", but the executor command encrypts /outputs/raw/united_kingdom_logo_size.txt (the file size text), not the downloaded file itself. Either adjust the text to make it clear this is encrypting a small derived value for demonstration, or update the command/output names to actually encrypt the downloaded file.
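One way to resolve this, sketched here on the assumption that the downloaded PNG is the sensitive payload, is to feed the file itself into `crypt4gh encrypt` (output names are illustrative; the task's `encrypted_file` output `url`/`path` would need to change to match):

```bash
# Hypothetical corrected executor command: encrypt the downloaded file itself.
curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png \
  && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt \
  && crypt4gh encrypt --sk /inputs/keys/data_holder/data_holder.sec \
       --recipient_pk /inputs/keys/researcher/researcher.pub \
       < /tmp/file.png > /outputs/encrypted/united_kingdom_logo.png.c4gh
```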
| "outputs": [ | ||
| { | ||
| "name": "decrypted_file", | ||
| "description": "Decrypted size text file", |
Copilot AI, Jan 23, 2026:
In Task 3, the output entry is described as a "Decrypted size text file", but it actually points to an MD5 checksum file (*_md5sum.txt) generated by the executor. Update the description to match what is produced (or change the output to be the decrypted file if that’s the intent).
| "description": "Decrypted size text file", | |
| "description": "MD5 checksum of decrypted file", |
| "command": [ | ||
| "/bin/sh", | ||
| "-c", | ||
| "mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt" |
Copilot AI, Jan 23, 2026:
Task 3 computes an MD5 checksum of /outputs/decrypted/united_kingdom_logo_size.txt, but no step in the task writes that file to /outputs/decrypted. Given the encrypted input is staged at /inputs/encrypted/united_kingdom_logo_size.txt.c4gh, the checksum command should reference the decrypted input location/name produced by the middleware; otherwise the example will fail when run.
| "mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt" | |
| "mkdir -p /outputs/decrypted && /bin/md5sum /inputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt" |
```yaml
  - Administrators:
      - "guides/guide-admin/index.md"
      - "LS Login configuration": "guides/guide-admin/services_to_ls_aai.md"
      - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis_spe.md"
```
Copilot AI, Jan 23, 2026:
The navigation entry points to guides/guide-admin/sensitive_data_analysis_spe.md, but the added document is docs/guides/guide-admin/sensitive_data_analysis.md. This will create a broken link / missing page in the built site; update the nav path (or rename the file) so it matches the actual filename.
| - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis_spe.md" | |
| - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis.md" |
```bash
# Install Go protocol buffer plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```
Copilot AI, Jan 23, 2026:
The docs use `go install ...@latest`, which makes the tutorial non-reproducible over time (and can break unexpectedly). Prefer pinning explicit module versions (or at least referencing a known-good version) so readers can reliably follow the steps.
```suggestion
# Install Go protocol buffer plugins (use pinned versions for reproducibility)
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.36.0
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.5.1
```
1. **Install WESkit:** Simple deployment [using Docker](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/README.md).
2. **Set up compute environment:** WESkit must be configured according to the [compute environment](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/executor.md).
3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute evironment. Only then they are available for the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.
Copilot AI, Jan 23, 2026:
Spelling: "compute evironment" should be "compute environment".
```suggestion
3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute environment. Only then they are available for the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.
```
LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.

In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs.

### LS-Login in MinIO

LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instruction in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
Copilot AI, Jan 23, 2026:
This section repeats the same LS-Login-in-MinIO paragraph twice (once before the "LS-Login in MinIO" heading and again under it). Consider keeping a single instance to avoid divergence and make the flow clearer.
```suggestion
In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs.

### LS-Login in MinIO

LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
```
| "command": [ | ||
| "/bin/sh", | ||
| "-c", | ||
| "mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt" |
Copilot AI, Jan 23, 2026:
Task 3 uses md5sum to generate a checksum for verification (/bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt). MD5 is a weak hash function with practical collision attacks, so an attacker who can influence the decrypted content could craft different data with the same checksum and bypass integrity checks. Use a stronger hash function such as SHA-256 for any integrity or verification of decrypted data in this workflow.
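A hedged variant of that checksum step with SHA-256 in place of MD5, combined with the input-path fix suggested above (the `/inputs/decrypted/...` location is the earlier suggestion's assumption, not a path the tutorial currently stages):

```bash
# Hypothetical Task 3 verification step using SHA-256 instead of MD5.
mkdir -p /outputs/decrypted \
  && sha256sum /inputs/decrypted/united_kingdom_logo_size.txt \
       > /outputs/decrypted/united_kingdom_logo_sha256sum.txt
```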
```
funnel node run --config worker-config.yaml &
```

Verify that both services are running by checking the logs or accessing the Funnel server API at `http://<server-ip>:8000`.
Copilot AI, Jan 23, 2026:
The Funnel API is demonstrated as being accessed over plain HTTP (http://<server-ip>:8000), even though this environment is intended for handling sensitive genomic data. Using unencrypted HTTP allows any attacker on the network path to eavesdrop on or tamper with task metadata and results, potentially exposing sensitive information or altering jobs. Configure the Funnel server with TLS and update the documentation to use https:// endpoints (and client authentication if available) for all non-local access.
```suggestion
Verify that both services are running by checking the logs or accessing the Funnel server API over HTTPS at `https://<server-ip>:8443` (or your configured TLS port). For local testing on the same host only, you may instead use `http://localhost:8000`.
```
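For the local-only case mentioned in the suggestion, a quick liveness probe might look as follows; the endpoint path follows the GA4GH TES convention and may differ between Funnel versions.

```bash
# Hypothetical local liveness check against the Funnel task-listing endpoint.
curl -s "http://localhost:8000/v1/tasks?view=MINIMAL" | head
```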
```bash
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task1_keygen.json

# Submit Task 2: Encrypt file (wait for Task 1 to complete)
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task2_encrypt_file.json

# Submit Task 3: Decrypt file (wait for Task 2 to complete)
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task3_decrypt_and_write_size.json
```
Copilot AI, Jan 23, 2026:
The example curl commands submit TES tasks to proTES over plain HTTP (http://localhost:8080/ga4gh/tes/v1/tasks) and explicitly suggest replacing localhost with a remote server address. When used against a non-local endpoint, this exposes task payloads and any embedded file locations or tokens to network attackers who can intercept or modify requests. Show https:// URLs here and document that the proTES endpoint must be served over TLS in production, with clients configured to validate the certificate.
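A hedged sketch of the hardened submission call, assuming proTES is published behind TLS at a hypothetical hostname with a valid certificate:

```bash
# Hypothetical TLS-protected proTES endpoint; hostname is illustrative.
curl -X POST https://protes.example.org/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task1_keygen.json
```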
Add Administrator Guides for Sensitive Data Processing
Summary
This PR adds two administrator guides for setting up secure processing environments (SPE) to handle sensitive data within the ELIXIR Cloud.
Changes:
- `crypt4gh_to_protes.md` - Tutorial on encryption and processing of sensitive data using Crypt4GH and proTES/Funnel
- `sensitive_data_analysis.md` - Guide for implementing a complete Secure Processing Environment (SPE) in the de.NBI Cloud
Summary by Sourcery
Add administrator tutorials for encrypting and processing sensitive data using Crypt4GH with proTES/Funnel and for implementing a Secure Processing Environment (SPE) for sensitive data analysis in de.NBI Cloud.
Documentation: