
Conversation


@vschnei vschnei commented Jan 5, 2026

Add Administrator Guides for Sensitive Data Processing

Summary

This PR adds two administrator guides for setting up secure processing environments (SPEs) to handle sensitive data within the ELIXIR Cloud.

Changes:

  • crypt4gh_to_protes.md - Tutorial on encryption and processing of sensitive data using Crypt4GH and proTES/Funnel
  • sensitive_data_analysis.md - Guide for implementing a complete Secure Processing Environment (SPE) in the de.NBI Cloud

Summary by Sourcery

Add administrator tutorials for encrypting and processing sensitive data using Crypt4GH with proTES/Funnel and for implementing a Secure Processing Environment (SPE) for sensitive data analysis in de.NBI Cloud.

Documentation:

  • Add a step-by-step guide for setting up Crypt4GH-based encryption workflows executed via proTES and Funnel.
  • Add an administrator guide for deploying a Secure Processing Environment (SPE) with WESkit, Slurm, MinIO/S3, and LS Login for sensitive data analysis.

@vschnei vschnei self-assigned this Jan 5, 2026

sourcery-ai bot commented Jan 5, 2026

Reviewer's Guide

Adds two new administrator-facing documentation guides describing how to set up and use secure processing environments for sensitive data: one focused on Crypt4GH-based encryption workflows orchestrated via proTES/Funnel, and another outlining an SPE architecture in de.NBI Cloud using WESkit, Slurm, MinIO, and LS Login.

Sequence diagram for Crypt4GH encryption workflow via proTES and Funnel

```mermaid
sequenceDiagram
  actor Admin
  participant proTES as proTES_gateway
  participant TES as TES_Funnel_server
  participant Worker as Funnel_worker_node
  participant Storage as Local_storage
  participant DB as BoltDB_database

  Admin->>proTES: POST task1_keygen
  proTES->>TES: create_task(task1_keygen.json)
  TES->>DB: store_task_definition
  TES->>Worker: schedule_task(task1_keygen)
  Worker->>Worker: start_container(crypt4gh_tutorial)
  Worker->>Storage: write_keys(sender_sk, sender_pk, recipient_sk, recipient_pk, recipient_pk_copy)
  Worker-->>TES: report_task_status(COMPLETED)
  TES-->>proTES: task_status(COMPLETED)
  proTES-->>Admin: task1 result locations

  Admin->>proTES: POST task2_encrypt_file
  proTES->>TES: create_task(task2_encrypt_file.json)
  TES->>Worker: schedule_task(task2_encrypt_file)
  Worker->>Storage: read_keys(sender_sk, recipient_pk)
  Worker->>Worker: download_file_and_encrypt
  Worker->>Storage: write_encrypted_file_and_size
  Worker-->>TES: report_task_status(COMPLETED)

  Admin->>proTES: POST task3_decrypt_and_write_size
  proTES->>TES: create_task(task3_decrypt_and_write_size.json)
  TES->>Worker: schedule_task(task3_decrypt_and_write_size)
  Worker->>Storage: read_encrypted_file_and_recipient_sk
  Worker->>Worker: decrypt_and_compute_md5sum
  Worker->>Storage: write_decrypted_md5sum
  Worker-->>TES: report_task_status(COMPLETED)
  TES-->>proTES: task_status(COMPLETED)
  proTES-->>Admin: final_result_locations
```

Flow diagram for Crypt4GH key generation, encryption, and decryption pipeline

```mermaid
flowchart LR
  A[Start_tutorial] --> B[task1_keygen_generate_crypt4gh_keypairs]
  B --> B1[Sender_sk_pk_and_recipient_sk_pk_written_to_storage]
  B1 --> C[task2_encrypt_file_download_logo_and_record_size]
  C --> C1[Write_plain_size_file_to_storage]
  C1 --> C2[Encrypt_size_file_with_sender_sk_and_recipient_pk]
  C2 --> C3[Store_encrypted_c4gh_file_in_encrypted_directory]
  C3 --> D[Transfer_encrypted_file_to_secure_environment]
  D --> E[task3_decrypt_and_write_size_read_encrypted_file_and_recipient_sk]
  E --> F[Decrypt_and_compute_md5sum]
  F --> G[Write_md5sum_file_to_decrypted_directory]
  G --> H[End_pipeline]
```
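
To make the flow above concrete, here is a minimal local sketch of what the three tasks execute, using the Crypt4GH CLI directly outside of TES (file names and the download URL are illustrative placeholders, not the tutorial's exact paths):

```sh
# Task 1 equivalent: generate a Crypt4GH key pair for each party.
# --nocrypt skips the passphrase prompt; acceptable only for demos.
crypt4gh-keygen --nocrypt --sk sender.sec --pk sender.pub
crypt4gh-keygen --nocrypt --sk recipient.sec --pk recipient.pub

# Task 2 equivalent: download a file, record its size, and encrypt the
# size file with the sender's secret key and the recipient's public key.
curl -L -o file.png https://example.org/logo.png   # placeholder URL
stat -c %s file.png > file_size.txt
crypt4gh encrypt --sk sender.sec --recipient_pk recipient.pub \
  < file_size.txt > file_size.txt.c4gh

# Task 3 equivalent: decrypt with the recipient's secret key and
# write a checksum of the decrypted content for verification.
crypt4gh decrypt --sk recipient.sec < file_size.txt.c4gh > decrypted_size.txt
md5sum decrypted_size.txt > decrypted_md5sum.txt
```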

File-Level Changes

Change: Introduce a step-by-step tutorial for encrypting, transferring, and processing sensitive data using Crypt4GH with tasks orchestrated through proTES and Funnel.
  • Document prerequisites and VM layout for Funnel server, worker, and proTES deployment with Docker-based setup instructions.
  • Provide example Funnel server and worker configuration YAML files, including storage and scheduler settings.
  • Describe a three-task Crypt4GH pipeline with full TES task JSON payloads for key generation, file encryption, and decryption/verification.
  • Show how to submit TES tasks via proTES using curl against the GA4GH TES API endpoint.
Files: docs/guides/guide-admin/crypt4gh_to_protes.md

Change: Add an administrator guide for deploying and operating a Secure Processing Environment (SPE) for sensitive data analysis in de.NBI Cloud using WESkit and Slurm.
  • Explain the overall SPE architecture with secure execution backend, external S3-compatible storage for results, and LS Login for user authentication.
  • Detail required infrastructure and roles of WESkit, Slurm cluster, and MinIO, including restrictions on access to sensitive data.
  • Outline WESkit deployment steps, workflow provisioning, and compute environment configuration for Snakemake/Nextflow workflows.
  • Describe MinIO deployment and OIDC integration with LS Login, including an example result-crawler shell script to move non-sensitive outputs to S3.
  • Document the role of a user-facing GUI (custom WESkit UI) and OIDC-based authentication/authorization across WESkit, MinIO, and the web app.
Files: docs/guides/guide-admin/sensitive_data_analysis.md
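
For readers who have not deployed Funnel before, the server/worker split described above boils down to a small YAML configuration per role; a minimal server-side sketch (key names follow the Funnel configuration schema, but the values and storage path here are illustrative assumptions) could look like:

```sh
# Write a minimal Funnel server configuration and start the server.
cat > server-config.yaml <<'EOF'
Server:
  HostName: 0.0.0.0        # address workers use to reach the server
  HTTPPort: "8000"         # REST API (the TES endpoint)
  RPCPort: "9090"          # RPC port used by worker nodes
Database: boltdb           # embedded task/scheduler database
Compute: manual            # schedule onto manually started nodes
LocalStorage:
  AllowedDirs:
    - /tmp/funnel-storage  # directory tasks may read/write via file:// URLs
EOF
funnel server run --config server-config.yaml
```

A worker node would then be started against the same server with `funnel node run --config worker-config.yaml`, as shown later in this review.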


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 13 issues, and left some high-level feedback:

  • In the crypt4gh_to_protes tutorial, the third task (task3_decrypt_and_write_size.json) never actually decrypts the .c4gh file and instead runs md5sum on a non-existent united_kingdom_logo_size.txt; please revise the executor command and output paths so they match the described decryption workflow and previous task outputs.
  • The final curl example in crypt4gh_to_protes.md posts task4_decrypt_and_write_size.json, but the document defines task3_decrypt_and_write_size.json; align the file name and task number to avoid confusion when users follow the tutorial.
  • Both new guides contain several typos and placeholders (e.g. enrypted, recomend, impemented, Authentification, ??? for redirect URLs, and accessable); a focused pass to correct spelling and replace placeholders with concrete values or explicit instructions will make the tutorials significantly clearer and easier to follow.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the `crypt4gh_to_protes` tutorial, the third task (`task3_decrypt_and_write_size.json`) never actually decrypts the `.c4gh` file and instead runs `md5sum` on a non-existent `united_kingdom_logo_size.txt`; please revise the executor command and output paths so they match the described decryption workflow and previous task outputs.
- The final `curl` example in `crypt4gh_to_protes.md` posts `task4_decrypt_and_write_size.json`, but the document defines `task3_decrypt_and_write_size.json`; align the file name and task number to avoid confusion when users follow the tutorial.
- Both new guides contain several typos and placeholders (e.g. `enrypted`, `recomend`, `impemented`, `Authentification`, `???` for redirect URLs, and `accessable`); a focused pass to correct spelling and replace placeholders with concrete values or explicit instructions will make the tutorials significantly clearer and easier to follow.

## Individual Comments

### Comment 1
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:4` </location>
<code_context>
+# Sensitive data encryption and processing using Crypt4GH and proTES 
+## Description
+This tutorial presents processing of encrypted sensitive data using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel). Each job is triggered via [proTES](https://github.com/elixir-cloud-aai/proTES), a stable and scalable GA4GH [TES](https://github.com/ga4gh/task-execution-schemas) gateway. 
+A [Crypt4GH](https://crypt4gh.readthedocs.io/) enrypted dataset is decrypted using the private key of the recipient. We recomend to store the key in a secure location and use it only for decryption. Transfer of the private key to the TES instance could be done using signed URL.
+
+
</code_context>

<issue_to_address>
**issue (typo):** Fix spelling and article usage in the Crypt4GH description sentence.

Please fix the typos (`enrypted``encrypted`, `recomend``recommend`) and update the last clause to read: “using a signed URL.”

```suggestion
A [Crypt4GH](https://crypt4gh.readthedocs.io/) encrypted dataset is decrypted using the private key of the recipient. We recommend to store the key in a secure location and use it only for decryption. Transfer of the private key to the TES instance could be done using a signed URL.
```
</issue_to_address>

### Comment 2
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:8` </location>
<code_context>
+
+
+We assume two parties that need to exchange information: a sender and a recipient. For each party, we generate a `Crypt4GH` key pair and send the recipient’s public key to the sender. All keys are generated inside a container (Task 1) and exported to the configured storage via TES outputs.
+For simplicity, we download a file that was presented [elsewhere](https://github.com/vschnei/elixir-cloud-demos) and assume it contains sensitive information. We encrypt this data (Task 2) using the sender’s private key and the recipient’s public key with `Crypt4GH`, producing a `*.c4gh` encrypted file. Once the encrypted file is transfered to to a secure processing environment, it can be then decrypted (Task 3) using the recipient’s secret key.
+
+
</code_context>

<issue_to_address>
**issue (typo):** Correct spelling and duplicated word in the sentence about transferring the encrypted file.

Change "transfered" to "transferred" and remove the duplicate "to" in that sentence.

```suggestion
For simplicity, we download a file that was presented [elsewhere](https://github.com/vschnei/elixir-cloud-demos) and assume it contains sensitive information. We encrypt this data (Task 2) using the sender’s private key and the recipient’s public key with `Crypt4GH`, producing a `*.c4gh` encrypted file. Once the encrypted file is transferred to a secure processing environment, it can be then decrypted (Task 3) using the recipient’s secret key.
```
</issue_to_address>

### Comment 3
<location> `docs/guides/guide-admin/crypt4gh_to_protes.md:266` </location>
<code_context>
+
+Finally, all task and scheduler data is stored in the embedded database. For further configuration options, see the [Funnel documentation](https://ohsu-comp-bio.github.io/funnel/).
+
+Our pipeleine allows to use proTES, a robust and scalable [Global Alliance for Genomics and Health(GA4GH)](https://www.ga4gh.org/) [Task Execution Service (TES) API](https://github.com/ga4gh/task-execution-schemas) gateway. It enables the federation of tasks across on-premise, hybrid, and multi-cloud environments composed of TES nodes. See official [proTES](https://github.com/elixir-cloud-aai/proTES) documentation for deployment and configuration.
+
+
</code_context>

<issue_to_address>
**suggestion (typo):** Fix typo and improve grammar in the proTES pipeline description.

Change "pipeleine" to "pipeline" and rephrase "allows to use proTES" to "allows the use of proTES" or "allows you to use proTES". Also add a space in "Health(GA4GH)" → "Health (GA4GH)" for readability.

```suggestion
Our pipeline allows the use of proTES, a robust and scalable [Global Alliance for Genomics and Health (GA4GH)](https://www.ga4gh.org/) [Task Execution Service (TES) API](https://github.com/ga4gh/task-execution-schemas) gateway. It enables the federation of tasks across on-premise, hybrid, and multi-cloud environments composed of TES nodes. See official [proTES](https://github.com/elixir-cloud-aai/proTES) documentation for deployment and configuration.
```
</issue_to_address>

### Comment 4
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:5` </location>
<code_context>
+
+This tutorial presents the implementation of a SPE in the de.NBI Cloud (ELIXIR-DE) using ELIXIR and open-source services. 
+
+The aim of this tutorial is the need for a Secure Processing Environment (SPE) to analyse large volumes of sensitive data generated by biomedical and clinical research. Easy and secure access to such environments accelerates research and enables participation by researchers with limited resources.
+
+Users of an SPE can run workflows on sensitive data, without ever gaining access to the actual data. The data is processed securely and the user can only access the results of the workflows.
</code_context>

<issue_to_address>
**suggestion (typo):** Improve grammar in the sentence describing the aim of the tutorial.

The sentence “The aim of this tutorial is the need for a Secure Processing Environment (SPE)” reads awkwardly. Please rephrase, e.g. “The aim of this tutorial is to address the need for a Secure Processing Environment (SPE)…” or “…is to describe a Secure Processing Environment (SPE)…”.

```suggestion
The aim of this tutorial is to address the need for a Secure Processing Environment (SPE) to analyse large volumes of sensitive data generated by biomedical and clinical research. Easy and secure access to such environments accelerates research and enables participation by researchers with limited resources.
```
</issue_to_address>

### Comment 5
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:32` </location>
<code_context>
+### Execution
+WESkit allows execution of [Snakemake](https://snakemake.readthedocs.io/en/v7.32.3/) and [Nextflow](https://www.nextflow.io/docs/latest/) workflows by sending a request to the compute infrastructure (Cloud/Cluster). Find details in the [WESkit docs](https://gitlab.com/one-touch-pipeline/weskit/documentation).
+
+A Slurm cluster can be deployed with little effort using [BiBiGrid](https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/), a framework for creating and managing cloud clusters. BiBiGrid uses Ansible to configure cloud images and setup an on-demand SLURM cluster. Alternatively use any other Slurm deployment.
+
+Access to the SPE must be restricted due to national restrictions and laws. Collaborators and foreign researchers need to obtain permission from the Identity Provider to use the SPE. A permission allows them to authenticate at the Identity Provider site and request workflow execution via WESkit on the SLURM cluster.
</code_context>

<issue_to_address>
**nitpick (typo):** Use the correct verb form for "set up".

Here "setup" is used as a verb; please change it to "set up" ("...and set up an on-demand SLURM cluster").

```suggestion
A Slurm cluster can be deployed with little effort using [BiBiGrid](https://cloud.denbi.de/wiki/Tutorials/BiBiGrid/), a framework for creating and managing cloud clusters. BiBiGrid uses Ansible to configure cloud images and set up an on-demand SLURM cluster. Alternatively use any other Slurm deployment.
```
</issue_to_address>

### Comment 6
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:37` </location>
<code_context>
+Access to the SPE must be restricted due to national restrictions and laws. Collaborators and foreign researchers need to obtain permission from the Identity Provider to use the SPE. A permission allows them to authenticate at the Identity Provider site and request workflow execution via WESkit on the SLURM cluster.
+
+### Results
+Finally, results are stored in a storage that is mounted into the cluster and an interface that is only accessable via LS-Login. Sensitive data is not managed by WESkit or accessible in the result storage.
+
+## Step 1: WESkit
</code_context>

<issue_to_address>
**issue (typo):** Correct the spelling of "accessible".

```suggestion
Finally, results are stored in a storage that is mounted into the cluster and an interface that is only accessible via LS-Login. Sensitive data is not managed by WESkit or accessible in the result storage.
```
</issue_to_address>

### Comment 7
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:56` </location>
<code_context>
+
+The SPE uses MinIO/S3 to provide researchers access to non-sensitive results data. Depending on the environment, there are several options available on how to [deploy MinIO](https://github.com/minio/minio?tab=readme-ov-file). To configure OpenID please refer to the [MinIO OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). 
+
+In this scenario we create a bucket "results" in MinIO and allow all authorized user to access MinIO with read-access on the results data.
+
+Note: Minio as storage provider removes it's open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
</code_context>

<issue_to_address>
**nitpick (typo):** Use plural for "users" when referring to all authorized users.

Here this should read "all authorized users" to match the plural subject.

```suggestion
In this scenario we create a bucket "results" in MinIO and allow all authorized users to access MinIO with read-access on the results data.
```
</issue_to_address>
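
As a rough sketch of that bucket setup (the alias, admin credentials, and policy name are placeholders; MinIO assigns policies to OIDC users via the configured policy claim, so the actual mapping is done on the LS-Login side or in MinIO's claim settings):

```sh
# Register the deployment under an alias and create the results bucket.
mc alias set spe https://minio.example.org ADMIN_ACCESS_KEY ADMIN_SECRET_KEY
mc mb spe/results

# Define a read-only policy for the bucket (standard S3 policy syntax).
cat > results-read.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::results", "arn:aws:s3:::results/*"]
    }
  ]
}
EOF
mc admin policy create spe results-read results-read.json
```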

### Comment 8
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:60-62` </location>
<code_context>
+
+### Results crawler
+
+To make the non-sensitive results available in, a crawler continuously checks for new results and copies them to MinIO. This can be impemented as a shell script running as a cron job.
+
+A simple example script is given below:
</code_context>

<issue_to_address>
**issue (typo):** Remove an extra word and fix a typo in the results crawler description.

Please remove the extra "in" after "available" and correct "impemented" to "implemented".

```suggestion
### Results crawler

To make the non-sensitive results available, a crawler continuously checks for new results and copies them to MinIO. This can be implemented as a shell script running as a cron job.
```
</issue_to_address>

### Comment 9
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:94` </location>
<code_context>
+done
+```
+
+This scripts regulary checks the WESkit results folder. WESkit logs information about a workflow execution in the file `log.json`, once the workflow execution finished. The scripts checks if the `log.json` file exists and in case uploads then the result files `results.csv` into the S3 bucket. Uploaded run-directories are tagged with a `upload_token` file to prevent redundant uploads.
+
+## Step 3: User Interface
</code_context>

<issue_to_address>
**suggestion (typo):** Fix subject-verb agreement, spelling, and word order in the description of the crawler behavior.

Suggested wording: "This script regularly checks the WESkit results folder... The script checks if the `log.json` file exists and, if so, uploads the result file `results.csv` to the S3 bucket." This fixes “scripts” → “script”, “regulary” → “regularly”, and “scripts checks” → “script checks”, and smooths the phrasing around uploading the result files.

```suggestion
This script regularly checks the WESkit results folder. WESkit logs information about a workflow execution in the file `log.json` once the workflow execution has finished. The script checks if the `log.json` file exists and, if so, uploads the result file `results.csv` to the S3 bucket. Uploaded run-directories are tagged with an `upload_token` file to prevent redundant uploads.
```
</issue_to_address>
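
For orientation, a cron-ready variant of the crawler described in this comment could look like the following sketch (the results root, bucket alias, and file names are assumptions taken from this review thread, not the guide's verbatim script):

```sh
#!/bin/sh
# Copy finished, non-sensitive results to S3 and tag uploaded runs
# so that subsequent cron invocations skip them.
RESULTS_DIR=/data/weskit/results   # assumed WESkit results root
BUCKET=spe/results                 # assumed mc alias/bucket

for run in "$RESULTS_DIR"/*/; do
  # A run is considered finished once WESkit has written log.json.
  [ -f "${run}log.json" ] || continue
  # Skip runs that were already uploaded.
  [ -f "${run}upload_token" ] && continue
  run_id=$(basename "$run")
  if mc cp "${run}results.csv" "${BUCKET}/${run_id}/results.csv"; then
    touch "${run}upload_token"
  fi
done
```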

### Comment 10
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:58` </location>
<code_context>
+
+In this scenario we create a bucket "results" in MinIO and allow all authorized user to access MinIO with read-access on the results data.
+
+Note: Minio as storage provider removes it's open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
+
+### Results crawler
</code_context>

<issue_to_address>
**issue (typo):** Correct the possessive "its" and consider adjusting phrasing about MinIO.

Change "it's" to the possessive "its" ("its open source license"). You could also rephrase to something like: "MinIO as a storage provider has removed its open source license" for clarity.

```suggestion
Note: MinIO as a storage provider has removed its open source license, therefore it might be advisable to switch to a different storage solution. Refer to [legacy binary releases](https://github.com/minio/minio?tab=readme-ov-file#legacy-binary-releases) for the last open source release.
```
</issue_to_address>

### Comment 11
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:98` </location>
<code_context>
+
+## Step 3: User Interface
+
+To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a light weight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
+
+## Step 4: Authentification and Authorization
</code_context>

<issue_to_address>
**nitpick (typo):** Use "lightweight" as a single word.

In the sentence "It offers a light weight web application...", update "light weight" to "lightweight".

```suggestion
To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a lightweight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
```
</issue_to_address>

### Comment 12
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:100` </location>
<code_context>
+
+To offer a user interface for the SPE, the simplest way is to use a [customized version](https://gitlab.com/one-touch-pipeline/weskit/gui/-/tree/spe4hd_demo) of the WESkit GUI. It offers a light weight web application to allow researchers to run and monitor workflows. The WESkit GUI repository can be used as a blueprint to create a customized website.
+
+## Step 4: Authentification and Authorization
+
+Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) to for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
</code_context>

<issue_to_address>
**issue (typo):** Correct the spelling of "Authentication" in the section title.

Use “Authentication” rather than “Authentification” in the title.

```suggestion
## Step 4: Authentication and Authorization
```
</issue_to_address>

### Comment 13
<location> `docs/guides/guide-admin/sensitive_data_analysis.md:102` </location>
<code_context>
+
+## Step 4: Authentification and Authorization
+
+Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) to for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.
+
+In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs:
</code_context>

<issue_to_address>
**issue (typo):** Remove the extra "to" and fix pluralization in the LS-Login description.

In the phrase "to for OIDC integration," one "to" is redundant; "for OIDC integration" is sufficient. Also, change "detailed instruction" to "detailed instructions" to match the intended plural meaning.

Suggested implementation:

```
Authentication and authorization is implemented using OIDC. This setup uses the [LS-Login infrastructure](https://lifescience-ri.eu/ls-login/) for OIDC integration. The LS-Login documentation contains a [guide](https://lifescience-ri.eu/ls-login/documentation/service-provider-documentation/service-provider-documentation.html) on how to register a new service.

```

```
LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.

```
</issue_to_address>


@vschnei vschnei requested review from kraatz and svedziok and removed request for kraatz January 6, 2026 10:41
@vschnei vschnei changed the title Crypt4GH and proTES Tutorial docs(guides): Crypt4GH and proTES Tutorial Jan 12, 2026
@vschnei vschnei changed the title docs(guides): Crypt4GH and proTES Tutorial docs(guides): Crypt4GH_proTES and SPE deployment Tutorial Jan 12, 2026

@svedziok svedziok left a comment


looks good

@@ -0,0 +1,464 @@
# Setting up Crypt4GH encryption/decryption in Funnel

This guide explains how to configure and deploy an environment that enables encryption and decryption of sensitive data files using TES/[Funnel](https://github.com/ohsu-comp-bio/funnel) with [proTES](https://github.com/elixir-cloud-aai/proTES) as a stable and scalable [GA4GH TES](https://github.com/ga4gh/task-execution-schemas) gateway.


please describe a use case here and give a short summary of how this use case will be implemented in this tutorial


Copilot AI left a comment


Pull request overview

Adds administrator-focused documentation for setting up secure processing of sensitive data within ELIXIR/de.NBI Cloud environments, including SPE deployment and Crypt4GH-based encrypted workflows executed via proTES/Funnel.

Changes:

  • Adds an SPE administrator guide covering WESkit + Slurm, S3/MinIO results handling, and LS Login (OIDC) integration.
  • Adds a Crypt4GH + proTES/Funnel tutorial including key generation and TES task examples.
  • Updates MkDocs navigation to expose the new administrator guides.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.

  • mkdocs.yml — Adds two new admin guide entries to the documentation navigation.
  • docs/guides/guide-admin/sensitive_data_analysis.md — New SPE deployment tutorial for sensitive data analysis (WESkit/Slurm + S3/MinIO + LS Login).
  • docs/guides/guide-admin/crypt4gh_to_protes.md — New Crypt4GH + proTES/Funnel tutorial with example TES tasks and security notes.


Comment on lines +273 to +320
### Task 2: Encrypt a File

This task retrieves a file, encrypts it using Crypt4GH, and stores both the encrypted file and metadata. Create a file named `task2_encrypt_file.json`:

```json
{
  "name": "Encrypt file with crypt4gh",
  "description": "Retrieve a file, record its size, and encrypt it using data holder and researcher keys",
  "inputs": [
    {
      "name": "data_holder_sk",
      "description": "data_holder secret key",
      "url": "file:///tmp/funnel-storage/keys/data_holder/data_holder.sec",
      "path": "/inputs/keys/data_holder/data_holder.sec",
      "type": "FILE"
    },
    {
      "name": "researcher_pk",
      "description": "researcher public key",
      "url": "file:///tmp/funnel-storage/keys/researcher/researcher.pub",
      "path": "/inputs/keys/researcher/researcher.pub",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "name": "encrypted_file",
      "description": "Encrypted file",
      "url": "file:///tmp/funnel-storage/encrypted/united_kingdom_logo_size.txt.c4gh",
      "path": "/outputs/encrypted/united_kingdom_logo_size.txt.c4gh",
      "type": "FILE"
    },
    {
      "name": "size_file",
      "description": "Text file containing original file size",
      "url": "file:///tmp/funnel-storage/raw/united_kingdom_logo_size.txt",
      "path": "/outputs/raw/united_kingdom_logo_size.txt",
      "type": "FILE"
    }
  ],
  "executors": [
    {
      "image": "quay.io/grbot/crypt4gh-tutorial",
      "command": [
        "/bin/bash",
        "-c",
        "curl -L -o /tmp/file.png http://britishfamily.co.uk/wp-content/uploads/2015/02/MADE_IN_BRITAIN_web_300x300.png && stat -c %s /tmp/file.png > /outputs/raw/united_kingdom_logo_size.txt && crypt4gh encrypt --sk /inputs/keys/data_holder/data_holder.sec --recipient_pk /inputs/keys/researcher/researcher.pub < /outputs/raw/united_kingdom_logo_size.txt > /outputs/encrypted/united_kingdom_logo_size.txt.c4gh"
      ],
```

Copilot AI Jan 23, 2026


Task 2 is titled/introduced as "Encrypt a File", but the executor command encrypts /outputs/raw/united_kingdom_logo_size.txt (the file size text), not the downloaded file itself. Either adjust the text to make it clear this is encrypting a small derived value for demonstration, or update the command/output names to actually encrypt the downloaded file.

"outputs": [
{
"name": "decrypted_file",
"description": "Decrypted size text file",

Copilot AI Jan 23, 2026


In Task 3, the output entry is described as a "Decrypted size text file", but it actually points to an MD5 checksum file (*_md5sum.txt) generated by the executor. Update the description to match what is produced (or change the output to be the decrypted file if that’s the intent).

Suggested change

```diff
-      "description": "Decrypted size text file",
+      "description": "MD5 checksum of decrypted file",
```

"command": [
"/bin/sh",
"-c",
"mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt"

Copilot AI Jan 23, 2026


Task 3 computes an MD5 checksum of /outputs/decrypted/united_kingdom_logo_size.txt, but no step in the task writes that file to /outputs/decrypted. Given the encrypted input is staged at /inputs/encrypted/united_kingdom_logo_size.txt.c4gh, the checksum command should reference the decrypted input location/name produced by the middleware; otherwise the example will fail when run.

Suggested change

```diff
-        "mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt"
+        "mkdir -p /outputs/decrypted && /bin/md5sum /inputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt"
```

```yaml
  - Administrators:
      - "guides/guide-admin/index.md"
      - "LS Login configuration": "guides/guide-admin/services_to_ls_aai.md"
      - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis_spe.md"
```

Copilot AI Jan 23, 2026


The navigation entry points to guides/guide-admin/sensitive_data_analysis_spe.md, but the added document is docs/guides/guide-admin/sensitive_data_analysis.md. This will create a broken link / missing page in the built site; update the nav path (or rename the file) so it matches the actual filename.

Suggested change

```diff
-      - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis_spe.md"
+      - "Sensitive Data analysis in Secure Processing Environments (SPE)": "guides/guide-admin/sensitive_data_analysis.md"
```

Comment on lines +73 to +75
```sh
# Install Go protocol buffer plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```

Copilot AI Jan 23, 2026


The docs use go install ...@latest, which makes the tutorial non-reproducible over time (and can break unexpectedly). Prefer pinning explicit module versions (or at least referencing a known-good version) so readers can reliably follow the steps.

Suggested change

```diff
-# Install Go protocol buffer plugins
-go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
-go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+# Install Go protocol buffer plugins (use pinned versions for reproducibility)
+go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.36.0
+go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.5.1
```

1. **Install WESkit:** Simple deployment [using Docker](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/README.md).
2. **Set up compute environment:** WESkit must be configured according to the [compute environment](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/executor.md).
3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute evironment. Only then they are available for the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.

Copilot AI Jan 23, 2026


Spelling: "compute evironment" should be "compute environment".

Suggested change

```diff
-3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute evironment. Only then they are available for the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.
+3. **Provide workflows:** In this scenario, a data controller has to validate and provide every workflow on the compute environment. Only then they are available for the researchers. WESkit provides instructions for [workflow installation](https://gitlab.com/one-touch-pipeline/weskit/documentation/-/blob/master/admin/workflow-installation.md). Workflows are Snakemake or Nextflow scripts, along with all dependencies and additional data.
```

Comment on lines +104 to +110
LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.

In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs.

### LS-Login in MinIO

LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instruction in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.

Copilot AI Jan 23, 2026


This section repeats the same LS-Login-in-MinIO paragraph twice (once before the "LS-Login in MinIO" heading and again under it). Consider keeping a single instance to avoid divergence and make the flow clearer.

Suggested change

```diff
-LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
-In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs.
-### LS-Login in MinIO
-LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instruction in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
+In this tutorial, we assume a single LS-Login service for all the deployed tools (WESkit, MinIO, WebApp). This requires that the following three addresses are valid as OIDC redirect URLs.
+### LS-Login in MinIO
+LS-Login can be activated in MinIO either by using the MinIO console using the OIDC configuration or by setting environmental variables, as described in the MinIO [OIDC Documentation](https://min.io/docs/minio/linux/operations/external-iam/configure-openid-external-identity-management.html). There are detailed instructions in the [ELIXIR-Cloud-AAI documentation](https://elixir-cloud-aai.github.io/guides/guide-admin/services_to_ls_aai/) for using MinIO with LS-Login.
```
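
For reference, the environment-variable route mentioned in this paragraph looks roughly like the following sketch (the LS-Login issuer URL, scopes, and hostnames are assumptions to be verified against the linked LS-Login and MinIO documentation):

```sh
# Point MinIO at the LS-Login OIDC provider metadata.
export MINIO_IDENTITY_OPENID_CONFIG_URL="https://login.aai.lifescience-ri.eu/oidc/.well-known/openid-configuration"
export MINIO_IDENTITY_OPENID_CLIENT_ID="<client-id-from-ls-login-registration>"
export MINIO_IDENTITY_OPENID_CLIENT_SECRET="<client-secret>"
export MINIO_IDENTITY_OPENID_SCOPES="openid,profile,email"
# Redirect target registered with LS-Login (the MinIO console callback).
export MINIO_IDENTITY_OPENID_REDIRECT_URI="https://minio.example.org/oauth_callback"

minio server /data --console-address ":9001"
```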

"command": [
"/bin/sh",
"-c",
"mkdir -p /outputs/decrypted && /bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt > /outputs/decrypted/united_kingdom_logo_md5sum.txt"

Copilot AI Jan 23, 2026


Task 3 uses md5sum to generate a checksum for verification (/bin/md5sum /outputs/decrypted/united_kingdom_logo_size.txt). MD5 is a weak hash function with practical collision attacks, so an attacker who can influence the decrypted content could craft different data with the same checksum and bypass integrity checks. Use a stronger hash function such as SHA-256 for any integrity or verification of decrypted data in this workflow.

```sh
funnel node run --config worker-config.yaml &
```

Verify that both services are running by checking the logs or accessing the Funnel server API at `http://<server-ip>:8000`.

Copilot AI Jan 23, 2026


The Funnel API is demonstrated as being accessed over plain HTTP (http://<server-ip>:8000), even though this environment is intended for handling sensitive genomic data. Using unencrypted HTTP allows any attacker on the network path to eavesdrop on or tamper with task metadata and results, potentially exposing sensitive information or altering jobs. Configure the Funnel server with TLS and update the documentation to use https:// endpoints (and client authentication if available) for all non-local access.

Suggested change

```diff
-Verify that both services are running by checking the logs or accessing the Funnel server API at `http://<server-ip>:8000`.
+Verify that both services are running by checking the logs or accessing the Funnel server API over HTTPS at `https://<server-ip>:8443` (or your configured TLS port). For local testing on the same host only, you may instead use `http://localhost:8000`.
```

Comment on lines +410 to +422
```sh
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task1_keygen.json

# Submit Task 2: Encrypt file (wait for Task 1 to complete)
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task2_encrypt_file.json

# Submit Task 3: Decrypt file (wait for Task 2 to complete)
curl -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
  -H "Content-Type: application/json" \
  -d @task3_decrypt_and_write_size.json
```

Copilot AI Jan 23, 2026


The example curl commands submit TES tasks to proTES over plain HTTP (http://localhost:8080/ga4gh/tes/v1/tasks) and explicitly suggest replacing localhost with a remote server address. When used against a non-local endpoint, this exposes task payloads and any embedded file locations or tokens to network attackers who can intercept or modify requests. Show https:// URLs here and document that the proTES endpoint must be served over TLS in production, with clients configured to validate the certificate.
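
Since each task must wait for its predecessor, the waits noted in the comments above can be automated with a small polling loop against the TES API; a sketch (assuming `jq` is installed, and using the TES spec's terminal states, where successful completion is reported as `COMPLETE`):

```sh
submit_and_wait() {
  # Submit the task JSON given as $1 and capture the task ID.
  task_id=$(curl -s -X POST http://localhost:8080/ga4gh/tes/v1/tasks \
    -H "Content-Type: application/json" -d @"$1" | jq -r '.id')
  # Poll until the task reaches a terminal state.
  while :; do
    state=$(curl -s "http://localhost:8080/ga4gh/tes/v1/tasks/${task_id}?view=MINIMAL" \
      | jq -r '.state')
    case "$state" in
      COMPLETE) return 0 ;;
      EXECUTOR_ERROR|SYSTEM_ERROR|CANCELED)
        echo "task ${task_id} ended in state ${state}" >&2; return 1 ;;
      *) sleep 5 ;;   # QUEUED, INITIALIZING, RUNNING, ...
    esac
  done
}

submit_and_wait task1_keygen.json &&
  submit_and_wait task2_encrypt_file.json &&
  submit_and_wait task3_decrypt_and_write_size.json
```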

