Commit d9e8cbc

Merge pull request #38 from vkt1414/master
A GitHub Actions workflow to test whether the getting_started Colab notebooks run properly in the latest Colab Docker runtime environment.
2 parents a415970 + 8ab822f commit d9e8cbc

9 files changed: 9036 additions, 0 deletions

.github/workflows/test_colab.yml

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
name: Check Commits and Colab Images

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]
  workflow_dispatch:
  schedule:
    - cron: 0 12 */1 * *

jobs:
  check_commits_and_images:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.x

      - name: Install dependencies
        run: pip install requests pandas google-cloud-bigquery pyarrow nbformat

      - name: Authorize Google Cloud
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.SERVICE_ACCOUNT_KEY }}
          create_credentials_file: true
          export_environment_variables: true

      - name: Run check-commits.py and check-colab-images.py, preprocess notebooks scripts
        run: |
          python test/src/check-commits.py
          python test/src/check-colab-images.py
          python test/src/preProcessNotebooks.py

      - name: Set result output
        id: set-result
        run: |
          if [[ -f "check_colab_images_result.txt" ]]; then
            RESULT=$(cat "check_colab_images_result.txt")
            echo "RESULT=$RESULT" >> $GITHUB_ENV
          fi

      - name: Free Disk Space (Ubuntu)
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          docker-images: true
          swap-storage: true

      - name: Docker login
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Pull from GCP and Push Docker image to Docker Hub
        if: env.RESULT == 'true'
        run: |
          docker pull us-docker.pkg.dev/colab-images/public/runtime:latest
          docker tag us-docker.pkg.dev/colab-images/public/runtime:latest imagingdatacommons/idc-testing-colab:latest
          docker push imagingdatacommons/idc-testing-colab:latest

      - name: Pull Docker image from Docker Hub
        if: env.RESULT == 'false'
        run: |
          docker pull imagingdatacommons/idc-testing-colab:latest

      - name: Copy Google Cloud credentials to Docker container
        run: |
          CREDENTIALS_FILE_PATH="${{ env.GOOGLE_APPLICATION_CREDENTIALS }}"
          CREDENTIALS_FILE_NAME=$(basename "$CREDENTIALS_FILE_PATH")
          GOOGLE_APPLICATION_CREDENTIALS="/content/$CREDENTIALS_FILE_NAME"
          echo "GOOGLE_APPLICATION_CREDENTIALS=$GOOGLE_APPLICATION_CREDENTIALS" >> $GITHUB_ENV

      - name: Run notebook with papermill
        run: |
          for nb in part1_prerequisites part2_searching_basics part3_exploring_cohorts; do
            docker run -d --name colab -v "$(pwd):/content" -e GOOGLE_APPLICATION_CREDENTIALS="${{ env.GOOGLE_APPLICATION_CREDENTIALS }}" imagingdatacommons/idc-testing-colab:latest
            docker exec -t colab /bin/bash -c "pip install papermill"
            docker exec -t colab /bin/bash -c "set -o xtrace && set -o errexit && set -o pipefail && set -o nounset && set +o errexit && cd content/ && papermill /content/notebooks/getting_started/${nb}.ipynb /content/test/outputs/${nb}_papermill_output.ipynb && set -o errexit && ls -A"
            #docker exec -t colab /bin/bash -c "jupyter nbconvert --to html --ExtractOutputPreprocessor.enabled=False /content/test/outputs/output_${nb}.ipynb"
            docker stop colab
            docker rm colab
          done

      - name: Commit changes
        if: ${{ github.event_name != 'pull_request' }}
        uses: stefanzweifel/git-auto-commit-action@v4
        with:
          commit_message: 'Check colab env'
          file_pattern: 'test/*.csv test/outputs/*.ipynb'
          branch: 'master'

      #- name: Check output notebooks for errors
      #  run: |
      #    for nb in part1_prerequisites part2_searching_basics part3_exploring_cohorts; do
      #      if grep -q '"name": "stderr"\|"status": "failed"' test/outputs/output_${nb}.ipynb; then
      #        echo "Error messages found in the ${nb} notebook output:"
      #        cat test/outputs/output_${nb}.ipynb
      #        exit 1
      #      else
      #        echo "No errors found in the ${nb} notebook output."
      #      fi
      #    done
      #    exit $EXIT_CODE
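The disabled final step greps the output notebooks for error markers. As a rough alternative, the sketch below parses the notebook JSON and reports which code cells carry an `error` output. It is only an illustration: the function names `load_notebook` and `find_failed_cells` are hypothetical and are not part of this commit, and it assumes the output files follow the standard nbformat v4 layout.

```python
import json


def load_notebook(path: str) -> dict:
    """Read a notebook file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_failed_cells(notebook: dict) -> list[tuple[int, str]]:
    """Return (cell index, exception name) for every code cell with an error output."""
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append((index, output.get("ename", "unknown")))
                break  # one report per cell is enough
    return failures


# A tiny in-memory notebook standing in for a papermill output file.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "ok"}]},
        {"cell_type": "code",
         "outputs": [{"output_type": "error", "ename": "NameError",
                      "evalue": "name 'x' is not defined", "traceback": []}]},
    ]
}

print(find_failed_cells(sample))  # [(2, 'NameError')]
```

In the workflow this could be pointed at a file such as `test/outputs/part1_prerequisites_papermill_output.ipynb` via `load_notebook`, exiting non-zero when the returned list is not empty. Unlike the grep on `"name": "stderr"`, it ignores warnings that are merely written to stderr.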

test/README.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Check Commits and Colab Images

This repository uses a GitHub Actions workflow to check whether the getting started notebooks in IDC-Tutorials work as expected in the Google Colab environment.

## Status

[![Getting Started Notebooks in the latest Colab environment](https://github.com/ImagingDataCommons/IDC-Tutorials/actions/workflows/test_colab.yml/badge.svg)](https://github.com/ImagingDataCommons/IDC-Tutorials/actions/workflows/test_colab.yml)

## Workflow

1. **Check for Image Changes**:
   - Make an API call to Artifact Registry to check if there are new Docker images.

   ```shell
   gcloud artifacts docker tags list us-docker.pkg.dev/colab-images/public/runtime --format=json --quiet
   ```
   - Compare the `sha256` digest with that of the previous latest image.

2. **Preprocess Notebooks**:
   - Use an IDC Google Cloud project ID instead of asking for it interactively.
   - Replace the usual Colab authentication call, `auth.authenticate_user()`, with Application Default Credentials.
   - When used with `export_environment_variables: true`, the `google-github-actions/auth` action exposes the path to the Application Default Credentials file in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
   - Some notebooks require the user to enter a query. In such cases, the expected query is supplied automatically.

3. **Docker Image Handling**:
   - If the Colab Docker image has changed, pull it and push it to Docker Hub. Colab images are updated less often than the workflow pulls the image for testing, so pulling from Artifact Registry on every run would pile up charges.
   - If there are no changes, just pull the image from Docker Hub.
   - Use [`jlumbroso/free-disk-space@main`](https://github.com/jlumbroso/free-disk-space) to free up additional disk space.

4. **Running Notebooks with Papermill**:
   - Mount the repository source directory at the container's `/content` folder.
   - Install the [`papermill`](https://papermill.readthedocs.io/) package to run the notebooks.
   - Capture `papermill` output and handle any errors.

5. **Update Repository**:
   - Automatically commit the output files generated by the Docker container using [`stefanzweifel/git-auto-commit-action@v4`](https://github.com/stefanzweifel/git-auto-commit-action).
   - The committed outputs offer a quick way to see at which cell a notebook failed.
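
The preprocessing in step 2 can be pictured with the sketch below. It is not the repository's `preProcessNotebooks.py`, whose contents are not shown here. The function name `preprocess_notebook`, the variable name `my_ProjectID`, and the project ID `idc-test-project` are all assumptions made for illustration.

```python
import copy
import re


def preprocess_notebook(notebook: dict, project_id: str,
                        variable: str = "my_ProjectID") -> dict:
    """Return a copy of the notebook that can run without user interaction.

    `variable` is a hypothetical name for the notebook variable that
    normally receives the project ID from an interactive prompt.
    """
    # Match "<variable> = ..." assignments, but not "==" comparisons.
    assignment = re.compile(rf"^(\s*){re.escape(variable)}\s*=(?!=).*$")
    processed = copy.deepcopy(notebook)
    for cell in processed.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        new_lines = []
        for line in source.split("\n"):
            match = assignment.match(line)
            if match:
                # Hard-code the project ID instead of prompting for it.
                line = f"{match.group(1)}{variable} = {project_id!r}"
            elif line.strip() == "auth.authenticate_user()":
                # Credentials come from GOOGLE_APPLICATION_CREDENTIALS instead.
                indent = line[: len(line) - len(line.lstrip())]
                line = f"{indent}pass  # rely on Application Default Credentials"
            new_lines.append(line)
        cell["source"] = "\n".join(new_lines)
    return processed


sample = {
    "cells": [
        {
            "cell_type": "code",
            "source": "from google.colab import auth\n"
                      "auth.authenticate_user()\n"
                      "my_ProjectID = input('Project ID: ')\n",
        }
    ]
}

processed = preprocess_notebook(sample, "idc-test-project")
print(processed["cells"][0]["source"])
```

Working on a deep copy keeps the original notebook untouched, so the same input can be preprocessed again with a different project ID.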

## Prerequisites

Before using the workflow, make sure to set the required secrets in your repository:

- `SERVICE_ACCOUNT_KEY`: Google Cloud service account key JSON (make sure to convert it to single-line JSON).
  Note: the minimum role required for the service account is `BigQuery User`.
- `DOCKER_USERNAME`: Docker Hub username.
- `DOCKER_PASSWORD`: Docker Hub password or access token.

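One possible way to produce the single-line form of the service account key is sketched below. The helper name `to_single_line_json` is hypothetical, and the key contents are placeholders rather than a real credential.

```python
import json


def to_single_line_json(text: str) -> str:
    """Re-serialize a JSON document without line breaks or extra spaces."""
    return json.dumps(json.loads(text), separators=(",", ":"))


# Placeholder fields only; a real key file contains more entries.
pretty = json.dumps(
    {
        "type": "service_account",
        "project_id": "example-project",
        "private_key": "line1\nline2\n",
    },
    indent=2,
)

one_line = to_single_line_json(pretty)
print(one_line)
```

Newlines inside string values, such as those in a private key, stay escaped as `\n`, so the result is a single physical line that still parses to the same data.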
## Resources

- [Papermill](https://papermill.readthedocs.io/)
- [Application Default Credentials based login](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login)
- [Google GitHub Actions](https://github.com/google-github-actions)
- [Commits](https://github.com/vkt1414/track-colab-env/commits/main)
- [Space Saving](https://github.com/jlumbroso/free-disk-space)

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

test/colab-images-list.csv

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
date,tag,sha256,docker_pull_tag,docker_pull_sha256_tag
,latest,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d,us-docker.pkg.dev/colab-images/public/runtime:latest,us-docker.pkg.dev/colab-images/public/runtime@sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
20230515,release-colab-20230515-060150-RC00,sha256:3a8fc58f7e81b96dc59a2fb48b7973802f59fdd634fb538569228d830a7e76a9,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230515-060150-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:3a8fc58f7e81b96dc59a2fb48b7973802f59fdd634fb538569228d830a7e76a9
20230622,release-colab-20230622-060123-RC01,sha256:7dac57e02aae4e83aab349563190a71bdd07374e1365f53bd6a50280046c6091,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230622-060123-RC01,us-docker.pkg.dev/colab-images/public/runtime@sha256:7dac57e02aae4e83aab349563190a71bdd07374e1365f53bd6a50280046c6091
20230711,release-colab-20230711-060203-RC00,sha256:53dc33f450cd162d8a42c5aff02d50ac24eb9fc68be77f0374614ad07247e9cd,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230711-060203-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:53dc33f450cd162d8a42c5aff02d50ac24eb9fc68be77f0374614ad07247e9cd
20230803,release-colab-20230803-060151-RC00,sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230803-060151-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c
20230921,release-colab_20230921-060057_RC00,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d,us-docker.pkg.dev/colab-images/public/runtime:release-colab_20230921-060057_RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
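
The tracking file above lends itself to a simple digest comparison. The following sketch is an assumption about how such a check could look, not the contents of `check-colab-images.py`. It embeds three of the rows above, trimmed to their first three columns, and the helper names are hypothetical.

```python
import csv
import io

# Three rows copied from colab-images-list.csv, first three columns only.
CSV_TEXT = """\
date,tag,sha256
,latest,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
20230803,release-colab-20230803-060151-RC00,sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c
20230921,release-colab_20230921-060057_RC00,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
"""


def load_rows(text: str) -> list[dict]:
    """Parse the CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))


def recorded_latest_digest(rows: list[dict]):
    """Return the digest stored for the `latest` tag, or None if absent."""
    for row in rows:
        if row["tag"] == "latest":
            return row["sha256"]
    return None


def image_changed(rows: list[dict], current_digest: str) -> bool:
    """True when the registry's current digest differs from the recorded one."""
    return recorded_latest_digest(rows) != current_digest


def releases_for_digest(rows: list[dict], digest: str) -> list[str]:
    """List the dated release tags that share the given digest."""
    return [r["tag"] for r in rows if r["sha256"] == digest and r["tag"] != "latest"]


rows = load_rows(CSV_TEXT)
latest = recorded_latest_digest(rows)
print(releases_for_digest(rows, latest))  # ['release-colab_20230921-060057_RC00']
print(image_changed(rows, latest))        # False
```

The boolean from `image_changed` corresponds to the `true`/`false` value that the workflow reads from `check_colab_images_result.txt` to decide whether to mirror a new image to Docker Hub.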
