
Commit 4a4f731

chore: Project refactor (#17)
* Documentation refactor
* Spec archiving
* Clippy warnings
* Doc warnings
* Tagging/releasing
* fix: resolve test race condition by using unique temp files for each test

All tests in projection_pushdown_test.rs and projection_optimization_test.rs were sharing the same temp file path, causing race conditions when tests ran in parallel (default in CI). This caused intermittent failures where results.len() was 0 instead of 1.

Fixed by:
- Modified create_test_*_file() to accept test_name parameter
- Updated all test functions to use unique file paths
- FASTQ: /tmp/test_projection_{test_name}.fastq
- VCF: /tmp/test_projection_optimization_{test_name}.vcf

This ensures tests can run safely in parallel without file conflicts.
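
The naming scheme described in the commit message can be illustrated with a small shell sketch. The repository's actual helpers are Rust functions that are not visible in this view; `fixture_path` below is a hypothetical name used only to show how a per-test name keeps the paths distinct.

```shell
# Sketch only: fixture_path is a hypothetical helper, not code from the repository.
# It mirrors the path patterns quoted in the commit message.
fixture_path() {
  local kind="$1" test_name="$2"
  case "$kind" in
    fastq) printf '/tmp/test_projection_%s.fastq\n' "$test_name" ;;
    vcf)   printf '/tmp/test_projection_optimization_%s.vcf\n' "$test_name" ;;
    *)     echo "unknown fixture kind: $kind" >&2; return 1 ;;
  esac
}
```

Two tests that pass different names get different files, so neither can overwrite or delete the other's input while both run in parallel.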
1 parent e1249df commit 4a4f731

87 files changed, +4589 -252 lines changed

.github/RELEASE.md

Lines changed: 108 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,108 @@
1+
# Release Process
2+
3+
This document describes how to create a new release for datafusion-bio-formats.
4+
5+
## Prerequisites
6+
7+
- You must have write access to the repository
8+
- The workflow can only be triggered from the `master` branch
9+
- All tests and checks must pass
10+
11+
## Creating a Release
12+
13+
### 1. Manual Release via GitHub Actions
14+
15+
1. Go to the [Actions tab](../../actions) in GitHub
16+
2. Select the "Release" workflow from the left sidebar
17+
3. Click the "Run workflow" button (top right)
18+
4. Select the version bump type:
19+
- **patch**: Bug fixes and minor changes (0.1.0 → 0.1.1)
20+
- **minor**: New features, backward compatible (0.1.0 → 0.2.0)
21+
- **major**: Breaking changes (0.1.0 → 1.0.0)
22+
5. Optionally mark as pre-release
23+
6. Click "Run workflow"
24+
25+
### What the Workflow Does
26+
27+
The release workflow will automatically:
28+
29+
1. ✅ Bump the version in all crate `Cargo.toml` files
30+
2. ✅ Update `Cargo.lock`
31+
3. ✅ Run all tests to ensure everything passes
32+
4. ✅ Run clippy checks
33+
5. ✅ Build documentation
34+
6. ✅ Commit the version changes
35+
7. ✅ Create and push a git tag (e.g., `v0.2.0`)
36+
8. ✅ Generate a changelog from git commits
37+
9. ✅ Create a GitHub Release with the changelog
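
Steps 3 to 5 can be reproduced locally before triggering the workflow, which avoids a failed release run. The following is a minimal sketch, assuming it is run from the repository root with the same toolchain; `run` and `preflight` are hypothetical helper names, and `DRY_RUN=1` only prints the commands instead of executing them.

```shell
# Sketch only: run and preflight are hypothetical helpers, not part of the workflow.
# run executes its arguments, or just prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Same commands as the workflow's test, clippy and documentation steps.
# The chain stops at the first command that fails.
preflight() {
  run cargo test --all &&
  run cargo clippy --all-targets --all-features -- -D warnings &&
  run env RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --all-features
}
```

Calling `preflight` runs the three checks in order; setting `DRY_RUN=1` first lists them without running anything.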
38+
39+
### Semantic Versioning
40+
41+
This project follows [Semantic Versioning 2.0.0](https://semver.org/):
42+
43+
- **MAJOR** version: Incompatible API changes
44+
- **MINOR** version: Add functionality in a backward compatible manner
45+
- **PATCH** version: Backward compatible bug fixes
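
The effect of each bump type on a plain `MAJOR.MINOR.PATCH` version can be sketched as follows. `bump_version` is a hypothetical helper for illustration; the workflow itself delegates this to `cargo set-version --bump`, and this sketch does not handle pre-release or build suffixes.

```shell
# Sketch only: bump_version is a hypothetical helper, not part of the workflow.
# Usage: bump_version <MAJOR.MINOR.PATCH> <major|minor|patch>
bump_version() {
  local version="$1" kind="$2"
  local major minor patch
  # Split the version on dots into its three numeric parts.
  IFS=. read -r major minor patch <<EOF
$version
EOF
  case "$kind" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump type: $kind" >&2; return 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

For example, `bump_version 0.1.0 minor` prints `0.2.0`: a minor bump resets the patch number, and a major bump resets both minor and patch.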
46+
47+
### Current Version
48+
49+
Current version: **v0.1.0**
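
Because a hard-coded version line can drift, the current version can also be read straight from a crate manifest. This sketch reuses the `grep`/`sed` pipeline from the workflow's "Get current version" step; `manifest_version` is a hypothetical wrapper name, and it assumes the manifest contains a literal `version = "..."` line.

```shell
# Sketch only: manifest_version is a hypothetical helper.
# Prints the value of the first line starting with `version = ` in the given Cargo.toml.
manifest_version() {
  grep -m 1 '^version = ' "$1" | sed 's/version = "\(.*\)"/\1/'
}
```

Running `manifest_version datafusion/bio-format-core/Cargo.toml` from the repository root prints the version the release workflow would start from.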
50+
51+
### Version History
52+
53+
- `v0.1.0` - Initial release
54+
55+
## Publishing to crates.io
56+
57+
Publishing to crates.io is currently disabled in the workflow. To enable it:
58+
59+
1. Add a `CARGO_TOKEN` secret to your GitHub repository:
60+
- Go to Settings → Secrets and variables → Actions
61+
- Add a new secret named `CARGO_TOKEN` with your crates.io API token
62+
2. Uncomment the "Publish to crates.io" step in `.github/workflows/release.yml`
63+
64+
## Rollback
65+
66+
If you need to roll back a release:
67+
68+
1. Delete the tag locally and remotely:
69+
```bash
70+
git tag -d v0.2.0
71+
git push origin :refs/tags/v0.2.0
72+
```
73+
2. Delete the GitHub Release in the Releases page
74+
3. Revert the version bump commit:
75+
```bash
76+
git revert <commit-hash>
77+
git push origin master
78+
```
79+
80+
## Troubleshooting
81+
82+
### Workflow fails on "Only run on master branch"
83+
84+
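Make sure you're running the workflow from the `master` branch, not `main` or any other branch. On any other branch the job's `if: github.ref == 'refs/heads/master'` condition is false, so the release job is skipped rather than run.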
85+
86+
### Tests fail during release
87+
88+
The workflow will abort if any tests fail. Fix the issues and try again.
89+
90+
### Tag already exists
91+
92+
If the tag already exists, you'll need to delete it first or bump to a different version.
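
Whether a tag is already taken can be checked before starting the workflow. The sketch below is one way to do it, assuming a local clone with tags fetched; `tag_in_list` is a hypothetical helper that reads tag names on standard input.

```shell
# Sketch only: tag_in_list is a hypothetical helper.
# Succeeds when the tag given as $1 appears as a whole line on stdin.
tag_in_list() {
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$1"
}
```

A typical call is `git tag -l | tag_in_list v0.2.0 && echo "v0.2.0 already exists"`. Whole-line matching keeps `v0.2.0` from matching `v0.2.0-rc1`.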
93+
94+
## Example Commit Messages
95+
96+
Good commit messages help generate better changelogs:
97+
98+
- `feat: add support for VCF 4.3 format`
99+
- `fix: resolve memory leak in BGZF reader`
100+
- `docs: update installation instructions`
101+
- `perf: improve GFF parsing performance by 30%`
102+
- `refactor: simplify table provider interface`
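
The release workflow copies commit subjects into the changelog verbatim, so the type prefix is what readers of the release notes see. The sketch below extracts that prefix. `commit_type` is a hypothetical helper, the list of accepted types is an assumption based on the examples above, and the breaking-change marker `!` is not handled.

```shell
# Sketch only: commit_type is a hypothetical helper.
# Prints the type prefix of a commit subject, or "other" when none is recognised.
commit_type() {
  local prefix="${1%%:*}"     # text before the first colon
  prefix="${prefix%%\(*}"     # drop an optional "(scope)"
  case "$prefix" in
    feat|fix|docs|perf|refactor|chore) printf '%s\n' "$prefix" ;;
    *) printf 'other\n' ;;
  esac
}
```

To see how well recent history follows the convention, the helper can be applied to each subject line: `git log --format='%s' --no-merges | while IFS= read -r s; do commit_type "$s"; done | sort | uniq -c`.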
103+
104+
## Questions?
105+
106+
For questions or issues with the release process, please:
107+
- Open an issue
108+
- Contact the maintainers

.github/dependabot.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
version: 2
2+
updates:
3+
# Cargo dependencies
4+
- package-ecosystem: "cargo"
5+
directory: "/"
6+
schedule:
7+
interval: "monthly"
8+
groups:
9+
datafusion:
10+
patterns:
11+
- "datafusion*"
12+
noodles:
13+
patterns:
14+
- "noodles*"
15+
tokio:
16+
patterns:
17+
- "tokio*"
18+
open-pull-requests-limit: 10
19+
20+
# GitHub Actions
21+
- package-ecosystem: "github-actions"
22+
directory: "/"
23+
schedule:
24+
interval: "monthly"
25+
open-pull-requests-limit: 5
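
The `groups` section batches related Cargo updates into one pull request per group. As a rough illustration of how the three patterns partition crate names, the sketch below uses shell globs, assuming the trailing `*` in a pattern acts as a wildcard in the same way. `dependency_group` is a hypothetical helper and the crate names in the usage note are only examples.

```shell
# Sketch only: dependency_group is a hypothetical helper that approximates
# the `patterns` entries of the cargo ecosystem with shell globs.
dependency_group() {
  case "$1" in
    datafusion*) echo datafusion ;;
    noodles*)    echo noodles ;;
    tokio*)      echo tokio ;;
    *)           echo ungrouped ;;
  esac
}
```

Under that assumption, `dependency_group noodles-vcf` prints `noodles`, while a crate such as `serde` matches no pattern and would get its own pull request.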

.github/workflows/ci.yml

Lines changed: 11 additions & 3 deletions
@@ -3,7 +3,7 @@ name: CI
33
on:
44
push:
55
branches:
6-
- main
6+
- master
77
pull_request:
88

99
jobs:
@@ -13,7 +13,7 @@ jobs:
1313
group: ${{ github.workflow }}-${{ github.ref }}
1414
steps:
1515
- name: Checkout code
16-
uses: actions/checkout@v2
16+
uses: actions/checkout@v4
1717
with:
1818
submodules: "recursive"
1919
fetch-depth: 1
@@ -25,7 +25,7 @@ jobs:
2525
components: 'clippy, rustfmt'
2626

2727
- name: Cache Cargo registry and build
28-
uses: actions/cache@v3
28+
uses: actions/cache@v4
2929
with:
3030
path: |
3131
~/.cargo/registry
@@ -38,5 +38,13 @@ jobs:
3838
- name: Check formatting
3939
run: cargo fmt --all -- --check
4040

41+
- name: Run clippy
42+
run: cargo clippy --all-targets --all-features -- -D warnings
43+
44+
- name: Build documentation
45+
run: cargo doc --no-deps --all-features
46+
env:
47+
RUSTDOCFLAGS: "-D warnings"
48+
4149
- name: Run tests
4250
run: cargo test --all

.github/workflows/release.yml

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
1+
name: Release
2+
3+
on:
4+
workflow_dispatch:
5+
inputs:
6+
version_type:
7+
description: 'Version bump type'
8+
required: true
9+
type: choice
10+
options:
11+
- patch
12+
- minor
13+
- major
14+
default: 'patch'
15+
pre_release:
16+
description: 'Mark as pre-release'
17+
required: false
18+
type: boolean
19+
default: false
20+
21+
jobs:
22+
release:
23+
runs-on: ubuntu-22.04
24+
# Only run on master branch
25+
if: github.ref == 'refs/heads/master'
26+
permissions:
27+
contents: write
28+
29+
steps:
30+
- name: Checkout code
31+
uses: actions/checkout@v4
32+
with:
33+
fetch-depth: 0
34+
token: ${{ secrets.GITHUB_TOKEN }}
35+
36+
- name: Setup Rust
37+
uses: actions-rust-lang/setup-rust-toolchain@v1
38+
with:
39+
toolchain: '1.86.0'
40+
components: 'clippy, rustfmt'
41+
42+
- name: Install cargo-edit for version bumping
43+
run: cargo install cargo-edit
44+
45+
- name: Configure git
46+
run: |
47+
git config user.name "github-actions[bot]"
48+
git config user.email "github-actions[bot]@users.noreply.github.com"
49+
50+
- name: Get current version
51+
id: current_version
52+
run: |
53+
CURRENT_VERSION=$(grep -m 1 '^version = ' datafusion/bio-format-core/Cargo.toml | sed 's/version = "\(.*\)"/\1/')
54+
echo "version=$CURRENT_VERSION" >> $GITHUB_OUTPUT
55+
echo "Current version: $CURRENT_VERSION"
56+
57+
- name: Bump version in all crates
58+
id: bump_version
59+
run: |
60+
# Bump version in all workspace members
61+
for crate in datafusion/bio-format-*/Cargo.toml; do
62+
echo "Bumping version in $crate"
63+
cargo set-version --manifest-path "$crate" --bump ${{ inputs.version_type }}
64+
done
65+
66+
# Get the new version from the first crate
67+
NEW_VERSION=$(grep -m 1 '^version = ' datafusion/bio-format-core/Cargo.toml | sed 's/version = "\(.*\)"/\1/')
68+
echo "version=$NEW_VERSION" >> $GITHUB_OUTPUT
69+
echo "New version: $NEW_VERSION"
70+
71+
- name: Update Cargo.lock
72+
run: cargo check --all
73+
74+
- name: Run tests
75+
run: cargo test --all
76+
77+
- name: Run clippy
78+
run: cargo clippy --all-targets --all-features -- -D warnings
79+
80+
- name: Build documentation
81+
run: cargo doc --no-deps --all-features
82+
env:
83+
RUSTDOCFLAGS: "-D warnings"
84+
85+
- name: Commit version bump
86+
run: |
87+
git add -A
88+
git commit -m "chore: release v${{ steps.bump_version.outputs.version }}"
89+
90+
- name: Create and push tag
91+
run: |
92+
git tag -a "v${{ steps.bump_version.outputs.version }}" -m "Release v${{ steps.bump_version.outputs.version }}"
93+
git push origin master
94+
git push origin "v${{ steps.bump_version.outputs.version }}"
95+
96+
- name: Generate changelog
97+
id: changelog
98+
run: |
99+
# Get the previous tag
100+
PREV_TAG=$(git describe --tags --abbrev=0 HEAD^ 2>/dev/null || echo "")
101+
102+
if [ -z "$PREV_TAG" ]; then
103+
echo "First release - no previous tag found"
104+
CHANGELOG="First release of datafusion-bio-formats"
105+
else
106+
echo "Generating changelog from $PREV_TAG to v${{ steps.bump_version.outputs.version }}"
107+
CHANGELOG=$(git log ${PREV_TAG}..HEAD --pretty=format:"- %s (%h)" --no-merges)
108+
fi
109+
110+
# Save changelog to file
111+
echo "$CHANGELOG" > /tmp/changelog.md
112+
echo "## What's Changed" >> /tmp/release_notes.md
113+
echo "" >> /tmp/release_notes.md
114+
echo "$CHANGELOG" >> /tmp/release_notes.md
115+
echo "" >> /tmp/release_notes.md
116+
echo "**Full Changelog**: https://github.com/${{ github.repository }}/compare/${PREV_TAG}...v${{ steps.bump_version.outputs.version }}" >> /tmp/release_notes.md
117+
118+
- name: Create GitHub Release
119+
uses: softprops/action-gh-release@v1
120+
with:
121+
tag_name: v${{ steps.bump_version.outputs.version }}
122+
name: Release v${{ steps.bump_version.outputs.version }}
123+
body_path: /tmp/release_notes.md
124+
draft: false
125+
prerelease: ${{ inputs.pre_release }}
126+
generate_release_notes: false
127+
env:
128+
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
129+
130+
# Optional: Publish to crates.io (uncomment when ready)
131+
# - name: Publish to crates.io
132+
# run: |
133+
# # Publish crates in dependency order
134+
# cargo publish --manifest-path datafusion/bio-format-core/Cargo.toml --token ${{ secrets.CARGO_TOKEN }}
135+
# sleep 10 # Wait for crates.io to update
136+
#
137+
# # Publish dependent crates
138+
# for crate in datafusion/bio-format-{fastq,vcf,bam,bed,gff,fasta,cram}/Cargo.toml; do
139+
# cargo publish --manifest-path "$crate" --token ${{ secrets.CARGO_TOKEN }}
140+
# sleep 10
141+
# done
142+
# env:
143+
# CARGO_TOKEN: ${{ secrets.CARGO_TOKEN }}
144+
145+
- name: Summary
146+
run: |
147+
echo "## Release Summary" >> $GITHUB_STEP_SUMMARY
148+
echo "" >> $GITHUB_STEP_SUMMARY
149+
echo "- **Previous Version**: ${{ steps.current_version.outputs.version }}" >> $GITHUB_STEP_SUMMARY
150+
echo "- **New Version**: ${{ steps.bump_version.outputs.version }}" >> $GITHUB_STEP_SUMMARY
151+
echo "- **Version Bump**: ${{ inputs.version_type }}" >> $GITHUB_STEP_SUMMARY
152+
echo "- **Pre-release**: ${{ inputs.pre_release }}" >> $GITHUB_STEP_SUMMARY
153+
echo "- **Tag**: v${{ steps.bump_version.outputs.version }}" >> $GITHUB_STEP_SUMMARY
154+
echo "" >> $GITHUB_STEP_SUMMARY
155+
echo "✅ Release created successfully!" >> $GITHUB_STEP_SUMMARY
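
One detail of the "Generate changelog" step is worth noting: on a first release `PREV_TAG` is empty, so the "Full Changelog" line would contain a compare URL with nothing before the `...`. A possible guard is sketched below; `changelog_link` is a hypothetical helper, and falling back to the tag page is an assumption about what the link should point to in that case.

```shell
# Sketch only: changelog_link is a hypothetical helper, not part of the workflow.
# Usage: changelog_link <owner/repo> <previous tag or ""> <new tag>
changelog_link() {
  local repo="$1" prev="$2" new="$3"
  if [ -z "$prev" ]; then
    # No earlier tag to compare against: link to the release tag itself.
    printf 'https://github.com/%s/releases/tag/%s\n' "$repo" "$new"
  else
    printf 'https://github.com/%s/compare/%s...%s\n' "$repo" "$prev" "$new"
  fi
}
```

Inside the step this could replace the hard-coded compare URL, for example `changelog_link "$GITHUB_REPOSITORY" "$PREV_TAG" "v$NEW_VERSION"`, where `NEW_VERSION` stands for the bumped version.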

CHANGELOG.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7+
8+
## [Unreleased]
9+
10+
### Added
11+
12+
- CRAM file format support with reference-based compression
13+
- FASTA file format support for biological sequences
14+
- GFF file format support for genome annotations
15+
- BED file format support for genomic intervals
16+
- BAM file format support for sequence alignments
17+
- VCF file format support for genetic variants with case-sensitive INFO/FORMAT fields
18+
- FASTQ file format support with parallel BGZF reading
19+
- Core utilities crate with object storage support (GCS, S3, Azure)
20+
- Comprehensive documentation for all crates with usage examples
21+
- CI workflow with formatting, linting, documentation, and testing checks
22+
- Dependabot configuration for automated dependency updates
23+
- Apache-2.0 licensing
24+
- Workspace metadata inheritance across all crates
25+
26+
### Changed
27+
28+
- Upgraded DataFusion to version 50.3.0
29+
- Enhanced README with badges, quick start examples, and development instructions
30+
- Improved crate-level documentation with `#![warn(missing_docs)]` lint
31+
32+
### Fixed
33+
34+
- Preserved case sensitivity for VCF INFO and FORMAT fields
35+
36+
## [0.1.0] - 2025-01-XX
37+
38+
### Added
39+
40+
- Initial release of datafusion-bio-formats workspace
41+
- Support for 7 bioinformatics file formats (FASTQ, VCF, BAM, BED, GFF, FASTA, CRAM)
42+
- Integration with Apache DataFusion query engine
43+
- Cloud storage support via OpenDAL
44+
- BGZF parallel reading for compressed genomic files
45+
46+
[Unreleased]: https://github.com/biodatageeks/datafusion-bio-formats/compare/v0.1.0...HEAD
47+
[0.1.0]: https://github.com/biodatageeks/datafusion-bio-formats/releases/tag/v0.1.0

Cargo.toml

Lines changed: 7 additions & 0 deletions
@@ -7,6 +7,13 @@ members = [ "datafusion/bio-format-bam", "datafusion/bio-format-bed",
77
"datafusion/bio-format-cram",
88
]
99

10+
[workspace.package]
11+
license = "Apache-2.0"
12+
authors = ["BiodataGeeks Team"]
13+
repository = "https://github.com/biodatageeks/datafusion-bio-formats"
14+
homepage = "https://github.com/biodatageeks/datafusion-bio-formats"
15+
edition = "2024"
16+
1017

1118
[workspace.dependencies]
1219
datafusion = {version = "50.3.0"}

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
Apache License
1+
yes Apache License
22
Version 2.0, January 2004
33
http://www.apache.org/licenses/
44
