Centralize SIMD with Highway and vendor all dependencies as submodules#39
Centralize SIMD with Highway and vendor all dependencies as submodules#39KimBioInfoStudio wants to merge 14 commits intoOpenGene:mainfrom
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…module Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Refactor filter.cpp: use countQualityMetrics/countAdjacentDiffs - Refactor sequence.cpp: use centralized reverseComplement - Refactor adaptertrimmer.cpp: use countMismatches - Remove simdutil.h (merged into simd.cpp) - Add simd_test.cpp - Upgrade C++ standard to C++17 for gtest compatibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add isa-l (v2.31.0) and libdeflate (v1.25) as git submodules matching fastp's vendoring pattern. Default `make` links system libs; `make static` builds from submodules. Remove stale src/libdeflate.h (v1.6) in favor of third_party/libdeflate/libdeflate.h (v1.25). Fix igzip include to match fastp's flat include style. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace outdated conda/system library instructions with dual-mode build documentation matching fastp: Option A (static build from submodules, recommended) and Option B (link against system libraries). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove dual-mode build (system libs vs submodules) to eliminate potential header/library version mismatch. Default `make` now builds isa-l and libdeflate from submodules and links statically. Simplify README build instructions accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vendor googletest v1.17.0 as a git submodule under third_party/. Build it from source via cmake during `make test`, eliminating the need to install googletest on the system. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
macOS reports arm64 via uname -m but isa-l's Makefile.unx expects aarch64 for the NEON/SVE assembly paths. Without this fix, only the C baseline code is compiled, resulting in ~60% slower gzip decompression. Pass host_cpu=aarch64 arch=aarch64 on ARM macOS to build the full optimized library (85 objects vs 23 C-only). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use actions/checkout@v4 with submodules: recursive - Remove system library installs (isa-l, libdeflate, highway, googletest) - Only need build-essential, cmake, nasm on Ubuntu - macOS runners have cmake/Xcode toolchain preinstalled Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
isa-l dev branch uses __arm_streaming (SVE) and ARM assembly syntax that requires Xcode 17+. macos-14 (Xcode 15.4) and macos-13 are too old. macos-15 provides Xcode 16.2+ which supports these features. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No performance improvement? |
Seems in v0.4.1 we already use highway for SIMD, so we do not have perf improve here |
|
please address the compilation errors:: /Applications/Xcode.app/Contents/Developer/usr/bin/make -C ./third_party/isa-l -f Makefile.unx lib host_cpu=aarch64 arch=aarch64 |
Seems u clone without recursive to fetch submodule, pls clone it with BTW, better to have a clear tmpdir for this branch, if u use upstream repo maybe some conflicts |
oh yes, I almost forgot about that. |
Replace 8x Eq + 8x IfThenElse comparison chain with a single And + TableLookupBytes using the low nibble of DNA base ASCII codes. DNA bases A/a(1), C/c(3), T/t(4), G/g(7) have unique low nibbles, enabling a 16-byte lookup table for complement mapping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
simd.h/simd.cppmodule usingforeach_target.h+HWY_DYNAMIC_DISPATCHfor runtime CPU target selection, ported from fastp's proven patternmake -jbuilds everything from submodules;git clone --recursiveis all that's neededarm64but isa-l expectsaarch64; without the fix only C fallback code is compiled (~60% slower gzip decompression)Details
SIMD Refactoring
src/simd.hwith 4 public functions:countQualityMetrics,reverseComplement,countAdjacentDiffs,countMismatches, plustestSimdfor validationsrc/simd.cppwith Highway multi-target dispatch (compiles for multiple SIMD targets, selects best at runtime)filter.cpp,sequence.cpp,adaptertrimmer.cppto use centralized SIMD functionssimdutil.htest/simd_test.cppgtest wrapperBuild System
third_party/-lisal -ldeflate -lhwyneeded)src/libdeflate.h(v1.6) in favor of submodule's v1.25Other
testdata/gen_ont_reads.py) for benchmarkingE2E Benchmark
Dataset: 10,000 ONT reads (64.5M bases, median 5kb, N50 8.3kb)
Platform: Apple M4 Pro (14 cores), 48 GB RAM, macOS
FASTQ (uncompressed, 124.5 MB)
FASTQ.GZ (compressed, 60.8 MB)
Result: No performance regression. The SIMD refactoring is architectural (centralization, portability, maintainability) — the hot path (adapter search) was already SIMD-accelerated, so raw throughput is equivalent.
Test plan
make -jbuilds successfully (static linking from submodules)make test— all 13 tests pass./fastplong -i testdata/ont_10k.fastq.gz -o /dev/nullworks correctly🤖 Generated with Claude Code