Add duckdb recipe for emscripten #4983

Open
wolfv wants to merge 10 commits into emscripten-forge:main from wolfv:duckdb-clean

wolfv commented Mar 1, 2026

Template A: Checklist for adding a package

Pre-submission Checks

  • Package requires building for emscripten-wasm32 platform (not a noarch package), in other words, the package requires compilation.

Recipe Structure

Added recipes/recipes_emscripten/[package-name]/recipe.yaml with proper structure:

  • context section with version (and optionally name)
  • package section with name and version using Jinja2 templates
  • source section with:
    • Source URL is valid and points to archive file (.tar.gz, .tar.bz2, .tar.xz, .tgz, or .zip)
    • Source URL contains ${{ version }} template for version updates
    • SHA256 hash is correct (verified with curl -sL <url> | sha256sum)
    • Patches (if any) are included in [package-name]/patches/ directory
  • build section with appropriate script/method
    • Python packages: ${PYTHON} -m pip install . ${PIP_ARGS}
    • R packages: $R CMD INSTALL $R_ARGS .
    • C++ packages: Uses emcmake/emmake or emconfigure/emmake
    • Rust packages: Uses rust-nightly and maturin or appropriate Rust build tool
    • Build number is 0
    • If the script is longer than 3 lines, a build.sh is included
  • requirements section (build, host, run as needed)
  • tests section
    • Python packages: test_import_[package].py file created and referenced
    • C++ packages: Test executable or package_contents test
    • R packages: Package contents test
  • about section with license, homepage, summary

Template B: Checklist for updating a package

  • ⚠️ Bump build number if the version remains unchanged
  • Or reset build number to 0 if updating the package to a newer version

PR Formatting

  • PR title follows format: Add [package-name] or Update [package-name] to [version]
  • PR description includes:
    • Version being added/updated
    • Any special build considerations or patches applied

Package Details

  • Package Name:
  • Version:

Build Notes

wolfv and others added 10 commits March 1, 2026 10:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The applied patch makes the source tree dirty, causing duckdb's custom
setuptools_scm version scheme to fail with "Dev distance is 0, cannot
bump version." Setting SETUPTOOLS_SCM_PRETEND_VERSION bypasses the scm
detection entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB's setuptools_scm version scheme errors when distance=0 and
dirty=True (which happens because we apply patches). The version
scheme tries to bump a dev version but fails since distance is 0.
Fix by treating distance=0 as a tag release regardless of dirty state.

Also removes the SETUPTOOLS_SCM_PRETEND_VERSION workaround since
duckdb explicitly strips that env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
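
The idea in the commit above can be sketched roughly as follows. This is not the actual patch: `FakeScmVersion` and `version_scheme` are hypothetical names, the tag `v1.4.0` is only an example value, and the dev-version policy for non-zero distances is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeScmVersion:
    """Hypothetical stand-in for the version object a scheme receives."""
    tag: str        # e.g. "v1.4.0" (example value, not taken from the PR)
    distance: int   # number of commits since the tag
    dirty: bool     # True when the working tree has uncommitted changes


def version_scheme(version: FakeScmVersion) -> str:
    base = version.tag.removeprefix("v")
    # distance == 0 means HEAD sits exactly on the tag. Treat it as a tag
    # release even when the tree is dirty, since applied patches always
    # dirty the tree in this build.
    if version.distance == 0:
        return base
    # Assumed policy for illustration: bump the patch component and mark
    # the result as a dev release.
    major, minor, patch = (int(part) for part in base.split("."))
    return f"{major}.{minor}.{patch + 1}.dev{version.distance}"
```

With this shape, a patched checkout of a tagged commit resolves to the tag's version instead of raising an error.
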
duckdb's CMake falls back to CMAKE_INSTALL_LIBDIR when SKBUILD_PLATLIB_DIR
is not detected, placing _duckdb.so in a lib/ subdirectory under
site-packages. Move it to the correct location after install so that
`import duckdb` can find the native module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
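
A post-install relocation like the one described above might look like the sketch below. The helper name `relocate_native_module` is hypothetical, and placing the file directly under site-packages is an assumption; the recipe's real build script is not shown in this conversation, and a later commit removes this workaround.

```python
import shutil
from pathlib import Path


def relocate_native_module(site_packages: Path, filename: str = "_duckdb.so") -> Path:
    """Move a native module out of site-packages/lib/ into site-packages/."""
    misplaced = site_packages / "lib" / filename
    target = site_packages / filename  # assumed "correct" location
    # Only move when the file actually landed in lib/, so the step is a
    # no-op if the install already put it in the right place.
    if misplaced.exists():
        shutil.move(str(misplaced), str(target))
    return target
```
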
The .so was being built as a regular wasm module instead of a side
module, so it couldn't be dynamically loaded. Add -sSIDE_MODULE=1
and -sWASM_BIGINT flags to ensure the native extension is built as
a relocatable side module that can be loaded at runtime.

Also removes the lib/ path workaround and CMAKE_INSTALL_LIBDIR
override since SIDE_MODULE linking may change the install behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scikit-build-core drives CMake, so env vars alone aren't enough.
Use the established emscripten-forge pattern: set up the Emscripten
toolchain file and a CMAKE_PROJECT_INCLUDE that tells CMake shared
libs are supported and should be built as SIDE_MODULE with WASM_BIGINT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
wasm-ld doesn't support --export-dynamic-symbol. DuckDB's CMake hits
the "UNIX AND NOT APPLE" branch for Emscripten, which passes these
unsupported flags. Add an EMSCRIPTEN guard to skip them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB's CMake links libduckdb_static.a twice to resolve circular
dependencies between the core library and extensions. Native linkers
handle this fine, but with LTO enabled wasm-ld merges all objects
and reports duplicate symbol errors. Disable interprocedural
optimization to work around this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DuckDB links libduckdb_static.a twice in the link command to resolve
circular dependencies between core and extensions. Native linkers
handle repeated archives by processing them left-to-right, but
wasm-ld treats duplicate definitions as errors regardless of LTO
settings. Pass --allow-multiple-definition to wasm-ld.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
duckdb._version uses importlib.metadata.version("duckdb") to get the
package version at import time. Excluding .dist-info removes the
metadata that makes this work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
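
To illustrate why the `.dist-info` directory matters here: `importlib.metadata.version` raises `PackageNotFoundError` when a distribution's metadata is absent. The wrapper below is a hypothetical helper for demonstration, not code from duckdb.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version, or None if the metadata is missing."""
    try:
        # Reads the version from the distribution's .dist-info metadata.
        return version(dist_name)
    except PackageNotFoundError:
        # This is what happens when .dist-info is excluded from the package.
        return None
```

A module that calls `version("duckdb")` at import time without such a guard would fail to import if the metadata were stripped.
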
wolfv commented Mar 2, 2026

@IsabelParedes it works! The lint problem is due to using git sources. The reason for that is that the split-out duckdb Python package pulls in duckdb as a git submodule. We could possibly work around that, but using the git source is a bit more convenient.

source:
  git: https://github.com/duckdb/duckdb-python.git
  tag: v${{ version }}
  # expected_commit: a12f36ca411007f5eb48919448f61c7498112553

Contributor Author

If you allow --experimental, this would work and "pin" the tag to a commit.

Member

We do allow --experimental for a couple of recipes; you could add this package too.

if (recipe == "arrow") or (recipe == "thrift"):
    cmd.extend(["--experimental"])
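
One way the quoted check could be extended to cover this package is sketched below. `EXPERIMENTAL_RECIPES`, `build_command`, and the `"build-tool"` base command are hypothetical names; the surrounding build script is not shown in this thread.

```python
# Recipes allowed to use experimental features; "duckdb" is the addition.
EXPERIMENTAL_RECIPES = {"arrow", "thrift", "duckdb"}


def build_command(recipe: str, base_cmd: list[str]) -> list[str]:
    """Return the build command, adding --experimental where allowed."""
    cmd = list(base_cmd)  # copy so the caller's list is not mutated
    if recipe in EXPERIMENTAL_RECIPES:
        cmd.extend(["--experimental"])
    return cmd
```

A set membership test keeps the condition short as more recipes are added, for example `build_command("duckdb", ["build-tool"])`.
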

@IsabelParedes
Member

> @IsabelParedes it works! The lint problem is due to using git sources. The reason for that is that the split-out duckdb Python package pulls in duckdb as a git submodule. We could possibly work around that, but using the git source is a bit more convenient.

We try to avoid git sources because the Version Bot currently cannot update packages that do not have ${{ version }} in the source URL. So this package would have to be updated manually. But that's fine for now :)
