Add optional __declspec(dllimport) to amd_* libm functions, for faster speed by leekillough · Pull Request #37 · amd/aocl-libm-ose

leekillough · 2026-05-26T16:33:19Z

This PR adds an optional decoration that marks public AOCL-LibM entry points as imported from libalm.dll on Windows. Without this attribute, taking the address of an amd_* function (or storing it in a function pointer used in a hot loop) captures the address of the local import thunk, so every call pays an extra indirect jmp through the IAT. For sub-3 ns functions like amd_expf, that extra hop is ~15-30% of the per-call cost. Decorating the prototypes with __declspec(dllimport) tells MSVC / clang-cl to emit the optimized movq __imp_* form, which dereferences the IAT slot once and stores the actual function address, giving a single indirect call per invocation.
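For readers unfamiliar with the pattern, here is a minimal sketch of the kind of call site affected: a function pointer captured once and called in a tight loop. The names apply_unary_f32, unary_f32_fn and demo_double are hypothetical and exist only for illustration; in real code the pointer would be amd_expf or another amd_* entry point from amdlibm.h.

```c
#include <stddef.h>

/* Hypothetical pointer type for a unary fp32 math routine such as amd_expf. */
typedef float (*unary_f32_fn)(float);

/* Calls fn once per element. The address held in fn is whatever the compiler
 * captured at the call site: per the PR description, the import thunk when the
 * prototype is undecorated, or the real function address when it is declared
 * with __declspec(dllimport). */
void apply_unary_f32(unary_f32_fn fn, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = fn(in[i]);
}

/* Stand-in so the sketch builds without libalm; not part of AOCL-LibM. */
float demo_double(float x)
{
    return 2.0f * x;
}
```

With libalm linked, the call would look like apply_unary_f32(amd_expf, in, out, n); the loop body is the same either way, and only the address stored in fn differs.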

This is opt-in (the default is unchanged, with no decoration):

Define ALM_DLLIMPORT before #include <amdlibm.h> when linking dynamically against libalm.dll on Windows to enable the faster call sequence. Safe to leave undefined: behavior is identical to previous releases (no breaking change for callers that link against libalm-static.lib or that depend on the existing codegen).

On non-Windows platforms, the ALM_API decorator is always empty regardless of ALM_DLLIMPORT.
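The opt-in rules above could be expressed in a header roughly as follows. This is a sketch under assumptions, not the actual contents of amdlibm.h: the exact preprocessor conditions are guessed from the description, and ALM_STR, ALM_STR_INNER and alm_api_is_decorated are hypothetical helpers added only so the result can be inspected.

```c
#include <string.h>

/* Assumed shape of the opt-in logic: decorate only on Windows, and only when
 * the caller defined ALM_DLLIMPORT before including the header. */
#if defined(_WIN32) && defined(ALM_DLLIMPORT)
#  define ALM_API __declspec(dllimport)
#else
#  define ALM_API /* empty: non-Windows, static linking, or not opted in */
#endif

/* A public prototype decorated the way the description implies. */
ALM_API float amd_expf(float x);

/* Hypothetical helpers: stringify ALM_API to see what it expanded to. */
#define ALM_STR_INNER(x) #x
#define ALM_STR(x) ALM_STR_INNER(x)

/* Returns nonzero when ALM_API expanded to a non-empty decoration. */
int alm_api_is_decorated(void)
{
    return strlen(ALM_STR(ALM_API)) != 0;
}
```

A caller opting in would either place #define ALM_DLLIMPORT before #include <amdlibm.h> or pass -DALM_DLLIMPORT on the compiler command line; on non-Windows builds alm_api_is_decorated() returns 0 either way.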

Opt-in was chosen deliberately to keep the change backward-compatible: existing customers (including all libalm-static.lib users) get bit-identical codegen unless they explicitly add -DALM_DLLIMPORT. The exports themselves stay in scripts/libalm.def, so no __declspec(dllexport) is needed in the header.

Verification on libm_microbench (1-run sanity, build_dllimport/, millions of calls per second):

        func       UCRT     AOCL_WIN   AWD old    AWD new
        expf       476.6M   464.7M     293.3M     454.5M   <- recovered
        log2f      473.1M   466.4M     361.7M     464.7M   <- recovered
        expm1      291.3M   335.4M     255.2M     340.3M   <- recovered
        log2       319.4M   454.6M     400.2M     407.0M   <- partial
        log1p      235.1M   325.2M     287.5M     313.7M   <- partial
        hypot      285.0M   227.8M     190.1M     221.7M   <- partial
        remainder  170.1M   252.5M     209.3M     229.1M   <- partial
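To relate the throughput figures to per-call time, the following small sketch converts millions of calls per second into nanoseconds per call and computes the old-to-new ratio. The helper names ns_per_call and speedup are hypothetical. The results are average times per benchmark iteration, which presumably include the microbenchmark's own loop overhead, so they are not a pure function latency.

```c
/* Average nanoseconds per call, given throughput in millions of calls/s.
 * 1e9 ns/s divided by (M * 1e6 calls/s) simplifies to 1000 / M. */
double ns_per_call(double mcalls_per_sec)
{
    return 1000.0 / mcalls_per_sec;
}

/* Throughput ratio of the new build over the old one. */
double speedup(double old_mcalls_per_sec, double new_mcalls_per_sec)
{
    return new_mcalls_per_sec / old_mcalls_per_sec;
}
```

For the expf row, ns_per_call(293.3) is about 3.41 ns and ns_per_call(454.5) is about 2.20 ns, a throughput ratio of roughly 1.55.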

The fp32 B-mechanism cluster (sinf, cosf, log10f, cbrt, pow) was unchanged, as predicted - those gaps live in libalm.dll itself, not the dispatch path. lround did NOT recover (321M -> 325M); worth a closer look in a follow-up. The disassembly flip is confirmed at the .obj level:

leaq amd_expf (capture thunk) -> movq __imp_amd_expf (deref IAT slot for actual address).


leekillough wants to merge 1 commit into amd:dev from leekillough:fix_dllimport_slowdown
