Add optional __declspec(dllimport) to amd_* libm functions, for faster speed #37
Open
leekillough wants to merge 1 commit into
--------
Optional decoration that marks public AOCL-LibM entry points as
imported from libalm.dll on Windows. Without this attribute, taking
the address of an amd_* function (or storing it in a function
pointer in a hot loop) captures the local import-thunk address;
every call then pays an extra indirect jmp through the IAT. For
sub-3 ns functions like amd_expf that extra hop is ~15-30% of the
per-call cost. Decorating the prototypes with __declspec(dllimport)
tells MSVC / clang-cl to emit the optimized `movq __imp_*` form,
which dereferences the IAT slot once and stores the actual function
address - giving a single indirect call per invocation.
Opt-in (default is unchanged - no decoration):
Define ALM_DLLIMPORT before #include <amdlibm.h> when linking
dynamically against libalm.dll on Windows to enable the faster
call sequence. Safe to leave undefined: behavior is identical
to previous releases (no breaking change for callers that link
against libalm-static.lib or that depend on the existing
codegen).
On non-Windows platforms ALM_API is always empty regardless of
ALM_DLLIMPORT.
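The header change itself is not reproduced in this conversation, so the following is only a minimal sketch of how such an opt-in macro could be wired up, assuming `ALM_API` and `ALM_DLLIMPORT` behave as described above. The actual definition in `amdlibm.h` may differ. `demo_expf`, `ALM_STR` and `alm_api_text` are hypothetical names used for illustration.

```c
/* Sketch only: the real definition in amdlibm.h may differ. */
#if defined(_WIN32) && defined(ALM_DLLIMPORT)
#  define ALM_API __declspec(dllimport)  /* opt-in, Windows DLL builds only */
#else
#  define ALM_API                        /* default: no decoration */
#endif

/* Hypothetical prototype, decorated the way the amd_* prototypes would be. */
ALM_API float demo_expf(float x);

/* Illustration-only helpers that expose what ALM_API expands to. */
#define ALM_STR2(x) #x
#define ALM_STR(x)  ALM_STR2(x)

/* Returns "" when ALM_API is empty (the default on every platform). */
static const char *alm_api_text(void)
{
    return ALM_STR(ALM_API);
}
```

A caller that links dynamically against `libalm.dll` would opt in either with `-DALM_DLLIMPORT` on the compiler command line or with `#define ALM_DLLIMPORT` placed before `#include <amdlibm.h>`. Static-library users would leave it undefined.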
Optional decoration that marks public AOCL-LibM entry points as imported from `libalm.dll` on Windows. Without this attribute, taking the address of an `amd_*` function (or storing it in a function pointer in a hot loop) captures the local import-thunk address; every call then pays an extra indirect `jmp` through the IAT. For sub-3 ns functions like `amd_expf` that extra hop is ~15-30% of the per-call cost. Decorating the prototypes with `__declspec(dllimport)` tells `MSVC` / `clang-cl` to emit the optimized `movq __imp_*` form, which dereferences the IAT slot once and stores the actual function address - giving a single indirect call per invocation.

This is opt-in (default is unchanged - no decoration):
On non-Windows platforms, the `ALM_API` decorator is always empty regardless of `ALM_DLLIMPORT`.

Opt-in was chosen deliberately to keep the change backward-compatible: existing customers (including all `libalm-static.lib` users) get bit-identical codegen unless they explicitly add `-DALM_DLLIMPORT`. The exports themselves stay in `scripts/libalm.def`, so no `__declspec(dllexport)` is needed in the header.

Verification on libm_microbench (1-run sanity, `build_dllimport/`, millions of calls per second):
The fp32 B-mechanism cluster (`sinf`, `cosf`, `log10f`, `cbrt`, `pow`) was unchanged, as predicted - those gaps live in `libalm.dll` itself, not the dispatch path. `lround` did NOT recover (321M -> 325M); worth a closer look in a follow-up. The disassembly flip is confirmed at the `.obj` level: `leaq amd_expf` (capture thunk) -> `movq __imp_amd_expf` (deref IAT slot for actual address).
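The address-capture pattern that this flip affects can be sketched roughly as follows. This is an illustration, not code from the PR or from libm_microbench: `unary_f32_fn`, `map_f32` and `demo_square` are hypothetical names, and `demo_square` is only a stand-in so the sketch builds without libalm.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer type matching float-in, float-out entry points such as amd_expf. */
typedef float (*unary_f32_fn)(float);

/* Stand-in so the sketch builds without libalm; real code would pass amd_expf. */
static float demo_square(float x)
{
    return x * x;
}

/* Hot loop that calls through a captured function pointer.
 * Whatever address the caller stored in fn is what every iteration calls. */
static void map_f32(unary_f32_fn fn, const float *in, float *out, size_t n)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = fn(in[i]);   /* one indirect call per element */
}
```

With a real DLL build the call site would be `map_f32(amd_expf, in, out, n);`. If the prototype is undecorated, the pointer passed in is the import thunk's address (`leaq amd_expf`), so each element pays the thunk's extra `jmp`. If the prototype carries `__declspec(dllimport)`, the pointer is the function's real address loaded from `__imp_amd_expf`.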