Add Execution Provider conformance test suite #28968

Open
GopalakrishnanN wants to merge 3 commits into main from GopalakrishnanN/ep-conformance-tests

@GopalakrishnanN GopalakrishnanN commented Jun 9, 2026

Contributor

Description

Adds a parameterized GoogleTest suite (EpContract/EpConformanceTest) that encodes invariants every IExecutionProvider implementation must satisfy, independent of the hardware backend.

New file: onnxruntime/test/framework/execution_provider_conformance_test.cc (auto-discovered by the existing test/framework/*.cc CMake glob — no build-file changes needed).

The suite currently runs against the CPU EP in both arena and non-arena allocator configurations (14 instances = 7 invariants × 2 configs, all passing). CUDA/DML/WebGPU/XNNPACK are registered behind USE_* guards; a factory that returns nullptr (EP compiled but unavailable at runtime) skips rather than fails.
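As a rough illustration of the factory-based registration and the skip-on-nullptr behavior described above, here is a minimal, self-contained sketch. `FakeEp`, `EpParam`, `Outcome`, `GetExampleParams`, and the `EXAMPLE_USE_GPU` guard are hypothetical stand-ins invented for this example; they are not the types or macros used in the PR, and the real suite uses GoogleTest parameterization rather than a hand-rolled outcome enum.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the small slice of IExecutionProvider used here.
struct FakeEp {
  std::string type;
  const std::string& Type() const { return type; }
};

// One suite parameter: a display name plus a factory. The factory may return
// nullptr when the EP is compiled in but unavailable at runtime.
struct EpParam {
  std::string name;
  std::function<std::unique_ptr<FakeEp>()> factory;
};

enum class Outcome { kPassed, kSkipped, kFailed };

// Type() must be non-empty, stable across repeated calls, and identical
// across independently constructed instances.
inline Outcome CheckTypeIsStable(const EpParam& param) {
  std::unique_ptr<FakeEp> first = param.factory();
  std::unique_ptr<FakeEp> second = param.factory();
  if (first == nullptr || second == nullptr) {
    return Outcome::kSkipped;  // unavailable at runtime: skip, do not fail
  }
  const std::string call_one = first->Type();
  const std::string call_two = first->Type();
  if (call_one.empty()) return Outcome::kFailed;
  if (call_one != call_two) return Outcome::kFailed;
  if (call_one != second->Type()) return Outcome::kFailed;
  return Outcome::kPassed;
}

// Registration list: each EP is one entry, optionally behind a build guard.
inline std::vector<EpParam> GetExampleParams() {
  std::vector<EpParam> params;
  params.push_back({"Cpu", [] {
                      return std::make_unique<FakeEp>(FakeEp{"ExampleCpuEp"});
                    }});
#ifdef EXAMPLE_USE_GPU  // hypothetical guard, analogous in spirit to USE_*
  params.push_back({"Gpu", [] {
                      return std::make_unique<FakeEp>(FakeEp{"ExampleGpuEp"});
                    }});
#endif
  // Simulates an EP that is compiled in but cannot be created on this machine.
  params.push_back({"Unavailable", [] { return std::unique_ptr<FakeEp>(); }});
  return params;
}
```

Because nothing is constructed until a factory is invoked inside a check, an unavailable EP never affects static initialization; it simply yields a skipped outcome for each check.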

Invariants enforced

  1. Type() is non-empty and stable across repeated calls and across independent instances.
  2. GetPreferredLayout() returns a defined DataLayout (NCHW/NHWC).
  3. The CPU mem types (OrtMemTypeCPUInput/OrtMemTypeCPUOutput) map to CPU-accessible devices.
  4. CreatePreferredAllocators() returns no null entries and is repeatable (documented as stateless).
  5. Preferred allocators return usable memory (host-readable/writable when the allocator is CPU-accessible).
  6. IDataTransfer CPU↔CPU round-trip preserves data exactly (when the EP provides a data transfer that advertises the copy).
  7. Read-only metadata queries are callable and self-consistent on a bare instance (GetDeviceId() == GetDevice().Id()).
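To make invariant 6 concrete, the following sketch shows one way a CPU-to-CPU round-trip check could be structured: skip when no data transfer is provided or when it does not advertise the copy, otherwise copy a known pattern out and back and compare byte for byte. `DeviceKind`, `FakeDataTransfer`, and `CheckCpuRoundTrip` are hypothetical stand-ins written for this example and only loosely mirror the shape of a "can copy / copy" interface; they are not the real IDataTransfer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical device tag; not an ONNX Runtime type.
enum class DeviceKind { kCpu, kGpu };

// Hypothetical data-transfer stand-in with an "advertise, then copy" shape.
struct FakeDataTransfer {
  bool CanCopy(DeviceKind src, DeviceKind dst) const {
    return src == DeviceKind::kCpu && dst == DeviceKind::kCpu;
  }
  // Returns true on success.
  bool Copy(const void* src, void* dst, std::size_t bytes) const {
    std::memcpy(dst, src, bytes);
    return true;
  }
};

enum class Outcome { kPassed, kSkipped, kFailed };

inline Outcome CheckCpuRoundTrip(const FakeDataTransfer* transfer) {
  // An EP without a data transfer, or one that does not advertise CPU-to-CPU
  // copies, is out of scope for this invariant.
  if (transfer == nullptr) return Outcome::kSkipped;
  if (!transfer->CanCopy(DeviceKind::kCpu, DeviceKind::kCpu)) {
    return Outcome::kSkipped;
  }

  // Fill the source with a non-trivial byte pattern so that a partial or
  // zero-filled copy is detectable.
  std::vector<std::uint8_t> source(256);
  for (std::size_t i = 0; i < source.size(); ++i) {
    source[i] = static_cast<std::uint8_t>(i * 31 + 7);
  }
  std::vector<std::uint8_t> staged(source.size(), 0);
  std::vector<std::uint8_t> result(source.size(), 0);

  // Copy out, then copy back into a fresh buffer.
  if (!transfer->Copy(source.data(), staged.data(), source.size())) {
    return Outcome::kFailed;
  }
  if (!transfer->Copy(staged.data(), result.data(), staged.size())) {
    return Outcome::kFailed;
  }
  return result == source ? Outcome::kPassed : Outcome::kFailed;
}
```

Both buffers live in host memory, so the comparison never touches device memory from the test thread.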

Why this change matters

ONNX Runtime depends on 20+ execution providers that are all expected to be substitutable behind the IExecutionProvider interface — a Liskov-substitution contract. Today those expectations are implicit: scattered across header prose, framework code that silently assumes them, and tribal knowledge. There is no single place that states "every EP must do X," and nothing mechanically prevents a new or modified EP from violating them.

That gap has real cost:

  • Silent contract drift. An EP can return an out-of-range layout, a null allocator, or a non-CPU device for OrtMemTypeCPUInput, and the breakage only surfaces deep inside the framework (input/output staging copies, layout transforms, kernel assignment) as a confusing crash or wrong result — far from the root cause.
  • High onboarding cost for new EPs. Authors must reverse-engineer the unwritten rules. A shared suite turns them into an executable checklist.
  • Regression risk. Refactors to allocators, data transfer, or device handling can quietly violate an invariant with no targeted test to catch it.

This suite converts those implicit assumptions into enforced, executable contracts. It is deliberately conservative: it asserts only documented, backend-agnostic guarantees and never dereferences non-CPU memory from the test thread, so it is safe to run for EPs whose device memory cannot be inspected on the host. Adoption cost is intentionally minimal — adding an EP to the coverage is a single guarded line in GetEpConformanceParams().
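The "never dereferences non-CPU memory" rule can be sketched as a gate on CPU accessibility before any allocation is exercised. The example below is a simplified illustration under that assumption: `FakeAllocator`, `AllocCheckResult`, and `CheckCpuAccessibleAllocators` are hypothetical names, and `std::malloc` stands in for whatever backing store a real allocator uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical allocator stand-in. `cpu_accessible` plays the role of a
// "this memory can be read and written from the host" query.
struct FakeAllocator {
  bool cpu_accessible;
  int live;  // number of outstanding allocations
  void* Alloc(std::size_t bytes) {
    ++live;
    return std::malloc(bytes);
  }
  void Free(void* p) {
    --live;
    std::free(p);
  }
};

struct AllocCheckResult {
  int exercised;  // allocators whose memory was allocated and touched
  int skipped;    // allocators left alone because they are not CPU-accessible
  bool ok;        // false if any exercised allocator misbehaved
};

inline AllocCheckResult CheckCpuAccessibleAllocators(
    std::vector<FakeAllocator>& allocators) {
  AllocCheckResult result{0, 0, true};
  constexpr std::size_t kBytes = 64;
  for (FakeAllocator& allocator : allocators) {
    if (!allocator.cpu_accessible) {
      ++result.skipped;  // device memory: do not allocate, touch, or free
      continue;
    }
    ++result.exercised;
    void* p = allocator.Alloc(kBytes);
    if (p == nullptr) {
      result.ok = false;
      continue;
    }
    // Write a pattern and read it back through the host pointer.
    std::memset(p, 0xAB, kBytes);
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < kBytes; ++i) {
      if (bytes[i] != 0xAB) result.ok = false;
    }
    allocator.Free(p);
  }
  return result;
}
```

A caller could treat `exercised == 0` as a skip rather than a pass, so that an EP exposing only device allocators is reported as not covered by this particular check.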

Motivation and Context

  • Establishes a formal, shared baseline contract for execution providers.
  • Catches EP contract violations at the unit-test level instead of as downstream inference failures.
  • Lowers the cost of correctly authoring and reviewing new EPs.

Testing

Built onnxruntime_test_all (Windows, RelWithDebInfo) and ran the suite:

[==========] Running 14 tests from 1 test suite.
...
[==========] 14 tests from 1 test suite ran.
[  PASSED  ] 14 tests.

Relationship to EP capability segregation (#29087)

This suite currently asserts only invariants that apply to every EP, because those are the methods every IExecutionProvider exposes. It cannot yet enforce invariants for optional capabilities — graph capture/replay, TunableOp tuning, subgraph compilation, and data-layout preference — because there is no way to tell which EPs actually support them. That gap shows up directly in MetadataQueriesAreCallable, where GetTuningContext() can only be exercised as a (void)ep->GetTuningContext(); "doesn't crash" smoke check rather than a real assertion.

The capability-segregation work in #29087 (which adds Get*Capability() query hooks returning a narrow capability mix-in, or nullptr when unsupported) unlocks capability-scoped conformance tests that run only on EPs that advertise a given capability:

if (auto* tuning = ep->GetTuningCapability()) {
  // invariants every tuning-capable EP must satisfy
}
if (auto* dl = ep->GetDataLayoutCapability()) {
  // e.g. an EP advertising a data-layout preference must report a valid DataLayout
}

Once #29087 lands, this suite can (a) upgrade the GetTuningContext() smoke check into a real, scoped assertion and (b) extend coverage to graph-capture / compile / data-layout invariants — running each check only where it applies instead of calling no-op defaults on every EP.
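A capability-scoped check of the kind described here might look like the sketch below. It assumes a query hook that returns a narrow mix-in pointer, or nullptr when the capability is unsupported, as the description of #29087 states; the concrete class shapes (`DataLayoutCapability`, `FakeEp`, `LayoutAwareEp`) and the two-value `DataLayout` enum are hypothetical stand-ins and may not match the eventual API.

```cpp
#include <cassert>

// Stand-in enum with the two layouts named in the description.
enum class DataLayout : int { NCHW = 0, NHWC = 1 };

// Hypothetical narrow capability mix-in.
struct DataLayoutCapability {
  virtual ~DataLayoutCapability() = default;
  virtual DataLayout GetPreferredLayout() const = 0;
};

// Hypothetical EP base: capabilities default to "not supported".
struct FakeEp {
  virtual ~FakeEp() = default;
  virtual const DataLayoutCapability* GetDataLayoutCapability() const {
    return nullptr;
  }
};

// An EP that opts in by implementing the mix-in and returning itself.
class LayoutAwareEp : public FakeEp, public DataLayoutCapability {
 public:
  explicit LayoutAwareEp(DataLayout layout) : layout_(layout) {}
  const DataLayoutCapability* GetDataLayoutCapability() const override {
    return this;
  }
  DataLayout GetPreferredLayout() const override { return layout_; }

 private:
  DataLayout layout_;
};

enum class Outcome { kPassed, kNotApplicable, kFailed };

// Runs only where the capability is advertised; other EPs are not applicable.
inline Outcome CheckDataLayoutCapability(const FakeEp& ep) {
  const DataLayoutCapability* dl = ep.GetDataLayoutCapability();
  if (dl == nullptr) return Outcome::kNotApplicable;
  const DataLayout layout = dl->GetPreferredLayout();
  const bool valid =
      layout == DataLayout::NCHW || layout == DataLayout::NHWC;
  return valid ? Outcome::kPassed : Outcome::kFailed;
}
```

The point of the design is that "not applicable" is a distinct outcome from "passed": an EP that does not advertise the capability is never asked to satisfy its invariants.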

The two PRs are complementary and independent: this suite enforces the IExecutionProvider contract, while #29087 makes the optional parts of that contract individually queryable. They touch different files (execution_provider_conformance_test.cc vs execution_provider_capabilities_test.cc), and #29087's execution_provider.h change is purely additive, so there is no ordering dependency or conflict between them.

Parameterized GoogleTest suite (EpContract/EpConformanceTest) encoding universal IExecutionProvider invariants every EP must satisfy: stable Type(), valid preferred layout, CPU mem types map to CPU-accessible devices, non-null/repeatable preferred allocators, usable allocations, CPU data-transfer round-trip integrity, and consistent metadata queries. EPs are registered via factories (no static-init construction); unavailable EPs skip rather than fail. Adding a new EP is a single guarded line in GetEpConformanceParams().

Copilot AI left a comment

Contributor

Pull request overview

Adds a parameterized GoogleTest suite (EpContract/EpConformanceTest) under onnxruntime/test/framework/ to codify and enforce baseline invariants expected of all IExecutionProvider implementations (e.g., stable Type(), valid preferred layout, CPU memtype mapping, allocator behavior, optional data transfer correctness, and basic metadata query consistency).

Changes:

  • Introduces a new parameterized conformance test fixture for IExecutionProvider contract invariants.
  • Registers CPU EP instances (arena + non-arena) and conditionally registers additional EPs behind USE_* guards.
  • Validates allocator usability and (when applicable) CPU↔CPU IDataTransfer::CopyTensor round-trip correctness.

Comment thread onnxruntime/test/framework/execution_provider_conformance_test.cc Outdated
Gopalakrishnan Nallasamy added 2 commits June 9, 2026 19:26
PreferredAllocatorsAllocateUsableMemory previously called Alloc()/Free() on every preferred allocator, only gating the host memory read/write behind UsesCpuMemory(). On WebGPU (arm64 Debug CI) the GpuBufferAllocator creates buffers mapped at creation; Free() routes through the buffer manager which throws EnforceBufferUnmapped ('Buffer is still mapped'), failing the test.

Restrict the standalone Alloc/Free exercise to CPU-accessible allocators, whose lifecycle is backend-agnostic, and GTEST_SKIP when an EP exposes no such allocator. Device allocators remain covered by PreferredAllocatorsAreNonNullAndRepeatable. Fixes the webgpu build-and-test (arm64, Debug) leg on PR #28968.

Address PR review: in ORT_USE_EP_API_ADAPTERS builds, DefaultWebGpuExecutionProvider() ORT_ENFORCEs (aborting the entire unit-test run) when the dynamic plugin EP is initialized to a different EP name, instead of cleanly returning nullptr. Mirror the guard used by base_tester.cc and default_providers.cc by listing the built-in WebGPU EP only under 'defined(USE_WEBGPU) && !defined(ORT_USE_EP_API_ADAPTERS)'.

@edgchen1

Contributor

It's a good idea to have more tests that EP authors can use for verification.

IExecutionProvider is an internal ORT interface. The preferred way to develop a new EP is as a plugin EP. Given that, I think we should treat the plugin EP API (e.g., OrtEp) as the contract and test against that. We should enable these tests for arbitrary plugin EPs too. onnxruntime_provider_test is meant to support plugin EPs and might be a useful reference (it was added in #25689).

@justinchuby

Contributor

I like the idea of defining invariants clearly

@GopalakrishnanN

Contributor Author

Thanks @edgchen1 — agreed, the plugin OrtEp ABI is the contract that matters most for external EP authors, and conformance there is more valuable than against the internal IExecutionProvider.

A few thoughts on getting there:

  • The invariants in this suite map fairly cleanly onto the OrtEp surface: GetPreferredLayout() → OrtEp::GetPreferredDataLayout (OrtEpDataLayout), CreatePreferredAllocators() → OrtEp::CreateAllocator / OrtEpFactory::CreateAllocator, and the data-transfer round-trip → OrtEpFactory::CreateDataTransfer + OrtDataTransferImpl::CanCopy/CopyTensors. So a plugin-level suite is quite feasible; it's mostly re-expressing the same checks against the ABI rather than inventing new invariants.
  • That said, it's a non-trivial chunk of work (driving an EP through OrtEpFactory, mapping OrtMemoryDevice/OrtMemoryInfo, etc.), and onnxruntime_provider_test (#25689, "Move provider tests to onnxruntime_provider_test and enable use of plugin EPs", with ORT_UNIT_TEST_ENABLE_DYNAMIC_PLUGIN_EP_USAGE) is the right home. I can parameterize over the example plugin EP plus dynamically-specified ones, the same way this suite parameterizes over the USE_* EPs.
  • The in-tree EPs are still IExecutionProvider implementations, so I'd like to keep these checks as an internal baseline that guards against contract drift when those EPs or the shared framework code (allocators, data transfer, layout, device handling) get refactored.

So my proposal: land this PR as the IExecutionProvider baseline, and do a focused follow-up that adds an OrtEp-level conformance suite in onnxruntime_provider_test reusing these invariant definitions. If you'd rather I hold this PR and pivot straight to the plugin-EP version, I'm happy to do that instead — just let me know which you prefer.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@edgchen1

Contributor

> So my proposal: land this PR as the IExecutionProvider baseline, and do a focused follow-up that adds an OrtEp-level conformance suite in onnxruntime_provider_test reusing these invariant definitions. If you'd rather I hold this PR and pivot straight to the plugin-EP version, I'm happy to do that instead, just let me know which you prefer.

I suppose it wouldn't hurt to test both the internal interface and the plugin EP interface, but I think the latter would be more useful for EP development going forward. Some of the ORT-owned EPs (like CUDA and WebGPU) have already been converted to plugin EPs.

4 participants