
Conversation

@geetu040
Contributor

Towards #1575

This PR sets up the core folder and file structure along with base scaffolding for the API v1 → v2 migration.

It includes:

  • Skeleton for the HTTP client, backend, and API context
  • Abstract resource interfaces and versioned stubs (*V1, *V2)
  • Minimal wiring to allow future version switching and fallback support

No functional endpoints are migrated yet. This PR establishes a stable foundation for subsequent migration and refactor work.

@geetu040 mentioned this pull request Dec 30, 2025
@codecov-commenter

codecov-commenter commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 0% with 207 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.37%. Comparing base (645ef01) to head (5762185).

Files with missing lines            Patch %   Lines
openml/_api/http/client.py            0.00%   69 Missing ⚠️
openml/_api/resources/tasks.py        0.00%   47 Missing ⚠️
openml/_api/config.py                 0.00%   32 Missing ⚠️
openml/_api/runtime/core.py           0.00%   27 Missing ⚠️
openml/_api/resources/datasets.py     0.00%    9 Missing ⚠️
openml/_api/resources/base.py         0.00%    8 Missing ⚠️
openml/_api/runtime/fallback.py       0.00%    6 Missing ⚠️
openml/_api/__init__.py               0.00%    4 Missing ⚠️
openml/_api/resources/__init__.py     0.00%    3 Missing ⚠️
openml/_api/http/__init__.py          0.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1576      +/-   ##
==========================================
- Coverage   52.78%   50.37%   -2.41%     
==========================================
  Files          36       46      +10     
  Lines        4331     4538     +207     
==========================================
  Hits         2286     2286              
- Misses       2045     2252     +207     

☔ View full report in Codecov by Sentry.

cache: CacheConfig


settings = Settings(
Collaborator

I would move the settings to the individual classes. I think this design couples the classes too tightly to this file. You cannot move the classes around or add a new API version without making non-extensible changes to this file, because APISettings will require a constructor change and new classes to accept.

Instead, a better design is to apply the strategy pattern cleanly to the different API definitions (v1 and v2) and move the config either to their __init__, or to a set_config (or similar) method; see the sketch below.
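
To illustrate the idea, here is a minimal sketch of the strategy pattern with per-version configs. The class names are illustrative assumptions, not code from this PR:

```python
from dataclasses import dataclass


@dataclass
class APIConfig:
    server: str
    key: str


class APIStrategy:
    """Base strategy: each API version owns its config."""

    def __init__(self, config: APIConfig) -> None:
        self.config = config


class APIV1(APIStrategy): ...


class APIV2(APIStrategy): ...


# Adding a v3 later is purely additive; no central settings file changes.
backend = APIV2(APIConfig(server="http://127.0.0.1:8001/", key="..."))
```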

Collaborator

@fkiraly left a comment

Overall really great, I have a design suggestion related to the configs.

The config.py file and the coupling on it break an otherwise nice strategy pattern.

I recommend following the strategy pattern cleanly instead and moving the configs into the class instances, see above.

This will make the backend API much more extensible and cohesive.

key="...",
),
v2=APIConfig(
server="http://127.0.0.1:8001/",
Collaborator

Should this be hardcoded? I guess this is just for your local development.

Contributor Author

This is hard-coded; these are the default values, though. The local endpoints will be replaced by the remote server when it is deployed, hopefully before merging this into main.


if strict:
    return v2

Collaborator

In a previous commit the 'FallbackProxy' was used here. Do we still need this class?

Contributor Author

I removed this because of the ruff errors. I'll put it back and fix the pre-commit once the class is implemented.

if use_cache:
    try:
        return self._get_cache_response(cache_dir)
    # TODO: handle ttl expired error
Collaborator

@SimonBlanke commented Jan 9, 2026

The PR is out of draft, but this caching is not implemented. I guess this is out of scope for this PR.

Contributor Author

@geetu040 commented Jan 9, 2026

Yes, the PR is still in progress; should I mark it as a draft as well? There are a bunch of work items that I'll split out if they can be worked on without affecting the derived PRs, and otherwise implement myself. For caching specifically, I plan to implement it myself, since otherwise stacking is going to be challenging.

Collaborator

In general, a draft marks a PR whose changes are not finalized. I do it the following way: if a PR is not finished and needs developer input on the implementation, I open a draft. If I think the PR is finished and can be merged, then I change it to a normal PR. But I cannot find an official explanation that matches my procedure, so I guess this is just my practice.

Contributor Author

I agree. I'll mark this as a draft, since that doesn't keep it from getting reviewed or stop other people from working on top of it.


        return task

    def _create_task_from_xml(self, xml: str) -> OpenMLTask:
Collaborator

This method already exists here: https://github.com/openml/openml-python/pull/1576/files/74ab3662b6be04d001b1e8dade3f695ca80bcfad#diff-fdaf60448460bf4c7af496380c2f8967b0cabe577a9153256b8397f9f80e0eccR460

Is it really needed at both locations, or can we remove one of them? That would be good to avoid duplicate code.

Contributor Author

Yes, you are right. This resource implementation was just there to give an example; it will be removed anyway, and this duplication will be taken care of in the derived PR specifically for tasks.

Contributor

@SimonBlanke: the thread on Discord discussing this, in case you want to weigh in.

Collaborator

@PGijsbers commented Jan 14, 2026

This thread is not accessible to me. (though I assume from context it's also not that important)

from openml._api.resources.base import DatasetsAPI

if TYPE_CHECKING:
    from responses import Response
Collaborator

In production this would be requests, right? You used responses for the mocking here during development.

Contributor Author

Yes, this should be requests; I'll fix it.

@geetu040 changed the title from "[ENH] Migration: set up core/base structure" to "[ENH] V1 → V2 API Migration - core structure" Jan 9, 2026
@geetu040 marked this pull request as draft January 12, 2026 18:47
Collaborator

@PGijsbers left a comment

Overall I agree with the suggested changes. This seems like a reasonable way to provide a unified interface for two different backends, and it also separates out some concerns that were previously coupled or scattered more than they should be (e.g., caching, configurations).

My main concern is with the change to caching behavior. I also have a minor concern about the indirection APIContext introduces (perhaps I misunderstood its purpose), and about the introduction of allowing Response return values.

In my comments you will also find some things that may already have been "wrong" in the old implementation. In those cases, I think it simply makes sense to make the change now, so I repeat them here for convenience.

from openml._api.config import APIConfig


class CacheMixin:
Collaborator

The ttl should probably heavily depend on the path. If we do end up using caching at this level, we should use the Cache-Control HTTP response header so the server can inform us how long to keep a response in cache (something that, I believe, neither server does right now; see the sketch after the list below). A dataset search query can change whenever any dataset description changes (to either be included or excluded now), so caching probably shouldn't even be on by default for that type of query. Dataset descriptions might change, but likely not very frequently. Dataset data files or computed qualities should (almost?) never change. This is the reason that the current implementation only caches the description, features, qualities, and the dataset itself.

With this implementation, you also introduce some new issues:

  • What if the paths change, or even the query parameters? There is now dead cache. Do we add cache cleanup routines? How does openml-python know which entries are no longer valid if they were responses with a high TTL?
  • URLs may be (much) longer than the default maximum path length on Windows (260 characters). If I'm not mistaken, this will lead to an issue unless you specifically work around it.
  • More of an implementation detail, but authenticated and unauthenticated requests are not differentiated. If a user accidentally makes an unauthenticated request, gets an error, and then authenticates, they would still get the cached error.
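
A minimal sketch of reading the server-advertised lifetime from Cache-Control, assuming a requests.Response in hand (the helper name and the simplified max-age parsing are my assumptions):

```python
import re

import requests


def cache_ttl_from_response(response: requests.Response) -> int | None:
    """Return the server-advertised TTL in seconds, or None if the
    response should not be cached at all."""
    cache_control = response.headers.get("Cache-Control", "")
    if "no-store" in cache_control or "no-cache" in cache_control:
        return None
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None
```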


class DatasetsAPI(ResourceAPI, ABC):
    @abstractmethod
    def get(self, dataset_id: int) -> OpenMLDataset | tuple[OpenMLDataset, Response]: ...
Collaborator

From an API design perspective, I am not sure what the use case is for a user to want access to the requests.Response. The only case I can think of is to parse the data itself, but if the user wants to do that, I reckon we failed in our API design? I will be the first to admit that error handling needs to be improved (both on the server and the client side), but I don't think this makes sense. Am I missing something?

Collaborator

This comment of course applies not just to the DatasetsAPI, but to all of the resource APIs.

        tasks=TasksV1(v1_http),
    )

    if version == "v1":
Collaborator

nit: supported versions should be encoded in an Enum. This helps function signatures (type checking, code completion) and reduces the chance of erroneous input.
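
For example (an illustrative sketch, not code from the PR):

```python
from enum import Enum


class APIVersion(str, Enum):
    """Supported API versions; str-valued for easy (de)serialization."""

    V1 = "v1"
    V2 = "v2"


assert APIVersion("v1") is APIVersion.V1  # erroneous input raises ValueError
```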

Comment on lines +1 to +8
from openml._api.runtime.core import APIContext


def set_api_version(version: str, *, strict: bool = False) -> None:
    api_context.set_version(version=version, strict=strict)


api_context = APIContext()
Collaborator

@PGijsbers commented Jan 15, 2026

It's not clear what the function of the APIContext is here. Why do we need it, and why can't we just use the backend directly? E.g.:

Suggested change:

-from openml._api.runtime.core import APIContext
-
-
-def set_api_version(version: str, *, strict: bool = False) -> None:
-    api_context.set_version(version=version, strict=strict)
-
-
-api_context = APIContext()
+from openml._api.runtime.core import build_backend
+
+_backend = build_backend("v1", strict=False)
+
+
+def set_api_version(version: str, *, strict: bool = False) -> None:
+    global _backend
+    _backend = build_backend(version=version, strict=strict)
+
+
+def backend() -> APIBackend:
+    return _backend

If it is just to avoid the pitfall where users assign the returned value to a local variable whose scope is too long-lived, then the same would apply if users assigned api_context.backend to a variable. We could instead extend the APIBackend class to allow updates to its attributes?
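
A rough sketch of that alternative, assuming the backend rebuilds its state in place (the attribute and method names here are my assumptions):

```python
class APIBackend:
    """Backend whose attributes are updated in place, so long-lived
    references held by callers always see the currently active version."""

    def __init__(self, version: str, *, strict: bool = False) -> None:
        self.set_version(version, strict=strict)

    def set_version(self, version: str, *, strict: bool = False) -> None:
        self.version = version
        self.strict = strict
        # Hypothetical: rebuild the versioned resources here, e.g.
        # self.datasets = DatasetsV2(...) if version == "v2" else DatasetsV1(...)
```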

from dataclasses import dataclass
from typing import Literal

DelayMethod = Literal["human", "robot"]
Collaborator

nit: Should be an enum.
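
For instance (sketch only):

```python
from enum import Enum


class DelayMethod(str, Enum):
    HUMAN = "human"
    ROBOT = "robot"
```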

server: str
base_url: str
key: str
timeout: int = 10 # seconds
Collaborator

nit: Add a unit suffix (timeout_seconds) so the unit is clear without navigating to the source.

P.S. I also considered typing it as datetime.timedelta, but considering you probably only use it in seconds, and there is a real risk of developers erroneously using datetime.timedelta.seconds instead of datetime.timedelta.total_seconds(), I think keeping it an integer is better.
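
To illustrate the pitfall being avoided:

```python
from datetime import timedelta

t = timedelta(days=1, seconds=30)
print(t.seconds)          # 30 -- only the seconds component, a classic bug
print(t.total_seconds())  # 86430.0 -- the full duration in seconds
```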

class ConnectionConfig:
    retries: int = 3
    delay_method: DelayMethod = "human"
    delay_time: int = 1  # seconds
Collaborator

nit: here too, including the unit makes sense (delay_time_seconds)

@dataclass
class CacheConfig:
    dir: str = "~/.openml/cache"
    ttl: int = 60 * 60 * 24 * 7  # one week
Collaborator

nit: Considering the TTL of the HTTP standard is already defined in seconds, maybe it is fine to exclude the unit from the variable name? Though, as noted above, there is a discussion to be had about having this as a cache-level property in the first place.
For future reference, setting the value to timedelta(weeks=1).total_seconds() is preferred over the arithmetic plus a comment.
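
That is, something like the following (sketch; total_seconds() returns a float, so it is cast here to keep the field an int):

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CacheConfig:
    dir: str = "~/.openml/cache"
    ttl: int = int(timedelta(weeks=1).total_seconds())
```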


@dataclass
class CacheConfig:
    dir: str = "~/.openml/cache"
Collaborator

@PGijsbers commented Jan 15, 2026

Default should continue to respect XDG_CACHE_HOME.
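
A minimal sketch of such a default (the helper name is my assumption):

```python
import os
from pathlib import Path


def default_cache_dir() -> str:
    """Prefer $XDG_CACHE_HOME/openml, falling back to ~/.cache/openml."""
    base = os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache"
    return str(Path(base) / "openml")
```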
