diff --git a/.gitignore b/.gitignore
index 15f5b27..1fdaef5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,10 +12,10 @@ rslv/data
*.so
# Distribution / packaging
+dist/
.Python
build/
develop-eggs/
-dist/
downloads/
eggs/
.eggs/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..f2c5892
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,15 @@
+repos:
+- repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v2.3.0
+ hooks:
+ - id: check-yaml
+ - id: end-of-file-fixer
+ - id: trailing-whitespace
+- repo: https://github.com/astral-sh/uv-pre-commit
+ # uv version.
+ rev: 0.8.14
+ hooks:
+ # Update the uv lockfile
+ - id: uv-lock
+ - id: uv-export
+ args: ["--no-hashes", "--no-dev", "--no-group", "docs", "--output-file=requirements.txt"]
diff --git a/README.md b/README.md
index 0d91991..420edbd 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,9 @@
FastAPI implementation of an identifier resolver.
-The `rslv` implementation provides a generic identifier resolution service that may be adapted
-to support different schemes. The base requirement is that identifiers are of the
-form `scheme:content` where `scheme` is a scheme name (e.g. "doi" or "ark") and `content` is the
+The `rslv` implementation provides a generic identifier resolution service that may be adapted
+to support different schemes. The base requirement is that identifiers are of the
+form `scheme:content` where `scheme` is a scheme name (e.g. "doi" or "ark") and `content` is the
value of the identifier (e.g. `10.12345/foo` or `99999/fd99`).
## Operation
@@ -38,4 +38,4 @@ With `uvicorn` installed, a development instance of `rslv` can be started from t
python rslv/app.py
```
-The service may be accessed at http://localhost:8000/
\ No newline at end of file
+The service may be accessed at http://localhost:8000/
diff --git a/_docsrc/build.sh b/_docsrc/build.sh
index 887ff2b..8925549 100755
--- a/_docsrc/build.sh
+++ b/_docsrc/build.sh
@@ -57,4 +57,4 @@ codebraid \
--wrap=none \
--output "$dest" \
--overwrite \
- "$src"
\ No newline at end of file
+ "$src"
diff --git a/_docsrc/configuration.md b/_docsrc/configuration.md
index 0d3f977..f420c98 100644
--- a/_docsrc/configuration.md
+++ b/_docsrc/configuration.md
@@ -7,7 +7,7 @@ title: Configuring rslv
## Service Setup
- `config.py` options
-- `logging.conf`
+- `logging.conf`
## Resolver Configuration
@@ -55,5 +55,3 @@ scheme:prefix/value
Target Template
The target is specified in the definition as a template with placeholders that are filled by components of the parsed identifier.
-
-
diff --git a/_docsrc/doc_parts/__init__.py b/_docsrc/doc_parts/__init__.py
index 9025dcc..e20d7d5 100644
--- a/_docsrc/doc_parts/__init__.py
+++ b/_docsrc/doc_parts/__init__.py
@@ -108,4 +108,3 @@ def defn_match_table(pids=EXAMPLE_PIDS):
}
results.append(result)
print(markdown_table(results).set_params(row_sep='markdown', quote=False).get_markdown())
-
diff --git a/_docsrc/index.md b/_docsrc/index.md
index fc57f67..85ba8e9 100644
--- a/_docsrc/index.md
+++ b/_docsrc/index.md
@@ -4,7 +4,7 @@ title: "`rslv` Generic Resolver Service"
`rslv` implements a resolver service. That is, given an identifier string, the service returns information about the identifier or redirects to the known location of the identified resource.
-`rslv` is written in Python and requires python version 3.9 or later. `rslv` may be run as a command line application or more typically, as a web service.
+`rslv` is written in Python and requires Python version 3.9 or later. `rslv` may be run as a command line application or, more typically, as a web service.
## Installation
@@ -36,7 +36,7 @@ For development purposes, `rslv` may be run from the command line to provide a t
python rslv/app.py
```
-A production deployment should use an ASGI server such as [Uvicorn](https://www.uvicorn.org/) or [Nginx Unit](https://unit.nginx.org/). Uivcorn will generally be deployed behind another web server such as Apache or Nginx whereas `Unit` may be deployed as the web server.
+A production deployment should use an ASGI server such as [Uvicorn](https://www.uvicorn.org/) or [Nginx Unit](https://unit.nginx.org/). Uvicorn will generally be deployed behind another web server such as Apache or Nginx whereas `Unit` may be deployed as the web server.
@@ -45,4 +45,4 @@ Alternatively, `rslv` may be deployed to a cloud provider such as `Vercel`
-##
\ No newline at end of file
+##
diff --git a/_docsrc/matching.md b/_docsrc/matching.md
index cf9dfbe..579fa34 100644
--- a/_docsrc/matching.md
+++ b/_docsrc/matching.md
@@ -22,7 +22,7 @@ It does this by splitting the input identifier string into various components an
-**Figure 1.** Overview of process for handling a user supplied identifier string. The string is split into components as a `parsed_pid` instance. That instance is matched against the available definitions. A match provides a `pid_definition` instance which is used with the `If a match is found then the response is a redirect to the registered target or the matched definition metadata.
+**Figure 1.** Overview of process for handling a user supplied identifier string. The string is split into components as a `parsed_pid` instance. That instance is matched against the available definitions. A match provides a `pid_definition` instance which is used with the `If a match is found then the response is a redirect to the registered target or the matched definition metadata.
@@ -38,7 +38,7 @@ The provided identifier string is split into several components (Figure 2) by ap
3. Left trim whitespace or any instances of the characters `:`, `/` from the second portion. This portion is the `content`.
4. Split `content` at the first occurrence of the forward slash character ("/").
5. The first portion is the `prefix`
-6. Left trim whitespace pr any instance of the characters `:`, `/` from the second portion. This portion is the `value`.
+6. Left trim whitespace or any instance of the characters `:`, `/` from the second portion. This portion is the `value`.
@@ -50,7 +50,7 @@ identifier = ark:12345/some_value/with?extra=foo
| | |
scheme | value
prefix
-
+
scheme = "ark"
content = "12345/some_value/with?extra=foo"
prefix = "12345"
@@ -59,7 +59,7 @@ value = "some_value/with?extra=foo"
-**Figure 2.** Components of a `parsed_pid`. After parsing, extracted components of the identifier are available for locating a matching definition and formatting the response.
+**Figure 2.** Components of a `parsed_pid`. After parsing, extracted components of the identifier are available for locating a matching definition and formatting the response.
@@ -105,7 +105,7 @@ examples = [
"ark:99999/foozle",
"ark:example/foozle",
"ark:99999/fk4qwerty",
- "ark:99999/fkqwerty",
+ "ark:99999/fkqwerty",
]
doc_parts.defn_match_table(pids=examples)
```
diff --git a/docs/matching.md b/docs/matching.md
index d7cba20..1644e22 100644
--- a/docs/matching.md
+++ b/docs/matching.md
@@ -1,6 +1,6 @@
---
comment: |
- ’’’ codebraid pandoc –katex –from markdown+tex_math_single_backslash –filter pandoc-sidenote
+ ’’’ codebraid pandoc –katex –from markdown+tex_math_single_backslash –filter pandoc-sidenote
–to html5+smart –template=$HOME/.pandoc/templates/template.html5 \
--css=$HOME/.pandoc/theme.css –toc –wrap=none matching.md \> matching.html ’’’
@@ -49,7 +49,7 @@ The provided identifier string is split into several components (Figure 2) by ap
| | |
scheme | value
prefix
-
+
scheme = "ark"
content = "12345/some_value/with?extra=foo"
prefix = "12345"
diff --git a/pyproject.toml b/pyproject.toml
index 8261d3a..7082751 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "rslv"
-version = "0.9.6"
+version = "0.10.0"
description = "Provides an identifier resolver service in FastAPI."
authors = [{ name = "datadavev", email = "605409+datadavev@users.noreply.github.com" }]
requires-python = ">=3.9,<3.13"
@@ -28,6 +28,7 @@ dev = [
"httpx>=0.28.1,<0.29",
"flake8>=7.1.1,<8",
"black>=25.1.0,<26",
+ "pre-commit>=4.3.0",
]
cli = [
"click>=8,<9",
@@ -47,4 +48,3 @@ default-groups = [
]
[tool.poetry_bumpversion.file."rslv/__init__.py"]
-
diff --git a/requirements.txt b/requirements.txt
index af2f6f0..d2f0935 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,5 @@
# This file was autogenerated by uv via the following command:
-# uv export --no-hashes --no-group dev --no-group docs --format requirements-txt
+# uv export --no-hashes --no-dev --no-group docs --output-file=requirements.txt
-e .
annotated-types==0.7.0
# via pydantic
diff --git a/rslv/config.py b/rslv/config.py
index 70ec662..f52040d 100644
--- a/rslv/config.py
+++ b/rslv/config.py
@@ -46,6 +46,8 @@ class Settings(pydantic_settings.BaseSettings):
# Note that this should be set False on services offering one-to-one matching of
# definitions to PIDs. For N2T and arks.org this sould be true to match legacy behavior.
auto_introspection: bool = True
+ # Name of an optional request header; when present, the service returns a 200 response instead of a redirect.
+ request_no_redirect: str = "x-no-redirect"
def load_settings():
diff --git a/rslv/lib_rslv/__init__.py b/rslv/lib_rslv/__init__.py
index 3eeffa9..34e9f03 100644
--- a/rslv/lib_rslv/__init__.py
+++ b/rslv/lib_rslv/__init__.py
@@ -39,14 +39,14 @@ def split_identifier_string(pid_str: str) -> typing.Dict[str, typing.Any]:
parsed["scheme"] = _parts[0].strip().lower()
try:
parsed["content"] = _parts[1].lstrip(" /:")
- parsed["content"] = parsed["content"].strip() # type: ignore
+ parsed["content"] = parsed["content"].strip() # type: ignore
except IndexError:
return parsed
- _parts = parsed["content"].split("/", 1) # type: ignore
+ _parts = parsed["content"].split("/", 1) # type: ignore
parsed["prefix"] = _parts[0].strip()
try:
parsed["value"] = _parts[1].lstrip(" /")
- parsed["value"] = parsed["value"].strip() # type: ignore
+ parsed["value"] = parsed["value"].strip() # type: ignore
except IndexError:
pass
return parsed
diff --git a/rslv/lib_rslv/piddefine.py b/rslv/lib_rslv/piddefine.py
index 8cb7a64..d30b014 100644
--- a/rslv/lib_rslv/piddefine.py
+++ b/rslv/lib_rslv/piddefine.py
@@ -442,12 +442,18 @@ def parse(
)
parts["suffix"] = pid_str[suffix_pos:]
- # Hack alert - need to deal with the oddness of ARK identifiers ignoring hyphens.
+ # Hack alert - ARK identifiers ignore hyphens, so optionally strip them here.
if parts["scheme"] == "ark":
- # remove hyphens from the content and value portions, but not from the query portion, if present...
- parts["content"] = rslv.lib_rslv.remove_hyphens(parts["content"])
- parts["value"] = rslv.lib_rslv.remove_hyphens(parts["value"])
- parts["suffix"] = rslv.lib_rslv.remove_hyphens(parts["suffix"])
+ # Hyphen stripping for ARKs is optionally set at the definition level
+ # and defaults to True to match the legacy resolver behavior
+ strip_ark_hyphens = True
+ if pid_definition is not None and pid_definition.properties is not None:
+ strip_ark_hyphens = pid_definition.properties.get("strip_hyphens", True)
+ if strip_ark_hyphens:
+ # remove hyphens from the content and value portions, but not from the query portion, if present...
+ parts["content"] = rslv.lib_rslv.remove_hyphens(parts["content"])
+ parts["value"] = rslv.lib_rslv.remove_hyphens(parts["value"])
+ parts["suffix"] = rslv.lib_rslv.remove_hyphens(parts["suffix"])
return parts, pid_definition
def list_schemes(self, valid_targets_only: bool = False):
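Review note on the `piddefine.py` hunk above: the new behavior amounts to a per-definition lookup with a default of True. A minimal standalone sketch of that decision follows, assuming `properties` is an optional dict on the definition. The names `should_strip_hyphens` and `strip_ark_parts` are hypothetical and written only for illustration, and the plain `str.replace` is a stand-in for `rslv.lib_rslv.remove_hyphens`, whose exact behavior is not shown in this diff.

```python
from typing import Any, Dict, Optional


def should_strip_hyphens(properties: Optional[Dict[str, Any]]) -> bool:
    """Default to True (legacy behavior) unless the definition opts out."""
    if properties is None:
        return True
    return bool(properties.get("strip_hyphens", True))


def strip_ark_parts(
    parts: Dict[str, Any], properties: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of parts with hyphens removed from content, value and suffix."""
    if parts.get("scheme") != "ark" or not should_strip_hyphens(properties):
        return dict(parts)
    cleaned = dict(parts)
    for key in ("content", "value", "suffix"):
        if isinstance(cleaned.get(key), str):
            # Hypothetical stand-in for rslv.lib_rslv.remove_hyphens
            cleaned[key] = cleaned[key].replace("-", "")
    return cleaned
```

Under this sketch, a definition whose properties contain `{"strip_hyphens": False}` keeps hyphens intact, while a definition with no properties keeps the legacy stripping.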
diff --git a/rslv/routers/resolver.py b/rslv/routers/resolver.py
index 82b8315..a403561 100644
--- a/rslv/routers/resolver.py
+++ b/rslv/routers/resolver.py
@@ -122,12 +122,16 @@ def get_service_info(request: fastapi.Request, valid: bool = True):
def handle_get_info(
- request: fastapi.Request, cleaned_identifier: CleanedIdentifierRequest
+ request: fastapi.Request,
+ cleaned_identifier: CleanedIdentifierRequest,
+ pid_config,
+ pid_parts: dict,
+ definition: typing.Optional[rslv.lib_rslv.piddefine.PidDefinition]
):
- pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
- pid_parts, definition = pid_config.parse(
- cleaned_identifier.cleaned, resolve_synonym=False
- )
+ #pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
+ #pid_parts, definition = pid_config.parse(
+ # cleaned_identifier.cleaned, resolve_synonym=False
+ #)
# TODO: This is where a definition specific handler can be used for
# further processing of the PID, e.g. to remove hyphens from an ark.
# Basically, add a property to the definition that contains the name
@@ -216,7 +220,18 @@ def get_info(
str(request.url), identifier, request.app.state.settings.service_pattern
)
- return handle_get_info(request, cleaned_identifier)
+ pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
+ pid_parts, definition = pid_config.parse(
+ cleaned_identifier.cleaned, resolve_synonym=False
+ )
+
+ return handle_get_info(
+ request,
+ cleaned_identifier,
+ pid_config,
+ pid_parts,
+ definition
+ )
@router.head(
@@ -275,16 +290,22 @@ def get_resolve(
str(request.url), identifier, request.app.state.settings.service_pattern
)
- # If the request was for introspection (inflection) use the info handler
- if cleaned_identifier.is_introspection:
- return handle_get_info(request, cleaned_identifier)
# Get the identifier configuration catalog
pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
# Split the identifier string into components and find the best match from the catalog
pid_parts, definition = pid_config.parse(cleaned_identifier.cleaned)
- # TODO: see above in get_info for PID handling with specific schemes.
+
+ # If the request was for introspection (inflection) use the info handler
+ if cleaned_identifier.is_introspection:
+ return handle_get_info(
+ request,
+ cleaned_identifier,
+ pid_config,
+ pid_parts,
+ definition
+ )
if definition is None:
# Return a 404 response and include the pid parts in the body with a
@@ -307,7 +328,13 @@ def get_resolve(
None,
"",
]:
- return handle_get_info(request, cleaned_identifier)
+ return handle_get_info(
+ request,
+ cleaned_identifier,
+ pid_config,
+ pid_parts,
+ definition
+ )
# If the PID value part matches the value part of the matched definition,
# then return the definition information. This is sketchy behavior but included
# here because it follows the legacy N2T behavior. It can be disabled through
@@ -316,11 +343,21 @@ def get_resolve(
request.app.state.settings.auto_introspection
and pid_parts["value"] == definition.value
):
- return handle_get_info(request, cleaned_identifier)
+ return handle_get_info(
+ request,
+ cleaned_identifier,
+ pid_config,
+ pid_parts,
+ definition
+ )
# OK, past all the edge cases, redirect the client to the registered target.
pid_parts["canonical"] = pid_format(pid_parts, definition.canonical)
pid_parts["status_code"] = response_status_code
headers = {"Location": _target}
+ # Check if request includes no redirect header and
+ # override the redirect if so.
+ if request.app.state.settings.request_no_redirect in request.headers:
+ response_status_code = 200
return fastapi.responses.JSONResponse(
content=pid_parts,
headers=headers,
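Review note on the `resolver.py` hunk above: the no-redirect override depends only on whether the configured header is present, not on its value. A minimal sketch of that selection logic follows, assuming the request headers are available as a plain mapping. `select_status_code` is a hypothetical helper for illustration and is not part of `rslv`; the default header name is taken from the `Settings` change in this diff.

```python
from typing import Mapping

NO_REDIRECT_HEADER = "x-no-redirect"  # default value from the Settings hunk


def select_status_code(
    request_headers: Mapping[str, str],
    redirect_status_code: int,
    no_redirect_header: str = NO_REDIRECT_HEADER,
) -> int:
    """Return 200 when the opt-out header is present, else the redirect code."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    present = {name.lower() for name in request_headers}
    if no_redirect_header.lower() in present:
        return 200
    return redirect_status_code
```

In the hunk itself, `pid_parts["status_code"]` is assigned before the override and the `Location` header is built beforehand, so a 200 response would still carry the redirect code in its JSON body along with the target location.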
diff --git a/rslv/static/style.css b/rslv/static/style.css
index 7a06dcd..1cb1983 100644
--- a/rslv/static/style.css
+++ b/rslv/static/style.css
@@ -193,4 +193,4 @@ ul.blog-posts li a:visited {
.helptext {
color: #aaa;
}
-}
\ No newline at end of file
+}
diff --git a/rslv/templates/index.html b/rslv/templates/index.html
index acab596..31ad6d3 100644
--- a/rslv/templates/index.html
+++ b/rslv/templates/index.html
@@ -34,4 +34,4 @@