Merged
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ rslv/data
*.so

# Distribution / packaging
dist/
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
Expand Down
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/uv-pre-commit
    # uv version.
    rev: 0.8.14
    hooks:
      # Update the uv lockfile
      - id: uv-lock
      - id: uv-export
        args: ["--no-hashes", "--no-dev", "--no-group", "docs", "--output-file=requirements.txt"]
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@

FastAPI implementation of an identifier resolver.

The `rslv` implementation provides a generic identifier resolution service that may be adapted
to support different schemes. The base requirement is that identifiers are of the
form `scheme:content` where `scheme` is a scheme name (e.g. "doi" or "ark") and `content` is the
The `rslv` implementation provides a generic identifier resolution service that may be adapted
to support different schemes. The base requirement is that identifiers are of the
form `scheme:content` where `scheme` is a scheme name (e.g. "doi" or "ark") and `content` is the
value of the identifier (e.g. `10.12345/foo` or `99999/fd99`).

## Operation
Expand Down Expand Up @@ -38,4 +38,4 @@ With `uvicorn` installed, a development instance of `rslv` can be started from t
python rslv/app.py
```

The service may be accessed at http://localhost:8000/
The service may be accessed at http://localhost:8000/
2 changes: 1 addition & 1 deletion _docsrc/build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -57,4 +57,4 @@ codebraid \
--wrap=none \
--output "$dest" \
--overwrite \
"$src"
"$src"
4 changes: 1 addition & 3 deletions _docsrc/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ title: Configuring rslv
## Service Setup

- `config.py` options
- `logging.conf`
- `logging.conf`

## Resolver Configuration

Expand Down Expand Up @@ -55,5 +55,3 @@ scheme:prefix/value
Target Template

The target is specified in the definition as a template with placeholders that are filled by components of the parsed identifier.


1 change: 0 additions & 1 deletion _docsrc/doc_parts/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,4 +108,3 @@ def defn_match_table(pids=EXAMPLE_PIDS):
}
results.append(result)
print(markdown_table(results).set_params(row_sep='markdown', quote=False).get_markdown())

6 changes: 3 additions & 3 deletions _docsrc/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ title: "`rslv` Generic Resolver Service"

`rslv` implements a resolver service. That is, given an identifier string, the service returns information about the identifier or redirects to the known location of the identified resource.

`rslv` is written in Python and requires Python version 3.9 or later. `rslv` may be run as a command line application or, more typically, as a web service.
`rslv` is written in Python and requires Python version 3.9 or later. `rslv` may be run as a command line application or, more typically, as a web service.

## Installation

Expand Down Expand Up @@ -36,7 +36,7 @@ For development purposes, `rslv` may be run from the command line to provide a t
python rslv/app.py
```

A production deployment should use an ASGI server such as [Uvicorn](https://www.uvicorn.org/) or [Nginx Unit](https://unit.nginx.org/). Uvicorn will generally be deployed behind another web server such as Apache or Nginx whereas `Unit` may be deployed as the web server.
A production deployment should use an ASGI server such as [Uvicorn](https://www.uvicorn.org/) or [Nginx Unit](https://unit.nginx.org/). Uvicorn will generally be deployed behind another web server such as Apache or Nginx whereas `Unit` may be deployed as the web server.



Expand All @@ -45,4 +45,4 @@ Alternatively, `rslv` may be deployed to a cloud provider such as `Vercel`



##
##
10 changes: 5 additions & 5 deletions _docsrc/matching.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ It does this by splitting the input identifier string into various components an

<figcaption>

**Figure 1.** Overview of process for handling a user supplied identifier string. The string is split into components as a `parsed_pid` instance. That instance is matched against the available definitions. A match provides a `pid_definition` instance which is used with the `If a match is found then the response is a redirect to the registered target or the matched definition metadata.
**Figure 1.** Overview of process for handling a user supplied identifier string. The string is split into components as a `parsed_pid` instance. That instance is matched against the available definitions. A match provides a `pid_definition` instance which is used with the `If a match is found then the response is a redirect to the registered target or the matched definition metadata.

</figcaption>

Expand All @@ -38,7 +38,7 @@ The provided identifier string is split into several components (Figure 2) by ap
3. Left trim whitespace or any instances of the characters `:`, `/` from the second portion. This portion is the `content`.
4. Split `content` at the first occurrence of the forward slash character ("/").
5. The first portion is the `prefix`
6. Left trim whitespace or any instance of the characters `:`, `/` from the second portion. This portion is the `value`.
6. Left trim whitespace or any instance of the characters `:`, `/` from the second portion. This portion is the `value`.

<figure>

Expand All @@ -50,7 +50,7 @@ identifier = ark:12345/some_value/with?extra=foo
| | |
scheme | value
prefix

scheme = "ark"
content = "12345/some_value/with?extra=foo"
prefix = "12345"
Expand All @@ -59,7 +59,7 @@ value = "some_value/with?extra=foo"

<figcaption>

**Figure 2.** Components of a `parsed_pid`. After parsing, extracted components of the identifier are available for locating a matching definition and formatting the response.
**Figure 2.** Components of a `parsed_pid`. After parsing, extracted components of the identifier are available for locating a matching definition and formatting the response.

</figcaption>
</figure>
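
The splitting steps and Figure 2 can be illustrated with a short, self-contained sketch. It assumes the steps not shown in this hunk split the string at the first colon and lowercase the scheme; `split_pid` is a hypothetical name used only for illustration, not the function shipped in `rslv`.

```python
from typing import Dict, Optional


def split_pid(pid_str: str) -> Dict[str, Optional[str]]:
    """Split an identifier string into scheme, content, prefix and value."""
    parsed: Dict[str, Optional[str]] = {
        "scheme": None,
        "content": None,
        "prefix": None,
        "value": None,
    }
    # Assumed first step: split at the first colon; the left part is the scheme.
    parts = pid_str.strip().split(":", 1)
    parsed["scheme"] = parts[0].strip().lower()
    if len(parts) < 2:
        return parsed
    # Left trim whitespace, ':' and '/' from the remainder to get the content.
    content = parts[1].lstrip(" /:").strip()
    parsed["content"] = content
    # Split the content at the first '/': the prefix, then the value.
    parts = content.split("/", 1)
    parsed["prefix"] = parts[0].strip()
    if len(parts) == 2:
        parsed["value"] = parts[1].lstrip(" /:").strip()
    return parsed


print(split_pid("ark:12345/some_value/with?extra=foo"))
```

For the identifier in Figure 2 this yields the same `scheme`, `content`, `prefix` and `value` that the figure lists.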
Expand Down Expand Up @@ -105,7 +105,7 @@ examples = [
"ark:99999/foozle",
"ark:example/foozle",
"ark:99999/fk4qwerty",
"ark:99999/fkqwerty",
"ark:99999/fkqwerty",
]
doc_parts.defn_match_table(pids=examples)
```
4 changes: 2 additions & 2 deletions docs/matching.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
comment: |
’’’ codebraid pandoc –katex –from markdown+tex_math_single_backslash –filter pandoc-sidenote
’’’ codebraid pandoc –katex –from markdown+tex_math_single_backslash –filter pandoc-sidenote
–to html5+smart –template=$HOME/.pandoc/templates/template.html5 \
--css=$HOME/.pandoc/theme.css –toc –wrap=none matching.md \> matching.html ’’’

Expand Down Expand Up @@ -49,7 +49,7 @@ The provided identifier string is split into several components (Figure 2) by ap
| | |
scheme | value
prefix

scheme = "ark"
content = "12345/some_value/with?extra=foo"
prefix = "12345"
Expand Down
4 changes: 2 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[project]
name = "rslv"
version = "0.9.6"
version = "0.10.0"
description = "Provides an identifier resolver service in FastAPI."
authors = [{ name = "datadavev", email = "605409+datadavev@users.noreply.github.com" }]
requires-python = ">=3.9,<3.13"
Expand Down Expand Up @@ -28,6 +28,7 @@ dev = [
"httpx>=0.28.1,<0.29",
"flake8>=7.1.1,<8",
"black>=25.1.0,<26",
"pre-commit>=4.3.0",
]
cli = [
"click>=8,<9",
Expand All @@ -47,4 +48,3 @@ default-groups = [
]

[tool.poetry_bumpversion.file."rslv/__init__.py"]

2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# This file was autogenerated by uv via the following command:
# uv export --no-hashes --no-group dev --no-group docs --format requirements-txt
# uv export --no-hashes --no-dev --no-group docs --output-file=requirements.txt
-e .
annotated-types==0.7.0
# via pydantic
Expand Down
2 changes: 2 additions & 0 deletions rslv/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ class Settings(pydantic_settings.BaseSettings):
# Note that this should be set False on services offering one-to-one matching of
# definitions to PIDs. For N2T and arks.org this should be true to match legacy behavior.
auto_introspection: bool = True
# Optional request header; if present, the service returns a 200 code instead of a redirect.
request_no_redirect: str = "x-no-redirect"


def load_settings():
Expand Down
6 changes: 3 additions & 3 deletions rslv/lib_rslv/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,14 +39,14 @@ def split_identifier_string(pid_str: str) -> typing.Dict[str, typing.Any]:
parsed["scheme"] = _parts[0].strip().lower()
try:
parsed["content"] = _parts[1].lstrip(" /:")
parsed["content"] = parsed["content"].strip() # type: ignore
parsed["content"] = parsed["content"].strip() # type: ignore
except IndexError:
return parsed
_parts = parsed["content"].split("/", 1) # type: ignore
_parts = parsed["content"].split("/", 1) # type: ignore
parsed["prefix"] = _parts[0].strip()
try:
parsed["value"] = _parts[1].lstrip(" /")
parsed["value"] = parsed["value"].strip() # type: ignore
parsed["value"] = parsed["value"].strip() # type: ignore
except IndexError:
pass
return parsed
Expand Down
16 changes: 11 additions & 5 deletions rslv/lib_rslv/piddefine.py
Original file line number Diff line number Diff line change
Expand Up @@ -442,12 +442,18 @@ def parse(
)
parts["suffix"] = pid_str[suffix_pos:]

# Hack alert - need to deal with the oddness of ARK identifiers ignoring hyphens.
# Hack alert - Optionally need to deal with the oddness of ARK identifiers ignoring hyphens.
if parts["scheme"] == "ark":
# remove hyphens from the content and value portions, but not from the query portion, if present...
parts["content"] = rslv.lib_rslv.remove_hyphens(parts["content"])
parts["value"] = rslv.lib_rslv.remove_hyphens(parts["value"])
parts["suffix"] = rslv.lib_rslv.remove_hyphens(parts["suffix"])
# Hyphen stripping for ARKs is optionally set at the definition level
# and defaults to True to match the legacy resolver behavior
strip_ark_hyphens = True
if pid_definition.properties is not None:
strip_ark_hyphens = pid_definition.properties.get("strip_hyphens", True)
if strip_ark_hyphens:
# remove hyphens from the content and value portions, but not from the query portion, if present...
parts["content"] = rslv.lib_rslv.remove_hyphens(parts["content"])
parts["value"] = rslv.lib_rslv.remove_hyphens(parts["value"])
parts["suffix"] = rslv.lib_rslv.remove_hyphens(parts["suffix"])
return parts, pid_definition

def list_schemes(self, valid_targets_only: bool = False):
Expand Down
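
The per-definition switch introduced in this hunk can be sketched in isolation. The `remove_hyphens` function below is only a stand-in (the real helper lives in `rslv.lib_rslv` and its implementation is not part of this diff, so plain removal of `-` is an assumption), and `maybe_strip_hyphens` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def remove_hyphens(text: Optional[str]) -> Optional[str]:
    # Stand-in for rslv.lib_rslv.remove_hyphens; assumed to simply drop "-".
    return None if text is None else text.replace("-", "")


def maybe_strip_hyphens(
    parts: Dict[str, Any], properties: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of parts with ARK hyphens stripped unless opted out."""
    result = dict(parts)
    if result.get("scheme") != "ark":
        return result
    # Default to True, mirroring the legacy behavior described in the diff.
    strip = True
    if properties is not None:
        strip = properties.get("strip_hyphens", True)
    if strip:
        for key in ("content", "value", "suffix"):
            if key in result:
                result[key] = remove_hyphens(result[key])
    return result


parts = {"scheme": "ark", "content": "12345/nostrip-test", "value": "nostrip", "suffix": "-test"}
legacy = maybe_strip_hyphens(parts, {"name": "Frank"})
opted_out = maybe_strip_hyphens(parts, {"name": "Frank", "strip_hyphens": False})
print(legacy["suffix"], opted_out["suffix"])
```

A definition without a `strip_hyphens` property keeps the legacy stripping, while `"strip_hyphens": False` leaves the suffix `-test` intact, which is the case the new entry in `tests/test_parser.py` exercises.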
61 changes: 49 additions & 12 deletions rslv/routers/resolver.py
Original file line number Diff line number Diff line change
Expand Up @@ -122,12 +122,16 @@ def get_service_info(request: fastapi.Request, valid: bool = True):


def handle_get_info(
request: fastapi.Request, cleaned_identifier: CleanedIdentifierRequest
request: fastapi.Request,
cleaned_identifier: CleanedIdentifierRequest,
pid_config,
pid_parts: dict,
definition: typing.Optional[rslv.lib_rslv.piddefine.PidDefinition]
):
pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
pid_parts, definition = pid_config.parse(
cleaned_identifier.cleaned, resolve_synonym=False
)
#pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
#pid_parts, definition = pid_config.parse(
# cleaned_identifier.cleaned, resolve_synonym=False
#)
# TODO: This is where a definition specific handler can be used for
# further processing of the PID, e.g. to remove hyphens from an ark.
# Basically, add a property to the definition that contains the name
Expand Down Expand Up @@ -216,7 +220,18 @@ def get_info(
str(request.url), identifier, request.app.state.settings.service_pattern
)

return handle_get_info(request, cleaned_identifier)
pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)
pid_parts, definition = pid_config.parse(
cleaned_identifier.cleaned, resolve_synonym=False
)

return handle_get_info(
request,
cleaned_identifier,
pid_config,
pid_parts,
definition
)


@router.head(
Expand Down Expand Up @@ -275,16 +290,22 @@ def get_resolve(
str(request.url), identifier, request.app.state.settings.service_pattern
)

# If the request was for introspection (inflection) use the info handler
if cleaned_identifier.is_introspection:
return handle_get_info(request, cleaned_identifier)

# Get the identifier configuration catalog
pid_config = rslv.lib_rslv.piddefine.PidDefinitionCatalog(request.state.dbsession)

# Split the identifier string into components and find the best match from the catalog
pid_parts, definition = pid_config.parse(cleaned_identifier.cleaned)
# TODO: see above in get_info for PID handling with specific schemes.

# If the request was for introspection (inflection) use the info handler
if cleaned_identifier.is_introspection:
return handle_get_info(
request,
cleaned_identifier,
pid_config,
pid_parts,
definition
)

if definition is None:
# Return a 404 response and include the pid parts in the body with a
Expand All @@ -307,7 +328,13 @@ def get_resolve(
None,
"",
]:
return handle_get_info(request, cleaned_identifier)
return handle_get_info(
request,
cleaned_identifier,
pid_config,
pid_parts,
definition
)
# If the PID value part matches the value part of the matched definition,
# then return the definition information. This is sketchy behavior but included
# here because it follows the legacy N2T behavior. It can be disabled through
Expand All @@ -316,11 +343,21 @@ def get_resolve(
request.app.state.settings.auto_introspection
and pid_parts["value"] == definition.value
):
return handle_get_info(request, cleaned_identifier)
return handle_get_info(
request,
cleaned_identifier,
pid_config,
pid_parts,
definition
)
# OK, past all the edge cases, redirect the client to the registered target.
pid_parts["canonical"] = pid_format(pid_parts, definition.canonical)
pid_parts["status_code"] = response_status_code
headers = {"Location": _target}
# Check if request includes no redirect header and
# override the redirect if so.
if request.app.state.settings.request_no_redirect in request.headers:
response_status_code = 200
return fastapi.responses.JSONResponse(
content=pid_parts,
headers=headers,
Expand Down
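
The no-redirect override at the end of `get_resolve` can be sketched as a small pure function. This is an illustration only: `effective_status` is a hypothetical name, the `302` used below is an arbitrary example redirect code, and the check looks only at whether the header is present, as the `in request.headers` test in the diff does.

```python
from typing import Mapping


def effective_status(
    headers: Mapping[str, str],
    redirect_code: int,
    no_redirect_header: str = "x-no-redirect",
) -> int:
    """Return 200 when the opt-out header is present, else the redirect code."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    names = {name.lower() for name in headers}
    if no_redirect_header.lower() in names:
        return 200
    return redirect_code


print(effective_status({"X-No-Redirect": "1"}, 302))
print(effective_status({"Accept": "*/*"}, 302))
```

Note that in the diff as written, `pid_parts["status_code"]` is assigned before the override, so the JSON body would still report the redirect code that would otherwise have been used.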
2 changes: 1 addition & 1 deletion rslv/static/style.css
Original file line number Diff line number Diff line change
Expand Up @@ -193,4 +193,4 @@ ul.blog-posts li a:visited {
.helptext {
color: #aaa;
}
}
}
2 changes: 1 addition & 1 deletion rslv/templates/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,4 @@ <h1>rslv - Identifier Resolution Service</h1>
</content>
</main>
<footer></footer>
</body>
</body>
11 changes: 11 additions & 0 deletions tests/test_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,13 @@ def do_add(cfg, entry):
scheme="ark", prefix="12345", value="up", properties={"name": "Frank"}
),
)
do_add(
cfg,
rslv.lib_rslv.piddefine.PidDefinition(
scheme="ark", prefix="12345", value="nostrip",
properties={"name": "Frank", "strip_hyphens":False}
),
)
cfg.refresh_metadata()


Expand All @@ -90,6 +97,10 @@ def do_add(cfg, entry):
"bark:99999/fk44wlr;jglerig",
{"scheme": "ark", "prefix": "99999", "value": "fk4"},
),
(
"ark:12345/nostrip-test",
{"scheme": "ark", "prefix": "12345", "value": "nostrip", "suffix":"-test"},
)
)


Expand Down