New storage layout extension NNNN-uri-direct-storage-layout#63
Should there be a note about the possibility of many-to-one mappings created by the idea of regex->string replacement? A warning or advice about what to do in the event of a collision? Is the replacement strictly regex->string, or is regex->regex-replacement intended? I ask because later there is a comment about static string replacement, which sort of implies there might be something dynamic too. How should multiple replacements be handled? Are the regexes applied in turn? I.e., does
@alvinsw: Is this extension still of interest? There are open questions above.
Yes, I think the replacement is a weak point here. After discussion with @ptsefton, we'll remove all the parameters, including the one for string replacement, so the spec should be simpler and more straightforward.
We still need the layout option where we can directly map a URI to directory paths without any hashing.
Thanks for replying, @alvinsw. If you are still interested in moving this forward, could you push the update reducing the parameters that you proposed in your Aug 9, 2023 #63 (comment)? (We aren't sure whether #57 is going to move forward, but in any case the comments from @neilsjefferies do not apply to your use case.)
@zimeon, I have updated the document and removed all parameters. Can this be approved now?
Sorry, we need to have the suffix parameter back. I have updated the PR with a new commit.
suffix issues
- Line 14 text assumes that the suffix is `/__object__` when it should be written in terms of the `suffix` parameter.
- Why does the `suffix` default to `/__object__`? This seems an oddly Pythonic choice. Would it be better if the default value were nothing, as that seems more generally useful?
- What are the limitations on the `suffix` parameter? Is nothing OK (I hope so)? Does it need to start with a slash if not nothing? Is it OK to have multiple path segments (`/one/two`)? Should dangerous things not be allowed (`/../../../../`)? Must it not end with a slash?
- There should be some warning to note that using the suffix as a way to tell whether one has reached an object in the layout is unsafe (e.g. id `https://example.org/a/__object__/b/` would give root path `https_example.org/a/__object__/b/__object`).
- The warning about not creating nested objects should be in the limitations section (not just Example 2). This applies even with a suffix (see bullet above).

Procedure
- The reference for determining "is a URI" should be given. https://datatracker.ietf.org/doc/html/rfc3986#section-3 perhaps?
- What about the optional `userinfo` and `:port` elements of the `authority` component of the URI that contains the `hostname`?
- The scheme in a `file://` URI is `file` (without the `://`, or just `:` if it is a relative file URI).
- Removing the `file` scheme means relative file paths and non-URI strings overlap in the output object root path (e.g. `file:hello` and `hello` both map to `hello`). Does that deserve mention?
- Syntax: the bullet sequence is 1, 3, 4.
- Point 4: one is appending to the final "OCFL object root path", not a pathname.
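Two of the review points (the `file:hello`/`hello` overlap and the unsafe suffix) can be reproduced with a small hypothetical mapping. It assumes, purely for illustration, that the `file` scheme is dropped, that scheme and hostname are joined with `_` (as in the review example), and that the default `/__object__` suffix is appended after stripping a trailing slash. None of these rules are confirmed by the draft, and `to_root_path` is a made-up name.

```python
from urllib.parse import urlsplit

DEFAULT_SUFFIX = "/__object__"


def to_root_path(object_id: str, suffix: str = DEFAULT_SUFFIX) -> str:
    """Hypothetical identifier -> object root path mapping (illustration only)."""
    parts = urlsplit(object_id)
    if parts.scheme == "file":
        # Assumed rule: the file scheme is dropped and only the path is kept.
        base = parts.path
    elif parts.scheme and parts.hostname:
        # Assumed rule: scheme and hostname joined with "_", then the path.
        base = f"{parts.scheme}_{parts.hostname}{parts.path}"
    else:
        # Not parsed as a URI with a host: use the identifier unchanged.
        base = object_id
    return base.rstrip("/") + suffix


# A relative file URI and a plain string collide on the same root path.
assert to_root_path("file:hello") == to_root_path("hello") == "hello/__object__"

# An identifier containing the suffix yields a root nested inside another root.
outer = to_root_path("https://example.org/a")
inner = to_root_path("https://example.org/a/__object__/b/")
assert inner == "https_example.org/a/__object__/b/__object__"
assert inner.startswith(outer + "/")
```

Under these assumptions, both problems exist regardless of the exact join rule: any rule that drops the `file` scheme creates the first collision, and any identifier containing the suffix as a path segment defeats the second safeguard.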
This storage root extension describes a transparent path-based OCFL storage layout. URI and path based identifiers are mapped directly to multi-level directory paths that start directly under the OCFL storage root directory.
This extension assumes that the OCFL object identifier is a URI or a path name, which is used directly to create nested paths under the OCFL storage root. An extra directory called `__object__` is added to the path to ensure that an OCFL object is not nested under another OCFL object.
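Why the extra `__object__` directory prevents nesting can be checked with a minimal sketch, assuming (for illustration) that the object root path is just a path-style identifier with the default suffix appended. Under that assumption, one object root can only lie inside another if some identifier itself contains a `__object__` path segment. The helper names are hypothetical.

```python
SUFFIX_SEGMENT = "__object__"  # default suffix "/__object__" without the slash


def root_path(path_id: str) -> str:
    """Hypothetical: append the default suffix to a path-style identifier."""
    return path_id.strip("/") + "/" + SUFFIX_SEGMENT


def is_nested(inner: str, outer: str) -> bool:
    """True if object root `inner` lies strictly inside object root `outer`."""
    return inner.startswith(outer + "/")


def contains_suffix_segment(path_id: str) -> bool:
    """Under this mapping, a __object__ segment is the only way to get nesting."""
    return SUFFIX_SEGMENT in path_id.strip("/").split("/")


# Identifiers that are prefixes of each other still give non-nested roots:
# a/__object__, a/b/__object__, a/b/c/__object__
roots = [root_path(i) for i in ["a", "a/b", "a/b/c"]]
assert not any(is_nested(x, y) for x in roots for y in roots if x != y)

# An identifier that contains the suffix segment breaks the guarantee.
bad = "a/__object__/x"
assert contains_suffix_segment(bad)
assert is_nested(root_path(bad), root_path("a"))
```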
#### Mappings
NOTE: The [The Archive and Package (arcp) URI scheme](https://www.research.manchester.ac.uk/portal/files/76956641/arcp.html) (ARCP) is used in these examples. It allows an archive to mint URI IDs locally, and these IDs can then be used as URIs in Linked Data systems.
minor suggestion: drop the duplicate 'The'
| Object ID | Object Root Path |
At line 19 above, you mention:
Object IDs cannot include characters that are illegal in directory names (for example, slash or backslash)
However, in your examples, the Object IDs contain `/`.
### Example 2
This example demonstrates the effect of using a custom `suffix` to change the default `/__object__` naming convention for the leaf directory that contains an OCFL object. If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects.
If set to an empty string, the user must ensure that all the supplied URIs have a structure that does not allow nested objects.
This would benefit from further specification, and would likely be better suited to the 'Parameters' section.
What is the suggested behavior if a collision or nesting occurs? An invalid OCFL Storage Root would be the result if not handled.
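One possible answer, sketched under the assumption that an implementation can compute all object root paths up front (for example when validating a storage root): compare every pair and refuse to proceed on a collision or nesting. `find_conflicts` is a hypothetical helper; it is quadratic and only meant to show the two conditions, and the ID-to-path mappings in the usage lines are made up for an empty `suffix`.

```python
from __future__ import annotations

from itertools import combinations


def find_conflicts(root_paths: dict[str, str]) -> list[tuple[str, str, str]]:
    """Report (kind, id_a, id_b) for colliding or nested object root paths.

    `root_paths` maps each object ID to its computed object root path.
    For "nested", the first ID is the inner object, the second the outer.
    """
    conflicts = []
    for (id_a, path_a), (id_b, path_b) in combinations(root_paths.items(), 2):
        a, b = path_a.rstrip("/"), path_b.rstrip("/")
        if a == b:
            conflicts.append(("collision", id_a, id_b))
        elif a.startswith(b + "/"):
            conflicts.append(("nested", id_a, id_b))  # a is inside b
        elif b.startswith(a + "/"):
            conflicts.append(("nested", id_b, id_a))  # b is inside a
    return conflicts


# Hypothetical root paths produced with an empty suffix.
roots = {
    "arcp://x/a": "x/a",
    "arcp://x/a/b": "x/a/b",
    "file:hello": "hello",
    "hello": "hello",
}
print(find_conflicts(roots))
# [('nested', 'arcp://x/a/b', 'arcp://x/a'), ('collision', 'file:hello', 'hello')]
```

Whether such a check should be mandatory, and whether the response is to reject the new object or fall back to another layout, is a policy decision the extension text would still need to state.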
## Procedure
The following is an outline of the steps to map an OCFL object identifier to an OCFL object root path:
1. If the identifier is a URI, parse the URI and identify the scheme, hostname, and path, and ignore the rest.
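Step 1 could be sketched with Python's `urllib.parse.urlsplit`. Treating any string with a non-empty scheme as a URI is an assumption standing in for a proper RFC 3986 syntax check, and `parse_identifier` is a hypothetical name. The sketch also shows one consequence of "ignore the rest": `userinfo`, `:port`, query, and fragment are silently dropped, and `.hostname` is lowercased.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def parse_identifier(object_id: str) -> tuple[str, str | None, str] | None:
    """Return (scheme, hostname, path) for a URI, or None for a non-URI string.

    Assumption: "is a URI" is approximated by having a non-empty scheme.
    """
    parts = urlsplit(object_id)
    if not parts.scheme:
        return None
    # .hostname drops userinfo and :port (and lowercases the host);
    # the query and fragment are ignored entirely.
    return parts.scheme, parts.hostname, parts.path


print(parse_identifier("https://user:pw@Example.org:8443/a/b?x=1#frag"))
# ('https', 'example.org', '/a/b')
print(parse_identifier("file:hello"))  # ('file', None, 'hello')
print(parse_identifier("a/b/c"))       # None
```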
Percent-encoding (e.g., %20) in URIs may need special consideration due to filesystem compatibility problems.
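To make the percent-encoding concern concrete: `urllib.parse.urlsplit` leaves `%XX` escapes in the path untouched, while `urllib.parse.unquote` decodes them, and decoding `%2F` adds an extra directory level. The draft does not say which behavior the layout should use; `path_segments` is a hypothetical helper that only shows the two outcomes.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def path_segments(uri: str, decode: bool) -> list[str]:
    """Split a URI path into directory names, optionally percent-decoding first."""
    path = urlsplit(uri).path  # urlsplit keeps %XX escapes as-is
    if decode:
        path = unquote(path)  # %20 -> " ", %2F -> "/"
    return [segment for segment in path.split("/") if segment]


uri = "https://example.org/my%20data/a%2Fb"
print(path_segments(uri, decode=False))  # ['my%20data', 'a%2Fb']
print(path_segments(uri, decode=True))   # ['my data', 'a', 'b']
```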
* **Name:** suffix
* **Description:** The suffix to be appended to the end of the path
* **Type:** string
* **Default:** "/__object__"
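The review questions about the `suffix` parameter could be answered by stating explicit rules. Below is one hypothetical set of rules, sketched as a validation function. The rules themselves (empty allowed; otherwise a leading `/`, no trailing `/`, and no empty, `.` or `..` segments) are assumptions, not part of the draft.

```python
def validate_suffix(suffix: str) -> None:
    """Raise ValueError if `suffix` breaks these hypothetical rules.

    Assumed rules (not in the draft): the empty string is allowed; otherwise
    the suffix must start with "/", must not end with "/", and every segment
    must be a non-empty name other than "." or "..".
    """
    if suffix == "":
        return
    if not suffix.startswith("/") or suffix.endswith("/"):
        raise ValueError(f"suffix must start, and not end, with '/': {suffix!r}")
    for segment in suffix[1:].split("/"):
        if segment in ("", ".", ".."):
            raise ValueError(f"invalid segment {segment!r} in suffix {suffix!r}")


for ok in ["", "/__object__", "/one/two"]:
    validate_suffix(ok)  # accepted

for bad in ["__object__", "/x/", "/../../../../x", "//x"]:
    try:
        validate_suffix(bad)
    except ValueError as err:
        print(err)
```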
Suggestion to escape the `__` due to Markdown turning this into bold.
e.g., `__object__` vs **object**
We want to be able to directly map a URI into the storage layout directory.
It may be similar to #57, but simpler: it just uses the path of any valid URI directly.