
Context update order in merge (processing step) #471

@notactuallyfinn


#470 adds ld_merge_containers.
While the harvester results are merged (starting with the most important one and adding less important ones one by one), the context of the ld_merge_dict "A", which holds the merge result so far, is updated with the new context "B".
This is done by a) updating the last dict in A.context with all entries of the dict(s) in "B", and b) inserting all strings in "B", in reverse order of occurrence, before all other items in A.context.
a) causes a problem: if A.context[-1] maps the same compacted key (prefix) to a different IRI than "B" does, "A"'s mapping is overwritten, even though "B" is the context of less important data:
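A minimal sketch of the current update as described in a) and b), on plain Python lists and dicts rather than the real ld_merge_dict. The function name `update_context_current` is hypothetical and this is only my reading of the described behaviour, not the code from #470. Appending an empty dict when A.context does not end with one is an assumption made only so the sketch cannot crash:

```python
from copy import deepcopy


def update_context_current(a_ctx, b_ctx):
    """Hypothetical model of how A.context is currently updated with B."""
    result = deepcopy(a_ctx)
    # A JSON-LD context may be a single string/dict or a list of them.
    b_items = b_ctx if isinstance(b_ctx, list) else [b_ctx]
    # Assumption: make sure A.context ends with a dict that can be updated.
    if not result or not isinstance(result[-1], dict):
        result.append({})
    for item in b_items:
        if isinstance(item, dict):
            # a) B's mappings are written into A's last dict and overwrite A's entries.
            result[-1].update(item)
        elif isinstance(item, str):
            # b) each string is put in front, so B's strings end up in reverse order.
            result.insert(0, item)
    return result


a = [{"codemeta": "https://doi.org/10.5063/schema/codemeta-2.0/"}]
b = [
    "https://example.org/ctx1",  # placeholder remote contexts
    "https://example.org/ctx2",
    {"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"},
]
print(update_context_current(a, b))
```

With these inputs the result is `["https://example.org/ctx2", "https://example.org/ctx1", {"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"}]`: the codemeta-2.0 mapping from "A" is gone, and the two strings from "B" are reversed.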

```python
from hermes.model.types import ld_dict
from hermes.model.merge.container import ld_merge_dict

obj = ld_merge_dict([{}], context=[{"codemeta": "https://doi.org/10.5063/schema/codemeta-2.0/"}])
obj["codemeta:softwareSuggestions"] = "https://github.com/softwarepub/hermes/issues"
new_obj = ld_dict([{}], context=[{"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"}])
new_obj["codemeta:zippedCode"] = "https://github.com/softwarepub/hermes"

obj.update(new_obj)  # resulting context is only [{"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"}]
assert obj["https://doi.org/10.5063/schema/codemeta-1.0/zippedCode"] == ["https://github.com/softwarepub/hermes"]  # True
assert obj["codemeta:zippedCode"] == ["https://github.com/softwarepub/hermes"]  # True
assert obj["https://doi.org/10.5063/schema/codemeta-2.0/softwareSuggestions"] == ["https://github.com/softwarepub/hermes/issues"]  # True
assert obj["codemeta:softwareSuggestions"] == ["https://github.com/softwarepub/hermes/issues"]  # KeyError
```
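Why the last lookup fails can be seen in a simplified model of compact-IRI expansion: the dict entries of a context list are merged in order, later entries overriding earlier ones (as in JSON-LD), and a `prefix:suffix` key is expanded via its prefix. This is a sketch under assumptions: remote (string) contexts are ignored, the helpers `term_map` and `expand` are hypothetical, and it is not how hermes or a JSON-LD processor is actually implemented.

```python
def term_map(context):
    """Merge the dict entries of a context list; later entries win, strings are skipped."""
    terms = {}
    for item in context:
        if isinstance(item, dict):
            terms.update(item)
    return terms


def expand(key, context):
    """Expand 'prefix:suffix' with the context; absolute IRIs are returned unchanged."""
    prefix, sep, suffix = key.partition(":")
    terms = term_map(context)
    if sep and not suffix.startswith("//") and prefix in terms:
        return terms[prefix] + suffix
    return key


# Keys are assumed to be stored as full IRIs (data from the example above).
stored = {
    "https://doi.org/10.5063/schema/codemeta-2.0/softwareSuggestions": ["https://github.com/softwarepub/hermes/issues"],
    "https://doi.org/10.5063/schema/codemeta-1.0/zippedCode": ["https://github.com/softwarepub/hermes"],
}
context_after_update = [{"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"}]

# "codemeta:" now points to the codemeta-1.0 namespace, so the 2.0 entry is unreachable.
print(expand("codemeta:softwareSuggestions", context_after_update) in stored)  # False
```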

b) merely leads to odd priorities and should not affect the merge result.

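I suggest setting "A"'s context to "B" concatenated with "A"'s old context (i.e. B followed by A.context). This avoids deleting compaction options, and because later entries of a JSON-LD context array take precedence over earlier ones, the mappings in "A" will take priority over those in "B".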
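A minimal sketch of the suggested update, again on plain lists and dicts; `update_context_proposed` and `term_map` are hypothetical names, and `term_map` only models the "later entries win" rule for dict contexts (string contexts are skipped):

```python
from copy import deepcopy


def update_context_proposed(a_ctx, b_ctx):
    """Hypothetical sketch of the suggestion: new A.context = B + old A.context."""
    b_items = b_ctx if isinstance(b_ctx, list) else [b_ctx]
    # B goes first, so A's (more important) entries come later and take precedence.
    # Both contexts are kept in full; no dict is updated in place.
    return deepcopy(b_items) + deepcopy(a_ctx)


def term_map(context):
    """Merge the dict entries of a context list; later entries win."""
    terms = {}
    for item in context:
        if isinstance(item, dict):
            terms.update(item)
    return terms


a = [{"codemeta": "https://doi.org/10.5063/schema/codemeta-2.0/"}]
b = [{"codemeta": "https://doi.org/10.5063/schema/codemeta-1.0/"}]
merged = update_context_proposed(a, b)
print(term_map(merged)["codemeta"])  # https://doi.org/10.5063/schema/codemeta-2.0/
```

Here the "codemeta" prefix keeps "A"'s codemeta-2.0 mapping, while "B"'s dict is still present in the context list instead of being merged into "A"'s last dict.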

Metadata

Assignees: no one assigned

Labels: 2️ process/validate (The processing/validation step in the workflow), invalid (This doesn't seem right)
