One possible patch is to augment the EAD exported from ArchivesSpace to ensure that each JPCA finding aid will have the expected c01-c03 structure. This issue will be tagged to a branch that can be used for testing that approach.
Longer description, referenced elsewhere:
To improve the researcher’s experience in RCV, additional grouping levels will be added in ArchivesSpace to break up the extremely flat lists of A-Z files. While the JPCA finding aids to date have had very specific and predictable hierarchical arrangements, these new groupings will result in varying depths of the EAD hierarchy. Varying depths of description are not only very common in archival description; they are also vital to employ and expect, because they reflect international best practices, which rely on an archivist’s judgement to create and update archival description over time. In other words, all finding aids should be considered living documents, and their structure should be expected to change in response to researcher feedback and shifts in historical understanding.
Practically speaking, this upcoming change means that Osprey’s “get_aspace_refids.py” Python script will need to be updated so that it no longer relies on numbered EAD components, which in the current JPCA finding aids never extend beyond the “c03” depth. The depth in JPCA finding aids will likely extend to at least “c05”, and within the Smithsonian EAD corpus the depth currently extends to “c11”!
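To check how deep a given finding aid actually goes, something along these lines may help. It is a minimal sketch rather than part of Osprey: the `is_component` and `max_component_depth` helpers and the sample XML are hypothetical, and it assumes the EAD 2002 namespace that the existing test already uses.

```python
import re
import xml.etree.ElementTree as ET

EAD_NS = "{urn:isbn:1-931666-22-9}"
# Matches the unnumbered <c> as well as the numbered <c01> through <c12>.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])?$")


def is_component(element):
    """Return True if the element is an EAD component, numbered or not."""
    if not isinstance(element.tag, str):
        return False  # comments and processing instructions have no string tag
    local_name = element.tag.replace(EAD_NS, "", 1)
    return bool(COMPONENT_TAG.match(local_name))


def max_component_depth(element, depth=0):
    """Return the deepest level of component nesting found below `element`."""
    deepest = depth
    for child in element:
        # Only components add a level; wrappers such as <archdesc>, <dsc>
        # and <did> are walked through at the current depth.
        next_depth = depth + 1 if is_component(child) else depth
        deepest = max(deepest, max_component_depth(child, next_depth))
    return deepest


# Hypothetical sample: four nested, unnumbered components.
SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9"><archdesc><dsc>
<c><did><unittitle>Series 1</unittitle></did>
  <c><did><unittitle>A-Z Files</unittitle></did>
    <c><did><unittitle>A</unittitle></did>
      <c><did><unittitle>Folder 1</unittitle></did></c>
    </c>
  </c>
</c>
</dsc></archdesc></ead>"""

print(max_component_depth(ET.fromstring(SAMPLE)))
```

Because the tag pattern accepts both forms, the same check works on exports made with or without numbered_cs.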
To make the Osprey script fully compatible with ArchivesSpace and EAD, just a few small changes should be required:
- When the EAD file is requested from the ArchivesSpace endpoint, the “numbered_cs=true” parameter should be removed altogether or changed to false. The code then only needs to reference “c” elements. Otherwise it would have to handle c01, c02, and c03, as it currently does, then c04 and c05, then c06 through c11 to support the Smithsonian finding aids, and finally c12 to support the full EAD 2002 standard. See the current request:

      "{}{}/resource_descriptions/{}.xml?include_unpublished=false&include_daos=true&numbered_cs=true&print_pdf=false&ead3=false".format(

- Rather than parsing for c03 elements explicitly, the code could either process the “c” elements recursively or (perhaps best of all) look for the lowest levels of description in a finding aid that also have a container. For example, the XPath expression “//c[did/container][not(descendant::c[did/container])]” would return all “c” elements that Osprey needs to process, regardless of their depth. This expression is beyond the limited path syntax accepted by find() and findall(), which the code uses today; it would need a full XPath engine such as lxml’s xpath() method (XPath 1.0) or https://pypi.org/project/saxonche/, with the EAD namespace bound to a prefix.
- A few other data points would need to be grabbed differently (e.g., c01/unittitle + c02/unittitle).
- Because of this issue, we will try out the approach of modifying the EAD from ASpace to ensure that we can still rely on the c01 and c02 unittitle elements.
- This current test would also need to be updated:

      r = requests.get("{}/repositories/2/resources?page=1".format(settings.aspace_api), headers=Headers)

      list_resources = json.loads(r.text.encode('utf-8'))['results']

      for resource in list_resources:
          repository_id = resource['repository']['ref']
          resource_id = resource['ead_id']
          resource_title = resource['title']
          resource_tree = resource['tree']['ref']
          r = requests.get(
              "{}{}/resource_descriptions/{}.xml?include_unpublished=false&include_daos=true&numbered_cs=true&print_pdf=false&ead3=false".format(
                  settings.aspace_api, repository_id, resource_tree.split('/')[4]), headers=Headers)

          # get root element
          tree = ET.fromstring(r.text)
          root = ET.ElementTree(tree).getroot()

          ns = "{urn:isbn:1-931666-22-9}"

          # Implement later more elegant
          c01_list = root.findall('.//' + ns + 'archdesc/' + ns + 'dsc/' + ns + 'c01')

          i = 0

          # Run the hierarchy, c01 -> c02 -> c03
          for c01_item in c01_list:
              # try:
              # iterate child elements of item
              refid_1 = c01_item.attrib['id']
              unit_title = c01_item.find('.//' + ns + 'did/' + ns + 'unittitle').text
              c02_items = c01_item.findall('.//' + ns + 'c02')
              for c02_item in c02_items:
                  refid_2 = c02_item.attrib['id']
                  try:
                      fol_type = c02_item.find('.//' + ns + 'did/' + ns + 'unittitle').text
                  except AttributeError:
                      print(unit_title)
                      print("109")
                      exit
                  try:
                      c03_items = c02_item.findall('.//' + ns + 'c03')
                  except AttributeError:
                      print("129")
                      print(unit_title)
                      exit
                  for c03_item in c03_items:
                      refid_3 = c03_item.attrib['id']
                      print("{}|{}|{}".format(refid_1, refid_2, refid_3))

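For the first change in the list, the export request might be assembled as below. This is a minimal sketch: `build_ead_url` is a hypothetical helper, the base URL in the usage line is a placeholder, and the query parameters are the ones already present in the existing request, with only numbered_cs flipped.

```python
from urllib.parse import urlencode


def build_ead_url(api_base, repository_ref, resource_id):
    """Build the EAD export URL, asking for unnumbered <c> components."""
    params = {
        "include_unpublished": "false",
        "include_daos": "true",
        "numbered_cs": "false",  # was "true"; yields <c> instead of <c01>, <c02>, ...
        "print_pdf": "false",
        "ead3": "false",
    }
    return "{}{}/resource_descriptions/{}.xml?{}".format(
        api_base, repository_ref, resource_id, urlencode(params))


# Placeholder values; in the script these come from settings.aspace_api
# and from each resource record.
print(build_ead_url("https://aspace.example.org/api", "/repositories/2", "123"))
```

Keeping the parameters in a dictionary makes the one changed value easy to spot in review.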
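The recursive option can be sketched with the standard library alone, without any XPath engine. This is an illustration under assumptions, not the Osprey implementation: `iter_lowest_components`, `has_container`, `title_of`, and the sample EAD are hypothetical, and the export is assumed to use unnumbered components. Each result carries the chain of ancestor titles, so the values that currently come from c01/unittitle and c02/unittitle remain available as the first two entries.

```python
import xml.etree.ElementTree as ET

NS = {"ead": "urn:isbn:1-931666-22-9"}
C_TAG = "{urn:isbn:1-931666-22-9}c"


def has_container(component):
    """Return True if the component's own <did> holds a <container>."""
    return component.find("ead:did/ead:container", NS) is not None


def title_of(component):
    """Return the component's own unittitle text, or '' if it has none."""
    title = component.find("ead:did/ead:unittitle", NS)
    return "".join(title.itertext()).strip() if title is not None else ""


def iter_lowest_components(parent, ancestor_titles=()):
    """Yield (refid, ancestor_titles) for each <c> that has a container
    and no descendant <c> with a container, at any depth."""
    for component in parent.findall("ead:c", NS):
        nested = [c for c in component.iter(C_TAG) if c is not component]
        if has_container(component) and not any(has_container(c) for c in nested):
            yield component.get("id"), ancestor_titles
        yield from iter_lowest_components(
            component, ancestor_titles + (title_of(component),))


# Hypothetical sample with containers at two different depths.
SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9"><archdesc><dsc>
<c id="ref1"><did><unittitle>Series 1</unittitle></did>
  <c id="ref2"><did><unittitle>A-Z Files</unittitle></did>
    <c id="ref3"><did><unittitle>A</unittitle></did>
      <c id="ref4"><did><unittitle>Folder 1</unittitle><container type="box">1</container></did></c>
    </c>
    <c id="ref5"><did><unittitle>Folder 2</unittitle><container type="box">2</container></did></c>
  </c>
</c>
</dsc></archdesc></ead>"""

dsc = ET.fromstring(SAMPLE).find(".//ead:dsc", NS)
for refid, titles in iter_lowest_components(dsc):
    print("{}|{}".format(refid, " > ".join(titles)))
```

The nested-descendant check is quadratic in the worst case, which should be acceptable for a single finding aid; a full XPath engine would do the same selection in one expression.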
Code referenced above:
- MassDigi-tools/unit_projects/JPC_Archive_Digitization/ASpace_to_Osprey/get_aspace_refids.py, line 98 in d433e87
- MassDigi-tools/unit_projects/JPC_Archive_Digitization/systems_tests/aspace_refid_test.py, lines 23 to 70 in d433e87