About the series:
The East Carolina University Pirates are engaged in a large-scale migration project to evaluate, prep, and load data from several dispersed databases into ArchivesSpace. Over the next two years, ECU will share the journey from careening data to weighing anchor to sailing into production. By regularly posting progress, ECU aims to show you, the ArchivesSpace community, that you can do it, too!
Authors: Kelly Spring and Michael Reece
Pirates aren’t normally known for their housekeeping skills. Cataloging pirates, however, are an entirely different story. In their quarters, name and subject authorities are kept as neat and tidy as possible. At ECU, authorities were added to finding aids by Special Collections Cataloging via a database external to Archivists’ Toolkit. Given our move to ArchivesSpace for archival metadata with a simultaneous transition to Fedora for digital objects, the migration team needed to make some decisions about future workflow, guidelines, and storage of agents and subjects.
Initially, the team wasn’t sure if authorities would be duplicated across Fedora and ArchivesSpace, and, if so, which would be the database of record. But we’re not in the practice of raising a red flag on our own crew. Instead, we decided to create one central repository for authorities that will service both the digital objects in Fedora and the archival collections in ArchivesSpace. This repository is being developed in Fedora and will use linked data to connect with information on the Web and increase visibility of our resources by making them readable and accessible by machines. In other words, it’ll be as if we’ve gone from dancing a jig on our own vessel to participating in a pirate flash mob!
To prepare us for our dancing debut, our Lead Programmer began by pulling subjects from the local database and organizing them in a spreadsheet by type and number of substrings. He then downloaded the Library of Congress Subject Headings (LCSH) dataset and created a trimmed-down version to make loading and searching easier. Much like a game of cribbage, he created a console application to pull the local subjects from the spreadsheet and search for a match in the trimmed-down authorities file. Unlike cribbage, if a match was found, the associated LCSH URI was added to the spreadsheet. If no match was found and there were multiple substrings, the last one was removed, and the new, shorter value was searched. Play proceeded through a succession of hands until all parts of the subject were searched, and, in the end, any unmatched subjects were considered locally sourced.
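For the curious, the matching loop described above can be sketched in a few lines of Python. This is only a rough illustration, not the Lead Programmer's console application: the function name `find_lcsh_uri`, the `--` subdivision separator, the dictionary standing in for the trimmed-down authorities file, and the sample headings are all assumptions made for the example.

```python
from __future__ import annotations

# Assumed delimiter between substrings; the post does not say which one is used.
SEPARATOR = "--"


def find_lcsh_uri(subject: str, authorities: dict[str, str]) -> tuple[str | None, str | None]:
    """Search for the full subject, then drop the last substring and retry.

    Returns (matched_heading, lcsh_uri), or (None, None) when nothing matches,
    in which case the subject would be treated as locally sourced.
    """
    parts = [part.strip() for part in subject.split(SEPARATOR)]
    while parts:
        candidate = SEPARATOR.join(parts)
        uri = authorities.get(candidate)
        if uri is not None:
            return candidate, uri
        parts.pop()  # remove the last substring and search the shorter value
    return None, None


# Hypothetical stand-in for the trimmed-down LCSH file (heading -> URI).
sample_authorities = {
    "Pirates": "http://id.loc.gov/authorities/subjects/sh85102432",
}

matched = find_lcsh_uri("Pirates--Spanish colonies", sample_authorities)
unmatched = find_lcsh_uri("Shipboard bake sales", sample_authorities)
```

Here `matched` holds the heading "Pirates" with its LCSH URI, because the full string was not in the sample file but its first substring was, while `unmatched` comes back empty and would be flagged as a local subject.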
The finished subject mapping went to Special Collections Cataloging for review. A similar set of steps was completed for name authorities. The review of the subject and name mappings is still underway. Once it is complete, the subjects will be added to Fedora, resulting in a newly minted, locally authoritative URL for each subject.
Curious how the data is structured? For all you linked data pirates out there, the relevant predicates in the triple are:
- rdfs:label, used for the full authoritative string value of the subject
- owl:sameAs, used to hold an LCSH URI, if it exists in the spreadsheet
- rdf:list, used to hold an ordered list of hashed URIs that each reference rdfs:label/owl:sameAs pairs that hold the substring values and any corresponding LCSH URIs.
In landlubber speak, an rdfs:label might be “Pirates.” An owl:sameAs link would point to the authority in the Library of Congress database, in this case, http://id.loc.gov/authorities/subjects/sh85102432. And the rdf:list would hold a list of references to subdivisions of “Pirates” such as “Pirates–Spanish colonies” or “Pirates–Social life and customs.”
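To make that shape a little more concrete, here is a loose Python sketch that lays out one subject as plain (subject, predicate, object) tuples. It is not the actual Fedora model: the `http://example.org/...` base URI, the `#part1`-style hash fragments, the `--` separator, and the function name `subject_triples` are all invented for illustration, and the predicate names are simply copied from the list above.

```python
from __future__ import annotations

RDFS_LABEL = "rdfs:label"
OWL_SAME_AS = "owl:sameAs"
RDF_LIST = "rdf:list"  # predicate name as given in the post


def subject_triples(
    base_uri: str,
    label: str,
    lcsh_uri: str | None,
    substrings: list[tuple[str, str | None]],
) -> list[tuple]:
    """Build illustrative triples for one subject and its ordered substrings."""
    triples: list[tuple] = [(base_uri, RDFS_LABEL, label)]
    if lcsh_uri:
        triples.append((base_uri, OWL_SAME_AS, lcsh_uri))

    hashed_uris = []
    for position, (sub_label, sub_uri) in enumerate(substrings, start=1):
        node = f"{base_uri}#part{position}"  # hypothetical hashed URI scheme
        hashed_uris.append(node)
        triples.append((node, RDFS_LABEL, sub_label))
        if sub_uri:
            triples.append((node, OWL_SAME_AS, sub_uri))

    if hashed_uris:
        # A tuple keeps the substring references in their original order.
        triples.append((base_uri, RDF_LIST, tuple(hashed_uris)))
    return triples


example = subject_triples(
    "http://example.org/subjects/1",  # hypothetical locally minted URL
    "Pirates--Spanish colonies",
    None,  # assume no LCSH match was recorded for the full string
    [
        ("Pirates", "http://id.loc.gov/authorities/subjects/sh85102432"),
        ("Spanish colonies", None),
    ],
)
```

In this made-up example the full heading gets a label but no owl:sameAs link, the "Pirates" substring carries both a label and the LCSH URI, and the final triple lists the two hashed URIs in order.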
Blisterin’ barnacles, with this new structure to our data, how will the cataloging team create, edit, and assign authorities? The Lead Programmer is building a web application on top of Fedora that will manage our authorities and integrate with ArchivesSpace. The application will communicate with Fedora to facilitate the creation, editing, searching, and export of authority data and will communicate with ArchivesSpace to facilitate the creation of links between authorities and resources. The authoritative values for names and subjects will flow from Fedora to ArchivesSpace, but not vice versa. This lets catalogers maintain a shipshape authoritative list of names and subjects, while archivists can still maintain contact information and other data not normally found in an authority.