Contributed by: Kelly Spring, Access Archivist, East Carolina University
Wouldn’t it be nice to have turn-by-turn directions to those vexing errors you just know are lurking in your legacy data? Sometimes charting exactly how to identify messy data can feel like working with an astrolabe to triangulate a position. But pirate jokes aside, it is possible to define a method for moving forward.
At ECU, our initial approach is to test migrating everything we have in Archivists’ Toolkit (AT) to ArchivesSpace (AS). Obvious, right? However, because our container lists and authorities live in databases outside of AT, we’ll also run two additional rounds of tests: one that pushes authorities into AT and then migrates them to AS, and another that adds container lists to AT and then migrates them to AS.
During test migrations, the programmer will keep a captain’s log of anything that fails, giving us a list of the data that could seriously capsize our ship. The rest of the crew will split into three subgroups, one for each of our repositories, to pinpoint further errors. We’ll identify mapping discrepancies with the ol’ view-in-source vs. view-in-target method. The subgroups will also record style and content errors, but only after consulting our archival description guidelines and explicitly defining what to look for. Using a handy template provided by the Orbis Cascade Alliance, the subgroups will note each problem along with its priority, extent, and clean-up strategy.
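For anyone who wants to keep a log like this in a script rather than a spreadsheet, here is a minimal Python sketch of one clean-up log row built around the four elements named above (problem, priority, extent, clean-up strategy), plus a helper that sorts issues so the high-priority ones surface first. The field names, the three-level priority scale, and the example rows are all hypothetical and invented for illustration; they are not taken from the Orbis Cascade template itself.

```python
from dataclasses import dataclass

# Hypothetical priority scale; the real template may use different values.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class CleanupIssue:
    """One row of a clean-up log (illustrative field names only)."""
    repository: str
    problem: str
    priority: str
    extent: str
    cleanup_strategy: str


def sort_by_priority(issues):
    """Return issues ordered high -> low priority, then by repository name."""
    return sorted(issues, key=lambda i: (PRIORITY_ORDER[i.priority], i.repository))


# Invented example rows, not real ECU findings.
issues = [
    CleanupIssue("Repository B", "Dates entered as free text", "low",
                 "(to be measured)", "Normalize dates in bulk"),
    CleanupIssue("Repository A", "Container list missing in target", "high",
                 "(to be measured)", "Compare view-in-source vs. view-in-target"),
]

for issue in sort_by_priority(issues):
    print(issue.priority, issue.repository, issue.problem)
```

Sorting on a (priority, repository) tuple keeps each subgroup's items together within a priority level, which makes it easier to hand work back to the right crew.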
Sounds like smooth sailing! But what about shifts in the wind or turbulent seas? In other words, how will we catch the data that falls through the cracks? Let’s say our sea monster of a container list database simply won’t go into AT. In that case, our migration team would test importing EADs extracted from our .NET web system directly into AS to find errors, would run the Harvard EAD Checker and Yale Schematron over those files, or both. What about an authority entanglement? For that, we would evaluate a node export from AT and, if necessary, use a tool like OpenRefine to reconcile names and subjects against LCNAF and LCSH.
Our team has a few more resources to define before we heave down. Soon, though, we’ll lift our eye patches and begin looking for those pesky barnacles that need cleaning.
*Special thanks to Orbis Cascade, Harvard, Yale, and OpenRefine for making your resources freely available!
Kelly Spring is the Project Manager of the ArchivesSpace Migration at East Carolina University.
About Arrr-chivesSpace Migration: The East Carolina University Pirates are engaged in a large-scale migration project to evaluate, prep, and load data from several dispersed databases into ArchivesSpace. Over the next two years, ECU will share the journey from careening data to weighing anchor to sailing into production. By posting progress regularly, ECU aims to show you, the ArchivesSpace community, that you can do it, too!
User Insights is a blog series that highlights diverse perspectives and experiences of ArchivesSpace users to enrich our entire community through shared stories, strategies, and lessons learned. This series aims to provide insight to the archivists, librarians, information technologists, developers, and so many other contributors using ArchivesSpace to preserve permanently valuable records and provide access to our researchers.