ArchivesSpace is a new web application for managing archival collections. It has a browser-based interface for entering and editing metadata, and can import data serialized as EAD, MARC, and several other formats. But there may be situations where neither of these is quite what you want. For instance, you may have a large folder of images that each need a digital object record; or you may want to export an EAD for every collection in your repository; or calculate the total extent of your collection; or execute a global search and replace; or batch-update barcodes, etc. You could write a plugin using ArchivesSpace’s plugin API, but that requires facility with Ruby as well as access to the environment where the application is running. A more lightweight approach is to access your data through ArchivesSpace’s powerful REST API, and process it using whatever scripting language you prefer. This talk will present some simple “scriptaloging” solutions that a moderately skilled programmer can use to automate data entry or import tasks using an extendable command line tool written in NodeJS (https://www.npmjs.org/package/as-cli) and loosely inspired by Drupal’s drush utility.
== Consuming Big Linked Open Data in Practice: Authority Shifts and Identifier Drift ==
* Kathryn Stine, katstine@berkeley.edu, UC Berkeley
* Stephanie Collett, stephanie.collett@ucop.edu, California Digital Library, UC
Increasingly, authoritative datasets of interest to libraries (subjects, names, classifications, etc.) are available in bulk, exposed as linked open data. Unfettered access can allow libraries to aggregate, connect, and augment data in new ways that will benefit users. This talk will describe our exploratory experience integrating bulk data from the Virtual International Authority File (VIAF) into HathiTrust metadata to improve discovery and collection management.
Authoritative data is not static: datasets change with new contributions and re-clustering, resulting in new identifier relationships. We will describe the challenges this presents for accessing, processing, and syncing our metadata with a massive, complex linked dataset. We will talk about our technical approach to navigating an ecosystem of identifiers and mitigating cached identifier drift between systems as authority data shifts. We aim to spark conversation about data accessibility and the relationships between local, consortial, and authoritative metadata as the library community moves beyond “Hello, world” linked data examples to integrating this data at scale into existing systems.