Difference between revisions of "HAMR: Human/Authority Metadata Reconciliation"

From Code4Lib

Revision as of 20:23, 7 February 2011

HAMR: Human/Authority Metadata Reconciliation

Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle

HAMR is a tool that helps a curator determine whether the fields of a metadata record are correct. It takes a metadata record and locates any identifiers (e.g., DOI, PMID), then retrieves a copy of the record from an authoritative source (e.g., CrossRef, PubMed). It displays a human-readable page that compares the fields of the initial record with those of the authoritative record. Each field is color-coded by how well it matches, so the curator can quickly identify discrepancies.
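The "locates any identifiers" step could be sketched roughly as below. This is only an illustration, not the project's code: the function name `find_identifiers`, the record shape (a dict mapping field names to lists of strings), both regular expressions, and the sample DOI and PMID values are all assumptions made for the example.

```python
import re

# Assumed DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
# Real-world DOIs can be messier than this pattern allows.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# Assumes a PMID is always labeled with "PMID" in the record text.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def find_identifiers(record):
    """Scan every field value and collect any DOIs and PMIDs found."""
    found = {"doi": [], "pmid": []}
    for values in record.values():
        for value in values:
            for m in DOI_RE.finditer(value):
                doi = m.group(0).rstrip(".,;)")  # drop trailing punctuation
                if doi not in found["doi"]:
                    found["doi"].append(doi)
            for m in PMID_RE.finditer(value):
                if m.group(1) not in found["pmid"]:
                    found["pmid"].append(m.group(1))
    return found


# Hypothetical input record with placeholder identifiers.
record = {
    "title": ["Great American Article"],
    "identifier": ["doi:10.1234/example.5678", "PMID: 12345678"],
}
ids = find_identifiers(record)
```

Scanning every field, rather than only an identifier field, is a deliberate choice here: identifiers often end up in description or relation fields in hand-entered records.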

Narrowing the focus for today:

  • Dublin Core (maybe qualified)
  • framework that allows multiple authority sources
  • NOT focusing on author names (ORCID is already working on this); they are treated as plain strings and get only basic string matching
  • 1 to 1 matching. Even if you want to eventually match with multiple authorities, you'd only do one at a time
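One way the "framework that allows multiple authority sources" with one-at-a-time matching might look is sketched below. Every name in it (`AuthoritySource`, `StubCrossRef`, `REGISTRY`, `register`, `fetch_from`) is hypothetical, and the stub returns canned data instead of calling a real service.

```python
from abc import ABC, abstractmethod


class AuthoritySource(ABC):
    """Hypothetical plugin interface: one subclass per authority."""

    name = "unnamed"
    id_type = None  # identifier type this authority resolves, e.g. "doi"

    @abstractmethod
    def fetch(self, identifier):
        """Return the authority's record as {dc_field: [values]}."""


class StubCrossRef(AuthoritySource):
    """Stand-in plugin; returns canned data rather than querying CrossRef."""

    name = "CrossRef"
    id_type = "doi"

    def fetch(self, identifier):
        return {"title": ["Great American Article, The"]}


REGISTRY = {}


def register(source):
    """Make a plugin instance available under its authority name."""
    REGISTRY[source.name] = source


def fetch_from(authority_name, identifiers):
    """Query exactly one authority per call (1-to-1 matching)."""
    source = REGISTRY[authority_name]
    identifier = identifiers.get(source.id_type)
    if identifier is None:
        return None  # the record has no identifier this authority understands
    return source.fetch(identifier)


register(StubCrossRef())
authority_record = fetch_from("CrossRef", {"doi": "10.1234/example.5678"})
```

Matching against several authorities would then be a loop over `fetch_from` calls, each producing its own comparison.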

Possible authority sources:

Thoughts / Questions:

  • Is there a way to do most/all of this via JavaScript/AJAX/jQuery? Could it be a simple JavaScript framework you could "drop" into any metadata editing interface?


Output Spec

  • We will use a simple XML output consisting of paired (and possibly unpaired) values.
  • The root element will contain an attribute signifying the source of the authority metadata.
  • The <match> element will be used to pair values, with a strength attribute to signify how closely the two strings match (derived from the string distance).
  • Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.
  • A <nonmatch> element will be used for unpaired values.
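The spec above could be produced with Python's standard `xml.etree.ElementTree`, roughly as follows. The function name `build_output` and the tuple shapes it accepts are assumptions for illustration, as is the choice to wrap each unpaired value in its own nonmatch element (the spec does not say whether several unpaired values may share one).

```python
import xml.etree.ElementTree as ET


def build_output(authority, matches, nonmatches):
    """Build the output tree.

    matches:    list of (field, input_value, authority_value, strength 0-100)
    nonmatches: list of (field, src, value), src is "input" or "authority"
    """
    root = ET.Element("hamr", {"authority": authority})
    for field, local, remote, strength in matches:
        match = ET.SubElement(root, "match", {"strength": f"{strength}%"})
        # Exactly two children per match: the local value, then the authority's.
        ET.SubElement(match, field, {"src": "input"}).text = local
        ET.SubElement(match, field, {"src": "authority"}).text = remote
    for field, src, value in nonmatches:
        # Assumption: one nonmatch wrapper per unpaired value.
        nonmatch = ET.SubElement(root, "nonmatch")
        ET.SubElement(nonmatch, field, {"src": src}).text = value
    return root


root = build_output(
    "PubMed",
    matches=[
        ("creator", "Trojan, Tommy", "Trojan, Tommy", 100),
        ("title", "Great American Article", "Great American Article, The", 90),
    ],
    nonmatches=[
        ("subject", "input", "Medical Stuff"),
        ("type", "authority", "text"),
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
```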

Sample Output

<hamr authority="PubMed">
    <match strength="100%">
        <creator src="input">Trojan, Tommy</creator>
        <creator src="authority">Trojan, Tommy</creator>
    </match>
    <match strength="90%">
        <title src="input">Great American Article</title>
        <title src="authority">Great American Article, The</title>
    </match>
    <nonmatch>
        <subject src="input">Medical Stuff</subject>
    </nonmatch>
    <nonmatch>
        <type src="authority">text</type>
    </nonmatch>
</hamr>

Need to do

  1. Implement metadata retrieval from authority (done for CrossRef in Ryan's code)
  2. Design structure of plugins
  3. Design matching algorithm
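As a starting point for the matching algorithm, basic string matching could be sketched with the standard library's `difflib.SequenceMatcher`. This is one possible approach, not a settled design: the function names `strength` and `reconcile`, the case folding, the greedy best-match pairing within each field, and the 50% threshold are all assumptions.

```python
from difflib import SequenceMatcher


def strength(a, b):
    """Similarity as a whole-number percentage, ignoring case."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def reconcile(local, remote, threshold=50):
    """Pair values field by field; leftovers on either side become nonmatches.

    local and remote are {field: [values]} dicts. The threshold is an
    arbitrary cut-off below which two values are not considered a pair.
    """
    matches, nonmatches = [], []
    for field in sorted(set(local) | set(remote)):
        remaining = list(remote.get(field, []))
        for value in local.get(field, []):
            # Greedy choice: the closest authority value still unpaired.
            best = max(remaining, key=lambda r: strength(value, r), default=None)
            if best is not None and strength(value, best) >= threshold:
                matches.append((field, value, best, strength(value, best)))
                remaining.remove(best)
            else:
                nonmatches.append((field, "input", value))
        nonmatches.extend((field, "authority", r) for r in remaining)
    return matches, nonmatches


local = {
    "creator": ["Trojan, Tommy"],
    "title": ["Great American Article"],
    "subject": ["Medical Stuff"],
}
remote = {
    "creator": ["Trojan, Tommy"],
    "title": ["Great American Article, The"],
    "type": ["text"],
}
matches, nonmatches = reconcile(local, remote)
```

Only values under the same field name are compared, so a subject on one side and a type on the other both end up unpaired. A plugin for a particular authority might need a field-mapping step before this one.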