Difference between revisions of "HAMR: Human/Authority Metadata Reconciliation"

From Code4Lib

Revision as of 19:47, 7 February 2011

HAMR: Human/Authority Metadata Reconciliation

Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle

HAMR is a tool that helps a curator determine whether the fields of a metadata record are correct. It takes a metadata record, locates any identifiers in it (e.g., DOI, PMID), and retrieves a copy of the record from an authoritative source (e.g., CrossRef, PubMed). It then displays a human-readable page that compares the fields of the initial record with those of the authoritative record. Each field is color-coded according to how well it matches, so the curator can quickly identify discrepancies.
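
The identifier-location step could be sketched as follows. This is only an illustration: the record layout (a dict mapping field names to lists of strings), the function name `find_identifiers`, and both regular expressions are assumptions, not part of the HAMR code.

```python
import re

# Hypothetical patterns; real records may write identifiers differently.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)", re.IGNORECASE)
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def find_identifiers(record):
    """Scan every value of a record (field name -> list of strings) for DOIs and PMIDs."""
    found = {"doi": [], "pmid": []}
    for values in record.values():
        for value in values:
            found["doi"].extend(DOI_PATTERN.findall(value))
            found["pmid"].extend(PMID_PATTERN.findall(value))
    return found
```

Any identifier found this way would then decide which authority to query (a DOI for CrossRef, a PMID for PubMed); the retrieval call itself is left out because it depends on each authority's API.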


Narrowing the focus for today:

  • Dublin Core (possibly qualified)
  • A framework that allows multiple authority sources
  • NOT focusing on author names (ORCID is already working on this); we will treat them as plain strings and apply basic string matching
  • One-to-one matching: even if you eventually want to match against multiple authorities, you would compare against only one at a time

Possible authority sources:

Thoughts / Questions:

  • Is there a way to do most/all of this via JavaScript/AJAX/jQuery? Could it be a simple JavaScript framework you could "drop" into any metadata editing interface?

Code

Output Spec

  • We will use a simple XML output consisting of paired (and possibly unpaired) values.
  • The root element will contain an attribute signifying the source of the authority metadata.
  • The <match> element will be used to pair values, with a strength attribute expressing how closely the two strings match, as a percentage derived from string distance.
  • Within each match element will be exactly two metadata elements, each with a src attribute signifying the source of its value: either the local input or the remote authority data.
  • A <nonmatch> element will be used for unpaired values.
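
As a rough illustration of the spec above, the output could be assembled with Python's standard `xml.etree.ElementTree` module. The function name `build_hamr` and the tuple layouts of its arguments are assumptions made for this sketch, not part of the spec.

```python
import xml.etree.ElementTree as ET


def build_hamr(authority, matches, nonmatches):
    """Build a <hamr> tree.

    matches:    list of (field, input_value, authority_value, strength_percent)
    nonmatches: list of (field, src, value), where src is "input" or "authority"
    (Both layouts are hypothetical, chosen only for this sketch.)
    """
    root = ET.Element("hamr", authority=authority)
    for field, input_value, authority_value, strength in matches:
        match = ET.SubElement(root, "match", strength=f"{strength}%")
        # Exactly two metadata elements per match, one per source.
        ET.SubElement(match, field, src="input").text = input_value
        ET.SubElement(match, field, src="authority").text = authority_value
    for field, src, value in nonmatches:
        nonmatch = ET.SubElement(root, "nonmatch")
        ET.SubElement(nonmatch, field, src=src).text = value
    return root
```

Calling `ET.tostring(build_hamr(...), encoding="unicode")` serializes the tree to a string; ElementTree also escapes any special characters in the values.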

Sample Output

<hamr authority="PubMed">
    <match strength="100%">
        <creator src="input">Trojan, Tommy</creator>
        <creator src="authority">Trojan, Tommy</creator>
    </match>
    <match strength="90%">
        <title src="input">Great American Article</title>
        <title src="authority">Great American Article, The</title>
    </match>
    <nonmatch>
        <subject src="input">Medical Stuff</subject>
    </nonmatch>
    <nonmatch>
        <type src="authority">text</type>
    </nonmatch>
</hamr>
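
A display layer could read this output and pick a color for each row of the comparison page. The sketch below is one possible approach; the thresholds, the color names, and the function names `color_for` and `summarize` are all assumptions.

```python
import xml.etree.ElementTree as ET


def color_for(strength):
    """Map a match strength (percent) to a display color. Thresholds are hypothetical."""
    if strength >= 100:
        return "green"
    if strength >= 80:
        return "yellow"
    return "red"


def summarize(xml_text):
    """Return one (field, input_value, authority_value, color) row per match or nonmatch."""
    rows = []
    for child in ET.fromstring(xml_text):
        if child.tag == "match":
            strength = float(child.get("strength").rstrip("%"))
            input_el = child.find("*[@src='input']")
            authority_el = child.find("*[@src='authority']")
            rows.append((input_el.tag, input_el.text, authority_el.text, color_for(strength)))
        elif child.tag == "nonmatch":
            value = child[0]  # a nonmatch holds a single unpaired value
            from_input = value.get("src") == "input"
            rows.append((value.tag,
                         value.text if from_input else None,
                         None if from_input else value.text,
                         "gray"))
    return rows
```

With these thresholds, the sample output gives a green row for creator, a yellow row for title, and gray rows for the unpaired subject and type values.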

Need to do

  1. Create basic code framework
  2. Implement metadata retrieval from authority
  3. Design matching algorithm
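
For item 3, one starting point is the basic string matching mentioned in the focus list. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; that choice, the greedy pairing strategy, the 50% threshold, and the record layout (field name to list of strings) are assumptions for illustration, not a settled design.

```python
from difflib import SequenceMatcher


def strength(a, b):
    """Similarity of two strings as a whole-number percentage (100 = identical, ignoring case)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def reconcile(input_record, authority_record, threshold=50):
    """Pair values field by field; values with no partner above the threshold stay unpaired."""
    matches, nonmatches = [], []
    for field in sorted(set(input_record) | set(authority_record)):
        remaining = list(authority_record.get(field, []))
        for value in input_record.get(field, []):
            # Greedy choice: take the closest authority value not yet paired.
            best = max(remaining, key=lambda candidate: strength(value, candidate), default=None)
            if best is not None and strength(value, best) >= threshold:
                remaining.remove(best)
                matches.append((field, value, best, strength(value, best)))
            else:
                nonmatches.append((field, "input", value))
        nonmatches.extend((field, "authority", value) for value in remaining)
    return matches, nonmatches
```

Greedy pairing is simple but order-dependent when a field has several values; a proper assignment step could replace it later without changing the output shape.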