Difference between revisions of "HAMR: Human/Authority Metadata Reconciliation"
From Code4Lib
Revision as of 20:18, 7 February 2011
HAMR: Human/Authority Metadata Reconciliation
Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle
HAMR is a tool that helps a curator determine whether the various fields of a metadata record are correct. It takes a metadata record, locates any identifiers (e.g., DOI, PMID), and retrieves a copy of the metadata record from an authoritative source (e.g., CrossRef, PubMed). It then displays a human-readable page that compares fields in the initial record with fields in the authoritative record. Each field is color-coded based on how well it matches, so the curator can quickly identify discrepancies.
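As a rough illustration of the "locates any identifiers" step, here is a minimal Python sketch. The regular expressions, the `find_identifiers` name, and the `{field: [values]}` record shape are all assumptions made for this example, not part of the HAMR design; in particular, it assumes PMIDs appear with a "PMID" label.

```python
import re

# Assumed pattern: a DOI starts with "10.", a numeric registrant code, a slash, then a suffix.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")
# Assumed pattern: PMIDs are labelled, e.g. "PMID: 12345" or "pmid:12345".
PMID_RE = re.compile(r"\bPMID:?\s*(\d+)\b", re.IGNORECASE)


def find_identifiers(record):
    """Scan every value of a {field: [values]} record for DOIs and PMIDs."""
    found = {"doi": [], "pmid": []}
    for values in record.values():
        for value in values:
            for doi in DOI_RE.findall(value):
                doi = doi.rstrip(".,;)")  # drop trailing punctuation picked up from prose
                if doi not in found["doi"]:
                    found["doi"].append(doi)
            for pmid in PMID_RE.findall(value):
                if pmid not in found["pmid"]:
                    found["pmid"].append(pmid)
    return found
```

For example, a record whose `identifier` field holds `doi:10.1000/xyz123` and `PMID: 12345678` would yield one DOI and one PMID; other fields are scanned the same way.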
Narrowing the focus for today:
- Dublin Core (maybe qualified)
- framework that allows multiple authority sources
- NOT focusing on author names (ORCID is already working on this); we'll treat them as plain strings and do only basic string matching
- 1 to 1 matching. Even if you want to eventually match with multiple authorities, you'd only do one at a time
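One possible shape for a framework that allows multiple authority sources while still matching against only one at a time is sketched below in Python. The class names (`AuthoritySource`, `StaticSource`), the `id_type` attribute, and `reconcile_one` are hypothetical names invented for this sketch; the in-memory source stands in for a real PubMed or CrossRef plugin.

```python
from abc import ABC, abstractmethod


class AuthoritySource(ABC):
    """Hypothetical plugin interface: one subclass per authority."""

    name = "unnamed"
    id_type = None  # which identifier this source understands, e.g. "doi" or "pmid"

    @abstractmethod
    def fetch(self, identifier):
        """Return the authority record as {field: [values]}, or None if not found."""


class StaticSource(AuthoritySource):
    """In-memory stand-in for a real authority, useful for trying out the framework."""

    def __init__(self, name, id_type, records):
        self.name = name
        self.id_type = id_type
        self._records = records

    def fetch(self, identifier):
        return self._records.get(identifier)


def reconcile_one(identifiers, sources):
    """Return (source name, record) from the first source that answers: 1 to 1 matching."""
    for source in sources:
        identifier = identifiers.get(source.id_type)
        if identifier is None:
            continue  # this source cannot use any identifier we have
        record = source.fetch(identifier)
        if record is not None:
            return source.name, record
    return None, None
```

Because only the first answering source is used, matching against several authorities would simply mean calling `reconcile_one` once per source.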
Possible authority sources:
- PubMed
  - Sample PubMed query (in Java): DSpace PubMedPrefillStep.java (from Populate Metadata from PubMed)
    - See 'retrievePubmedXML()' in the above Java code for the actual call to PubMed
    - Mapping happens here: see pmid-to-dim.xsl for a sample XSLT crosswalk that translates PubMed format to qualified Dublin Core (the internal DSpace metadata format)
  - More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/
- CrossRef
  - Simply send the DOI to CrossRef, and get JSON/XML back
  - Metadata Search: send a text query, receive a list of matching records
  - OpenURL search
- Google Scholar - does it have an API?
- Mendeley - Mendeley API
- VIVO
- BibApp
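To make the PubMed and CrossRef lookups above a little more concrete, the Python sketch below only builds request URLs and performs no network calls. The two endpoints (NCBI's E-utilities `efetch.fcgi` and the CrossRef REST API `works` route) are assumptions about the current public services rather than anything taken from these notes, so check each provider's documentation before relying on them; the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed endpoints; neither URL comes from the notes above.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
CROSSREF_WORKS = "https://api.crossref.org/works/"


def pubmed_url(pmid):
    """Build a URL asking PubMed for one record, by PMID, as XML."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    return EFETCH + "?" + urlencode(params)


def crossref_url(doi):
    """Build a URL asking CrossRef for the metadata of one DOI."""
    # quote() leaves "/" unescaped by default, so the DOI keeps its prefix/suffix shape.
    return CROSSREF_WORKS + quote(doi)
```

An authority plugin could fetch these URLs with any HTTP client and then crosswalk the response into Dublin Core fields, much as pmid-to-dim.xsl does for DSpace.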
Thoughts / Questions:
- Is there a way to do most/all of this via JavaScript/AJAX/jQuery? Could it be a simple JavaScript framework you could "drop" into any metadata editing interface?
Code
Output Spec
- We will use a simple XML output consisting of paired (and possibly unpaired) values.
- The root element will contain an attribute signifying the source of the authority metadata.
- The <match> element will be used to pair values, with a strength attribute to signify the string distance.
- Within each match element will be exactly 2 metadata elements with attributes signifying the source of each value: either the local input or the remote authority data.
- A <nonmatch> element will be used for unpaired values.
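The spec above could be serialized along the following lines. This is only a sketch using Python's standard `xml.etree.ElementTree`; the `build_output` function and the tuple layouts it accepts are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_output(authority, matches, nonmatches):
    """Serialize comparison results in the HAMR output format.

    matches:    iterable of (field, local_value, authority_value, strength_percent)
    nonmatches: iterable of (field, src, value), where src is "input" or "authority"
    """
    root = ET.Element("hamr", {"authority": authority})
    for field, local, remote, strength in matches:
        match = ET.SubElement(root, "match", {"strength": f"{strength}%"})
        # Exactly two children per <match>: the local value, then the authority value.
        ET.SubElement(match, field, {"src": "input"}).text = local
        ET.SubElement(match, field, {"src": "authority"}).text = remote
    for field, src, value in nonmatches:
        nonmatch = ET.SubElement(root, "nonmatch")
        ET.SubElement(nonmatch, field, {"src": src}).text = value
    return ET.tostring(root, encoding="unicode")
```

Building the tree with an XML library rather than string concatenation means values containing `&` or `<` are escaped automatically.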
Sample Output
<hamr authority="PubMed">
  <match strength="100%">
    <creator src="input">Trojan, Tommy</creator>
    <creator src="authority">Trojan, Tommy</creator>
  </match>
  <match strength="90%">
    <title src="input">Great American Article</title>
    <title src="authority">Great American Article, The</title>
  </match>
  <nonmatch>
    <subject src="input">Medical Stuff</subject>
  </nonmatch>
  <nonmatch>
    <type src="authority">text</type>
  </nonmatch>
</hamr>
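The matching algorithm itself is still to be designed, so the following Python sketch is just one simple possibility: it uses the standard library's `difflib.SequenceMatcher` as a stand-in string-similarity measure and assumes each field holds a single string value. The `strength` and `compare` names are hypothetical.

```python
from difflib import SequenceMatcher


def strength(a, b):
    """Case-insensitive similarity of two strings as a whole-number percentage."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round(ratio * 100)


def compare(local, remote):
    """Pair fields present in both records; list fields present in only one.

    Both records are assumed to be {field: value} with one string per field.
    """
    matches, nonmatches = [], []
    for field in sorted(set(local) | set(remote)):
        if field in local and field in remote:
            score = strength(local[field], remote[field])
            matches.append((field, local[field], remote[field], score))
        elif field in local:
            nonmatches.append((field, "input", local[field]))
        else:
            nonmatches.append((field, "authority", remote[field]))
    return matches, nonmatches
```

Identical values score 100, while a pair such as "Great American Article" and "Great American Article, The" scores somewhat lower, which is the kind of gradient the color coding would rely on. A real implementation would likely also need to handle repeated fields and normalization such as leading articles.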
Need to do
- Create basic code framework
- Implement metadata retrieval from authority
- Design structure of plugins
- Design matching algorithm