
HAMR: Human/Authority Metadata Reconciliation

774 bytes added, 20:29, 9 March 2012
[[HAMR: Human/Authority Metadata Reconciliation]]
Initial design/prototype by: Sean Chen, Tim Donohue, Joshua Gomez, Ranti Junus, Ryan Scherle
A tool that helps a curator determine whether the fields of a metadata record are correct. HAMR takes a metadata record and locates any identifiers in it (e.g., DOI, PMID), then retrieves a copy of the record from an authoritative source (e.g., CrossRef, PubMed). It displays a human-readable page that compares the fields of the initial record with the fields of the authoritative record. Each field is color-coded by how well it matches, so the curator can quickly identify discrepancies.
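The "locates any identifiers" step could be sketched as below. This is only an illustration in Python, not the prototype's code: the record layout (a dict of field name to list of values), the function name <code>find_identifiers</code>, and both regular expressions are assumptions. In particular, it assumes PMIDs appear with an explicit "PMID" label.

```python
import re

# Both patterns are illustrative assumptions, not HAMR's actual rules.
# DOI: "10.", a 4-9 digit registrant prefix, "/", then a suffix with no whitespace.
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')
# PMID: assumed to carry an explicit label, e.g. "PMID: 12345678".
PMID_PATTERN = re.compile(r'\bPMID:?\s*(\d+)\b', re.IGNORECASE)


def find_identifiers(record):
    """Scan every value in a record (dict: field name -> list of strings)
    and return the DOIs and PMIDs found, in order of first appearance."""
    found = {"doi": [], "pmid": []}
    for values in record.values():
        for value in values:
            for doi in DOI_PATTERN.findall(value):
                doi = doi.rstrip(".,;)")  # drop punctuation trailing the DOI
                if doi not in found["doi"]:
                    found["doi"].append(doi)
            for pmid in PMID_PATTERN.findall(value):
                if pmid not in found["pmid"]:
                    found["pmid"].append(pmid)
    return found


# Made-up sample record, for illustration only.
sample = {
    "dc.title": ["An example title"],
    "dc.identifier": ["doi:10.1000/example.123", "PMID: 12345678"],
}
identifiers = find_identifiers(sample)
```

Scanning every field, rather than only an identifier field, is a design choice here: curated records often carry a DOI inside a citation or description string.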
== UI Prototype (uses static data) ==
http://dl.dropbox.com/u/9074989/code4lib/unverified.html
 
== Basic design ==
Narrowing the focus for an initial usable version:
* Dublin Core (maybe qualified)
* framework that allows multiple authority sources
*** Mapping happens here: See [https://wiki.duraspace.org/display/DSPACE/PubMedPrefill-pmid+dim.xsl pmid-to-dim.xsl] for a sample XSLT crosswalk that translates PubMed format into Qualified Dublin Core (the internal DSpace metadata format)
** More examples of querying PubMed: http://www.my-whiteboard.com/how-to-automate-pubmed-search-using-perl-php-or-java/
** Useful tool for finding PubMed IDs: http://www.ncbi.nlm.nih.gov/entrez/getids.cgi
* CrossRef
** Simply send the DOI to CrossRef and get JSON/XML back
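A rough Python sketch of the CrossRef lookup follows. It assumes the present-day public CrossRef REST endpoint (<code>https://api.crossref.org/works/</code> followed by the DOI), which may not be the service the original prototype targeted. The function names and the choice of output fields (<code>title</code>, <code>creator</code>, <code>identifier</code>) are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the current public CrossRef REST API, not necessarily
# the service used by the original prototype.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the lookup URL for one DOI, percent-encoding unusual characters."""
    return CROSSREF_WORKS_URL + quote(doi, safe="/")


def parse_crossref(payload):
    """Pull a few Dublin Core-like fields out of a CrossRef JSON body (str)."""
    message = json.loads(payload)["message"]
    authors = [
        ", ".join(part for part in (a.get("family"), a.get("given")) if part)
        for a in message.get("author", [])
    ]
    return {
        "title": message.get("title", []),
        "creator": authors,
        "identifier": [message.get("DOI", "")],
    }


def fetch_authority_record(doi):
    """Network call: fetch and parse the authoritative record for a DOI."""
    with urlopen(crossref_url(doi)) as response:
        return parse_crossref(response.read().decode("utf-8"))


# Offline example with a made-up response body in the CrossRef JSON shape.
canned = json.dumps({
    "status": "ok",
    "message": {
        "DOI": "10.1000/example.123",
        "title": ["An example title"],
        "author": [{"given": "Arnold", "family": "Benson"}],
    },
})
authority = parse_crossref(canned)
```

Keeping the parsing separate from the network call means other authority sources could plug into the same framework by supplying their own parse function.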
<pre>
function compareRecords(localDubCore, authDubCore)
  recordMatches = []
  for each element-type:
    loc = array of local values
    // 0 value="Benson, Arnold", match="", strength=""
    // 1 value="Terrence, D.", match="a2[3]", strength="100%"
    elementMatches = compareElements(loc, auth)
    recordMatches.add(elementMatches)
    if strength > loc element's current strength value
      overwrite loc element's strength and match values
  // this set of non-nested loops pulls out the strongest matches
  for each item in auth
    // x = some arbitrary threshold for a decent enough match
    if element strength > x AND matching element is still in the loc list
      pop each element and add their values to output array
  for each item in loc
    if element strength > x AND matching element is still in the auth list
      pop each element and add their values to output array
  // now do cleanup and look for values that have no decent matches
  for each element in loc
    pop element and add to output array without match
    // x loc="Heyward", auth="", strength=""
  for each element in auth
    pop element and add to output array without match
    // x loc="", auth="Perry", strength=""
  return output
</pre>
</hamr>
</pre>
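The compareRecords pseudocode could be turned into something runnable along these lines. This is a minimal Python sketch under stated assumptions, not the prototype's implementation: match strength is computed with the standard library's <code>difflib.SequenceMatcher</code>, the threshold value 0.8 stands in for "x", and the pseudocode's two passes over auth and loc are replaced by a single greedy pass that accepts the strongest pairs first.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # the pseudocode's "x"; the value is an arbitrary assumption


def strength(a, b):
    """Similarity between two strings, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_elements(loc, auth, threshold=THRESHOLD):
    """Pair the local and authority values of one element type.

    Returns (local, authority, strength) tuples; values with no decent
    match are listed with None on the other side."""
    # Score every local/authority pair, strongest first.
    scored = sorted(
        ((strength(l, a), i, j)
         for i, l in enumerate(loc) for j, a in enumerate(auth)),
        key=lambda item: -item[0],
    )
    used_loc, used_auth, output = set(), set(), []
    for score, i, j in scored:
        if score <= threshold:
            break  # everything after this is weaker still
        if i in used_loc or j in used_auth:
            continue  # one side was already claimed by a stronger match
        used_loc.add(i)
        used_auth.add(j)
        output.append((loc[i], auth[j], score))
    # Cleanup: values that have no decent match.
    output += [(l, None, None) for i, l in enumerate(loc) if i not in used_loc]
    output += [(None, a, None) for j, a in enumerate(auth) if j not in used_auth]
    return output


def compare_records(local_dc, auth_dc):
    """Compare two records (dict: element type -> list of values)."""
    return {
        element: compare_elements(local_dc.get(element, []),
                                  auth_dc.get(element, []))
        for element in sorted(set(local_dc) | set(auth_dc))
    }


result = compare_records(
    {"creator": ["Benson, Arnold", "Heyward"]},
    {"creator": ["Benson, Arnold", "Perry"]},
)
```

In this example "Benson, Arnold" pairs with itself at full strength, while "Heyward" and "Perry" fall below the threshold and are reported as unmatched on their respective sides, mirroring the cleanup comments in the pseudocode.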
 
== Need to do ==