Working with MARC

Revision as of 21:44, 7 April 2010 by 128.36.155.91 (Talk) (MARC Programming Libraries)

MARC stands for Machine-Readable Cataloging, and many folks in the code4lib community find themselves working with MARC records at some point. This page is meant to be a round-up of the tools for working with MARC. If you want a general introduction to the standard, the Wikipedia article is a good place to start.

Desktop tools

MarcEdit http://people.oregonstate.edu/~reeset/marcedit/html/index.php

Getting MARC Indexed for Search Engines

MARC in Solr

SolrMarc http://code.google.com/p/solrmarc/

Solr http://lucene.apache.org/solr

MARC in Zebra

Getting Started with Zebra http://wiki.code4lib.org/index.php/Getting_Started_with_Zebra

Zebra http://www.indexdata.com/zebra

MARC Programming Libraries

Project | Language | Links | Notes
MARC4J | Java | http://marc4j.tigris.org/ |
javamarc | Java | http://github.com/billdueber/javamarc | Fork of MARC4J
MARC/pm | Perl | http://marcpm.sf.net | Umbrella project; see also CPAN
pymarc | Python | http://github.com/edsu/pymarc/ |
File_MARC | PHP | http://pear.php.net/package/File_MARC/ | PEAR package; fork of PHP-MARC
PHP-MARC | PHP | http://www.emilda.org/index.php?q=php-marc | Abandoned(?)
ruby-marc | Ruby | http://rubyforge.org/projects/marc/ and http://wiki.code4lib.org/index.php/Ruby-marc |
enhanced-marc | Ruby | http://github.com/rsinger/enhanced-marc | Convenience methods for ruby-marc
marc21 | Scheme | http://code.google.com/p/marc21 |
marcerl | Erlang | svn://pubserv.oclc.org/marcerl | Very alpha code
Scala-MARC | Scala | http://github.com/achelous/Scala-MARC |
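
As a rough illustration of what the libraries above handle for you, here is a minimal sketch of decoding one binary MARC 21 (ISO 2709) record using only the Python standard library. The names parse_record and build_record are hypothetical, the sample record is made up, and the sketch assumes UTF-8 data and omits the error handling that real-world records need. For actual work, one of the libraries listed above is the better choice.

```python
# Sketch only: a bare-bones reader for the MARC 21 binary transmission format.
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12


def build_record(fields):
    """Hypothetical helper: assemble a tiny record from (tag, body) pairs."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base_address = LEADER_LENGTH + len(directory) + 1  # +1: directory terminator
    total_length = base_address + len(data) + 1        # +1: record terminator
    # Leader: length (0-4), status/type codes (5-11), base address (12-16), etc.
    leader = f"{total_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def parse_record(raw):
    """Decode one record into its leader and a list of fields."""
    leader = raw[:LEADER_LENGTH].decode("ascii")
    base_address = int(leader[12:17])
    # The directory sits between the leader and the base address,
    # minus the single field terminator that closes it.
    directory = raw[LEADER_LENGTH:base_address - 1]
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        data = raw[begin:begin + length - 1]  # drop the trailing field terminator
        if tag.startswith("00"):
            # Control fields (001-009) have no indicators or subfields.
            fields.append((tag, data.decode("utf-8")))
        else:
            indicators = data[:2].decode("utf-8")
            chunks = data[2:].split(SUBFIELD_DELIMITER)[1:]
            subfields = [(c[:1].decode("utf-8"), c[1:].decode("utf-8")) for c in chunks]
            fields.append((tag, indicators, subfields))
    return {"leader": leader, "fields": fields}


# Made-up sample data, used only to show the round trip.
sample = build_record([
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aWorking with MARC"
            + SUBFIELD_DELIMITER + b"ba sketch"),
])
record = parse_record(sample)
print(record["leader"])
for field in record["fields"]:
    print(field)
```

Data fields come back as (tag, indicators, subfields) tuples and control fields as (tag, value) pairs; a real library would wrap these in richer objects and cope with MARC-8 encoding and malformed directories.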

Getting Sample Data

One common question is where to get sample MARC records for testing or experimenting. If you work at a library, chances are good that you can get some records out of your integrated library system (ILS); ask your systems librarian if you don't know how to do this yourself. If you don't work in a library, you can get MARC bibliographic records from the Internet Archive at http://www.archive.org/details/marcrecords.
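
Sample files in the binary MARC format are usually many records concatenated end to end. The following is a minimal sketch, using only the Python standard library, of walking such a file record by record: each record begins with its own length as five ASCII digits and ends with a record terminator byte. The function name iter_raw_records and the file name sample.mrc are hypothetical, and the fake records in the demo are placeholders rather than valid MARC.

```python
import io

RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24


def iter_raw_records(stream):
    """Yield each raw record (as bytes) from a binary stream of MARC records."""
    while True:
        length_field = stream.read(5)
        if not length_field:
            return  # clean end of file
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError(f"bad record length field: {length_field!r}")
        length = int(length_field)
        if length < LEADER_LENGTH:
            raise ValueError(f"record length {length} is shorter than a leader")
        record = length_field + stream.read(length - 5)
        # A truncated or mis-declared record shows up here.
        if len(record) != length or not record.endswith(RECORD_TERMINATOR):
            raise ValueError("record is truncated or its declared length is wrong")
        yield record


def fake_record(length):
    """Hypothetical stand-in: correct length prefix and terminator, blank body."""
    return f"{length:05d}".encode("ascii") + b" " * (length - 6) + RECORD_TERMINATOR


# Demo on in-memory data so the sketch runs without a downloaded file.
demo_stream = io.BytesIO(fake_record(30) + fake_record(40))
sizes = [len(raw) for raw in iter_raw_records(demo_stream)]
print(len(sizes), "records with sizes", sizes)

# With a real download (file name is an assumption):
# with open("sample.mrc", "rb") as handle:
#     count = sum(1 for _ in iter_raw_records(handle))
```

Counting records this way is a quick sanity check on a download before handing the file to a full MARC library for parsing.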

There is a nascent movement within the code4lib community to establish a test set of problematic MARC records, especially records that are representative of the kinds of weirdness encountered in real libraries. The hope is that this could eventually become a test corpus against which to run various MARC processing implementations. For more information, watch Simon Spero's excellent talk from Code4LibCon 2010.

MARC records for authority data are more commonly available. The Getty Vocabularies make both the Art & Architecture Thesaurus (AAT) and the Union List of Artist Names (ULAN) freely available. The Guidelines On Subject Access To Individual Works Of Fiction, Drama, Etc. records are available from Northwestern University. The Medical Subject Headings (MeSH) are available in many formats, one of them being MARC.