Difference between revisions of "Working with MARC"

From Code4Lib
(MARC Programming Libraries)
(Updated link.)
 
(12 intermediate revisions by 7 users not shown)
Line 9: Line 9:
  * [http://www.bl.uk/bibliographic/ukmarc.html British Library UKMARC Pages]
  * [http://archive.ifla.org/VI/3/p1996-1/sec-uni.htm IFLA UNIMARC Pages]
-
+ * [http://www.oclc.org/content/bibformats/en.html OCLC MARC Pages]

  == Desktop tools ==

- * [http://people.oregonstate.edu/~reeset/marcedit/html/index.php MarcEdit]
+ * [http://marcedit.reeset.net/ MarcEdit]
+
+ * [http://csharpmarc.net/ C# MARC Editor]: is a simple and light weight MARC Editor for Windows

  * [http://www.auto-graphics.com/download/SHOWMARC.EXE Showmarc]: is a DOS program that will show all the MARC fields used and how many times each is used.
Line 26: Line 28:

  * [http://www.bl.uk/bibliographic/usemarcon.html USEMARCON] is a multi-platform rule-based MARC record manipulation program. It is a command-line utility but there is also a GUI for it.

  == Getting Marc Indexed for Search Engines ==
Line 32: Line 33:
  === MARC in Solr ===

- * SolrMarc http://code.google.com/p/solrmarc/
+ * SolrMarc https://github.com/solrmarc/solrmarc

  * Solr http://lucene.apache.org/solr
+
+ * Catmandu http://librecat.org (provides also loading into ElasticSearch, MongoDB and others)

  === MARC in Zebra ===
Line 77: Line 80:
  | MARC.NET || C# || http://github.com/willkurt/MARC.NET || basic start, not thoroughly 'real world' tested
  |-valign="top"
- | marc_record.js || JavaScript || http://www.pusc.it/bib/mel/marc_record.js || Part of [http://www.pusc.it/bib/mel/ MARC Editor Lite]
+ | marc_record.js || JavaScript || http://www.pusc.it/bib/mel/marc_record.js (dead link) || Part of [http://www.pusc.it/bib/mel/ MARC Editor Lite] (dead link)
+ |-valign="top"
+ | marcjs || JavaScript (node) || https://github.com/fredericd/marcjs ||
  |-valign="top"
  | USEMARCON || C++ || http://www.nationallibrary.fi/libraries/format/usemarcon.html || A rule-based MARC record conversion library
  |-valign="top"
  | clj-marc || Clojure || http://github.com/phochste/clj-marc || Basic MARC21 and Aleph500 sequential export parser
+ |-valign="top"
+ | MARC4J.Net || C# || https://github.com/mxurshid/MARC4J.Net || https://www.nuget.org/packages/MARC4J.Net
+ |-valian="top"
+ | marc4js || JavaScript (Node.js) || https://github.com/jiaola/marc4js || Read/transform/write records with Node stream api. Handles MARC8 and UTF8.
  |}

Line 92: Line 101:
  |-valign="top"
  | MarcXimiL || Python || http://marcximil.sourceforge.net/ || Bibliographic Similarity Analysis Framework
+ |-valign="top"
+ | Catmandu || Perl || http://librecat.org || An ETL-framework to extract, transform and load MARC (and other formats) from/to various databases, indexes
  |}

  == Getting Sample Data ==
Line 104: Line 114:

  MARC records for authority data are more common. The [http://www.getty.edu/research/conducting_research/vocabularies/download.html Getty Vocabularies] makes both the The Art & Architecture Thesaurus (AAT) and The Union List of Artist Names (ULAN) freely available. The [http://www.library.northwestern.edu/public/gsafd/ Guidelines On Subject Access To Individual Works Of Fiction, Drama, Etc.] records are available from Northwestern University. The [http://www.nlm.nih.gov/mesh/filelist.html Medical Subject Headings (MeSH)] are available in many formats, one of them being MARC.
+
+ == Reporting on How MARC Has Been Used ==
+
+ [http://experimental.worldcat.org/marcusage/ MARC Usage in WorldCat] - A site that reports on how MARC has been used within the 300 million record WorldCat database

Latest revision as of 14:07, 16 December 2016

MARC stands for Machine Readable Cataloging, and many folks in the code4lib community find themselves working with MARC records at some point. This page is meant to be a round-up of the tools for working with MARC. If you want a general introduction to the standard, the Wikipedia article is a good place to start. MARC data is usually expressed either in ISO 2709 ("binary") form or MARCXML form.
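
To make the ISO 2709 ("binary") form a little more concrete, here is a minimal Python sketch of how such a record is laid out: a 24-byte leader, a directory of 12-byte entries, and then the field data. It is only an illustration under stated assumptions: records are well formed, text is UTF-8 rather than MARC-8, and the names `parse_record`, `subfields` and `build_record` are made up for this example and do not come from any library listed on this page. For real work, one of the libraries in the table further down is the better choice.

```python
# Sketch of the ISO 2709 layout. Assumes well-formed, UTF-8 encoded records.
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-byte tag + 4-byte field length + 5-byte offset


def parse_record(raw: bytes) -> dict:
    """Split one ISO 2709 record into its leader and a list of (tag, data) pairs."""
    leader = raw[:LEADER_LENGTH].decode("ascii")
    base_address = int(leader[12:17])  # where the field data starts
    # The directory sits between the leader and the base address and
    # ends with a single field terminator, which is skipped here.
    directory = raw[LEADER_LENGTH:base_address - 1]
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        fields.append((tag, raw[begin:begin + length - 1]))  # drop the terminator
    return {"leader": leader, "fields": fields}


def subfields(data: bytes) -> list:
    """Return (code, value) pairs for a data field (tags 010 and above)."""
    chunks = data.split(SUBFIELD_DELIMITER)[1:]  # chunk 0 holds the two indicators
    return [(c[:1].decode("utf-8"), c[1:].decode("utf-8")) for c in chunks]


def build_record(fields: list) -> bytes:
    """Serialize (tag, data) pairs into ISO 2709 bytes (demo helper, hypothetical)."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    total = base_address + len(body) + len(RECORD_TERMINATOR)
    leader = f"{total:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


if __name__ == "__main__":
    record = build_record([
        ("001", b"12345"),
        ("245", b"10\x1faWorking with MARC\x1fcCode4Lib"),
    ])
    for tag, data in parse_record(record)["fields"]:
        print(tag, data if tag < "010" else subfields(data))
```

The same logical record in MARCXML would carry the leader, control fields and data fields as XML elements instead of byte offsets, which is why most libraries can read and write both forms.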


Reference information

  • British Library UKMARC Pages: http://www.bl.uk/bibliographic/ukmarc.html
  • IFLA UNIMARC Pages: http://archive.ifla.org/VI/3/p1996-1/sec-uni.htm
  • OCLC MARC Pages: http://www.oclc.org/content/bibformats/en.html

Desktop tools

  • MarcEdit: http://marcedit.reeset.net/
  • C# MARC Editor is a simple, lightweight MARC editor for Windows: http://csharpmarc.net/
  • Showmarc is a DOS program that reports every MARC field used and how many times each is used.
  • MARC Record Translation Program (MARC RTP) is a command-line utility that shows the fields and subfields used in a collection of MARC records, then converts the records and selectively imports them into databases built with general-purpose applications.
  • The FRBR Display Tool takes a file of MARC records and creates XML and HTML files arranged according to the Functional Requirements for Bibliographic Records principles.
  • MarcXGen is a MARC URL extractor and HTML generator, useful for link-checking MARC records.
  • MARCMaker and MARCBreaker are DOS programs by the Library of Congress for converting MARC records to a text format and back.
  • USEMARCON is a multi-platform, rule-based MARC record manipulation program. It is a command-line utility, but a GUI is also available.
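
The kind of report Showmarc is described as producing (which fields are used, and how often) can be approximated in a few lines of Python. The sketch below is not Showmarc's implementation; it simply walks the directory of each ISO 2709 record and counts tags. It assumes well-formed records, and `iter_records`, `count_tags` and `report` are illustrative names only.

```python
from collections import Counter

LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-byte tag + 4-byte field length + 5-byte offset


def iter_records(data: bytes):
    """Yield each raw record, using the record length in leader bytes 0-4."""
    pos = 0
    while pos + LEADER_LENGTH <= len(data):
        record_length = int(data[pos:pos + 5])
        if record_length < LEADER_LENGTH:
            break  # malformed length; stop rather than loop forever
        yield data[pos:pos + record_length]
        pos += record_length


def count_tags(data: bytes) -> Counter:
    """Count how many times each field tag occurs across all records."""
    counts = Counter()
    for raw in iter_records(data):
        base_address = int(raw[12:17])
        # The directory ends with a field terminator just before the base address.
        directory = raw[LEADER_LENGTH:base_address - 1]
        for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
            counts[directory[i:i + 3].decode("ascii")] += 1
    return counts


def report(counts: Counter) -> str:
    """Format the counts as tab-separated 'tag, occurrences' lines, sorted by tag."""
    return "\n".join(f"{tag}\t{n}" for tag, n in sorted(counts.items()))
```

A typical use would be `print(report(count_tags(open("records.mrc", "rb").read())))`, where `records.mrc` stands in for whatever file of binary MARC records you have at hand.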

Getting MARC Indexed for Search Engines

MARC in Solr

  • SolrMarc: https://github.com/solrmarc/solrmarc
  • Solr: http://lucene.apache.org/solr
  • Catmandu: http://librecat.org (also loads into ElasticSearch, MongoDB and others)

MARC in Zebra


MARC Programming Libraries

Project || Language || Links || Notes
MARC4J || Java || http://marc4j.tigris.org/ ||
javamarc || Java || http://github.com/billdueber/javamarc || Fork of MARC4J
MARC/Perl || Perl || http://marcpm.sf.net || Umbrella project; see also CPAN
pymarc || Python || http://github.com/edsu/pymarc/ ||
File_MARC || PHP || http://pear.php.net/package/File_MARC/ || PEAR package; sanctioned fork of PHP-MARC
PHP-MARC || PHP || http://www.emilda.org/index.php?q=php-marc || Abandoned(?); served as basis for File_MARC
ruby-marc || Ruby || http://rubyforge.org/projects/marc/ and http://wiki.code4lib.org/index.php/Ruby-marc ||
enhanced-marc || Ruby || http://github.com/rsinger/enhanced-marc || Convenience methods for ruby-marc
marc21 || Scheme || http://code.google.com/p/marc21 ||
marcerl || Erlang || svn://pubserv.oclc.org/marcerl || Very alpha code
Scala-MARC || Scala || http://github.com/achelous/Scala-MARC ||
MARC Library (SobekCM) || C# || http://sourceforge.net/projects/marclibrary/ || Implemented in .NET 4.0 with LINQ and streams, with Z39.50 support
CSharp MARC || C# || http://csharpmarc.net || Based on the File_MARC PEAR package for PHP, restyled for use in .NET
MARC.NET || C# || http://github.com/willkurt/MARC.NET || Basic start; not thoroughly 'real world' tested
marc_record.js || JavaScript || http://www.pusc.it/bib/mel/marc_record.js (dead link) || Part of MARC Editor Lite (dead link)
marcjs || JavaScript (node) || https://github.com/fredericd/marcjs ||
USEMARCON || C++ || http://www.nationallibrary.fi/libraries/format/usemarcon.html || A rule-based MARC record conversion library
clj-marc || Clojure || http://github.com/phochste/clj-marc || Basic MARC21 and Aleph500 sequential export parser
MARC4J.Net || C# || https://github.com/mxurshid/MARC4J.Net || NuGet package: https://www.nuget.org/packages/MARC4J.Net
marc4js || JavaScript (Node.js) || https://github.com/jiaola/marc4js || Reads, transforms and writes records with the Node stream API. Handles MARC-8 and UTF-8.

A feed of commit messages and release announcements from many of the projects listed above can be found at http://pipes.yahoo.com/gmcharlt/marctoolchanges.

Utilities and Frameworks

Project || Language || Links || Notes
MarcXimiL || Python || http://marcximil.sourceforge.net/ || Bibliographic Similarity Analysis Framework
Catmandu || Perl || http://librecat.org || An ETL framework to extract, transform and load MARC (and other formats) from and to various databases and indexes

Getting Sample Data

One common question is where to get sample MARC records for testing or playing around with. If you work at a library, chances are good that you can get some records out of your ILS (go ask your systems librarian if you don't know how to do this yourself). If you don't work in a library, you can get MARC bibliographic records from the Internet Archive.

You can also get MARCXML data for titles in HathiTrust through OAI-PMH.
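
Once you have MARCXML in hand, whether from an OAI-PMH response or a file, pulling values out of it needs nothing beyond the Python standard library. The sketch below is an illustration only: it does not contact HathiTrust or any other service, the sample record is invented, and `titles` is a made-up helper name. The one fixed point is the MARCXML namespace, `http://www.loc.gov/MARC21/slim`.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Invented sample data, standing in for records harvested elsewhere.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">12345</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Working with MARC</subfield>
    </datafield>
  </record>
</collection>"""


def titles(marcxml: str) -> list:
    """Return the 245 $a value(s) of every MARCXML record found in the document."""
    root = ET.fromstring(marcxml)
    found = []
    # iter() finds <record> elements at any depth, so this also works when the
    # records are wrapped in other elements, as in an OAI-PMH response.
    for record in root.iter(f"{{{MARC_URI}}}record"):
        path = "marc:datafield[@tag='245']/marc:subfield[@code='a']"
        for subfield in record.findall(path, MARC_NS):
            found.append(subfield.text or "")
    return found


if __name__ == "__main__":
    print(titles(SAMPLE))
```

Changing the tag and subfield code in `path` is enough to extract other fields, for example tag 856 with code `u` for URLs.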

There is a nascent movement within the code4lib community to establish a test set of problematic MARC records, especially records that are representative of the kinds of weirdness encountered in real libraries. The hope is that this could eventually become a test corpus against which to run various MARC processing implementations. For more information, watch Simon Spero's excellent talk from Code4LibCon 2010.

MARC records for authority data are more common. The Getty Vocabularies make both the Art & Architecture Thesaurus (AAT) and the Union List of Artist Names (ULAN) freely available. The Guidelines On Subject Access To Individual Works Of Fiction, Drama, Etc. records are available from Northwestern University. The Medical Subject Headings (MeSH) are available in many formats, one of them being MARC.

Reporting on How MARC Has Been Used

MARC Usage in WorldCat - A site that reports on how MARC has been used within the 300-million-record WorldCat database.