Difference between revisions of "Parsing Library Data"

From Code4Lib
(Titles)
m (Library of Congress Call Number)
 
(2 intermediate revisions by 2 users not shown)
Line 32: Line 32:
  MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd050.html 050]
+ * Learn more
+ ** [http://www.loc.gov/catworkshop/courses/fundamentalslcc/ Fundamentals of Library of Congress Classification] workshop materials
  * Normalization
  ** [http://code.google.com/p/library-callnumber-lc/wiki/Home library-callnumber-lc Project Wiki Page]
- ** [http://rocky.uta.edu/doran/sortlc/ sortLC, code for sorting LCCs]
+ ** [http://rocky.uta.edu/doran/sortlc/ sortLC, code for normalizing and sorting LCCs]
+ ** [http://homepages.wmich.edu/~zimmer/other_index.html cnparse.lib, normalizes for sorting]

  ==Personal Names==
Line 64: Line 66:
  ** [http://journal.code4lib.org/articles/1758 Deciphering Journal Abbreviations with JAbbr]
  * 245 Indicator 2
- **[http://wiki.code4lib.org/index.php/245_indicator_2 Parsing titles to determine number of nonfiling characters]
+ **[[245_indicator_2|Parsing titles to determine number of nonfiling characters]]

  ==Subject Headings==

Latest revision as of 15:43, 6 April 2011

The legacy data that libraries must deal with is often challenging to parse algorithmically. MARC is just the first layer: once you peel that back, you find an elaborate mishmash of elements, each with its own idiosyncrasies. This page is a place for the Code4Lib community to track and share information, problems, methodologies, code, pseudo-code, etc. about the nuts-and-bolts parsing of legacy library data.

Identifiers

Library of Congress Control Number

MARC 21 Field(s): 001 003 010
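
As a starting point, here is a minimal sketch of LCCN normalization in Python. It follows the steps the Library of Congress publishes for the info:lccn namespace (drop blanks, drop everything from the first slash, zero-pad the serial after a hyphen to six digits). The function name `normalize_lccn` is made up for this page, and the sketch assumes the value has already been pulled out of the MARC field.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN string (hypothetical helper, not from any library)."""
    # Step 1: remove all whitespace.
    s = "".join(raw.split())
    # Step 2: drop a forward slash and everything to its right.
    s = s.split("/", 1)[0]
    # Step 3: if there is a hyphen, remove it and left-pad the serial
    # number that follows to six digits.
    if "-" in s:
        prefix, serial = s.split("-", 1)
        if not (serial.isascii() and serial.isdigit()) or len(serial) > 6:
            raise ValueError(f"unexpected LCCN serial part: {serial!r}")
        s = prefix + serial.zfill(6)
    return s
```

For example, `normalize_lccn("85-2 ")` gives `"85000002"`. The sketch does not check that the result has a plausible overall shape (alphabetic prefix, year, serial); real data may need that extra validation.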


OCLC Control Number

MARC 21 Field(s): 001 003 035
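
OCLC numbers show up with several decorations: a "(OCoLC)" prefix in 035$a, and "ocm", "ocn", or "on" prefixes with zero padding in 001. The sketch below reduces these to the bare number. The function name `normalize_oclc` is an assumption for illustration, and the caller is assumed to have already checked that the value really is an OCLC number (for 001, that 003 says OCoLC).

```python
import re
from typing import Optional

# Optional "(OCoLC)" prefix, optional ocm/ocn/on prefix, leading zeros,
# then the significant digits.
OCLC_RE = re.compile(
    r"\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*0*([1-9][0-9]*)\s*",
    re.IGNORECASE,
)

def normalize_oclc(raw: str) -> Optional[str]:
    """Return the bare OCLC number as a string, or None if it does not match."""
    m = OCLC_RE.fullmatch(raw)
    return m.group(1) if m else None
```

Values with another agency's prefix, such as "(DLC)12345", come back as `None` rather than being guessed at.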


ISBN

MARC 21 Field(s): 020
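
The 020$a subfield often carries qualifiers after the number, for example "0306406152 (pbk.)". A rough sketch of pulling out the ISBN, checking its check digit, and converting ISBN-10 to ISBN-13 follows. All function names here are invented for this example, and the extraction regex is a simplifying assumption that the ISBN is the first token in the subfield.

```python
import re
from typing import Optional

# Assumption: the ISBN is the first token; digits and hyphens, may end in X.
ISBN_TOKEN = re.compile(r"\s*([0-9][0-9-]*[0-9Xx])")

def extract_isbn(subfield_a: str) -> Optional[str]:
    """Return the leading ISBN with hyphens removed, or None."""
    m = ISBN_TOKEN.match(subfield_a)
    if not m:
        return None
    candidate = m.group(1).replace("-", "").upper()
    return candidate if len(candidate) in (10, 13) else None

def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1, X counts as 10, total must be divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0

def is_valid_isbn13(isbn: str) -> bool:
    """Alternating weights 1 and 3, total must be divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0

def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, compute the new one."""
    core = "978" + isbn10[:9]
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

Converting both forms to ISBN-13 before comparing is one way to match records that cite the same book with different ISBN lengths.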


ISSN

MARC 21 Field(s): 022
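
An ISSN is eight characters, usually written NNNN-NNNC, where the last character is a modulus-11 check digit (X stands for 10). The sketch below validates one; the name `is_valid_issn` is hypothetical, and the input is assumed to be the bare ISSN with or without its hyphen.

```python
import re

def is_valid_issn(raw: str) -> bool:
    """Check the shape and the mod-11 check digit of an ISSN."""
    m = re.fullmatch(r"\s*([0-9]{4})-?([0-9]{3})([0-9Xx])\s*", raw)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # The first seven digits are weighted 8, 7, 6, 5, 4, 3, 2.
    total = sum(int(ch) * w for ch, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return m.group(3).upper() == expected
```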


Dewey Decimal Call Number

MARC 21 Field(s): 082


Library of Congress Call Number

MARC 21 Field(s): 050
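
LC call numbers do not sort correctly as plain strings, because the class number is a whole number ("QA9" should file before "QA76") while the decimal and cutter parts sort as decimals. The following is only a rough sketch of building a sort key; it is not the algorithm used by any existing normalization project. The pattern and the name `lc_sort_key` are assumptions, and call numbers that do not start with one to three letters followed by a number are simply not handled.

```python
import re
from typing import Optional, Tuple

LC_PATTERN = re.compile(
    r"""\s*
    (?P<letters>[A-Z]{1,3})\s*      # class letters, e.g. QA
    (?P<number>[0-9]{1,4})          # whole class number, e.g. 76
    (?:\.(?P<decimal>[0-9]+))?      # optional decimal extension, e.g. 73
    \s*(?P<rest>.*?)\s*             # cutters, dates, everything else
    """,
    re.VERBOSE | re.IGNORECASE,
)

def lc_sort_key(call_number: str) -> Optional[Tuple[str, int, str, str]]:
    """Return a tuple that sorts in shelf order, or None if unparseable."""
    m = LC_PATTERN.fullmatch(call_number)
    if not m:
        return None
    # Replace punctuation with spaces and collapse runs of whitespace.
    rest = re.sub(r"[^A-Z0-9]+", " ", m.group("rest").upper()).strip()
    return (
        m.group("letters").upper(),
        int(m.group("number")),       # compared numerically
        m.group("decimal") or "",     # compared as a decimal string
        rest,
    )
```

With this key, `sorted(["QA76.9 .C65", "QA9 .B3", "QA76.73 .P98"], key=lc_sort_key)` puts QA9 first, then QA76.73, then QA76.9. Because unparseable values return `None`, filter those out before sorting.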

Personal Names

MARC 21 Field(s): (Name Headings) 100 600 700 800; also X00 - Personal Names-General Information; (Uncontrolled Names) 245$c 505$r 511 720


Corporate Names

MARC 21 Field(s): 110 610 710 810; Also X10 - Corporate Names-General Information


Titles

MARC 21 Field(s): (Transcribed Titles) 245 505$t; (Alternate Titles) 210 222 242 246 247; (Uniform Titles) 130 240 243 630 730 830; also X30 - Uniform Titles-General Information
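
The second indicator of field 245 records how many leading characters (an initial article plus its trailing space) to skip when filing the title. When that indicator is missing or unreliable, one possible approach is to guess it from the title text. The sketch below does that with a deliberately tiny, made-up article table; the names `ARTICLES` and `nonfiling_count` are hypothetical, and a real list would need many more languages.

```python
# Hypothetical, incomplete article table keyed by MARC language code.
ARTICLES = {
    "eng": ("the", "an", "a"),
    "fre": ("les", "une", "le", "la", "un", "l'"),
}

def nonfiling_count(title: str, lang: str = "eng") -> int:
    """Guess the number of nonfiling characters at the start of a title."""
    lowered = title.lower()
    # Try longer articles first so "an " wins over "a ".
    for article in sorted(ARTICLES.get(lang, ()), key=len, reverse=True):
        if article.endswith("'"):
            # Elided articles attach directly to the next word.
            if lowered.startswith(article):
                return len(article)
        elif lowered.startswith(article + " "):
            return len(article) + 1   # count the trailing space too
    return 0
```

For "The Great Gatsby" this returns 4, so `title[4:]` is the filing form. A guess like this will be wrong for titles such as "A is for apple", where the first word is not an article, so it is best treated as a suggestion to review rather than a correction to apply blindly.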

Subject Headings

MARC 21 Field(s): 600 610 611 630 648 650 651 653 654 655 656 657 658 662 690-699

URLs

MARC 21 Field(s): 856