Parsing Library Data

5,448 bytes added, 20:50, 31 March 2011
The legacy data that libraries must deal with is often challenging to parse algorithmically. MARC is just the first layer: once you peel that back, you find an elaborate mishmash of elements, each with its own idiosyncrasies. This page is meant to serve as a place for the Code4lib community to track and share information, problems, methodologies, code, pseudo-code, etc. about nuts-and-bolts parsing of legacy library data.

==Identifiers==

===Library of Congress Control Number===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd001.html 001] [http://www.loc.gov/marc/bibliographic/bd003.html 003] [http://www.loc.gov/marc/bibliographic/bd010.html 010]
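
As a starting point, here is a minimal sketch of LCCN normalization. It follows the steps the Library of Congress describes for the info:lccn namespace as best understood here: drop blanks, drop everything from the first slash onward, and zero-pad the serial part after a hyphen to six digits. The function name <code>normalize_lccn</code> is made up for illustration, and real 010 data may need more cleanup than this.

```python
def normalize_lccn(raw):
    """Return a normalized LCCN string, or raise ValueError if malformed.

    Sketch only: assumes the input is the text of a single 010 $a value.
    """
    # 1. Remove all blanks.
    value = raw.replace(" ", "")
    # 2. Remove a forward slash and everything to its right (revision info).
    value = value.split("/", 1)[0]
    # 3. If there is a hyphen, remove it and left-pad the serial to 6 digits.
    if "-" in value:
        prefix, serial = value.split("-", 1)
        if not (serial.isascii() and serial.isdigit() and 1 <= len(serial) <= 6):
            raise ValueError("unexpected LCCN serial part: %r" % serial)
        value = prefix + serial.zfill(6)
    return value


if __name__ == "__main__":
    for sample in ["n78-890351", "85-2 ", "75-425165//r75", " 79139101 /AC/r932"]:
        print(repr(sample), "->", normalize_lccn(sample))
```

Raising an exception on a malformed serial part is a design choice; a batch job might prefer to log the value and move on.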


===OCLC Control Number===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd001.html 001] [http://www.loc.gov/marc/bibliographic/bd003.html 003] [http://www.loc.gov/marc/bibliographic/bd035.html 035]
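
A rough sketch of pulling a bare OCLC number out of these fields follows. It assumes the common conventions that 035 $a values are prefixed with <code>(OCoLC)</code>, that 001 values count only when 003 is <code>OCoLC</code>, and that the number may carry an <code>ocm</code>, <code>ocn</code>, or <code>on</code> prefix and leading zeros. The function names are hypothetical, and local practice may differ.

```python
import re

# "(OCoLC)" marker, optional ocm/ocn/on prefix, optional zero padding.
_OCLC_035 = re.compile(
    r"^\(OCoLC\)\s*(?:ocm|ocn|on)?\s*0*([1-9][0-9]*)\s*$", re.IGNORECASE
)
# Same pattern without the parenthesized marker, for 001 values.
_OCLC_001 = re.compile(r"^(?:ocm|ocn|on)?\s*0*([1-9][0-9]*)\s*$", re.IGNORECASE)


def oclc_from_035(subfield_a):
    """Return the OCLC number in an 035 $a value as a digit string, or None."""
    match = _OCLC_035.match(subfield_a.strip())
    return match.group(1) if match else None


def oclc_from_001(control_number, control_id):
    """Return the OCLC number in 001, but only if 003 says it is OCLC's."""
    if control_id.strip().lower() != "ocolc":
        return None
    match = _OCLC_001.match(control_number.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(oclc_from_035("(OCoLC)ocm00012345"))
    print(oclc_from_001("ocn123456789 ", "OCoLC"))
```

Returning the number as a string without leading zeros makes values from 001 and 035 directly comparable.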


===ISBN===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd020.html 020]

* Problems with parsing in MARC
** [http://bibwild.wordpress.com/2011/03/31/why-marc-makes-computers-cry-exhibit-273-issnisbn-z/ Why MARC makes computers cry: Exhibit #273: ISSN/ISBN $z]
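
One basic defensive step is to separate the ISBN from any trailing qualifier and verify its check digit before trusting it. The sketch below assumes subfield text such as <code>0306406152 (pbk.)</code>. The function names <code>extract_isbn</code> and <code>is_valid_isbn</code> are invented for illustration. It does not decide whether a value found in $z is canceled or merely invalid; it only tests the checksum.

```python
import re

# Leading run of digits and hyphens that ends in a digit or X.
_ISBN_TOKEN = re.compile(r"\s*([0-9][0-9-]{8,16}[0-9Xx])")


def extract_isbn(subfield_value):
    """Return the leading ISBN-like token without hyphens, or None."""
    match = _ISBN_TOKEN.match(subfield_value)
    if not match:
        return None
    return match.group(1).replace("-", "").upper()


def is_valid_isbn(isbn):
    """Check the ISBN-10 (mod 11) or ISBN-13 (mod 10) check digit."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        # Weights run 10 down to 1; a final X stands for the value 10.
        total = sum(
            (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn)
        )
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", isbn):
        # Weights alternate 1, 3, 1, 3, ...
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


if __name__ == "__main__":
    for raw in ["0-306-40615-2 (pbk.)", "9780306406157 (hardcover)", "(pbk.)"]:
        isbn = extract_isbn(raw)
        print(repr(raw), "->", isbn, isbn is not None and is_valid_isbn(isbn))
```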


===ISSN===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd022.html 022]

* Problems with parsing in MARC
** [http://bibwild.wordpress.com/2011/03/31/why-marc-makes-computers-cry-exhibit-273-issnisbn-z/ Why MARC makes computers cry: Exhibit #273: ISSN/ISBN $z]
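
The same checksum idea applies to ISSNs. Here is a small sketch that accepts an eight-character ISSN with or without the hyphen, verifies the mod 11 check digit, and returns the canonical <code>NNNN-NNNC</code> form. The function name <code>normalize_issn</code> is hypothetical, and the sketch assumes the subfield holds nothing but the ISSN itself.

```python
import re

_ISSN = re.compile(r"^([0-9]{4})-?([0-9]{3}[0-9Xx])$")


def normalize_issn(value):
    """Return the ISSN as 'NNNN-NNNC' if well formed and valid, else None."""
    match = _ISSN.match(value.strip())
    if not match:
        return None
    chars = (match.group(1) + match.group(2)).upper()
    # Weights run 8 down to 1; a final X stands for the value 10.
    total = sum(
        (8 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(chars)
    )
    if total % 11 != 0:
        return None
    return chars[:4] + "-" + chars[4:]


if __name__ == "__main__":
    for raw in ["03785955", " 1050-124x ", "0378-5954"]:
        print(repr(raw), "->", normalize_issn(raw))
```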


===Dewey Decimal Call Number===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd082.html 082]


===Library of Congress Call Number===
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd050.html 050]

* Normalization
** [http://code.google.com/p/library-callnumber-lc/wiki/Home library-callnumber-lc Project Wiki Page]
** [http://rocky.uta.edu/doran/sortlc/ sortLC, code for sorting LCCs]


==Personal Names==

MARC 21 Field(s): (Name Headings) [http://www.loc.gov/marc/bibliographic/bd100.html 100] [http://www.loc.gov/marc/bibliographic/bd600.html 600] [http://www.loc.gov/marc/bibliographic/bd700.html 700] [http://www.loc.gov/marc/bibliographic/bd800.html 800]; Also [http://www.loc.gov/marc/bibliographic/bdx00.html X00 - Personal Names-General Information]; (Uncontrolled Names) [http://www.loc.gov/marc/bibliographic/bd245.html 245$c] [http://www.loc.gov/marc/bibliographic/bd505.html 505$r] [http://www.loc.gov/marc/bibliographic/bd511.html 511] [http://www.loc.gov/marc/bibliographic/bd720.html 720]

* Parsing name parts
** [http://journal.code4lib.org/articles/2138 Automated Metadata Formatting for Cornell’s Print-on-Demand Books]
* Identifying the name of a person that played a particular role
** [http://journal.code4lib.org/articles/775 Identifying FRBR Work-Level Data in MARC Bibliographic Records for Manifestations of Moving Images]


==Corporate Names==
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd110.html 110] [http://www.loc.gov/marc/bibliographic/bd610.html 610] [http://www.loc.gov/marc/bibliographic/bd710.html 710] [http://www.loc.gov/marc/bibliographic/bd810.html 810]; Also [http://www.loc.gov/marc/bibliographic/bdx10.html X10 - Corporate Names-General Information]

* Distinguishing corporate names from personal names
** [http://journal.code4lib.org/articles/2138 Automated Metadata Formatting for Cornell’s Print-on-Demand Books]


==Titles==

MARC 21 Field(s): (Transcribed Titles) [http://www.loc.gov/marc/bibliographic/bd245.html 245] [http://www.loc.gov/marc/bibliographic/bd505.html 505$t] (Alternate Titles) [http://www.loc.gov/marc/bibliographic/bd210.html 210] [http://www.loc.gov/marc/bibliographic/bd222.html 222] [http://www.loc.gov/marc/bibliographic/bd242.html 242] [http://www.loc.gov/marc/bibliographic/bd246.html 246] [http://www.loc.gov/marc/bibliographic/bd247.html 247] (Uniform Titles) [http://www.loc.gov/marc/bibliographic/bd130.html 130] [http://www.loc.gov/marc/bibliographic/bd240.html 240] [http://www.loc.gov/marc/bibliographic/bd243.html 243] [http://www.loc.gov/marc/bibliographic/bd630.html 630] [http://www.loc.gov/marc/bibliographic/bd730.html 730] [http://www.loc.gov/marc/bibliographic/bd830.html 830]; Also [http://www.loc.gov/marc/bibliographic/bdx30.html X30 - Uniform Titles-General Information]

* Normalization
** [http://journal.code4lib.org/articles/1758 Deciphering Journal Abbreviations with JAbbr]
** [http://journal.code4lib.org/articles/3832 Interpreting MARC: Where's the Bibliographic Data?]
* Matching techniques
** [http://journal.code4lib.org/articles/1758 Deciphering Journal Abbreviations with JAbbr]


==Subject Headings==
MARC 21 Field(s): [http://www.loc.gov/marc/bibliographic/bd600.html 600] [http://www.loc.gov/marc/bibliographic/bd610.html 610] [http://www.loc.gov/marc/bibliographic/bd611.html 611] [http://www.loc.gov/marc/bibliographic/bd630.html 630] [http://www.loc.gov/marc/bibliographic/bd648.html 648] [http://www.loc.gov/marc/bibliographic/bd650.html 650] [http://www.loc.gov/marc/bibliographic/bd651.html 651] [http://www.loc.gov/marc/bibliographic/bd653.html 653] [http://www.loc.gov/marc/bibliographic/bd654.html 654] [http://www.loc.gov/marc/bibliographic/bd655.html 655] [http://www.loc.gov/marc/bibliographic/bd656.html 656] [http://www.loc.gov/marc/bibliographic/bd657.html 657] [http://www.loc.gov/marc/bibliographic/bd658.html 658] [http://www.loc.gov/marc/bibliographic/bd662.html 662] [http://www.loc.gov/marc/bibliographic/bd69x.html 690-699]