Parsing Library Data

From Code4Lib

Revision as of 00:12, 1 April 2011 by Doran (Talk | contribs) (→‎Titles)


The legacy data that libraries must deal with is often challenging to parse algorithmically. MARC is just the first layer: once you peel that back, you find an elaborate mishmash of elements, each with its own idiosyncrasies. This page is meant to serve as a place for the Code4Lib community to track and share information, problems, methodologies, code, pseudo-code, etc. about the nuts-and-bolts parsing of legacy library data.

Contents

1 Identifiers
2 Personal Names
3 Corporate Names
4 Titles
5 Subject Headings
6 URLs

Identifiers

Library of Congress Control Number

MARC 21 Field(s): 001 003 010
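
As a starting point, here is a minimal sketch of LCCN normalization, assuming the steps the Library of Congress describes for normalized LCCNs (remove blanks, drop a forward slash and everything after it, then remove the hyphen and left-pad the part after it to six digits). The function name `normalize_lccn` is made up for illustration, and the sketch does no validation of the result.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN string such as the value of 010 $a.

    Assumption: input is the raw subfield text; no validity check is made.
    """
    value = raw.replace(" ", "")        # 1. remove all blanks
    value = value.split("/", 1)[0]      # 2. drop a forward slash and anything after it
    if "-" in value:                    # 3. remove the hyphen, pad the serial part
        prefix, serial = value.split("-", 1)
        value = prefix + serial.zfill(6)
    return value


if __name__ == "__main__":
    for sample in ["n78-890351", "85-2", " 79139101 /AC/r932"]:
        print(repr(sample), "->", normalize_lccn(sample))
```

A real implementation would probably also want to check that the normalized value has a plausible shape (an optional alphabetic prefix followed by digits) before using it as a match key.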

OCLC Control Number

MARC 21 Field(s): 001 003 035
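
OCLC numbers tend to show up with decoration: a "(OCoLC)" organization code in 035 $a, an "ocm", "ocn", or "on" prefix, and leading zeros. The following is a rough sketch of reducing those variants to the bare number for matching. The function name `normalize_oclc` and the exact pattern are assumptions for illustration, not an established routine.

```python
import re
from typing import Optional

# Optional "(OCoLC)" code, optional ocm/ocn/on prefix, leading zeros, then the number.
OCLC_PATTERN = re.compile(
    r"^\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?0*([1-9][0-9]*)\s*$",
    re.IGNORECASE,
)


def normalize_oclc(value: str) -> Optional[str]:
    """Return the bare OCLC number as a string, or None if the value does not look like one."""
    match = OCLC_PATTERN.match(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in ["(OCoLC)ocm00012345", "ocn123456789", "(DLC)   12345"]:
        print(repr(sample), "->", normalize_oclc(sample))
```

Note that a bare number in 001 is only an OCLC number if 003 says so; this sketch looks at the string alone and leaves that check to the caller.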

ISBN

MARC 21 Field(s): 020

Problems with parsing in MARC
- Why MARC makes computers cry: Exhibit #273: ISSN/ISBN $z
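
For illustration, a small sketch of two common chores with 020 data: pulling the ISBN out of a subfield value that carries a trailing qualifier such as "(pbk.)", and verifying the check digit. The function names are hypothetical. The sketch only tests the checksum; it does not know whether the value came from $a or from $z (canceled/invalid ISBN), so the caller still has to track that.

```python
import re
from typing import Optional


def clean_isbn(subfield: str) -> Optional[str]:
    """Extract the leading ISBN from text like '0-19-852663-6 (pbk.)'; hyphens are removed."""
    match = re.match(r"\s*([0-9][0-9\-]*[0-9Xx])", subfield)
    if not match:
        return None
    candidate = match.group(1).replace("-", "").upper()
    return candidate if len(candidate) in (10, 13) else None


def is_valid_isbn(isbn: str) -> bool:
    """Check the ISBN-10 (mod 11) or ISBN-13 (mod 10) check digit of a cleaned ISBN."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        # Weights run 10 down to 1; 'X' stands for the value 10.
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", isbn):
        # Weights alternate 1, 3, 1, 3, ...
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


if __name__ == "__main__":
    for raw in ["0-19-852663-6 (pbk.)", "9780306406157", "0198526637"]:
        isbn = clean_isbn(raw)
        print(repr(raw), "->", isbn, is_valid_isbn(isbn) if isbn else None)
```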

ISSN

MARC 21 Field(s): 022

Problems with parsing in MARC
- Why MARC makes computers cry: Exhibit #273: ISSN/ISBN $z
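
A minimal sketch of ISSN check-digit validation, which can help flag suspect values found in 022. The function name `is_valid_issn` is made up for illustration. The check uses the standard ISSN scheme: weights 8 down to 1 across the eight characters, with "X" standing for 10, and a total divisible by 11.

```python
import re


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is an 8-character ISSN with a correct check digit."""
    compact = issn.strip().replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        return False
    # Weights 8..1; the final character may be 'X', meaning 10.
    total = sum((8 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(compact))
    return total % 11 == 0


if __name__ == "__main__":
    for sample in ["0378-5955", "2434-561X", "0378-5954"]:
        print(sample, is_valid_issn(sample))
```

A passing checksum says nothing about which subfield the value came from, so the distinction between a current ISSN and a canceled or incorrect one still has to come from the record itself.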

Dewey Decimal Call Number

MARC 21 Field(s): 082

Library of Congress Call Number

MARC 21 Field(s): 050

Normalization
- library-callnumber-lc Project Wiki Page
- sortLC, code for sorting LCCs
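
To show why normalization matters for sorting, here is a deliberately simplified sketch of a sort key for LC call numbers. It is not the algorithm used by either project listed above; the pattern and the function name `lc_sort_key` are assumptions, and cutters, dates, and volume designations are compared as a plain uppercase string.

```python
import re
from typing import Optional, Tuple

# Class letters, whole class number, optional decimal part, then everything else.
LC_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d{1,4})(?:\.(\d+))?\s*(.*)$", re.IGNORECASE)


def lc_sort_key(call_number: str) -> Optional[Tuple[str, int, str, str]]:
    """Build a tuple that sorts class letters, then class number, then the remainder."""
    match = LC_PATTERN.match(call_number)
    if not match:
        return None  # caller decides what to do with unparseable call numbers
    letters, whole, decimal, rest = match.groups()
    # The whole number is compared numerically; decimal digits compare correctly as text.
    return (letters.upper(), int(whole), decimal or "", rest.strip().upper())


if __name__ == "__main__":
    shelf = ["QA76.9 .D3", "QA9 .L5", "Q175 .K95", "QA76.73 .P98"]
    print(sorted(shelf))                   # plain string sort puts QA76 before QA9
    print(sorted(shelf, key=lc_sort_key))  # numeric-aware sort
```

Because unparseable values return None, a list containing them cannot be sorted with this key directly; filtering or a fallback key would be needed first.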

Personal Names

MARC 21 Field(s): (Name Headings) 100 600 700 800; Also X00 - Personal Names-General Information (Uncontrolled Names) 245$c 505$r 511 720

Parsing name parts
- Automated Metadata Formatting for Cornell’s Print-on-Demand Books
Identifying the name of a person who played a particular role
- Identifying FRBR Work-Level Data in MARC Bibliographic Records for Manifestations of Moving Images

Corporate Names

MARC 21 Field(s): 110 610 710 810; Also X10 - Corporate Names-General Information

Distinguishing corporate names from personal names
- Automated Metadata Formatting for Cornell’s Print-on-Demand Books

Titles

MARC 21 Field(s): (Transcribed Titles) 245 505$t (Alternate Titles) 210 222 242 246 247 (Uniform Titles) 130 240 243 630 730 830; Also X30 - Uniform Titles-General Information

Normalization
- Deciphering Journal Abbreviations with JAbbr
- Interpreting MARC: Where's the Bibliographic Data?
Matching techniques
- Deciphering Journal Abbreviations with JAbbr
245 Indicator 2
- Parsing titles to determine number of nonfiling characters
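
The second indicator of field 245 records how many leading characters (an initial article plus the space after it) should be skipped when filing. A rough sketch of both directions follows: applying the indicator to get a filing title, and guessing a value when the indicator is missing. The function names and the tiny English-only article list are illustrative assumptions; a real article list is language-dependent and much longer.

```python
ENGLISH_ARTICLES = ("the", "an", "a")  # illustrative only, English only


def filing_title(title: str, indicator2: str) -> str:
    """Drop the leading nonfiling characters counted by 245 indicator 2."""
    # A blank or otherwise non-digit indicator is treated as 0.
    is_digit = len(indicator2) == 1 and indicator2 in "0123456789"
    count = int(indicator2) if is_digit else 0
    return title[count:]


def guess_nonfiling(title: str) -> int:
    """Guess the nonfiling count from a leading article followed by a space."""
    lowered = title.lower()
    for article in ENGLISH_ARTICLES:
        if lowered.startswith(article + " "):
            return len(article) + 1  # the article plus the following space
    return 0


if __name__ == "__main__":
    print(filing_title("The Hobbit", "4"))
    print(guess_nonfiling("An American tragedy"))
```

The guess is naive by design: a title such as "A is for apple" would be assigned 2 even though the leading "A" is not an article, which is exactly the kind of case that makes this field hard to handle automatically.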

Subject Headings

MARC 21 Field(s): 600 610 611 630 648 650 651 653 654 655 656 657 658 662 690-699

URLs

MARC 21 Field(s): 856

Usage of the 856 MARC Field: A Sample
