Parsing Library Data
Latest revision as of 15:43, 6 April 2011
The legacy data that libraries must deal with is often challenging to parse algorithmically. MARC is just the first layer: once you peel that back, you find an elaborate mish-mash of elements, each with its own idiosyncrasies. This page is a place for the Code4lib community to track and share information, problems, methodologies, code, pseudo-code, etc. about nuts-and-bolts parsing of legacy library data.
Identifiers
Library of Congress Control Number
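As a starting point, here is a minimal Python sketch of the LCCN normalization steps that the Library of Congress describes for building "info:lccn" URIs: remove blanks, drop a forward slash and everything after it, then remove the hyphen and left-pad the serial part to six digits. The function name normalize_lccn is hypothetical, and the sketch does no validation of the result.

```python
def normalize_lccn(raw):
    """Normalize an LCCN string (sketch only, no validation of the result)."""
    # Step 1: remove all blanks.
    value = raw.replace(" ", "")
    # Step 2: drop a forward slash and everything to its right.
    value = value.split("/", 1)[0]
    # Step 3: remove the hyphen and left-fill the part after it to six digits.
    if "-" in value:
        prefix, serial = value.split("-", 1)
        value = prefix + serial.zfill(6)
    return value


if __name__ == "__main__":
    for sample in ["n78-890351", "85-2 ", "75-425165//r75", " 79139101 /AC/r932"]:
        print(repr(sample), "becomes", normalize_lccn(sample))
```

A production version would probably also check that the serial part is all digits and no longer than six characters before accepting the value.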
OCLC Control Number
ISBN
MARC 21 Field(s): 020
- Problems with parsing in MARC
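One common problem is that 020$a often carries more than the bare number, for example hyphens or a trailing qualifier such as "(pbk.)". The Python sketch below shows one way to pull out the leading ISBN and verify its check digit. The function names extract_isbn and is_valid_isbn are hypothetical, and the tokenizing rule (take everything up to the first whitespace, parenthesis, or colon) is an assumption that will not suit every catalog.

```python
import re


def extract_isbn(subfield_a):
    """Return the bare ISBN at the start of an 020$a value, or None."""
    # Assumption: the ISBN is the first token, ending at whitespace, "(" or ":".
    token = re.split(r"[\s(:]", subfield_a.strip(), maxsplit=1)[0]
    candidate = token.replace("-", "").upper()
    if re.fullmatch(r"\d{13}|\d{9}[\dX]", candidate):
        return candidate
    return None


def is_valid_isbn(isbn):
    """Verify the check digit of a bare ISBN-10 or ISBN-13 string."""
    if len(isbn) == 10:
        # ISBN-10: weights 10 down to 1, "X" counts as 10, sum divisible by 11.
        values = [10 if ch == "X" else int(ch) for ch in isbn]
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    if len(isbn) == 13:
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        return sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn)) % 10 == 0
    return False


if __name__ == "__main__":
    isbn = extract_isbn("0306406152 (pbk. : alk. paper)")
    print(isbn, is_valid_isbn(isbn))
```

Keeping extraction and validation separate makes it easier to log values that look like ISBNs but fail the checksum, which is often useful when auditing legacy records.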
ISSN
MARC 21 Field(s): 022
- Problems with parsing in MARC
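ISSNs in 022$a raise similar issues: the hyphen may be missing, the check character may be a lowercase "x", and other text may follow. The Python sketch below normalizes a value to the NNNN-NNNC form and verifies the check digit. The function names normalize_issn and is_valid_issn are hypothetical, and the regular expression is an assumption about what counts as an ISSN-like string.

```python
import re


def normalize_issn(value):
    """Return the first ISSN-like string in value as NNNN-NNNC, or None."""
    match = re.search(r"\b(\d{4})-?(\d{3}[\dXx])\b", value)
    if not match:
        return None
    return f"{match.group(1)}-{match.group(2).upper()}"


def is_valid_issn(issn):
    """Verify the check digit of an ISSN given as NNNN-NNNC or NNNNNNNC."""
    chars = issn.replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", chars):
        return False
    # Weights 8 down to 2 apply to the first seven digits.
    total = sum(int(ch) * w for ch, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


if __name__ == "__main__":
    issn = normalize_issn("03785955 (print)")
    print(issn, is_valid_issn(issn))
```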
Dewey Decimal Call Number
MARC 21 Field(s): 082
Library of Congress Call Number
MARC 21 Field(s): 050
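- Learn more
  - Fundamentals of Library of Congress Classification workshop materials (http://www.loc.gov/catworkshop/courses/fundamentalslcc/)
- Normalization
  - library-callnumber-lc Project Wiki Page (http://code.google.com/p/library-callnumber-lc/wiki/Home)
  - sortLC, code for normalizing and sorting LCCs (http://rocky.uta.edu/doran/sortlc/)
  - cnparse.lib, normalizes for sorting (http://homepages.wmich.edu/~zimmer/other_index.html)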
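To illustrate what normalization means here: LC call numbers do not sort correctly as plain strings (for example, "QA76" sorts before "QA9"), so they are usually rewritten into a padded sort key. The Python sketch below is a deliberately simplified illustration, not the algorithm used by any particular tool. The name lcc_sort_key, the regular expression, and the padding widths are all assumptions, and many real-world call numbers will not fit this pattern.

```python
import re

# Assumed shape: 1-3 class letters, a 1-4 digit class number, an optional
# decimal part, then everything else (cutters, dates, volume information).
LCC_PATTERN = re.compile(
    r"""^\s*
    (?P<letters>[A-Z]{1,3})\s*
    (?P<number>\d{1,4})
    (?:\.(?P<decimal>\d+))?
    \s*(?P<rest>.*?)\s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def lcc_sort_key(call_number):
    """Return a string that sorts in shelf order, or None if unparseable."""
    match = LCC_PATTERN.match(call_number)
    if not match:
        return None
    letters = match.group("letters").upper().ljust(3)  # pad class letters to 3
    number = match.group("number").zfill(4)            # pad class number to 4
    decimal = match.group("decimal") or ""
    # Collapse periods and whitespace in the remainder to single spaces.
    rest = re.sub(r"[.\s]+", " ", match.group("rest").upper()).strip()
    return f"{letters}{number}.{decimal} {rest}".rstrip()


if __name__ == "__main__":
    shelf = ["QA76.73.P98 L88 2010", "QA9.A5 1990", "Q175 .K95"]
    for call_number in sorted(shelf, key=lcc_sort_key):
        print(call_number)
```

Returning None for unparseable input keeps the bad call numbers visible; note that sorting a list containing such values needs a fallback key, since None cannot be compared with strings.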
Personal Names
MARC 21 Field(s): (Name Headings) 100 600 700 800; Also X00 - Personal Names-General Information (Uncontrolled Names) 245$c 505$r 511 720
- Parsing name parts
- Identifying the name of a person that played a particular role
Corporate Names
MARC 21 Field(s): 110 610 710 810; Also X10 - Corporate Names-General Information
- Distinguishing corporate names from personal names
Titles
MARC 21 Field(s): (Transcribed Titles) 245 505$t (Alternate Titles) 210 222 242 246 247 (Uniform Titles) 130 240 243 630 730 830; Also X30 - Uniform Titles-General Information
- Normalization
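- Matching techniques
  - Deciphering Journal Abbreviations with JAbbr (http://journal.code4lib.org/articles/1758)
- 245 Indicator 2
  - Parsing titles to determine number of nonfiling characters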
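The second indicator of field 245 records how many leading characters (an initial article plus any associated spaces and punctuation) should be skipped when filing the title. The Python sketch below guesses that count from the title text. The name nonfiling_count and the article lists are hypothetical and deliberately incomplete, and the approach cannot tell a true article from a look-alike word: it would return 2 for "A is for alibi", where a cataloger would likely choose 0.

```python
import re

# Illustrative, incomplete article lists keyed by MARC language code.
ARTICLES = {
    "eng": ["the", "an", "a"],
    "fre": ["les", "le", "la", "l'", "une", "un"],
    "ger": ["der", "die", "das", "eine", "ein"],
}


def nonfiling_count(title, language="eng"):
    """Guess the 245 second indicator (0-9) for a title in the given language."""
    for article in ARTICLES.get(language, []):
        # Articles ending in an apostrophe attach directly to the next word;
        # all others must be followed by whitespace.
        separator = "" if article.endswith("'") else r"\s+"
        pattern = rf"""^[\[("']*{re.escape(article)}{separator}[\[("']*(?=\w)"""
        match = re.match(pattern, title, re.IGNORECASE)
        if match:
            # The indicator is a single digit, so cap the count at 9.
            return min(match.end(), 9)
    return 0


if __name__ == "__main__":
    print(nonfiling_count("The adventures of Tom Sawyer"))
    print(nonfiling_count("Die Zauberflöte", "ger"))
    print(nonfiling_count("Die hard", "eng"))
```

Passing the record's language (for example from 008/35-37) matters because the same word can be an article in one language and an ordinary word in another, as "Die" shows.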
Subject Headings
MARC 21 Field(s): 600 610 611 630 648 650 651 653 654 655 656 657 658 662 690-699
URLs
MARC 21 Field(s): 856
- Usage of the 856 MARC Field: A Sample (http://roytennant.com/proto/856/)