MARC Problems
Revision as of 22:19, 23 September 2009
Things Difficult or Impossible to Do With Our Data
Whenever anyone says "I don't see why there's a problem parsing our AACR2/MARC data, all the data is there," I want a list of things I've tried to do with our data, or get out of it, and haven't been able to.
So now I'm going to make a list as I go. Some of these can be blamed on MARC, some on AACR2, some on ISBD, some on the ILS software used to maintain MARC, and some on cataloger tradition or cataloger mistakes. But they're all things that just about anyone working with a large quantity of real-world MARC data is going to have trouble doing.
Figure out what an 856 is
Is it a link to full text? A link to a table of contents? Something else? I want my software to know, so it can easily tell the user that full text is available and give them the link. You can sort of estimate it, but only heuristically; see http://roytennant.com/proto/856/analysis.html
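A rough sketch of that kind of estimate. It assumes the 856 has already been pulled out of the record into a simple structure (the `Field856` class is hypothetical, not part of any MARC library) and guesses from keywords in $3 (materials specified), $y (link text) and $z (public note), falling back on the second indicator (0 = resource, 1 = version of resource, 2 = related resource, per MARC 21). The keyword lists are invented for illustration and would need tuning against real data.

```python
from dataclasses import dataclass, field

@dataclass
class Field856:
    """Hypothetical stand-in for a parsed 856 field."""
    ind2: str = " "
    # Subfield code -> list of values, e.g. {"u": ["http://..."], "3": ["Table of contents"]}
    subfields: dict = field(default_factory=dict)

# Illustrative keyword lists (assumptions, not from any standard).
TOC_WORDS = ("table of contents",)
RELATED_WORDS = ("publisher description", "contributor biographical", "sample text", "book review")
FULLTEXT_WORDS = ("full text", "fulltext", "electronic version", "online version")

def classify_856(f: Field856) -> str:
    """Guess what an 856 link points to: 'toc', 'related', 'fulltext' or 'unknown'."""
    # Gather the human-readable text: $3 materials specified, $y link text, $z public note.
    text = " ".join(
        v for code in ("3", "y", "z") for v in f.subfields.get(code, [])
    ).lower()
    if any(w in text for w in TOC_WORDS):
        return "toc"
    if any(w in text for w in RELATED_WORDS):
        return "related"
    if any(w in text for w in FULLTEXT_WORDS):
        return "fulltext"
    # Fall back on the second indicator. 0 (resource) and 1 (version of resource)
    # are treated here as "probably full text"; 2 is a related resource;
    # blank or 8 tells us nothing.
    if f.ind2 in ("0", "1"):
        return "fulltext"
    if f.ind2 == "2":
        return "related"
    return "unknown"

print(classify_856(Field856("2", {"3": ["Table of contents"], "u": ["http://example.com/toc"]})))  # toc
```

Keywords are checked before the indicator on the theory that free text like "Table of contents" is more specific than an indicator that is often blank or wrong. That is a judgment call, and either order will misclassify some records.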
Format field 505 contents
505 notes are really hard to read all mashed together. I'd like to list them one entry per line, but it's very difficult to tell where one entry ends and the next begins. Sure, you can split on "--" (and you need to split on "--" even in so-called "formatted" contents notes), but that's not foolproof. Sometimes a "." or a ";" separates entries, but sometimes it doesn't; it's internal to an entry. There's no good algorithm.
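The naive "--" split looks something like this minimal sketch (the function name `split_505` is hypothetical). It drops a single trailing period from the whole note and leaves "." and ";" alone, since there is no reliable way to tell whether they separate entries.

```python
def split_505(contents: str) -> list[str]:
    """Naively split a 505 contents note into one entry per item."""
    text = contents.strip()
    # The note as a whole usually ends with a period; drop just that one.
    if text.endswith("."):
        text = text[:-1]
    # Split on '--' only. '.' and ';' are left alone because they sometimes
    # separate entries and sometimes are internal to an entry.
    return [part.strip() for part in text.split("--") if part.strip()]

for entry in split_505("Introduction -- Chapter 1. Beginnings -- Chapter 2. Endings."):
    print(entry)
```

For example, "Sonata no. 1 ; Sonata no. 2 -- Trio" comes back as two entries, "Sonata no. 1 ; Sonata no. 2" and "Trio", which may or may not be what the cataloger meant.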
Figure out if my library holds a particular volume and issue of a serial
This is a clear user need: when I know a user is interested in a particular volume/issue, I'd like to be able to tell them whether we have it. I guess MFHD is _theoretically_ capable of expressing this, but hardly anyone's ILS is actually going to produce anything that can be machine-interpreted. And I'm not even sure MFHD can express it. If you think that ISBD-like standard of using punctuation and such to express "runs" counts, forget about it: it doesn't result in unambiguous machine-parseable statements even when users don't make mistakes entering it, which they do.
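To show how quickly this falls apart, here is a minimal sketch that checks a volume number against a textual holdings statement. It assumes runs are written with a "v." prefix ("v.3", "v.1-10", "v.15-"); the regex and the function names `parse_runs` and `holds_volume` are hypothetical, and real statements also use "no.", years, chronology in parentheses, "lacks ..." notes and local variations.

```python
import re
from typing import Optional

# Matches volume runs written with a "v." prefix: "v.3", "v.1-10", "v.15-".
# This single pattern is an assumption, not a standard holdings syntax.
RUN = re.compile(r"v\.\s*(\d+)(\s*-\s*(\d*))?")

def parse_runs(statement: str) -> list[tuple[int, Optional[int]]]:
    """Return (start, end) volume runs; end is None for an open-ended run."""
    runs = []
    for m in RUN.finditer(statement):
        start = int(m.group(1))
        if m.group(2) is None:
            end = start              # single volume, e.g. "v.3"
        elif m.group(3) == "":
            end = None               # open-ended run, e.g. "v.15-"
        else:
            end = int(m.group(3))    # closed run, e.g. "v.1-10"
        runs.append((start, end))
    return runs

def holds_volume(statement: str, volume: int) -> bool:
    """True if any parsed run covers the requested volume."""
    return any(
        start <= volume and (end is None or volume <= end)
        for start, end in parse_runs(statement)
    )

print(holds_volume("v.1-10 (1990-1999), v.15-", 12))  # False
```

It already breaks on ordinary variations: "v.1-v.10" parses as an open-ended run from v.1 plus a separate v.10, and "v.1-10 (lacks v.5)" still reports v.5 as held. Issues aren't handled at all.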