Getting Started with Zebra

From Code4Lib
Revision as of 17:08, 18 June 2008 by Ericleasemorgan (Talk | contribs) (This page outlines how to create a simple index of MARC records using Zebra, and then describes how to access the index.)

I will try to outline here how to index (and search) MARC records using Zebra. Tweaking the indexing process is trickier, and it is beyond what I know how to do.

1. Install yaz, zebra, and all of their friends. I have found that the "standard" make process works pretty well. Allow yaz and zebra to decide where they put their various configuration files; specifying the locations yourself is not worth the effort.

2. Save your MARC records someplace on your file system. These should be "binary" MARC records, by which I mean "real" MARC records: MARC records in communications format, the type of records fed to traditional integrated library systems. This is opposed to some flavor of XML or "tagged format" often used for display.
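If you are not sure whether a file holds records in communications format, it can help to peek at the bytes. Below is a minimal Python sketch, under my own assumptions, of how such a record is laid out: a 24-byte leader, a directory of 12-byte entries, and then the field data. The function names (parse_marc_record, build_marc_record) are hypothetical and are not part of yaz or zebra; the builder exists only to produce a tiny sample record to test against.

```python
FIELD_TERMINATOR = b"\x1e"    # ends the directory and every field
RECORD_TERMINATOR = b"\x1d"   # ends the whole record
SUBFIELD_DELIMITER = b"\x1f"  # precedes each subfield code


def parse_marc_record(raw: bytes) -> dict:
    """Split one communications-format record into its fields."""
    leader = raw[:24]
    record_length = int(leader[0:5].decode("ascii"))   # leader bytes 0-4
    base_address = int(leader[12:17].decode("ascii"))  # leader bytes 12-16
    # The directory sits between the leader and the data; drop its terminator.
    directory = raw[24:base_address - 1]
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        data = raw[base_address + start:base_address + start + length]
        fields.setdefault(tag, []).append(data.rstrip(FIELD_TERMINATOR))
    return {"length": record_length, "fields": fields}


def build_marc_record(fields) -> bytes:
    """Build a tiny sample record from (tag, data) pairs (illustration only)."""
    directory = b""
    body = b""
    for tag, data in fields:
        chunk = data + FIELD_TERMINATOR
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(chunk), len(body))
        body += chunk
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    total = base_address + len(body) + 1  # +1 for the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base_address)
    return leader + directory + body + RECORD_TERMINATOR


sample = build_marc_record([("001", b"12345"), ("245", b"10\x1faOrigami")])
parsed = parse_marc_record(sample)
print(parsed["fields"]["245"][0].split(SUBFIELD_DELIMITER))
```

A file of XML or "tagged format" records will not survive this sort of parsing, since its first five bytes will not be a record length.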

3. Create a zebra.cfg file, and have it look something like this:

 # global paths
 profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
 modulePath: /usr/local/lib/idzebra-2.0/modules
 #
 # turn ranking on
 rank: rank-1
 #
 # define a database of marc records called opac
 opac.database: opac
 opac.recordtype: grs.marcxml.marc21
 attset: bib1.att
 attset: explain.att

4. Index your MARC records with the following command. You should see lots of great stuff sent to STDOUT.

 zebraidx -g opac update <path to MARC records>

You have now created your index. Once you get this far with indexing, you will want to tweak various .abs files (I think) to enhance the indexing process. This particular thing is not my forte. It seems like black magic to most of us. This is not a Zebra-specific problem; this is a problem with Z39.50.

Next, you need to implement the client/server end of things:

5. Start your server. This will be a Z39.50 server -- a "kewl" library-centric protocol that existed before the Internet got hot:

 zebrasrv localhost:9999 &

6. Use yaz-client to search your index:

 $ yaz-client
 Z> open localhost:9999/opac
 Z> find origami
 Z> show 1
 Z> quit

Using the yaz-client almost requires a knowledge of Z39.50. Attached should be a Perl script that allows you to search your server in a bit more user-friendly way. To use it you will need to install a few Perl modules and then edit the constant called DATABASE.

Even though Z39.50 is/was "kewl", it is still pretty icky. SRU is better, definitely a step in the right direction, and Zebra supports SRU out of the box. [1]

7. Create an SRU configuration file (the server command in step 10 expects it to be named sru.cfg) looking something like this:

 <yazgfs>
   <server>
     <config>zebra.cfg</config>
     <cql2rpn>pqf.properties</cql2rpn>
   </server>
 </yazgfs>

8. Acquire a "better" pqf.properties file. PQF is about querying Z39.50 databases. It is ugly. It was designed in a non-Internet world. Instead of knowing that 1=4 means search the title field, you want to simply search the title. Attached is a "better" pqf.properties file, and it is "better" because it maps things like 1=4 to Dublin Core equivalents. Save it in a directory called etc in the same directory as your zebra.cfg file. (Notice how the zebra.cfg file, above, denotes etc as being in zebra's path.)
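To make the idea of the mapping concrete, here is a rough Python sketch of what such a file accomplishes. This is not how Zebra or yaz do the translation, and it handles only a bare term or a single index=term pair. The dictionary holds a small, illustrative subset of index names paired with Bib-1 "use" attribute numbers (4 for title, 1003 for author, 21 for subject, 1016 for "any"); the function name simple_cql_to_pqf is my own invention.

```python
# Illustrative subset: friendly index names mapped to Bib-1 "use" attributes.
INDEX_TO_BIB1_USE = {
    "dc.title": 4,
    "dc.creator": 1003,
    "dc.subject": 21,
    "cql.serverChoice": 1016,  # used when the query names no index
}


def simple_cql_to_pqf(query: str) -> str:
    """Turn 'index=term' (or a bare term) into a PQF query string."""
    index, separator, term = query.partition("=")
    if not separator:
        # No index given, so let the server choose where to search.
        index, term = "cql.serverChoice", query
    use = INDEX_TO_BIB1_USE[index.strip()]
    term = term.strip().strip('"')
    if " " in term:
        term = f'"{term}"'  # PQF needs quotes around multi-word terms
    return f"@attr 1={use} {term}"


print(simple_cql_to_pqf("dc.title=origami"))
print(simple_cql_to_pqf("origami"))
```

The point is only that the person searching types dc.title and never has to see 1=4.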

9. Kill your presently running Z39.50 server.

10. Start up an SRU server:

 zebrasrv -f sru.cfg localhost:9999 &

11. Use your HTTP client to search the SRU server. Queries will look like this:

 http://localhost:9999/opac?operation=searchRetrieve&version=1.1&query=origami&maximumRecords=5

The result should be a stream of XML ready for XSLT processing.
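If you would rather script the search than paste URLs into a browser, something like the following Python sketch might do. It assumes the server from step 10 is listening on localhost:9999, that the database is named opac, and that the response uses the standard SRU 1.1 namespace. The function names are hypothetical, and search() is defined but not called here because it needs a running server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace used by SRU 1.1 responses.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def build_sru_url(base: str, query: str, maximum_records: int = 5) -> str:
    """Assemble a searchRetrieve URL like the one shown in step 11."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)


def count_hits(response_xml: str) -> int:
    """Pull the total number of matching records out of an SRU response."""
    root = ET.fromstring(response_xml)
    node = root.find("srw:numberOfRecords", SRW_NS)
    return int(node.text) if node is not None else 0


def search(base: str, query: str) -> str:
    """Send the query to a live SRU server and return the raw XML."""
    with urlopen(build_sru_url(base, query)) as response:
        return response.read().decode("utf-8")


print(build_sru_url("http://localhost:9999/opac", "origami"))
```

With the server running, count_hits(search("http://localhost:9999/opac", "origami")) should report how many records matched.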

All of the above is almost exactly what I did to create an index of MARC records harvested from the Library of Congress and the University of Michigan's OAI data repository (MBooks). [2] Take a look at the HTML source. Notice how the client in this case is only one HTML file containing a form, one CSS file for style, and one XSL file for XML to HTML transformation.

[1] SRU - http://www.loc.gov/standards/sru/

[2] Example SRU interface - http://infomotions.com/ii/