Getting Started with Zebra

10,298 bytes added, 00:54, 12 February 2010
What follows is an outline of how to index (and search) [[MARC]] records using [[Zebra]]. Tweaking the indexing process itself is trickier, and goes beyond what I know how to do.
1. Install [[yaz]], zebra, and all of their friends. I have found that the "standard" make process works pretty well. Let yaz and zebra decide for themselves where to put their various configuration files; specifying other locations is not worth the effort.
2. Save your MARC records someplace on your file system. By "binary" MARC records, I assume you mean "real" MARC records: records in communications format, the kind fed to traditional integrated library systems. This is opposed to some flavor of XML or the "tagged format" often used for display.
profilePath: .:./etc:/usr/local/share/idzebra-2.0/tab
modulePath: /usr/local/lib/idzebra-2.0/modules
#
# turn ranking on
rank: rank-1
#
# define a database of marc records called opac
opac.database: opac
Next, you need to implement the client/server end of things:
5. Start your server. This will be a [[Z39.50]] server -- a "kewl" library-centric protocol that existed before the Internet got hot:
zebrasrv localhost:9999 &
Using the yaz-client almost requires a knowledge of Z39.50. Appendix A contains a Perl script (opac.pl) that lets you search your server in a more user-friendly way. To use it, install the Perl modules it requires (ZOOM and MARC::Record), then edit the constant called DATABASE so it points to your own server.
Even though [[Z39.50]] is/was "kewl" it is still pretty icky. SRU is better -- definitely a step in the right direction -- and Zebra supports SRU out of the box. [1]
7. Create an [[SRU]] configuration file looking something like this:
<yazgfs>
</yazgfs>
8. Acquire a "better" pqf.properties file. [[PQF]] is the query language used with Z39.50 databases. It is ugly, and it was designed in a non-Internet world. Instead of having to know that 1=4 means "search the title field", you want to simply search the title. Appendix B contains a "better" pqf.properties file; it is "better" because it maps Dublin Core index names (dc.title, dc.creator, and so on) to the attributes that things like 1=4 stand for. Save it in a directory called etc in the same directory as your zebra.cfg file. (Notice how the profilePath line of the zebra.cfg file, above, includes ./etc.)
9. Kill your presently running Z39.50 server.
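To illustrate what SRU buys you once the server is running again with the SRU configuration: a search becomes an ordinary HTTP GET request. The sketch below (in Python, and not part of the original recipe) only builds such a URL; it does not contact any server. The host and port are taken from the zebrasrv command above and the database name opac from zebra.cfg. Both are assumptions about your setup, as is the example query. The parameter names are the standard SRU 1.1 searchRetrieve parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sru_search_url(base, cql, start=1, maximum=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,               # CQL, e.g. dc.title = "walden"
        "startRecord": start,       # SRU record positions start at 1
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)

# Assumed base URL: localhost:9999 (from the zebrasrv command) and the
# database "opac" (from zebra.cfg). Adjust both for your installation.
url = sru_search_url("http://localhost:9999/opac", 'dc.title = "walden"')
print(url)
```

Paste the printed URL into a Web browser, or fetch it with any HTTP client. If the index name in the query (dc.title here) is not defined in pqf.properties, expect the server to return a diagnostic instead of records.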
[2] Example SRU interface - http://infomotions.com/ii/
 
Appendix A: opac.pl
 
#!/usr/bin/perl
# opac.pl - a simple z39.50 client
# Eric Lease Morgan <emorgan@nd.edu>
# 2007-06-05 - based on previous work with ZOOM Perl
# require
use strict;
use MARC::Record;
use ZOOM;

# define
use constant DATABASE => 'wilson.infomotions.com:9999/ii'; # test server

# get the query
my $query = shift;

# sanity check
if ( ! $query ) {
  print "Usage: $0 query\n";
  exit;
}

# create a connection and search
my $connection = ZOOM::Connection->new( DATABASE, 0, count => 1, preferredRecordSyntax => "usmarc" );
my $results = $connection->search_pqf( $query );

# loop through the first 50 hits, or through all of them if there are fewer
my $size = $results->size();
my $last = $size < 50 ? $size - 1 : 49;
for my $i ( 0 .. $last ) {

  # get the record
  my $record = $results->record( $i )->raw;
  my $marc = MARC::Record->new_from_usmarc( $record );

  # extract some data
  my $author = $marc->author;
  my $title = $marc->title_proper;
  my $date = $marc->publication_date;

  # display
  print " author: $author\n";
  print " title: $title\n";
  print " date: $date\n";
  print "\n";

}
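The script hands its argument straight to search_pqf, so the query has to be written in PQF, for example '@attr 1=4 walden' for a title search. Because PQF is prefix notation, combining clauses by hand gets confusing quickly. The following is a minimal sketch, in Python, of how such query strings are put together. The helper names are made up for illustration, and the use attributes 4 (title) and 1003 (author) assume your indexes follow BIB-1.

```python
def attr_term(use, term):
    """One PQF clause: a use attribute followed by a quoted term.

    Terms that themselves contain double quotes are not handled here.
    """
    return f'@attr 1={use} "{term}"'

def pqf_and(*clauses):
    """Combine clauses with @and; PQF operators are binary and prefixed."""
    if not clauses:
        raise ValueError("at least one clause is required")
    query = clauses[-1]
    for clause in reversed(clauses[:-1]):
        query = f"@and {clause} {query}"
    return query

# Assumed BIB-1 use attributes: 4 = title, 1003 = author
query = pqf_and(attr_term(4, "walden"), attr_term(1003, "thoreau"))
print(query)
# @and @attr 1=4 "walden" @attr 1=1003 "thoreau"
```

The resulting string can be given to opac.pl as its single command-line argument; quote it for your shell.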
 
Appendix B: pqf.properties
 
# $Id: pqf.properties,v 1.13 2006/09/20 10:12:29 mike Exp $
#
# Properties file to drive org.z3950.zing.cql.CQLNode's toPQF()
# back-end and the YAZ CQL-to-PQF converter. This specifies the
# interpretation of various CQL indexes, relations, etc. in terms
# of Type-1 query attributes.
#
# This configuration file generates queries using BIB-1 attributes.
# See http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html
# for the Maintenance Agency's work-in-progress mapping of Dublin Core
# indexes to Attribute Architecture (util, XD and BIB-2)
# attributes.
# Identifiers for prefixes used in this file. (index.*)
set.cql = info:srw/cql-context-set/1/cql-v1.1
set.rec = info:srw/cql-context-set/2/rec-1.1
set.dc = info:srw/cql-context-set/1/dc-v1.1
set.bath = http://zing.z3950.org/cql/bath/2.0/
# The default set when an index doesn't specify one: Dublin Core
set = info:srw/cql-context-set/1/dc-v1.1
# The default index when none is specified by the query
index.cql.serverChoice = 1=any 2=102
index.cql.allRecords = 1=_ALLRECORDS 2=103
index.rec.id = 1=12
index.dc.title = 1=title 2=102
index.dc.subject = 1=subject 2=102
index.dc.creator = 1=1003 2=102
index.dc.author = 1=author 2=102
index.dc.editor = 1=1020
index.dc.publisher = 1=publisher
index.dc.description = 1=62
index.dc.date = 1=30
index.dc.resourceType = 1=1031
index.dc.format = 1=1034
index.dc.resourceIdentifier = 1=key
index.dc.source = 1=1019
index.dc.language = 1=54
index.dc.relation = 1=?
index.dc.coverage = 1=?
index.dc.rights = 1=?
# Relation attributes are selected according to the CQL relation by
# looking up the "relation.<relation>" property:
#
relation.< = 2=1
relation.le = 2=2
relation.eq = 2=3
relation.exact = 2=3
relation.ge = 2=4
relation.> = 2=5
relation.<> = 2=6
# These two are what Zebra uses -- may not work on other servers
relation.all = 4=6
relation.any = 4=105
# BIB-1 doesn't have a server choice relation, so we just make the
# choice here, and use equality (which is clearly correct).
relation.scr = 2=3
# Relation modifiers.
relationModifier.relevant = 2=102
relationModifier.fuzzy = 5=103
relationModifier.stem = 2=101
relationModifier.phonetic = 2=100
# Non-standard extensions to provoke Zebra's inline sorting
relationModifier.sort = 7=1
relationModifier.sort-desc = 7=2
relationModifier.numeric = 4=109
# Position attributes may be specified for anchored terms (those
# beginning with "^", which is stripped) and unanchored (those not
# beginning with "^"). This may change when we get a BIB-1 truncation
# attribute that says "do what CQL does".
position.first = 3=1 6=1
position.any = 3=3 6=1
position.last = 3=4 6=1
position.firstAndLast = 3=3 6=3
# Structure attributes may be specified for individual relations; a
# default structure attribute may be specified by the pseudo-relation
# "*", to be used whenever a relation not listed here occurs.
#
structure.exact = 4=108
structure.all = 4=2
structure.any = 4=2
structure.* = 4=1
# Truncation attributes used to implement CQL wildcard patterns. The
# simpler forms, left, right- and both-truncation will be used for the
# simplest patterns, so that we produce PQF queries that conform more
# closely to the Bath Profile. However, when a more complex pattern
# such as "foo*bar" is used, we fall back on Z39.58-style masking.
truncation.right = 5=1
truncation.left = 5=2
truncation.both = 5=3
truncation.none = 5=100
truncation.regexp = 5=102
truncation.z3958 = 5=104
# Finally, any additional attributes that should always be included
# with each term can be specified in the "always" property.
always = 6=1
# Bath Profile support, added Thu Dec 18 13:06:20 GMT 2003
# See the Bath Profile for SRW at
# http://zing.z3950.org/cql/bath.html
# including the Bath Context Set defined within that document.
#
# In this file, we only map index-names to BIB-1 use attributes, doing
# so in accordance with the specifications of the Z39.50 Bath Profile,
# and leaving the relations, wildcards, etc. to fend for themselves.
index.bath.keyTitle = 1=33
index.bath.possessingInstitution = 1=1044
index.bath.name = 1=1002
index.bath.personalName = 1=1
index.bath.corporateName = 1=2
index.bath.conferenceName = 1=3
index.bath.uniformTitle = 1=6
index.bath.isbn = 1=7
index.bath.issn = 1=8
index.bath.geographicName = 1=58
index.bath.notes = 1=63
index.bath.topicalSubject = 1=1079
index.bath.genreForm = 1=1075
## From: marc <marc@indexdata.dk>
## Date: December 20, 2006 9:55:24 AM EST
## To: Zebra Information Server <zebralist@lists.indexdata.dk>
## Subject: Re: [Zebralist] pqf.properties
## Reply-To: Zebra Information Server <zebralist@lists.indexdata.dk>
##
## Eric Lease Morgan wrote:
## > On Dec 19, 2006, at 4:45 PM, marc wrote:
## >>> How do I edit pqf.properties so I can get zebra to search my
## >>> indexes via SRU?
## >>> I suppose I get this because etc/pqf.properties does not know
## >>> about my field names.
## >>
## >> Right.
## >>
## >> The CQL-to-PQF conversion configuration has always been a bit of a
## >> hassle, and I'd really like this to improve.
## >>
## >> The problem is, of course, that one needs to type the same index
## >> names over-and-over again in different parts of the zebra configs.
## > Maybe I could work the other way around.
## > For example, how might I re-write my alvis indexing XSLT file so
## > they conform to the pqf.properties file that comes with the Zebra
## > distribution? Specifically, how might I change the value of the
## > name attribute below so I could use CQL and search by title:
## > <xsl:template match="rdf:RDF/rdf:Description/dc:Title">
## > <z:index name="title" type="w"><xsl:value-of select="." /></z:index>
## > </xsl:template>
##
## The crucial part is that the CQLtoPQF config file needs to hit an
## existing index.
##
## so the lines
##
## index.dc.title = 1=4
##
## and
##
## <z:index name="title" type="w"><xsl:value-of select="." /></z:index>
##
## need to match
##
## If string indexes are used, easiest is to correct the standard config
## file to
##
## index.dc.title = 1=title
##
## and so forth. The numeric value '4' refers to the specific bib-1
## numeric attribute set, and is more confusion than help, so I suggest
## you stick to your own defined string index names.
##
## In addition, you have to take into account if the indexes are of type
## 'p', 'w', '0' or otherwise specified.
##
## so for example:
## <z:index name="thisandthat" type="p">...</z:index>
##
## would be
## index.dc.something = 1=thisandthat 6=3
##
## and
## <z:index name="thisandthat" type="0">...</z:index>
## would be
## index.dc.something = 1=thisandthat 4=3
##
## The reason why this is so complex is that people often want only to
## provide a subset of functionality to CQL queries, and therefore those
## are independent config files.
##
## A better view of the way PQF queries are mapped to zebra indexes is
## here:
##
## http://www.indexdata.com/zebra/doc/querymodel-zebra.tkl#querymodel-pqf-apt-mapping
##
## section:
## Mapping of PQF APT structure and completeness to register type
##
## and you need to run a PQF query to test that at least this part
## works, before attempting a CQL-to-PQF query conversion.
##
##
## So, the way to build a working config is:
##
## 1) define your indexing rules in the indexation stylesheet
## i.e. define an index with
##
## <z:index name="thisandthat" type="0">...</z:index>
##
##
## 2) test that you worked out the correct PQF queries to access them
## using the above mentioned documentation section.
##
## i.e. test
## Z> querytype prefix
## Z> scan @attr 1=thisandthat @attr 4=3 aterm
##
## 3) shovel that query into the right hand side of the index
## definitions of the CQL to PQF converter
##
## write
## index.dc.something = 1=thisandthat 4=3
##
##
## 4) test that the CQL is performed correctly
##
## test (yes, yaz-client can send CQL queries)
## Z> querytype cql
## Z> scan dc.title=aterm
##
## --
## Marc Cromme
## M.Sc and Ph.D in Mathematical Modelling and Computation
## Senior Developer, Project Manager
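To make the index.* lines of the properties file concrete, here is a small Python sketch of the lookup they describe: a CQL index name is replaced by the list of attributes on the right-hand side. This is a deliberate simplification and not the YAZ converter itself, which also adds relation, structure, truncation, and "always" attributes. The function names are made up for illustration, and only three lines from the file above are used.

```python
# Three index lines copied from the properties file above
PROPERTIES = """\
index.dc.title = 1=title 2=102
index.dc.creator = 1=1003 2=102
index.dc.date = 1=30
"""

def load_indexes(text):
    """Map CQL index names (e.g. dc.title) to their list of attributes."""
    indexes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("index."):
            continue  # skip comments and the other property types
        key, _, value = line.partition(" = ")
        indexes[key[len("index."):]] = value.split()
    return indexes

def to_pqf(indexes, index_name, term):
    """Prefix a term with the attributes configured for a CQL index."""
    attrs = " ".join(f"@attr {a}" for a in indexes[index_name])
    return f'{attrs} "{term}"'

indexes = load_indexes(PROPERTIES)
print(to_pqf(indexes, "dc.title", "walden"))
# @attr 1=title @attr 2=102 "walden"
```

An index name missing from the file raises a KeyError here, which mirrors the point made in the message above: a CQL search only works if the right-hand side names an index that actually exists.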
 
----
--[[User:Ericleasemorgan|Eric Lease Morgan]] 10:24, 18 June 2008 (PDT)
 
[[Category: Zebra]]