Notes from Open Source Discovery Portal Camp

499 bytes added, 01:13, 26 February 2009
solr marc
== Notes from Open Source Discovery Portal Camp ==
On 6 November 2008, there was a meeting at the Palinet offices in Philadelphia to discuss the future of open source discovery portals. The [[VuFind]] and [[Blacklight]] projects had already started cooperating by sharing indexing code in the form of [[Solrmarc]], and as a community we wanted to explore whether there were other ways we could be cooperating, and what our development priorities should be. We discussed the following topics and some people identified themselves as particularly interested in following up on specific topics and doing further work in a given area. Bess took notes, which are pasted here, but please feel free to expand upon these with your own memories of the conversation.
=== Jangle ===
Andrew started by giving us a brief introduction to [[Jangle]], a standard approach to building a toolset for interacting with the ILS. It was kickstarted by the DLF ILS API set. The idea is to create a standard way of interacting with the ILS. Jangle is the first implementation of this, and is planned as the "reference implementation." It's an open source standard approach, and will give us a lot of flexibility.
Gabe points out that the DLF standard and the Jangle standard aren't exactly the same thing, but people seem to agree it's still a good start at standardization. Andrew asks, how do we contribute the VuFind drivers to Jangle? Is there an [[NCIP]] driver for Jangle? [http://www.extensiblecatalog.org/ eXtensible Catalog] is using NCIP, for example. One problem with this, though, is that many vendors don't implement NCIP.
Ross Singer is the main developer for Jangle, and he wrote an article about it for the latest Code4Lib Journal. Everyone's homework is to go read that article, available [http://journal.code4lib.org/articles/109 here]. Many institutions are hacking their own ILS; it would be more efficient if we all shared this code through something like Jangle, which could then be used by VuFind, Blacklight, [http://code.google.com/p/fac-back-opac/ Helios], or any other project that could talk to Jangle.
Eric Morgan: Jangle is a step in the right direction. DLF came up with a list of API features they want, and then Ross came along and said here's a simple RESTful implementation of a lot of that API, based on the [[ATOM]] publishing protocol. We need a number of agreed-upon URL shapes that do things like tell me the status of this book, or give me authority information for a person. To what degree do we want to use something like Jangle in VuFind? There aren't a lot of choices right now, and this seems like a good project to explore further.
What about [[XC]]? There's a lot of frustration around this project, because they say they are open source, but no one has seen their code. They've approached both VuFind and Blacklight about incorporating the code from those projects, but haven't actually made any of their own source available. How do we get them to participate with the larger community? There's a growing community of developers around these issues, and XC should be involved. Eric says someone should have explicitly invited them.
(interested in further development: Bess, Andrew, Gabe)
=== Non-catalog content / digital repositories ===
Could we adapt [[SolrMARC]] to also include [[SolrOAI]]? Yes, Bob, Naomi and Andrew all have ideas about how this could work. Sounds like this is the kernel of our kernel. [[Solr]] already has a lot of functionality to allow for this. Do we want a couple of plugins, one for Solr and one for [[OAI]]? Or do we want an app that handles both?
Lots of little data silos aren't going to work; we need everything in a local catalog. But that doesn't mean we should all try to be Google. We still need well-defined collection development policies.
What about social data? [[SoPAC]] is neat, and has an independent layer for saving social data.
We also talked about [[Blacklight]] and the ways it brings in various data sources and handles behavior for different kinds of objects, e.g., [http://musicbrainz.org/ MusicBrainz] data for music items.
(interested in further development: Bob, Dennis, Peter, Naomi, Bess)
=== solr marc ===
How well is solr marc handling bad data these days? Bob: I've been adding more permissive reading and error correction to [[marc4j]]. It's also reporting errors as it finds them, to make it easier to find bad records. There was a request for writing to log files instead of standard out. How to handle records with bad leaders? Naomi has some MARC test data. We need more test-driven development.
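To make the "bad leaders" and "log files instead of standard out" points concrete, here is a minimal Python sketch. It is not marc4j or solrmarc code; the function names and the logger name are made up for illustration. It checks only a few fixed parts of a MARC 21 leader (24 characters, numeric record length in positions 00-04, numeric base address in positions 12-16, entry map "4500" in positions 20-23) and writes problems to a log file.

```python
import logging
from typing import Iterable, List

LEADER_LENGTH = 24  # a MARC 21 leader is always 24 characters


def _is_ascii_digits(text: str) -> bool:
    """True when text is non-empty and consists only of ASCII digits."""
    return text.isascii() and text.isdigit()


def leader_problems(leader: str) -> List[str]:
    """Return a list of human-readable problems found in one leader."""
    if len(leader) != LEADER_LENGTH:
        return [f"leader is {len(leader)} characters, expected {LEADER_LENGTH}"]
    problems = []
    if not _is_ascii_digits(leader[0:5]):
        problems.append("record length (positions 00-04) is not numeric")
    if not _is_ascii_digits(leader[12:17]):
        problems.append("base address of data (positions 12-16) is not numeric")
    if leader[20:24] != "4500":
        problems.append("entry map (positions 20-23) is not '4500'")
    return problems


def make_file_logger(path: str) -> logging.Logger:
    """Hypothetical helper: a logger that writes to a file, not standard out."""
    logger = logging.getLogger("marc_leader_check")
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep messages out of the root logger's console output
    if not logger.handlers:
        logger.addHandler(logging.FileHandler(path, encoding="utf-8"))
    return logger


def report_bad_leaders(leaders: Iterable[str], logger: logging.Logger) -> int:
    """Log every problem leader with its 1-based position; return the count of bad leaders."""
    bad = 0
    for number, leader in enumerate(leaders, start=1):
        problems = leader_problems(leader)
        if problems:
            bad += 1
            logger.warning("record %d: %s", number, "; ".join(problems))
    return bad
```

Something like `report_bad_leaders(leaders, make_file_logger("bad_records.log"))` would then leave a file that can be searched after an indexing run. Whether a record with a bad leader should be skipped or repaired is still the open question from the discussion.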
Naomi is offering code for parsing OCLC numbers and LC numbers; she'll be working with Bob next week to get that into solrmarc.
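As an illustration of what OCLC number parsing involves, here is a small Python sketch. It is not Naomi's code, and the function name is hypothetical. It assumes the commonly seen forms of an OCLC number: an optional "(OCoLC)" prefix, an optional "ocm", "ocn" or "on" prefix, and leading zeros. LC numbers have their own normalization rules and are not covered here.

```python
import re
from typing import Optional

# Optional "(OCoLC)" prefix, optional ocm/ocn/on prefix, leading zeros, then digits.
_OCLC_PATTERN = re.compile(
    r"^(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*0*(\d+)$",
    re.IGNORECASE,
)


def normalize_oclc(value: str) -> Optional[str]:
    """Return the bare OCLC number as a string of digits, or None if value doesn't look like one."""
    match = _OCLC_PATTERN.match(value.strip())
    return match.group(1) if match else None
```

For example, `normalize_oclc("(OCoLC)ocm00012345")` gives `"12345"`, while a value with some other prefix such as `"(DLC)12345"` gives `None`. Keeping only the digits means the same record matches however the number was entered.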
Chris from Villanova is going to do some graphic design work for solr marc. Yay!
(Interested in further development: Bob, Naomi, Chris, Bess)
=== Authority control ===
Can we get the LC authority control data, index it locally, and take advantage of that in our searching? Actually getting the authority index data is the problem. It's government monitored data, so why can't we get access to it? We can get snapshots, but there's no method for harvesting it. We need some way to get weekly / monthly updates of authority data. EdSu might have set something up, but it isn't an official service.
Eric says go ahead and implement something, and don't worry about the update method right now. Can we get authority data? Does Open Library have any authority data? Bess will look into this.
"Fred Data" <-- subject authorities
Consensus seems to be that we need a proof of concept first, see how well that scales, and then after that start lobbying LC / OCLC / Palinet / other vendors.
(Interested in further development: Ya'aqov, Daniel, Mark, Bess)
=== De-duping / FRBR ===
Open Library is also very interested in de-duping research.
=== Serials holdings ===
MARC Format for Holdings Data (MFHD)
xISSN service might be helpful for this, too
Bibliographic records for serials should refer to each other.
How to represent this data for users? There's a summary holdings field, a one-line display, and then there's a detailed holding display. There can be multiple screens of lines with this. Summary holdings are pretty easy, detail holdings are hard. Are they necessary?
Maybe we can handle this the way we're doing "composition era" in Blacklight? If we know the range, we can index a value for every point in that range.
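A rough Python sketch of that range idea, under the assumption that holdings can be reduced to (first year, last year) pairs. The function names are hypothetical and this is not how Blacklight implements "composition era"; it only shows expanding each range into every year it covers, so the index can hold one value per year.

```python
from typing import Iterable, List, Tuple


def expand_years(first: int, last: int) -> List[int]:
    """Every year from first to last, inclusive."""
    if last < first:
        raise ValueError(f"range ends ({last}) before it starts ({first})")
    return list(range(first, last + 1))


def holdings_to_years(ranges: Iterable[Tuple[int, int]]) -> List[int]:
    """Merge several held ranges into one sorted list of distinct years."""
    years = set()
    for first, last in ranges:
        years.update(expand_years(first, last))
    return sorted(years)
```

A title held 1990-1992 and again 1992-1994 would be indexed with the years 1990 through 1994, so a search or facet on a single year finds it without parsing the holdings statement at query time.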
You can get an extract of all serial holdings from your Open URL database (SFX) and harvest your journal holdings through that. Texas A&M is doing this w/ SFX. This seems like an efficient way of getting detailed holdings. Indexing this might be helpful if you don't have MARC records for all of your electronic holdings, and it also might help for knowing when you have full text online and when you don't.
(interested in further development: Ya'aqov, Mark)
=== Federated Search / article content ===
Can we partner with LibraryFind? Or should we implement an engine like pazpar2?
IndexData has something called pazpar2, which is a federated search engine.
(Interested in further development: one guy, whose name I didn't catch. Please self identify!)
=== back-end arch / OSS methods ===
=== How do we organize - How do we reach out to libraries and formalize a commitment? ===
Palinet, John, Dennis, Joe, Andrew, Mark, Daniel
[[Category: Meeting agendas]]