PBCore RDF Hackathon
Here are our notes from day 1 and day 2: https://docs.google.com/document/d/1n6VXxklbSOeGu-b02YGl6HnjlyObYBYzV_OfhxaFQsk/edit
GitHub: https://github.com/WGBH/pbucore/
When, Where, What time?
Date: Saturday & Sunday, February 7-8, 2015
Time: ~8:30am-5pm (with option of continued work throughout the conference at the same location)
Location: 4104 Northeast 73rd Avenue, Portland, Oregon, 97218
Hashtag: #PBCoreRDF15
What will be the format of the event?
In advance of the hackathon, participants are asked to fill out this form so that we can get a sense of the experience and skills of those who plan to attend. On the first day of the event, we will begin with welcome and introductions, review the agenda, and then break into groups to work on a variety of tasks. Groups may be organized by focus area, such as intellectual content, intellectual property, or technical topics.
The days themselves will be structured something like this. Coffee/tea will be provided. Lunch is on your own.
Saturday, February 7
8:30am - Welcome, introductions
9am - 9:45am - Discuss and determine the domain and scope of the ontology
9:45am - noon - Review of existing ontologies (DC terms, MODS, EBUCore, BIBFRAME, PREMIS) to determine what can be used for PBCore. Snacks and coffee to be served.
Noon - 1pm - Lunch on your own.
1pm - 2pm - Generate a comprehensive list of terms that are needed in the ontology. Snacks and coffee will be served.
2pm - 4:45pm - Begin developing the class hierarchy and defining properties of concepts. Use existing vocabularies and harness the EBUCore data model when appropriate.
4:45pm - 5pm - Review and wrap up.
Sunday, February 8
8:30am - Review progress to date; introductions of new participants
8:45am - noon - Continue working on class hierarchy and properties
Noon - 1pm - Lunch on your own
1pm - 3pm - Define the facets of the properties (value type, allowed values, number of values/cardinality, and other features). Review facets of existing ontologies. Do they meet the needs of PBCore users?
3pm - 4:30pm - As a larger group, review progress and suggestions of smaller groups
4:30pm - 5pm - Return to smaller groups, make suggested edits, finalize documentation
Summary & Background
The PBCore RDF Ontology Hackathon responds to a growing need among PBCore users to express their metadata in RDF. A number of PBCore users contribute to and are part of the Project Hydra community, a collaborative, open source effort to build digital repository software solutions at archival institutions. Hydra is built on a framework that uses Fedora Commons as the repository for storing metadata. Many users are seeking to update their Fedora repositories to the latest version (Fedora 4), which provides a great opportunity to develop an RDF data structure. An RDF ontology for PBCore would make it easier for PBCore users to take full advantage of Fedora 4's capabilities for managing data, and would encourage adoption of Fedora 4.
We envision building upon existing knowledge bases that are already well established. In particular, we hope to harmonize the EBUCore ontology with PBCore, determining which existing terms from the EBUCore vocabulary can be re-used and which concepts are unique to PBCore and would require additional terms.
PBCore is a metadata schema for audiovisual materials. Its original development in 2004 was funded by the Corporation for Public Broadcasting, with a goal of creating a metadata standard for public broadcasters to share information about their video and audio assets within and among public media stations. Since its conception, PBCore has been adopted by a growing number of audiovisual archives and organizations that needed a way to describe their archival audiovisual collections. The schema has been reviewed multiple times and is currently in further development via the American Archive of Public Broadcasting and the Association of Moving Image Archivists (AMIA) PBCore Advisory Subcommittee.
The Schema Team is working on an updated version of PBCore (PBCore 2.1), which will consist of minor tweaks and bug fixes and is expected to be released in March 2015. Other teams on the Subcommittee are working on PBCore outreach, education, documentation, and a new website.
Important Links and Documentation
- Here is the shared Google Drive folder where we will put all documentation created during the hackathon: https://drive.google.com/folderview?id=0B0v2vnLd6vOSeGJjQnFxXzlzOUk&usp=sharing
- PBCore website: http://pbcore.org/
- To download EBUCore documentation: https://tech.ebu.ch/docs/tech/tech3293.pdf
- A handy translator from RDF/XML to Turtle: http://rdf-translator.appspot.com/
- Adam's EBUCore in RDF example: https://github.com/awead/pbcore-rdf/blob/master/news_ebucore_rdf.n3
Working Groups
Participants should sign up for a working group. On the days of the event, these sections will be filled with suggestions and links to documentation created by the working groups.
Intellectual Content Working Group
This group will focus on the intellectual content part of the knowledge base. Intellectual content in PBCore XML is currently expressed through elements like pbcoreTitle, pbcoreAssetType, pbcoreAssetDate, pbcoreSubject, pbcoreDescription, pbcoreGenre, pbcoreRelation, pbcoreCoverage, pbcoreAudienceLevel, pbcoreAudienceRating, pbcoreAnnotation, etc.
Participants
Casey E. Davis, WGBH, @caseyedavis1
Julie Hardesty, Indiana University, @jlhardes
Jack Brighton, University of Illinois, @jackbrighton
Glenn Clatworthy, PBS, @glennclatworthy
Intellectual Property Working Group
This group will focus on the intellectual property part of the knowledge base. Intellectual property in PBCore XML is currently expressed through elements like pbcoreCreator, pbcoreContributor, pbcorePublisher, pbcoreRightsSummary, and roles.
Participants
Rebecca Guenther, LC and NYU/MIAP, @rguenther52, rguenther52@gmail.com
Rebecca Fraimow, NDSR and WGBH, @rhfraim
Instantiation Working Group
This group will focus on the instantiation part of the knowledge base, excluding essence tracks.
Participants
Peggy Griesinger, MoMA/NDSR, @peggygriesinger
Julie Hardesty, Indiana University, @jlhardes
Essence Track Working Group
This group will focus on the essence track part of the knowledge base.
Participants
Name, Institution, Twitter handle/email address
Lauren Sorensen, Library of Congress, @laurensx, laurens@nyu.edu (won't have access to work email Sat/Sun)
Documentation Working Group
This group will create, gather and organize documentation produced during the hackathon. One person from each of the other working groups should also take part in the documentation working group.
Participants
Casey E. Davis, WGBH, @caseyedavis1
Rebecca Fraimow, NDSR and WGBH, @rhfraim
Suggested Reading & Preparation
- Sign up for a Code4Lib wiki account (if you don't already have an account)
- Everyone should read at least the first chapters of the Allemang book, Semantic Web for the Working Ontologist.
- Everyone should understand the RDF meaning of classes, properties, domain and range before beginning. (cf: http://kcoyle.blogspot.com/2014/11/classes-in-rdf.html)
- Review PBCore Schema: http://pbcore.org/elements/
- Read this awesome Ontology Development 101 publication: http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
- Read about RDF on the W3C website: http://www.w3.org/RDF/
- Read this article: "Multi-Entity Models of Resource Description in the Semantic Web: A comparison of FRBR, RDA and BIBFRAME." (http://kcoyle.net/LHTv32n4preprint.pdf)
- Review existing ontologies:
  - EBUCore: http://www.ebu.ch/metadata/ontologies/ebucore/index.html and http://www.ebu.ch/metadata/ontologies/ebucore/ebucore.rdf and https://tech.ebu.ch/docs/tech/tech3293v1_5.pdf
  - MODS: http://www.loc.gov/standards/mods/modsrdf/
  - BIBFRAME: http://www.loc.gov/bibframe/
  - DC Terms: http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#
  - FOAF: http://www.foaf-project.org/
  - PREMIS: http://id.loc.gov/ontologies/premis.html
- Review common ontology pitfalls and the OOPS! Ontology Pitfall Scanner: http://oeg-lia3.dia.fi.upm.es/oops/catalogue.jsp
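The reading list above asks everyone to understand the RDF meaning of classes, properties, domain and range. One point that often surprises people coming from XML schemas is that rdfs:domain and rdfs:range do not validate data; they let a reasoner infer class membership. The following is a minimal sketch of that idea in plain Python, with triples as tuples of strings. Every ex: name (ex:Asset, ex:Instantiation, ex:hasInstantiation and so on) is a hypothetical illustration, not a term from PBCore, EBUCore, or any decision of the hackathon, and the function covers only three RDFS rules rather than full RDFS entailment.

```python
# Minimal RDFS-style inference over (subject, predicate, object) tuples.
# All ex: names are hypothetical illustrations, not PBCore or EBUCore terms.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDFS_SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus those entailed by rdfs:domain,
    rdfs:range and rdfs:subClassOf, repeating until nothing new appears."""
    graph = set(triples)
    while True:
        domains = {(s, o) for s, p, o in graph if p == RDFS_DOMAIN}
        ranges = {(s, o) for s, p, o in graph if p == RDFS_RANGE}
        subclasses = {(s, o) for s, p, o in graph if p == RDFS_SUBCLASS}
        new = set()
        for s, p, o in graph:
            # Domain: using a property makes the subject a member of the class.
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            # Range: using a property makes the object a member of the class.
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))
            # Subclass: members of a class are members of its superclasses.
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:
            return graph
        graph |= new


schema = [
    ("ex:hasInstantiation", RDFS_DOMAIN, "ex:Asset"),
    ("ex:hasInstantiation", RDFS_RANGE, "ex:Instantiation"),
    ("ex:Instantiation", RDFS_SUBCLASS, "ex:Resource"),
]
data = [("ex:program1", "ex:hasInstantiation", "ex:tape1")]

inferred = infer(schema + data)
for triple in sorted(inferred - set(schema) - set(data)):
    print(triple)
```

Note that the data never states a type for ex:program1 or ex:tape1. The types come entirely from the property's declared domain and range, which is why an overly narrow domain can silently classify resources in ways nobody intended.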
Tips and Advice from the Community
from Karen Coyle
- Don't lean too heavily on Protege. Protege is very OWL-oriented and can lead one far astray. It's easy to click on check boxes without knowing what they really mean. Do as much development as you can without using Protege, and do your development in RDFS not OWL. Later you can use Protege to check your work, or to complete the code.
- Develop in ntriples or turtle but NOT rdf/xml. RDF differs from XML in some fundamental ways that are not obvious, and developing in rdf/xml masks these differences and often leads to the development of not very good ontologies.
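As a rough illustration of why a line-oriented syntax keeps the data model visible, here is a small Python sketch that writes a handful of triples as N-Triples, one complete statement per line with no nesting. It is an editorial example, not part of the advice above. The ex: namespace, the Literal helper class, and the sample resources are assumptions made up for this sketch, and the escaping covers only backslashes, double quotes, and line breaks rather than the full N-Triples grammar.

```python
# Sketch: serialize prefixed-name triples as N-Triples (one statement per line).
# The ex: namespace and all ex: resources are hypothetical.

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/pbcore-sketch#",  # made-up namespace
}


class Literal(str):
    """Marks a string as an RDF literal rather than a prefixed name."""


def expand(term):
    """Turn a prefixed name such as 'rdf:type' into a bracketed full IRI."""
    prefix, _, local = term.partition(":")
    return "<" + PREFIXES[prefix] + local + ">"


def quote(text):
    """Quote a plain literal, escaping backslash, quote, and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Return the triples as sorted N-Triples lines joined by newlines."""
    lines = []
    for s, p, o in triples:
        obj = quote(o) if isinstance(o, Literal) else expand(o)
        lines.append(f"{expand(s)} {expand(p)} {obj} .")
    return "\n".join(sorted(lines))


triples = [
    ("ex:Asset", "rdf:type", "rdfs:Class"),
    ("ex:program1", "rdf:type", "ex:Asset"),
    ("ex:program1", "rdfs:label", Literal('The "Evening" News')),
]

output = to_ntriples(triples)
print(output)
```

Each printed line stands alone and can be reordered or deleted without affecting the others, which is the property that a nested RDF/XML document tends to hide.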
from Jean-Pierre Evain
- I have personally no issue whatsoever with Protégé or RDF/XML for the type of ontology we seem to be aiming at
- I agree that OWL is probably not required. But this doesn't prevent using Protégé. Of course one needs to know what is specific to OWL.
Need more info?
If you have questions or need more information, feel free to contact Casey Davis at casey_davis [at] wgbh [dot] org.