2014 Prepared Talk Proposals

2,161 bytes added, 16:58, 8 November 2013
* gathering community feedback
* creating a product rather than a bag of parts
 
== How to check your data to provide a great data product? Data quality as a key product feature at Europeana ==
 
*[mailto:Peter.Kiraly@kb.nl Péter Király, portal backend developer, Europeana]
*No previous C4L presentations
 
Europeana.eu - Europe's digital library, archive and museum - aggregates more than 30 million metadata records from more than 2200 institutions. The records come from libraries, archives, museums and every other kind of cultural institution, from very different systems and metadata schemas, and are typically transformed several times before they are ingested into the Europeana data repository. Europeana builds a consolidated database from these records, creating reliable and consistent services for end-users (a search portal, search widget, mobile apps, thematic sites etc.) and an API, which supports our strategic goal of making data available for reuse in education, the creative industries, and the cultural sector. A reliable "data product" is thus at the core of our own software products, as well as those of our API partners.
 
Much effort is needed to smooth out local differences in the metadata curation practices of our data providers. We need a solid framework to measure the consistency of our data and provide feedback to decision-makers inside and outside the organisation. We can also use this metrics framework to ask content providers to improve their own metadata. Of course, a data-quality-driven approach requires that we also improve the data transformation steps of the Europeana ingestion process itself. Data quality issues largely determine which new features we are able to create in our user interfaces and API, and may even affect the design and implementation of our underlying data structure, the Europeana Data Model (EDM).
 
In the presentation I will briefly describe the Europeana metadata ingestion process, then show the data quality metrics, the measuring techniques (using the Europeana API, Solr and MongoDB queries), some typical problems (both trivial and difficult ones), and finally the feedback mechanism we propose to deploy.
 
Keywords: Europeana, data quality, EDM, API, Apache Solr, MongoDB, #opendata, #openglam
[[:Category:Code4Lib2014]]