6
edits
Changes
→2014 Prepared Talk Proposals
'''Talk Proposals'''
==Creating a new Greek-Dutch dictionary==
* Caspar Treijtel, University of Amsterdam, c.treijtel@uva.nl
At present, no complete dictionary of (ancient) Greek-Dutch is available online. A new dictionary is currently under construction at Leiden University, with software being developed at the University of Amsterdam. The team in Leiden has already begun preparation of the data, with at this moment about 6,000 approved lemmas. The ultimate goal is to produce both a print version and online open access version from the same source documents. The software needed for this has been made in a project that was funded by CLARIN-NL.
====Migrator====
For the production of lemmas we have implemented an advanced workflow. The (generally non-technical) users create lemmas using MS Word, which is both familiar and easy to use. We have developed a custom software module that carefully migrates the Word documents into deeply structured XML by analyzing the structure and semantics of the lemmas, and falling back on heuristics in ambiguous cases. While having initially envisioned the oXygen XML Author component as the main tool for creating new lemmas, we obtained excellent results with the migrator module, and decided therefore to continue using MS Word as the primary composition tool. The main advantage of this is that the editors are much more familiar with Word than with any other WYSIWYG editor. Lemmas that have been migrated to XML are stored in an XML database and can be further edited using oXygen XML Author.
====Lemmatizer====
Greek morphology is complicated. In order to use a dictionary effectively, a rather high level of initial language competence is necessary for the user to be able to relate the word form s/he finds in a text to the correct basic lemma form, where the definition of the word can be found. Using a Greek morphological database we have been able to facilitate the search for lemmas. A ‘lemmatizer’ module gives the possible parsings of the word forms and the lemmas they can be derived from. This enables the user to type in the word as found in the text and be redirected to the correct lemma.
====Visualization====
For the online dictionary we have implemented a visualization module that allows the user to view multiple lemmas at once. The implementation of this module has been done using the Javascript framework MooTools. The result is a viewer that performs really well and is run by maintainable Javascript code.
The online viewer is still being worked on, have a look at http://www.woordenboekgrieks.nl/ for the beta version. A newer test version with additional features can be found here: http://angel.ic.uva.nl:8600/.
====Credits====
* construction of the dictionary: Prof. Ineke Sluiter, Classics department of Leiden University; Prof. Albert Rijksbaron, University of Amsterdam
* publisher of the dictionary: Amsterdam University Press
* design/typesetting dictionary: TaT Zetwerk (http://www.tatzetwerk.nl/)
* software development: Digital Production Center, University Library, University of Amsterdam
* project funding: CLARIN-NL (http://www.clarin.nl/)
* morphological database for use by the lemmatizer: courtesy of Prof. Helma Dik, University of Chicago (based on data of the Perseus Project)
----