Changes

2011talks Submissions

1,661 bytes added, 21:51, 12 November 2010


One technique, Latent Semantic Analysis (LSA), has been used by the project to create a set of tools that discover the semantic structure and organization of the corpus of text; these tools have surfaced shared passages, phrases, and technical vocabulary across the corpus. We thought many projects with TEI data might want to do LSA but may not know how. We'll discuss creating tools for LSA to analyze TEI-encoded text using XSL, Perl, PHP, and a mathematical/statistical software package (e.g. Matlab); having a supercomputer handy is helpful but not required! We'll walk through our method for chunking text, building a term-document matrix, executing singular value decomposition, and outputting the results both as correlated document pairs and in GraphML format so they can be analyzed in a network analysis and visualization tool (e.g. Network Workbench)[http://nwb.slis.indiana.edu].
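The pipeline named above (term-document matrix, singular value decomposition, correlated document pairs, GraphML) can be sketched in a few lines. This is only a minimal illustration, not the project's tooling, which is described as using XSL, Perl, PHP, and a package such as Matlab. It assumes the text has already been extracted from TEI and chunked into plain strings. All function names are hypothetical, and the SVD step is approximated in pure Python by power iteration on the document Gram matrix, which yields the same right singular vectors for small inputs.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def term_document_matrix(chunks):
    """Return (vocabulary, matrix) with one row per term, one column per chunk."""
    counts = [Counter(chunk.lower().split()) for chunk in chunks]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def top_eigenpairs(sym, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration plus deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _component in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-uniform start (assumption)
        for _step in range(iters):
            w = [sum(work[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[r] * work[r][c] * v[c] for r in range(n) for c in range(n))
        pairs.append((lam, v))
        for r in range(n):  # deflate: remove the component just found
            for c in range(n):
                work[r][c] -= lam * v[r] * v[c]
    return pairs


def lsa_document_vectors(matrix, k):
    """Document coordinates in a k-dimensional latent space (rows of V_k S_k)."""
    n_docs = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]  # A^T A; its eigenvalues are squared singular values
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[d] for lam, vec in pairs]
            for d in range(n_docs)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def correlated_pairs(vectors, threshold=0.9):
    """Document index pairs whose latent-space cosine meets the threshold."""
    result = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                result.append((i, j, score))
    return result


def to_graphml(n_docs, pairs):
    """Serialize documents as nodes and correlated pairs as weighted edges."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {"id": "w", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for d in range(n_docs):
        ET.SubElement(graph, "node", id=f"d{d}")
    for i, j, score in pairs:
        edge = ET.SubElement(graph, "edge", source=f"d{i}", target=f"d{j}")
        ET.SubElement(edge, "data", key="w").text = f"{score:.3f}"
    return ET.tostring(root, encoding="unicode")


# Made-up sample chunks, for illustration only.
chunks = ["frbr work expression manifestation item", "frbr work expression",
          "tei text svd", "tei text graph"]
vocab, matrix = term_document_matrix(chunks)
vectors = lsa_document_vectors(matrix, k=2)
print(to_graphml(len(chunks), correlated_pairs(vectors)))
```

For a corpus of realistic size, a numerical package with a tested sparse SVD routine would replace `top_eigenpairs`; the pure-Python version is used here only to keep the sketch self-contained.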

== Adventures In Implementing an Extended FRBR Model ==

* PaulBen McElwain, Digital Library Program, Indiana University (pbmcelwa at indiana dot edu)

The Variations/FRBR Project (http://vfrbr.info) has developed an implementation of an extended FRBR/FRAD conceptual model.

The model encompasses the entities defined in FRBR along with some further entities from FRAD, the attributes defined for those entities, and the relationships between the entities. One extension to the FRBR model is the addition of entity attributes needed to accommodate MARC data important to collections of recorded music. But the most interesting and challenging extension (from a data model perspective) is the addition of a structured set of properties for the attributes of the entities, and properties for the entity relationships. The place of publication/distribution, for instance, can include properties for type, jurisdiction, normalized value, and source vocabulary, all in addition to the string value of the place.
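To make the idea of properties on attributes and relationships concrete, here is a rough sketch. It is not the project's schema or class structure (those were built in XML Schema and Java). Every class, field, and method name below is hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AttributeValue:
    """One attribute value plus the structured properties that qualify it."""
    value: str                          # the plain string value
    type: Optional[str] = None
    jurisdiction: Optional[str] = None
    normalized: Optional[str] = None    # normalized form of the value
    vocabulary: Optional[str] = None    # source vocabulary of the normalized form


@dataclass
class Relationship:
    """A named link to another entity, carrying its own properties."""
    name: str
    target: "Entity"
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    """A FRBR/FRAD entity such as Work, Expression, or Manifestation."""
    kind: str
    attributes: Dict[str, List[AttributeValue]] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_attribute(self, name: str, value: AttributeValue) -> None:
        self.attributes.setdefault(name, []).append(value)

    def relate(self, name: str, target: "Entity", **properties: str) -> None:
        self.relationships.append(Relationship(name, target, dict(properties)))


# Invented sample data: a manifestation with a qualified place of publication.
expression = Entity("Expression")
manifestation = Entity("Manifestation")
manifestation.add_attribute(
    "placeOfPublication",
    AttributeValue(value="Bloomington, Ind.", type="publication",
                   jurisdiction="Indiana", normalized="Bloomington",
                   vocabulary="local"),
)
manifestation.relate("embodies", expression, source="cataloger")
```

Keeping the string value alongside optional properties means a simple consumer can read `value` and ignore the rest, while a richer client can use the normalized form and its vocabulary.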

The model was defined in XML Schema and then implemented in a Java class structure with a relational database for persistence. The implemented data service currently supports a user search application and data exports in multiple structured formats (FRBR XML, RDF/XML), and it is also designed to support an interactive cataloging interface.

The presentation discusses the model designs developed, the technologies considered, and the implementations produced. This presentation should be of interest to other projects considering complex models of shared hierarchies implemented across XML Schema, Java, and relational data stores (via JPA).

Pbmcelwa
