Registry E-R Diagram

Revision as of 00:50, 4 August 2011 by DataGazetteer (Talk | contribs) (Database Design – Entities and Attributes: Remove characteristics; adding associations)

Background

As part of the Mellon Foundation grant funding the start-up of LYRASIS Technology Services, LTS is establishing a registry to provide in-depth comparative, evaluative, and version information about open source products. In the market research survey conducted during the LTS exploratory phase, 66% of respondents indicated a need for product comparison consulting, making it the topic of greatest interest across all library segments. Existing registries provide basic information about open source applications and links, but do not provide any means for comparison, evaluation, or readiness assessment. Most are aimed at developers. For example, Open Source Systems for Libraries (http://www.oss4lib.org) provides short descriptions, announcements, and links by category, but doesn’t provide in-depth information on features and support requirements, or a means to compare similar products. There are several registries that are not specific to libraries (Free Software Directory, Sourceforge, Freshmeat) providing basic information on thousands of open source software applications, within which it is often difficult to find those specific to libraries. This registry will provide in-depth information about open source products, including comparison of products by feature, reviews and evaluations, and information about use of the application that can help libraries find, test, and select open source products. The registry will be developed by consultants and shepherded by LTS staff; it will be made freely available to libraries.

Figure 1. Entity-Relationship Diagram for the Open Source Software Registry

Target Questions

The registry is geared towards answering these questions:

  1. What open source options exist to meet a particular need of my library?
  2. What are the strengths and weaknesses of an open source package?
  3. My library has developers with skills in specific technologies. What open source packages mesh well with the skills my library has in-house?
  4. Where can my library go to get training, documentation, hosting, and/or contract software development for a specific open source package?
  5. Are any peers using this open source software?
  6. Where is there more information about this open source software package?

Database Design – Entities and Attributes

This document describes a proposed entity-relationship (E-R) model of the data for the open source registry. The purpose of this model is to ensure that the necessary information is being captured and stored in a way that can be used to answer the questions listed above. An E-R model consists of three building blocks: entities, relationships, and attributes. Entities are the “nouns” in the model – the things that need to be described. Relationships are like “verbs” that are the connection point between entities – one entity “announces” another entity (and vice-versa, one entity is “announced by” another entity). Attributes are like “adjectives” (describing the noun) or “adverbs” (describing the verb) that are the distinguishing characteristics of entities and relationships.

E-R models are sometimes best viewed as pictures; Figure 1 is the E-R diagram for the open source registry. Entities are the bold, centered names at the top of boxes. Attributes of entities are listed left-justified in the lower part of boxes. For the sake of clarity, the diagram does not list every attribute; the full set is described in the narrative below. Enumerations are different: they specify the exact values an attribute can take, for instance declaring that a “spice” attribute can only be one of “salt”, “pepper”, or “paprika”. Relationships are lines that connect entity boxes, and have labels and cardinality. Cardinality can be one of “0..*” (zero or more), “1..*” (one or more), or “1” (exactly one). A relationship can be read as a sentence. For instance, “A Person announces zero or more Releases” and “A Release is announced by exactly one Person.”
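As a rough illustration of the two ideas above, the following Python sketch models the “spice” enumeration and the three cardinality labels. The names `Spice`, `CARDINALITIES`, `satisfies`, and `is_valid_spice` are hypothetical and are not part of the registry design.

```python
from enum import Enum


class Spice(Enum):
    """The "spice" enumeration from the text: only these values are allowed."""
    SALT = "salt"
    PEPPER = "pepper"
    PAPRIKA = "paprika"


# Cardinality labels from the diagram mapped to (minimum, maximum) counts.
# None means "no upper bound".
CARDINALITIES = {
    "0..*": (0, None),
    "1..*": (1, None),
    "1": (1, 1),
}


def satisfies(cardinality: str, count: int) -> bool:
    """Return True if `count` related entities are allowed by the label."""
    low, high = CARDINALITIES[cardinality]
    return count >= low and (high is None or count <= high)


def is_valid_spice(value: str) -> bool:
    """Return True if `value` is one of the enumerated spice labels."""
    return value in {spice.value for spice in Spice}
```

Under this reading, a Person with no Releases satisfies “0..*”, while a Release with no announcing Person would violate “1”.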

Each entity is listed below along with its attributes.

Package

Attribute | Type | Description
Name | string | Name of the software package
Type | PackageType | Enumerated list of package types. This list will include these labels, and can be amended over time: content repository, integrated library system, electronic resource management system, website management
Project URL | url | URL to the main project page
Source code URL | url | URL to the source code repository


Technology

Attribute | Type | Description
Name | string | Name of the technology
Type | TechType | Enumerated list of technology types. This list will include these labels, and can be amended over time: database engine, programming language, operating system


Association

Attribute | Type | Description
Related | AssociationType | Enumerated list of association types. This list will include these labels, and can be amended over time: “is required to run / execution requires”, “enhances execution / execution enhanced by”


Release

Attribute | Type | Description
Version | string | Version of the software release, as described by project
Date | date | Date of the release
URL | url | URL to the announcement of the release
Description | html-body | Free-text information about the release. At the discretion of the person making the entry, this field can contain release notes, installation instructions, and/or known issues


Person

Attribute | Type | Description
Name | string | Individual’s name
Email | email addr | E-mail address
URL | url | Individual’s homepage URL


Comment

Attribute | Type | Description
Rating | integer | Numeric rating of the software release
Comment | html-body | Free-text commentary


Institution

Attribute | Type | Description
Name | string | Name of the institution
Identifier | uri | An identifier representing the institution. (One of the design requirements is the use of an AJAX-driven selection list for institution name. If a name matches, the identifier is stored here and the Name attribute is overwritten with the official name from the identifier source. If a name doesn’t match, the user’s entry will be kept in the Name attribute and the Identifier attribute will be empty. OCLC’s WorldCat Registry will be the likely source of identifiers.)
URL | url | Institution’s homepage URL


Provider

Attribute | Type | Description
Name | string | Name of the provider
Type | ProviderType | Enumerated list of provider types. This list will include these labels, and can be amended over time: hosting, custom development, consulting, training
URL | url | Provider’s homepage URL
Description | html-body | Free-text information about the provider.


Event

Attribute | Type | Description
Name | string | Name of the event
Type | EventType | Enumerated list of event types. This list will include these labels, and can be amended over time: meeting, training webinar, developer webinar
URL | url | Event’s URL
Start date/time | timestamp | Starting date and time of the event
End date/time | timestamp | Ending date and time of the event
Description | html-body | Free-text information about the event
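To make the attribute tables concrete, here is a minimal Python sketch of three of the entities (Package, Technology, and Release) with their enumerations. The class and field names are assumptions derived from the attribute names above, and the platform's actual implementation language and storage are not specified in this document.

```python
import datetime
from dataclasses import dataclass
from enum import Enum


class PackageType(Enum):
    """Initial PackageType labels; the list can be amended over time."""
    CONTENT_REPOSITORY = "content repository"
    INTEGRATED_LIBRARY_SYSTEM = "integrated library system"
    ELECTRONIC_RESOURCE_MANAGEMENT_SYSTEM = "electronic resource management system"
    WEBSITE_MANAGEMENT = "website management"


class TechType(Enum):
    """Initial TechType labels; the list can be amended over time."""
    DATABASE_ENGINE = "database engine"
    PROGRAMMING_LANGUAGE = "programming language"
    OPERATING_SYSTEM = "operating system"


@dataclass
class Package:
    name: str
    type: PackageType
    project_url: str
    source_code_url: str


@dataclass
class Technology:
    name: str
    type: TechType


@dataclass
class Release:
    version: str          # as described by the project
    date: datetime.date   # date of the release
    url: str              # URL to the announcement of the release
    description: str      # free-text HTML body
```

A value outside an enumeration is rejected when the enum member is looked up: `PackageType("content repository")` succeeds, while an unlisted label raises `ValueError`.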

Database Design – Relationships

All relationships in this E-R model are bi-directional, meaning that a relationship specified in one direction (e.g. from Package to Provider) is equally meaningful in the opposite direction (e.g. from Provider to Package). There are relationships between these entities:

Entity | Entity | Label
Package | Technology | A Package uses zero-or-more Technologies
Technology | Package | A Technology is used_by one-or-more Packages
Package | Release | A Package releases one-or-more Releases
Release | Package | A Release comes_from exactly-one Package
Package | Institution | A Package is used_by zero-or-more Institutions
Institution | Package | An Institution uses one-or-more Packages (Note 1)
Package | Provider | A Package is supported_by zero-or-more Providers
Provider | Package | A Provider supports one-or-more Packages (Note 1)
Package | Event | A Package holds zero-or-more Events
Event | Package | An Event supports one-or-more Packages
Person | Institution | A Person is employed_by zero-or-more Institutions
Institution | Person | An Institution employs one-or-more Persons
Person | Provider | A Person is employed_by zero-or-more Providers
Provider | Person | A Provider employs one-or-more Persons
Person | Release | A Person announces zero-or-more Releases
Release | Person | A Release is announced_by exactly-one Person
Person | Comment | A Person writes zero-or-more Comments
Comment | Person | A Comment is authored_by exactly-one Person
Comment | Release | A Comment describes exactly-one Release
Release | Comment | A Release is described_by zero-or-more Comments

Note 1: Part of the business logic built into the platform will say that only a Person who has a relationship with an Institution can make a relationship between a Package and that Institution. The Person making the relationship between the Package and the Institution will be recorded.
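The rule in Note 1 could be enforced along the following lines. This is a sketch only, using Python's built-in `sqlite3` module; the table names, column names (including `recorded_by`), and the function `link_package_to_institution` are hypothetical, and the registry's actual database engine is not stated here.

```python
import sqlite3

# Hypothetical table and column names; only the entities and relationship
# labels come from the model described above.
SCHEMA = """
CREATE TABLE person      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE institution (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE package     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Person "is employed_by" Institution (many-to-many).
CREATE TABLE person_institution (
    person_id      INTEGER NOT NULL REFERENCES person(id),
    institution_id INTEGER NOT NULL REFERENCES institution(id),
    PRIMARY KEY (person_id, institution_id)
);

-- Package "is used_by" Institution (many-to-many). recorded_by stores
-- the Person who made the link, as Note 1 requires.
CREATE TABLE package_institution (
    package_id     INTEGER NOT NULL REFERENCES package(id),
    institution_id INTEGER NOT NULL REFERENCES institution(id),
    recorded_by    INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (package_id, institution_id)
);
"""


def link_package_to_institution(conn, person_id, package_id, institution_id):
    """Record that an Institution uses a Package, enforcing Note 1."""
    affiliated = conn.execute(
        "SELECT 1 FROM person_institution"
        " WHERE person_id = ? AND institution_id = ?",
        (person_id, institution_id),
    ).fetchone()
    if affiliated is None:
        raise PermissionError("person has no relationship with this institution")
    conn.execute(
        "INSERT INTO package_institution"
        " (package_id, institution_id, recorded_by) VALUES (?, ?, ?)",
        (package_id, institution_id, person_id),
    )


def build_demo():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Person A')")
    conn.execute("INSERT INTO person (id, name) VALUES (2, 'Person B')")
    conn.execute("INSERT INTO institution (id, name) VALUES (1, 'Example Library')")
    conn.execute("INSERT INTO package (id, name) VALUES (1, 'Example Package')")
    # Only Person A is employed by Example Library.
    conn.execute("INSERT INTO person_institution VALUES (1, 1)")
    return conn
```

With the sample rows from `build_demo()`, Person A can link Example Package to Example Library, while the same call for Person B raises `PermissionError`. The join tables alone do not enforce the “one-or-more” minimums in the relationship table; those would need additional application logic.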

Implications and Questions

There are some important implications and questions with this data model. These are the ones I’ve identified, and if you see others, please let me know.

  1. The model does not provide for a relationship between a person and a software package. Would such a relationship be useful? E.g., individuals self-identifying as affiliated with an open source software package.
  2. The initial planning process did not account for the inclusion of packages that were not themselves end products. Should code libraries and support programs be included as packages in the registry? The model could conceivably be adjusted in two ways to account for this. The simplest would only require the addition of new PackageType enumerations (e.g. “code library”); this would not allow for searching of packages that use code libraries (e.g., answering the question “What repositories use the djatoka JPEG2000 viewer system?”). Another simple change would be to add “code library” to the TechType enumeration; the code library would then not have the benefit of links to other relationships and entities. A more complicated change would do both, but there would be no relationship between the code library as a Package and as a Technology. Are there better ways to add code libraries to the model?
  3. Some who have reviewed the concept for the registry suggested other attributes. Should these be added? (And what is missing?)
    • Package – Translations
    • Package – Intended audience (e.g. developers, patrons/desktop, patrons/web, library-staff/desktop, library-staff/web)
    • Version – Code maturity (e.g., alpha, beta, release candidate, formal release)
  4. To answer the question “Are any peers using this open source software?” is it necessary to have an enumeration of library types, such as public library, school library, university library, community college library, special library, or museum (others?)
  5. Is the location of Institutions and Providers desired? One reason it might be desirable is to do a geography-based search (e.g. training providers within a 60-mile radius).
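If locations were stored as latitude and longitude (an assumption; the model above has no location attributes), the geography-based search mentioned in item 5 could be sketched with the haversine great-circle formula. The function names and the provider tuples below are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def providers_within(providers, lat, lon, radius_miles=60):
    """Return names of providers within `radius_miles` of (lat, lon).

    `providers` is an iterable of (name, latitude, longitude) tuples.
    """
    return [
        name
        for name, p_lat, p_lon in providers
        if haversine_miles(lat, lon, p_lat, p_lon) <= radius_miles
    ]


# Made-up providers with illustrative coordinates.
sample_providers = [
    ("Provider A", 33.95, -84.55),
    ("Provider B", 41.88, -87.63),
]
nearby = providers_within(sample_providers, 33.75, -84.39)
```

A linear scan like this is only reasonable for a small number of Providers; a larger registry would likely rely on a spatial index in the database instead.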