Registry E-R Diagram
As part of the Mellon Foundation grant funding the start-up of LYRASIS Technology Services, LTS is establishing a registry to provide in-depth comparative, evaluative, and version information about open source products. In the market research survey conducted during the LTS exploratory phase, 66% of respondents indicated a need for product comparison consulting, making it the topic of greatest interest across all library segments.
Existing registries provide basic information about open source applications and links, but do not provide any means for comparison, evaluation, or readiness assessment. Most are aimed at developers. For example, Open Source Systems for Libraries (http://www.oss4lib.org) provides short descriptions, announcements, and links by category, but doesn’t provide in-depth information on features and support requirements, or a means to compare similar products. There are several registries that are not specific to libraries (Free Software Directory, Sourceforge, Freshmeat) providing basic information on thousands of open source software applications, within which it is often difficult to find those specific to libraries.
This registry will provide in-depth information about open source products, including comparison of products by feature, reviews and evaluations, and information about use of the application that can help libraries find, test, and select open source products. The registry will be developed by consultants and shepherded by LTS staff; it will be made freely available to libraries.
The registry is geared towards answering these questions:
- What open source options exist to meet a particular need of my library?
- What are the strengths and weaknesses of an open source package?
- My library has developers with skills in specific technologies. What open source packages mesh well with the skills my library has in-house?
- Where can my library go to get training, documentation, hosting, and/or contract software development for a specific open source package?
- Are any peers using this open source software?
- Where is there more information about this open source software package?
Database Design – Entities and Attributes
This document describes a proposed entity-relationship (E-R) model of the data for the open source registry. The purpose of this model is to ensure that the necessary information is being captured and stored in a way that can be used to answer the questions listed above. An E-R model consists of three building blocks: entities, relationships, and attributes. Entities are the “nouns” in the model – the things that need to be described. Relationships are like “verbs” that connect entities – one entity “announces” another entity (and, read in the other direction, one entity is “announced by” another entity). Attributes are like “adjectives” (describing the noun) or “adverbs” (describing the verb); they are the distinguishing characteristics of entities and relationships.
E-R models are sometimes best viewed as pictures; Figure 1 is the E-R diagram for the open source registry. Entities are the bold, centered names at the top of boxes. Attributes of entities are listed left-justified in the lower part of boxes. For the sake of clarity, not all attributes are listed in the E-R diagram, but all are described in the narrative below. Some attributes are enumerations, which restrict an attribute to a fixed list of exact values; for instance, a “spice” attribute might only be allowed to take one of “salt”, “pepper”, or “paprika”. Relationships are lines that connect entity boxes, and have labels and cardinality. Cardinality can be one of “0..*” (zero or more), “1..*” (one or more), or “1” (exactly one). A relationship can be read as a sentence. For instance, “A Person announces zero or more Releases” and “A Release is announced by exactly one Person.”
Figure 1. Entity-Relationship Diagram for the Open Source Software Registry
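The enumeration and cardinality notation described above can be sketched in a few lines of Python. This is only an illustration of the concepts, not part of the proposed registry: the `Spice` enumeration mirrors the example in the text, while the `CARDINALITY` table and the `satisfies` helper are hypothetical names introduced here.

```python
from enum import Enum


class Spice(Enum):
    """An enumeration: the attribute may only take one of these exact values."""
    SALT = "salt"
    PEPPER = "pepper"
    PAPRIKA = "paprika"


# Cardinality labels used in the diagram, mapped to (minimum, maximum).
# A maximum of None means "no upper limit".
CARDINALITY = {
    "0..*": (0, None),  # zero or more
    "1..*": (1, None),  # one or more
    "1": (1, 1),        # exactly one
}


def satisfies(cardinality: str, count: int) -> bool:
    """Return True if `count` related entities is allowed by the cardinality label."""
    minimum, maximum = CARDINALITY[cardinality]
    return count >= minimum and (maximum is None or count <= maximum)


# "A Person announces zero or more Releases": a Person with no Releases is valid.
print(satisfies("0..*", 0))
# "A Release is announced by exactly one Person": two announcers would be invalid.
print(satisfies("1", 2))
```

Looking up a value outside the enumeration, such as `Spice("cumin")`, raises a `ValueError`, which is the behavior an enumerated attribute is meant to enforce.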
Each entity is listed below along with its attributes.
Package
|Name||string||Name of the software package|
|Type||PackageType||Enumerated list of package types. This list will include these labels, and can be amended over time: content repository, integrated library system, electronic resource management system, website management|
|Project URL||url||URL to the main project page|
|Source code URL||url||URL to the source code repository|
Technology
|Name||string||Name of the technology|
|Type||TechType||Enumerated list of technology types. This list will include these labels, and can be amended over time: database engine, programming language, operating system|
Release
|Version||string||Version of the software release, as described by project|
|Date||date||Date of the release|
|URL||url||URL to the announcement of the release|
|Description||html-body||Free-text information about the release. At the discretion of the person making the entry, this field can contain release notes, installation instructions, and/or known issues|
Person
|email addr||E-mail address|
|URL||url||Individual’s homepage URL|
Comment
|Rating||integer||Numeric rating of the software release|
Institution
|Name||string||Name of the institution|
|Identifier||uri||An identifier representing the institution. (One of the design requirements is the use of an AJAX-driven selection list for the institution name. If a name matches, the identifier is stored here and the Name attribute is overwritten with the official name from the identifier source. If a name doesn’t match, the user’s entry will be kept in the Name attribute and the Identifier attribute will be empty. OCLC’s WorldCat Registry will be the likely source of identifiers.)|
|URL||url||Institution’s homepage URL|
Provider
|Name||string||Name of the provider|
|Type||ProviderType||Enumerated list of provider types. This list will include these labels, and can be amended over time: hosting, custom development, consulting, training|
|URL||url||Provider’s homepage URL|
|Description||html-body||Free-text information about the provider.|
Event
|Name||string||Name of the event|
|Type||EventType||Enumerated list of event types. This list will include these labels, and can be amended over time: meeting, training webinar, developer webinar|
|Start date/time||timestamp||Starting date and time of the event|
|End date/time||timestamp||Ending date and time of the event|
|Description||html-body||Free-text information about the event|
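As a rough illustration of how the attribute tables above might translate into code, the sketch below models two of the entities and the Identifier rule for institutions. The attribute names and the PackageType labels come from the tables; everything else is an assumption made for the example: the `resolve_institution` function, the `known_institutions` mapping (a stand-in for whatever identifier source is eventually chosen), the case-insensitive exact match, and the decision to treat the institution URL as optional.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class PackageType(Enum):
    """Labels taken from the Package table; the list can be amended over time."""
    CONTENT_REPOSITORY = "content repository"
    INTEGRATED_LIBRARY_SYSTEM = "integrated library system"
    ELECTRONIC_RESOURCE_MANAGEMENT_SYSTEM = "electronic resource management system"
    WEBSITE_MANAGEMENT = "website management"


@dataclass
class Package:
    name: str
    type: PackageType
    project_url: str
    source_code_url: str


@dataclass
class Institution:
    name: str
    identifier: Optional[str] = None  # stays empty when no match is found
    url: Optional[str] = None         # assumed optional for this sketch


def resolve_institution(entered_name: str,
                        known_institutions: Dict[str, str]) -> Institution:
    """Apply the Identifier rule described in the Institution table.

    `known_institutions` is a hypothetical mapping of identifier -> official
    name. Matching is a case-insensitive exact comparison, which is an
    assumption; a real selection list could match differently.
    """
    wanted = entered_name.strip().casefold()
    for identifier, official_name in known_institutions.items():
        if official_name.casefold() == wanted:
            # Match: store the identifier and overwrite the name with the official one.
            return Institution(name=official_name, identifier=identifier)
    # No match: keep the user's entry and leave the identifier empty.
    return Institution(name=entered_name)
```

For example, with `known_institutions = {"urn:example:inst-1": "Example University Library"}` (a placeholder identifier, not a real registry entry), entering "example university library" would yield an Institution carrying the official name and that identifier, while an unrecognized name would be stored as typed with no identifier.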
Database Design – Relationships
All relationships in this E-R model are bi-directional, meaning that a relationship specified in one direction (e.g. from Package to Provider) is equally meaningful in the opposite direction (e.g. from Provider to Package). There are relationships between these entities:
|Package||Technology||A Package uses zero-or-more Technologies. A Technology is used_by one-or-more Packages.|
|Package||Release||A Package releases one-or-more Releases. A Release comes_from exactly-one Package.|
|Package||Institution||A Package is used_by zero-or-more Institutions. An Institution uses one-or-more Packages (Note 1).|
|Package||Provider||A Package is supported_by zero-or-more Providers. A Provider supports one-or-more Packages (Note 1).|
|Package||Event||A Package holds zero-or-more Events. An Event supports one-or-more Packages.|
|Person||Institution||A Person is employed_by zero-or-more Institutions. An Institution employs one-or-more Persons.|
|Person||Provider||A Person is employed_by zero-or-more Providers. A Provider employs one-or-more Persons.|
|Person||Release||A Person announces zero-or-more Releases. A Release is announced_by exactly-one Person.|
|Person||Comment||A Person writes zero-or-more Comments. A Comment is authored_by exactly-one Person.|
|Comment||Release||A Comment describes exactly-one Release. A Release is described_by zero-or-more Comments.|
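One way to picture the bi-directional relationships listed above is a structure that records each link once and can be read from either side. The following is a minimal sketch under that assumption; the `Relationship` class, its method names, and the sample package and provider names are all hypothetical and are not part of the proposed design.

```python
from collections import defaultdict


class Relationship:
    """A link between two entity types that can be read in both directions."""

    def __init__(self, forward_label: str, reverse_label: str):
        self.forward_label = forward_label  # e.g. "supported_by"
        self.reverse_label = reverse_label  # e.g. "supports"
        self._forward = defaultdict(set)    # source -> set of targets
        self._reverse = defaultdict(set)    # target -> set of sources

    def link(self, source: str, target: str) -> None:
        # Record the link once; both directions are updated together.
        self._forward[source].add(target)
        self._reverse[target].add(source)

    def targets_of(self, source: str) -> set:
        # Forward reading, e.g. "A Package is supported_by ... Providers".
        return set(self._forward.get(source, ()))

    def sources_of(self, target: str) -> set:
        # Reverse reading, e.g. "A Provider supports ... Packages".
        return set(self._reverse.get(target, ()))


# Package -> Provider, using made-up names.
package_provider = Relationship("supported_by", "supports")
package_provider.link("Package A", "Provider X")
package_provider.link("Package B", "Provider X")

print(package_provider.targets_of("Package A"))
print(package_provider.sources_of("Provider X"))
```

Storing both directions side by side keeps the two readings consistent, at the cost of holding each link twice in memory. A relational database would typically achieve the same effect with a single join table queried from either side.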
Note 1: Part of the business logic built into the platform will say that only a Person who has a relationship with an Institution can make a relationship between a Package and that Institution. The Person making the relationship between the Package and the Institution will be recorded.
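The rule in Note 1 could be enforced with a check along the following lines. This is a sketch under stated assumptions, not the platform's actual logic: the `record_usage` function, the `NotAffiliatedError` exception, and the plain sets and lists standing in for database tables are all hypothetical.

```python
class NotAffiliatedError(Exception):
    """Raised when a Person tries to link a Package to an unrelated Institution."""


def record_usage(person, package, institution, affiliations, usage_log):
    """Link `package` to `institution` on behalf of `person`.

    `affiliations` is a set of (person, institution) pairs that already exist.
    `usage_log` is a list of (package, institution, person) entries, so the
    Person who made each link is recorded, as Note 1 requires.
    """
    if (person, institution) not in affiliations:
        raise NotAffiliatedError(
            f"{person} has no relationship with {institution}"
        )
    usage_log.append((package, institution, person))


# Made-up sample data.
affiliations = {("alice", "Example Library")}
usage_log = []
record_usage("alice", "Package A", "Example Library", affiliations, usage_log)
print(usage_log)
```

A call made by a person without an affiliation, such as `record_usage("bob", "Package A", "Example Library", affiliations, usage_log)`, raises `NotAffiliatedError` and leaves the log unchanged.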
Implications and Questions
There are some important implications and questions with this data model. These are the ones I’ve identified, and if you see others, please let me know.
- The model does not provide for a relationship between a person and a software package. Would such a relationship be useful? E.g., individuals self-identifying as affiliated with an open source software package.
- The initial planning process did not account for the inclusion of packages that were not themselves end products. Should code libraries and support programs be included as packages in the registry? The model could conceivably be adjusted in two ways to account for this. The simplest would only require the addition of new PackageType enumerations (e.g. “code library”), but this would not allow for searching for packages that use a given code library (e.g., answering the question “What repositories use the djatoka JPEG2000 viewer system?”). Another simple change would be to add “code library” to the TechType enumeration, but then the code library would not have the benefit of the relationships that a Package has with other entities. A more complicated change would do both, but there would be no relationship between the code library as a Package and as a Technology. Are there better ways to add code libraries to the model?
- Some who have reviewed the concept for the registry suggested other attributes. Should these be added? (And what is missing?)
  - Package – Translations
  - Package – Intended audience (e.g. developers, patrons/desktop, patrons/web, library-staff/desktop, library-staff/web)
  - Version – Code maturity (e.g., alpha, beta, release candidate, formal release)
- To answer the question “Are any peers using this open source software?”, is it necessary to have an enumeration of library types (e.g., public library, school library, university library, community college library, special library, museum)? Are there others?
- Is the location of Institutions and Providers desired? One reason it might be desirable is to do a geography-based search (e.g. training providers within a 60-mile radius).
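To make the last question concrete, a geography-based search could look roughly like the sketch below. It assumes that Institutions and Providers would carry latitude and longitude attributes, which the current model does not include, and it uses the haversine great-circle formula as one reasonable way to estimate distance. The function names and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def providers_within(providers, lat, lon, radius_miles=60.0):
    """Return names of providers no farther than `radius_miles` from (lat, lon).

    `providers` is a list of (name, latitude, longitude) tuples.
    """
    return [
        name
        for name, p_lat, p_lon in providers
        if distance_miles(lat, lon, p_lat, p_lon) <= radius_miles
    ]


# Made-up providers: one about half a degree of latitude away, one a full degree away.
sample_providers = [
    ("Nearby Training Co.", 40.5, -75.0),
    ("Distant Training Co.", 41.0, -75.0),
]
print(providers_within(sample_providers, 40.0, -75.0))
```

One degree of latitude is roughly 69 miles, so only the first sample provider falls inside the 60-mile radius. A production system would more likely push this filter into the database, but the attributes needed are the same.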