Feed - The HathiTrust Ingest Toolkit
HathiTrust's mission is to ensure the long-term preservation and accessibility of materials in the archive. Keeping materials from different sources consistent is one way we do this: tools such as large-scale search and PageTurner don't need to be concerned with where the content originated, and it will be possible to undertake format migrations in the future. To that end, we have very specific and stringent standards in areas including (but not limited to) the following:
- Item identifiers (i.e. how each individual submitted item is identified and named)
- Package layout (file names, directory structure, etc.)
- Image technical characteristics (file format, resolution, color depth, etc.)
- Image metadata (scanning time, scanning artist, etc.)
- Source METS file comprising MARC, PREMIS, package contents and structMap, optionally with page numbers and page tags
We have chosen not to accept submissions in arbitrary formats for two reasons. First, we don't have the resources to create custom transformations for every source of content. Second, if we created generic transformations that could accept data in a wide variety of formats, there would most likely be some data loss in the transformation.