As the acquisition of born-digital materials grows, institutions are seeking methods to facilitate easy ingest into their repositories and provide access to disk images and to files derived or extracted from disk images. In this session, we describe our development of a Fedora 4.0 content model for disk images, including acceptable image file formats and the rationale behind those choices. We will also discuss efforts to integrate the disk image content model into the BitCurator Access environment. Unlike generalized, format-agnostic content models, which might treat the disk image as a generic bitstream, a content model designed for disk images enables expression of relationships among associated content in the collection, such as files extracted from images and other born-digital and digitized material associated with the same creator. It also enables capture of file-system attributes such as file paths, timestamps, and whether files are allocated or deleted. Finally, a disk image content model suggests additional steps repositories can take to transform and reuse associated metadata generated during the creation and forensic analysis of the disk image.
== Data acquisition and publishing tools in R ==
* Scott Chamberlain, scott@ropensci.org, rOpenSci/UC Berkeley
R is an open source programming environment that is widely used among researchers in many fields. R is powerful because it's free, increasingly robust, and facilitates reproducible research, an increasingly sought-after goal in academia. Although tools for data manipulation, visualization, and analysis are well developed in R, data acquisition and publishing tools are not. rOpenSci is a collaborative effort to create the tools necessary to complete the reproducible research workflow. This presentation discusses the need for these tools and gives examples, including interacting with the repositories Mendeley, Dryad, DataONE, and Figshare. In addition, we are building tools for searching scholarly metadata and acquiring full text of open access articles in a standardized way across metadata providers (e.g., Crossref, DataCite, DPLA) and publishers (e.g., PLOS, PeerJ, BMC, PubMed). Last, we are building out tools for reading and writing data in Ecological Metadata Language (EML).