2014 Breakout II (Wednesday)
Revision as of 17:27, 28 March 2014
- 1 UX
- 2 Securing EZproxy
- 3 Tech services
- 4 AngularJS
- 5 BIBFRAME 2 & Linked Data
- 6 Unusual searches & long searches
- 7 ResCarta
- 8 OCLC institution RDF project
- 9 Digital Preservation
UX
Notes by @erinrwhite again. Y'all cannot escape me.
NCSU's UX department is cross-functional and has members from across departments. Looking at creating cross-channel experiences from digital to real life. Working on consistency across experiences. Expanded on UMich's UX department to create a UX research team.
Research: NCSU does a research project every month. NCSU is also training new library fellows to infuse user experience work into their projects, growing the culture of UX within the organization.
How do you work in harmony with a dev team when sometimes the UX team can be the roadblock to development? Need to get a workflow that works so that everyone can move quickly.
UXing web pages vs. entire web applications: they're totally different experiences, so they need different approaches to user experience evaluation.
Guerrilla research: go out into the public spaces of your library to test prototypes or design ideas. Make it quick. User research doesn't have to be a huge deal.
If you can't give money as remuneration, give 'em candy bars. But make the candy bars full-size, not the minis.
Librarians are users too...right?
How do we push back against librarians' assertions that pages/interfaces should look a certain way?
Research with users can *sometimes* help.
Need to communicate your evidence to your library. UT hired someone last year just to do IT communication (!).
Numbers don't always work. Need a visual tool if possible (i.e. a heatmap). If you can compile a video or audio of user interviews or usability testing, that can be very powerful.
Recommendation: 37signals' book Getting Real, on choosing which things are and aren't important and moving on.
Publish your damn work!
As a community, we need to get better about sharing our work with each other so we don't have to keep reinventing the wheel.
Tech services
We shared projects, challenges, and areas of interest:
- Linked data for acquisitions info and the Global Open Knowledgebase
- Changing roles for catalogers -- description of unique resources, data extraction and manipulation, linked data
- ILS migrations
- Managing multiple systems and silos (ERM, ILS, ERP, archives)
- Managing DDA (demand driven acquisitions)
- Skills for catalogers -- computational thinking
- Trends toward fewer professional librarians in tech services
- Accepting ambiguity
- CORAL open source ERM
A few discussion topics emerged:
- Workflow/tracking systems: we use different ones -- mostly we get whatever our IT department already has
- Helpful for representing electronic resources -- there's no physical presence to remind you to do the work
- Helpful for metrics
- One barrier to use: others have to be trained on the system, rather than just contacting an individual directly
- A lot of people have to be involved -- collections, tech services, IT
- Duplicate records between existing collection and DDA records -- we don't always realize where duplication exists
- People want to be able to activate DDA records in their e-resources knowledgebase -- ideally we'd have our book jobber help with updating the kb
- Important to a have a good vendor rep
- We weren't able to understand the entire process at the start -- every step was like a new discovery
- A challenge is getting quality records and identifying records that need additional work
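The DDA duplication problem above can be sketched as a set comparison on normalized ISBNs. This is a minimal illustration, not a full solution: the lists of ISBNs are hypothetical stand-ins for whatever your catalog and DDA record exports contain, and the sketch does not reconcile an ISBN-10 with its equivalent ISBN-13.

```python
# Sketch: flag overlap between existing catalog holdings and incoming DDA
# records by comparing normalized ISBNs. The input lists here are
# hypothetical; real data would come from MARC exports or the knowledgebase.

def normalize_isbn(isbn):
    """Strip hyphens and spaces; uppercase any trailing X check character."""
    return isbn.replace("-", "").replace(" ", "").upper()

def find_duplicates(catalog_isbns, dda_isbns):
    """Return the set of normalized ISBNs appearing in both collections."""
    catalog = {normalize_isbn(i) for i in catalog_isbns}
    dda = {normalize_isbn(i) for i in dda_isbns}
    return catalog & dda

# Example:
# find_duplicates(["978-0-306-40615-7"], ["9780306406157", "978-1-56619-909-4"])
# returns {'9780306406157'}
```

A real pipeline would also map ISBN-10s to ISBN-13s before comparing, since the same title often carries both forms.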
MARC record services -- how do you evaluate quality?
- For ebooks, they can be really bad
- Many people use MARCEdit to batch-process records and find things that need to be fixed
- Suggested using regular expressions to pull values out of the leader field
- Another common practice is to use various methods to convert MARC records to Excel and look at errors there
- Some of us are using OpenRefine to find problems
- Some of us are becoming more error tolerant, but the cool stuff that people do is dependent on good data
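The regex-on-the-leader idea above might look like the sketch below. The position checks are illustrative, drawn from the MARC 21 leader layout; this is not a complete validator, and the codes you actually want to enforce will depend on your own data.

```python
import re

# Sketch: batch-check MARC leaders with a regular expression.
# Leader positions (0-indexed): 00-04 record length, 05 record status,
# 06 type of record, 07 bibliographic level, 09 character coding scheme.

LEADER_RE = re.compile(
    r"^\d{5}"      # 00-04: record length, digits
    r"[acdnp]"     # 05: record status
    r"[a-z]"       # 06: type of record (loose check)
    r"[abcdims]"   # 07: bibliographic level
    r".{1}"        # 08: type of control (not checked)
    r"[ a]"        # 09: ' ' = MARC-8, 'a' = UCS/Unicode
    r".{14}$"      # 10-23: not checked here
)

def leader_problems(leader):
    """Return a list of problems found in a 24-character MARC leader."""
    problems = []
    if len(leader) != 24:
        problems.append("leader is not 24 characters")
    elif not LEADER_RE.match(leader):
        problems.append("leader fails pattern check")
    return problems
```

Running this over every record in a file exported from MARCEdit gives a quick triage list before any deeper inspection in Excel or OpenRefine.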
AngularJS
- IRC: #angularjs on Freenode
Modules, Tools, Features
Misc Resources mentioned and more-or-less related to Angular:
- http://firebase.com/docs/angular/ (cloud back end)
- Other "No Backend" solutions: http://nobackend.org/solutions.html
- http://emberjs.com/ - okay, not angular
BIBFRAME 2 & Linked Data
Notes based on tweets made by the group during the session:
Starting comment: this is the year of testers and early implementers of BIBFRAME.
- How do we get from MARC to BIBFRAME? Requests to explain tools and scripts.
- Issues with linking different ontologies to build linked data networks; SKOS brought up and discussed
- How do people feel about the concept of an event? (being discussed)
- When do we have to switch? When will the vendors build applications in BIBFRAME so then libraries can follow?
- How far does BIBFRAME extend, and when do you say something is no longer BIBFRAME's job?
- Explain Place, dates, agents as three attributes?
- Discussing expressions versus works (making expression into relationships)
- Model: Works and Instances, with relationships between Works that carry expression
- Brief mention of Named entity extraction work for finding these attributes
- What happens when you link to an ontology, then it changes? URIs play the important role here.
- VIVO Project shout out! http://t.co/OK9DLJXVWw
- Example of a collection put through BIBFRAME from A&M http://t.co/K4vnwdczAO
- A group member who transcribed records from MARC to BIBFRAME had their *SERIES* records come out correctly
- Variance in cataloging practices will also be a huge issue for transcribing records
- A group member's experience transcribing MARC to BIBFRAME records: two tools didn't give the same output
- Battle lines have been drawn: discussing Dublin Core and its simplicity (good? bad?)
- Music cataloging in FRBR and BIBFRAME being discussed now -- diving into the deep end
- Locally, we self-select the ontologies we need. But if we want more exposure for our data, we need to explain and share.
- Going from catalogers to metadata librarians at an institutional level; trying to start retraining people now.
- Creating an entire ontology for all of human history would be overwhelming
- Response to this concern: 'But an ontology is domain knowledge; it takes multiple domains/ontologies to cover all of human history'
Unusual searches & long searches
This group met to talk about unusual searches, especially extremely long searches, copied and pasted citations, and other issues related to serving niche searches.
Some of the possible solutions include:
- Look for a DOI, ISBN, or other identifier in the query, extract it, and make a request to a service using that ID.
- Remove extraneous characters from the beginning of a string that may indicate copied and pasted text.
- Truncate a long query at a certain character length (80 to 100?) assuming that the most useful text appears at the start of the query.
- Use a regex to identify a citation by detecting some combination of words commonly used in citations (Vol., Iss., pp.), four digit years, and other combinations of numbers.
- It would be useful to test this regex against a search corpus to check for false matches.
- Once a citation is identified, certain characters could be removed from the query, or the citation could be passed to a parser such as Brown's FreeCite.
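The identifier- and citation-detection steps above could be combined into a simple query classifier. The patterns below are deliberately loose sketches and, as suggested above, should be tested against a real search corpus for false matches before use.

```python
import re

# Sketch: guess the kind of query before deciding how to handle it.
# The patterns are intentionally loose and will produce false positives
# (e.g. a 10-digit phone number looks like an ISBN-10).

DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")
ISBN_RE = re.compile(r"\b(?:97[89][\s-]?)?(?:\d[\s-]?){9}[\dXx]\b")
# Rough citation signals: Vol./Iss./pp. abbreviations or a 4-digit year.
CITATION_RE = re.compile(r"\b(?:[Vv]ol\.|[Ii]ss\.|pp\.|\(?(?:19|20)\d{2}\)?)")

def classify_query(query):
    """Return 'doi', 'isbn', 'citation', or 'keyword' for a raw query."""
    if DOI_RE.search(query):
        return "doi"
    if ISBN_RE.search(query):
        return "isbn"
    if CITATION_RE.search(query):
        return "citation"
    return "keyword"
```

A 'doi' or 'isbn' result could be routed straight to a resolver or knowledgebase lookup; a 'citation' result could be handed to a parser such as FreeCite; everything else falls through to normal keyword search.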
Other things noted:
- If you truncate a query, don't truncate in the middle of a word, or recall may suffer.
- Log queries that return zero hits as a way to find types of queries that may need post-processing.
- Is there a way to provide smarter, live results for things such as library hours, similar to the way Google provides live flight-tracking information directly in the results list?
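The truncation advice above (cap at roughly 80-100 characters, never mid-word) can be sketched as:

```python
# Sketch: cap a long query at a maximum length, backing up to the previous
# word boundary so no word is cut in half (which would hurt recall).

def truncate_query(query, max_len=100):
    """Truncate query to at most max_len characters at a word boundary."""
    query = query.strip()
    if len(query) <= max_len:
        return query
    cut = query.rfind(" ", 0, max_len + 1)
    if cut == -1:  # one giant token with no spaces: fall back to a hard cut
        return query[:max_len]
    return query[:cut].rstrip()
```

This assumes the most useful text appears at the start of the pasted query, per the discussion above; the 100-character default is an assumption to tune against your own logs.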
ResCarta
We gathered in the ballroom for an active conversation about the philosophy of keeping archives in a reduced set of file formats with standardized metadata. We reviewed directory structures and METS collection-level details. To reduce future coding and costs, we advise normalizing to a reduced set of file formats on ingest into a structured archive.
Justin from Artefactual shared their philosophy and thoughts on the use of METS collection-level file contents.
Historically, systems like NDNP are gatekeeper validation systems; we should be building digital-archive creation systems instead. Build to a standard under code control rather than writing code to check hand-made datasets.
OCLC institution RDF project
Digital Preservation
- Cost issues: billing departments, charging grant projects one-time vs. multiple times
- Internal vs. external hosting
- Trusted Digital Repository certification: TRAC, the ISO standard
- Geographic distribution: what does that actually mean?
- Q: Who is using checksums, and how often are they verified?
- UNC makes sure checksums are checked every quarter, and throttles/staggers the checking
- Q: Has anyone had a checksum check fail?
- The only failures have been user error: checking the wrong one, or files changed after the initial checksum
- Video: frame-level checksums -- part of ffmpeg can generate frame-level information and checksum it
- Q: How much code/time goes into checking on problems with checksums?
- Manual vs. automatic repair: manual intervention is preferred
- How often to check tapes without further damaging the tape?
- For testing, there's a tool that will flip bits
- Disaster recovery testing: hesitance to test/break files on production
- ZFS: a self-healing filesystem with replication (worry: replicating checksum errors)
- Q: What about viruses and malicious scripts?
- UNC runs ClamAV on everything, and makes sure everyone is an authorized user
- AV Artifact Atlas: a visual glossary of damage types in a/v files
- Tape backup of everything can take too long to run (days)
- Instead, rely on multiple copies of objects on disk
- Format migrations: no one has really done one yet
- The Archivematica wiki is a great resource
- Normalization on ingest
- Emulation as a service: a possible community collaboration
Major issues for Digital Preservation
- storage (terabytes coming in each year, no cost-effective solutions for growing needs)
- staffing (for smaller institutions)
- funding model/sustainability (some charge for services, some funding by Campus IT)
- research data, grants, data management planning tool
- how long can we offer to store files
- trying to convince Provost that library storage is like library shelf space and needs to be funded
- split funding, from graduate schools or president's office
- some work on service level agreements, tiers of service
- file retrievals may not be tracked anywhere; if they aren't, we can't tell what has or hasn't been retrieved
NDSA Levels of Preservation - http://www.digitalpreservation.gov/ndsa/activities/levels.html