Difference between revisions of "2014 Breakout II (Wednesday)"

From Code4Lib
Revision as of 20:32, 26 March 2014

UX

Notes by @erinrwhite again. Y'all cannot escape me

NCSU's UX department is cross-functional and has members from across departments. Looking at creating cross-channel experiences from digital to real life. Working on consistency across experiences. Expanded on UMich's UX department to create a UX research team.

Research: NCSU does a research project every month. NCSU is also training new library fellows to infuse user experience work into their projects, growing the culture of UX within the organization.

Process

How do you work in harmony with a dev team when sometimes the UX team can be the roadblock to development? Need to get a workflow that works so that everyone can move quickly.

UXing web pages vs. entire web applications: they're totally different experiences so need different approaches to user experience evaluation.

Research

Guerrilla research: go out into the public spaces of your library to test prototypes or design ideas. Make it quick. User research doesn't have to be a huge deal.

If you can't give money as remuneration, give 'em candy bars. But make the candy bars full-size, not the minis.

Librarians are users too...right?

How do we push back against librarians' assertions that pages/interfaces should look a certain way?

Research with users can *sometimes* help.

Need to communicate your evidence to your library. UT hired someone last year just to do IT communication (!).

Numbers don't always work. Need a visual tool if possible (e.g. a heatmap). If you can compile video or audio of user interviews or usability testing, that can be very powerful.

Resources

Recommendation: 37Signals' book Getting Real on helping choose things that are/aren't important and moving on.

Publish your damn work!

As a community, we need to get better about sharing our work with each other so we don't have to keep reinventing the wheel.

Securing EZproxy

Mag II

Tech service

Pine Oak

AngularJS

Capitol

BIBFRAME 2 & Linked Data

in Ballroom

Unusual searches & long searches

Willow Oak

This group met to talk about unusual searches, especially extremely long searches, copied and pasted citations, and other issues related to serving niche searches.

Some of the possible solutions include:

  • Look for a DOI, ISBN, or other identifier in the query, extract it, and send the request to a service using that ID.
  • Remove extraneous characters from the beginning of a string that may indicate copied and pasted text.
  • Truncate a long query at a certain character length (80 to 100?), assuming that the most useful text appears at the start of the query.
  • Use a regex to identify a citation by detecting some combination of words commonly used in citations (Vol., Iss., pp.), four-digit years, and other combinations of numbers.
    • It would be useful to test this regex against a search corpus to check for false matches.
    • Once a citation is identified, either certain characters could be removed from the query or the query could be passed to a citation parser such as Brown's FreeCite (http://freecite.library.brown.edu/).
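
A rough Python sketch of the identifier-extraction and citation-detection ideas above. The patterns, the function names (`extract_identifiers`, `looks_like_citation`), and the two-signal threshold are illustrative assumptions, not something the group agreed on; as noted, any such regex should be tested against a real search corpus for false matches.

```python
import re

# Assumed patterns for illustration only; tune against real query logs.
DOI_RE = re.compile(r'\b10\.\d{4,9}/\S+')
# Hyphenated or plain runs of 10 to 17 characters that might be an ISBN.
ISBN_CANDIDATE_RE = re.compile(r'\b\d[\d-]{8,15}[\dXx]\b')

CITATION_WORD_RE = re.compile(r'\b(?:vol|iss|pp)\.', re.IGNORECASE)
YEAR_RE = re.compile(r'\b(?:1[5-9]|20)\d{2}\b')
PAGE_RANGE_RE = re.compile(r'\b\d+\s*[-\u2013]\s*\d+\b')


def _valid_isbn(chars):
    """Check the ISBN-10 or ISBN-13 check digit of a hyphen-free string."""
    if len(chars) == 10 and chars[:9].isdigit():
        total = sum((10 - i) * (10 if c in 'Xx' else int(c))
                    for i, c in enumerate(chars))
        return total % 11 == 0
    if len(chars) == 13 and chars.isdigit():
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
        return total % 10 == 0
    return False


def extract_identifiers(query):
    """Return DOIs and checksum-valid ISBNs found in a query string."""
    dois = [m.rstrip('.,;)') for m in DOI_RE.findall(query)]
    isbns = []
    for candidate in ISBN_CANDIDATE_RE.findall(query):
        bare = candidate.replace('-', '')
        if _valid_isbn(bare):
            isbns.append(bare)
    return {'doi': dois, 'isbn': isbns}


def looks_like_citation(query, threshold=2):
    """Count independent citation signals; the threshold is a guess."""
    signals = sum(bool(pattern.search(query))
                  for pattern in (CITATION_WORD_RE, YEAR_RE, PAGE_RANGE_RE))
    return signals >= threshold
```

Validating the ISBN check digit is one way to cut down on false matches from phone numbers and other digit runs; requiring at least two citation signals keeps a bare year such as "1984" from tripping the detector.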

Other things noted:

  • If you truncate a query, don't truncate in the middle of a word, or recall may be worse.
  • Log queries that return zero hits as a way to find types of queries that may need some post-processing.
  • Is there a way to provide smarter, live results for libraries for things such as library hours, similar to the way Google provides live flight tracking information directly in the results list?
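
The truncation and zero-hit logging notes could be sketched as follows. The 100-character default comes from the "80 to 100?" range floated in the session; the function names and the logger name `search.zero_hits` are hypothetical.

```python
import logging

# Hypothetical logger name; route it wherever query logs are kept.
zero_hit_log = logging.getLogger("search.zero_hits")


def truncate_query(query, max_chars=100):
    """Shorten a long query without cutting a word in half."""
    query = " ".join(query.split())  # collapse runs of whitespace
    if len(query) <= max_chars:
        return query
    # Look one character past the limit so a word that ends exactly
    # at the limit is kept whole.
    window = query[:max_chars + 1]
    if " " in window:
        return window.rsplit(" ", 1)[0]
    # A single unbroken token longer than the limit: hard cut.
    return query[:max_chars]


def record_if_zero_hits(query, hit_count):
    """Log the query when the search returned nothing; return True if logged."""
    if hit_count == 0:
        zero_hit_log.info("zero hits: %r (length %d)", query, len(query))
        return True
    return False
```

For example, `truncate_query("alpha beta gamma", 12)` returns `"alpha beta"` rather than `"alpha beta g"`.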

ResCarta

We gathered in the ballroom and had an active conversation about the philosophy of keeping archives in a reduced set of file formats with standardized metadata. We reviewed directory structures and METS collection-level details. To reduce future coding and costs, we advise reducing the number of file formats (normalization) on ingestion into a structured archive.

Justin from Artefactual shared their philosophy and thoughts on use of METS collection level file contents.

Historically, systems like NDNP are gatekeeper validation systems, and we should be building digital archive creation systems. Build to a standard under code control rather than writing code to check handmade datasets.

OCLC institution RDF project

in ballroom


Digital Preservation

Cost issues, billing departments, charging grant projects one-time vs. multiple

Internal vs. external hosting

Trusted Digital Repository, TRAC, ISO standard

Geographic distribution, what does that actually mean

? who is using checksums and how often they are verifying
UNC - make sure checksums checked every quarter, throttle/stagger checking
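
One way to stagger a quarterly check is to assign each file to a fixed weekly batch, so that every file is verified once per quarter without hashing everything at once. This is only a sketch: the notes do not say how UNC implements its checks, and SHA-256, the 13-batch split, and the manifest shape (path mapped to expected digest) are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large objects do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def batch_for(path, batches=13):
    """Assign a path to a stable batch number (13 is roughly weeks per quarter)."""
    name_hash = hashlib.sha256(str(path).encode("utf-8")).digest()
    return int.from_bytes(name_hash[:4], "big") % batches


def verify_batch(manifest, batch, batches=13):
    """Check only the files in one batch; return (path, problem) pairs.

    `manifest` is assumed to map file paths to their expected SHA-256 digests.
    """
    failures = []
    for path, expected in manifest.items():
        if batch_for(path, batches) != batch:
            continue
        if not Path(path).is_file():
            failures.append((path, "missing"))
        elif sha256_of(path) != expected:
            failures.append((path, "mismatch"))
    return failures
```

The function only reports failures and does not attempt repair, in line with the preference for manual intervention noted in this session.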

? Has anyone had checksum checks fail?
only time is user error, checking wrong one, files are changed after initial checksum

video - frame-level checksum, part of ffmpeg, make frame level information and checksum that
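
A hedged sketch of the frame-level idea, using ffmpeg's `framemd5` output format, which writes one hash per frame. The helper names are hypothetical, and the parser assumes only that comment lines start with `#` and that the hash is the last comma-separated field on each data line.

```python
def framemd5_command(source, report):
    """Build the ffmpeg command that writes per-frame MD5s to `report`."""
    return ["ffmpeg", "-i", str(source), "-f", "framemd5", str(report)]


def frame_hashes(report_text):
    """Extract the per-frame hashes from a framemd5 report."""
    hashes = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        hashes.append(line.rsplit(",", 1)[-1].strip())
    return hashes


def changed_frames(old_report, new_report):
    """Return positions (in report order) whose hashes differ or are missing."""
    old, new = frame_hashes(old_report), frame_hashes(new_report)
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    # Frames present in only one of the two reports also count as changed.
    changed.extend(range(min(len(old), len(new)), max(len(old), len(new))))
    return changed
```

The command list can be passed to `subprocess.run(..., check=True)` on a machine with ffmpeg installed. Comparing a stored report with a fresh one shows which frames are damaged, rather than only that the file as a whole changed.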

? how much code/time is done to check on problems with checksums?
manual vs. auto repair, prefer manual intervention

how often to check tapes, without further damaging tape

for testing, there's a tool that will flip bits
disaster recovery testing
hesitance to test/break files on production
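
Given the hesitance to break files on production, a test like this can run against a scratch copy instead. The sketch below is not the bit-flipping tool mentioned in the session; it is a minimal stand-in with hypothetical names that copies a file and flips one bit in the copy, so a fixity check can be shown to catch the damage.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_hex(path):
    """Digest of a (small, test-sized) file read in one go."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def flip_bit(path, byte_offset, bit=0):
    """Flip a single bit in place. Only call this on a scratch copy."""
    with open(path, "r+b") as fh:
        fh.seek(byte_offset)
        original = fh.read(1)
        if not original:
            raise ValueError("byte_offset is beyond the end of the file")
        fh.seek(byte_offset)
        fh.write(bytes([original[0] ^ (1 << bit)]))


def corrupt_copy(source, scratch_dir, byte_offset=0, bit=0):
    """Copy `source` into `scratch_dir` and damage the copy, not the original."""
    damaged = Path(scratch_dir) / Path(source).name
    # copy2 raises SameFileError if scratch_dir is the source's own directory.
    shutil.copy2(source, damaged)
    flip_bit(damaged, byte_offset, bit)
    return damaged
```

Comparing `sha256_hex(source)` with `sha256_hex(damaged)` should show a mismatch; a monitoring setup that does not flag the damaged copy is the real finding of the test.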

ZFS, self-healing filesystem, replication (worried about replicating checksum errors)

? about viruses, malicious scripts
UNC runs ClamAV on everything and makes sure everyone is an authorized user

AV Artifact Atlas - visual glossary of damage types to a/v files

tape backup of everything can take too long to run (days)
rely on multiple copies of objects on disk

format migrations - no one has really done it yet
archivematica wiki is great resource

normalization on ingest
emulation as a service - possible collaboration in community
internet archive emulation service using javascript/jsmess

Major issues for Digital Preservation

  • storage (terabytes coming in each year, no cost-effective solutions for growing needs)
  • staffing (for smaller institutions)
  • funding model/sustainability (some charge for services, some funding by Campus IT)
    • research data, grants, data management planning tool
    • how long can we offer to store files
    • trying to convince Provost that library storage is like library shelf space and needs to be funded
    • split funding, from graduate schools or president's office
  • some work on service level agreements, tiers of service
  • file retrievals may not be tracked anywhere, if so can't tell what hasn't been retrieved

NDSA Levels of Preservation - http://www.digitalpreservation.gov/ndsa/activities/levels.html