Code4Lib Montreal meeting notes 20181023
Code4Lib Montreal 2018-10-23
Attendees
- Chris Trudeau - recent McGill SIS graduate
- Martin ?? - Health Sciences liaison, McGill
- Stephana Bretweiser - CCA
- Tim Walsh - Digital Preservation librarian, Concordia
- John ?? - Digital Archivist, Concordia
- Clara Turp - Metadata Analyst Librarian, McGill
- Jessica Reeve - Senior Electronic Resources
- Tomasz Langenbauer - Digital Projects, Concordia
- Rebecca Nicholson - Web group, McGill
- Eka Grguric - Web Librarian, McGill
- Dan Scott - Systems librarian, Laurentian / McGill student
Mandat du groupe et description / Group's description and mandate
Brief discussion about what the mandate of the group should be:
- Learning about technology and coding through doing; workshops
- Building a community - across universities, colleges, public institutions in Montreal
- Informal
We like https://code4lib.org/about
- Action: Clara will customize the Code4Lib statement so that it reflects a Montreal and bilingual context
Presentations
Sarah Severson: sick, will present conference report from DLF next time
Chris Trudeau: citations to reserves
Idea: instead of faculty emailing the library with their individual requests for items that need to be placed on reserve, why not extract the citations from the course outline / syllabus (in PDF or Word format) and automatically generate reserve requests?
Feedback
- McGill used to have faculty upload syllabi, but eventually stopped because of resistance ("private information")
- McGill accepts reserve requests in any format: email, in person, paper
- Tomasz built something like this for Concordia in 2009 and is willing to share it; faculty there wanted a choice of submitting the entire syllabus, pasting in a full citation, or filling out the citation field by field
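A minimal sketch of the citation-extraction idea, assuming the syllabus has already been converted from PDF or Word to plain text. The regular expressions, the `ReserveRequest` class, and the course code are hypothetical illustrations, not the tool Tomasz built; a real syllabus would need a more robust citation parser.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: a line "looks like" a citation if it contains a
# four-digit year and a "Surname, I." author pattern.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
AUTHOR = re.compile(r"\b[A-Z][a-zA-Z'-]+,\s+(?:[A-Z]\.\s*)+")


@dataclass
class ReserveRequest:
    """Hypothetical reserve request record."""
    course: str
    citation: str


def extract_citations(syllabus_text):
    """Return the lines of the syllabus that look like citations."""
    citations = []
    for line in syllabus_text.splitlines():
        line = line.strip()
        if YEAR.search(line) and AUTHOR.search(line):
            citations.append(line)
    return citations


def build_requests(course, syllabus_text):
    """Turn each candidate citation into a reserve request."""
    return [ReserveRequest(course, c) for c in extract_citations(syllabus_text)]


sample = """Week 1: Introduction
Borgman, C. L. (2015). Big data, little data, no data. MIT Press.
Office hours: Tuesdays, room 2015
Smith, J. (2010). An example title. Example Publisher."""

for request in build_requests("EXAMPLE-101", sample):
    print(request.course, "|", request.citation)
```

A human would still need to review the output, since a heuristic like this will miss citations in other styles and may pick up unrelated lines.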
Tim Walsh, Bulk Reviewer
This is a project Tim started working on while a Harvard Fellow over the summer; the idea is to put the power of forensics tools to work for archives. This requires identifying individual files accurately, rather than the broader "yeah, it looks like there are credit card numbers on this hard drive" approach that forensic investigators are interested in.
- Identifies, reviews, and removes sensitive files in disk images and directories, regardless of file format
- Sensitive info - SSN, credit card numbers, phone numbers, email addresses, internet history, EXIF metadata, GPS data, custom search terms, Windows registry (program install history)
- Built using Django, Vue.js, bulk_extractor, DFXML, and Docker
- bulk_extractor generates text files or a SQLite database that normally gets processed into a histogram; Bulk Reviewer instead processes that data to support a Web browser front end and to identify the individual files that may be problematic
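The step from "features found somewhere on the disk" to "features found in this file" can be sketched as follows. This is not Bulk Reviewer's actual code; it assumes bulk_extractor's tab-separated feature file layout (offset, feature, context, with `#` comment lines) and a hypothetical list of `(filename, start, length)` byte runs of the kind that could be derived from DFXML output. Features with compound offsets (for example, data found inside compressed streams) are skipped for simplicity.

```python
from collections import defaultdict


def parse_feature_lines(lines):
    """Yield (offset, feature) pairs from feature-file lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip malformed lines and compound offsets
        yield int(parts[0]), parts[1]


def features_by_file(features, byte_runs):
    """Group features by the file whose byte run contains the offset.

    byte_runs is a hypothetical list of (filename, start, length) tuples.
    """
    hits = defaultdict(list)
    for offset, feature in features:
        for filename, start, length in byte_runs:
            if start <= offset < start + length:
                hits[filename].append(feature)
                break
    return dict(hits)


# Illustrative data only.
lines = [
    "# example feature file",
    "1000\t4111-1111-1111-1111\tcard 4111-1111-1111-1111 exp",
    "5000\tuser@example.org\tfrom user@example.org to",
    "1234-GZIP-5\tignored\tcompound offset",
]
byte_runs = [
    ("letters/a.doc", 512, 1024),
    ("mail/inbox.mbox", 4096, 2048),
]

report = features_by_file(parse_feature_lines(lines), byte_runs)
for filename, found in report.items():
    print(filename, found)
```

A per-file report like this is what lets a reviewer decide on individual files instead of looking at a disk-wide histogram.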
Problems
- Many false positives (e.g. all 9-digit numbers are identified as SSNs); Tim isn't sure that any of these tools offer a high level of confidence
- Tooling is all American-based, so adding detection for something like a Canadian SIN (Social Insurance Number) requires writing C++ (Tomasz is willing to help!)
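One way to cut down on false positives for SINs specifically is a checksum test: Canadian SINs are 9-digit numbers that satisfy the Luhn algorithm, so most arbitrary 9-digit numbers can be rejected. The sketch below illustrates only the check itself, in Python for readability; an actual bulk_extractor scanner would be written in C++, and the function names here are hypothetical.

```python
import re

# Nine digits, optionally grouped in threes with spaces or hyphens.
SIN_CANDIDATE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_possible_sins(text):
    """Return 9-digit candidates in text that pass the Luhn check."""
    results = []
    for match in SIN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            results.append(digits)
    return results


text = "Ref 123456789, SIN 046 454 286, phone ext. 5551234"
print(find_possible_sins(text))
```

Passing the checksum does not prove a number is a SIN; roughly one in ten random 9-digit numbers will also pass, so results still need human review.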
Next meeting
- November - Sarah and John to present
- Mid-December - social