Code4Lib Montreal meeting notes 20181023


Code4Lib Montreal 2018-10-23


  • Chris Trudeau - recent McGill SIS graduate
  • Martin ?? - Health Sciences liaison, McGill
  • Stephana Bretweiser - CCA
  • Tim Walsh - Digital Preservation librarian, Concordia
  • John ?? - Digital Archivist, Concordia
  • Clara Turp - Metadata Analyst Librarian, McGill
  • Jessica Reeve - Senior Electronic Resources
  • Tomasz Langenbauer - Digital Projects, Concordia
  • Rebecca Nicholson - Web group, McGill
  • Eka Grguric - Web Librarian, McGill
  • Dan Scott - Systems librarian, Laurentian / McGill student

Mandat du groupe et description / Group's description and mandate

Brief discussion about what the mandate of the group should be:

  • Learning about technology and coding through doing; workshops
  • Building a community - across universities, colleges, public institutions in Montreal
  • Informal

We like

  • Action: Clara will customize the Code4Lib statement so that it reflects a Montreal and bilingual context


Sarah Severson: absent (sick); will present her DLF conference report next time

Chris Trudeau: citations to reserves

Idea: instead of faculty emailing the library with their individual requests for items that need to be placed on reserve, why not extract the citations from the course outline / syllabus (in PDF or Word format) and automatically generate reserve requests?


  • McGill used to have faculty upload syllabi, but eventually stopped because of resistance ("private information")
  • McGill accepts reserve requests in any format: email, in person, paper
  • Tomasz built something similar for Concordia in 2009 and is willing to share it, but faculty wanted the option to submit the entire syllabus, paste in a full citation, or fill out the citation field by field
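The core of the idea (pull citations out of a syllabus and turn each one into a reserve request) can be sketched as below. This is a minimal sketch, not Tomasz's 2009 tool or any McGill system: it assumes the PDF or Word file has already been converted to plain text, and it only recognizes one hypothetical, APA-like pattern ("Surname, I. (YYYY). Title."). `ReserveRequest` and `extract_requests` are illustrative names.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic: an APA-like reference line such as
# "Smith, J. (2015). Title of the work. Publisher."
APA_PATTERN = re.compile(
    r"^(?P<authors>[A-Z][^()]+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>[^.]+)\."
)


@dataclass
class ReserveRequest:
    course: str
    authors: str
    year: int
    title: str


def extract_requests(course: str, syllabus_text: str) -> list[ReserveRequest]:
    """Turn each line that looks like a citation into a reserve request."""
    requests = []
    for line in syllabus_text.splitlines():
        match = APA_PATTERN.match(line.strip())
        if match:
            requests.append(
                ReserveRequest(
                    course=course,
                    authors=match.group("authors").rstrip(", "),
                    year=int(match.group("year")),
                    title=match.group("title").strip(),
                )
            )
    return requests
```

Real syllabi mix citation styles, so a tool like this would likely still need the fallbacks faculty asked for: pasting a full citation or filling in fields by hand.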

Tim Walsh: Bulk Reviewer

This is a project Tim started working on over the summer as a Harvard Fellow; the idea is to put digital forensics tools to work for archives. This requires accurately identifying individual files, rather than the broader "yeah, it looks like there are credit card numbers on this hard drive" approach that forensic investigators are interested in.

  • Identifies, reviews, and removes sensitive files in disk images and directories, regardless of file format
  • Sensitive info - SSN, credit card numbers, phone numbers, email addresses, internet history, EXIF metadata, GPS data, custom search terms, Windows registry (program install history)
  • Built using Django, Vue.js, bulk_extractor, DFXML, and Docker
  • bulk_extractor generates text files or a SQLite database that normally get processed into a histogram; Bulk Reviewer instead processes this output to drive a web browser front end and identify the individual files that may be problematic
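The shift from "this disk contains SSNs" to "these files contain SSNs" can be sketched as follows. This is a minimal sketch, not Bulk Reviewer's actual code. It assumes bulk_extractor's tab-separated feature-file lines (offset, feature, context, with `#` comment lines). It also assumes a hypothetical `byte_runs` list of `(filename, start_offset, length)` tuples, such as one might derive from a DFXML file listing. Offsets that are forensic paths into decompressed data (e.g. `2048-GZIP-10`) are simply skipped here.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def parse_feature_file(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (offset, feature) pairs from bulk_extractor-style feature lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        offset, feature = fields[0], fields[1]
        if not offset.isdigit():
            # Forensic paths into decompressed data are ignored in this sketch.
            continue
        yield int(offset), feature


def features_by_file(
    lines: Iterable[str], byte_runs: list[tuple[str, int, int]]
) -> dict[str, list[str]]:
    """Group features by the (hypothetical) file whose byte run contains them."""
    hits: dict[str, list[str]] = defaultdict(list)
    for offset, feature in parse_feature_file(lines):
        for name, start, length in byte_runs:
            if start <= offset < start + length:
                hits[name].append(feature)
                break
    return dict(hits)
```

Grouping by file is what lets a reviewer open, check, and remove a specific document instead of acting on a disk-wide count.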


  • Many false positives (e.g. every 9-digit number is flagged as an SSN); Tim isn't sure any of these tools achieves a high level of confidence
  • The tooling is all US-centric, so adding detection for something like a Canadian SIN (Social Insurance Number) requires writing C++ (Tomasz is willing to help!)
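One way to cut down on "every 9-digit number" matches for Canadian SINs is to require the number to pass the Luhn checksum, which valid SINs satisfy. The sketch below is in Python for readability only; a real bulk_extractor scanner would have to be written in C++, as noted above. The regex and function names are illustrative assumptions, not part of any existing tool.

```python
import re

# Nine digits, optionally grouped 3-3-3 with spaces or hyphens.
SIN_CANDIDATE = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{3})\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sin_candidates(text: str) -> list[str]:
    """Return 9-digit matches that also pass the Luhn check."""
    found = []
    for match in SIN_CANDIDATE.finditer(text):
        if luhn_valid("".join(match.groups())):
            found.append(match.group(0))
    return found
```

For example, `046 454 286` passes the check while `123-456-789` does not. The checksum only filters: roughly one in ten arbitrary 9-digit numbers still passes, so a human review step like Bulk Reviewer's remains necessary.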

Next meeting

  • November - Sarah and John to present
  • Mid-December - social