Code4Lib Montreal meeting notes 20181023

359 bytes added, 00:33, 25 October 2018
Formatting for Tim's presentation
=== Tim Walsh, Bulk Reviewer ===
This is a project Tim started working on while a Harvard Fellow over the summer; the idea is to put forensics tools to work for archives. This requires identifying individual files accurately, rather than the broader "yeah, it looks like there are credit card numbers on this hard drive" approach that forensic investigators are interested in.

* Identifies, reviews, and removes sensitive files in disk images and directories, regardless of file format
* Sensitive info: SSNs, credit card numbers, phone numbers, email addresses, internet history, EXIF metadata, GPS data, custom search terms, Windows registry (program install history)
* Built using Django, Vue.js, bulk_extractor, DFXML, and Docker
* bulk_extractor generates text files or a SQLite database that normally get processed into a histogram; Bulk Reviewer instead processes that data to support a Web browser front end and to identify the individual files that may be problematic

==== Problems: ====

* Many false positives (e.g. all 9-digit numbers are identified as SSNs); Tim isn't sure any of these tools offer a high level of confidence
* Tooling is all American-based, so adding something like a SIN (Canadian Social Insurance Number) requires C++ (Tomasz is willing to help!)
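To illustrate the false-positive problem, here is a minimal Python sketch of how 9-digit hits could be narrowed down after extraction. It is not how Bulk Reviewer or bulk_extractor work; the function names are hypothetical. It assumes only the published numbering rules: SSNs never use area 000, 666, or 900-999, group 00, or serial 0000, and a Canadian SIN must pass the Luhn checksum.

```python
import re


def digits_only(candidate: str) -> str:
    """Strip separators such as spaces and hyphens, keeping only digits."""
    return re.sub(r"\D", "", candidate)


def plausible_ssn(candidate: str) -> bool:
    """Reject 9-digit strings that can never be an issued US SSN.

    Hypothetical helper: passing this check does not prove the value is
    an SSN, it only removes structurally impossible numbers.
    """
    digits = digits_only(candidate)
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def plausible_sin(candidate: str) -> bool:
    """Hypothetical check for a Canadian SIN: 9 digits that pass Luhn."""
    digits = digits_only(candidate)
    return len(digits) == 9 and luhn_valid(digits)


if __name__ == "__main__":
    for value in ["123-45-6789", "000-12-3456", "046 454 286"]:
        print(value, plausible_ssn(value), plausible_sin(value))
```

Checks like these only cut down the review queue; a number that passes may still be an order number or an identifier of some other kind, so human review of the flagged files is still needed.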
== Next meeting ==