Data science is attracting more and more buzz and hype. I'll go over what it is, what it isn't, and how it fits in libraries.
== PDF metadata extraction for academic literature ==
* Kevin Savage, kevin.savage at mendeley.com, Mendeley
* Joyce Stack, joyce.stack at mendeley.com, Mendeley
Mendeley recently added a "document from file" endpoint to its API, which attempts to extract metadata such as title and authors directly from PDF files. This talk will describe, at a high level, the machine learning methods we used, including how we measured and tuned our model. We will then delve more deeply into our stack, the tools we used, some of the things that didn't work, and why PDFs are the worst thing ever to compute over.
[[Category:Code4Lib2015]]
[[Category:Talk Proposals]]