Difference between revisions of "Hacking Pre-Ingest Assessment Tools (Solr/Ruby/Python)"
From Code4Lib
Latest revision as of 20:12, 7 February 2011
Django/Solr Metadata Archive Tool
As part of my code4lib presentation (http://code4lib.org/conference/2011/Matienzo) I (Matienzo) may demo some code that works with Digital Forensics XML and gets it into a Solr index. I've successfully thrown Blacklight on top of it, but want to extend it further, especially in terms of figuring out what I can do with it and creating a straightforward UI that will represent directory hierarchies.
- https://github.com/anarchivist/foresole
- https://github.com/anarchivist/gumshoe
- Solr index w/ sample data: http://solr.onebigarchives.net:8983/solr/admin/
- Sample query: http://solr.onebigarchives.net:8983/solr/select?indent=on&version=2.2&q=*:*&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=
This tool might work well alongside an Event microservice. Mark Phillips hopes to release a Django app to this effect in April 2011.
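To make the Digital Forensics XML to Solr step concrete, here is a minimal sketch of how file entries might be flattened into index-ready documents. It is not the code in foresole or gumshoe. The sample XML is a simplified, namespace-free fragment made up for illustration, and the field names (id, ancestors, md5_digest) and the helpers dfxml_to_docs and ancestors are assumptions. The ancestors field is one possible way to support a directory-hierarchy UI through faceting.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical, namespace-free DFXML fragment; real DFXML output is
# namespaced and carries many more elements per fileobject.
SAMPLE_DFXML = """\
<dfxml>
  <fileobject>
    <filename>docs/reports/2010/summary.doc</filename>
    <filesize>20480</filesize>
    <hashdigest type="md5">d41d8cd98f00b204e9800998ecf8427e</hashdigest>
  </fileobject>
  <fileobject>
    <filename>docs/readme.txt</filename>
    <filesize>512</filesize>
  </fileobject>
</dfxml>
"""


def ancestors(path):
    """Return every parent directory of path, shallowest first."""
    parents = [str(p) for p in PurePosixPath(path).parents if str(p) != "."]
    return list(reversed(parents))


def dfxml_to_docs(xml_text):
    """Turn each <fileobject> into a flat dict suitable for indexing."""
    docs = []
    for fo in ET.fromstring(xml_text).iter("fileobject"):
        name = fo.findtext("filename")
        doc = {
            # Assumption: the path doubles as the identifier. A real
            # index would likely want something more durable.
            "id": name,
            "filename": name,
            "filesize": int(fo.findtext("filesize", default="0")),
            "ancestors": ancestors(name),
        }
        # One field per digest type, e.g. md5_digest.
        for digest in fo.findall("hashdigest"):
            doc[digest.get("type", "unknown") + "_digest"] = digest.text
        docs.append(doc)
    return docs


if __name__ == "__main__":
    for doc in dfxml_to_docs(SAMPLE_DFXML):
        print(doc)
```

The resulting dicts could be serialized and posted to a Solr update handler. Storing every ancestor directory on each file lets a facet on that field return the contents of any directory without walking the tree at query time.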
Fears
- Identifiers are precious
- Ingest is forever
- Where does rights management come in?
- Hard drives full of junk and an uncorrelated spreadsheet.
- Resolving logical conflicts in human-edited spreadsheets; problems are often difficult to notice in advance
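One way the spreadsheet-conflict fear might be addressed before ingest is a check for rows that share an identifier but disagree elsewhere. The sketch below is only an illustration: the column names, the sample rows, and the find_conflicts helper are all hypothetical, and it assumes the spreadsheet has been exported as well-formed CSV.

```python
import csv
import io
from collections import defaultdict


def find_conflicts(rows, key):
    """Map each key value to the columns where its rows disagree."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for column, value in row.items():
            if column != key:
                # Treat missing cells as empty strings.
                seen[row[key]][column].add((value or "").strip())
    conflicts = {}
    for ident, columns in seen.items():
        clashing = {c: sorted(v) for c, v in columns.items() if len(v) > 1}
        if clashing:
            conflicts[ident] = clashing
    return conflicts


# Made-up sample data: the first and third rows share an identifier
# but carry different rights statements.
SAMPLE = """identifier,title,rights
ark:/0001,Annual report,public
ark:/0002,Field notes,restricted
ark:/0001,Annual report,restricted
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for ident, columns in find_conflicts(rows, "identifier").items():
    print(ident, columns)
```

This only catches disagreement between rows with the same key. Conflicts that depend on meaning (for example, a date range that ends before it starts) would need rules specific to the collection.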
Desires
- Command-line statistical analysis (histogram, number of distinct values, etc.) of spreadsheets.
- Organizable digital limbo
- Pie charts and other visualizations. (How much of this stuff is ingestible? And other questions.)
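The command-line analysis wished for above could start as small as the following sketch, which counts distinct values in one column and prints a text histogram. The script name colstats.py, the function names, and the CSV export are assumptions for illustration. It uses only the Python standard library.

```python
import csv
import sys
from collections import Counter


def column_counts(rows, column):
    """Count how often each value appears in one column."""
    return Counter((row.get(column) or "").strip() for row in rows)


def text_histogram(counts, width=40):
    """Render counts as text bars, most common value first."""
    if not counts:
        return []
    biggest = max(counts.values())
    lines = []
    for value, n in counts.most_common():
        # Scale each bar against the largest count; never draw zero marks.
        bar = "#" * max(1, round(width * n / biggest))
        lines.append(f"{value or '(blank)':<20} {n:>6} {bar}")
    return lines


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: colstats.py FILE.csv COLUMN", file=sys.stderr)
    else:
        path, column = sys.argv[1], sys.argv[2]
        with open(path, newline="") as handle:
            counts = column_counts(csv.DictReader(handle), column)
        print(f"{len(counts)} distinct values in {column!r}")
        print("\n".join(text_histogram(counts)))
```

Run against a hypothetical inventory with a format column, for example `python colstats.py inventory.csv format`, this gives a quick answer to questions such as how much of the material is in an ingestible format. The same counts could feed a pie chart later.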
Tools
- Event microservice
- GUI XSLT editors exist for MARC... how about for spreadsheets?