Difference between revisions of "Umlaut wishlist"

From Code4Lib
 
(27 intermediate revisions by 11 users not shown)
Line 1: Line 1:
 
[[Category:Umlaut]]
 
  
Desired or planned features.
+
=WARNING: This is Outdated Documentation!!!!=
  
* Parsing of formatted references from an entry screen. Use http://wing.comp.nus.edu.sg/parsCit/ package. Very interesting!  Or a similar UCOP package: http://purl.net/net/egh/hmm-citation-extractor/
+
'''THIS IS OUTDATED DOCUMENTATION''' See new Umlaut documentation at http://github.com/team-umlaut/umlaut/wiki
 +
---------
 +
 
 +
Some actual current future plans:
 +
 
 +
* JournalTOCs ToC?
 +
 
 +
* Use OCLC xISBN to find HT and Internet Archive/OCA matches?
 +
 
 +
* Internet Archive -- use new OL/IA api, discover search-inside-the-book.
 +
 
 +
* WorldCat, use new api, link directly to nearest public library in 'see also' or elsewhere.
 +
 
 +
* CiteSeerX -- source of 'cited by' info, AND, most excitingly, open access pre-prints. But their Atom/RSS feeds (the only API I could find) don't seem to advertise enough info to actually use these features. Would need to talk to developer team -- possibly offer to help code? Also not entirely clear how big their corpus actually is, if it's worth it.
 +
 
 +
* Try screen-scraping Google Scholar (and maybe Microsoft Academic) to get the open access full text links they find.  Also, there's a Springer API for open access content now. http://dev.springer.com/docs/Restful_operations
 +
 
 +
* When no full text is found, provide link to search on Google Scholar, or Bing Academic?  Need to have sufficient metadata to create the search. Oct 2010 Library Technology Reports article has some ideas, I think.
 +
 
 +
 
 +
'''old''' Desired or planned features.
 +
 
 +
* Check for similar articles from: http://biosemantics.org/jane/faq.php#api
 +
 
 +
* Full-text availability check from http://chroniclingamerica.loc.gov/ -- check by title/city, check by lccn (?), able to check particular dates/link to particular dates and/or pages of paper?
 +
 
 +
* Allow a service_response to have a tree relationship to children, so for instance alternate versions of a text can be attached as children of the main link, expandable by the user.
 +
 
 +
* http://export.arxiv.org/api_help/  !!!!
 +
 
 +
* PubMed Central full text lookup http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html (SFX may already do this?)
 +
 
 +
* Journal ToC from CiteULike
 +
 
 +
* Parsing of formatted references from an entry screen. Use http://wing.comp.nus.edu.sg/parsCit/ package. Very interesting!  Or a similar UCOP package: http://purl.net/net/egh/hmm-citation-extractor/ See list of such packages here under "Other Parsing Tools" http://freecite.library.brown.edu/
  
 
* LibraryThing open knowledge API for more data. http://www.librarything.com/blog/2008/08/free-web-services-api-to-common.php
 
Line 9: Line 43:
 
* Connect to internet linked movie database on movies: http://www.linkedmdb.org/
 
  
* Add information about the conversation happening around an article with Scintilla if we have a URL, PMID or DOI:
+
* Add information about the conversation happening around an article with Scintilla if we have a URL, PMID or DOI (Alf at Scintilla would prefer us NOT to use the API for high-traffic. But we can copy his techniques internally to Umlaut. CrossRef and PubMed for "cited by" on DOI and PMID identifiers are a good idea. He has also reverse engineered the Scopus javascript api to allow server-side json access. http://hublog.hubmed.org/archives/001512.html):
 
     http://hublog.hubmed.org/archives/001609.html
 
 
     Unofficially it will return json:
 
 
     http://scintilla.nature.com/conversations?uri=info%3Adoi%2F10.1371%2Fjournal.pmed.0020124&format=json
 
 +
 +
  
  
Line 52: Line 88:
 
* Fix Umlaut Referent to more easily allow multiple authors. An architectural change is necessary to get a lot of this stuff working right.
  
* "Cited by" service. Scopus via screen scraping? (Scopus JavaScript API? http://www.scopus.com/scsearchapi/ ) ISI Web of Science is too hard even to screen scrape, since the interface is such a mess, but Scopus looks doable. Google Scholar?
+
* "Cited by" service. Scopus via screen scraping? (Scopus JavaScript API? http://www.scopus.com/scsearchapi/ See also http://hublog.hubmed.org/archives/001512.html ) ISI Web of Science is too hard even to screen scrape, since the interface is such a mess, but Scopus looks doable. Google Scholar?
  
  
Line 83: Line 119:
  
 
* SFX adaptor: Add a "rollup" feature that pays attention to dates to avoid eliminating coverage.
 
 
 
 
== done or in progress ==
 
 
* Google Books search to complement the OCA and Gutenberg searches I’ve got--may or may not be possible with no google books api. Screen scrape? Umich oai-pmh records?
 
 
* UMich MBooks for fulltext (and search-inside)
 
    http://mirlyn.lib.umich.edu/cgi-bin/sdrsmd?id=1&oclc=16857172
 
    http://code.google.com/p/jquery-sdrsmd/
 
 
* connection to OCLC Identities
 
    http://outgoing.typepad.com/outgoing/2008/06/linking-to-worl.html
 
 
* Cover images from Open Library?  See http://johnmiedema.ca/openbook-wordpress-plugin/.
 

Latest revision as of 16:22, 19 June 2012


WARNING: This is Outdated Documentation!!!!

THIS IS OUTDATED DOCUMENTATION See new Umlaut documentation at http://github.com/team-umlaut/umlaut/wiki


Some actual current future plans:

  • JournalTOCs ToC?
  • Use OCLC xISBN to find HT and Internet Archive/OCA matches?
  • Internet Archive -- use new OL/IA api, discover search-inside-the-book.
  • WorldCat, use new api, link directly to nearest public library in 'see also' or elsewhere.
  • CiteSeerX -- source of 'cited by' info, AND, most excitingly, open access pre-prints. But their Atom/RSS feeds (the only API I could find) don't seem to advertise enough info to actually use these features. Would need to talk to developer team -- possibly offer to help code? Also not entirely clear how big their corpus actually is, if it's worth it.
  • When no full text is found, provide link to search on Google Scholar, or Bing Academic? Need to have sufficient metadata to create the search. Oct 2010 Library Technology Reports article has some ideas, I think.
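A rough sketch of the last item: a fallback search link could be assembled from whatever citation metadata is on hand. The `atitle`/`title`/`aulast` keys follow OpenURL naming; the function name, the rule for "sufficient" metadata (a non-empty title) and the Google Scholar query URL are all assumptions for illustration, not anything Umlaut defines.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not taken from the wishlist itself.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_search_url(citation):
    """Return a Google Scholar search link, or None if metadata is too thin.

    `citation` is a plain dict of OpenURL-style keys (hypothetical shape).
    """
    title = (citation.get("atitle") or citation.get("title") or "").strip()
    if not title:
        # Without a title the search would be too noisy to be useful.
        return None
    terms = ['"%s"' % title]  # quote the title so it is searched as a phrase
    author = (citation.get("aulast") or "").strip()
    if author:
        terms.append(author)
    return SCHOLAR_BASE + "?" + urlencode({"q": " ".join(terms)})
```

Returning None when the title is missing lets the caller simply omit the link rather than offer a search that is unlikely to find the article.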


old Desired or planned features.

  • Full-text availability check from http://chroniclingamerica.loc.gov/ -- check by title/city, check by lccn (?), able to check particular dates/link to particular dates and/or pages of paper?
  • Allow a service_response to have a tree relationship to children, so for instance alternate versions of a text can be attached as children of the main link, expandable by the user.
  • Journal ToC from CiteULike
  • Add information about the conversation happening around an article with Scintilla if we have a URL, PMID or DOI (Alf at Scintilla would prefer us NOT to use the API for high-traffic. But we can copy his techniques internally to Umlaut. CrossRef and PubMed for "cited by" on DOI and PMID identifiers are a good idea. He has also reverse engineered the Scopus javascript api to allow server-side json access. http://hublog.hubmed.org/archives/001512.html):
    http://hublog.hubmed.org/archives/001609.html
    Unofficially it will return json:
    http://scintilla.nature.com/conversations?uri=info%3Adoi%2F10.1371%2Fjournal.pmed.0020124&format=json
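The JSON URL above is just the DOI wrapped as an `info:doi/` URI and percent-encoded into the `uri` parameter. A minimal sketch of building it follows; the function name is made up here, and since the JSON output is unofficial the endpoint may change or disappear.

```python
from urllib.parse import urlencode

# Base URL copied from the example above; the JSON format is unofficial.
SCINTILLA_BASE = "http://scintilla.nature.com/conversations"


def conversations_url(doi):
    """Build the (unofficial) JSON conversations URL for a DOI."""
    info_uri = "info:doi/" + doi.strip()
    # urlencode percent-encodes ':' and '/' in the info URI.
    return SCINTILLA_BASE + "?" + urlencode({"uri": info_uri, "format": "json"})
```

Calling `conversations_url("10.1371/journal.pmed.0020124")` reproduces the example URL. The sketch only builds the link and does not fetch anything, in keeping with the request not to use the API for high traffic.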



  • Rochester “Getting Users Fulltext” style code to skip right to the full text, skipping content-provider metadata pages.


  • UMich Mirlyn for metadata enrichment?
    http://webservices.itcs.umich.edu/mediawiki/MLibraryAPI/index.php/Mirlynapi:Home


  • xISBN/thingISBN use. (Some thought is required in how to integrate this while avoiding false positives). Bowker ISSN service for metadata enhancement. OCLC xISSN? Integrate preceding/succeeding title information from OPAC or xISSN?
  • LibraryLookup: http://xisbn.worldcat.org/liblook/index.htm At least until xISBN is baked in we could provide a link to this service. Increases the chances of finding a desired book in the catalog through work set grouping. Used by LibX.
     http://xisbn.worldcat.org/liblook/resolve.htm?res_id=http://www.iucat.iu.edu&rft.isbn=0451530942&url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book
  • Journal covers from Ulrich's via screen-scraping (or Ulrich's/sersol built in api?)
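The LibraryLookup link in the item above is an ordinary OpenURL (Z39.88-2004, KEV book format) with the local catalog passed as `res_id`. A sketch of generating it, assuming the parameter names shown in that example URL; the function name is hypothetical, and unlike the hand-written example this version percent-encodes the parameter values.

```python
from urllib.parse import urlencode

# Resolver address taken from the example link in the list above.
LIBLOOK_RESOLVE = "http://xisbn.worldcat.org/liblook/resolve.htm"


def library_lookup_url(catalog_url, isbn):
    """Build a LibraryLookup OpenURL for one ISBN against one catalog."""
    params = [
        ("res_id", catalog_url),              # the local OPAC to search
        ("rft.isbn", isbn.replace("-", "")),  # drop hyphens from the ISBN
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
    ]
    return LIBLOOK_RESOLVE + "?" + urlencode(params)
```

For instance, `library_lookup_url("http://www.iucat.iu.edu", "0451530942")` yields the encoded equivalent of the example link.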


  • Connotea integration


  • Fetch ToC from LC. Screen scrape, I guess? Or z3950? Any other content from LC?


  • Link to Books In Print ala Notre Dame.

http://www.library.nd.edu/eresources/findit/findit.cgi?doc_num=001939269&aleph_session=U5AVHRXD5QB1CGDFDSVJ9DSY2UA6QNCGVEU8EYRX9NNMIQ429Q-54668%22 example

  • bip search url? :

http://www.booksinprint.com/merge_shared/Search/advsearch.asp%3FdateState%3DY%26txtAction%3D%26BooleanSearch%3D%26SType%3Dadv%26collection%3DBIP%26QueryMode%3DSimple%26ResultCount%3D25%26ResultTemplate%3Dmbbookresult_fl.hts%26navPage%3D1%26SrchFrm%3DAdv%26ScoreThreshold%3D0%26Criteria1%3DISBN%26CriteriaText1%3D0838935370


  • SFX plugin: Notice when the first title given is non-roman and, if so, look for a roman title to enhance the metadata with.


  • HIP and other OPAC searchers should pull ToC from MARC 505 when present. And 856s judged to be ToC should be listed under ToC, not full text.


  • Fix Umlaut Referent to more easily allow multiple authors. An architectural change is necessary to get a lot of this stuff working right.


  • Enhance metadata to have full metadata for a refworks etc export. Using: CrossRef? Metalib? Anything else?


  • A general purpose responsecache. Schema: Date, service/source, key. Use for caching image urls, ToC urls from LC, etc.
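One way the date / service / key schema could look, sketched with SQLite purely for illustration. The table name, column names, the extra `value` column and the expiry rule are all assumptions; Umlaut itself would presumably do this through its own models.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (and create if needed) a hypothetical response cache table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS response_cache ("
        " service TEXT NOT NULL,"    # service/source, e.g. 'lc_toc'
        " cache_key TEXT NOT NULL,"  # key, e.g. an ISBN
        " value TEXT,"               # cached image URL, ToC URL, etc.
        " cached_at REAL NOT NULL,"  # date, as a Unix timestamp
        " PRIMARY KEY (service, cache_key))"
    )
    return db


def cache_put(db, service, key, value, now=None):
    """Store or overwrite one cached value."""
    now = time.time() if now is None else now
    db.execute(
        "INSERT OR REPLACE INTO response_cache VALUES (?, ?, ?, ?)",
        (service, key, value, now),
    )
    db.commit()


def cache_get(db, service, key, max_age, now=None):
    """Return the cached value, or None if missing or older than max_age seconds."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT value, cached_at FROM response_cache"
        " WHERE service = ? AND cache_key = ?",
        (service, key),
    ).fetchone()
    if row is None or now - row[1] > max_age:
        return None
    return row[0]
```

Keying on (service, key) lets image URLs, LC ToC URLs and anything else share one table, and the stored date lets each caller pick its own freshness window.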


  • Fix Worldcat registry auto-discovery.


  • Add a Worldcat search that uses API, instead of screen scrape.


  • Switch OCA search to use OCA native APIs, instead of indexdata mirror index.


  • fix unapi in umlaut. unapi to rsi? For zotero.


  • Change background to use Spawn plugin instead of manual threading. Investigating using spawn with fork instead of thread (Terry Reese on limited pool of forks).


  • Crazy idea for an abstract interface/architecture to support querying web service apis that require client side javascript, like Google Books and Scopus.


  • Integrate my various local document delivery services into menu of options when full text isn’t available. More generally, a clear architecture for providing localized doc delivery services in addition to a single ILL link.


  • SFX adaptor: Add a "rollup" feature that pays attention to dates to avoid eliminating coverage.