Code4Lib Journal PDFs

From Code4Lib

Goal: Generate PDFs from the HTML that WordPress produces, changing that HTML as little as possible.


Status

  • PDFs for an entire issue or for a single article can be created. Issue PDFs have a Table of Contents.
  • Most of the main formatting elements are working, but many details still need to be finalized and cleaned up.

Detailed Info

  • Basic info (title, authors, issue #) - working
  • PDF Headers and Footers - working
  • Headings - working (h2 through h4)
  • Paragraphs - working
  • Lists
    • Ordered - working
    • Unordered - working
  • Tables -
  • Images - mostly working (more notes later)
  • Links
    • External - working
    • Internal - working (right now just TOC, maybe add footnotes)
  • Code / Pre - working, needs cleanup
  • Other - more notes later

Big Issues

  • Unsure how to do syntax highlighting.


Process

  1. Each article is saved as a single HTML file.
  2. Images are saved locally; some files need to be renamed.
  3. A new XML file is created for the issue. It holds some basic info, but mostly just the article IDs, which are needed to build the PDF for the entire issue.
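
The issue file from step 3 might look something like what the sketch below produces. This is only an illustration: the element and attribute names (issue, articles, article, number, id) and the function name build_issue_xml are assumptions, not the actual schema used for the journal.

```python
import xml.etree.ElementTree as ET


def build_issue_xml(issue_number, article_ids):
    """Build a small issue manifest listing article IDs in reading order.

    All element and attribute names here are hypothetical placeholders.
    """
    issue = ET.Element("issue", {"number": str(issue_number)})
    articles = ET.SubElement(issue, "articles")
    for article_id in article_ids:
        # One empty element per article; the ID points at the saved article file.
        ET.SubElement(articles, "article", {"id": str(article_id)})
    return ET.tostring(issue, encoding="unicode")


if __name__ == "__main__":
    # The issue number and article IDs are made-up sample values.
    print(build_issue_xml(1, [11, 12, 13]))
```

Keeping the manifest to an ordered list of IDs means the issue PDF and its Table of Contents can be assembled by walking the list and loading each article's converted file.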

Article HTML Conversion to XML

  • add closing slash to img tags
  • remove entities that XML does not predefine: &copy;, &nbsp;
  • strip doctype
  • remove namespace on <html>
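
The four cleanup steps above can be sketched with a few regular expressions. This is a minimal sketch, not the script actually used: the function name html_to_xml is hypothetical, it assumes the namespace is written as a double-quoted xmlns attribute and that no attribute value contains ">", and it turns &nbsp; into a plain space (an assumption, so that adjacent words do not run together) while dropping &copy; entirely.

```python
import re
import xml.etree.ElementTree as ET


def html_to_xml(html):
    """Apply the four cleanup steps so the article parses as plain XML."""
    # Strip the doctype declaration.
    html = re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove the default namespace on <html> (double-quoted form only).
    html = re.sub(r'(<html\b[^>]*?)\s+xmlns="[^"]*"', r"\1", html,
                  count=1, flags=re.IGNORECASE)
    # Add a closing slash to img tags, whether or not one is already there.
    html = re.sub(r"<img\b([^>]*?)\s*/?>", r"<img\1 />", html,
                  flags=re.IGNORECASE)
    # Remove the entities: &nbsp; becomes a space (assumption), &copy; is dropped.
    html = re.sub(r"&nbsp;?", " ", html)
    html = re.sub(r"&copy;?", "", html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p>&copy; 2008&nbsp;Code4Lib</p><img src="fig1.png"></body></html>'
    )
    cleaned = html_to_xml(sample)
    # Parsing raises ParseError if the result is still not well-formed.
    ET.fromstring(cleaned)
    print(cleaned)
```

Regular expressions are enough here only because the input is already close to XHTML. Markup that is further from well-formed would probably need a real HTML parser instead.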