Code4Lib Journal PDFs
From Code4Lib
Goal: Generate PDFs from the WordPress HTML with as few changes as possible.
Status
- PDFs for an entire issue or for a single article can be created. Issue PDFs have a Table of Contents.
- Most of the main formatting elements are working, but there are lots of things left to finalize and clean up.
Detailed Info
- Basic info (title, authors, issue #) - working
- PDF Headers and Footers - working
- Headings (h2 through h4) - working
- Paragraphs - working
- Lists
  - Ordered - working
  - Unordered - working
- Tables -
- Images - mostly working (more notes later)
- Links
  - External - working
  - Internal - working (currently only the TOC; footnotes may be added)
- Code / Pre - working, needs cleanup
- Other - more notes later
Big Issues
- Unsure how to do syntax highlighting.
Process
- Each article is saved as a single HTML file.
- Images are saved locally; some file renaming is needed.
- A new XML file is created for the issue. It has some basic info, but mostly just the article IDs, which are needed to build the PDF for the entire issue.
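
The issue file described above could look something like the output of this minimal sketch. The page does not document the real file layout, so the element and attribute names (`issue`, `number`, `title`, `articles`, `article`, `id`) and the function name `build_issue_xml` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def build_issue_xml(issue_number, title, article_ids):
    """Return an issue manifest as an XML string.

    The layout is hypothetical: some basic info about the issue,
    followed by one <article> element per article ID.
    """
    issue = ET.Element("issue", {"number": str(issue_number)})
    ET.SubElement(issue, "title").text = title

    # The article IDs are what the whole-issue PDF build would iterate over.
    articles = ET.SubElement(issue, "articles")
    for article_id in article_ids:
        ET.SubElement(articles, "article", {"id": str(article_id)})

    return ET.tostring(issue, encoding="unicode")
```

Keeping the article IDs in document order would let the same file drive both the issue's Table of Contents and the order of articles in the combined PDF.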
Article HTML Conversion to XML
- add closing slash to img tags
- remove entities: &copy;, &nbsp;
- strip doctype
- remove namespace on <html>
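
The four cleanup steps above can be sketched with regular expressions. This is an illustration under stated assumptions, not the script actually used: the names `html_to_xml`, `parse_article`, and `ENTITY_REPLACEMENTS` are invented here, and instead of deleting the entities outright the sketch swaps them for numeric character references so the characters survive in the XML.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: rather than dropping the named entities, map them to numeric
# character references, which an XML parser accepts without a DTD.
ENTITY_REPLACEMENTS = {"&copy;": "&#169;", "&nbsp;": "&#160;"}


def html_to_xml(html):
    """Apply the four cleanup steps to one saved article page."""
    # Strip the doctype declaration.
    text = re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove the default namespace declaration on <html>.
    text = re.sub(r'(<html\b[^>]*?)\s+xmlns="[^"]*"', r"\1", text,
                  flags=re.IGNORECASE)
    # Add a closing slash to img tags (already closed tags are normalized).
    text = re.sub(r"<img\b([^>]*?)\s*/?>", r"<img\1 />", text,
                  flags=re.IGNORECASE)
    # Replace the named entities that XML does not define.
    for name, numeric in ENTITY_REPLACEMENTS.items():
        text = text.replace(name, numeric)
    return text.strip()


def parse_article(html):
    """Clean the HTML and parse it; raises ParseError if it is still not XML."""
    return ET.fromstring(html_to_xml(html))
```

Parsing the result right away, as `parse_article` does, is a cheap way to find articles that need further manual fixes, since other unclosed tags or undefined entities would still raise a parse error.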
