Code4Lib Journal PDFs
From Code4Lib
Latest revision as of 22:37, 29 March 2008
Goal: Make PDFs from the HTML from WordPress with the fewest changes possible.
Status
- PDFs for an entire issue or for a single article can be created. Issue PDFs have a Table of Contents.
- Most of the main formatting elements are working, but many details still need to be finalized and cleaned up.
Detailed Info
- Basic info (title, authors, issue #) - working
- PDF Headers and Footers - working
- Headings - working (h2 through h4)
- Paragraphs - working
- Lists
  - Ordered - working
  - Unordered - working
- Tables -
- Images - mostly working (more notes later)
- Links
  - External - working
  - Internal - working (right now just the TOC, maybe add footnotes)
- Code / Pre - working, needs cleanup
- Other - more notes later
Big Issues
- Unsure how to do syntax highlighting.
Process
- Each article is saved as a single HTML file.
- Images are saved locally, some file renaming needed.
- A new XML file is created for the issue. It holds some basic info, but mostly just the article IDs, which are needed to build the PDF for the entire issue.
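The issue file described above could be generated along these lines. This is only a rough sketch: the element and attribute names (issue, articles, article, number, id) and the function name build_issue_xml are made up for illustration, since the actual layout of the issue XML is not recorded on this page.

```python
import xml.etree.ElementTree as ET


def build_issue_xml(issue_number, article_ids):
    """Return a small issue-level XML document as a string.

    The structure is hypothetical: one <issue> root carrying the issue
    number, and one <article> element per article ID, in reading order.
    """
    issue = ET.Element("issue", {"number": str(issue_number)})
    articles = ET.SubElement(issue, "articles")
    for article_id in article_ids:
        ET.SubElement(articles, "article", {"id": str(article_id)})
    return ET.tostring(issue, encoding="unicode")


if __name__ == "__main__":
    # Example IDs are placeholders, not real article IDs.
    print(build_issue_xml(2, [101, 102]))
```

Keeping the article IDs in document order means the same file can drive both the Table of Contents and the order of articles in the issue PDF.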
Article HTML Conversion to XML
- add closing slash to img tags
- remove entities: &amp;copy;, &amp;nbsp; (i.e. the © and non-breaking space entities)
- strip doctype
- remove namespace on <html>
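The four cleanup steps above can be sketched as a small Python function. This is a minimal illustration using regular expressions, not the script actually used for the journal; the function names html_to_xml and is_well_formed are assumptions, and a regex-based approach only suits markup that is already close to well-formed XHTML.

```python
import re
import xml.etree.ElementTree as ET


def html_to_xml(html):
    """Apply the four cleanup steps so the article parses as plain XML."""
    # Strip the doctype declaration.
    text = re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove the default namespace declaration on the <html> element.
    text = re.sub(
        r"""(<html\b[^>]*?)\s+xmlns=("[^"]*"|'[^']*')""",
        r"\1",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    # Remove the named entities an XML parser does not know without a DTD.
    text = text.replace("&copy;", "").replace("&nbsp;", "")
    # Add a closing slash to img tags (tags already closed stay closed).
    text = re.sub(r"<img\b([^>]*?)\s*/?>", r"<img\1 />", text, flags=re.IGNORECASE)
    return text.strip()


def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Running is_well_formed on the result of html_to_xml is a quick way to spot articles that need further manual cleanup, for example other unclosed tags or additional named entities.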