Sunday, January 19, 2014

pdf2htmlEX v0.11 is out

pdf2htmlEX v0.11 has been released.  Thanks to all the contributors!

This version includes lots of JavaScript code cleaning. jQuery is now completely removed. The additional term in the license has also been removed, so pdf2htmlEX is now released under pure GPLv3.

Complete Changelog:

* Compress JS with closure-compiler
* Compress CSS with YUI Compressor
* jQuery removed
* Lots of JS code cleaning
* Enable global key handler by default
* Use WOFF by default
* Always generate TTF before the final output
* Fix CSS for loading-indicator
* Do not set style for global <span>
* Improvements on the SVG output
* New options

Wednesday, November 6, 2013

Removing dependency on jQuery

Recently I've been working on removing jQuery from pdf2htmlEX.js.

jQuery was initially introduced as a handy cross-platform JavaScript syntax. Until recently, I didn't realize that IE (>= 9) had already implemented most standard JavaScript APIs; I was still thinking in terms of attachEvent vs addEventListener.
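To illustrate the attachEvent vs addEventListener question, here is a minimal sketch of the kind of wrapper that used to be necessary. The `addListener` helper is hypothetical and is not taken from pdf2htmlEX.js; with IE >= 9 as the minimum target, only the first branch is ever needed.

```javascript
// Hypothetical helper for illustration, not part of pdf2htmlEX.js.
// Registers an event handler with the standard API when it exists,
// and falls back to the legacy IE (<= 8) API otherwise.
function addListener(target, type, handler) {
  if (typeof target.addEventListener === 'function') {
    // Standard API, available in IE >= 9 and other modern browsers.
    target.addEventListener(type, handler, false);
    return 'standard';
  }
  if (target.attachEvent) {
    // Legacy IE API: event names carry an "on" prefix.
    target.attachEvent('on' + type, handler);
    return 'legacy';
  }
  throw new Error('No event registration API found');
}
```

In a browser this would be called as `addListener(window, 'keydown', handler)`. The return value only exists to show which branch was taken.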

A few days ago, when I was trying to optimize the JS code, I found that jQuery is really large (~90K minified), while all the other JS and CSS files are no bigger than ~10K in total. Also, lots of results show that jQuery is really slow.

Currently most jQuery functions have been replaced with standard JavaScript APIs, except for $.extend and $.ajax, which will be worked out soon. Another exception is Element.classList which is not implemented in IE9, but I've copied a JS snippet from PDF.js for that.
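As a rough idea of what such replacements can look like, here is a sketch of a shallow `extend` and of class helpers that rely only on the `className` string. These functions are hypothetical stand-ins written for this post. They are neither the code used in pdf2htmlEX.js nor the snippet from PDF.js, and `extend` only covers the simple shallow-merge case.

```javascript
// Hypothetical stand-ins for illustration, not the pdf2htmlEX.js code.

// Shallow merge: copies the own enumerable properties of every source
// object into `target`, later sources overriding earlier ones.
function extend(target) {
  for (var i = 1; i < arguments.length; i++) {
    var source = arguments[i];
    if (source == null) continue; // skip null/undefined sources
    for (var key in source) {
      if (Object.prototype.hasOwnProperty.call(source, key)) {
        target[key] = source[key];
      }
    }
  }
  return target;
}

// Class helpers that work without Element.classList (e.g. on IE9).
// They assume class names are separated by single spaces.
function hasClass(el, name) {
  return (' ' + el.className + ' ').indexOf(' ' + name + ' ') !== -1;
}

function addClass(el, name) {
  if (!hasClass(el, name)) {
    el.className = el.className ? el.className + ' ' + name : name;
  }
}
```

A typical use would be merging user-supplied settings over defaults, as in `extend({}, defaults, userConfig)`, where both argument names are placeholders.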

I can indeed feel a performance boost after removing jQuery, but the change is likely to cause regressions, even though I've checked each API I used. I've tested the code on Firefox and Chrome on Linux, and I'll test on other browsers as well.

Target browsers:
- IE >= 9
- Recent versions of Firefox / Chrome / Safari

Please help test (the git version of) pdf2htmlEX with your browser and file bugs if any. Thanks!

Thursday, October 17, 2013

pdf2htmlEX v0.10 is out

pdf2htmlEX v0.10 is out, bringing experimental support for SVG background images and Type 3 fonts.


* Lots of code cleaning
* Logo as loading indicator
* Add a logo
* Remove several CSS prefixes
* Background image optimization
* Support output background image in JPEG (--bg-format jpg)
* [Experimental] Support output background image in SVG (--bg-format svg)
* [Experimental] Support Type 3 fonts
* New options:
  * --font-format (same as --font-suffix, but without the leading dot)

* Deprecated options:

Thursday, September 26, 2013

pdf2htmlEX got a logo

I managed to craft a logo with Inkscape, which is basically an emblem of "<pdf>". Perhaps it is not of much use, but I hope it helps visualize the concept.

The images are located in the logo/ folder; all of them are licensed under CC-BY 3.0.

Friday, September 20, 2013

Preliminary support for Type 3 fonts

I'm happy to announce that preliminary support for Type 3 fonts has been added to pdf2htmlEX. For now, 2 simple PDFs from PDF.js pass.

This is actually one of the features I have wanted to implement the most since the very beginning. Another one is generating background images in SVG, a preliminary version of which has also just been added.

Both features rely on CairoOutputDev from poppler, which in turn relies on cairo and freetype. It might actually be possible to eliminate the dependency on freetype, but I don't want to touch those files, so that merging upstream changes stays easy in the future. Anyway, it seems that poppler itself depends on freetype, so it's no big deal.

To enable this feature, you need the latest source code from git. Add `-DENABLE_SVG=ON` to cmake, and `--process-type3=1` when running pdf2htmlEX.

The current idea is, for each Type 3 font, to dump each glyph into an SVG image and then combine them into a font with FontForge. This was inspired by FontCustom: I learned that FontForge can import SVG glyphs by reading the code of FontCustom.

Each glyph is drawn on a 100x100 canvas. Although SVG is a vector format, CairoOutputDev thickens thin strokes (for printing purposes?), which might ruin the font. There are also cases where sampled raster images are stored in the SVG file, which is probably cairo's behaviour due to the limitations of SVG. In such cases, 100x100 might not be large enough for a font.

The size is defined as GLYPH_DUMP_EM_SIZE. I tried setting it to 1000, and the quality for `issue3188.pdf` indeed improved; but for some other PDF files, the values in the SVG files can become so large that FontForge complains that they cannot be stored in 16-bit fields. Or maybe this is a limitation of TTF, and I'd better switch to another format.
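The trade-off can be sketched with a little arithmetic. The snippet below assumes that glyph coordinates grow linearly with the em size and that the 16-bit fields in question are signed, as TrueType glyph coordinates are. The function names and sample values are made up for illustration and do not come from the pdf2htmlEX source.

```javascript
// Illustrative only: names and sample values are assumptions,
// not taken from the pdf2htmlEX source.
var INT16_MIN = -32768;
var INT16_MAX = 32767;

// Rescale a coordinate drawn for one em size to another em size.
function scaleCoord(value, fromEm, toEm) {
  return Math.round(value * toEm / fromEm);
}

// True if the value can be stored in a signed 16-bit field.
function fitsInt16(value) {
  return value >= INT16_MIN && value <= INT16_MAX;
}

// Return the rescaled coordinates that would overflow 16 bits.
function findOverflows(coords, fromEm, toEm) {
  var bad = [];
  for (var i = 0; i < coords.length; i++) {
    var scaled = scaleCoord(coords[i], fromEm, toEm);
    if (!fitsInt16(scaled)) {
      bad.push(scaled);
    }
  }
  return bad;
}
```

For example, a coordinate of 4000 on a 100-unit canvas still fits, but becomes 40000 on a 1000-unit canvas, which exceeds 32767. Under these assumptions, a larger canvas improves precision for ordinary glyphs while pushing glyphs with large coordinates out of range.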

However, due to the complexity of Type 3 fonts (each glyph is a mini-PDF), and especially of the font matrix, I don't have a perfect solution for every possible case. For now, let me just focus on "average" cases.

Wednesday, September 18, 2013

Preliminary SVG support

Preliminary SVG support has been implemented, powered by CairoOutputDev from poppler.

Since CairoOutputDev is not exposed by poppler, I have to maintain a copy of a few files inside pdf2htmlEX. Also, cairo and freetype are required for this feature. This feature can be enabled or disabled with the ENABLE_SVG cmake option.
A new option `--bg-format` has also been added, to specify the format for the background images. Currently only 'png' and 'svg' are supported.

(This is also a test for auto forwarding blog posts to the mailing list)