Short version: Always scan raw documents at 200 or 300 dpi if you intend to put them into DjVu format. NB: this is a higher resolution than you might think is necessary, but it is worth it.
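A rough way to see why the resolution matters is to work out how many pixels wide a thin pen or print stroke is at each scanner setting. The sketch below is only illustrative: the 250 micrometre stroke width is an assumed figure for a fine line, not something measured from a real document, and `stroke_px` is a made-up helper name.

```shell
#!/bin/sh
# stroke_px DPI STROKE_UM
# Print the width in pixels (two decimals, truncated) of a stroke that is
# STROKE_UM micrometres wide when scanned at DPI dots per inch.
# 1 inch = 25400 micrometres; integer arithmetic only.
stroke_px() {
    dpi=$1
    stroke_um=$2
    hundredths=$(( dpi * stroke_um * 100 / 25400 ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# 250 um is an ASSUMED hairline width, chosen only for illustration.
for dpi in 72 200 300; do
    echo "$dpi dpi: $(stroke_px "$dpi" 250) px"
done
```

Under that assumption the stroke is narrower than one pixel at 72 dpi, and roughly two to three pixels wide at 200 and 300 dpi.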
Long version:

My problem

I scanned in some documents, creating HUGE TIFF files, and tried to get them into DjVu format. The results were awful, even though the original huge TIFF files were very readable. I was using my scanner's "screen" setting because this was adequate for producing readable GIFs for putting on webpages.

Yann's reply

I have bad news. It seems that your document was scanned at 72dpi. DjVu cannot do a good job at separating the foreground from the background with resolutions that low. DjVu segmentation works best with documents at 300dpi, though 200dpi is OK in most cases. 72dpi is out of the question (most character strokes are thinner than one pixel at that resolution).

I only see two solutions:

(1) compress the pages in "photo" mode. The disadvantages are that you can't do OCR, and the pages are around 100-150KB despite the low resolution. You can do this on any2djvu, or if you have installed DjVuLibre, you can simply run the shell script below as follows:

% tifftobook book.djvu

This script takes all the .tif files in the current directory and produces a DjVu book (book.djvu) by compressing the pages in photo mode.

________________________________________________________________
#!/bin/tcsh
# tifftobook
# usage: tifftobook output.djvu
foreach i ( *.tif )
  echo $i
  # convert the TIFF to PNM, which c44 can read
  tifftopnm $i >/tmp/$$
  # compress the page in photo mode; $i:r is the file name without .tif
  c44 -dpi 72 -slice 80+27 /tmp/$$ /tmp/djvu$$$i:r
  rm /tmp/$$
end
# bundle all the compressed pages into one multi-page document
djvm -c $1 /tmp/djvu$$*
rm /tmp/djvu$$*
________________________________________________________________

(2) rescan your document at 300dpi (or at least 200 dpi).
Uncompressed color TIFFs at 300dpi are 25MB, so you might want to compress them to high-quality JPEG before uploading them to the server, which you can do with the following Linux script (assuming you have a standard Linux distro with ImageMagick pre-installed):

________________________________________________________________
#!/bin/tcsh
foreach i ( *.tif )
  convert -quality 95 $i $i:r.jpg
end
________________________________________________________________

Best,
-- Yann LeCun
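The 25MB figure in Yann's reply can be checked with a little arithmetic. The sketch below assumes a US Letter page (8.5 x 11 inches) stored as 24-bit color with no compression and ignores the small TIFF header; the page size and the `tiff_bytes` helper name are my own assumptions, not part of the reply.

```shell
#!/bin/sh
# tiff_bytes WIDTH_TENTHS HEIGHT_TENTHS DPI
# Print the number of bytes of raw 24-bit RGB pixel data for a page whose
# width and height are given in tenths of an inch, scanned at DPI.
tiff_bytes() {
    w_px=$(( $1 * $3 / 10 ))        # page width in pixels
    h_px=$(( $2 * $3 / 10 ))        # page height in pixels
    echo $(( w_px * h_px * 3 ))     # 3 bytes per pixel (R, G, B)
}

# US Letter, 8.5 x 11 in, is an ASSUMED page size.
for dpi in 72 200 300; do
    bytes=$(tiff_bytes 85 110 "$dpi")
    echo "$dpi dpi: $bytes bytes (about $(( bytes / 1000000 )) MB)"
done
```

With those assumptions a 300dpi page comes to 25,245,000 bytes, which matches the 25MB quoted above, while a 200dpi page is about 11MB.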