A word of advice for new users of DjVu

Short version: Always scan raw documents at 200 or 300 dpi if you intend to convert them to DjVu format. NB: this is a higher resolution than you might think necessary, but it is worth it.

(this problem is illustrated visually here)

Long version:

My problem

I scanned in some documents, creating HUGE TIFF files, and tried to convert them to DjVu format. The results were awful, even though the original TIFF files were perfectly readable. I had been using my scanner's "screen" setting, because it was adequate for producing readable GIFs for web pages.

Yann's reply

I have bad news. It seems that your document was scanned at 72dpi.
DjVu cannot do a good job at separating the foreground from the
background with resolutions that low. DjVu segmentation works best
with documents at 300dpi, though 200dpi is OK in most cases. 72dpi is
out of the question (most character strokes are thinner than one pixel
at that resolution).

I only see two solutions: 
(1) compress the pages in "photo" mode. The disadvantages are that
    you can't do OCR, and the pages are around 100-150KB despite
    the low resolution. You can do this on any2djvu, or if you
    have installed DjVuLibre, you can simply run the
    shell script below as follows:  
     % tifftobook book.djvu
    This script takes all the .tif files in the current directory
    and produces a DjVu book (book.djvu) by compressing the pages
    in photo mode.
________________________________________________________________
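#!/bin/tcsh
# tifftobook: compress every .tif in the current directory in photo
# mode and bundle the pages into one DjVu book.
# usage: tifftobook book.djvu
foreach i ( *.tif )
  echo $i
  # c44 does not read TIFF, so convert the page to PNM first
  tifftopnm $i >/tmp/$$
  # photo-mode (IW44) compression; -dpi records the scan resolution
  c44 -dpi 72 -slice 80+27 /tmp/$$ /tmp/djvu$$$i:r
  rm /tmp/$$
end
# assemble the per-page files into the book named on the command line
djvm -c $1 /tmp/djvu$$*
rm /tmp/djvu$$*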
________________________________________________________________


(2) rescan your document at 300dpi (or at least 200 dpi).
    Uncompressed color TIFFs at 300dpi are 25MB, so you might want
    to compress them to high-quality JPEG before uploading
    them to the server, which you can do with the following
    Linux script (assuming you have a standard Linux distro
    with ImageMagick pre-installed):

________________________________________________________________
#!/bin/tcsh
# convert every .tif in the current directory to a high-quality JPEG
foreach i ( *.tif )
  convert -quality 95 $i $i:r.jpg
end
________________________________________________________________


Best,

  -- Yann LeCun
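
A rough check of the two numbers in the reply above (strokes thinner than one pixel at 72dpi, and 25MB per page at 300dpi) can be sketched as follows. This is only an illustration, not part of Yann's reply: it is written for plain sh with awk rather than tcsh, the function names stroke_px and page_mb are made up for this example, and it assumes a pen stroke 0.2 mm thick and a US Letter page (8.5 x 11 inches) in 24-bit colour. Other fonts and page sizes will give somewhat different figures.

```shell
#!/bin/sh
# Back-of-envelope arithmetic for the resolution advice.
# Assumptions (not from the original reply): 0.2 mm stroke thickness,
# US Letter page, 3 bytes per pixel.

# Width in pixels of a stroke STROKE_MM thick when scanned at DPI.
stroke_px() {   # usage: stroke_px DPI STROKE_MM
  awk -v dpi="$1" -v mm="$2" 'BEGIN { printf "%.2f\n", dpi * mm / 25.4 }'
}

# Uncompressed size, in millions of bytes, of a 24-bit colour page.
page_mb() {     # usage: page_mb DPI WIDTH_IN HEIGHT_IN
  awk -v dpi="$1" -v w="$2" -v h="$3" \
    'BEGIN { printf "%.1f\n", (w * dpi) * (h * dpi) * 3 / 1000000 }'
}

for dpi in 72 200 300; do
  echo "$dpi dpi: stroke $(stroke_px "$dpi" 0.2) px, page $(page_mb "$dpi" 8.5 11) MB"
done
```

Under these assumptions the stroke is about half a pixel wide at 72dpi and more than two pixels wide at 300dpi, and the 300dpi page comes to roughly 25MB, in line with the figures quoted above.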



David MacKay
Last modified: Wed Jan 21 18:26:12 2004