Just for future reference:
- Scan images at 300 dpi. A lower resolution might also work, but 300 dpi works fine. For one sample page, this produced a 2348×3129 pixel image in which the distance between baselines was around 50 pixels and capital letters were around 30 pixels tall.
- Install ocropus 0.3.1-2 from an Ubuntu mirror. Other Ubuntu releases may ship different ocropus versions.
- Run:

      ocroscript recognize image.png > image.html

  The recognized text is written to standard output as HTML, which this redirects into image.html.
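If you have a whole directory of scanned pages, the single command above can be wrapped in a loop. This is a minimal sketch; `ocr_all` is a hypothetical helper name, not something ocropus provides, and it assumes the scans are PNG files named like `page01.png`. It writes one HTML file next to each image.

```shell
# Hypothetical helper (not part of ocropus): run "ocroscript recognize"
# on every PNG in a directory, writing page01.png -> page01.html, etc.
ocr_all() {
    dir=${1:-.}                        # default to the current directory
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue      # no PNGs: the glob stays literal, skip it
        out=${img%.png}.html           # strip ".png", append ".html"
        echo "Recognizing $img -> $out" >&2
        ocroscript recognize "$img" > "$out"
    done
}
```

Source the file and call it as `ocr_all scans/` (or with no argument for the current directory).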
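To judge whether a lower scan resolution might still work, it helps to know how the 300 dpi measurements from the sample page scale. Pixel sizes scale linearly with dpi, so the sketch below just multiplies each measurement by dpi/300. The function name `scale_from_300dpi` is made up for illustration, and it says nothing about what minimum text size ocropus actually needs.

```shell
# Hypothetical helper: scale the sample-page measurements taken at 300 dpi
# (2348x3129 px page, ~50 px between baselines, ~30 px capital height)
# to another scan resolution. awk's %d truncates to whole pixels.
scale_from_300dpi() {
    awk -v d="$1" 'BEGIN {
        printf "page %dx%d px, line height %d px, capital height %d px\n",
               2348 * d / 300, 3129 * d / 300, 50 * d / 300, 30 * d / 300
    }'
}
```

For example, `scale_from_300dpi 150` gives a 1174x1564 px page with about 25 px between baselines and capitals about 15 px tall.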