{"id":228,"date":"2011-10-07T16:31:32","date_gmt":"2011-10-07T21:31:32","guid":{"rendered":"http:\/\/blogs.cae.tntech.edu\/mwr\/?p=228"},"modified":"2024-10-27T14:26:19","modified_gmt":"2024-10-27T14:26:19","slug":"ocr-of-scanned-pages-on-ubuntu-10-04","status":"publish","type":"post","link":"https:\/\/sites.tntech.edu\/renfro\/2011\/10\/07\/ocr-of-scanned-pages-on-ubuntu-10-04\/","title":{"rendered":"OCR of scanned pages on Ubuntu 10.04"},"content":{"rendered":"<p>Just for future reference:<\/p>\n<ol>\n<li>Scan images at 300 dpi (a lower resolution might also work, but this is fine). For one sample page, this resulted in a 2348&#215;3129 pixel image where each baseline height was around 50 pixels, and capital letters had a height around 30 pixels.<\/li>\n<li>Install ocropus 0.3.1-2 from the Ubuntu mirror. Other Ubuntu releases may ship different ocropus versions.<\/li>\n<li>Run:\n<pre>ocroscript recognize image.png &gt; image.html<\/pre>\n<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Just for future reference: Scan images at 300 dpi (a lower resolution might also work, but this is fine). For one sample page, this resulted in a 2348&#215;3129 pixel image where each baseline height was around 50 pixels, and capital letters had a height around 30 pixels. 
Install ocropus 0.3.1-2 &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/sites.tntech.edu\/renfro\/2011\/10\/07\/ocr-of-scanned-pages-on-ubuntu-10-04\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;OCR of scanned pages on Ubuntu 10.04&#8221;<\/span><\/a><\/p>\n","protected":false},"author":87,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,20],"tags":[],"class_list":["post-228","post","type-post","status-publish","format-standard","hentry","category-debian","category-ubuntu","entry"],"_links":{"self":[{"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/posts\/228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/users\/87"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/comments?post=228"}],"version-history":[{"count":1,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/posts\/228\/revisions"}],"predecessor-version":[{"id":443,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/posts\/228\/revisions\/443"}],"wp:attachment":[{"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/media?parent=228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/categories?post=228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.tntech.edu\/renfro\/wp-json\/wp\/v2\/tags?post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}