Thursday 31 December 2009

Converting a paper book into an ebook using Ubuntu

My task today, as the title says, is to convert a paper book into a text-searchable ebook on Ubuntu (a Linux-based OS).

I'm using an HP G60 (Intel Core 2 Duo) running 64-bit Ubuntu Jaunty.

My plan is to:

1. Scan the book (about 100 pages) with XSane and my Epson Stylus CX3100 (old machine that she is) into TIFF files at 300 dpi.
2. Crop the scans so that each image shows only one page (rather than two pages side by side), so that I can use tesseract for OCR. I'm hoping to use a batch-processing plugin for the GIMP for that.
3. Remove any blemishes from each page, such as shadows and scribbles in the margins from previous readers. I'm also hoping a batch process will do the lion's share of this.
4. Because I live in Cambodia, my book is a copy of an original, so the 'original' I'm scanning is not that great. I'm expecting some OCR errors because of that.
5. Collate the OCR text into a single text file. Can I batch this somehow? Manual copy-and-paste will be really tedious.
6. Import the text file into Calibre to create an ebook in any format I want (pdb, prc, mobi, lib, you name it!).
7. Save it to an SD card for reading (I currently use a Palm TX).
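
On the batching question in step 5, here is a rough sketch of how the OCR and collating steps might be scripted in Python rather than done by hand. It assumes tesseract is installed and is called as `tesseract IMAGE OUTPUTBASE` (which writes `OUTPUTBASE.txt`), and that the cropped pages have zero-padded names such as `page_001.tif` so that file-name order matches page order. The function names and folder layout are my own invention for illustration, not part of any tool.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base):
    """Build the command line; tesseract itself appends '.txt' to output_base."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pages(page_dir, text_dir):
    """Run tesseract on every .tif in page_dir (assumes tesseract is installed)."""
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: zero-padded names (page_001.tif, ...) so sorting gives page order.
    for image in sorted(Path(page_dir).glob("*.tif")):
        subprocess.run(tesseract_command(image, text_dir / image.stem), check=True)


def collate_text(text_dir, book_path):
    """Join the per-page .txt files, in file-name order, into one text file.

    Keep book_path outside text_dir so a second run does not pick up the
    collated book as if it were a page. Returns the number of pages joined.
    """
    pages = sorted(Path(text_dir).glob("*.txt"))
    with open(book_path, "w", encoding="utf-8") as book:
        for page in pages:
            # Trim trailing whitespace and separate pages with one blank line.
            book.write(page.read_text(encoding="utf-8").rstrip() + "\n\n")
    return len(pages)
```

Something like `ocr_pages("pages", "ocr")` followed by `collate_text("ocr", "book.txt")` would then give a single text file to import in step 6. The OCR errors will still need fixing by hand.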

Let's see how I go, and I'll let you know about any further steps I need along the way.

Saturday 5 December 2009

Exercise Postponed

Sadly I was taken ill Thursday night and spent all of Saturday in a horizontal posture. Exercise will have to wait until I recover sufficient energy to attempt the ride to Kep.

Thursday 3 December 2009

Exercise

One of my (ongoing) goals is fitness. But, like so many goals, it
gets pushed to the end of the list and is simply missed in the craziness
of life. Well, the move to Kampot is facilitating exercise - somewhat.