My task today, as the title says, is to convert a paper book into a text-searchable ebook on Ubuntu (a Linux-based OS).
I'm using an HP G60 laptop with an Intel Core 2 Duo, running Ubuntu Jaunty 64-bit.
My plan is to:
1. scan the book (about 100 pages) with XSane and my Epson Stylus CX3100 (old machine that she is) into 300 dpi TIFF files
2. crop each scan down to a single page (rather than two pages side by side) so that I can run tesseract for OCR. I'm hoping to use a GIMP batch-processing plugin for that.
3. remove any blemishes from each page, such as shadows and scribbles left in the margins by previous readers. I'm also hoping a batch process will do the lion's share of this.
4. since I live in Cambodia, the book I have is a copy of the original, so the source quality isn't that great. I'm expecting some OCR errors because of that.
5. collate the OCR text into a single text file. Can I batch this somehow? Copying and pasting by hand would be really tedious.
6. import the text file into Calibre to create an ebook in any format I want (pdb, prc, mobi, lit, you name it!)
7. save the ebook to an SD card for reading (I currently use a Palm TX)
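As a possible alternative to a GIMP plugin for step 2, splitting a two-page scan is just arithmetic on the image width. The minimal Python sketch below only computes the two crop boxes in Pillow's (left, upper, right, lower) order and doesn't touch any files. The helper name `split_spread` and the `gutter` parameter (pixels discarded around the spine) are illustrative assumptions, not part of any existing tool.

```python
def split_spread(width: int, height: int, gutter: int = 0):
    """Return (left_page_box, right_page_box) for a two-page scan.

    Boxes follow Pillow's (left, upper, right, lower) convention.
    `gutter` is a hypothetical parameter: the number of pixels around the
    spine to throw away (half from each page).
    """
    if width < 2 or height <= 0:
        raise ValueError("image must be at least 2 pixels wide and 1 pixel high")
    if gutter < 0 or gutter >= width:
        raise ValueError("gutter must be between 0 and the image width")
    middle = width // 2   # assume the spine sits in the centre of the scan
    trim = gutter // 2    # pixels removed on each side of the spine
    left_page = (0, 0, middle - trim, height)
    right_page = (middle + trim, 0, width, height)
    return left_page, right_page
```

With Pillow installed, each box could be passed straight to `Image.open(path).crop(box)`. For example, a 2480-pixel-wide spread with a 40-pixel gutter gives a left page ending at x=1220 and a right page starting at x=1260. If the spine isn't centred on the scanner glass, `middle` would need adjusting per scan.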
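The OCR run and the collation in step 5 could be batched with a short script instead of copy-and-paste. This is a minimal Python sketch. It assumes the cropped pages are lowercase `.tif` files with page numbers in their names (e.g. `page1.tif`, `page2.tif`, a hypothetical naming scheme) and that the `tesseract` command-line tool is installed. It relies only on tesseract's basic `tesseract image outputbase` form, which writes `outputbase.txt`. The helper names `natural_key`, `ocr_command`, `run_ocr` and `collate` are illustrative, not part of any library.

```python
import re
import subprocess
from pathlib import Path


def natural_key(path: Path) -> list:
    """Sort key that orders page2.tif before page10.tif.

    re.split with a capturing group alternates text and digit runs, so every
    odd-indexed part is a run of digits and can be compared as an integer.
    """
    parts = re.split(r"(\d+)", path.name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def ocr_command(image: Path) -> list[str]:
    """Build a tesseract call; tesseract appends .txt to the output base name."""
    return ["tesseract", str(image), str(image.with_suffix(""))]


def run_ocr(image_dir: Path) -> None:
    """OCR every .tif in image_dir in page order, writing pageN.txt beside pageN.tif."""
    for image in sorted(image_dir.glob("*.tif"), key=natural_key):
        subprocess.run(ocr_command(image), check=True)


def collate(text_dir: Path, output: Path) -> int:
    """Join the per-page .txt files into one file; return how many pages were joined."""
    pages = [
        page
        for page in sorted(text_dir.glob("*.txt"), key=natural_key)
        if page.resolve() != output.resolve()  # don't read the output into itself
    ]
    with output.open("w", encoding="utf-8") as out:
        for page in pages:
            # rstrip() also drops the trailing form feed some tesseract versions add
            out.write(page.read_text(encoding="utf-8").rstrip())
            out.write("\n\n")  # blank line between pages
    return len(pages)
```

Calling `run_ocr(Path("scans"))` and then `collate(Path("scans"), Path("book.txt"))` would leave a single text file ready for Calibre. The natural sort matters because a plain alphabetical sort would put `page10` before `page2`.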
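Step 6 can also be scripted: calibre ships an `ebook-convert` command-line tool that picks the output format from the output file's extension. This is a minimal sketch, assuming calibre is installed and `ebook-convert` is on the PATH. `convert_commands` and `convert_all` are hypothetical helper names, and the default formats are limited to PDB and MOBI as an assumption. Whether other extensions work depends on which output formats the installed calibre version supports.

```python
import shutil
import subprocess
from pathlib import Path


def convert_commands(text_file: Path, formats=("pdb", "mobi")) -> list[list[str]]:
    """Build one ebook-convert call per output format.

    ebook-convert infers the target format from the output file's extension,
    so book.txt becomes book.pdb, book.mobi, and so on.
    """
    commands = []
    for fmt in formats:
        output = text_file.with_suffix("." + fmt.lstrip("."))
        commands.append(["ebook-convert", str(text_file), str(output)])
    return commands


def convert_all(text_file: Path, formats=("pdb", "mobi")) -> None:
    """Run each conversion, stopping on the first failure."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; is calibre installed?")
    for command in convert_commands(text_file, formats):
        subprocess.run(command, check=True)
```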
Let's see how I go, and I'll let you know about any further steps I need.