I pulled up the OCR project for the Book of Asaph the physician in FineReader 11 this lunchtime. It’s a 6th-century Jewish medical text, which apparently contains interesting quotes from classical writers.
Readers may remember (I can hardly remember myself) that I was experimenting with deskewing the pages, increasing the brightness, etc., in order to improve the OCR.
Pretty much the last thing that I did was to open the PDF and import it into FR11, without doing any preparatory work. I ran the OCR anyway, just to see what the raw result would look like.
The raw result is certainly better than some of the rubbish that I have had to clean up in the past. But it is far from simple. I think deskewing, etc., would be the answer. However, there are 250 pages to do, one at a time. It might be a gentle task to do some time.