In one of the comments to my entry about Read Print from Tuesday, “Blues” suggested trying out PG Distributed Proofreaders, and I did. It’s a fascinating web artefact. They’ve solved the problem of how to accomplish the labor-intensive job of proofing and correcting the OCR scans of books.
The site is a web application for handing out units of work and collecting the results. They have over 11,700 people signed up to proof pages, and they are proofing 6,200 pages a day. You sign up on the site, then log in to proof pages. You are presented with a scan of a page alongside the text produced by the OCR software. Your job is simply to compare the two and make corrections. Mostly it seems to come down to re-joining hyphenated words (why can’t OCR software do that itself?). All they ask is that you proof one page a day.
It’s a cool way to provide a little bit of labor for a noble cause: the dissemination of public domain information electronically.