Distributed proofreaders

Thursday 13 May 2004

In one of the comments to my entry about Read Print from Tuesday, “Blues” suggested trying out PG Distributed Proofreaders, and I did. It’s a fascinating web artefact. They’ve solved the problem of how to accomplish the labor-intensive job of proofing and correcting the OCR scans of books.

The site is a web application for handing out units of work and getting back results. They have over 11,700 people signed up to proof pages, and they are proofing 6200 pages a day. You sign up on the site, then log in to proof pages. You are presented with a scan of a page and the text as produced by the OCR software. Your job is simply to compare the two and make corrections. Mostly it seems to come down to re-joining hyphenated words (why can’t OCR software do that itself?). All they ask is that you proof one page a day.
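As a partial answer to my own question about hyphens, here is a minimal sketch of what automatic re-joining might look like. This is purely my own illustration, not anything Distributed Proofreaders or any OCR package actually does: the function name `rejoin_hyphenated` and the `keep_hyphen` parameter are made up for the example.

```python
import re

# Hypothetical helper for illustration only; not part of any real OCR tool.
def rejoin_hyphenated(text, keep_hyphen=frozenset()):
    """Join words that were split by a hyphen at the end of a line.

    keep_hyphen is an assumed, caller-supplied set of lowercase compounds
    (such as "labor-intensive") whose hyphen is real and must be kept.
    The line break itself is dropped, so the two lines are merged.
    """
    def join(match):
        first, second = match.group(1), match.group(2)
        hyphenated = first + "-" + second
        if hyphenated.lower() in keep_hyphen:
            return hyphenated      # a genuine hyphen that fell at a line end
        return first + second      # a hyphen added only for the line break

    # A word, a hyphen, a newline, then the rest of the word.
    return re.sub(r"(\w+)-\n(\w+)", join, text)
```

The catch is visible in the `keep_hyphen` set: without a list of genuine compounds, "labor-" followed by "intensive" on the next line comes out as "laborintensive". Telling the two cases apart takes a dictionary or a person, which may be why the job ends up with proofreaders.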

It’s a cool way to provide a little bit of labor for a noble cause: the dissemination of public domain information electronically.


Read Print also does the same thing: they have a bunch of volunteers who proof and format all the works. I asked them about this earlier, and they said they are currently working on a system whereby visitors would also be able to contribute.
