Amazon full text search

Thursday 23 October 2003

This is 21 years old. Be careful.

Now Amazon lets you search the full text of its books. This is astounding, not only because of the further differences it highlights between Amazon and traditional bookstores, but because of the effort it must have taken to accomplish. The text seems to be from scans of pages, subjected to an OCR process. And not just the bulk of popular books, either. They’ve got all sorts of wild and wooly volumes available this way.

I don’t know how truly useful it will be, since full text searching can be extremely noisy, even before the OCR noise is factored in. Searching for “Ned Batchelder” (what else do people search for first when testing?) found this:

received a preliminary version of the 2001 decommissfonfng study, which deferm(ned that EE wiff have to fund about $312 million

Still and all, a remarkable feature, in an overwhelming brute force kind of way.

Comments

[gravatar]
I saw that for the first time last night when I was listing a used tech book for sale. While I love the idea in concept, my first impression was "too much noise" when I was searching for the listing of a specific book. What should have been a very narrow list was greatly expanded due to text hits.

I have to explore this a little further...
[gravatar]
It's a good idea, at least in principle. I assume that Amazon will improve the execution as feedback comes in.

One thing that puzzles me about Amazon is why they don't facilitate linking to and from reviews. As it is, you can't link to a particular review, only to a dynamic review page, which is so much less useful (than a real, granular link) as to be almost worthless by comparison. Also, Amazon does not allow linking from reviews -- if you include a link, it gets stripped from the published version of your review. This seems like a foolish policy on Amazon's part. They could gain much by making linking easy.
