Tuesday 6 January 2004

I don’t know what WebFountain is, and after reading IBM’s page about it, I still don’t. Used to be you could surf the research group web pages and get some hard technical details. Now they’ve got slathered-on market-speak just like the rest of the company. Here’s the opening paragraph from the research page:

As the Web continues to grow, the knowledge gap between a company and the current events that could affect it will expand. Consequently, this knowledge gap can translate into lost opportunities, the inability to take proactive measures or missed revenue. It is becoming imperative that businesses minimize the gap to enable accurate and timely decision making to maintain a company’s effectiveness.

The rest of the page isn’t much better. It seems to have something to do with analyzing text. There are other publications linked from there; maybe some of them explain it better.


Wow. I read it, and then I read it again more carefully, and I am still none the wiser. What is the point of documentation that doesn't even say what the darn thing is?
I came across this the other day when it was touted as a possible Google competitor. When I looked at the page and there was no content, I didn't even bother linking to it.

I hate when technology products are hidden behind marketing-speak. For example, when it was first being introduced it was hard to figure out what .NET even was.
I like to call such documents "leveraged content", since the words "leverage" and "content" often show up in market-speak. The presence of such words often indicates that absolutely nothing is going to be described in any understandable way.
While I agree that the IBM page is a bit of blather, the press has done a better job of explaining what this is about. I found this in the Press section: http://www.infotoday.com/newsbreaks/nb030922-1.shtml including "WebFountain is a Web-scale mining and discovery platform that extracts trends, patterns, and relationships from massive amounts of unstructured and semi-structured text".

Should this info be on the IBM page? Probably. But maybe they are making a point about the difficulty of mining text for facts by making the user experience it.

Or maybe I am just giving them too much credit.
Michael Maggard 12:20 AM on 9 Jan 2004
I actually read through the whole bit and parsed out what it seems to be (pretty exciting stuff):

Stage 1 is a standard web spider.

Stage 2 breaks down the spidered pages for metadata. The analysis is done by systems tuned for various subjects like locations, pharmaceuticals, etc.

Stage 3 then pumps the now automagically marked up material into standard data warehousing systems where paying clients can spelunk for information.

Basically in goes the chaotic mass of 'the web' and out comes presumably neatly taxonified and indexed data ready for analysis.
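To make the three stages concrete, here is a toy sketch in Python. It is only an illustration of the flow described above, not how WebFountain is actually built: the page texts, the `ANNOTATORS` term lists, and the `annotate` and `build_index` helpers are all made up, and a hard-coded dictionary stands in for the spider.

```python
import re
from collections import defaultdict

# Stage 1 stand-in (hypothetical): pages a spider might have fetched.
PAGES = {
    "http://example.com/a": "Acme opened a lab in Boston to test aspirin.",
    "http://example.com/b": "Shares rose in Chicago after the aspirin recall.",
}

# Stage 2 stand-in (hypothetical): one tiny term list per subject area.
ANNOTATORS = {
    "location": {"boston", "chicago"},
    "pharmaceutical": {"aspirin"},
}


def annotate(text):
    """Stage 2: return {subject: sorted terms found in the text}."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        subject: sorted(words & terms)
        for subject, terms in ANNOTATORS.items()
        if words & terms
    }


def build_index(pages):
    """Stage 3: map (subject, term) to the sorted URLs that mention it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for subject, terms in annotate(text).items():
            for term in terms:
                index[(subject, term)].add(url)
    return {key: sorted(urls) for key, urls in index.items()}


INDEX = build_index(PAGES)
```

Looking up `INDEX[("location", "boston")]` then gives the pages tagged with that place, which is the kind of query a plain keyword search can't answer directly.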

I can see this beating the pants off of Google & ilk or even specialized search engines for fields it's tuned for and in the hands of folks who know how to use it. Indeed I'll bet in a year or three every analyst & researcher will be drooling for this or its like: pre-digested web. It'll be required like having a Bloomberg terminal or Westlaw account for knowledge workers.
