Tuesday, 3 July 2012

JUST HOW SMART?

Dick Pountain/15 March 2011 14:05/Idealog 200

The PR boost that IBM gleaned from winning the US quiz show Jeopardy, against two expert human opponents, couldn't have come at a better time. We in the PC business have barely registered the company's existence since it stopped making PCs and flogged off its laptop business to Lenovo a few years ago, while the public knows it only from those completely incomprehensible black-and-blue TV adverts. But just how smart is Watson, the massively parallel POWER7-based supercomputer that won this famous victory?

It's probably smart enough to pass a restricted version of the Turing Test. Slightly reconfigure the Jeopardy scenario so that a human proxy delivers Watson's answers and it's unlikely anyone would tell the difference. Certainly Jeopardy is a very constrained linguistic environment compared to free-form conversation, but the most impressive aspect of Watson's performance was its natural language skill. Jeopardy involves guessing what the question was when supplied with the answer, which may contain puns, jokes and other forms of word play that tax average human beings (otherwise Jeopardy would be easy). For example, the clue "what clothing a young girl might wear on an operatic ship" has the response "a pinafore", the connection - which Watson found - being Gilbert and Sullivan's opera H.M.S. Pinafore.

Now I'm a hardened sceptic about "strong" AI claims concerning the reasoning power of computers. It's quite clear that Watson doesn't "understand" either that question or that answer the way that we do, but its performance impressed in several respects. Firstly, its natural language processing (NLP) powers go way beyond any previous demonstration: it wasn't merely parsing questions into nouns, verbs and their grammatical relationships, but also extracting *semantic* relationships, which it used as entry points into vast trees of related concepts (called ontologies in NLP jargon) to create numerous diverging paths to explore in looking for answers. Secondly, it determined its confidence in the various candidates generated by such exploration using probabilistic algorithms. And thirdly, it did all this within the three seconds allowed in Jeopardy.
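To make that idea concrete, here's a toy sketch in Python of the candidate-generation and confidence-scoring principle. The mini "ontology" and the path-counting score are entirely invented for illustration - Watson's real DeepQA pipeline does something vastly more sophisticated over far larger structures - but the shape is the same: fan out from the clue's concepts, then rank the candidates on which most exploration paths converge.

# Invented mini-ontology: each concept links to related concepts.
ONTOLOGY = {
    "opera": ["H.M.S. Pinafore", "Gilbert and Sullivan", "aria"],
    "ship": ["H.M.S. Pinafore", "deck", "sail"],
    "girl's clothing": ["pinafore", "dress"],
    "H.M.S. Pinafore": ["pinafore"],
}

def candidates(clue_concepts, depth=2):
    """Walk outward from each clue concept, collecting every node
    reached within `depth` hops; count converging paths as evidence."""
    found = {}
    frontier = [(c, 0) for c in clue_concepts]
    while frontier:
        node, d = frontier.pop()
        if d > 0:
            found[node] = found.get(node, 0) + 1
        if d < depth:
            frontier.extend((n, d + 1) for n in ONTOLOGY.get(node, []))
    return found

def confidence(paths, total):
    """Crude probabilistic score: the fraction of all exploration
    paths that converged on this candidate."""
    return paths / total

clue = ["opera", "ship", "girl's clothing"]   # concepts parsed from the clue
cands = candidates(clue)
total = sum(cands.values())
best = max(cands, key=cands.get)
print(best, round(confidence(cands[best], total), 2))   # -> pinafore 0.3

Run on the pinafore clue, three independent paths (via the opera, the ship and the clothing) converge on "pinafore", which is exactly why that answer wins.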

Watson retrieves data from a vast unstructured text database that contains huge quantities of general-knowledge material, as well as details of all previous Jeopardy games to provide clues to the sorts of word-games the show employs. Ninety rack-mounted IBM servers, running 2,880 POWER7 cores, hold over 15 terabytes of text, equivalent to 200,000,000 pages (all stored locally for fairness, since the human contestants weren't allowed Google) and all accessible at 500GB/sec. This retrieval process is managed by two open-source software frameworks. The first, developed by IBM and donated to the Apache project, is UIMA (Unstructured Information Management Architecture), which sets up multiple tasks called annotators to analyse pieces of text, create assertions about them and assign probabilities to those assertions.
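The annotator idea is easier to grasp in miniature. Real UIMA annotators are Java components plugged into an analysis engine; the little Python sketch below (with rules invented purely for illustration) only mirrors the shape of the concept: each annotator inspects a passage, emits an assertion about it and attaches a probability.

import re

def date_annotator(passage):
    # Invented toy rule: flag passages that mention a four-digit year.
    if re.search(r"\b1[0-9]{3}\b|\b20[0-9]{2}\b", passage):
        yield ("mentions-a-date", 0.9)

def opera_annotator(passage):
    # Another toy rule: spot a reference to the Gilbert and Sullivan opera.
    if "pinafore" in passage.lower():
        yield ("refers-to-HMS-Pinafore", 0.7)

ANNOTATORS = [date_annotator, opera_annotator]

def analyse(passage):
    """Run every annotator over the passage, collecting (assertion, p) pairs."""
    assertions = []
    for annotator in ANNOTATORS:
        assertions.extend(annotator(passage))
    return assertions

print(analyse("H.M.S. Pinafore premiered in 1878."))
# -> [('mentions-a-date', 0.9), ('refers-to-HMS-Pinafore', 0.7)]

Multiply that by thousands of annotators running over millions of passages at once and you have the scale of problem the second framework exists to solve.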

The second is Hadoop, which Ian Wrigley has covered recently in our Real World Open Source column. This massively parallel distributed processing framework - modelled on Google's MapReduce and employed by the likes of Yahoo and Amazon - is used to place the annotators onto Watson's 2,880 cores in an optimal way, so that work on each line of inquiry happens close to its relevant data. In effect the UIMA/Hadoop combination tags, on the fly, just those parts of this vast knowledge base that might be relevant to a particular query. This may not be the way our brains work (we know rather little about low-level memory mechanisms) but it's quite like the way we work on our computers via Google: search for a couple of keywords, then click further links in the top-listed documents to build an ad hoc path through the vast ocean of data.
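The scheduling principle - move the work to the data - can itself be sketched in a few lines. The node names and shards below are invented; real Hadoop assigns map tasks to the machines whose local disks hold the relevant blocks of the replicated input, which is what keeps each of Watson's lines of inquiry close to its data.

# Which node locally stores which shard of the knowledge base (invented).
SHARDS = {
    "node-1": ["opera-texts"],
    "node-2": ["naval-history"],
    "node-3": ["opera-texts", "quiz-archives"],
}

def place(tasks):
    """Assign each (task, shard) pair to a node holding that shard,
    balancing load by picking the least-loaded eligible node."""
    load = {node: 0 for node in SHARDS}
    plan = {}
    for task, shard in tasks:
        eligible = [n for n, s in SHARDS.items() if shard in s]
        node = min(eligible, key=load.get)   # least-loaded node with the data
        plan[task] = node
        load[node] += 1
    return plan

tasks = [("score-arias", "opera-texts"),
         ("score-ships", "naval-history"),
         ("score-clues", "quiz-archives"),
         ("score-librettos", "opera-texts")]
print(place(tasks))   # each task lands on a node that holds its shard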

Web optimists describe this as using Google as an extension of our brains, while web pessimists like Nicholas Carr see it as an insidious process of intellectual decay. In his book "The Shallows: How the Internet is Changing the Way We Think, Read and Remember", Carr argues that the phenomenon called neuroplasticity allows excessive net surfing to remodel our brain structure, reducing our attention span and destroying our capacity for deep reading. On the other hand, some people think this is a good thing. David Brooks wrote in the New York Times that "I had thought that the magic of the information age was that it allows us to know more, but then I realised that the magic is... that it allows us to know less". You don't need to remember it because you can always Google it.

It's unlikely such disputes can be resolved by experimental evidence, because everything we do may remodel our brain: cabbies who've taken "the Knowledge" have enlarged hippocampi, while pianists have enlarged areas of cortex devoted to the fingers. Is a paddle in the pool of Google more or less useful/pleasurable to you than a deep dive into Heidegger's "Being and Time"? Jim Holt, reviewing Carr's book in the London Review of Books, came to a more nuanced conclusion: the web isn't making us less intelligent, nor is it making us less happy, but it might make us less creative. That would be because creativity arises through the sorts of illogical and accidental connections (short circuits, if you like) that just don't happen in the stepwise semantic chains of either a Google search or a Watson lookup. In fact, the sort of imaginative leaps we expect from a Holmes rather than a Watson...
