Monday, 2 July 2012

SEMANTICS, SHMANTICS

Dick Pountain/Wed 15 January 2003/2:56 pm/Idealog 102

In a recent discussion on Cix, PC Pro's editor James Morris has been making a valiant attempt to head off the next egregious buzz-phrase that looks likely to be inflicted upon us, namely the 'semantic web'. I applaud his effort but don't give a lot for its chance of success, for a buzz-phrase whose time has come is like a (biological) virus, quite unstoppable by any medicine known to humankind.

What happens is that some legitimately-coined technical term (for example 'object-oriented' or 'expert system') passes from the heads of the engineers or programmers who willed it to exist, into the heads of marketing and advertising people. This is the incubation phase during which the term undergoes a gruesome metamorphosis: first of all it is only half-understood, and then it's further diluted, just for good measure and to avoid frightening the horses. For example the term 'object-oriented' comes to mean 'little pictures on the screen called i-c-o-n-s'. The buzz-phrase is now ready to be deployed by applying it to every new product imaginable, from software through egg timers to duvet covers.

I'm not claiming the term 'semantic web' has no meaning in W3C documents, or when Tim Berners-Lee wrote about it in Scientific American a couple of years ago. Its original meaning is 'data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications'. What James' sensitive nose has detected is the first whiff of buzzification, an indication that very soon every dumb-ass utility that can read an XML tag is going to be calling itself a semantic web product.

I can't help worrying that W3C might have been unwise to mess with the word 'semantic' in the first place, might have been better off with some slightly less ambitious term. Semantics and computers have never mixed well, a rant I got off my chest back in Idealog 98. Computers are good for manipulating strings of symbols but semantics is about the meaning of those symbol strings to human beings. Symbol strings have no 'meaning' to the computer, and there's little to suggest so far that we'll ever be able, either via hardware or software, to make a computer for which they will have meaning. Such a computer would have in effect to be a human being - the phantasy that lies at the heart of all sci-fi android guff - but we've known how to make human beings for 200,000-odd years (hint: it's more fun than computing).

The best we can do is try to encode the meaning some symbol string has for us into the string itself, then get the computer to manipulate it in its usual dumb-but-rapid way. That's what XML enables us to do, and that presumably is the shaky rationale for W3C's adoption of the s-word. For example if I write the string 'Dick Pountain PC Pro Editorial', to the computer that's just a 30-byte character string. If I write the string <person>Dick Pountain</person><magazine>PC Pro</magazine><department>Editorial</department> that's still just a rather longer string of characters, but if I go further and supply definitions for the tags <person>, <magazine> and <department> then I've enabled the computer to differentiate this, and similar, strings into three separate kinds of information. I've injected a little meaning into the string, so that software could now search for the persons in a file and distinguish them from the magazines.
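To make that concrete, here is a minimal sketch in Python of what 'searching for the persons' might look like, using the standard library's xml.etree.ElementTree. The <record> wrapper element is my own assumption, added only because an XML parser insists on a single root element, and the variable names are purely illustrative.

```python
import xml.etree.ElementTree as ET

# The tagged string from the paragraph above. The <record> wrapper is an
# assumption added for this sketch: XML requires exactly one root element.
tagged = (
    "<record>"
    "<person>Dick Pountain</person>"
    "<magazine>PC Pro</magazine>"
    "<department>Editorial</department>"
    "</record>"
)

root = ET.fromstring(tagged)

# The parser knows nothing about people or magazines; it simply matches
# tag names that a human chose to be meaningful.
persons = [el.text for el in root.iter("person")]
magazines = [el.text for el in root.iter("magazine")]

print("persons:", persons)
print("magazines:", magazines)
```

Note that the program would work just as happily if the tags were called <a>, <b> and <c>: whatever meaning there is lives in the names we picked, not in the machine.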

However you might have noticed that this doesn't come for free - not only has the size of the data increased, but those definitions of <person> and <magazine> are just strings too. They could be decomposed and refined further and further in classic hierarchical fashion, but you have to stop somewhere or face infinite regress and an exponential explosion in data size. This won't be a problem in real-world commercial applications as even such a tiny amount of extra meaning can be very useful indeed (which is why XML is a wholly Good Thing) but to label this as semantics verges on hubris.
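The size cost is easy to measure for the example above. This is only a rough sketch that counts bytes in the two strings as written, assuming UTF-8 encoding and ignoring the tag definitions themselves, which would add still more.

```python
# Plain and tagged versions of the same information, as given in the text.
plain = "Dick Pountain PC Pro Editorial"
tagged = (
    "<person>Dick Pountain</person>"
    "<magazine>PC Pro</magazine>"
    "<department>Editorial</department>"
)

# Count bytes rather than characters (identical here, since all ASCII).
plain_bytes = len(plain.encode("utf-8"))
tagged_bytes = len(tagged.encode("utf-8"))

print(plain_bytes, "bytes plain")
print(tagged_bytes, "bytes tagged")
print(f"roughly {tagged_bytes / plain_bytes:.1f}x the size")
```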

The thing is we've been here before, in the Artificial Intelligence booms of the 1960s, 70s and 80s. At one time people were optimistic about 'semantic nets' until it turned out that they got too tangled too fast. Then it was rule-based systems, until it turned out that after some point you could no longer know whether your rules were complete or consistent. Are you beginning to see a pattern here? Think Gödel, think Turing, think thermodynamics (laws of). Finally it was Case-Based Reasoning, which had the advantage that it worked, because it just employs the computer to organise chunks of human reasoning.

Don't get me wrong, I'm a huge fan of smart mark-up languages and have studied several from other domains. For example there's the excellent DELTA (DEscription Language for TAxonomy), designed for use by non-computer specialists in the biological sciences and invented at CSIRO (Commonwealth Scientific and Industrial Research Organisation) in Canberra, Australia. Or again, there's IconClass, a clever notation for codifying the subject matter of paintings, devised at the University of Leiden in Holland. Or even the Dewey library classification. What they all have in common is that human beings put in the required amount of meaning and the computer just gets to shovel it from place to place, which is exactly as it should be.

