Tuesday 3 July 2012

GAME OF TAG

Dick Pountain/22 April 2008/12:35/Idealog 165

What a fantastic couple of years it's been for Google. From being merely the best search engine it's spurted ahead and elbowed Microsoft aside as the leader in personal computing innovation. I've been searching with Google for ever (actually since it overtook AltaVista and Inktomi, back when dinosaurs roamed) but recently I've started using Googlemail too, routing email from my other accounts through its web interface.

I didn't really like the Googlemail interface at first - presenting everything in one big list felt like a step backwards from my personalised mailboxes in Ameol - but once I grasped filters and labels I was able to build a far more sophisticated filing system for my mail, because a single message can have several labels and thus belong to several "virtual mailboxes" rather than just one. (As an aside, does anyone remember when the next Windows filing system was supposed to introduce a similar capability? Doh.)

This principle of labelling or tagging things is becoming enormously important nowadays. Web 2 applications make much use of tagging as an aid to navigation: for example two sites I use a lot, Flickr and LibraryThing, both actively encourage you to tag all your pictures/books in ways that accurately describe their contents. I've found that sensible tagging on Flickr can greatly increase the views your pictures attract, and the "tag cloud" (also employed by LibraryThing) is perhaps the only really useful innovation in user-interface design for years. For those unfamiliar with it, you get a screenful of the tags that you've applied, arranged as a random cloud in a typeface whose size is proportional to the frequency with which that tag occurs. At a glance you can see which topics are most popular, and a single click on a tag brings up all the pictures labelled with it.
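As a rough illustration of the sizing rule just described, here is a minimal Python sketch. It is not how Flickr or LibraryThing actually do it: the function name, the point sizes and the minimum-size floor (added so that rare tags stay legible) are all assumptions made for the example.

```python
from collections import Counter

def tag_cloud_sizes(tags, max_pt=40, min_pt=8):
    """Map each tag to a font size proportional to how often it occurs.

    max_pt and min_pt are illustrative values, not taken from any real site.
    """
    counts = Counter(tags)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent tag gets max_pt; every other tag is scaled in
    # proportion to its count, but never drawn smaller than min_pt.
    return {tag: max(min_pt, round(max_pt * n / top)) for tag, n in counts.items()}

if __name__ == "__main__":
    applied = ["cat", "cat", "cat", "cat", "dog", "dog", "bird"]
    for tag, size in tag_cloud_sizes(applied).items():
        print(f"{tag}: {size}pt")
```

A real cloud would then scatter the tags across the page and make each one a link to the items that carry it.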

A similar concept of tagging lies behind Tim Berners-Lee's next big idea, the Semantic Web, and with the increasing use of XML as an output format that looks as if it might finally become a reality. In brief the idea involves tagging documents and web pages to indicate their subject matter (for example this column might be tagged <computers><ideas><tags>) using XML-based markup languages, so that documents can be searched for by content. It hasn't taken off so far because the web remains stubbornly wedded - for wholly understandable economic reasons - to plain old fixed-tag HTML. For all the sniping from the Anything-But-Microsoft brigade, the ISO standardisation of Microsoft's OOXML file formats can only be a good thing by accelerating the spread of XML, and until XML becomes ubiquitous a semantic web can't deliver its promised benefits.

But in any case there's a further question hanging over the semantic web idea, which is, who does the work of tagging? It's all very well having web browsers that understand XML and can peer inside files to read the content tags, but who will put those content tags there in the first place? The answer has to be the original creator, and it has to be made so easy that it's a no-brainer to do it rather than skip it. Writing XML code certainly doesn't fall into that category. All tagging systems are also plagued by misspelling and ambiguity. If I invent a tag "Poggioni", then either the parsing system needs to be smart enough to guess that "poggioni" and even "Pogionni" were both probably intended to refer to the same tag, or else the software needs to present me with a menu of my own tags to select from rather than (mis)typing them every time. Unless one of these conditions pertains you're bound to get orphaned data, not retrievable solely because of misspelled tags. Flickr has this quite well sorted because it invites you to tag your pictures at every stage during their uploading. It only falls down in one respect: on the Flickr page for each individual picture the "Add a tag" dialog does indeed offer a link called "Choose from your tags" but this ability is *not* offered when you first add tags during upload, which is where it's needed.
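The first of those two remedies, a parser smart enough to guess which existing tag was meant, can be sketched in a few lines of Python using the standard library's difflib module. This is only an illustration under stated assumptions: the function name and the 0.8 similarity cutoff are invented for the example, and a real tagger would probably show the guess to the user rather than apply it silently.

```python
import difflib

def resolve_tag(typed, known_tags, cutoff=0.8):
    """Return the existing tag that `typed` most plausibly refers to, or None.

    Matching ignores case; `cutoff` is an assumed similarity threshold
    (0 to 1) below which the input is treated as a genuinely new tag.
    """
    by_lower = {tag.lower(): tag for tag in known_tags}
    key = typed.strip().lower()
    if key in by_lower:  # same tag apart from capitalisation
        return by_lower[key]
    # Otherwise look for the closest spelling among the user's own tags.
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None

if __name__ == "__main__":
    my_tags = ["Poggioni", "Flickr"]
    for attempt in ["poggioni", "Pogionni", "holiday"]:
        print(attempt, "->", resolve_tag(attempt, my_tags))
```

The second remedy, a menu of your own tags, needs no guessing at all: the keys of the same dictionary are exactly the list to offer.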

There's been a lot of fuss recently about Amazon's Mechanical Turk service with its notion of harnessing the "wisdom of crowds" to solve problems, and this tagging business is a closely related phenomenon. It too delegates and distributes the task of recognition and classification to millions of individual human document creators rather than trying to find an algorithmic solution that a computer can execute: that's both a welcome acceptance of the limits of computability and a sensible allocation of resources. What we need is a much better tagging user interface for all documents.

You can tag Microsoft Office documents at creation time inside the relevant application, and add titles and comments later in Explorer by using the Summary tab under Properties. However, most users don't know this tab exists; it's a pain to bring up and so unlikely to be used, and it doesn't present your own tags to avoid misspelling. It shouldn't be too hard to write a pop-up tagger applet that works everywhere, in Office, Windows Explorer, your browser, even the Google command line, but it would need some universally understood tagging format for files. At the moment the filename is the only universal site and that's ugly...
