Sunday, 1 July 2012

ANYONE FOR OOHTMQL?

Dick Pountain\04 September 1996\12:10 pm\Idealog 25

Last week I got back from my annual trip to the north of Scotland (the purpose of which is to refresh my memory of what a horizon looks like) but on returning to work on Tuesday morning I discovered that my desktop PC wouldn't boot. The reason turned out to be total motor failure in the C: drive - not a lot you can do about that. The simplest solution, i.e. to swap over the D: drive, was precluded because I'd formatted it without a system track, an imbecility now noted for future reference. Anyway, to cut the story short, all my data was saved thanks to Pereos tape backups, but rather than rebuild an identical C: drive I decided that this was God's way of telling me that now was the time to upgrade to Windows 95, which I duly did.

The only casualty in this whole mishap was my Ameol mail database, which now had a 3-week hole in the middle of it (due to holidays I hadn't backed up since the beginning of August). After some helpful advice from Liam P I repaired even that, but the process of doing so triggered off a whole train of thought. Leaving aside the question of the robustness or otherwise of Ameol's database design, the only reason I'd had this problem was that all my mail was residing on my PC - if it had all been left on CIX's hard disks there would have been no problem. That in turn forced me to think about the philosophical problems of distributed data, about clients and servers and centres and peripheries, and in particular about replication, which Real World authors Silver and Cassidy have been making much of recently.

The world of computing as it exists today pivots around three major technology strands, but they appear to have some quite fundamental incompatibilities - these three pillars of wisdom are the Relational Database, Object Orientation, and the World Wide Web (that is HTML+URLs).

The Relational Database as expounded by Edgar Codd back in 1969 encapsulated a brilliant insight, namely that the only way to keep data coherent is to avoid duplicating it. If you keep a copy of your home address in various different applications, then when you move house you have to remember to update all of these different copies; forget just one and you have an inconsistency (now extend this example to national census data or gas bills). Codd's message was to keep just one copy and let all other applications access it. Of course there is much more than this to Codd's 12 Rules of database design, but at root it's a technique for untangling complex data relationships into sets of tables in which no data relation is duplicated - the data is totally independent of any of the programs that access it, and you can combine the data into any number of different views without having to alter the database structure. The actual implementation of Codd's idea turned out to be rather difficult, and so only recently have relational databases (and their query language SQL) become a staple in the PC, as opposed to the mainframe and minicomputer, worlds.
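
To make the "one copy, many views" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, view names and sample customer are all invented for illustration; the only thing it sets out to show is that when the address lives in exactly one row, a single update is seen by every application that reads it.

```python
import sqlite3

def build_db():
    """Create an in-memory database holding one copy of each address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE gas_bill (customer_id INTEGER REFERENCES customer(id), amount REAL);
        CREATE TABLE census (customer_id INTEGER REFERENCES customer(id), household_size INTEGER);

        INSERT INTO customer VALUES (1, 'A. Customer', '1 Old Street, London');
        INSERT INTO gas_bill VALUES (1, 42.50);
        INSERT INTO census VALUES (1, 2);

        -- Two different "applications", each with its own view of the same data.
        CREATE VIEW billing_view AS
            SELECT c.name, c.address, g.amount
            FROM customer c JOIN gas_bill g ON g.customer_id = c.id;
        CREATE VIEW census_view AS
            SELECT c.name, c.address, s.household_size
            FROM customer c JOIN census s ON s.customer_id = c.id;
    """)
    return conn

def move_house(conn, customer_id, new_address):
    """Update the single stored copy of the address."""
    conn.execute("UPDATE customer SET address = ? WHERE id = ?",
                 (new_address, customer_id))

conn = build_db()
move_house(conn, 1, "2 New Road, Leeds")

# Both views pick up the change, because neither holds its own copy.
billing_address = conn.execute("SELECT address FROM billing_view").fetchone()[0]
census_address = conn.execute("SELECT address FROM census_view").fetchone()[0]
print(billing_address, "|", census_address)
```

Adding a third view later would not require any change to the customer table, which is the independence of data from programs described above.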

Object Orientation tries to mimic the structure of the real world, which is full of things that do stuff - objects contain both data which describes their properties and code which describes their behaviour. To build an object-oriented database for a gas company, represent each customer by an object which contains their name, address, how much they owe you etc. Trouble is, this immediately conflicts with the relational paradigm which wants to keep all the names in one table, addresses in another, etc. Object databases don't prevent duplication of data, and what's more they mix the code that accesses the data (i.e. the methods) into the database itself. An object database is great for storing a description of a Boeing 747, which is made of wings and fuselage, which are made of spars and ribs, which are etc., but it is far less good than a relational database at answering "show me everyone in Leeds, under 30, who owes more than £100".
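
The contrast can be sketched in a few lines of Python. Everything here is hypothetical (the Customer class, its is_overdue method and the four sample customers); it is not how any particular object database works, only an illustration of the two styles. In the object version the question has to be programmed as a walk over every object, calling its methods; in the relational version the same question is a one-line declarative query against a table.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    """An object bundling data (properties) with code (behaviour)."""
    name: str
    city: str
    age: int
    owed: float

    def is_overdue(self, limit):
        # Behaviour lives inside the object, alongside its data.
        return self.owed > limit

customers = [
    Customer("Ann", "Leeds", 27, 150.0),
    Customer("Bob", "Leeds", 45, 300.0),
    Customer("Cat", "York", 22, 500.0),
    Customer("Dan", "Leeds", 29, 80.0),
]

# Object style: visit every object and ask it questions one at a time.
object_answer = sorted(
    c.name for c in customers
    if c.city == "Leeds" and c.age < 30 and c.is_overdue(100)
)

# Relational style: load the same facts into a table and state the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT, age INTEGER, owed REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(c.name, c.city, c.age, c.owed) for c in customers],
)
relational_answer = [
    row[0] for row in conn.execute(
        "SELECT name FROM customer "
        "WHERE city = 'Leeds' AND age < 30 AND owed > 100 ORDER BY name"
    )
]
print(object_answer, relational_answer)
```

Both routes give the same answer on four customers; the difference only starts to matter when the question changes every day and the customers number in millions.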

The third leg of this tripod, the WWW, is a very clever solution for publishing documents via a worldwide network. Its decentralised, tree-like structure makes it very democratic and quite good at sharing out the demands on limited network bandwidth, but also makes it very difficult to index (check out Alta Vista's horsepower.) You can use tools like Microsoft's dbWeb to publish live information from a database onto Web pages, but for the moment the Web is mostly a read-only medium. Which is just as well because the Web positively encourages data duplication - the only way it keeps running at all is because popular sites get mirrored to other places on the Web.

What we need is clearly a synthesis of these three technologies, something along the lines of objects that contain not data, but URLs that point to tables in a remote RDBMS. You could allow replication of such objects (so the gas company could borrow 'me' whenever they need to calculate my bill) the way Lotus Notes does, guaranteeing that all updates percolate back to the original. This scheme, which I'm proposing to call OOHTMQL (Object Oriented Hyper-Text Markup and Query Language) might overtax today's Internet fabric somewhat, so I may have to wait for an ATM backbone before I get to rule the world. It may also be the case that the Object Management Group is already doing something equivalent, but I spent the weekend trying to read the CORBA 2.0 specification and frankly I'm none the wiser - they may be designing a new kind of espresso machine and a Mars Probe as well.
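
For what it's worth, the idea can be caricatured in a few lines of Python. This is a toy under loud assumptions: the UrlObject class, its method names and the example.org URLs are all made up, a plain dictionary stands in for the remote RDBMS, and nothing here resembles how Lotus Notes actually replicates (which has to cope with networks, conflicts and offline copies). The point is only that if an object holds links rather than values, a replica can be handed around freely and every update still lands on the one original.

```python
# Stand-in for tables in a remote RDBMS, addressed by (invented) URLs.
REMOTE_TABLES = {
    "http://db.example.org/customer/1/name": "A. Customer",
    "http://db.example.org/customer/1/address": "1 Old Street, London",
    "http://db.example.org/customer/1/owed": 42.5,
}

class UrlObject:
    """A hypothetical object that contains URLs, not data."""

    def __init__(self, links, store):
        self._links = dict(links)  # field name -> URL
        self._store = store        # whatever resolves a URL to a value

    def get(self, field):
        # Follow the link each time; no local copy of the value is kept.
        return self._store[self._links[field]]

    def set(self, field, value):
        # Updates go straight to the original behind the URL.
        self._store[self._links[field]] = value

    def replicate(self):
        # Copy the links only; the data itself is never duplicated.
        return UrlObject(self._links, self._store)

me = UrlObject(
    {
        "name": "http://db.example.org/customer/1/name",
        "address": "http://db.example.org/customer/1/address",
        "owed": "http://db.example.org/customer/1/owed",
    },
    REMOTE_TABLES,
)

# The gas company borrows a replica of 'me' and records a payment.
borrowed = me.replicate()
borrowed.set("owed", 0.0)
print(me.get("owed"))
```

Every call to get() is a round trip to the remote store, which is exactly where today's Internet fabric would start to creak.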


