Tuesday, 3 July 2012

AFTER MOORE, AMDAHL

Dick Pountain/15 July 2009 09:18/Idealog 180

There's a lot of talk at the moment about seeing an end to "Moore's Law", and since I have ritually humiliated myself several times before by predicting such an end, this stirs up some memories. In Byte back in 1997 I suggested that VLSI feature sizes (that is, the size of chip transistors) might plateau at around 0.1 micron, for various good reasons to do with lithography and wavelengths. That proved roughly right but missed the fact that ways were found to reduce wafer defects, so as wafer and die sizes crept up they could still stuff more widgets onto each chip. Then in this magazine back in 2005 I pointed out that Moore's Law is actually a "law" of economics rather than physics, representing Intel's will to build new fabs and its financial muscle in funding them.

Dramatic changes have occurred in the world since then, like the "Green Computing" imperative to reduce power consumption and the unprecedented meltdown of the US financial system, both of which have dented that will. The result has been a shift of emphasis toward slower, multi-core CPUs rather than faster single cores (dynamic power rises in proportion to clock speed and to the square of supply voltage, and faster clocks generally need higher voltage). Several of our Real World columnists have recently mentioned that chips from a year or two back on eBay have faster clocks than ones you can buy today, while David Fearon in a thoughtful column last month pointed up the inherent problems of fully exploiting multi-core processors, like those twin gremlins of Deadlock and Race Condition.

This really brought the memories flooding back, because for a decade or so I was Byte's specialist editor for anything to do with parallel processing, and in that capacity I wrote up all the major manufacturers of parallel supercomputers. Names that now feel almost as ancient and forgotten as Sopwith or Lagonda - firms like Alliant, NCube, Parsytec, Stardent, Sequent, Kendall Square Research, Thinking Machines, Tera and many more. All the clever people who worked in those firms knew then that you couldn't increase clock speeds for ever and that multiple processors were the ultimate way forward. But all those firms passed into oblivion because it never quite happened, because programming for real concurrency is just too difficult.

Programming parallel computers (which a four-core Intel CPU is, on a small scale) is inherently difficult for the reasons David outlined last month and more. When you have a single thread of execution, computing is easy: start, do some work, then stop. Even in a supposedly multi-threading operating system like Windows running on a single core, the parallelism is fudged, because the CPU is actually just time-slicing the threads so that at bottom level all proceeds sequentially. But once you really have multiple processors executing separate programs - and the more so if they each have their own separate memory, as was the case with the most radical parallel architectures - then who finishes before whom becomes a matter of critical importance. Get the order wrong and you produce the wrong answer, yet that order is impossible to predict in advance. Synchronisation becomes the major, perhaps the only, problem.
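To make "who finishes before whom" concrete, here is a minimal sketch in Python. It is not from the original column: the names run, pending and schedules are purely illustrative. Two processes, A and B, each add one to a shared counter in two steps (read the value, then write back the value plus one), and the code simply tries every possible interleaving of those four steps rather than relying on real threads, so the result is repeatable.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving of two read-then-write increments.

    `schedule` is a sequence such as ("A", "B", "A", "B") saying which
    process takes its next step at each point in time.
    """
    shared = 0                      # the counter both processes update
    local = {}                      # each process's private copy of what it read
    pending = {"A": ["read", "write"], "B": ["read", "write"]}
    for proc in schedule:
        step = pending[proc].pop(0)
        if step == "read":
            local[proc] = shared
        else:                       # "write": store the stale-or-fresh copy plus one
            shared = local[proc] + 1
    return shared


# Every distinct ordering of A's two steps and B's two steps (6 in total).
schedules = sorted(set(permutations("AABB")))
outcomes = {"".join(s): run(s) for s in schedules}

for name, result in outcomes.items():
    print(name, "->", result)
```

Only the two schedules in which one process completes both steps before the other starts give the intended answer of 2; the four overlapping schedules lose an update and leave the counter at 1. On real hardware you do not get to choose the schedule, which is the whole problem.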

Clever solutions were developed, but none made it to the mainstream of commercial programming. One was Occam, the language Inmos developed for its Transputer (and for which I wrote the tutorial manual). This simple, elegant language stemmed from fundamental research by Oxford computing don Anthony Hoare into communicating sequential processes, which talked to each other over named channels and waited politely for each other to finish speaking. Another clever trick was "barrier synchronisation", which I explain using the metaphor of a peculiar sort of cross-country race, run across fields and hedges: the rule is to run as fast as you can across the grass, but whenever you reach a hedge you must stop and wait for everyone else to catch up. The "hedges" are special processor instructions that force all processes to pause, at points determined in the program code. Occam died along with the Transputer, while barrier synchronisation has not yet been implemented in commercial hardware to my knowledge, only in a few experimental C++ and Java compilers.
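The cross-country race can be sketched in software using Python's standard threading.Barrier. This is only an illustration of the idea, assuming a software barrier between threads rather than the special processor instructions described above; the names runner, hedge, N_RUNNERS and N_FIELDS are made up for the example.

```python
import threading

N_RUNNERS = 4                       # concurrent "runners" (threads)
N_FIELDS = 3                        # stretches of grass between hedges

hedge = threading.Barrier(N_RUNNERS)  # reusable: releases when all runners arrive
log = []                            # (field, runner) entries in order of completion
log_lock = threading.Lock()         # protects the shared log


def runner(name):
    for field in range(N_FIELDS):
        # Run across the grass: independent work, recorded when finished.
        with log_lock:
            log.append((field, name))
        # Reach the hedge: wait here until every other runner has caught up.
        hedge.wait()


threads = [threading.Thread(target=runner, args=(i,)) for i in range(N_RUNNERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

fields_in_order = [field for field, _ in log]
print(fields_in_order)
```

The order of runners within a field varies from run to run, but the field numbers in the log never go backwards: nobody starts field 1 until everyone has finished field 0.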

Once we find some acceptable way to synchronise all our concurrent processes, a more profound problem bites us on the bottom thanks to Amdahl's Law (which is far closer to a law of physics than Moore's). Parallel processing pioneer Gene Amdahl realised that applying more processors to a computation has no effect if that computation is inherently sequential, or put more formally, if the percentage of a program that's inherently sequential is S, the best speed-up you can hope for by running it on P processors is 100/(S+(100-S)/P). A half-sequential program (S = 50) running on four cores will speed up just 1.6 times.
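The formula is easy to play with. Here is a small Python sketch of it, using the same S (percentage that is sequential) and P (number of processors) as above; the function name amdahl_speedup and the argument checks are additions for the example, not part of Amdahl's statement.

```python
def amdahl_speedup(sequential_percent, processors):
    """Best-case speed-up from Amdahl's Law: 100 / (S + (100 - S) / P)."""
    if not 0 <= sequential_percent <= 100:
        raise ValueError("sequential_percent must be between 0 and 100")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_percent = 100 - sequential_percent
    return 100 / (sequential_percent + parallel_percent / processors)


# The half-sequential program from the text, on more and more cores.
for cores in (1, 2, 4, 8, 64, 1024):
    print(cores, "cores:", round(amdahl_speedup(50, cores), 3))
```

With S = 50 the result creeps towards 2 but never reaches it, however many cores are thrown at the job, because the sequential half always takes the same time.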

To realise any speed-up at all, multi-core CPU chips ought to increase communication bandwidth (I/O and inter-core) in strict proportion to the number of cores, but fatter interconnects consume more chip space so the clock-speed race turns into a bandwidth race. Worse still, balancing the traffic across those interconnects is just as crucial as synchronising concurrent computation, and just as difficult - no use having 64 parallel computations jabbering down the same two pipes. Watch out for the buzzphrase "cross-sectional bandwidth" to enter the jargon pool sometime soon...
