Friday, November 23, 2012

SC12: Exascale holy grail?


I recently returned from the SC12 (Supercomputing 2012) conference.

As was the case last year, GPUs were a big topic of discussion, but this time much of the novelty, magic, and infatuation seemed to have worn off, leaving more practical "how to" questions and concerns.  The other big topic of the conference (for many of us) was "exascale", the challenge mounted by the US government (DOE/NNSA) to build an exascale computer by 2020 or so.

There are a few major challenges here, some of which may be downright impossible to meet.  The goal is not just to put together something that can theoretically deliver an exaflops -- that is, 1,000,000,000,000,000,000 flops (floating point operations per second), a thousand petaflops, a million teraflops, etc. -- on some theoretical application.  It is also to be able to power it (the "energy" goal, sic), program it, keep it running (the "resilience" goal), and ensure that applications/hardware/OS all work together (the "co-design" goal).  In other words, it must be (in some sense) practical, to someone.

Even just attaining the energy goal -- to power the machine with 20MW -- seems nearly impossible, in that the machine must be 20-50 times more efficient per flop than what we've got now.  Bill Dally (now of NVIDIA) argued (in an exhibit talk) that we/they could possibly reach that goal by continuing current technology trends, but even then, only under some pretty optimistic assumptions.  He illustrated that the power drain in current computers comes not so much from the arithmetic units as from data movement, especially to and from memory, but also just to register files, etc.  In his own keynote, William Harrod of DOE publicly doubted Dally's reasoning that evolutionary approaches will be sufficient, and it seems clear that even if we can get Dally's predicted performance efficiency in the raw hardware, Dally didn't adequately factor in the overhead of resilience or of the runtime support needed to exploit parallelism (e.g. to enforce runtime dependences).
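
To put that energy goal in rough numbers, here is a back-of-envelope sketch of my own (the ~2 GFLOPS/W baseline is an assumed, roughly Green500-class figure for 2012, not a number from the talks):

    # What "an exaflops in 20 MW" implies per flop, very roughly.
    target_flops   = 1e18    # 1 exaflops, sustained
    power_budget_w = 20e6    # 20 MW energy goal

    print(power_budget_w / target_flops)        # 2e-11 J, i.e. ~20 picojoules per flop
    print(target_flops / power_budget_w / 1e9)  # 50 GFLOPS per watt

    # Assumed baseline: the most efficient 2012-era machines deliver very
    # roughly 1-2.5 GFLOPS per watt, hence the 20-50x gap mentioned above.
    baseline_gflops_per_w = 2.0
    print((target_flops / power_budget_w / 1e9) / baseline_gflops_per_w)  # ~25x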

Harrod's keynote was also interesting in its honesty: he conceded that there was probably no market for this machine other than the US government.  For that reason, he said that the government would have to invest big bucks, but he seemed unsure of how much, or how much support there was, with this Congress and in this economic climate, to invest/grant it.  Still, he suggested that such a machine would have commercial drivers:  for example, if you take 1/1000 of the machine, you could put a petaflops in a closet, which many businesses might find useful.  I would make a few observations here:

  1. Harrod's claim that there will be no market seems eerily reminiscent of the quote long (if dubiously) attributed to T. J. Watson in 1943, "I think there is a world market for maybe five computers."  But perhaps Harrod wasn't claiming there would never be a market, just that market forces would not be enough to drive development of this computer on the government's desired timeline.
  2. There are market forces behind some of the goals, but not all of them.  For example, meeting the energy and co-design goals will help even with smaller (1-petaflops) systems.  But resilience, for example, will largely be wasted on a smaller system.  If an N-machine system (e.g. a system of 1000 1-petaflops machines) is considered to fail when any one of the machines fails, then to reach a given reliability (say, less than probability f of failure within one time unit), each of the N machines must independently have a much higher reliability (less than 1-(1-f)^(1/N), roughly f/N, probability of failure in that same time unit); see the sketch after this list.  Perhaps the government grants should focus primarily on these unique factors, which are unlikely to pay off for anyone but the government.
  3. I was also somewhat disappointed by where and how government funding was intended to be invested (and apparently already was being invested, in the Fast Forward program) -- primarily in large companies and universities -- but I am willing to accept that I misunderstood, or that this could change over time.
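
To make the resilience arithmetic in item 2 concrete, here is a small sketch (my own illustration, not anything presented at the conference), assuming N identical machines that fail independently and a system that fails as soon as any one of them does:

    # Required per-machine failure probability so that a system of N
    # independent machines (which fails if ANY one machine fails) has at
    # most probability f of failing within one time unit.
    def max_machine_failure_prob(f, n):
        # The system survives only if all N machines survive:
        #   (1 - p)**n >= 1 - f   =>   p <= 1 - (1 - f)**(1/n)   (~ f/n for small f)
        return 1.0 - (1.0 - f) ** (1.0 / n)

    f = 0.01   # assumed target: at most 1% chance the whole system fails per time unit
    for n in (1, 10, 1000):
        print(n, max_machine_failure_prob(f, n))
    # n=1    -> 0.01
    # n=10   -> ~0.001
    # n=1000 -> ~0.00001, i.e. each machine must be ~1000x more reliable

In other words, the reliability burden on each machine grows roughly linearly with the number of machines, which is why resilience work mostly pays off at the very largest scales.
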
I will post more on this topic, primarily because some information from BOFs (Birds of a Feather meetings) at the conference suggests that groups currently addressing these exaflops goals don't fully understand the challenges before them, and are therefore, in my opinion, failing to react adequately -- hanging onto failed and outdated approaches in the hope that exaflops platforms will look much like current ones.  Perhaps needless to say, I also believe that the approaches outlined in my Scalable Planning book (e.g. ScalPL) directly address these challenges, even more than I suggested previously.