Friday, November 23, 2012

SC12: Exascale holy grail?


I recently returned from the SC12 (Supercomputing 2012) conference.

As last year, GPUs were a big topic of discussion, but this time, much of the novelty, magic, and infatuation seemed to have worn off, leaving more practical "how to" questions and concerns.  The other big talk of the conference (for many of us) was "exascale", the challenge being mounted by the government (DOE/NNSA) to build an exascale computer by 2020 or so.

There are a few major challenges here, some of which may be downright impossible to meet.  The goal is not just to put together something that can theoretically deliver an exaflops -- that is 1,000,000,000,000,000,000 flops (floating point operations per second), a thousand petaflops, a million teraflops, etc. -- on some theoretical application.  It is also to be able to power it ("energy" goal, sic), program it, keep it running ("resilience" goal), and ensure that applications/hardware/OS all work together ("co-design" goal).  In other words, it must be (in some sense) practical, to someone.

Even just attaining the energy goal -- to power the machine with 20MW -- seems nearly impossible, in that it must be 20-50 times more efficient per flop than what we've got now.  Bill Dally (now of NVIDIA) argued (in an exhibit talk) that we/they could possibly reach that goal by continuing current technology trends, but even then, only under some pretty optimistic assumptions.  He illustrated that the power drain on current computers is not so much in the arithmetic units, but in the data movement, especially to and from memory, but also just to register files, etc.  In his own keynote, William Harrod of DOE publicly doubted Dally's reasoning that evolutionary approaches will be sufficient, and it seems clear that even if we can get Dally's predicted performance efficiency in the raw hardware, he didn't adequately factor in the overhead for resilience or runtime support to exploit parallelism (e.g. to enforce runtime dependences).

Harrod's keynote was also interesting in its honesty, that there was probably no market for this machine other than the US government.  For that reason, he said that the government would have to invest big bucks, but he seemed unsure of how much, or how much support there was in this Congress and in this economic climate to invest/grant it.  Still, he suggested that such a machine would have commercial drivers:  For example, if you take 1/1000 of the machine, you could put a petaflops in a closet, which many businesses might find useful.  I would make a few observations here:

  1. Harrod's claim that there will be no market seems eerily reminiscent of T. J. Watson's (likely apocryphal) 1943 quote, "I think there is a world market for about five computers."  But perhaps Harrod wasn't claiming there would never be a market, just that market forces would not be enough to drive development of this computer on the government's desired timeline.
  2. There are market forces for some of the goals, but not all of them.  For example, meeting the energy and co-design goals will help even with smaller (1 petaflops) systems.  But resilience, for example, will be wasted on a smaller system.  If an N-machine system (e.g. a system of 1000 1-petaflops machines) is considered to fail when any one of the machines fails, then to reach any particular reliability (say less than f probability of failure within 1 time unit), the N machines must individually have a much higher reliability (less than 1-(1-f)^(1/N) probability of failure in that same time unit -- see the sketch after this list).  Perhaps the government grants should focus primarily on these unique factors which are unlikely to pay off for anybody but the government.
  3. I was also somewhat disappointed by where and how government funding was intended to be invested (and apparently already was being, through the Fast Forward program), primarily in large companies and universities, but I am willing to accept that I misunderstood, or that this could change over time.
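
To put rough numbers on that reliability requirement, here's a quick back-of-the-envelope sketch in Python (the specific f and N are made up for illustration; the independence assumption is the same one stated above):

    # Per-machine failure probability needed so that a system of N machines
    # fails with probability at most f per time unit, assuming failures are
    # independent and any single machine failure fails the whole system.
    def max_component_failure_prob(f, n):
        # Solve 1 - (1 - p)**n <= f for p:  p <= 1 - (1 - f)**(1/n)
        return 1.0 - (1.0 - f) ** (1.0 / n)

    f, n = 0.01, 1000   # hypothetical: 1% system failure risk, 1000 machines
    print(max_component_failure_prob(f, n))   # ~1.0e-05, i.e. each machine
    # must be roughly 1000 times more reliable than the system as a whole
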
I will post more on this topic, primarily because some information from BOFs (Birds of a Feather meetings) at the conference suggests that groups currently addressing these exaflops goals aren't fully understanding the challenges before them, and are therefore failing to react adequately, in my opinion, hanging onto failed and outdated approaches in hopes that exaflops will look much like current platforms.  Perhaps needless to say, I also believe that approaches outlined in my Scalable Planning book (e.g. ScalPL) directly address these challenges, even more than I suggested previously.

Friday, October 19, 2012

Actors vs ScalPL tactics: Hidden state vs WYSIWYG

I stumbled into a video via Google+ the other day, an informal discussion with Carl Hewitt (together with Erik Meijer and Clemens Szyperski) explaining "The Actor Model (everything you wanted to know, but were afraid to ask)".  This seemed a good opportunity to refresh my memory from when I looked in the late 80s, to catch up with any newer changes/additions, and to compare Actors with F-Nets.  I also consulted the Wikipedia entry, and a paper which Dr. Hewitt recommended in the comments section.  (I did not consult much of the more extensive written documentation, like Hewitt's and/or Agha's books, so I do not pretend to be fully up on all aspects of this work.)

The basis of the abstract model is that an actor, upon receiving a message, can do three things:  Create more actors, send messages to actors it knows, and designate what it (the actor with this address) is to do with the next message it receives.  And, from what I could ascertain, an actor knows the addresses of other actors it creates, as well as any addresses that it receives in messages, plus (presumably?) its own address.

From my own work, it seems clear what at least some of the intent is:  Abstractly, to make immaterial the order in which the actor performs any/all of those activity components (creations, sends, and new behavior specifications). That's a worthwhile benefit, and leads to (some) separation of specification from implementation:  Any implementation which does those same things in response to the same messages, regardless of the order or timing of those things relative to one another, is effectively equivalent in a system, and conversely, an actor can be considered as a function from the history of all the messages it has received so far to the set of new actors created and new messages sent as a result of the most recent message.  (The "new behavior" component is not immediately observable:  It just further defines that function -- i.e. the new actors and messages it should produce when given even more message history.)  And because the actor's receipt of a message is essentially atomic, and the actor's behavior in response to that message is functional, that entire "receipt+sends+creations" collection can also be considered abstractly as one atomic action.

Those are nice attributes.  But they pale in comparison to what ScalPL (and the F-Nets model upon which it is based) offers along similar lines.  Perhaps that shouldn't be surprising:  The Actor model predates ScalPL, and apparently even ScalPL's ancestors.  In ScalPL, an actor would correspond roughly to a plan, and a plan, in turn, is either a strategy or a tactic.  For the remainder of this entry, I'll primarily limit my discussion to tactics.

A ScalPL tactic can be made to react to any number of specific resources becoming ready (e.g. containing "input"):  There's no advantage to making that exactly one (like the actor model, considering the receipt of a message as being equivalent to a resource becoming available), but no harm in making it one, either.   Like an actor, a tactic can also be specified as a function, but unlike an actor, not (generally) a function of the history of messages it has received; instead, a function of the data ("input messages" in actor-speak) it's observing right now, specifying the results ("output messages") it will produce right now as a response.  In other words, it's pretty much just your normal everyday function, with no hidden state.  In fact, the data a tactic observes (i.e. its inputs) and that it produces (its outputs) can be (all or part) on the very same resources (think "variables" or "files"), so the specification of an input-to-output transformation for a particular resource can just be a specification of the changes (i.e. updates) that need to be performed to the contents of that resource.  That should look very familiar:  It's the way ordinary imperative programs generally deal with memory, observing (reading) and updating (writing) all or part of it in place, instead of sending messages around from place to place.  (The semantics of inter-tactic communication in ScalPL/F-Nets is carefully designed to be easily and efficiently implementable as messages, too, if the situation calls for it, but that is a detail generally unimportant to the planner/programmer.)

Of course, if one wants to make a tactic with behavior that (like an actor) depends upon its history as well as its new/current input(s), it's easy to do:  Just consider/use one or more resources as the salient parts of its history, and have the tactic observe and update those resources as it does others containing inputs and outputs. (No, it would not really be equivalent to restricting actors to also be functional, and just constantly sending their history back to themselves in messages:  Actor messages have different, heavier semantics than ScalPL resource updates.)
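
To make that contrast concrete, here's a minimal sketch in Python (purely illustrative -- the class and function names are mine, not Actor or ScalPL syntax):

    # Actor-style: behavior depends on hidden, accumulated history.
    class CountingActor:
        def __init__(self):
            self._history = []            # hidden state, invisible from outside

        def receive(self, msg):
            self._history.append(msg)
            return "message #%d: %s" % (len(self._history), msg)

    # Tactic-style: an ordinary function of what it observes right now.
    # Any "history" lives in an explicit resource, observed and updated
    # in place just like any other input/output.
    def counting_tactic(count, msg):
        new_count = count + 1             # the history is right there to see
        return new_count, "message #%d: %s" % (new_count, msg)

    count = 0                             # the explicit "history" resource
    count, out1 = counting_tactic(count, "hello")
    count, out2 = counting_tactic(count, "world")   # fully determined by inputs

Same observable behavior, but in the second version there is nothing hidden:  Given the tactic and the current contents of its resources, you know exactly what it will do.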

To bring this home, consider you have a person (or even a robot), and you know that its behavior in response to its environment at any one time is a combination of its nature (how it was built) and nurture (everything that's ever happened to it in the past).  Even if you know how it began (i.e. its nature), and what its current environment is like, unless you also know its nurture (entire history), you have little idea how it will currently respond to its environment.  That's the actor case -- or, for that matter, most imperative programming in general, as I hope to address later.  In the tactic case, its nature does not change and there is no nurture component:  If you know what the tactic was (and therefore still is), and what its environment is, you know how it will behave.  There's no hidden state (e.g. history), it's very WYSIWYG (What You See Is What You Get).

As for creating other tactics (the way an actor creates other actors):  A ScalPL tactic does not create other tactics first-hand, for some of the same WYSIWYG reasons that tactics don't depend on history.  That is, all relationships in ScalPL are intended to be illustrated/visualized/specified in its visual representation, rather than hidden inside of one or more tactics.  So if a tactic is meant to become active at some point, it and its (eventual) relationship to other tactics and resources is specified (graphically) from the beginning.  The effect of creating (activating) a tactic results (again, in a clear diagrammatic fashion) from another tactic altering a resource they have in common, thereby awakening the until-then quiescent tactic.  (If the potential relationships aren't known from the beginning, there are ways to create those relationships, too, within strategies, just as there are ways to hide history within one, but that's for another post.)

I can (and probably will) discuss the Actors video further in this blog, but the main thing I hope you get from this specific post is that ScalPL incorporates a WYSIWYG approach to (a) how a plan will behave now in response to its current environment, (b) where it might get its inputs from, (c) where its results might be felt, and (d) what other plans might be activated as a result.  Unlike Actors, ScalPL does not hide information like the above within the tactic (e.g. in terms of its accumulated state over history, including the addresses that it knows or uses), is not limited to a message model, and is not limited to one input per tactic activation.

Tuesday, October 16, 2012

Video summary of the Scalable Planning book (intro chapter)

A 13-minute description of the book (summarizing the first chapter).

From near the end...

"The Scalable Planning book isn't your typical parallel programming textbook, but it is built with that use in mind.  If you don't know anything about programming, that's OK:  It introduces all terminology, and avoids too much lingo.  But if you are using it to teach or to self-learn parallel programming, you should ask yourself:  Should concurrency concepts be taught the same way they were 30 years ago?  Programming curricula these days rarely start with machine language and assembler:  Should today's parallel programming curricula be saddled with their analogs, message passing, semaphores, and locking?  Here, my answers are no:  There are chapters to cover the lower-level mechanisms if you want to go there, but the focus here is on higher-level concepts, on constructing correct, portable, understandable, efficient programs (plans), leaving the low-level details to others.  As a result, it is organized much like other sequential programming texts, with chapters on topics like structured programming, object-oriented principles, arrays and dynamic resource allocation, and formal methods, but in a concurrent context."

Sunday, September 23, 2012

ScalPL for exaFLOpS

I stumbled into this interesting video, "Jack Dongarra: On the Future of High Performance Computing", from the recent SPEEDUP Workshop at ETH Zurich.  I listen to Jack at almost every chance I get, and if you want to know where the very highest performing machines are going in the next 20 years, he tells you here.  I highly recommend watching the whole talk.

On slide 15 (starting at about 30:35 in), he lays out some of the critical issues he sees for peta and exascale computing.  At risk of copyright infringement, I'm going to list his six bullets here:

  • Synchronization-reducing algorithms
  • Communication-reducing algorithms
  • Mixed Precision methods
  • Autotuning
  • Fault resilient algorithms
  • Reproducibility of results

ScalPL (Scalable Planning Language, explained in this blog and the new book) addresses five of these six points.  (The one it doesn't address: "Mixed Precision methods".  After hearing him speak on that topic last year at SC11, it looks too closely aligned with the algorithm itself and numerical analysis/methods work to benefit much from within the runtime system.)

To be fair, it appears that Dr. Dongarra is often referring to algorithm development to address many of these issues, but tools and runtime can offer significant leverage. For example, virtually all of ScalPL is centered around expressing the algorithm in a platform-independent format so that synchronization and communication can be dynamically optimized and reduced to the minimum required by the algorithm and platform themselves.  That addresses the first two points.  For the autotuning point, (1) "actons" (processes, threads) within ScalPL are functional, and can therefore be independently profiled and modeled to predict their behavior for optimal scheduling, and (2) the scheduling of those actons in a "strategy" (network) can be traced/instrumented efficiently (after static analysis) to help such analyses.

ScalPL really kicks in for the last two points.  It can theoretically help with fault detection (e.g. by comparing results of duplicate executions), but that aspect will likely be more effectively addressed via hardware.   However, when faults are detected, ScalPL provides (through a technique called supplemental re-execution in the book) a means of preserving the work that has been done, with no global checkpoint/restart, and limited data redundancy to emulate safe storage (for resource contents).  And as for reproducibility of results, the book contains an entire chapter examining the implications of determinism and ways to guarantee it, even categorizing types of nondeterminism.

Later in this same talk, at about 35:30, Dongarra talks some about using a DAG (directed acyclic graph) of dependences between tasks to dynamically schedule the larger problems.  This is specifically what ScalPL is made for.  (The book generally just refers to the DAGs as "computations".)  In fact, I mentioned in a recent blog post that even homogeneous machines can benefit from dynamic dataflow scheduling instead of static synchronized "loop at a time" scheduling.  (I was going to call it "fork-join parallelism" in that post, as Jack does in his talk, but web references to that term often confuse it with a form of recursion.)  Dongarra's talk here illustrates exactly what I was talking about.
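
To give a flavor of what that dynamic, dependence-driven scheduling looks like (a toy sketch of the general idea, not ScalPL or Dongarra's actual runtime; the DAG here is made up), launch any task whose predecessors are finished, rather than waiting for a whole "loop" to complete:

    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    # Toy DAG: each task maps to the set of tasks it depends on.
    deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}

    def run(name):
        print("running", name)            # stand-in for real computation
        return name

    done, running = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # Launch every task whose dependences are all satisfied.
            for t, d in deps.items():
                if t not in done and t not in running and d <= done:
                    running[t] = pool.submit(run, t)
            finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
            for fut in finished:
                t = fut.result()
                done.add(t)
                del running[t]

Nothing here is synchronized "loop at a time":  Task c starts as soon as a finishes, even if b is still going.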

(I might mention that Dongarra and I have been on parallel paths on these issues for quite a while.  Back in the late 80s, he and Danny Sorensen were working on a tool called SCHEDULE to facilitate such DAG-related scheduling, while Robbie Babb and I were working on LGDF/Large Grain Data Flow to help automatically generate such DAGs in an architecture-independent fashion.  Both his work in this video, and ScalPL in my book, seem to be natural progressions of each of those.)

I guess the point I'm hoping to get across here is that ScalPL's best days are ahead of it.  It is here to address the issues of the coming decades of computing.

Tuesday, September 11, 2012

Scalable Planning Language (ScalPL) = F-Nets + ...

I recently described F-Nets, as well as some reasons I preferred that model to serve as the basis for concurrent/parallel programming.  But I obviously didn't think it was sufficient on its own, or I would have stopped there.  So, just what is ScalPL, and how is it different from (and superior to) F-Nets for programming?  It starts with F-Nets, and adds...

Modularity
Instead of activating just tactics within a context, in ScalPL an entire F-Net fragment (i.e. a network of tactics and resources) called a strategy can activate within a context.  In other words, a strategy combines other constructs into a single entity/module, and ScalPL defines how the activity (resulting from its activation) interfaces to its context's surroundings (via the role bindings on that context). Among other things, this means that we can create a module (strategy) to behave any way we like with regards to patterns of accepting inputs or producing outputs, or maintaining internal state, or acting (non)atomically or (non)deterministically, rather than being tied to only atomic deterministic modules (i.e. tactics).

First Class Modules
Together, tactics and strategies are called plans.  And since plans (before they are activated) are just a form of static data/information, they can be stored on resources.  Instead of labeling a context with the plan that is to be activated within it, a strategy can be asked to activate the plan that is present on one of its resources.

Clean Termination, Atomic Activation
Since strategies can activate, they should also be able to terminate -- so that they can be reactivated (cleanly) or something else can activate in the same context.  Also, it should be easy to control the activation of a strategy the same way as for a tactic, and in particular, to make it easy to activate a strategy atomically.  ScalPL addresses both of these in simple, logical ways.

Object Orientation (Class Based)
Since a strategy is a first-class entity encapsulating both behavior (i.e. contexts containing plans) and state (i.e. resources), and having a well-defined interface, it is almost an object.  However, to truly be an object, it needs to be able to carry its state with it as it is activated in different contexts, so ScalPL introduces instantiation levels (and instantiation) for resources.  With these, and a few methodologies to deal with them wisely, ScalPL becomes an object-oriented language, with meta-classes.

Arrays
Inherent in the F-Net model is an assumption that each context is statically bound to a fixed set of resources, and while that may be theoretically sufficient, it isn't practical.  At the same time, allowing the role bindings to be determined after activation re-introduces all sorts of complications (with deadlock, etc.) that the F-Nets execution model cleared up.  ScalPL bridges the gap on this by introducing array resources (i.e. sets of resources, each an element of the array, addressable via indices) and a method of specifying which resource elements a context is to bind to as part of the activation itself.

Data Concurrency/Parallelism
Having arrays isn't enough for data concurrency:  For that, one needs a way to scale the amount of concurrency with the number of elements being processed.  ScalPL provides ways to duplicate/clone contexts, and to specify how many plans they should activate collectively (e.g. one among them all, or one each).

Dynamic Resource Allocation
Since there is no specified spatial relationship between different elements of a resource array (since they could even end up on completely different processors), and the initial control and content state of each resource element is known before/until it is accessed, resource arrays can be (and are, by default) infinite in size, and only those elements actually accessed (and modified) are actually tracked.  This means that elements can be allocated only as they are accessed (i.e. bound to).
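
A rough analogy in code (my own sketch, not full ScalPL semantics):  A conceptually infinite array where storage exists only for the elements that have actually been touched:

    class LazyResourceArray:
        """Conceptually infinite; only touched elements are tracked.
        Every element starts with the same declared initial content."""
        def __init__(self, initial):
            self._initial = initial
            self._touched = {}      # index -> content, only for accessed elements

        def __getitem__(self, index):
            return self._touched.get(index, self._initial)

        def __setitem__(self, index, content):
            self._touched[index] = content   # allocated only on first access

    a = LazyResourceArray(initial=0)
    a[12345678] = 42                 # only this element consumes any memory
    print(a[12345678], a[7])         # -> 42 0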

Each of these properties conforms roughly to chapters in the Scalable Planning book.

Of course, many sequential OO programming languages have the above constructs and/or properties, and a typical approach to constructing a concurrent language is to start there and augment it with concurrent constructs.  That doesn't usually work very well.  Here, we've instead started with a simple computational model (F-Nets) that properly handles the concurrency issues, and have then built it into a complete language.

Monday, September 10, 2012

Why F-Nets

Now that I've provided a short and sweet description of the F-Nets computational model, here are some reasons I think the F-Net model serves as an excellent starting point for concurrent programming (and specifically why I used it as the basis of ScalPL).



Architecture-Independent Communication
The memory model in F-Nets (expressed in the semantics of its resources) does not restrict or favor how data gets moved from one place to another, such as message passing or shared memory (or something in between).  There is no assumption that data will be copied (with the commensurate overhead) as a byproduct of communication, as with message passing, nor is there any need to mess with (and wait for) locks, as with shared memory.  Runtime systems can often dynamically trade off copying with blocking to optimize performance.

Variable Granularity
In large part because of the flexible and efficient communication model, there is virtually no penalty for using the same memory model for static data.  As a result, if communicating entities are colocated, from a granularity perspective they essentially merge into one entity.  This allows the granularity of the application to naturally increase (i.e. concurrency to shrink) to match the available concurrency in the platform.

Latency Tolerance
As in data flow models, the processing model together with the memory model does not dictate when data must be somewhere, or even specifically where it should be.  It only states (or strongly suggests) what data and code must be together in one place before computation can occur.  Data movement can all take place between tactic activations, rather than during them, so latency can be made to overlap with other computation within the runtime/scheduling system, and computation in progress never needs to stall waiting for data to arrive, consuming resources and perhaps interfering with other progress.

Language Independence
The processing and communication models ask nothing of the underlying computer language except to do what virtually all already do -- i.e. express a deterministic mapping (e.g. subroutine) from the initial values within some set of data structures (generally passed as arguments) to the final values within some (perhaps overlapping, perhaps identical) set of data structures (also generally arguments).  Well, there is one other minor requirement:  That the subroutine also assign a transition (color, control state, or name) to each of those data structures/arguments, which is easily done by adding a simple statement for that purpose to almost any imperative language.  In fact, this makes F-Nets/ScalPL a fine mechanism for combining different languages in the same application.

Formal Semantics
Because of the simplicity of the memory model, and the fact that tactics are deterministic and each operate on a fixed set of resources, the model is easily described in terms of functions and sets, both axiomatically and operationally, without delving into the internal semantics of the language used to implement tactics.  (Denotational semantics of the language would be used to define the precise function associated with each tactic.)

Independence of Specification & Implementation
And because any implementation of a tactic which expresses a desired function is equivalent in terms of the semantics, the precise way it does it, and specifically the order or timing in which any implementation accesses (or declares a new control state for) different resources, is immaterial:  Implementors can code to a high-level specification, with significant latitude.  That's very different from other approaches to concurrency, where the ordering of operations can mean the difference between (say) deadlock and none.

Structured Concurrency
Sequential computer languages are generally called structured if the physical structure of the program (e.g. its indentation) immediately suggests (or restricts the possibilities for) its execution.  More specifically, the program fairly obviously represents a folded up form of its possible executions.  F-Nets are the extension of these same principles to the concurrent world.

Tamed Nondeterminism
Some concurrent languages (e.g. functional) don't permit the expression of nondeterminism.  Others do, but in such an undisciplined way that it can easily "creep" into programs unintentionally, and regardless of how it gets there, it's hard to tell how it will affect the overall execution.  There are clear ways to recognize possible nondeterminism within F-Nets, not only to ensure that the programmer has an opportunity to exclude it if desired, but also to optimize tracing to minimize overhead for so-called "instant replay" debugging.  And even when there is nondeterminism, the semantics still allow for the analysis of the execution as a collection of functional evaluations.


I would challenge the reader to find another basis for concurrency which has all of these advantages.

Sunday, September 09, 2012

F-Nets, the essence of ScalPL

I've been wanting to explain here why I get excited about ScalPL (Scalable Planning Language), to compare it with other approaches, etc.  But to do that, I first need to assume that the reader knows what ScalPL is, even those who haven't read the book.  Fortunately, ScalPL can be separated into its foundational (fairly simple) computational model, called F-Nets (short for function networks), and then the other stuff built on top of that (object-oriented model, arrays, data concurrency, etc.), and I really only need to explain F-Nets to convey many of ScalPL's merits.  I'd like to just convey the raw "what" behind F-Nets in one post (here), leaving the "why"s to become clearer in later posts.  Although I presented most of the facets of F-Nets a month ago from a different angle, I'll subsequently use this post as the definitive "quick and complete description" of F-Nets for future reference.  So without further ado, in a dozen paragraphs...


An F-Net consists of two main kinds of entities: Resources and tactics.  A resource, represented graphically by a rectangle (sometimes with other decorations), is a container -- it is capable of holding something static, which is (unsurprisingly) called its content, or more technically, its content state.  For computers, you can think of a resource roughly like a variable, and its content as its current value.  So, each has an initial content state (either declared or default), and a content domain (like a data type) which describes the kinds of content it can hold.

A tactic is a description of what to do to some resources, so it can be considered as a program (subprogram, function, method, etc.).  And like a program, which can be run (or executed) to become a process (an active entity "doing its thing"), a tactic can be activated (executed) to become an acton (also an active entity "doing its thing").

A construct called a context, represented graphically by a circle, signifies an association between a tactic and a set of resources, by labeling the circle with the name of the tactic, and drawing lines from the circle to the rectangles representing those resources.  Multiple contexts in the F-Net can be labeled with the same tactic name.  The connecting lines, called role bindings, are uniquely labeled (explicitly or implicitly, by labeling the resource instead), and these labels/names are used by any acton in that context (i.e. resulting from activating the associated tactic) to designate the resources.  The behavior of an acton -- i.e. the alterations it makes to the content of its resources via these role bindings -- is completely determined by the tactic from which the acton springs (i.e. the tactic's description of what to do) and the original contents of those resources when the tactic activates into an acton. In other words, a tactic can be regarded as a function which translates/maps the contents of its resources (as identified by those role binding names) when it activates, to the final contents of those same resources when the activated tactic (i.e. acton) finishes.  And since a tactic can be regarded as a function, then a context can be regarded as specifying an application of that function relative to a specific set of resources. And an acton can be regarded as an evaluation of that application of that function.

Technically, it's possible to get by without specifying much more, just letting the tactics activate atomically (e.g. one at a time) randomly or continuously.  In fact, what's been described here so far is already very similar to a computational model developed years ago by K. M. Chandy and J. Misra called UNITY (or similarly Dijkstra's Guarded Commands) where the guards would be built into the tactics.  However, restricting/specifying the circumstances under which each tactic can activate is generally really useful for both understanding and efficiently implementing the F-Net, so there's another piece of the puzzle:  In addition to content state, each resource also has another changeable attribute, called its control state, represented as a color, initially green for each resource.  And, each of the role bindings (lines) connecting a context to a resource also has an associated set of (static) colors, represented as colored dots (called access dots) shown between the end of the role binding and the resource rectangle.  Now the activation rule is:  A tactic can activate in a context if and only if all of the context's resources have a control state matching one of the access dots on the role binding connecting that resource with the context.  And, when a tactic does activate, in addition to accessing (e.g. observing, updating, or replacing) the content of its resources, the acton also assigns a new control state (color) to each resource, thereby (perhaps) allowing other tactics (e.g. associated with other contexts) to activate.
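
Here's a minimal sketch of that activation rule in Python (my own toy rendering, not an official F-Nets implementation; the tactics, resources, and colors are made up for illustration):

    import random

    # Each resource holds mutable content plus a control state (a color).
    resources = {"x": {"content": 0, "control": "green"},
                 "y": {"content": 0, "control": "green"}}

    # Tactics map current resource contents to new contents plus a new
    # control state for every bound resource.
    def produce(x, y):
        return {"x": (x + 1, "red"), "y": (y, "green")}

    def consume(x, y):
        return {"x": (x, "green"), "y": (y + x, "green")}

    # A context associates a tactic with resources; the access dots list
    # the control states under which activation is permitted.
    contexts = [
        {"tactic": produce, "dots": {"x": {"green"}, "y": {"green"}}},
        {"tactic": consume, "dots": {"x": {"red"},   "y": {"green"}}},
    ]

    def enabled(ctx):
        return all(resources[r]["control"] in dots
                   for r, dots in ctx["dots"].items())

    for _ in range(6):
        ready = [c for c in contexts if enabled(c)]       # the activation rule
        if not ready:
            break
        ctx = random.choice(ready)          # any enabled context may activate
        out = ctx["tactic"](resources["x"]["content"], resources["y"]["content"])
        for r, (content, color) in out.items():           # apply the acton's
            resources[r] = {"content": content, "control": color}  # results

Here, consume can never activate until produce has colored x red, and vice versa -- the colors do all of the sequencing.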

That covers when a tactic can activate into an acton, but must it ever do so?  The liveness rule answers that question:  If a tactic can activate, then it must, within a finite amount of time -- or, another tactic using at least one of the same resources (and meeting the normal activation criteria) must.  In other words, if something can happen, it will/must, as long as something else doesn't interfere.

That is the essence of F-Nets, but there are a few other notations and minor rules which make them more portable, easily understood, and efficient.
  • First, we put arrowheads (called permissions) on all role bindings to designate whether an acton in that context might observe/read the contents of that resource (arrowhead on the circle/context end) and/or will always replace/rewrite the contents of that resource (arrowhead on the resource end).  A role binding with no arrowheads means that the acton won't touch the content state (though it will still assign it a new control state), and arrowheads on both ends signify the other options -- i.e. that the acton might only change parts of the content state, or might only sometimes replace the entire content state.  The four combinations of arrowheads are called observe, replace, nodata, and update permission, respectively (i.e. context end, resource end, neither end, both ends).
  • Then, just as arrowheads help to represent what the actons might do to the resources' content states, we use colored dots inside of the resource rectangles to represent what the actons might do to their control states.  That is, near where each role binding connects to the rectangle (with access dots), we put one or more colored dots (called, unsurprisingly, new control state dots) inside the rectangle to represent the new control states that the corresponding acton might assign to that resource.  It is also common to add a legend inside the left end of the resource rectangle showing each color that the resource might ever assume (i.e. its control domain) and what that color symbolizes.
So now, just by looking at the F-Net, you can tell, to a large extent, how each acton might affect, or be affected by, the content and control states of the resources to which it is bound (attached) by role bindings.  The control states make an F-Net act very much like a Petri Net, in that regard.

But there is one more loose end.  If tactics can be considered as programs or functions (specifically what theorists call partial recursive functions), complexity theory tells us that we may not be able to tell, with any amount of analysis, whether an acton (once it has been activated from a tactic) will ever finish accessing the content of any particular resource, or assign it a new control state.  (This is true even if the acton has already been cranking away for an arbitrary amount of time.)  This messes up the idea that we can activate just one tactic at a time:  If one gets stuck, we'd never get around to activating any other tactics, but we might also never be able to know if we can just give up on it, because it might eventually do more if we just give it more time.

The essence of getting around this issue is by simply considering a resource which an acton never finishes with as having a special control state (called bottom or ⊥) represented by no color at all (often shown as black).  Because it can be so hard (or impossible) to figure out whether any particular acton will eventually finish accessing any particular one of its resources, it is also generally assumed that any resource could end up with this bottom control state, so it's as though there is an implicit bottom (black) dot added to every set of new control state dots.  However, technically speaking, whether an acton will or will not finish up with any particular resource is still completely determined by its tactic and the initial content of its resources, so even with this so-called augmented control domain, we can still at least abstractly consider each tactic as a (total) functional mapping.  And we can freely allow one tactic to activate even before others have finished, as long as we don't let a subsequent tactic activation use a resource that a previous one (i.e. acton) hasn't finished with yet.

And that brings us to one final rule, called the predictability rule, which greatly increases the efficiency and portability of F-Nets in common cases.  It says simply:  If all of the new control state dots for a role binding have the same color, and that role binding has either observe or nodata permission (i.e. there is no arrowhead on the resource end of the role binding), then we call the role binding predictable, which we take to mean that there is not an implied new control state dot of bottom.  That is, predictable role bindings always finish up.  And because they always finish up, and they don't alter (update or replace) their content state, and they have only one choice for new control state, we know the final content and control state of the associated resource the moment the tactic is activated, even before it has finished up.
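
In code form, the predictability rule is just a predicate on a role binding (again my own rendering, with permissions and colors as plain strings):

    def is_predictable(permission, new_control_dots):
        # Predictable iff the acton never writes the content (no arrowhead
        # on the resource end) and only one new control state is possible.
        return (permission in ("observe", "nodata")
                and len(set(new_control_dots)) == 1)

    assert is_predictable("observe", ["green"])
    assert not is_predictable("update", ["green"])          # might alter content
    assert not is_predictable("nodata", ["green", "red"])   # two possible colors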


OK, that's F-Nets in a nutshell.  There are other things I could get into, like just how to express tactics in traditional programming language, or how to represent F-Nets textually.  There are also other constructs and abstractions: For example, role bindings really represent (surprise!) bindings for things called roles, and there are ways to make role bindings branch, etc.  But relatively speaking, those extras are all just window dressing.  F-Nets serve as only the most basic foundation of ScalPL, which altogether has much more functionality, power, and expressiveness than F-Nets, but F-Nets alone are plenty enough to explain many of the advantages of ScalPL over many other approaches.  And now onto the beauty of this approach...

Friday, September 07, 2012

So why parallel (instead of concurrent)?

My last blog entry focused on the ongoing confusion between parallelism and concurrency.  At the very least, I hope I convinced you that parallelism is a subset of concurrency -- i.e. that all parallel programming is also concurrent programming.  So, what, specifically, are parallel programmers trying to achieve beyond concurrency, and why?

A little history is in order.  Parallel programming emerged in an age where the number of practical architectures/platforms for supporting concurrency was limited, as was the technology for programming them.  By and large, these platforms consisted of a handful to hundreds of identical (homogeneous) sequential processors tied together with a uniform communication network, to support either a shared memory or message passing paradigm.   Typically, the entire platform (or some well-defined portion of it, in an approach called space-sharing) was devoted to running one program at a time.

To make the most of these exclusively-allocated processors while reining in complexity, approaches like SPMD (Single Program, Multiple Data) or SIMD (Single Instruction, Multiple Data) were used to program these machines, where the same instructions (or at least identical programs) ran on several processors at the same time, but with each processor operating on a different part of a larger data set (or parameter space).  Programs were often written using an approach called loop-level (or data) parallelism, where all processors were started on some part of a task (e.g. to process some data structure) at about the same time, generally to deliver their contributions to some specific processor at one time when finished.  This was often at least partially facilitated by assigning different parts of large data structures (typically arrays) to different processors, signifying where operations to those structure elements would be performed. Communication was optimized for these so-called "collective" cases so that information could be efficiently distributed to and/or collected from all the processors at once.  A program consisted of one or more of these collective distribute-compute-collect steps.

Software tools and languages facilitated this approach, using either SIMD languages (like Thinking Machines' C*) on specialized architectures, or more commonly (to whatever extent feasible), executing different iterations of a normally-sequential loop on different processors concurrently instead of in their default/native sequential order, using new languages (or extensions to traditional sequential ones) like HPF and OpenMP.  Tools such as The Force, BSP (Bulk Synchronous Parallel), and collective operations in MPI also supported this collective-execute-and-collect paradigm.  All-in-all, this assumption that a program had an entire, well-defined, homogeneous machine all to itself, and that the programmer was largely responsible for exploiting that machine (in part by optimizing the scheduling of operations on its different processors) to speed up the result, came to be considered as "parallel programming".

Fast forward to the present (and, if you like, future).  Processors are ubiquitous.  Communication is ubiquitous.  PCs, tablets, and even phones have multicore chips (i.e. multiple processors). GPUs (Graphics Processor Units) that were originally intended for, well, graphics are being retargeted for more general computation in cases where there is a good application/architecture match.  Our music, pictures, documents, etc. are often stored and processed in "the cloud", along with web searches and apps, on perhaps thousands of individual (generally PC-sized) machines handling perhaps millions of requests from different users. The number of processors in these larger platforms (e.g. in the cloud) together with the rate of technology advancement make the economics of homogeneity restrictive:  Rather than replacing the entire platform at one time when parts go bad or are outdated, it's often more economically feasible to replace them piecemeal, with the technology of the day, but this leads to a hodgepodge of different technologies and characteristics.  Virtually all the assumptions of traditional parallel computing are out the window, and with them, the effectiveness of traditional parallel programming approaches.

But it's worse than that.  Even if platforms were still largely uniform collections of homogeneous processors in a uniform topology and roughly uniform latency (memory access time) -- and some still are -- those traditional parallel programming approaches would still be suboptimal, as they always were.  First, the goal of high-level programming (and algorithms) from their very inception has been to divorce the correctness/viability of the program/algorithm from the architecture used to implement it, and these parallel programming approaches fail that, often being dependent both on the number of processors and the latency and semantic characteristics of their interconnection (e.g. message passing, NUMA or UMA shared memory).  Second, limiting concurrent expression to fork-join or one-loop-at-a-time execution often forgoes the opportunity to fully exploit the available processors, such as when multiple loops or a loop and other less parallel sections could execute concurrently:  These approaches were largely chosen as a quick-and-dirty way of reining in complexity, and do not fit all computational needs, being relatively unsuitable for a large class of programs (e.g. employing software pipelining or other kinds of functional decomposition).  Concurrency can scale with data without expressing it as "non-looping loops".  And third, the time for any program to achieve a solution includes the time it must wait around for a platform to become available for the program to even start running, which goes way up if an entire machine must be dedicated to one application at a time (or even space-shared between them).

So then, what, really, is the motivation for using these limited parallel programming approaches over concurrent ones?  I would claim: Primarily laziness.  Laziness in developing truly flexible expressive ways of representing concurrency algorithmically, independent of platform, that encompass loop-level parallelism as well as other kinds of concurrency, and in developing the necessary systems software (or middleware) and methodologies to have the resulting programs run (and share platforms) efficiently.

A programmer should be revealing the available concurrency in their program, their algorithm, and the platform (hardware and system software and middleware) should exploit that, in the context of the current conditions, including the availability of processors and other load on those processors.  If a human needs to inject some guidance, some high level knowledge, to facilitate that algorithm-to-platform mapping, then fine, but let it be in a separate step, rather than within the expression of the algorithm itself.  We've been fully accustomed to having that algorithm/platform independence in sequential programming for decades.  It's time to get real in the concurrent world -- and I would argue that approaches like Scalable Planning Language (ScalPL) allow us to do just that.

Thursday, September 06, 2012

Parallelism "vs" concurrency

My expertise/experience is in parallel programming and parallel tools, and my company name, "elepar", is a shortening of the coined term "elemental parallelism".  Yet my recent book barely mentions parallelism, favoring instead the term "concurrency".  (The working title was in fact "Elemental Concurrency".)  Why the terminology switch from "parallelism" to "concurrency"?

Over the years, there have been many attempts to contrast parallelism and concurrency, as they relate to computation; a quick google search turns up several fairly recent examples.  The upshot in these is that concurrency refers to cases where different activities are vying for shared resources, or when outside forces prevent control over whether they will execute simultaneously, and parallelism refers to the other cases, where (say) you want activities to act simultaneously, and to not compete for resources, usually because you want to get as much done as possible in minimum time.

Is there justification for this stance?  Oxford American Dictionary says parallel (computing) means "involving the simultaneous performance of operations", and concurrent means "existing, happening, or done at the same time".  With such similar definitions, it's no wonder that these terms get confused, but while I can understand attempting to clarify their differences, they are not mutually exclusive.  In fact, if operations are parallel (simultaneous), they are also necessarily concurrent (happening at the same time).

It's the other cases that apparently confuse, where operations are concurrent but not parallel.  Simply put, if many activities overlap in time, at some time scale, they are concurrent, but if they cannot act at precisely the same time because (say) some or all of them are competing for resources, then they can't all work simultaneously (as they take turns accessing the resource), and are thereby not (technically) parallel.  The most clear and common case of this is when there is but one processor or person (considered a resource, if you like) available to carry out a set of concurrent activities.  That processor or person must then advance one activity at a time, or perhaps time-slice or multitask to move them all forward in stages.  Some people consider Petri Nets to embody this concept of concurrency, because one person is often employed to play the token game, even if multiple transitions could technically fire simultaneously.  Also, the transitions in Petri Nets fire atomically, which can be interpreted to mean that each one takes no time, thereby precluding multiple from happening at any one time (to some level of precision).

In addition to the computing sense, the term "parallel" also has a geometrical definition, "side by side and having the same distance continuously between them".  If one draws a space-time diagram of a set of activities, and they form parallel (adjacent) line segments in the geometrical sense, then those activities are parallel in the computing sense.  It implies that at some points in time (at least), the events are all active at the same time, though at different places (points in space).  This highlights an advantage of parallelism over just concurrency:  To keep the individual events each going, to get as much done in as short a time as possible.  And it illustrates how to accomplish that:  By having the different activities at different places, and interacting as little (i.e. being as independent) as possible.

But now to get serious.  Since concurrency is a precondition to parallelism, parallel programming is necessarily concurrent programming.  And in common usage, although atomic actions (e.g. transactions) can be considered abstractly to take no time, in reality they do take time and can thereby be parallel.  Plus, regardless of how one implements parallelism, virtually all parallel activities share some resources in a mutually-exclusive fashion, so using that as a basis for differentiating concurrency and parallelism is not as definitive as it might at first seem. (Parallel activities which don't share resources are rare enough that they merit their own terminology, like "embarrassingly parallel".)  In fact, it's worse than that: Occurrences at different places (e.g. on different processors) are (relativistically speaking) operating relative to different clocks, so simultaneity is not clearly and objectively defined.

But even assuming that one comes up with objective definitions of parallelism and simultaneity, etc., to delineate some concurrent programs as parallel ones, what features should a parallel programming methodology have over a concurrent programming methodology?  And moreover, what would the advantages and disadvantages of building parallel programs be, as opposed to concurrent ones?  I'll address that in my next blog entry.


Friday, August 10, 2012

Is it a function?

Math is traditionally written as contiguous, mostly sequential arrangements of symbols. A function application consists of a name or symbol identifying the function, and some number of arguments, all represented together as a contiguous unit. The result of such an application is a single value (also represented as a contiguous unit), thereby facilitating an evaluation process where applications are replaced by their results, enabling other applications, repeatedly -- embodied by the principle of referential transparency, and formalized in the lambda calculus.  A traditional function like this can indeed be tricked into returning multiple results by having it return a single composite value, such as a list or tuple (cartesian product, "record" or "struct"), which can then be dissected into its component pieces after the fact to use as arguments to further applications, but that is (at best) clumsy to accomplish using only the traditional replace-application-with-result approach. (More generally, one represents the pieces symbolically, then uses those in separate expressions.)

[Figure: Integer division, here called div_rem, takes two arguments and produces two results, a quotient and remainder, yet is certainly a function.]
In mathematics, these multi-argument single-result functions are the norm, largely to facilitate this form of textual, in-place expression and reduction, but that is certainly different from claiming that such a form is natural in any sense, or optimal in all contexts. Mathematically speaking, there is nothing magic about producing a single result: Even the very common integer division function produces two results (a quotient and a remainder).  Single-result functions are also not especially common in the world around us, where (say)  computer programs or organizational processes often produce many different outputs, physically and/or conceptually separate, each likely destined for different downstream functions/processes (e.g. printed reports, file updates, other results, directions to different subgroups).
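
Python happens to have this very function built in (as divmod), which makes both the point and the clumsiness easy to see:

    q, r = divmod(17, 5)    # two results from one application: quotient, remainder
    print(q, r)             # -> 3 2

    # To compose with further applications, the composite (tuple) result must
    # first be dissected symbolically; there's no way to leave divmod(17, 5)
    # in place inside a larger expression and use both pieces directly.
    print(divmod(17, 5)[0] * 5 + divmod(17, 5)[1])   # -> 17, but clumsy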

(Even common nomenclature is confusing and/or inadequate for this discussion.  "Multi-result function" sometimes refers to functions which may produce any one of a number of different results for the same inputs/arguments -- in other words, not functions at all!  Sometimes the term "output" is used instead of "result" for the situation here, and "result" is used to refer to the actual instance (e.g. data value) being produced.  Here, I use the term "result" as analogous to "argument" -- i.e. formally referring to a completely independent part of the output from (vs input to) a function application, without referring to the data fulfilling it, but there's no standard common name for this, as far as I can tell.)

What would composition between applications of multi-result functions look like? Unlike the single result case, it is not generally possible in a 1D or 2D textual representation to have each application adjacent to both any other applications providing its arguments (from their results) and those accepting its results (as their arguments). So, let's start by throwing off the strictures of sequential text, organizing the applications more loosely, such as in the 2-dimensional plane (e.g. a sheet of paper), representing each application by, say, a circle labeled with the name of the function, not necessarily physically adjacent to their upstream or downstream applications.

Then I'll represent each argument associated with each of those circles with, say, an arc (i.e. line segment) from the circle, as in the div_rem diagram above. Since we have lost the identification of these arguments that comes with their textual ordering, I now need to uniquely label each of those arcs with a number or name, so the function knows which is which. And, there's no reason not to represent results in exactly the same way, also with uniquely labeled arcs. I'll differentiate arguments from results with arrowheads on those arcs -- toward the circle for arguments, away from the circle for results.

So far, this looks somewhat like the beginning of a traditional visual dataflow model/graph. In those, result and argument arcs of different applications would be associated pairwise ("hooked up" or united in some way), so that data is considered to "flow" from the result of one application to the argument of the other. There might even be some sort of junction/bifurcation, to allow a single result arc to connect to multiple argument arcs.... and perhaps even multiple results to a single argument.

Such approaches beg questions like: Is it now possible/"legal" to produce multiple results on a single arc, to collect there (e.g. queue up) until consumed? Or, as a special case of that, for a single function application to produce multiple results on a single arc, to potentially be consumed by different downstream applications? And if multiples are allowed, does each datum stay individual, or do they merge together into a stream? And, when an arc does split, does the "actual" result datum go only to the first argument branch trying to "consume" it, or to all the argument branches?  But it wasn't our intent to create more questions, or to make the arcs constructs in themselves with their own semantics, just to associate arguments and results.

To simplify and clarify, I'll instead toss a bunch of rectangles into this 2D diagram, which I'll call resources ("variables" if you like), and declare that each is capable of holding one datum. Now, instead of tying argument and result arcs directly to one another as in dataflow, I'll tie them to these resources. This is clearer: If an argument arc is tied to the resource, the entire content on the resource is used (as the "actual" for the "formal" argument) when evaluating the function application, and if a result arc is tied to the resource, the new content of the resource (all of it) becomes that result of the function application. I'll also define the initial content on each of these resources, to ensure everything is well defined to start out.

But which application(s) can/should evaluate at any one time? Functions (or functional models) are often held forth as embodying no concept of control flow, but in order for them to achieve their full power, just as in imperative approaches, they must have a way to selectively suppress evaluation, or play tricks to accomplish the same end (such as to selectively make evaluations idempotent). The most common approach in functional languages is lazy (or "on demand") evaluation to suppress evaluation of function arguments until they are needed (such as through evaluation or lambda operations). Dataflow models may use variables which transition between empty/undefined or full/defined states to control whether upstream or downstream evaluations can occur.

I'll generalize these traditional approaches into a simple unified system: In addition to the mutable (i.e. changeable) content (aka content state) of each resource, I'll also associate a mutable so-called control state with the resource, which is simply an indication (in some form) of which of the function applications bound to the resource are currently allowed to evaluate. A function application can only evaluate if the control state of all of its resources (i.e. those bound to any of its arguments or results) says it's OK to do so.  And when it does evaluate, it not only produces (deterministically, of course) new content state for the resources associated with its result arcs, it also produces (deterministically) a new control state for all of its resources -- those representing both arguments and results -- to indicate what can evaluate next. For simplicity, I'll consider control state as a color, and associate one or more permanent colors with each arc, shown on "access dots" at the end of the arc.  So the evaluation rule becomes: A function application (circle) can evaluate only when the current (last assigned) control state (color) of each of its resources matches one of the access dots on the corresponding argument or result arc. And just like content state, we'll define the initial control state of each resource, specifically as the color green, so that the starting condition is well defined.
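
Here is a minimal sketch of that evaluation rule in Python. The class names and dict-based plumbing are my own scaffolding for illustration; only the rule itself -- colors on resources, access dots on arcs, a match required on every bound resource, and deterministic new content and control states -- comes from the description above:

    GREEN = "green"                      # the defined initial control state

    class Resource:
        def __init__(self, content=None, color=GREEN):
            self.content = content       # content state: one datum
            self.color = color           # control state: a color

    class Application:
        """One circle. Each arc is (formal name, resource, access-dot
        colors, direction), with direction "in" (argument) or "out" (result)."""
        def __init__(self, func, arcs):
            self.func = func             # deterministic: args -> (results, colors)
            self.arcs = arcs

        def enabled(self):
            # Evaluable only when every bound resource's current color
            # matches one of the access dots on the corresponding arc.
            return all(res.color in dots for _, res, dots, _ in self.arcs)

        def evaluate(self):
            assert self.enabled()
            args = {f: res.content for f, res, _, d in self.arcs if d == "in"}
            results, colors = self.func(args)
            for f, res, _, d in self.arcs:
                if d == "out":
                    res.content = results[f]   # new content for each result
                res.color = colors[f]          # new control state for every arc

    # Wiring up div_rem: consume n and d once, produce q and r.
    def div_rem(args):
        quot, rem = divmod(args["n"], args["d"])
        return ({"q": quot, "r": rem},
                {"n": "red", "d": "red", "q": GREEN, "r": GREEN})

    n, d, q, r = Resource(17), Resource(5), Resource(), Resource()
    app = Application(div_rem, [("n", n, {GREEN}, "in"), ("d", d, {GREEN}, "in"),
                                ("q", q, {GREEN}, "out"), ("r", r, {GREEN}, "out")])
    while app.enabled():
        app.evaluate()
    print(q.content, r.content)   # prints: 3 2

After one firing, the "red" control state on n and d suppresses any re-evaluation -- exactly the kind of selective suppression discussed above.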

This does create one ambiguity, however: If the same resource is used as both an argument and a result, the function should know that, so that (at the very least) it won't assign two different control states to the same resource. So, I'll rule out binding both an argument and a result to a single resource, and instead create a new kind of arc (with arrowheads on both ends) that takes an argument from, and produces a result to, the same resource.  Since it's one arc, it gets one new control state as a result of the evaluation.  And since we now have arcs with arrowheads at one end or the other or both, I'll also allow "no data" arcs with no arrowheads at all -- arcs which can restrict the evaluation of a function application (via their access dots) and which produce a new control state for their resource, but which don't observe or affect the content state of the resource at all.
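
In the same hypothetical scheme as the sketch above, the four arc kinds would then differ only in how they touch content; every kind still assigns the resource's next control state:

    def apply_arc(kind, resource, new_content, new_color):
        """Effect of one arc after its application evaluates: "in" (arrowhead
        toward the circle), "out" (away), "inout" (both ends -- one arc, so
        one new control state), "nodata" (no arrowheads at all)."""
        if kind in ("out", "inout"):
            resource.content = new_content   # results replace the whole content
        # "in" and "nodata" arcs leave content untouched; a "nodata" arc
        # never even observes it.
        resource.color = new_color           # but every arc kind sets the color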

What I've just described is the meat of a computation model called F-Nets, short for Function Networks. There are a few other touch-ups, to ensure that the function declarations are abstract/modular enough to be applied in different contexts, and to provide a property called "predictability" for efficiency, but for the most part, it's just as described. It was the subject of my PhD dissertation, and it forms the basis for the Scalable Planning Language described in the book Scalable Planning: Concurrent Action from Elemental Thinking; the above is a condensation of much of what is in Chapters 2 and 3 of that book. F-Nets is similar to many other models, including Petri Nets, Unity, and dataflow.

Are these multi-result functions really "functions"? How can they not be? Their results (both content and control state) are completely determined by their arguments. "But they can update resources, thwarting referential transparency", I hear you say. "And worse, their composition can potentially be nondeterministic, and (therefore) non-functional!"

While those statements are true, they describe features available in this form of composition, and it is odd to criticize one form of expression for being more powerful and flexible than another.  F-Nets can be constrained to be deterministic, and resources can be constrained to be defined only once, so referential transparency can be attained if that's important.  But traditional function composition is simply not up to the task of representing complex combinations of multi-result functions, or nondeterminacy of any kind.  F-Nets are.  They also have an intuitive graphical representation that suits software engineering and component/coordination languages, and merges imperative, dataflow, and functional approaches.  They are fully formal, with both operational and axiomatic semantics.  They are concurrent, portable, and efficient.  They are a valuable tool in a planner's toolbox.  Traditional functional composition is nice where it works:  F-Nets are there for the other cases.

Saturday, May 19, 2012

Diagrammatically speaking...

You step up to the clean whiteboard, marker in hand, ready to explain how your device (or organization) works, or should work. Do you start by writing down words, or drawing a picture? Or a combination of both?

If the choice is between many words and a few visual components, you and your audience will probably prefer the latter.  And yet -- even in this age of tablets, multimedia, gesture and voice recognition, and high-bandwidth connections -- such pictures are a rare medium for communicating our intentions to a computer, regardless of field. Visual programming languages (true ones, not "visual basic", for example) have generally not fared well.

It's true that language processor (compiler) folks have developed most of their techniques and formalisms in the textual realm, but that's hardly an excuse or a reason. I'm talking here about the human-computer interaction: If the visual representation is formal enough, establishing another internal representation as text (or other bit sequences) is not difficult. And graph grammars have been around for decades.

So when does it make sense for a human to use visual components related in 2 or 3 dimensions to convey intent to a computer or other humans? When I took my first programming class in 1972, we were admonished to use flowcharts before writing our programs, to envision the control flow. Back then, once you encoded that into a program, the control flow became severely obscured -- represented by the association between a symbol (usually a line number) in the "goto" statement and that same symbol used elsewhere as a label.
So, we were also told to (somehow) keep our flowchart with the finished program, for later reference: Virtually impossible. Since then, we have found ways to convey about as much information about the control flow in the program itself, using indentation -- so-called structured programming -- and flowcharts are now très passé.  A success of software engineering!

Even so, when detailing data flow (how data moves) within a computer program, we almost always use the same "spaghetti code" symbol-matching tricks, but now based on variable names (instead of statement labels) to establish a relationship between a previous definition and the next use.  In imperative languages, data and control flow are managed separately (with variables for one, control constructs for the other), and programming is the art of getting them to work in tandem to produce a desired result.

And when it comes to parallel processing, we traditionally add yet another set of paths for both control and data flow between processes, using yet another layer of associative matching, in either message passing (tag/type and process ID) or shared memory (address or name of a shared variable or lock). And the results of those communication matches depend entirely upon how the control and data flows of the individual processes/threads happen to coincide, to the point that the matches may (commonly) be non-deterministic.
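
To see that nondeterminism in miniature, consider this hedged Python sketch (the names are mine): two threads send to one mailbox, and which message the receiver matches first depends purely on scheduling:

    import threading, queue

    mailbox = queue.Queue()               # the shared "matching" point

    def worker(pid):
        mailbox.put((pid, "result from process %d" % pid))   # tag = process id

    threads = [threading.Thread(target=worker, args=(i,)) for i in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    first_tag, first_msg = mailbox.get()  # which match "wins" varies run to run
    print(first_tag, first_msg)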

Dataflow languages attempted to simplify this mess by unifying control and data flows, allowing computation (i.e. control) anywhere that the necessary data appeared (became defined) through fixed channels/pipes.  While this suggests the potential use of visual representations (graphs) to encompass both control and data (i.e. arrows representing fixed data pipes between nodes representing potential filters/transforms/computations), it's generally infeasible because of the sheer number of such dependences:  Every computation result is conceivably another opportunity for data/control transfer. So, even in this case, associative matching on variable names was used, to convey (now) both data and control dependences. Hardware to facilitate it dynamically became something of a joke, due in part to the bottleneck caused by this associative matching.

Alright then, how about increasing the granularity of the dependences?  Increase the size of the data objects flowing (technically "data", not "datums"), thereby decreasing the number and complexity of the dependences, perhaps even to the point that visual representations become feasible again, while shrinking the associative bottleneck.  That does indeed solve some of the problem.  But it still casts control flow as the result of data flow, and now there's more data that must flow to trigger that control, resulting in significant overhead for control transfer -- moving data around (often needlessly) just to get the right part of the program to "light up" with activity/control.  Can't we manipulate the control and the data separately, but let them both operate along the same paths when convenient?

That was the jumping-off point of my own work, which grew out of my grad school advisor's (R. Babb's) work on Large Grain Data Flow (LGDF).  Its basis looks something like a cross between a Petri Net and a bunch of finite state automata -- each of which lends itself to visual representation -- or, if you like, a network of Turing Machines passing tapes around.  Through the years, the concepts have evolved, as has the name -- from LGDF2, to F-Nets, to Software Cabling, and now to ScalPL (Scalable Planning Language).  All of them have been based on a similar simple circles-lines-boxes visual representation.  In my new book, I suggest that the principles and uses extend well beyond the computer programming field, to almost any kind of concurrent planning, but that in the parallel/concurrent programming world, it is the logical extension of structured programming.


And, yes, I believe ScalPL is a representation which is simple and intuitive enough to use when you step up to that whiteboard and wish to convey the relationships between several activities working together, especially if you've got a handful of different-colored markers.  But for now, even just using it for parallel and concurrent programming will be a big step forward.  What is the link between diagrams and concurrency?  For another time...


Wednesday, May 16, 2012

Is the book about planning? Or parallel computing? Yes.

A few months ago, I said I'd be blogging about my new book... and now that it is really finally out (see www.elepar.com), it's time to actually do so!

The name of the book is "Scalable Planning: Concurrent Action from Elemental Thinking".  Wait!... Before you close this tab while muttering "I don't care about planning, whatever that means," let me explain:  I don't self-identify as a planner, either.  My "thing" (for almost 40 years) has been software. Or more specifically (for the last 25 or so), parallel software. Or even more specifically, architecture (or platform) independent parallel software.  And more recently, this has meant finding ways to write computer programs so that they can execute efficiently on a number of different platforms (read "hardware"), such as on a traditional sequential computer, or a multicore system (lots of processors on one chip), or in the cloud (or the "grid," as they used to say), or on a cluster ("farm") of interconnected but independent systems (e.g. workstations), or on a graphics processing unit (GPU), or what have you -- including collections or hybrids of the above.  The goal is to write one program (or algorithm) that runs on any suitable platform (or hybrid platform) available at the time.  That goal was conquered decades ago for sequential computers, with the advent of high-level languages and compilers, but never really for parallel systems.  That leaves the term "parallel algorithm" an oxymoron: "algorithm" implies independence from platform, but "parallel" usually doesn't.

So what has that to do with planning?  As it turns out, a lot.  Software is, by definition, a plan, a description of activity that you want to take place.  And many of the problems inherent in platform-independent parallel software -- problems like nondeterminism, and efficiency (including dealing with relatively long communication latencies), and scheduling, and fault tolerance, and building larger plans out of smaller ones (or delegating smaller parts of bigger ones to other entities), and just trying to make sure that everything works together as it should to get the answer -- are present in other planning problems as well, especially those involving multiple people, or offices/branches, or devices, that all need to work together to produce a desired result.  Together, but not too closely together.  (One of our slogans, "Working together independently", sort of sums it up.)  In these non-computer cases, the people, or branches, or devices are the platform.

Besides, if the book had been named something related to "parallel computing", many readers would have expected it to contain computer programs, and to list traditional models of parallelism (e.g. message passing and/or shared memory and PRAM), or newer approaches coming out of research labs, etc.  Although this book has hints of such things (mostly in chapters that are explicitly noted as optional), it's really about how to avoid dealing with such things during the planning (e.g. programming) phase -- or, put another way, how to express a plan in such a way as to make it easier for another entity (called The Director in the book) to address those issues (like communication styles) later, when the platform is known.  And the shape of that platform may even shift as the plan is underway.  (In fact, the plan/program itself can change on the fly, too, if necessary.)

If it's about expressing plans (e.g. programs) in a way to make them more portable among platforms, isn't that just yet another portable parallel programming language?  It is a language, in a sense, a visual one, called ScalPL (Scalable Planning Language).  Calling it a programming language might be confusing, since it doesn't even have the most basic mathematical operations or conditionals -- there are already plenty of languages to express those parts in -- but many would consider ScalPL a coordination language, or a component architecture language, or a process algebra (or calculus).  It can also be considered a pattern language.

So, if you are into platform independent parallel/concurrent software, or even just want to know what the issues are in those fields, I expect you will understand and appreciate this book.  And if you aren't, and are just into more traditional planning?  The title wasn't really intended to mislead you, and based on some very positive feedback from a few people who have seen the book so far, it appears that it may have exciting new approaches for traditional planners, too.  And because it straddles the software and concurrency and planning nomenclatures, there is some cross-fertilization between them, such as structured and object-oriented and refactoring approaches (from computer programming) being applied to the others.

Of course, this all leaves plenty of other questions -- including why I would be qualified to address some of these issues when plenty of others in the computer field have failed.  And to such things, I can only say that many of us have worked on these issues for decades, and continue to work on them.  We are all evaluating one another's work, and this book is my attempt to educate people about my thought processes on the matter, hopefully to be built upon and integrated into other work.  I have tried to make this work accessible to the lay person as well as the expert, so that it might be appreciated, utilized, and, yes, judged, by a wide audience.