Saturday, May 19, 2012

Diagrammatically speaking...

You step up to the clean whiteboard, marker in hand, ready to explain how your device (or organization) works, or should work. Do you start by writing down words, or drawing a picture? Or a combination of both?

If it's a choice between many words and a few visual components, you and your audience will probably prefer the latter.  And yet -- even in this age of tablets, multimedia, gesture and voice recognition, and high-bandwidth connections -- such pictures are a rare medium for communicating our intentions to a computer, regardless of field. Visual programming languages (true ones, not Visual Basic, for example) have generally not fared well.

It's true that language processor (compiler) folks have developed most of their techniques and formalisms in the textual realm, but that's hardly an excuse or reason. I'm talking here about the computer-human interaction: If it's formal enough, establishing another internal representation as text (or other bit sequences) is not difficult. And graph grammars have been around for decades.

So when does it make sense for a human to use visual components related in 2 or 3 dimensions to convey intent to a computer or other humans? When I took my first programming class in 1972, we were admonished to use flowcharts before writing our programs, to envision the control flow. Back then, once you encoded that into a program, the control flow became severely obscured -- represented by the association between a symbol (usually a line number) in the "goto" statement and that same symbol used elsewhere as a label.
So, we were also told to (somehow) keep our flowchart with the finished program, for later reference: Virtually impossible. Since then, we have found ways to convey about as much information about the control flow in the program itself, using indentation -- so-called structured programming -- and flowcharts are now très passé.  A success of software engineering!
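To see how the goto era obscured control flow, here is a hypothetical sketch (Python has no goto, so the old style is simulated with a "program counter" dispatching on made-up line numbers -- exactly the symbol-matching that made flowcharts necessary):

```python
# "Goto style": the control flow is buried in the matching between the
# line-number symbols stored in `pc` and the branches that test them.
def sum_goto(n):
    pc, i, total = 10, 1, 0           # simulated line number, loop var, accumulator
    while pc != 0:
        if pc == 10:                  # 10: IF i > n GOTO 40
            pc = 40 if i > n else 20
        elif pc == 20:                # 20: total = total + i
            total += i
            pc = 30
        elif pc == 30:                # 30: i = i + 1; GOTO 10
            i += 1
            pc = 10
        elif pc == 40:                # 40: END
            pc = 0
    return total

# Structured style: the same control flow is visible in the indentation,
# so the program is its own flowchart.
def sum_structured(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

print(sum_goto(5), sum_structured(5))   # both print 15
```

The two functions compute the same thing; only the structured one lets a reader recover the flowchart by looking at the page.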

Even so, when detailing data flow (how data moves) within a computer program, we almost always use the same "spaghetti code" symbol-matching tricks, but now based on a variable name (instead of statement labels) to establish a relationship between previous definition and next use.  In imperative languages, data and control flow are managed separately (with variables for one,  control constructs for the other), and programming is the art of getting them to work in tandem to produce a desired result.
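The "matching trick" is easy to see once you notice that a variable name carries no meaning of its own -- it is only an associative token linking a definition to its later uses. A minimal sketch (function and variable names are made up):

```python
# The variable name is just a matching token: renaming it consistently
# changes nothing about the data flow from definition to use.
def hypot_a(x, y):
    s = x * x + y * y      # definition of `s`
    return s ** 0.5        # use of `s` -- matched by name, not by position

def hypot_b(x, y):
    q7 = x * x + y * y     # identical dataflow, arbitrary different token
    return q7 ** 0.5

print(hypot_a(3, 4), hypot_b(3, 4))   # 5.0 5.0
```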

And when it comes to parallel processing, we traditionally add yet another set of paths for both control and data flow between processes, using yet another layer of associative matching in either message passing (tag/type and process ID) or shared memory (address or name of shared variable or lock). And the results of those communication matches are entirely dependent upon the confluence in the control and data flows of the individual processes/threads, to the point that the communication matches may (commonly) be non-deterministic.
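A small sketch of that extra associative layer, using Python threads and a shared mailbox (the tags and payloads are made up): each message is matched by its tag, but the order in which the messages arrive depends entirely on thread scheduling.

```python
import queue
import threading

# Two worker "processes" send tagged messages to one mailbox; the receiver
# matches each message by its tag (associative matching), but the arrival
# ORDER of the two messages is up to the scheduler -- nondeterministic.
mailbox = queue.Queue()

def worker(tag, payload):
    mailbox.put((tag, payload))          # a message = (tag, data)

t1 = threading.Thread(target=worker, args=("result", 42))
t2 = threading.Thread(target=worker, args=("status", "ok"))
t1.start(); t2.start(); t1.join(); t2.join()

arrivals = [mailbox.get(), mailbox.get()]  # order varies run to run
by_tag = dict(arrivals)                    # tag matching restores determinism here
print(by_tag["result"], by_tag["status"])  # 42 ok
```

Here the tags happen to make the final result deterministic; with two senders using the *same* tag, which message matched first would not be.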

Dataflow languages attempted to simplify this mess by unifying control and data flows, allowing computation (i.e. control) anywhere that the necessary data appeared (became defined) through fixed channels/pipes.  While this suggests the potential use of visual representations (graphs) to encompass both control and data (i.e. arrows representing fixed data pipes between nodes representing potential filters/transforms/computations), it's generally infeasible because of the sheer number of such dependences:  Every computation result is conceivably another opportunity for data/control transfer. So, even in this case, associative matching on variable names was used, to convey (now) both data and control dependences. Hardware to facilitate it dynamically became something of a joke, due in part to the bottleneck caused by this associative matching.
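The firing rule itself is simple enough to sketch. Below is a minimal, hypothetical dataflow scheduler (all node and channel names invented for illustration): each node names its fixed input and output channels and fires exactly when all of its inputs hold a value.

```python
# Minimal sketch of a dataflow network with FIXED channels: a node fires
# when all of its input channels are defined, writing its output channel.
def run_dataflow(nodes, channels):
    fired = set()
    while len(fired) < len(nodes):
        progressed = False
        for name, (inputs, output, fn) in nodes.items():
            if name not in fired and all(c in channels for c in inputs):
                channels[output] = fn(*(channels[c] for c in inputs))
                fired.add(name)
                progressed = True
        if not progressed:
            raise RuntimeError("deadlock: some inputs never became defined")
    return channels

nodes = {
    "square":  (["x"], "x2", lambda x: x * x),
    "double":  (["x"], "x_twice", lambda x: 2 * x),
    "combine": (["x2", "x_twice"], "out", lambda a, b: a + b),
}
result = run_dataflow(nodes, {"x": 3})
print(result["out"])   # 3*3 + 2*3 = 15
```

Even in this toy, every intermediate result (`x2`, `x_twice`) is its own channel -- multiply that by every scalar a real program produces and the graph drowns.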

Alright then, how about increasing the granularity of the dependences?  Increase the size of the data objects flowing (technically "data" not "datums"), thereby decreasing the number and complexity of the dependences, perhaps even to the point that visual representations become feasible again, and shrinking the associative bottleneck.  That does indeed solve some of the problem.  But it still suggests that control flow is the result of data flow, and now there's more data to flow to lead to such control, resulting in significant overhead for control transfer, moving data around (often needlessly) just to get the right part of the program to "light up" with activity/control.  Can't we manipulate the control and the data separately, but let them both operate along the same paths when convenient?
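The granularity trade-off can be made concrete with a toy comparison (all names invented): the same doubling computation performed with one firing per scalar token versus one firing per block-sized token.

```python
# Coarsening the grain: each token flowing between stages is a whole block
# of data, so the number of firings/dependences shrinks by the block size.
firings = {"fine": 0, "coarse": 0}

def double_scalar(v):
    firings["fine"] += 1       # one node firing per scalar token
    return 2 * v

def double_block(block):
    firings["coarse"] += 1     # one firing per block-sized token
    return [2 * v for v in block]

data = list(range(1000))
fine   = [double_scalar(v) for v in data]   # 1000 firings, 1000 dependences
coarse = double_block(data)                 #    1 firing,     1 dependence
assert fine == coarse
print(firings)    # {'fine': 1000, 'coarse': 1}
```

The coarse version has a graph small enough to draw -- but note it also moved the whole block to one place just to trigger that one firing, which is the overhead the paragraph above complains about.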

That was the jumping-off point for my work, which grew out of my grad school advisor's (R. Babb's) work on Large Grain Data Flow (LGDF).  Its basis looks something like a cross between a Petri net and a bunch of finite state automata -- each lending itself to visual representation -- or, if you like, a network of Turing machines passing tapes around.  Through the years, the concepts have evolved, as has the name -- from LGDF2, to F-Nets, to Software Cabling, and now to ScalPL (Scalable Planning Language).  All of them have been based on a similar simple circles-lines-boxes visual representation.  In my new book, I suggest that the principles and uses extend well beyond the computer programming field, to almost any kind of concurrent planning, but that in the parallel/concurrent programming world, it is the logical extension of structured programming.


And, yes, I believe ScalPL is a representation which is simple and intuitive enough to use when you step up to that whiteboard and wish to convey the relationships between several activities working together, especially if you've got a handful of different-colored markers.  But for now, even just using it for parallel and concurrent programming will be a big step forward.  What is the link between diagrams and concurrency?  For another time...


Wednesday, May 16, 2012

Is the book about planning? Or parallel computing? Yes.

A few months ago, I said I'd be blogging about my new book... and now that it is really finally out (see www.elepar.com), it's time to actually do so!

The name of the book is "Scalable Planning: Concurrent Action from Elemental Thinking".  Wait!... Before you close this tab while muttering "I don't care about planning, whatever that means," let me explain:  I don't self-identify as a planner, either.  My "thing" (for almost 40 years) has been software. Or more specifically (for the last 25 or so), parallel software. Or even more specifically, architecture (or platform) independent parallel software.  And more recently, this has meant finding ways to write computer programs so that they can execute efficiently on a number of different platforms (read "hardware"), such as on a traditional sequential computer, or a multicore system (lots of processors on one chip), or in the cloud (or the "grid," as they used to say), or on a cluster ("farm") of interconnected but independent systems (e.g. workstations), or on a graphics processing unit (GPU), or what have you -- including collections or hybrids of the above.  The goal is to write one program (or algorithm) that runs on any suitable platform (or hybrid platform) available at the time.  That goal was conquered decades ago for sequential computers, with the advent of high-level languages and compilers, but never really for parallel systems.  That leaves the term "parallel algorithm" as an oxymoron -- "algorithm" implies independence from platform, but "parallel" usually doesn't.

So what has that to do with planning?  As it turns out, a lot.  Software is, by definition, a plan, a description of activity that you want to take place.  And many of the problems inherent in platform-independent parallel software -- problems like nondeterminism, efficiency (including dealing with relatively long communication latencies), scheduling, fault tolerance, building larger plans out of smaller ones (or delegating smaller parts of bigger ones to other entities), and just trying to make sure that everything works together as it should to get the answer -- are present in other planning problems as well, especially those involving multiple people, or offices/branches, or devices, that all need to work together to produce a desired result.  Together, but not too closely together.  (One of our slogans, "Working together independently", sort of sums it up.)  In these non-computer cases, the people, or branches, or devices are the platform.

Besides, if the book had been named something related to "parallel computing", many readers would have expected it to contain computer programs, and to list traditional models of parallelism (e.g. message passing and/or shared memory and PRAM), or newer approaches coming out of research labs, etc.  Although this book has hints of such things (mostly in chapters that are explicitly noted as optional), it's really about how to avoid dealing with such things during the planning (e.g. programming) phase -- or, put another way, to express a plan in such a way as to make it easier for another entity (called The Director in the book) to address those issues (like communication styles) later, when the platform is known.  And the shape of that platform may even shift as the plan is underway.  (In fact, the plan/program itself can change on the fly, too, if necessary.)

If it's about expressing plans (e.g. programs) in a way to make them more portable among platforms, isn't that just yet another portable parallel programming language?  It is a language, in a sense, a visual one, called ScalPL (Scalable Planning Language).  Calling it a programming language might be confusing, since it doesn't even have the most basic mathematical operations or conditionals -- there are already plenty of languages to express those parts in -- but many would consider ScalPL a coordination language, or a component architecture language, or a process algebra (or calculus).  It can also be considered a pattern language.

So, if you are into platform independent parallel/concurrent software, or even just want to know what the issues are in those fields, I expect you will understand and appreciate this book.  And if you aren't, and are just into more traditional planning?  The title wasn't really intended to mislead you, and based on some very positive feedback from a few people who have seen the book so far, it appears that it may have exciting new approaches for traditional planners, too.  And because it straddles the software and concurrency and planning nomenclatures, there is some cross-fertilization between them, such as structured and object-oriented and refactoring approaches (from computer programming) being applied to the others.

Of course, this all leaves plenty of other questions -- including why I would be qualified to address some of these issues when plenty of others in the computer field have failed.  And to such things, I can only say that many of us have worked on these issues for decades, and continue to work on them.  We are all evaluating what each other is doing, and this book is my attempt to educate people on my thought processes on the matter, hopefully to be built upon and integrated into other work.  I have tried to make this work accessible to the lay person as well as the expert, so that it might be appreciated, utilized, and, yes, judged, by a wide audience.