Wednesday, December 11, 2013

Actors and ScalPL, again

I begin this post by noting that the price of the Scalable Planning book (describing ScalPL) has now been reduced to $24.95.  This is likely as cheap as you can expect to see it for some time, just barely recouping costs, and about 40% of what it sold for a little over a year ago.  So now's your chance.  (The ebook version is waiting until I am certain I can produce a quality version, including the fairly significant math notation in chapter 10.)

Some time ago, I commented on a video by Carl Hewitt regarding the Actor model (primarily theoretical aspects), and I may still comment on that further... but this post is about another video on actors, from a recent Halmstad Colloquium, by Hewitt's former student (from many years ago), Gul Agha, who wrote a book on the subject in 1986 and has since been a professor at UIUC.  (I've had the pleasure of benefiting from Gul's gracious hospitality on a few occasions over the years.)  This particular talk concerns more practical matters relating to the Actor model, with the benefit of significant hindsight.  Dr. Agha seems honest here about how some implementations have fudged the rules, and how those that haven't fudged them have paid some performance penalties.

First, in three slides, Agha summarizes some of Actors' important characteristics, and explains their merits. In the first (at 5:18), Actors are described as concurrent (with one another), communicating with asynchronous message passing and no shared state.  In the next slide (at 8:40), he outlines basic principles, specifically: Encapsulation (of state by an actor), fairness (a.k.a. liveness, a guarantee that each sent message will eventually be received), location transparency (the independence of an actor's behavior from its physical location), and mobility (apparently related to the location actually being flexible during execution).  In the third slide (at 18:10), Agha brings this all down to how an Actor is typically implemented:  as "encapsulated state + behavior ('methods') + thread of control + mailbox".
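
For concreteness, here is a minimal sketch of that recipe in plain Python, using only the standard library rather than any actual actor framework; the class and method names are my own illustrative choices, not anything from Agha's talk.

```python
import threading
import queue
import time

class Actor:
    """Bare-bones actor: encapsulated state + behavior + thread of control + mailbox."""
    def __init__(self):
        self._mailbox = queue.Queue()      # unbounded FIFO mailbox
        self._state = {}                   # state visible only inside the actor
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):
        """Asynchronous send: the caller never blocks and never sees _state."""
        self._mailbox.put(message)

    def _run(self):
        while True:                        # the actor's own thread of control
            self.receive(self._mailbox.get())

    def receive(self, message):
        """Behavior ('methods'): subclasses decide how to react to a message."""
        raise NotImplementedError

class Counter(Actor):
    def receive(self, message):
        self._state["count"] = self._state.get("count", 0) + 1
        print("got", message, "-> count =", self._state["count"])

c = Counter()
c.send("hello")
c.send("world")
time.sleep(0.1)                            # give the daemon thread time to run
```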

Some of the merits are argued to be decreased complexity (with message interleavings being the only potential source of nondeterminism), tolerance of communication latency (the possibility of "hiding" message latency with other computation during transport), and portability (with actor addresses being independent of location or platform).  State encapsulation within an actor is likened to data hiding, a mainstay of software engineering (and OO programming).  The model is also said to facilitate mobility (which I assume means during execution).

Some of those benefits are actually not so clear to me.  For example, regarding the fairness constraint facilitating correctness:  The Actor model permits an unlimited number of messages to be directed to an actor, but then also requires that every message eventually be handled (by that actor).  This implies that, regardless of how quickly an actor can handle each message, it may not be fast enough to satisfy the model's requirements.  That's a problem:  Even beyond the requirement cited by Agha of unbounded queuing, the queue could actually grow without bound (unless the parallelism can grow without bound).  I won't venture into the case where an actor doesn't terminate, and how/whether this is formally different from an actor just refusing to handle more messages and violating fairness.
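
To make the concern concrete, here is a small, hedged sketch (plain Python, no actor library) of a sender that produces messages far faster than a slow actor handles them; fairness promises each message will eventually be processed, but nothing in the model bounds how large the backlog may grow in the meantime.

```python
import queue
import threading
import time

mailbox = queue.Queue()                    # the actor's unbounded mailbox

def slow_actor():
    while True:
        mailbox.get()                      # every message is eventually handled...
        time.sleep(0.01)                   # ...but only at 100 messages per second

threading.Thread(target=slow_actor, daemon=True).start()

for i in range(10_000):                    # the sender is not rate-limited:
    mailbox.put(i)                         # asynchronous send never blocks

print("messages still queued:", mailbox.qsize())   # a large, growing backlog
```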

Also, while location-independent addressing (of actors) may aid mobility, mobility can actually be thwarted by the fact that an actor can maintain an unlimited and unobvious amount of persistent internal state, even while the actor is not active (i.e. while awaiting another message).  It means there are essentially two kinds of state to be moved around, in completely different ways and under different circumstances:  the state within a message, and that within an actor.  And moving intra-actor state, especially to another kind of processor architecture, can be difficult at best.

Even some of the cited merits have corresponding drawbacks.  For example, while data hiding is an important software engineering principle (in sequential programming), part of the state hidden within an actor is the set of actors with which that actor may communicate.  So the entire communication and control topology of the program is hidden within actors -- and that is actually completely antithetical to engineering principles of structured systems.  And since there is no limitation on the size or persistence of the state within an actor, the only way to characterize that hidden state (including the state representing the actors with which it may communicate) is in terms of the sequence of all of the messages that the actor has ever received throughout its history.  This makes the propagation and transformation of state during an execution very difficult to track without knowing a lot about (and keeping track of) what's happening inside of each actor.
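
A tiny illustration of that last point, again in ordinary Python rather than any real actor library: the set of actors this actor may talk to (its acquaintances) is just more hidden state, rewritable by any message, so the only complete description of the current topology is the actor's entire message history.

```python
class Router:
    """Toy 'actor' (synchronous here, for brevity) whose wiring is hidden state."""
    def __init__(self):
        self._targets = []                 # acquaintances: invisible from outside

    def receive(self, message):
        kind, payload = message
        if kind == "add_target":           # the communication topology changes
            self._targets.append(payload)  # at run time, message by message
        else:
            for target in self._targets:   # forward to whoever we currently know
                target.receive(("data", payload))

class Printer:
    def receive(self, message):
        print("printer got:", message)

r = Router()
r.receive(("add_target", Printer()))       # after this, r talks to a Printer --
r.receive(("data", 42))                    # a fact recoverable only from r's history
```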

Dr. Agha also mentions the significant overhead associated with the required "by value" messages (with payload), which is avoidable in other models for communication between co-located procedures, in cases where they could just share access to a common area ("by reference").  Agha suggests a potential solution where static analysis could theoretically determine when an actual message transfer (copy) could be avoided, implementing sharing under the covers. This static analysis is much easier said than done, and something which has been proposed for decades with other models (e.g. Linda) while never very successfully implemented (without hardware support).
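
A hedged sketch of the trade-off being described, in plain Python: by-value semantics imply a copy of the payload on every send, even between co-located parties, while the sharing optimization Agha alludes to is only safe if something (static analysis, in his suggestion) can prove the sender never touches the payload again.

```python
import copy

payload = list(range(1_000_000))           # a large message body

def send_by_value(mailbox, data):
    """Actor-style 'by value' delivery: the receiver gets its own copy."""
    mailbox.append(copy.deepcopy(data))    # full copy, even if co-located

def send_by_reference(mailbox, data):
    """The optimization: share the data under the covers.  Only sound if
    analysis can prove the sender will never read or write `data` again."""
    mailbox.append(data)                   # no copy; aliasing risk otherwise

mailbox = []
send_by_value(mailbox, payload)            # always safe, always expensive
send_by_reference(mailbox, payload)        # cheap, but correctness depends on proof
```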

These issues are hardly new.  When I was devising the F-Nets computational model around 1990 (see chapter 10 of the Scalable Planning book for a rewrite of my 1991 dissertation), it already addressed many of these drawbacks, which appear in most message-passing programs; the rest (and others) were addressed shortly thereafter by its more complete language, ScalPL (Scalable Planning Language), described in the same book and in many posts on this blog.  ScalPL retains the advantages of Actors cited above, but in addition:
  • ScalPL more cleanly handles data hiding by offering two kinds of "plans" (i.e. program fragments), called "tactics" and "strategies".  Tactics, which are essentially functions or methods implemented in traditional sequential programming languages, have short-lived activations/evaluations ("actons") and don't hide or maintain any state between activations; strategies are built graphically, in ScalPL itself, from other plans (tactics and strategies), and are generally long-lived and do maintain/hide state internally, in "resources".  Since strategies have clean interfaces, with just a few simple additional constructs they also serve as classes and their instances (objects).  (A loose code analogy appears below, after this list.)
  • The stateless and functional nature of tactics in ScalPL, together with the clear graphical representation (within strategies) of how they interact with resources and therefore with each other, makes the flow of both state and control clear, to both the human and automated analyst, allowing optimizations for efficient data sharing and/or migration without delving into the innards of the tactic implementations.  These features make ScalPL potentially the closest representation to structured programming available in the concurrent/parallel world.  It is essentially an executable software engineering diagram.
  • Those same ScalPL resources also serve to preserve state (as necessary) for tactics between activations, with negligible overhead.  Since ScalPL strategies are composed only of these resources and other subplans (tactics and strategies) which access them, everything is ultimately built from just resources and tactics, leading to true mobility:  Everything that might need to be moved is either a tactic (which is stateless, and therefore basically constant data, between activations), a resource (which is designed for mobility, acting as a message when necessary), or a strategy (which is ultimately composed of the previous two).
  • Fairness in ScalPL takes the form "if a tactic is continuously activatable, then it will eventually activate".  Unlike with Actors, since resources -- i.e. the connections between tactics -- are not (in essence) queues, the data results of one component generally can't pile up unboundedly to be forced upon non-ready downstream tactics.  Instead, upstream components must wait (i.e. won't be activatable) until their prior results have been consumed downstream, unless circumstances permit queuing (explicit or implicit) of prior results.
  • In the video, Dr. Agha mentions the merits of sharing a thread among multiple actors, as an optimization, as opposed to one actor per thread.  In ScalPL, since tactics (actons) are atomic and maintain no state between activations, they are short-lived "threadlets", and an implementor is not even tempted to devote a long-lived thread to each one.
And with ScalPL, that's all with no mailboxes (or locks) to mess with, and an underlying formal computational model (similar to Petri nets), all while offering the other advantages claimed here for actors.  And although the tactics in ScalPL are described here as "atomic" and "functional", this does not imply overheads associated with checkpointing and roll-back, or the use of functional languages (e.g. lacking "update in place" semantics).  As the ScalPL book explains, the default execution model can be about as efficient as a standard function call, and implemented in your favorite imperative language.
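
For readers who think in code rather than diagrams, here is the loose analogy promised above -- ordinary Python standing in for a hypothetical ScalPL runtime, not actual ScalPL notation: a tactic is a stateless function whose only inputs and outputs are the resources bound to it, while the strategy is what owns the resources and wires tactics to them.

```python
# Loose analogy only: ScalPL plans are graphical, and this runtime is hypothetical.

def accumulate(total, item):
    """A 'tactic' analogue: an ordinary stateless function.  Everything it
    reads or writes arrives through its bound resources (the arguments);
    nothing persists inside the tactic between activations."""
    return total + item

class RunningSum:
    """A 'strategy' analogue: long-lived, holding state in named resources
    and deciding which tactic activates against which resources."""
    def __init__(self):
        self.total = 0                     # a 'resource' holding the hidden state

    def on_item(self, item):               # one short-lived activation ("acton")
        self.total = accumulate(self.total, item)

s = RunningSum()
for x in (1, 2, 3):
    s.on_item(x)
print(s.total)                             # -> 6
```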

I will be happy to receive information from, or engage in discussion with, those who find my observations here unfounded.  I find the Actors model an important step in the progress of our knowledge of parallelism, in part because it can be used to describe so many traditional message-passing programs which possess its basic attributes.  (The same cannot be said for ScalPL:  It does not pretend to explain existing programs developed without knowledge of the model.)  That said, I do hope that today's increased hardware technology and available parallelism are matched by advances in programming technologies.

Tuesday, September 10, 2013

Scalable Planning (ScalPL) Book Price Reduced

I figure the early early adopters have had their chance, so I've reduced the price of my book, "Scalable Planning: Concurrent Action from Elemental Thinking," from $65 to $45 for the later early adopters!  And if you use the 20% discount code available at elepar.com (before it goes away), that's actually more like $36 -- about half the original price.  Plenty more info about the book on this blog and at the elepar website.  Questions and feedback are welcome.

Monday, June 03, 2013

Video of ScalPL presentation to the Zissou Society

The motto "Keep Portland Weird" even applies to computer languages, if "weird" can mean "eclectic" -- just witness The Zissou Society for Programming Language Exploration... or pdxlang for short.  How eclectic? Well, Haskell would be considered practically a mainstream language here in Portland, so probably wouldn't qualify.  Anyway, ScalPL (as described on this blog and the Scalable Planning bookwas deemed weird... um, I mean eclectic enough to merit a presentation (immediately following a presentation on m4) last Thursday.  And here is the screen capture of that presentation.

This was my first attempt at screen cap, so it's not perfect.  Also, due to time constraints, the OO concepts at the end aren't especially well covered:  Look for another vid later to do a better job on that.  But if you are looking for a pretty thorough intro on what ScalPL is all about, here's a 72-minute presentation, followed by a 10-minute Q&A.  (No significant background in parallel processing is assumed.)


Feedback here is welcome.

Thursday, March 21, 2013

ScalPL and the Sequential Prison

A former colleague (and his associates) recently referred me to this video/slide presentation:  Ivan Sutherland, The Sequential Prison, at the SPLASH 2011 conference.  I met Dr. Sutherland once some months ago at an alumni event at Portland State University, and understood that there was an overlap in our work, but this presentation provided an opportunity to understand how much.

The talk contains few very concrete or completely novel recommendations, but even in its generality, it's worth a listen, and it is clear that the speaker and I are on the same wavelength more often than not.  He discusses his intuition that the form of our traditional human and computer languages, together with specific words in them, affects our approach to problem solving as a sequential endeavor, and that a more picture-oriented language would help us avoid this "sequential prison" that we've locked ourselves into.  When questioned later about functional languages, he mentions how much they seem to be built around the concept of sequences, and how this also affects the tendency to build and process things serially.

As might be apparent from my work described on this blog, I agree with all of that.  Even things we don't agree on exactly, we agree on vaguely.  For example, in the parallel (or concurrent) world, he prefers the term "configuration" over "program".  While I, also, tend to avoid terms like program and algorithm as being too sequential in heritage, I use the term "plan" in general and "strategy" as a more specific form of plan.  Ivan seems to suggest that picture-oriented languages might require multiple simple forms to convey a single concept in enough specificity to facilitate execution, while my work in ScalPL concentrates on a single, rather technically annotated form, in part as an alternative to the complexities I've seen in the multi-view forms of UML and the like.  Whether ScalPL (or more specifically, my ScalPL diagrams) are meaningful and easy to read, whether they would "leap into" Dr. Sutherland's mind as he requests, I can't say, but I do think they capture the spirit:  Pipelines do look like pipelines, concurrency does look like concurrency -- to me, anyway -- as he requests.  Other representations (such as he requests) are (can be) assembled by the planning tools.


His discussion starting at around 30 minutes is very descriptive of ScalPL, though he seems to be addressing primarily hardware.  In fact, the sort on page 279 of my Scalable Planning book (here, to the left) appears to me to be exactly the sort he describes there, except that where he describes it as an instruction, here it is a ScalPL strategy.  The first question (in the Q&A near the end of the video) relates to the difference between hardware and software, with the old impression that software and hardware are different beasts, that software is too dynamic, too fluid, too soft, to draw the way we do hardware. I have not found that to be the slightest problem in my work.

Of course, his discussion of standardization is also well taken.  ScalPL is available.



Wednesday, March 20, 2013

How ex-treme is exa-scale?

[I drafted this some months ago, but for some reason didn't post it. It still looks good to me, so here goes.]

At the SC12 conference in November, I attended a couple of BoFs (Birds of a Feather meetings) regarding exascale. The first, an Open Community Runtime (OCR) for Exascale, was helpful in providing pointers, such as to this website which in turn points to slides, which in turn led me to sites for Intel's Concurrent Collections (CnC) (including UTK) and the University of Delaware Codelets/Runnemede work. But I found myself wincing at some of the things said during the BoF, and after looking at these sites and documents, I admit to being disappointed and, frankly, frustrated. This perhaps became even more pronounced in the second BoF, Resilience for Extreme-scale High-performance Computing. (In hindsight, there are others I wish I had caught.)

First, I would contend that even if we were to come up with a design for The Correct Programming Model(tm) and The Correct Runtime Support(tm) today, it could still realistically take until 2020* or so to develop the appropriate software, support, migration, software engineering techniques (debugging, tracing, patterns, methodologies, libraries, etc.) to make them useful. So there's no time to lose. And from what I was seeing, these groups are not only a long way from having The Correct design, in some ways they seem to be actually going back into history. More specifics about that in an upcoming blog entry.

(*As I write this, I find that the 2020 goal may be more like 2022, but my argument still holds.)

Second, in order to come up with The Correct design, we must understand the critical constraints, not saddle ourselves with non-critical ones and unwarranted assumptions. This is where people like Seymour Cray (starting with "a clean sheet of paper") and Steve Jobs (whether considered a visionary, big thinker, or "tweaker") excelled. Instead, in these meetings, I was hearing "requirements" such as the ability to run MPI programs effectively, or the desirability (necessity?) to use checkpoint/restart to achieve resilience. Even if backwards compatibility may be desirable in some cases, and existing practice may be a useful jumping off point, we will get nowhere by using them to guide the design.


Third, but strongly related to the above, it struck me that I was seeing lots of familiar faces in these BoFs, mostly from my days (a decade+ ago) on the MPI2 Forum. And while reconnecting with former colleagues is part of what the SC conferences are all about, using a standards-based approach is almost certainly not the best way to make significant advances in technology. New approaches will be required to achieve the goals, and standards for those can be established only after those goals are attained. (I could further argue that existing standards have artificially interfered with our progress in this field in other ways, but I'll save that rant for another day, except to say that standards committees are constantly bombarded with the reminder that their job is to standardize existing practice, often while maintaining compatibility with previous versions of the standard. That's not the recipe for quantum leaps!)

So what is the most effective way to achieve the exascale software goals? One might take a hint from Seymour Cray, in the link above: "Shunning committees, he felt that the best computers were the ones where a single architect offered a unified vision. After the machine had been delivered, it was then appropriate, Mr. Cray felt, to listen to feedback from customers and, if necessary, start over from 'a clean sheet of paper.'" I claim that there is no reason to assume that the same wouldn't also be true for software.

In other words, try some big ideas, in multiple (at least two) stages, assume that you may not succeed in the early (e.g. first) attempt, but will learn a lot nonetheless, and integrate what you learn into later stages. That is, in fact, how we fashioned the strategy for NASA's Information Power Grid (back when I had some influence there)... before that plan was apparently revised (and discarded?).

Of course, it could be argued that this "think big with master architect" approach is precisely that which was applied in DARPA's HPCS program, by funding individual companies and allowing them full control over their product/proposal. It could also be argued that that program (has) had limited success (though I would hardly call it a failure). And to those arguments, I would counter that the goals that were not conservative were underspecified. There was little advantage to the participants adding more objectives than those provided by the sponsors, and in fact, disadvantages, in that their competitors for the funds would then be working toward simpler goals.

If failure is not an option, then the goals are too conservative. Consider the manned lunar program: HUGE goals, and a really concrete way to measure if we'd achieved them or not. We were proud to meet them, but it was by no means assured. The exascale project has set some very high hurdles regarding power consumption/energy and reliability/resilience, which seem to fit into this model, but again, at least the goals I've seen on programmability are either conservative or vague or both. Of course, in a political climate, failure to meet a goal can be painful. And where there is public money, there is (and should be) politics.


And, like the moon landing, the trade-off between expense and benefits of attempting to meet (or actually meeting) the goals is a separate issue from whether the goals can be met at all, and the balance is not necessarily clear-cut. If the US government is the customer, they must decide how much it is worth to them, and other participants must decide if it fits their business (and/or research) model. The manned space program was run from within the government. Such a model seems unlikely today.

How about data movement? (For real, this time.)

In the previous post, I addressed the kinds of inherent constraints that an F-Net (and/or ScalPL) plan imposes (and does not impose) on data movement.  The question then becomes how to turn these constraints into an implementation:  Sooner or later, the decision to move or not move each data item must be made to meet the constraints, and something must make those decisions.

The most basic way to meet the constraints is by assigning all the computations (actons) to one processor, putting all the data (resources) into its local memory, and executing the whole thing sequentially.  Unlike some other models (like most message passing) with semantics that intrinsically impose additional copying and/or data movement, or that require context switching between heavyweight address spaces, this sequential execution of F-Nets requires little if any overhead over a native sequential expression of the same computation, either for data movement/copying or for context switching. In F-Nets, if a sub-computation/acton is going to complete at all, it can always be executed to completion before moving on to any other subcomputation/acton, and in that case there is no need to save transient/intermediate state.  So the only need for more heavyweight context switching is when an acton diverges ("infinitely loops") or appears that it may do so. (Most message passing requires data copying or movement even in a sequential environment, because both the sender and receiver are assumed to have their own mutable copy of the message after the communication, and both may be required to make progress for the overall computation to progress.)
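
A minimal sketch of that contrast in plain Python (my own illustration, not F-Nets notation): running an acton to completion against the resource it is bound to needs no copy and no heavyweight context switch, whereas message-passing semantics imply that sender and receiver each end up with their own mutable copy, even on a single processor.

```python
import copy

# F-Nets-style sequential execution: the acton runs to completion, in place.
resource = {"value": 41}                   # the resource's content state

def acton_increment(res):                  # an acton bound to that resource
    res["value"] += 1                      # update in place, then terminate

acton_increment(resource)                  # just a function call: no copy made

# Message-passing semantics: both parties own a mutable copy afterward,
# so a copy is implied even when sender and receiver share memory.
outgoing = {"value": 41}
received = copy.deepcopy(outgoing)         # the semantic copy F-Nets avoids
received["value"] += 1                     # receiver mutates its copy...
outgoing["value"] = 0                      # ...while the sender may still mutate its own
```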

But in the more common case where we want an F-Net/ScalPL plan to exploit the parallelism of a platform, how best to decide how (and when) to embed the plan (program or strategy) onto that platform?  In the good old days, parallel programmers made several assumptions about both the platform and the algorithm to simplify this embedding process:  We assumed that the platform consisted of a fixed number of homogeneous, dedicated, fault-free general-purpose processors connected with a predictable, regular communication topology.  We also often modeled the application/plan as a set (and usually a sequence) of parallelized steps ("epochs", phases) such as array operations, where each potentially-concurrent operation in a step took about as long as any other operation in that step to complete.  Based on such assumptions, it was sometimes viable to statically map the computations and their associated data onto the processors in the platform.

Under those same assumptions, an F-Net or ScalPL plan can also be statically embedded on a platform in similar ways.  Rather than assigning resources to processors (as some languages might assign array elements under an "owner computes"-like rule), it makes far more sense to assign the computations themselves -- i.e. actons -- to processors, by annotating their contexts.  Another approach is to assign each control state of each resource to a processor.  In either case, the constraint (implied by the F-Net itself) that the content state of a resource must be colocated with the acton during execution is assumed, and annotations that conflict with this constraint can be resolved in a number of sensible ways (e.g. majority rules).  In ScalPL, by allowing these annotations to include the index values supplied by so-called "dup bindings", such embeddings can be a function of the array elements processed, so they are no less expressive than those in HPF, etc.
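
As a hedged sketch of what such an annotation might compute (the names and the annotation scheme here are hypothetical, not actual ScalPL syntax): assigning actons to processors as a function of the index supplied by a dup binding gives HPF-like distributions without ever assigning the resources themselves.

```python
NUM_PROCS = 16

def embed(context_name, dup_index):
    """Hypothetical static embedding rule: map one acton activation,
    identified by its context and dup-binding index, to a processor."""
    if context_name == "update_row":
        return dup_index % NUM_PROCS       # cyclic distribution over row indices
    return 0                               # everything else stays on processor 0

print(embed("update_row", 37))             # -> processor 5
print(embed("reduce_all", 0))              # -> processor 0
```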

Even so, just as issues can crop up in deep subroutine calls in HPF, the hierarchical and dynamic nature of ScalPL means that different (perhaps recursive) activations of the same plan (strategy, program) may be best assigned to different parts of the platform, yet preferably related to the embedding of their parent activation in sensible ways -- so static labeling of contexts isn't necessarily sufficient.  This can be at least partially addressed by a simplified form of parameter passing -- i.e. providing special (integer) variables in child plans which are set equal to the annotation of the context in the parent, and which the child can then use (e.g. in mathematical expressions) when annotating its own contexts.

Even better is to consider representing the overall embedding as a two-step process, the first as mapping the computations to an abstract platform, which is considered to have easily-understood internode properties, and the second as mapping that abstract platform to a concrete/actual one to hopefully preserve as many of those properties as possible.  My work has focused on an abstract platform (AP) representation of nodes addressed by finite sequences of non-negative integers, allowing the AP topology to be interpreted in a number of ways -- e.g. as a sequence/cycle (e.g. if just single integers), or a rectangular grid (e.g. if tuples of integers), or a tree (e.g. with zero-length sequence representing the root, any other sequence representing a descendant of all nodes addressed by its prefix sequences).  The annotation/parameter passing described in the previous paragraph can now be of these sequences, instead of integers, with the child able to append new suffixes to the sequence it was passed.  Mapping of the AP to the actual platform can be manual or automated, based on various assumptions. A common shorthand is to assign all nodes having the same prefix to a particular actual processor, which can be interpreted as assigning an entire row or column (if the AP is considered a grid) or subtree (if the AP is considered a tree) to that node.  (People who are accustomed to HPF and similar embedding languages may feel it lacks constructs for transpose, etc., but (a) these would be handled in embedding to the AP, not mapping the AP to the actual architecture, and (b) such a transpose, if necessary at all, is imposed by mapping the pre-transpose computations and post-transpose computations to appropriate nodes, since mappings effectively change between steps rather than within them.)
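
Here is a small sketch of the prefix shorthand just described, in plain Python with hypothetical names: abstract-platform (AP) nodes are addressed by tuples of non-negative integers, and every AP node sharing a prefix is mapped to the same concrete processor, which amounts to handing a whole row, column, or subtree to that processor.

```python
def ap_to_processor(ap_address, prefix_table):
    """ap_address: a tuple of non-negative ints, e.g. (2, 0, 5).
    prefix_table: maps address prefixes to concrete processors.
    The longest matching prefix wins."""
    for length in range(len(ap_address), -1, -1):
        prefix = ap_address[:length]
        if prefix in prefix_table:
            return prefix_table[prefix]
    raise KeyError("no mapping covers %r" % (ap_address,))

# The whole subtree (or grid row) under prefix (2,) goes to processor 7;
# everything else defaults to processor 0 via the empty prefix.
table = {(): 0, (2,): 7}
print(ap_to_processor((2, 0, 5), table))   # -> 7
print(ap_to_processor((1, 3), table))      # -> 0
```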

But that leaves two important points.  The first is that the traditional simplifying assumptions for parallel static embedding (summarized in the third paragraph, above) are virtually all unrealistic these days.  Applications for concurrency are no longer confined to scientific and/or matrix operations, so their concurrency is no longer regular in shape or uniform in subcomputation duration.  And this irregularity, together with increased concurrency, makes it less practical to devote entire machines (or portions thereof) to specific applications, so node sharing (yielding variable loads) must also be accommodated.  And as platforms grow ever larger, it becomes less practical to make them homogeneous and fault-free. The bottom line is that static embedding becomes all but impossible:  These embedding decisions must be made at runtime, in real time.  Using the two-step process mentioned above, only the mapping of the AP to the concrete platform must be dynamically determined.

And the second point, related to the first, is that making such dynamic embedding decisions on a fine-grained basis is also impractical.  This is another reason for facilitating non-trivial, large-granularity subcomputations as the basis for embedding.  And that, in turn, requires that these subcomputations be more flexible and expressive than traditional data flow -- e.g. allowed to produce multiple results (each of which may be directed to a different downstream subcomputation), and results which affect only parts of larger data structures.  This is a prime motivation behind the structural properties which have been endowed upon actons (the subcomputations) in F-Nets/ScalPL.  Because these actons are typically entire subcomputations/subprograms in themselves, the embedding decisions within an acton to a single processor (to memory, registers, logical units, etc.) to carry them out can be performed statically and locally, such as by a compiler or, if necessary, by out-of-order scheduling within the processor. These local, lightweight decisions are, by now, well understood.

So that is the answer to how data movement is determined within an F-Nets/ScalPL plan (i.e. strategy, potentially consisting of sub-strategies, ultimately down to actons):  This balance, between explicit parameterized embedding of a strategy/plan to an Abstract Platform, dynamic (or, in restricted cases, static) embedding of the AP to a concrete platform, and traditional compiler/processor scheduling of processor resources for each acton within the strategy/plan.  The book goes into some depth in describing an entity known as The Director, and how it can efficiently determine when to move data. There is also significant discussion on how to optimize the movement of data depending on architecture type (e.g. shared vs distributed memory) and the form of the strategy (e.g. with multiple readers of a single resource, readers and writers, etc.)

Monday, March 11, 2013

What about data movement?

I received a recent question from someone quite familiar with the fields of high-performance computing and data flow, so I figured others might share his same concerns, and answering here might kill more birds than a personal reply.  (Just in case there are privacy concerns, I'll not cite his precise name or words.)

It included statements with which I wholeheartedly agree -- e.g. that data movement is very expensive in modern compute environments (e.g. relative to local computation).  He says that the F-Net model doesn't say anything about when and where data should move, which he considers a requirement for any modern model.

It is, in fact, a central goal of F-Nets (and ScalPL) to put as few constraints as possible on how and when data might move in order to carry out the specified plan (aka execute the program, evaluate the mapping or calculation) -- but enough constraints so that the plan can be efficiently implemented on different platforms (e.g. shared and distributed memory) by knowing (in advance, when possible) when colocation of data and program/plan fragments must occur.  Even the term "data movement" might be confusing here, in that the model doesn't preclude moving parts of the plan itself instead of the data upon which those parts operate -- but those, too, can be considered as data during movement.  F-Nets/ScalPL is all about expressing (without overexpressing) those constraints -- i.e. the kinds of access each subcomputation (acton) will need to which data (resources), and (assuming that each subcomputation (acton) is to be statically evaluated without migration) the data and program fragments that need to be colocated for this to occur.  It is designed to be neither too much nor too little information.

One can be forgiven for assuming that satisfactory forms of portable algorithmic expression already exist, e.g. functional or dataflow languages.  As a brief overview: Languages and models fall broadly into control flow and data flow camps, where the former defines what gets to evaluate next based on a program counter (the functions called or objects invoked/messaged), and the latter defines it based on which data is available to compute with.  Functional languages are closely related to dataflow languages, but the interpretation of a variable is slightly different, as simply a shorthand for an entity like a numeric value or subfunction, rather than a dynamic entity which becomes filled/defined or not.  Traditional parallel constructs (e.g. messages or shared memory) are often sort of a hybrid between control and data flow, where there are several loci of control ("threads") that are intentionally blocked when necessary to wait for data to become available (from another thread).

Data flow approaches are traditionally facilitated by single-assignment variables (or arcs) that change state to signify when data is available there for use by downstream calculation.  (Lazy and non-strict evaluation can allow evaluation to progress even when some of the potentially-needed data isn't available.)  But as indicated by the nomenclature, the variables can be assigned to only once (or simply used/considered as shorthand for another expression), in essence ignoring a central feature of virtually all computing hardware, including that found in nature:  Mutable memory, the ability to change the contents or character of something stored without changing where it is stored.  As a result, when only part of a large datum is being altered in these languages, the entire result must be defined in terms of the entire previous value -- very unlike what needs to actually happen internally for any efficiency. That is, assignment to a data flow variable has nothing much to do with updating storage, but rather with signaling the availability of the data to some part of the program. Because this signaling is integrated into the assignment, it is impossible (or unnatural) to signal availability without assignment, or to assign without signaling availability.
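
A small sketch of the cost being described, in plain Python: under single-assignment semantics, "changing" one element of a large datum means defining a whole new value in terms of the whole old one, while mutable memory lets the update happen where the data already lives.

```python
# Single-assignment / functional style: only index 42 "changes", yet the
# entire result must be constructed from the entire previous value.
big = tuple(range(1_000_000))
big_v2 = big[:42] + (0,) + big[43:]        # builds a new million-element value

# Mutable memory: touch one cell, leave everything else where it is stored.
big_list = list(range(1_000_000))
big_list[42] = 0                           # O(1) update in place
```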

These shortcomings are not always obvious in examples.  For example, functions (or algorithms) which are primarily mathematical in nature (e.g. produce a single deterministic result from many arguments), and which don't benefit from partial assignment and/or update in place, don't really need anything more than traditional data flow.  However, in real-world programming, operations often produce multiple results, perhaps each destined for different downstream computations, and each of those results often provides only part of an overall datum, a datum which may be partially or completely updated by several computations during its lifetime -- plus, any particular datum might best be updated on a first-come first-served basis, rather than in some deterministic order.  These are not so natural for data flow languages and/or approaches.

F-Nets and ScalPL overcome these constraints fairly directly by providing separate constructs for the control state (i.e. accessibility) and the content state (i.e. content) of a variable.  The control state is related to the flow of control, and content state, the flow of data, and like traditional data flow models, both are related to the variable (called a resource), rather than to an independent control flow.   By allowing manipulation of control state and content state explicitly and independently, flexibility, efficiency, and portability ensue.
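
To illustrate the separation (in plain Python with hypothetical names, not F-Nets notation): a resource carries both a content state and a control state, and the two can be manipulated independently, so content can be updated in place without yet signaling availability, and availability can be signaled without rewriting the content.

```python
class Resource:
    """Toy resource: content state (the data) plus control state (accessibility)."""
    def __init__(self, content, control="writable"):
        self.content = content             # content state: the data itself
        self.control = control             # control state: e.g. "writable", "readable"

r = Resource(content=[0] * 1000)
r.content[17] = 99                         # update content in place; nothing signaled yet
r.control = "readable"                     # separately, hand the resource downstream
```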

It does introduce some complications.  Can a single construct still really represent either broadcast (or multicast) and shared readers?  How about buffering on distributed systems while avoiding extra overhead of queuing in shared memory?  Read and write locks?  (Can making an extra copy be made semantically and syntactically identical to blocking writers waiting for readers to finish?)  And all in the presence of non-terminating functions and Turing undecidability?  And while hiding latency on platforms and applications where that applies?  It takes some finesse in definitions, but it can all be done, and these fine points are the essence of F-Nets.

OK, fine, even assuming I've convinced you that F-Nets/ScalPL is a great portable representation, that still leaves the original question:  How, when, and by what, does the issue of data movement actually get settled?  I'll address some of that in the next entry.  (A more complete explanation is in chapters 4 and 5 of the book.)