Elemental Parallelism: What about data movement?

I received a recent question from someone quite familiar with the fields of high-performance computing and data flow, so I figured others might share his same concerns, and answering here might kill more birds than a personal reply. (Just in case there are privacy concerns, I'll not cite his precise name or words.)

It included statements with which I wholeheartedly agree -- e.g. that data movement is very expensive in modern compute environments (e.g. relative to local computation). He says that the F-Net model doesn't say anything about when and where data should move, which he says is required of any modern model.

It is, in fact, a central goal of F-Nets (and ScalPL) to put as few constraints as possible on how and when data might move in order to carry out the specified plan (aka execute the program, evaluate the mapping or calculation) -- but enough constraints so that the plan can be efficiently implemented on different platforms (e.g. shared and distributed memory) by knowing (in advance, when possible) when colocation of data and program/plan fragments must occur. Even the term "data movement" might be confusing here, in that the model doesn't preclude moving parts of the plan itself instead of the data upon which those parts operate -- but those, too, can be considered as data during movement. F-Nets/ScalPL is all about expressing (without overexpressing) those constraints -- i.e. the kinds of access each subcomputation (acton) will need to which data (resources), and (assuming that each subcomputation (acton) is to be statically evaluated without migration) the data and program fragments that need to be colocated for this to occur. It is designed to be neither too much nor too little information.

One can be forgiven for assuming that satisfactory forms of portable algorithmic expression already exist, e.g. functional or dataflow languages. As a brief overview: Languages and models fall broadly into control flow and data flow camps, where the former defines what gets to evaluate next based on a program counter (the functions called or objects invoked/messaged), and the latter defines it based on which data is available to compute with. Functional languages are closely related to dataflow languages, but the interpretation of a variable is slightly different, as simply a shorthand for an entity like a numeric value or subfunction, rather than a dynamic entity which becomes filled/defined or not. Traditional parallel constructs (e.g. messages or shared memory) are often sort of a hybrid between control and data flow, where there are several locusts of control ("threads") that are selectively intentionally blocked when necessary to wait for data to become available (from another thread).

Data flow approaches are traditionally facilitated by single-assignment variables (or arcs) that change state to signify when data is available there for use by downstream calculation. (Lazy and non-strict evaluation can allow evaluation to progress even when some of the potentially-needed data isn't available.) But as indicated by the nomenclature, the variables can be assigned to only once (or simply used/considered as shorthand for another expression), in essence ignoring a central feature of virtually all computing hardware, including those in nature: Mutable memory, the ability to change the contents or character of something stored without changing where it is stored. As a result, when only part of a large datum is being altered in these languages, the entire result must be defined in terms of the entire previous value -- very unlike what needs to actually happen internally for any efficiency. That is, assignment to a data flow variable has nothing much to do with updating storage, but rather with signaling the availability of the data to some part of the program. Because this signaling is integrated into the assignment, it is impossible (or unnatural) to signal availability without assignment, or to assign without signaling availability.

These shortcomings are not always obvious in examples. For example, functions (or algorithms) which are primarily mathematical in nature (e.g. produce a single deterministic result from many arguments), and which don't benefit by partial assignment and/or update in place don't really need anything more that traditional data flow. However, in real world programming, operations often produce multiple results, perhaps each destined for different downstream computations, and each of those results often provides only part of an overall datum, a datum which may be partially or completely updated by several computations during its lifetime -- plus, any particular datum might best be updated on a first-come first-served basis, rather than some deterministic order. These are not so natural for data flow languages and/or approaches.

F-Nets and ScalPL overcome these constraints fairly directly by providing separate constructs for the control state (i.e. accessibility) and the content state (i.e. content) of a variable. The control state is related to the flow of control, and content state, the flow of data, and like traditional data flow models, both are related to the variable (called a resource), rather than to an independent control flow. By allowing manipulation of control state and content state explicitly and independently, flexibility, efficiency, and portability ensue.

It does introduce some complications. Can a single construct still really represent either broadcast (or multicast) and shared readers? How about buffering on distributed systems while avoiding extra overhead of queuing in shared memory? Read and write locks? (Can making an extra copy be made semantically and syntactically identical to blocking writers waiting for readers to finish?) And all in the presence of non-terminating functions and Turing undecidability? And while hiding latency on platforms and applications where that applies? It takes some finesse in definitions, but it can all be done, and these fine points are the essence of F-Nets.

OK, fine, even assuming I've convinced you that F-Nets/ScalPL is a great portable representation, it still begs the original question: How, when, and by what, does the issue of data movement actually get settled? I'll address some of that in the next entry. (A more complete explanation is in chapters 4 and 5 of the book.)

Elemental Parallelism

Monday, March 11, 2013

What about data movement?

No comments: