# HG changeset patch
# User Paul Boddie pmap
function.
For a brief summary of each of the features of pprocess, see the reference document.
+The way pprocess uses multiple processes to perform work in
+parallel involves the fork system call, which on modern operating
+systems involves what is known as "copy-on-write" semantics. In plain language,
+when pprocess creates a new child process to perform work
+in parallel with other work that needs to be done, this new process will be a
+near-identical copy of the original parent process, and the running
+code will be able to access data resident in that parent process.
+However, when a child process modifies data, instead of changing that data
+in such a way that the parent process can see the modifications, the parent
+process will, in fact, remain oblivious to such changes. What happens is that
+as soon as the child process attempts to modify the data, it obtains its own
+separate copy which is then modified independently of the original data. Thus,
+a copy of any data is made when an attempt is made to write
+to such data. Meanwhile, the parent's copy of that data will be left untouched
+by the activities of the child.
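The behaviour described above can be observed without pprocess at all. The following is a minimal sketch that calls os.fork directly, assuming a Unix-like system where that call is available; the function name child_modification_is_private is hypothetical and chosen only for this illustration.

```python
import os

def child_modification_is_private():
    """Fork a child that appends to a list; return the parent's view afterwards."""
    data = [1, 2]
    pid = os.fork()
    if pid == 0:
        # Child process: this append affects only the child's own copy.
        data.append(3)
        # Leave immediately so the child never runs the parent's remaining code.
        os._exit(0)
    # Parent process: wait for the child to finish, then inspect our own copy.
    os.waitpid(pid, 0)
    return data

if __name__ == "__main__":
    print(child_modification_is_private())  # the parent still sees [1, 2]
```

The child's append succeeds in the child, but the list returned in the parent is unchanged.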
+
+It is therefore essential to note that any data distributed to other
+processes, and which will then be modified by those processes, will not appear
+to change in the parent process even if the objects employed are mutable. This
+is rather different to the behaviour of a normal Python program: a function
+that modifies a list passed to it, for example, mutates that list in such a
+way that upon returning from that function the modifications will still be
+present. For example:
+
+def mutator(l):
+    l.append(3)
+
+l = [1, 2]
+mutator(l) # l is now [1, 2, 3]
+
+In contrast, passing a list to a child process will cause the list to
+mutate in the child process, but the parent process will not see the list
+change. For example:
+
+def mutator(l):
+    l.append(3)
+
+results = pprocess.Map()
+mutator = results.manage(pprocess.MakeParallel(mutator))
+
+l = [1, 2]
+mutator(l) # l is still [1, 2] in the parent process
+
+To communicate changes to data between processes, the modified objects must
+be explicitly returned from child processes using the mechanisms described in
+this documentation. For example:
+
+def mutator(l):
+    l.append(3)
+    return l # the modified object is explicitly returned
+
+results = pprocess.Map()
+mutator = results.manage(pprocess.MakeParallel(mutator))
+
+l = [1, 2]
+mutator(l)
+
+all_l = results[:] # there are potentially many results, not just one
+l = all_l[0] # l is now [1, 2, 3], taken from the first result
+
+It is perhaps easiest to think of the communications mechanisms as
+providing a gateway between processes through which information can be passed,
+with the rest of a program's data being private and hidden from the other
+processes (even if that data initially resembles what the other processes also
+see within themselves).
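To make the gateway idea concrete, here is a rough sketch of such a channel built from a pipe and pickle. It is not a description of how pprocess itself communicates: the helper run_in_child is hypothetical, and the code assumes a Unix-like system providing os.fork.

```python
import os
import pickle

def run_in_child(func, arg):
    """Run func(arg) in a forked child and pass the result back through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: compute the result and send it through the gateway.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as writer:
                pickle.dump(func(arg), writer)
            status = 0
        finally:
            # Always leave here so the child never runs the parent's code.
            os._exit(status)
    # Parent process: only what arrives through the pipe becomes visible here.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        result = pickle.load(reader)
    os.waitpid(pid, 0)
    return result

def mutator(l):
    l.append(3)
    return l  # the modified object is explicitly returned

if __name__ == "__main__":
    original = [1, 2]
    returned = run_in_child(mutator, original)
    print(original, returned)
```

Only the pickled return value crosses between the processes; the parent's own list is never touched. If the child fails before writing anything, pickle.load raises EOFError in the parent, which a fuller implementation would want to handle.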
+Consider a program using the built-in map function and a sequence of inputs: