1.1 --- a/docs/tutorial.xhtml Sat Sep 15 23:46:55 2007 +0000
1.2 +++ b/docs/tutorial.xhtml Sat Sep 15 23:47:05 2007 +0000
1.3 @@ -3,17 +3,17 @@
1.4 <head>
1.5 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
1.6 <title>pprocess - Tutorial</title>
1.7 + <link href="styles.css" rel="stylesheet" type="text/css" />
1.8 </head>
1.9 <body>
1.10
1.11 <h1>pprocess - Tutorial</h1>
1.12
1.13 -<p>The <code>pprocess</code>
1.14 -module provides several mechanisms for running Python code concurrently
1.15 -in several processes. The most straightforward way of making a program
1.16 -parallel-aware - that is, where the program can take advantage of more
1.17 -than one processor to simultaneously process data - is to use the
1.18 -<code>pmap</code> function.</p>
1.19 +<p>The <code>pprocess</code> module provides several mechanisms for running
1.20 +Python code concurrently in several processes. The most straightforward way of
1.21 +making a program parallel-aware - that is, where the program can take
1.22 +advantage of more than one processor to simultaneously process data - is to
1.23 +use the <code>pmap</code> function.</p>
1.24
1.25 <h2>Converting Map-Style Code</h2>
1.26
1.27 @@ -42,16 +42,30 @@
1.28
1.29 print "Time taken:", time.time() - t</pre>
1.30
1.31 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_map.py</code> file.)</p>
1.32 +<p>(This code in context with <code>import</code> statements and functions is
1.33 +found in the <code>examples/simple_map.py</code> file.)</p>
1.34 +
1.35 +<p>The principal features of this program involve the preparation of an array
1.36 +for input purposes, and the use of the <code>map</code> function to iterate
1.37 +over the combinations of <code>i</code> and <code>j</code> in the array. Even
1.38 +if the <code>calculate</code> function could be invoked independently for each
1.39 +input value, we have to wait for each computation to complete before
1.40 +initiating a new one. The <code>calculate</code> function may be defined as
1.41 +follows:</p>
1.42
1.43 -<p>The principal features of this program involve the preparation of an array for input purposes, and the use of the <code>map</code> function to iterate over the combinations of <code>i</code> and <code>j</code> in the array. Even if the <code>calculate</code>
1.44 -function could be invoked independently for each input value, we have
1.45 -to wait for each computation to complete before initiating a new
1.46 -one.</p>
1.47 +<pre>
1.48 +def calculate(t):
1.49 +
1.50 + "A supposedly time-consuming calculation on 't'."
1.51
1.52 -<p>In order to reduce the processing time - to speed the code up,
1.53 -in other words - we can make this code use several processes instead of
1.54 -just one. Here is the modified code:</p>
1.55 + i, j = t
1.56 + time.sleep(delay)
1.57 + return i * N + j
1.58 +</pre>
1.59 +
1.60 +<p>In order to reduce the processing time - to speed the code up, in other
1.61 +words - we can make this code use several processes instead of just one. Here
1.62 +is the modified code:</p>
1.63
1.64 <pre>
1.65 t = time.time()
1.66 @@ -76,15 +90,20 @@
1.67
1.68 print "Time taken:", time.time() - t</pre>
1.69
1.70 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_pmap.py</code> file.)</p>
1.71 +<p>(This code in context with <code>import</code> statements and functions is
1.72 +found in the <code>examples/simple_pmap.py</code> file.)</p>
1.73
1.74 -<p>By replacing usage of the <code>map</code> function with the <code>pprocess.pmap</code>
1.75 -function, and specifying the limit on the number of processes to be active at any
1.76 -given time, several calculations can now be performed in parallel.</p>
1.77 +<p>By replacing usage of the <code>map</code> function with the
1.78 +<code>pprocess.pmap</code> function, and specifying the limit on the number of
1.79 +processes to be active at any given time (the value of the <code>limit</code>
1.80 +variable is defined elsewhere), several calculations can now be performed in
1.81 +parallel.</p>
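The same map-to-parallel-map conversion can be sketched using only the standard library. This is not `pprocess` itself: in this sketch, written in modern Python 3 syntax, `concurrent.futures` threads stand in for `pprocess`'s processes (the sleeping "work" still overlaps in time), `max_workers` plays the role of the `limit` variable, and the values of `N` and `delay` are assumed for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 4        # grid size (assumed value)
delay = 0.2  # simulated work time per call (assumed value)

def calculate(t):
    "A supposedly time-consuming calculation on 't'."
    i, j = t
    time.sleep(delay)
    return i * N + j

inputs = [(i, j) for i in range(N) for j in range(N)]

start = time.time()
# max_workers limits concurrency, much as 'limit' does for pmap.
with ThreadPoolExecutor(max_workers=N) as executor:
    # Like pmap, executor.map yields results in input order.
    results = list(executor.map(calculate, inputs))
elapsed = time.time() - start

print(results)
print("Time taken:", elapsed)
```

Because the simulated work is a sleep rather than computation, the calls overlap even on a single processor, and the elapsed time is well below the sequential total of `N * N * delay` seconds.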
1.82
1.83 <h2>Converting Invocations to Parallel Operations</h2>
1.84
1.85 -<p>Although some programs make natural use of the <code>map</code> function, others may employ an invocation in a nested loop. This may also be converted to a parallel program. Consider the following Python code:</p>
1.86 +<p>Although some programs make natural use of the <code>map</code> function,
1.87 +others may employ an invocation in a nested loop. This may also be converted
1.88 +to a parallel program. Consider the following Python code:</p>
1.89
1.90 <pre>
1.91 t = time.time()
1.92 @@ -109,23 +128,41 @@
1.93
1.94 print "Time taken:", time.time() - t</pre>
1.95
1.96 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple1.py</code> file.)</p>
1.97 +<p>(This code in context with <code>import</code> statements and functions is
1.98 +found in the <code>examples/simple1.py</code> file.)</p>
1.99 +
1.100 +<p>Here, a computation in the <code>calculate</code> function is performed for
1.101 +each combination of <code>i</code> and <code>j</code> in the nested loop,
1.102 +returning a result value. However, we must wait for the completion of this
1.103 +function for each element before moving on to the next element, and this means
1.104 +that the computations are performed sequentially. Consequently, on a system
1.105 +with more than one processor, even if we could call <code>calculate</code> for
1.106 +more than one combination of <code>i</code> and <code>j</code>
1.107 +and have the computations executing at the same time, the above program will
1.108 +not take advantage of such capabilities.</p>
1.109
1.110 -<p>Here, a computation in the <code>calculate</code> function is performed for each combination of <code>i</code> and <code>j</code>
1.111 -in the nested loop, returning a result value. However, we must wait for
1.112 -the completion of this function for each element before moving on to
1.113 -the next element, and this means that the computations are performed
1.114 -sequentially. Consequently, on a system with more than one processor,
1.115 -even if we could call <code>calculate</code> for more than one combination of <code>i</code> and <code>j</code><code></code> and have the computations executing at the same time, the above program will not take advantage of such capabilities.</p>
1.116 +<p>We use a slightly modified version of <code>calculate</code> which employs
1.117 +two parameters instead of one:</p>
1.118 +
1.119 +<pre>
1.120 +def calculate(i, j):
1.121
1.122 -<p>In order to reduce the processing time - to speed the code up,
1.123 -in other words - we can make this code use several processes instead of
1.124 -just one. Here is the modified code:</p>
1.125 + """
1.126 + A supposedly time-consuming calculation on 'i' and 'j'.
1.127 + """
1.128 +
1.129 + time.sleep(delay)
1.130 + return i * N + j
1.131 +</pre>
1.132 +
1.133 +<p>In order to reduce the processing time - to speed the code up, in other
1.134 +words - we can make this code use several processes instead of just one. Here
1.135 +is the modified code:</p>
1.136
1.137 <pre>
1.138 t = time.time()
1.139
1.140 - # Initialise the results using map with a limit on the number of
1.141 + # Initialise the results using a map with a limit on the number of
1.142 # channels/processes.
1.143
1.144 <strong>results = pprocess.Map(limit=limit)</strong>
1.145 @@ -150,12 +187,343 @@
1.146
1.147 print "Time taken:", time.time() - t</pre>
1.148
1.149 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_manage_map.py</code> file.)</p>
1.150 +<p>(This code in context with <code>import</code> statements and functions is
1.151 +found in the <code>examples/simple_manage_map.py</code> file.)</p>
1.152 +
1.153 +<p>The principal changes in the above code involve the use of a
1.154 +<code>pprocess.Map</code> object to collect the results, and a version of the
1.155 +<code>calculate</code> function which is managed by the <code>Map</code>
1.156 +object. What the <code>Map</code> object does is to arrange the results of
1.157 +computations such that iterating over the object or accessing the object using
1.158 +list operations provides the results in the same order as their corresponding
1.159 +inputs.</p>
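The ordering behaviour described above can be demonstrated with a small standard-library sketch (again not `pprocess`: threads stand in for processes, and the delays are deliberately chosen so that later inputs finish first):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def calculate(t):
    "Finish earlier for later inputs, so completion order differs from input order."
    i, j = t
    time.sleep(0.05 * (3 - j))
    return i * 3 + j

inputs = [(i, j) for i in range(3) for j in range(3)]

# All nine calls run at once; executor.map still yields results in
# submission order, which is the property the Map object provides.
with ThreadPoolExecutor(max_workers=9) as executor:
    ordered = list(executor.map(calculate, inputs))

print(ordered)
```

Even though the computations complete in roughly reverse order, the collected results follow the order of the inputs.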
1.160 +
1.161 +<h2>Converting Arbitrarily-Ordered Invocations</h2>
1.162 +
1.163 +<p>In some programs, it is not important to receive the results of
1.164 +computations in any particular order, usually because either the order of
1.165 +these results is irrelevant, or because the results provide "positional"
1.166 +information which lets them be handled in an appropriate way. Consider the
1.167 +following Python code:</p>
1.168 +
1.169 +<pre>
1.170 + t = time.time()
1.171 +
1.172 + # Initialise an array.
1.173 +
1.174 + results = [0] * N * N
1.175 +
1.176 + # Perform the work.
1.177 +
1.178 + print "Calculating..."
1.179 + for i in range(0, N):
1.180 + for j in range(0, N):
1.181 + i2, j2, result = calculate(i, j)
1.182 + results[i2*N+j2] = result
1.183 +
1.184 + # Show the results.
1.185 +
1.186 + for i in range(0, N):
1.187 + for result in results[i*N:i*N+N]:
1.188 + print result,
1.189 + print
1.190 +
1.191 + print "Time taken:", time.time() - t
1.192 +</pre>
1.193 +
1.194 +<p>(This code in context with <code>import</code> statements and functions is
1.195 +found in the <code>examples/simple2.py</code> file.)</p>
1.196 +
1.197 +<p>Here, a result array is initialised first and each computation is performed
1.198 +sequentially. A significant difference to the previous examples is the return
1.199 +value of the <code>calculate</code> function: the position details
1.200 +corresponding to <code>i</code> and <code>j</code> are returned alongside the
1.201 +result. Obviously, this is of limited value in the above code because the
1.202 +order of the computations and the reception of results is fixed. Moreover, we
1.203 +get no benefit from parallelisation in the above example.</p>
1.204 +
1.205 +<p>We can bring the benefits of parallel processing to the above program with
1.206 +the following code:</p>
1.207 +
1.208 +<pre>
1.209 + t = time.time()
1.210 +
1.211 + # Initialise the communications queue with a limit on the number of
1.212 + # channels/processes.
1.213 +
1.214 + <strong>queue = pprocess.Queue(limit=limit)</strong>
1.215 +
1.216 + # Initialise an array.
1.217 +
1.218 + results = [0] * N * N
1.219 +
1.220 + # Wrap the calculate function and manage it.
1.221 +
1.222 + <strong>calc = queue.manage(pprocess.MakeParallel(calculate))</strong>
1.223 +
1.224 + # Perform the work.
1.225 +
1.226 + print "Calculating..."
1.227 + for i in range(0, N):
1.228 + for j in range(0, N):
1.229 + <strong>calc(i, j)</strong>
1.230 +
1.231 + # Store the results as they arrive.
1.232 +
1.233 + print "Finishing..."
1.234 + <strong>for i, j, result in queue:</strong>
1.235 + <strong>results[i*N+j] = result</strong>
1.236 +
1.237 + # Show the results.
1.238 +
1.239 + for i in range(0, N):
1.240 + for result in results[i*N:i*N+N]:
1.241 + print result,
1.242 + print
1.243 +
1.244 + print "Time taken:", time.time() - t
1.245 +</pre>
1.246 +
1.247 +<p>(This code in context with <code>import</code> statements and functions is
1.248 +found in the <code>examples/simple_managed_queue.py</code> file.)</p>
1.249 +
1.250 +<p>This revised code employs a <code>pprocess.Queue</code> object whose
1.251 +purpose is to collect the results of computations and to make them available
1.252 +in the order in which they were received. The code collecting results has been
1.253 +moved into a separate loop independent of the original computation loop and
1.254 +taking advantage of the more relevant "positional" information emerging from
1.255 +the queue.</p>
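For comparison, the standard library expresses the same arrival-order idea with `concurrent.futures.as_completed`; this sketch uses threads rather than `pprocess`'s processes, with assumed values for `N` and `delay`:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def calculate(i, j):
    "A supposedly time-consuming calculation on 'i' and 'j'."
    time.sleep(delay)
    return i, j, i * N + j

results = [0] * N * N

with ThreadPoolExecutor(max_workers=N) as executor:
    futures = [executor.submit(calculate, i, j)
               for i in range(N) for j in range(N)]
    # Results arrive in completion order, like iterating over the Queue...
    for future in as_completed(futures):
        i, j, result = future.result()
        # ...but the "positional" information restores the intended layout.
        results[i * N + j] = result

print(results)
```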
1.256 +
1.257 +<p>We can take this example further, illustrating some of the mechanisms
1.258 +employed by <code>pprocess</code>. Instead of collecting results in a queue,
1.259 +we can define a class containing a method which is called when new results
1.260 +arrive:</p>
1.261 +
1.262 +<pre>
1.263 +class MyExchange(pprocess.Exchange):
1.264 +
1.265 + "Parallel convenience class containing the array assignment operation."
1.266 +
1.267 + def store_data(self, ch):
1.268 + i, j, result = ch.receive()
1.269 + self.D[i*N+j] = result
1.270 +</pre>
1.271 +
1.272 +<p>This code exposes the channel paradigm which is used throughout
1.273 +<code>pprocess</code> and is available to applications, if desired. The effect
1.274 +of the method is the storage of a result received through the channel in an
1.275 +attribute of the object. The following code shows how this class can be used,
1.276 +with differences to the previous program illustrated:</p>
1.277 +
1.278 +<pre>
1.279 + t = time.time()
1.280 +
1.281 + # Initialise the communications exchange with a limit on the number of
1.282 + # channels/processes.
1.283 +
1.284 + <strong>exchange = MyExchange(limit=limit)</strong>
1.285 +
1.286 + # Initialise an array - it is stored in the exchange to permit automatic
1.287 + # assignment of values as the data arrives.
1.288 +
1.289 + <strong>results = exchange.D = [0] * N * N</strong>
1.290 +
1.291 + # Wrap the calculate function and manage it.
1.292 +
1.293 + calc = <strong>exchange</strong>.manage(pprocess.MakeParallel(calculate))
1.294 +
1.295 + # Perform the work.
1.296 +
1.297 + print "Calculating..."
1.298 + for i in range(0, N):
1.299 + for j in range(0, N):
1.300 + calc(i, j)
1.301 +
1.302 + # Wait for the results.
1.303 +
1.304 + print "Finishing..."
1.305 + <strong>exchange.finish()</strong>
1.306 +
1.307 + # Show the results.
1.308 +
1.309 + for i in range(0, N):
1.310 + for result in results[i*N:i*N+N]:
1.311 + print result,
1.312 + print
1.313 +
1.314 + print "Time taken:", time.time() - t
1.315 +</pre>
1.316
1.317 -<p>The principal changes in the above code involve the use of a <code>pprocess.Map</code> object to collect the results, and a version of the <code>calculate</code> function which is managed by the <code>Map</code> object. What the <code>Map</code>
1.318 -object does is to arrange the results of computations such that
1.319 -iterating over the object or accessing the object using list operations
1.320 -provides the results in the same order as their corresponding inputs.</p>
1.321 +<p>(This code in context with <code>import</code> statements and functions is
1.322 +found in the <code>examples/simple_managed.py</code> file.)</p>
1.323 +
1.324 +<p>The main visible differences between this and the previous program are the
1.325 +storage of the result array in the exchange, the removal of the queue
1.326 +consumption code from the main program, placing the act of storing values in
1.327 +the exchange's <code>store_data</code> method, and the need to call the
1.328 +<code>finish</code> method on the <code>MyExchange</code> object so that we do
1.329 +not try to access the results too soon. One underlying benefit not visible in
1.330 +the above code is that we no longer need to accumulate results in a queue or
1.331 +other structure so that they may be processed and assigned to the correct
1.332 +positions in the result array.</p>
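The "invoke a method as each result arrives" style of the exchange resembles future callbacks in the standard library. In this hedged sketch, `store_data` is a callback of our own devising rather than part of any `pprocess` API, `D` plays the role of the array stored in the exchange, and threads again stand in for processes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 3          # grid size (assumed value)
delay = 0.05   # simulated work time per call (assumed value)
D = [0] * N * N  # result array, like exchange.D

def calculate(i, j):
    "A supposedly time-consuming calculation on 'i' and 'j'."
    time.sleep(delay)
    return i, j, i * N + j

def store_data(future):
    "Invoked as each result arrives, like MyExchange.store_data."
    i, j, result = future.result()
    D[i * N + j] = result

with ThreadPoolExecutor(max_workers=N) as executor:
    for i in range(N):
        for j in range(N):
            executor.submit(calculate, i, j).add_done_callback(store_data)
# Leaving the 'with' block waits for all work and callbacks to finish,
# playing the role of exchange.finish().

print(D)
```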
1.333 +
1.334 +<p>For the curious, we may remove some of the remaining conveniences of the
1.335 +above program to expose other features of <code>pprocess</code>. First, we
1.336 +define a slightly modified version of the <code>calculate</code> function:</p>
1.337 +
1.338 +<pre>
1.339 +def calculate(ch, i, j):
1.340 +
1.341 + """
1.342 + A supposedly time-consuming calculation on 'i' and 'j', using 'ch' to
1.343 + communicate with the parent process.
1.344 + """
1.345 +
1.346 + time.sleep(delay)
1.347 + ch.send((i, j, i * N + j))
1.348 +</pre>
1.349 +
1.350 +<p>This function accepts a channel, <code>ch</code>, through which results
1.351 +will be sent, and through which other values could potentially be received,
1.352 +although we choose not to do so here. The program using this function is as
1.353 +follows, with differences to the previous program illustrated:</p>
1.354 +
1.355 +<pre>
1.356 + t = time.time()
1.357 +
1.358 + # Initialise the communications exchange with a limit on the number of
1.359 + # channels/processes.
1.360 +
1.361 + exchange = MyExchange(limit=limit)
1.362 +
1.363 + # Initialise an array - it is stored in the exchange to permit automatic
1.364 + # assignment of values as the data arrives.
1.365 +
1.366 + results = exchange.D = [0] * N * N
1.367 +
1.368 + # Perform the work.
1.369 +
1.370 + print "Calculating..."
1.371 + for i in range(0, N):
1.372 + for j in range(0, N):
1.373 + <strong>exchange.start(calculate, i, j)</strong>
1.374 +
1.375 + # Wait for the results.
1.376 +
1.377 + print "Finishing..."
1.378 + exchange.finish()
1.379 +
1.380 + # Show the results.
1.381 +
1.382 + for i in range(0, N):
1.383 + for result in results[i*N:i*N+N]:
1.384 + print result,
1.385 + print
1.386 +
1.387 + print "Time taken:", time.time() - t
1.388 +</pre>
1.389 +
1.390 +<p>(This code in context with <code>import</code> statements and functions is
1.391 +found in the <code>examples/simple_start.py</code> file.)</p>
1.392 +
1.393 +<p>Here, we have discarded two conveniences: the wrapping of callables using
1.394 +<code>MakeParallel</code>, which lets us use functions without providing any
1.395 +channel parameters, and the management of callables using the
1.396 +<code>manage</code> method on queues, exchanges, and so on. The
1.397 +<code>start</code> method still calls the provided callable, but using a
1.398 +different notation from that employed previously.</p>
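The underlying channel idea can be imitated with a `queue.Queue` shared between a parent and its workers. This is only an analogy: `pprocess` channels connect separate processes, whereas this sketch uses threads, and the `put`/`get` calls stand in for `send` and `receive`:

```python
import time
import threading
import queue

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def calculate(ch, i, j):
    "Send the result over 'ch' instead of returning it."
    time.sleep(delay)
    ch.put((i, j, i * N + j))      # analogous to ch.send(...)

channel = queue.Queue()
workers = [threading.Thread(target=calculate, args=(channel, i, j))
           for i in range(N) for j in range(N)]
for w in workers:
    w.start()

results = [0] * N * N
for _ in range(N * N):             # receive one message per worker
    i, j, result = channel.get()   # analogous to ch.receive()
    results[i * N + j] = result

for w in workers:
    w.join()

print(results)
```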
1.399 +
1.400 +<h2>Converting Inline Computations</h2>
1.401 +
1.402 +<p>Although many programs employ functions and other useful abstractions which
1.403 +can be treated as parallelisable units, some programs perform computations
1.404 +"inline", meaning that the code responsible appears directly within a loop or
1.405 +related control-flow construct. Consider the following code:</p>
1.406 +
1.407 +<pre>
1.408 + t = time.time()
1.409 +
1.410 + # Initialise an array.
1.411 +
1.412 + results = [0] * N * N
1.413 +
1.414 + # Perform the work.
1.415 +
1.416 + print "Calculating..."
1.417 + for i in range(0, N):
1.418 + for j in range(0, N):
1.419 + time.sleep(delay)
1.420 + results[i*N+j] = i * N + j
1.421 +
1.422 + # Show the results.
1.423 +
1.424 + for i in range(0, N):
1.425 + for result in results[i*N:i*N+N]:
1.426 + print result,
1.427 + print
1.428 +
1.429 + print "Time taken:", time.time() - t
1.430 +</pre>
1.431 +
1.432 +<p>(This code in context with <code>import</code> statements and functions is
1.433 +found in the <code>examples/simple.py</code> file.)</p>
1.434 +
1.435 +<p>To simulate "work", as in the different versions of the
1.436 +<code>calculate</code> function, we use the <code>time.sleep</code> function
1.437 +(which does not actually do work, and which will cause a process to be
1.438 +descheduled in most cases, but which simulates the delay associated with work
1.439 +being done). This inline work, which must be performed sequentially in the
1.440 +above program, can be performed in parallel in a somewhat modified version of
1.441 +the program:</p>
1.442 +
1.443 +<pre>
1.444 + t = time.time()
1.445 +
1.446 + # Initialise the results using a map with a limit on the number of
1.447 + # channels/processes.
1.448 +
1.449 + <strong>results = pprocess.Map(limit=limit)</strong>
1.450 +
1.451 + # Perform the work.
1.452 + # NOTE: Could use the with statement in the loop to package the
1.453 + # NOTE: try...finally functionality.
1.454 +
1.455 + print "Calculating..."
1.456 + for i in range(0, N):
1.457 + for j in range(0, N):
1.458 + <strong>ch = results.create()</strong>
1.459 + <strong>if ch:</strong>
1.460 + <strong>try: # Calculation work.</strong>
1.461 +
1.462 + time.sleep(delay)
1.463 + <strong>ch.send(i * N + j)</strong>
1.464 +
1.465 + <strong>finally: # Important finalisation.</strong>
1.466 +
1.467 + <strong>pprocess.exit(ch)</strong>
1.468 +
1.469 + # Show the results.
1.470 +
1.471 + for i in range(0, N):
1.472 + for result in results[i*N:i*N+N]:
1.473 + print result,
1.474 + print
1.475 +
1.476 + print "Time taken:", time.time() - t
1.477 +</pre>
1.478 +
1.479 +<p>(This code in context with <code>import</code> statements and functions is
1.480 +found in the <code>examples/simple_create_map.py</code> file.)</p>
1.481 +
1.482 +<p>Although seemingly more complicated, the bulk of the changes in this
1.483 +modified program are focused on obtaining a channel object, <code>ch</code>,
1.484 +at the point where the computations are performed, and the wrapping of the
1.485 +computation code in a <code>try</code>...<code>finally</code> statement which
1.486 +ensures that the process associated with the channel exits when the
1.487 +computation is complete. In order for the results of these computations to be
1.488 +collected, a <code>pprocess.Map</code> object is used, since it will maintain
1.489 +the results in the same order as the initiation of the computations which
1.490 +produced them.</p>
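For readers curious about what channel creation and process exit involve beneath the conveniences, the pattern can be imitated directly (on Unix only) with `os.fork` and `os.pipe`. Everything here, from the `create` helper to the pickled message format, is our own illustrative stand-in rather than `pprocess` internals:

```python
import os
import pickle
import time

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def create():
    "Fork a child with a pipe to the parent; a crude stand-in for results.create()."
    r, w = os.pipe()
    pid = os.fork()
    return pid, r, w

results = [0] * N * N
readers = []

for i in range(N):
    for j in range(N):
        pid, r, w = create()
        if pid == 0:                 # child: the inline computation
            os.close(r)
            try:
                time.sleep(delay)
                os.write(w, pickle.dumps((i, j, i * N + j)))
            finally:
                os._exit(0)          # like pprocess.exit(ch): always terminate
        os.close(w)                  # parent keeps only the read end
        readers.append(r)

for r in readers:                    # collect one message per child
    data = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:                # EOF: the child has exited
            break
        data += chunk
    os.close(r)
    i, j, result = pickle.loads(data)
    results[i * N + j] = result

print(results)
```

The `try`...`finally` in the child mirrors the tutorial's pattern: however the computation ends, the child process exits rather than falling through into the parent's code.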
1.491
1.492 </body>
1.493 </html>