1.1 --- a/docs/tutorial.xhtml Sat Sep 15 23:46:55 2007 +0000
1.2 +++ b/docs/tutorial.xhtml Sat Sep 15 23:47:05 2007 +0000
1.3 @@ -3,17 +3,17 @@
1.4 <head>
1.5 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
1.6 <title>pprocess - Tutorial</title>
1.7 + <link href="styles.css" rel="stylesheet" type="text/css" />
1.8 </head>
1.9 <body>
1.10
1.11 <h1>pprocess - Tutorial</h1>
1.12
1.13 -<p>The <code>pprocess</code>
1.14 -module provides several mechanisms for running Python code concurrently
1.15 -in several processes. The most straightforward way of making a program
1.16 -parallel-aware - that is, where the program can take advantage of more
1.17 -than one processor to simultaneously process data - is to use the
1.18 -<code>pmap</code> function.</p>
1.19 +<p>The <code>pprocess</code> module provides several mechanisms for running
1.20 +Python code concurrently in several processes. The most straightforward way of
1.21 +making a program parallel-aware - that is, where the program can take
1.22 +advantage of more than one processor to simultaneously process data - is to
1.23 +use the <code>pmap</code> function.</p>
1.24
1.25 <h2>Converting Map-Style Code</h2>
1.26
1.27 @@ -42,16 +42,30 @@
1.28
1.29 print "Time taken:", time.time() - t</pre>
1.30
1.31 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_map.py</code> file.)</p>
1.32 +<p>(This code in context with <code>import</code> statements and functions is
1.33 +found in the <code>examples/simple_map.py</code> file.)</p>
1.34 +
1.35 +<p>The principal features of this program involve the preparation of an array
1.36 +for input purposes, and the use of the <code>map</code> function to iterate
1.37 +over the combinations of <code>i</code> and <code>j</code> in the array. Even
1.38 +if the <code>calculate</code> function could be invoked independently for each
1.39 +input value, we have to wait for each computation to complete before
1.40 +initiating a new one. The <code>calculate</code> function may be defined as
1.41 +follows:</p>
1.42
1.43 -<p>The principal features of this program involve the preparation of an array for input purposes, and the use of the <code>map</code> function to iterate over the combinations of <code>i</code> and <code>j</code> in the array. Even if the <code>calculate</code>
1.44 -function could be invoked independently for each input value, we have
1.45 -to wait for each computation to complete before initiating a new
1.46 -one.</p>
1.47 +<pre>
1.48 +def calculate(t):
1.49 +
1.50 + "A supposedly time-consuming calculation on 't'."
1.51
1.52 -<p>In order to reduce the processing time - to speed the code up,
1.53 -in other words - we can make this code use several processes instead of
1.54 -just one. Here is the modified code:</p>
1.55 + i, j = t
1.56 + time.sleep(delay)
1.57 + return i * N + j
1.58 +</pre>
1.59 +
1.60 +<p>In order to reduce the processing time - to speed the code up, in other
1.61 +words - we can make this code use several processes instead of just one. Here
1.62 +is the modified code:</p>
1.63
1.64 <pre>
1.65 t = time.time()
1.66 @@ -76,15 +90,20 @@
1.67
1.68 print "Time taken:", time.time() - t</pre>
1.69
1.70 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_pmap.py</code> file.)</p>
1.71 +<p>(This code in context with <code>import</code> statements and functions is
1.72 +found in the <code>examples/simple_pmap.py</code> file.)</p>
1.73
1.74 -<p>By replacing usage of the <code>map</code> function with the <code>pprocess.pmap</code>
1.75 -function, and specifying the limit on the number of processes to be active at any
1.76 -given time, several calculations can now be performed in parallel.</p>
1.77 +<p>By replacing usage of the <code>map</code> function with the
1.78 +<code>pprocess.pmap</code> function, and specifying the limit on the number of
1.79 +processes to be active at any given time (the value of the <code>limit</code>
1.80 +variable is defined elsewhere), several calculations can now be performed in
1.81 +parallel.</p>
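The same map-to-parallel-map conversion can be sketched using only the standard library. This is not `pprocess` itself: in this sketch, written in modern Python 3 syntax, `concurrent.futures` threads stand in for `pprocess`'s processes (the sleeping "work" still overlaps in time), `max_workers` plays the role of the `limit` variable, and the values of `N` and `delay` are assumed for illustration:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 4        # grid size (assumed value)
delay = 0.2  # simulated work time per call (assumed value)

def calculate(t):
    "A supposedly time-consuming calculation on 't'."
    i, j = t
    time.sleep(delay)
    return i * N + j

inputs = [(i, j) for i in range(N) for j in range(N)]

start = time.time()
# max_workers limits concurrency, much as 'limit' does for pmap.
with ThreadPoolExecutor(max_workers=N) as executor:
    # Like pmap, executor.map yields results in input order.
    results = list(executor.map(calculate, inputs))
elapsed = time.time() - start

print(results)
print("Time taken:", elapsed)
```

Because the simulated work is a sleep rather than computation, the calls overlap even on a single processor, and the elapsed time is well below the sequential total of `N * N * delay` seconds.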
1.82
1.83 <h2>Converting Invocations to Parallel Operations</h2>
1.84
1.85 -<p>Although some programs make natural use of the <code>map</code> function, others may employ an invocation in a nested loop. This may also be converted to a parallel program. Consider the following Python code:</p>
1.86 +<p>Although some programs make natural use of the <code>map</code> function,
1.87 +others may employ an invocation in a nested loop. This may also be converted
1.88 +to a parallel program. Consider the following Python code:</p>
1.89
1.90 <pre>
1.91 t = time.time()
1.92 @@ -109,23 +128,41 @@
1.93
1.94 print "Time taken:", time.time() - t</pre>
1.95
1.96 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple1.py</code> file.)</p>
1.97 +<p>(This code in context with <code>import</code> statements and functions is
1.98 +found in the <code>examples/simple1.py</code> file.)</p>
1.99 +
1.100 +<p>Here, a computation in the <code>calculate</code> function is performed for
1.101 +each combination of <code>i</code> and <code>j</code> in the nested loop,
1.102 +returning a result value. However, we must wait for the completion of this
1.103 +function for each element before moving on to the next element, and this means
1.104 +that the computations are performed sequentially. Consequently, on a system
1.105 +with more than one processor, even if we could call <code>calculate</code> for
1.106 +more than one combination of <code>i</code> and <code>j</code>
1.107 +and have the computations executing at the same time, the above program will
1.108 +not take advantage of such capabilities.</p>
1.109
1.110 -<p>Here, a computation in the <code>calculate</code> function is performed for each combination of <code>i</code> and <code>j</code>
1.111 -in the nested loop, returning a result value. However, we must wait for
1.112 -the completion of this function for each element before moving on to
1.113 -the next element, and this means that the computations are performed
1.114 -sequentially. Consequently, on a system with more than one processor,
1.115 -even if we could call <code>calculate</code> for more than one combination of <code>i</code> and <code>j</code><code></code> and have the computations executing at the same time, the above program will not take advantage of such capabilities.</p>
1.116 +<p>We use a slightly modified version of <code>calculate</code> which employs
1.117 +two parameters instead of one:</p>
1.118 +
1.119 +<pre>
1.120 +def calculate(i, j):
1.121
1.122 -<p>In order to reduce the processing time - to speed the code up,
1.123 -in other words - we can make this code use several processes instead of
1.124 -just one. Here is the modified code:</p>
1.125 + """
1.126 + A supposedly time-consuming calculation on 'i' and 'j'.
1.127 + """
1.128 +
1.129 + time.sleep(delay)
1.130 + return i * N + j
1.131 +</pre>
1.132 +
1.133 +<p>In order to reduce the processing time - to speed the code up, in other
1.134 +words - we can make this code use several processes instead of just one. Here
1.135 +is the modified code:</p>
1.136
1.137 <pre>
1.138 t = time.time()
1.139
1.140 - # Initialise the results using map with a limit on the number of
1.141 + # Initialise the results using a map with a limit on the number of
1.142 # channels/processes.
1.143
1.144 <strong>results = pprocess.Map(limit=limit)</strong>
1.145 @@ -150,12 +187,343 @@
1.146
1.147 print "Time taken:", time.time() - t</pre>
1.148
1.149 -<p>(This code in context with <code>import</code> statements and functions is found in the <code>examples/simple_manage_map.py</code> file.)</p>
1.150 +<p>(This code in context with <code>import</code> statements and functions is
1.151 +found in the <code>examples/simple_manage_map.py</code> file.)</p>
1.152 +
1.153 +<p>The principal changes in the above code involve the use of a
1.154 +<code>pprocess.Map</code> object to collect the results, and a version of the
1.155 +<code>calculate</code> function which is managed by the <code>Map</code>
1.156 +object. What the <code>Map</code> object does is to arrange the results of
1.157 +computations such that iterating over the object or accessing the object using
1.158 +list operations provides the results in the same order as their corresponding
1.159 +inputs.</p>
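The ordering behaviour described above can be demonstrated with a small standard-library sketch (again not `pprocess`: threads stand in for processes, and the delays are deliberately chosen so that later inputs finish first):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def calculate(t):
    "Finish earlier for later inputs, so completion order differs from input order."
    i, j = t
    time.sleep(0.05 * (3 - j))
    return i * 3 + j

inputs = [(i, j) for i in range(3) for j in range(3)]

# All nine calls run at once; executor.map still yields results in
# submission order, which is the property the Map object provides.
with ThreadPoolExecutor(max_workers=9) as executor:
    ordered = list(executor.map(calculate, inputs))

print(ordered)
```

Even though the computations complete in roughly reverse order, the collected results follow the order of the inputs.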
1.160 +
1.161 +<h2>Converting Arbitrarily-Ordered Invocations</h2>
1.162 +
1.163 +<p>In some programs, it is not important to receive the results of
1.164 +computations in any particular order, usually because either the order of
1.165 +these results is irrelevant, or because the results provide "positional"
1.166 +information which lets them be handled in an appropriate way. Consider the
1.167 +following Python code:</p>
1.168 +
1.169 +<pre>
1.170 + t = time.time()
1.171 +
1.172 + # Initialise an array.
1.173 +
1.174 + results = [0] * N * N
1.175 +
1.176 + # Perform the work.
1.177 +
1.178 + print "Calculating..."
1.179 + for i in range(0, N):
1.180 + for j in range(0, N):
1.181 + i2, j2, result = calculate(i, j)
1.182 + results[i2*N+j2] = result
1.183 +
1.184 + # Show the results.
1.185 +
1.186 + for i in range(0, N):
1.187 + for result in results[i*N:i*N+N]:
1.188 + print result,
1.189 + print
1.190 +
1.191 + print "Time taken:", time.time() - t
1.192 +</pre>
1.193 +
1.194 +<p>(This code in context with <code>import</code> statements and functions is
1.195 +found in the <code>examples/simple2.py</code> file.)</p>
1.196 +
1.197 +<p>Here, a result array is initialised first and each computation is performed
1.198 +sequentially. A significant difference to the previous examples is the return
1.199 +value of the <code>calculate</code> function: the position details
1.200 +corresponding to <code>i</code> and <code>j</code> are returned alongside the
1.201 +result. Obviously, this is of limited value in the above code because the
1.202 +order of the computations and the reception of results is fixed. Moreover, we
1.203 +get no benefit from parallelisation in the above example.</p>
1.204 +
1.205 +<p>We can bring the benefits of parallel processing to the above program with
1.206 +the following code:</p>
1.207 +
1.208 +<pre>
1.209 + t = time.time()
1.210 +
1.211 + # Initialise the communications queue with a limit on the number of
1.212 + # channels/processes.
1.213 +
1.214 + <strong>queue = pprocess.Queue(limit=limit)</strong>
1.215 +
1.216 + # Initialise an array.
1.217 +
1.218 + results = [0] * N * N
1.219 +
1.220 + # Wrap the calculate function and manage it.
1.221 +
1.222 + <strong>calc = queue.manage(pprocess.MakeParallel(calculate))</strong>
1.223 +
1.224 + # Perform the work.
1.225 +
1.226 + print "Calculating..."
1.227 + for i in range(0, N):
1.228 + for j in range(0, N):
1.229 + <strong>calc(i, j)</strong>
1.230 +
1.231 + # Store the results as they arrive.
1.232 +
1.233 + print "Finishing..."
1.234 + <strong>for i, j, result in queue:</strong>
1.235 + <strong>results[i*N+j] = result</strong>
1.236 +
1.237 + # Show the results.
1.238 +
1.239 + for i in range(0, N):
1.240 + for result in results[i*N:i*N+N]:
1.241 + print result,
1.242 + print
1.243 +
1.244 + print "Time taken:", time.time() - t
1.245 +</pre>
1.246 +
1.247 +<p>(This code in context with <code>import</code> statements and functions is
1.248 +found in the <code>examples/simple_managed_queue.py</code> file.)</p>
1.249 +
1.250 +<p>This revised code employs a <code>pprocess.Queue</code> object whose
1.251 +purpose is to collect the results of computations and to make them available
1.252 +in the order in which they were received. The code collecting results has been
1.253 +moved into a separate loop independent of the original computation loop and
1.254 +taking advantage of the more relevant "positional" information emerging from
1.255 +the queue.</p>
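For comparison, the standard library expresses the same arrival-order idea with `concurrent.futures.as_completed`; this sketch uses threads rather than `pprocess`'s processes, with assumed values for `N` and `delay`:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def calculate(i, j):
    "A supposedly time-consuming calculation on 'i' and 'j'."
    time.sleep(delay)
    return i, j, i * N + j

results = [0] * N * N

with ThreadPoolExecutor(max_workers=N) as executor:
    futures = [executor.submit(calculate, i, j)
               for i in range(N) for j in range(N)]
    # Results arrive in completion order, like iterating over the Queue...
    for future in as_completed(futures):
        i, j, result = future.result()
        # ...but the "positional" information restores the intended layout.
        results[i * N + j] = result

print(results)
```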
1.256 +
1.257 +<p>We can take this example further, illustrating some of the mechanisms
1.258 +employed by <code>pprocess</code>. Instead of collecting results in a queue,
1.259 +we can define a class containing a method which is called when new results
1.260 +arrive:</p>
1.261 +
1.262 +<pre>
1.263 +class MyExchange(pprocess.Exchange):
1.264 +
1.265 + "Parallel convenience class containing the array assignment operation."
1.266 +
1.267 + def store_data(self, ch):
1.268 + i, j, result = ch.receive()
1.269 + self.D[i*N+j] = result
1.270 +</pre>
1.271 +
1.272 +<p>This code exposes the channel paradigm which is used throughout
1.273 +<code>pprocess</code> and is available to applications, if desired. The effect
1.274 +of the method is the storage of a result received through the channel in an
1.275 +attribute of the object. The following code shows how this class can be used,
1.276 +with differences to the previous program illustrated:</p>
1.277 +
1.278 +<pre>
1.279 + t = time.time()
1.280 +
1.281 + # Initialise the communications exchange with a limit on the number of
1.282 + # channels/processes.
1.283 +
1.284 + <strong>exchange = MyExchange(limit=limit)</strong>
1.285 +
1.286 + # Initialise an array - it is stored in the exchange to permit automatic
1.287 + # assignment of values as the data arrives.
1.288 +
1.289 + <strong>results = exchange.D = [0] * N * N</strong>
1.290 +
1.291 + # Wrap the calculate function and manage it.
1.292 +
1.293 + calc = <strong>exchange</strong>.manage(pprocess.MakeParallel(calculate))
1.294 +
1.295 + # Perform the work.
1.296 +
1.297 + print "Calculating..."
1.298 + for i in range(0, N):
1.299 + for j in range(0, N):
1.300 + calc(i, j)
1.301 +
1.302 + # Wait for the results.
1.303 +
1.304 + print "Finishing..."
1.305 + <strong>exchange.finish()</strong>
1.306 +
1.307 + # Show the results.
1.308 +
1.309 + for i in range(0, N):
1.310 + for result in results[i*N:i*N+N]:
1.311 + print result,
1.312 + print
1.313 +
1.314 + print "Time taken:", time.time() - t
1.315 +</pre>
1.316
1.317 -<p>The principal changes in the above code involve the use of a <code>pprocess.Map</code> object to collect the results, and a version of the <code>calculate</code> function which is managed by the <code>Map</code> object. What the <code>Map</code>
1.318 -object does is to arrange the results of computations such that
1.319 -iterating over the object or accessing the object using list operations
1.320 -provides the results in the same order as their corresponding inputs.</p>
1.321 +<p>(This code in context with <code>import</code> statements and functions is
1.322 +found in the <code>examples/simple_managed.py</code> file.)</p>
1.323 +
1.324 +<p>The main visible differences between this and the previous program are the
1.325 +storage of the result array in the exchange, the removal of the queue
1.326 +consumption code from the main program, placing the act of storing values in
1.327 +the exchange's <code>store_data</code> method, and the need to call the
1.328 +<code>finish</code> method on the <code>MyExchange</code> object so that we do
1.329 +not try to access the results too soon. One underlying benefit not visible in
1.330 +the above code is that we no longer need to accumulate results in a queue or
1.331 +other structure so that they may be processed and assigned to the correct
1.332 +positions in the result array.</p>
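The "invoke a method as each result arrives" style of the exchange resembles future callbacks in the standard library. In this hedged sketch, `store_data` is a callback of our own devising rather than part of any `pprocess` API, `D` plays the role of the array stored in the exchange, and threads again stand in for processes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 3          # grid size (assumed value)
delay = 0.05   # simulated work time per call (assumed value)
D = [0] * N * N  # result array, like exchange.D

def calculate(i, j):
    "A supposedly time-consuming calculation on 'i' and 'j'."
    time.sleep(delay)
    return i, j, i * N + j

def store_data(future):
    "Invoked as each result arrives, like MyExchange.store_data."
    i, j, result = future.result()
    D[i * N + j] = result

with ThreadPoolExecutor(max_workers=N) as executor:
    for i in range(N):
        for j in range(N):
            executor.submit(calculate, i, j).add_done_callback(store_data)
# Leaving the 'with' block waits for all work and callbacks to finish,
# playing the role of exchange.finish().

print(D)
```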
1.333 +
1.334 +<p>For the curious, we may remove some of the remaining conveniences of the
1.335 +above program to expose other features of <code>pprocess</code>. First, we
1.336 +define a slightly modified version of the <code>calculate</code> function:</p>
1.337 +
1.338 +<pre>
1.339 +def calculate(ch, i, j):
1.340 +
1.341 + """
1.342 + A supposedly time-consuming calculation on 'i' and 'j', using 'ch' to
1.343 + communicate with the parent process.
1.344 + """
1.345 +
1.346 + time.sleep(delay)
1.347 + ch.send((i, j, i * N + j))
1.348 +</pre>
1.349 +
1.350 +<p>This function accepts a channel, <code>ch</code>, through which results
1.351 +will be sent, and through which other values could potentially be received,
1.352 +although we choose not to do so here. The program using this function is as
1.353 +follows, with differences to the previous program illustrated:</p>
1.354 +
1.355 +<pre>
1.356 + t = time.time()
1.357 +
1.358 + # Initialise the communications exchange with a limit on the number of
1.359 + # channels/processes.
1.360 +
1.361 + exchange = MyExchange(limit=limit)
1.362 +
1.363 + # Initialise an array - it is stored in the exchange to permit automatic
1.364 + # assignment of values as the data arrives.
1.365 +
1.366 + results = exchange.D = [0] * N * N
1.367 +
1.368 + # Perform the work.
1.369 +
1.370 + print "Calculating..."
1.371 + for i in range(0, N):
1.372 + for j in range(0, N):
1.373 + <strong>exchange.start(calculate, i, j)</strong>
1.374 +
1.375 + # Wait for the results.
1.376 +
1.377 + print "Finishing..."
1.378 + exchange.finish()
1.379 +
1.380 + # Show the results.
1.381 +
1.382 + for i in range(0, N):
1.383 + for result in results[i*N:i*N+N]:
1.384 + print result,
1.385 + print
1.386 +
1.387 + print "Time taken:", time.time() - t
1.388 +</pre>
1.389 +
1.390 +<p>(This code in context with <code>import</code> statements and functions is
1.391 +found in the <code>examples/simple_start.py</code> file.)</p>
1.392 +
1.393 +<p>Here, we have discarded two conveniences: the wrapping of callables using
1.394 +<code>MakeParallel</code>, which lets us use functions without providing any
1.395 +channel parameters, and the management of callables using the
1.396 +<code>manage</code> method on queues, exchanges, and so on. The
1.397 +<code>start</code> method still calls the provided callable, but using a
1.398 +different notation from that employed previously.</p>
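The underlying channel idea can be imitated with a `queue.Queue` shared between a parent and its workers. This is only an analogy: `pprocess` channels connect separate processes, whereas this sketch uses threads, and the `put`/`get` calls stand in for `send` and `receive`:

```python
import time
import threading
import queue

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def calculate(ch, i, j):
    "Send the result over 'ch' instead of returning it."
    time.sleep(delay)
    ch.put((i, j, i * N + j))      # analogous to ch.send(...)

channel = queue.Queue()
workers = [threading.Thread(target=calculate, args=(channel, i, j))
           for i in range(N) for j in range(N)]
for w in workers:
    w.start()

results = [0] * N * N
for _ in range(N * N):             # receive one message per worker
    i, j, result = channel.get()   # analogous to ch.receive()
    results[i * N + j] = result

for w in workers:
    w.join()

print(results)
```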
1.399 +
1.400 +<h2>Converting Inline Computations</h2>
1.401 +
1.402 +<p>Although many programs employ functions and other useful abstractions which
1.403 +can be treated as parallelisable units, some programs perform computations
1.404 +"inline", meaning that the code responsible appears directly within a loop or
1.405 +related control-flow construct. Consider the following code:</p>
1.406 +
1.407 +<pre>
1.408 + t = time.time()
1.409 +
1.410 + # Initialise an array.
1.411 +
1.412 + results = [0] * N * N
1.413 +
1.414 + # Perform the work.
1.415 +
1.416 + print "Calculating..."
1.417 + for i in range(0, N):
1.418 + for j in range(0, N):
1.419 + time.sleep(delay)
1.420 + results[i*N+j] = i * N + j
1.421 +
1.422 + # Show the results.
1.423 +
1.424 + for i in range(0, N):
1.425 + for result in results[i*N:i*N+N]:
1.426 + print result,
1.427 + print
1.428 +
1.429 + print "Time taken:", time.time() - t
1.430 +</pre>
1.431 +
1.432 +<p>(This code in context with <code>import</code> statements and functions is
1.433 +found in the <code>examples/simple.py</code> file.)</p>
1.434 +
1.435 +<p>To simulate "work", as in the different versions of the
1.436 +<code>calculate</code> function, we use the <code>time.sleep</code> function
1.437 +(which does not actually do work, and which will cause a process to be
1.438 +descheduled in most cases, but which simulates the delay associated with work
1.439 +being done). This inline work, which must be performed sequentially in the
1.440 +above program, can be performed in parallel in a somewhat modified version of
1.441 +the program:</p>
1.442 +
1.443 +<pre>
1.444 + t = time.time()
1.445 +
1.446 + # Initialise the results using a map with a limit on the number of
1.447 + # channels/processes.
1.448 +
1.449 + <strong>results = pprocess.Map(limit=limit)</strong>
1.450 +
1.451 + # Perform the work.
1.452 + # NOTE: Could use the with statement in the loop to package the
1.453 + # NOTE: try...finally functionality.
1.454 +
1.455 + print "Calculating..."
1.456 + for i in range(0, N):
1.457 + for j in range(0, N):
1.458 + <strong>ch = results.create()</strong>
1.459 + <strong>if ch:</strong>
1.460 + <strong>try: # Calculation work.</strong>
1.461 +
1.462 + time.sleep(delay)
1.463 + <strong>ch.send(i * N + j)</strong>
1.464 +
1.465 + <strong>finally: # Important finalisation.</strong>
1.466 +
1.467 + <strong>pprocess.exit(ch)</strong>
1.468 +
1.469 + # Show the results.
1.470 +
1.471 + for i in range(0, N):
1.472 + for result in results[i*N:i*N+N]:
1.473 + print result,
1.474 + print
1.475 +
1.476 + print "Time taken:", time.time() - t
1.477 +</pre>
1.478 +
1.479 +<p>(This code in context with <code>import</code> statements and functions is
1.480 +found in the <code>examples/simple_create_map.py</code> file.)</p>
1.481 +
1.482 +<p>Although seemingly more complicated, the bulk of the changes in this
1.483 +modified program are focused on obtaining a channel object, <code>ch</code>,
1.484 +at the point where the computations are performed, and the wrapping of the
1.485 +computation code in a <code>try</code>...<code>finally</code> statement which
1.486 +ensures that the process associated with the channel exits when the
1.487 +computation is complete. In order for the results of these computations to be
1.488 +collected, a <code>pprocess.Map</code> object is used, since it will maintain
1.489 +the results in the same order as the initiation of the computations which
1.490 +produced them.</p>
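For readers curious about what channel creation and process exit involve beneath the conveniences, the pattern can be imitated directly (on Unix only) with `os.fork` and `os.pipe`. Everything here, from the `create` helper to the pickled message format, is our own illustrative stand-in rather than `pprocess` internals:

```python
import os
import pickle
import time

N = 3        # grid size (assumed value)
delay = 0.05 # simulated work time per call (assumed value)

def create():
    "Fork a child with a pipe to the parent; a crude stand-in for results.create()."
    r, w = os.pipe()
    pid = os.fork()
    return pid, r, w

results = [0] * N * N
readers = []

for i in range(N):
    for j in range(N):
        pid, r, w = create()
        if pid == 0:                 # child: the inline computation
            os.close(r)
            try:
                time.sleep(delay)
                os.write(w, pickle.dumps((i, j, i * N + j)))
            finally:
                os._exit(0)          # like pprocess.exit(ch): always terminate
        os.close(w)                  # parent keeps only the read end
        readers.append(r)

for r in readers:                    # collect one message per child
    data = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:                # EOF: the child has exited
            break
        data += chunk
    os.close(r)
    i, j, result = pickle.loads(data)
    results[i * N + j] = result

print(results)
```

The `try`...`finally` in the child mirrors the tutorial's pattern: however the computation ends, the child process exits rather than falling through into the parent's code.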
1.491
1.492 </body>
1.493 </html>