1.1 --- a/docs/reference.html Mon Jun 02 22:43:11 2008 +0000
1.2 +++ b/docs/reference.html Mon Jun 02 22:43:43 2008 +0000
1.3 @@ -32,6 +32,8 @@
1.4 <li><a href="#Map">Map-style Processing</a></li>
1.5 <li><a href="#Reusing">Reusing Processes and Channels</a></li>
1.6 <li><a href="#MakeReusable">Making Existing Functions Parallel and Reusable</a></li>
1.7 +<li><a href="#BackgroundCallable">Background Processes and Callables</a></li>
1.8 +<li><a href="#PersistentQueue">Background and Persistent Queues</a></li>
1.9 <li><a href="#Implementation">Implementation Notes</a></li>
1.10 </ul>
1.11
1.12 @@ -331,6 +333,67 @@
1.13 <p>This wraps the callable in a similar fashion to <code>MakeParallel</code>, but
1.14 provides the necessary mechanisms described above for reusable code.</p>
1.15
1.16 +<h2 id="BackgroundCallable">Background Processes and Callables</h2>
1.17 +
1.18 +<p>So far, all parallel computations have involved created processes which
1.19 +depend on the existence of the creating process: the creating process must
1.20 +remain running to collect results from, and to communicate with, the processes
1.21 +it created, even if those processes perform work and produce output which need
1.22 +not concern it. In order to separate creating and created processes, the
1.23 +concept of a background process (also known as a daemon process) is
1.24 +introduced.</p>
1.25 +
1.26 +<p>The <code>BackgroundCallable</code> class acts somewhat like the
1.27 +<code>manage</code> method on exchange-based objects, although no exchange is
1.28 +immediately involved. Instances of <code>BackgroundCallable</code> provide
1.29 +wrappers around existing parallel-aware callables which can then be invoked to
1.30 +initiate a background computation in a created process. For example:</p>
1.31 +
1.32 +<pre>
1.33 +backgroundfn = pprocess.BackgroundCallable(address, fn)
1.34 +</pre>
1.35 +
1.36 +<p>This wraps the supplied callable (which can itself be the result of using
1.37 +<code>MakeParallel</code>), with the resulting wrapper lending itself to
1.38 +invocation like any other function. One distinguishing feature is the
1.39 +<code>address</code>: in order to contact the background process after
1.40 +invocation to (amongst other things) receive any result, a specific address
1.41 +must be given to define the contact point between the created process and any
1.42 +processes seeking to connect to it. Since these "persistent" communications
1.43 +employ special files (specifically UNIX-domain sockets), the address must be a
1.44 +suitable filename.</p>
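+
+<p>Independently of pprocess, the nature of such an address can be illustrated
+with the Python standard library: binding a UNIX-domain socket to a filename
+creates a special socket file at that path (the filename and paths below are
+illustrative only):</p>
+

```python
import os
import socket
import stat
import tempfile

# A UNIX-domain socket address is just a filename: binding creates a
# special socket file at that path. (Illustration only; not pprocess code.)
address = os.path.join(tempfile.mkdtemp(), "task.socket")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(address)   # creates the special file
server.listen(1)

exists = os.path.exists(address)                      # the file exists
is_socket = stat.S_ISSOCK(os.stat(address).st_mode)   # and it is a socket

server.close()
os.remove(address)
```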
1.45 +
1.46 +<h2 id="PersistentQueue">Background and Persistent Queues</h2>
1.47 +
1.48 +<p>Background processes employing persistent communications require adaptations
1.49 +of the facilities described in the sections above. For a single background
1.50 +process, the <code>BackgroundQueue</code> function is sufficient to create a
1.51 +queue-like object which can monitor the communications channel between the
1.52 +connecting process and a background process. For example:</p>
1.53 +
1.54 +<pre>
1.55 +queue = pprocess.BackgroundQueue(address)
1.56 +</pre>
1.57 +
1.58 +<p>This code will cause the process reachable via the given <code>address</code>
1.59 +to be contacted and any results made available via the created queue-like
1.60 +object.</p>
1.61 +
1.62 +<p>Where many background processes have been created, a single
1.63 +<code>PersistentQueue</code> object can monitor their communications by being
1.64 +connected to them all, as in the following example:</p>
1.65 +
1.66 +<pre>
1.67 +queue = pprocess.PersistentQueue()
1.68 +for address in addresses:
1.69 + queue.connect(address)
1.70 +</pre>
1.71 +
1.72 +<p>Here, the queue monitors all previously created processes whose addresses
1.73 +reside in the <code>addresses</code> sequence. Upon iterating over the queue,
1.74 +results will be taken from whichever process happens to have data available in
1.75 +no particular pre-defined order.</p>
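+
+<p>The behaviour of taking results from whichever process happens to have data
+available can be sketched with the standard library's <code>select</code>
+function (a simplified stand-in, not pprocess's actual implementation):</p>
+

```python
import select
import socket

# Two connected socket pairs stand in for the persistent channels to two
# background processes. (Simplified stand-in; not pprocess's implementation.)
reader_a, writer_a = socket.socketpair()
reader_b, writer_b = socket.socketpair()

# Pretend both background processes have delivered a result.
writer_a.sendall(b"result-a")
writer_b.sendall(b"result-b")

# Take results from whichever channel is readable, in no pre-defined order.
results = []
pending = [reader_a, reader_b]
while pending:
    readable, _, _ = select.select(pending, [], [])
    for conn in readable:
        results.append(conn.recv(64))
        pending.remove(conn)

for s in (reader_a, writer_a, reader_b, writer_b):
    s.close()
```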
1.76 +
1.77 <h2 id="Implementation">Implementation Notes</h2>
1.78
1.79 <h3>Signals and Waiting</h3>
2.1 --- a/docs/tutorial.html Mon Jun 02 22:43:11 2008 +0000
2.2 +++ b/docs/tutorial.html Mon Jun 02 22:43:43 2008 +0000
2.3 @@ -15,7 +15,18 @@
2.4 advantage of more than one processor to simultaneously process data - is to
2.5 use the <code>pmap</code> function.</p>
2.6
2.7 -<h2>Converting Map-Style Code</h2>
2.8 +<ul>
2.9 +<li><a href="#pmap">Converting Map-Style Code</a></li>
2.10 +<li><a href="#Map">Converting Invocations to Parallel Operations</a></li>
2.11 +<li><a href="#Queue">Converting Arbitrarily-Ordered Invocations</a></li>
2.12 +<li><a href="#create">Converting Inline Computations</a></li>
2.13 +<li><a href="#MakeReusable">Reusing Processes in Parallel Programs</a></li>
2.14 +<li><a href="#BackgroundCallable">Performing Computations in Background Processes</a></li>
2.15 +<li><a href="#ManagingBackgroundProcesses">Managing Several Background Processes</a></li>
2.16 +<li><a href="#Summary">Summary</a></li>
2.17 +</ul>
2.18 +
2.19 +<h2 id="pmap">Converting Map-Style Code</h2>
2.20
2.21 <p>Consider a program using the built-in <code>map</code> function and a sequence of inputs:</p>
2.22
2.23 @@ -99,7 +110,7 @@
2.24 variable is defined elsewhere), several calculations can now be performed in
2.25 parallel.</p>
2.26
2.27 -<h2>Converting Invocations to Parallel Operations</h2>
2.28 +<h2 id="Map">Converting Invocations to Parallel Operations</h2>
2.29
2.30 <p>Although some programs make natural use of the <code>map</code> function,
2.31 others may employ an invocation in a nested loop. This may also be converted
2.32 @@ -198,7 +209,7 @@
2.33 list operations provides the results in the same order as their corresponding
2.34 inputs.</p>
2.35
2.36 -<h2>Converting Arbitrarily-Ordered Invocations</h2>
2.37 +<h2 id="Queue">Converting Arbitrarily-Ordered Invocations</h2>
2.38
2.39 <p>In some programs, it is not important to receive the results of
2.40 computations in any particular order, usually because either the order of
2.41 @@ -245,7 +256,7 @@
2.42 <p>We can bring the benefits of parallel processing to the above program with
2.43 the following code:</p>
2.44
2.45 -<pre>
2.46 +<pre id="simple_managed_queue">
2.47 t = time.time()
2.48
2.49 # Initialise the communications queue with a limit on the number of
2.50 @@ -433,7 +444,7 @@
2.51 <code>start</code> method still calls the provided callable, but using a
2.52 different notation from that employed previously.</p>
2.53
2.54 -<h2>Converting Inline Computations</h2>
2.55 +<h2 id="create">Converting Inline Computations</h2>
2.56
2.57 <p>Although many programs employ functions and other useful abstractions which
2.58 can be treated as parallelisable units, some programs perform computations
2.59 @@ -525,7 +536,7 @@
2.60 the results in the same order as the initiation of the computations which
2.61 produced them.</p>
2.62
2.63 -<h2>Reusing Processes in Parallel Programs</h2>
2.64 +<h2 id="MakeReusable">Reusing Processes in Parallel Programs</h2>
2.65
2.66 <p>One notable aspect of the above programs when parallelised is that each
2.67 invocation of a computation in parallel creates a new process in which the
2.68 @@ -577,7 +588,273 @@
2.69 creating a new process for each computation and then discarding it, only to
2.70 create a new process for the next computation.</p>
2.71
2.72 -<h2>Summary</h2>
2.73 +<h2 id="BackgroundCallable">Performing Computations in Background Processes</h2>
2.74 +
2.75 +<p>Occasionally, it is desirable not only to initiate time-consuming
2.76 +computations and leave them running in the background, but also to detach
2.77 +the creating process from them completely, potentially terminating it
2.78 +altogether, and yet still to be able to collect the results of the created
2.79 +processes at a later time, potentially in a completely different process.
2.80 +For such situations, we can make use of the <code>BackgroundCallable</code>
2.81 +class, which converts a parallel-aware callable into a callable which will run
2.82 +in a background process when invoked.</p>
2.83 +
2.84 +<p>Consider this excerpt from a modified version of the <a
2.85 +href="#simple_managed_queue">simple_managed_queue</a> program:</p>
2.86 +
2.87 +<pre>
2.88 +<strong>def task():</strong>
2.89 +
2.90 + # Initialise the communications queue with a limit on the number of
2.91 + # channels/processes.
2.92 +
2.93 + queue = pprocess.Queue(limit=limit)
2.94 +
2.95 + # Initialise an array.
2.96 +
2.97 + results = [0] * N * N
2.98 +
2.99 + # Wrap the calculate function and manage it.
2.100 +
2.101 + calc = queue.manage(pprocess.MakeParallel(calculate))
2.102 +
2.103 + # Perform the work.
2.104 +
2.105 + print "Calculating..."
2.106 + for i in range(0, N):
2.107 + for j in range(0, N):
2.108 + calc(i, j)
2.109 +
2.110 + # Store the results as they arrive.
2.111 +
2.112 + print "Finishing..."
2.113 + for i, j, result in queue:
2.114 + results[i*N+j] = result
2.115 +
2.116 + <strong>return results</strong>
2.117 +</pre>
2.118 +
2.119 +<p>Here, we have converted the main program into a function, and instead of
2.120 +printing out the results, we return the results list from the function.</p>
2.121 +
2.122 +<p>Now, let us consider the new main program (with the relevant mechanisms
2.123 +highlighted):</p>
2.124 +
2.125 +<pre>
2.126 + t = time.time()
2.127 +
2.128 + if "--reconnect" not in sys.argv:
2.129 +
2.130 + # Wrap the computation and manage it.
2.131 +
2.132 + <strong>ptask = pprocess.BackgroundCallable("task.socket", pprocess.MakeParallel(task))</strong>
2.133 +
2.134 + # Perform the work.
2.135 +
2.136 + ptask()
2.137 +
2.138 + # Discard the callable.
2.139 +
2.140 + del ptask
2.141 + print "Discarded the callable."
2.142 +
2.143 + if "--start" not in sys.argv:
2.144 +
2.145 + # Open a queue and reconnect to the task.
2.146 +
2.147 + print "Opening a queue."
2.148 + <strong>queue = pprocess.BackgroundQueue("task.socket")</strong>
2.149 +
2.150 + # Wait for the results.
2.151 +
2.152 + print "Waiting for persistent results"
2.153 + for results in queue:
2.154 + pass # should only be one element
2.155 +
2.156 + # Show the results.
2.157 +
2.158 + for i in range(0, N):
2.159 + for result in results[i*N:i*N+N]:
2.160 + print result,
2.161 + print
2.162 +
2.163 + print "Time taken:", time.time() - t
2.164 +</pre>
2.165 +
2.166 +<p>(This code in context with <code>import</code> statements and functions is
2.167 +found in the <code>examples/simple_background_queue.py</code> file.)</p>
2.168 +
2.169 +<p>This new main program has two parts: the part which initiates the
2.170 +computation, and the part which connects to the computation in order to collect
2.171 +the results. Both parts can be run in the same process, and this should result
2.172 +in similar behaviour to that of the original
2.173 +<a href="#simple_managed_queue">simple_managed_queue</a> program.</p>
2.174 +
2.175 +<p>In the above program, however, we are free to specify <code>--start</code> as
2.176 +an option when running the program, and the result of this is merely to initiate
2.177 +the computation in a background process, using <code>BackgroundCallable</code>
2.178 +to obtain a callable which, when invoked, creates the background process and
2.179 +runs the computation. The program will then exit, but it will leave the
2.180 +computation running as a collection of background processes, and a
2.181 +special file called <code>task.socket</code> will exist in the current working
2.182 +directory.</p>
2.183 +
2.184 +<p>When the above program is run using the <code>--reconnect</code> option, an
2.185 +attempt will be made to reconnect to the background processes already created,
2.186 +by contacting them using the previously created <code>task.socket</code>
2.187 +special file (which is, in fact, a UNIX-domain socket). This is done using
2.188 +the <code>BackgroundQueue</code> function, which will handle the incoming results
2.189 +in a fashion similar to that of a <code>Queue</code> object. Since only one
2.190 +result is returned by the computation (as defined by the <code>return</code>
2.191 +statement in the <code>task</code> function), we need only expect one element to
2.192 +be collected by the queue: a list containing all of the results produced in the
2.193 +computation.</p>
2.194 +
2.195 +<h2 id="ManagingBackgroundProcesses">Managing Several Background Processes</h2>
2.196 +
2.197 +<p>In the above example, a single background process was used to manage a number
2.198 +of other processes, with all of them running in the background. However, it can
2.199 +be desirable to manage more than one background process.</p>
2.200 +
2.201 +<p>Consider this excerpt from a modified version of the <a
2.202 +href="#simple_managed_queue">simple_managed_queue</a> program:</p>
2.203 +
2.204 +<pre>
2.205 +<strong>def task(i):</strong>
2.206 +
2.207 + # Initialise the communications queue with a limit on the number of
2.208 + # channels/processes.
2.209 +
2.210 + queue = pprocess.Queue(limit=limit)
2.211 +
2.212 + # Initialise an array.
2.213 +
2.214 + results = [0] * N
2.215 +
2.216 + # Wrap the calculate function and manage it.
2.217 +
2.218 + calc = queue.manage(pprocess.MakeParallel(calculate))
2.219 +
2.220 + # Perform the work.
2.221 +
2.222 + print "Calculating..."
2.223 + <strong>for j in range(0, N):</strong>
2.224 + <strong>calc(i, j)</strong>
2.225 +
2.226 + # Store the results as they arrive.
2.227 +
2.228 + print "Finishing..."
2.229 + <strong>for i, j, result in queue:</strong>
2.230 + <strong>results[j] = result</strong>
2.231 +
2.232 + <strong>return i, results</strong>
2.233 +</pre>
2.234 +
2.235 +<p>Just as we see in the previous example, a function called <code>task</code>
2.236 +has been defined to hold a background computation, and this function returns a
2.237 +portion of the results. However, unlike the previous example or the original
2.238 +example, the scope of the results collected in the function has been
2.239 +changed: here, only results for calculations involving a certain value
2.240 +of <code>i</code> are collected, with the particular value of <code>i</code>
2.241 +returned along with the appropriate portion of the results.</p>
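+
+<p>The resulting structure can be pictured without any parallelism at all. The
+following sequential sketch (with a hypothetical stand-in for
+<code>calculate</code>) mimics how each invocation of <code>task</code> yields
+one value of <code>i</code> together with its portion of the results:</p>
+

```python
N = 3

def calculate(i, j):
    # Hypothetical stand-in for the example's real calculation.
    return i * N + j

def task(i):
    # Sequential sketch of the per-i portion of the work: one row of results.
    results = [0] * N
    for j in range(N):
        results[j] = calculate(i, j)
    return i, results

# Each "background" invocation returns (i, row); rows may arrive in any
# order, so each is stored by its returned index.
collected = [None] * N
for i in reversed(range(N)):
    idx, row = task(i)
    collected[idx] = row
```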
2.242 +
2.243 +<p>Now, let us consider the new main program (with the relevant mechanisms
2.244 +highlighted):</p>
2.245 +
2.246 +<pre>
2.247 + t = time.time()
2.248 +
2.249 + if "--reconnect" not in sys.argv:
2.250 +
2.251 + # Wrap the computation and manage it.
2.252 +
2.253 + <strong>ptask = pprocess.MakeParallel(task)</strong>
2.254 +
2.255 + <strong>for i in range(0, N):</strong>
2.256 +
2.257 + # Make a distinct callable for each part of the computation.
2.258 +
2.259 + <strong>ptask_i = pprocess.BackgroundCallable("task-%d.socket" % i, ptask)</strong>
2.260 +
2.261 + # Perform the work.
2.262 +
2.263 + <strong>ptask_i(i)</strong>
2.264 +
2.265 + # Discard the callable.
2.266 +
2.267 + del ptask
2.268 + print "Discarded the callable."
2.269 +
2.270 + if "--start" not in sys.argv:
2.271 +
2.272 + # Open a queue and reconnect to the task.
2.273 +
2.274 + print "Opening a queue."
2.275 + <strong>queue = pprocess.PersistentQueue()</strong>
2.276 + <strong>for i in range(0, N):</strong>
2.277 + <strong>queue.connect("task-%d.socket" % i)</strong>
2.278 +
2.279 + # Initialise an array.
2.280 +
2.281 + <strong>results = [0] * N</strong>
2.282 +
2.283 + # Wait for the results.
2.284 +
2.285 + print "Waiting for persistent results"
2.286 + <strong>for i, result in queue:</strong>
2.287 + <strong>results[i] = result</strong>
2.288 +
2.289 + # Show the results.
2.290 +
2.291 + for i in range(0, N):
2.292 + <strong>for result in results[i]:</strong>
2.293 + print result,
2.294 + print
2.295 +
2.296 + print "Time taken:", time.time() - t
2.297 +</pre>
2.298 +
2.299 +<p>(This code in context with <code>import</code> statements and functions is
2.300 +found in the <code>examples/simple_persistent_queue.py</code> file.)</p>
2.301 +
2.302 +<p>In the first section, the process of making a parallel-aware callable is as
2.303 +expected, but instead of then invoking a single background version of such a
2.304 +callable, as in the previous example, we create a version for each value of
2.305 +<code>i</code> (using <code>BackgroundCallable</code>) and then invoke each one.
2.306 +The result of this is a total of <code>N</code> background processes, each
2.307 +running an invocation of the <code>task</code> function with a distinct value of
2.308 +<code>i</code> (in turn performing its own computations), and each employing a
2.309 +UNIX-domain socket for communication, with a name of the form
2.310 +<code>task-<em>i</em>.socket</code>.</p>
2.311 +
2.312 +<p>In the second section, since we now have more than one background process, we
2.313 +must find a way to monitor them after reconnecting to them; to achieve this, a
2.314 +<code>PersistentQueue</code> is created, which acts like a regular
2.315 +<code>Queue</code> object but is instead focused on handling persistent
2.316 +communications. Once the queue is connected to each of the previously created
2.317 +UNIX-domain sockets, it behaves like a regular <code>Queue</code> and
2.318 +exposes received results through an iterator. Here, the principal difference
2.319 +from previous examples is the structure of results: instead of collecting each
2.320 +individual value in a flat <code>i</code> by <code>j</code> array, a list is
2.321 +returned for each value of <code>i</code> and is stored directly in another
2.322 +list.</p>
2.323 +
2.324 +<h3>Applications of Background Computations</h3>
2.325 +
2.326 +<p>Background computations are useful because they provide flexibility in the
2.327 +way the results can be collected. One area in which they can be useful is Web
2.328 +programming, where a process handling an incoming HTTP request may need to
2.329 +initiate a computation but then immediately send output to the Web client - such
2.330 +as a page indicating that the computation is "in progress" - without having to
2.331 +wait for the computation or to allocate resources to monitor it. Moreover, in
2.332 +some Web architectures, notably those employing the Common Gateway Interface
2.333 +(CGI), it is necessary for a process handling an incoming request to terminate
2.334 +before its output will be sent to clients. By using a
2.335 +<code>BackgroundCallable</code>, a Web server process can initiate a
2.336 +computation, and then subsequent server processes can be used to reconnect to
2.337 +the background computation and to wait efficiently for results.</p>
2.338 +
2.339 +<h2 id="Summary">Summary</h2>
2.340
2.341 <p>The following table indicates the features used in converting one
2.342 sequential example program to another parallel program:</p>
2.343 @@ -602,7 +879,7 @@
2.344 <td>MakeParallel, Map, manage</td>
2.345 </tr>
2.346 <tr>
2.347 - <td rowspan="3">simple2</td>
2.348 + <td rowspan="5">simple2</td>
2.349 <td>simple_managed_queue</td>
2.350 <td>MakeParallel, Queue, manage</td>
2.351 </tr>
2.352 @@ -615,6 +892,14 @@
2.353 <td>Channel, Exchange (subclass), start, finish</td>
2.354 </tr>
2.355 <tr>
2.356 + <td>simple_background_queue</td>
2.357 + <td>MakeParallel, BackgroundCallable, BackgroundQueue</td>
2.358 + </tr>
2.359 + <tr>
2.360 + <td>simple_persistent_queue</td>
2.361 + <td>MakeParallel, BackgroundCallable, PersistentQueue</td>
2.362 + </tr>
2.363 + <tr>
2.364 <td>simple</td>
2.365 <td>simple_create_map</td>
2.366 <td>Channel, Map, create, exit</td>