Introduction
------------

The pprocess module provides elementary support for parallel programming in
Python using a fork-based process creation model in conjunction with a
channel-based communications model implemented using socketpair and poll. On
systems with multiple CPUs or multicore CPUs, processes should take advantage
of as many CPUs or cores as the operating system permits.
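
As a rough illustration of that model, the sketch below uses only the standard
library (os.fork, socket.socketpair and select.poll) rather than pprocess
itself; run_in_child is a hypothetical helper, not part of the pprocess API. A
forked child computes a result and sends it, pickled, over its end of a socket
pair, while the parent polls its own end until data (or end-of-file) arrives.
Like pprocess, it needs a system providing fork, such as GNU/Linux.

```python
import os
import pickle
import select
import socket


def run_in_child(func, *args):
    """Compute func(*args) in a forked child and return the result to the
    parent over a socketpair channel (hypothetical helper, UNIX only)."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: send the pickled result, then exit without running any of
        # the parent's remaining code.
        parent_end.close()
        try:
            child_end.sendall(pickle.dumps(func(*args)))
        finally:
            child_end.close()
            os._exit(0)

    # Parent: wait until the channel is readable, then read until the child
    # closes its end.
    child_end.close()
    poller = select.poll()
    poller.register(parent_end.fileno(), select.POLLIN)
    poller.poll()
    chunks = []
    while True:
        data = parent_end.recv(4096)
        if not data:
            break
        chunks.append(data)
    parent_end.close()
    os.waitpid(pid, 0)
    return pickle.loads(b"".join(chunks))


print(run_in_child(pow, 2, 10))  # 1024
```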

Since pprocess distributes work to other processes, code running in those
processes may behave differently from the same code running normally in a
single process. For example, mutable objects passed to other processes can
still be modified there, but those modifications will not be visible outside
the processes that make them.
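
This behaviour follows from the way fork works: each created process receives
its own copy of the parent's memory. The minimal sketch below uses os.fork
directly rather than pprocess, so it only illustrates the underlying effect;
the results list is purely illustrative.

```python
import os

results = [0, 0, 0]

pid = os.fork()
if pid == 0:
    # Child: this changes only the child's private copy of the list.
    results[0] = 42
    os._exit(0)

os.waitpid(pid, 0)
# Parent: the child's modification is not visible here.
print(results)  # [0, 0, 0]
```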

Tutorial
--------

The tutorial provides some information about the examples described below.
See the docs/tutorial.html file in the distribution for more details.

Reference
---------

A description of the different mechanisms provided by the pprocess module can
be found in the reference document. See the docs/reference.html file in the
distribution for more details.

Quick Start
-----------

Try running the simple examples. For example:

  PYTHONPATH=. python examples/simple_create.py

(These examples show in different ways how a limited number of processes can
be used to perform a parallel computation. The simple.py, simple1.py,
simple2.py and simple_map.py programs are sequential versions of the other
programs.)

The following table summarises the features used in the programs:

  Program (.py)         pmap  MakeParallel  manage  start  create  Map  Queue  Exchange
  -------------         ----  ------------  ------  -----  ------  ---  -----  --------
  simple_create_map                                        Yes     Yes
  simple_create_queue                                      Yes          Yes
  simple_create                                            Yes                 Yes
  simple_managed_map          Yes           Yes                    Yes
  simple_managed_queue        Yes           Yes                         Yes
  simple_managed              Yes           Yes                                Yes
  simple_pmap           Yes
  simple_pmap_iter      Yes
  simple_start_queue          Yes                   Yes                 Yes
  simple_start                                      Yes                        Yes

The simplest parallel programs are simple_pmap.py and simple_pmap_iter.py,
which employ the pmap function, resembling the built-in map function in
Python.
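
For readers curious about what a map-style function does underneath, here is a
minimal standard-library sketch of the idea. It is not pprocess's
implementation, and fork_map and its limit parameter are hypothetical names.
Each item is computed in a forked child that pickles its result into a pipe,
at most limit children run at once, and results are collected in input order
so that the output matches the built-in map.

```python
import os
import pickle


def _spawn(func, item):
    """Fork a child that pickles func(item) into a pipe; return (pid, fd)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(item), out)
        finally:
            os._exit(0)
    os.close(write_fd)
    return pid, read_fd


def _collect(pid, read_fd):
    """Read one child's pickled result, then reap the child."""
    with os.fdopen(read_fd, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)
    return result


def fork_map(func, sequence, limit=4):
    """Like map(func, sequence), but computing each item in a forked child,
    with at most `limit` children running at once; results keep input order."""
    results = []
    running = []  # (pid, read_fd) pairs, oldest first
    for item in sequence:
        if len(running) >= limit:
            results.append(_collect(*running.pop(0)))
        running.append(_spawn(func, item))
    for pid, read_fd in running:
        results.append(_collect(pid, read_fd))
    return results


print(fork_map(lambda x: x * x, range(10), limit=3))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Collecting in input order keeps the sketch short, at the cost of sometimes
waiting on a slow early item while later ones have already finished.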

Other simple programs are those employing the Queue class, together with those
using the manage method, which associates functions or callables with Queue or
Exchange objects for convenient invocation of those functions and management
of their communications.

The most technically involved program is simple_start.py, which uses the
Exchange class together with a calculation function that is aware of the
parallel environment and communicates directly with the creating process over
the supplied communications channel.

It should be noted that, with the exception of simple_start.py, the examples
employing calculation functions (as opposed to doing a calculation inline in a
loop body) all use MakeParallel to make those functions parallel-aware, thus
permitting "normal" functions to be converted to a form usable in the
parallel environment.
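
The idea behind MakeParallel, as summarised in the release notes below (a
wrapper around a normal function which sends the function's result over the
supplied channel when invoked), can be sketched without pprocess. SendResult
is a hypothetical class, not the real MakeParallel, and a socket pair within a
single process stands in for the channel so that the wrapper's behaviour can
be seen directly.

```python
import pickle
import socket


class SendResult:
    """Hypothetical wrapper: calling it with a channel and arguments sends
    func(*args, **kwargs) over the channel instead of returning it."""

    def __init__(self, func):
        self.func = func

    def __call__(self, channel, *args, **kwargs):
        channel.sendall(pickle.dumps(self.func(*args, **kwargs)))


def square(x):
    # A "normal" function with no knowledge of channels or processes.
    return x * x


# A socketpair stands in for the channel a created process would be given.
sender, receiver = socket.socketpair()
SendResult(square)(sender, 7)
sender.close()
with receiver.makefile("rb") as channel:
    value = pickle.load(channel)
receiver.close()
print(value)  # 49
```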

Reusable Processes
------------------

An additional example not listed above, simple_managed_map_reusable.py,
employs the MakeReusable class instead of MakeParallel in order to demonstrate
reusable processes and channels:

  PYTHONPATH=. python examples/simple_managed_map_reusable.py
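
Reusing a process means keeping one created process alive and sending it
several pieces of work over the same channel, instead of creating a new
process for each call. The sketch below shows that general idea with os.fork
and a socket pair; it does not reproduce MakeReusable or the reuse mechanism
of the Exchange class, and start_reusable_worker is a hypothetical name.
Sending None tells the worker to finish.

```python
import os
import pickle
import socket


def start_reusable_worker(func):
    """Fork one worker that repeatedly receives argument tuples over its
    channel and sends back func(*args), stopping when it receives None."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        try:
            with child_end.makefile("rwb") as channel:
                while True:
                    args = pickle.load(channel)
                    if args is None:
                        break
                    pickle.dump(func(*args), channel)
                    channel.flush()  # push the result out of the buffer now
        finally:
            os._exit(0)
    child_end.close()
    channel = parent_end.makefile("rwb")
    parent_end.close()  # the descriptor stays open until channel is closed
    return pid, channel


pid, channel = start_reusable_worker(pow)
results = []
for args in [(2, 3), (3, 2), (10, 2)]:
    pickle.dump(args, channel)  # the same process handles every request
    channel.flush()
    results.append(pickle.load(channel))
pickle.dump(None, channel)  # ask the worker to finish
channel.flush()
channel.close()
os.waitpid(pid, 0)
print(results)  # [8, 9, 100]
```

Because the worker survives between requests, the cost of creating a process
is paid once rather than per call; the price is that the caller must keep
requests and replies in step on the shared channel.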

Continuous Process Communications
---------------------------------

Another example not listed above, simple_continuous_queue.py, employs
continuous communications to monitor output from created processes:

  PYTHONPATH=. python examples/simple_continuous_queue.py
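
Continuous communications mean that a created process sends several messages
over its lifetime, and the creating process handles each one as it arrives
rather than waiting for a single final result. A minimal standard-library
sketch of such monitoring follows; start_reporter is a hypothetical name, and
newline-terminated text messages are an assumption made to keep the framing
simple. It is not the Exchange-based mechanism used by pprocess.

```python
import os
import select
import socket
import time


def start_reporter(steps):
    """Fork a child that reports progress several times before exiting."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        try:
            for i in range(steps):
                time.sleep(0.05)  # simulate some work between messages
                child_end.sendall(f"step {i}\n".encode())
        finally:
            child_end.close()
            os._exit(0)
    child_end.close()
    return pid, parent_end


pid, channel = start_reporter(3)
poller = select.poll()
poller.register(channel.fileno(), select.POLLIN)
pending = b""
received = []
finished = False
while not finished:
    # Wait up to one second for the next message or for end-of-file.
    for _fd, _event in poller.poll(1000):
        data = channel.recv(4096)
        if not data:  # the child has closed its end of the channel
            finished = True
            break
        pending += data
        *lines, pending = pending.split(b"\n")
        for line in lines:
            text = line.decode()
            print("child reported:", text)  # react as each message arrives
            received.append(text)
channel.close()
os.waitpid(pid, 0)
print(received)  # ['step 0', 'step 1', 'step 2']
```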

Persistent Processes
--------------------

Persistent variants of some of the above examples employ a persistent or
background process which can be started by one process and contacted later by
another in order to collect the results of a computation. For example:

  PYTHONPATH=. python examples/simple_persistent_managed.py --start
  PYTHONPATH=. python examples/simple_persistent_managed.py --reconnect

  PYTHONPATH=. python examples/simple_background_queue.py --start
  PYTHONPATH=. python examples/simple_background_queue.py --reconnect

  PYTHONPATH=. python examples/simple_persistent_queue.py --start
  PYTHONPATH=. python examples/simple_persistent_queue.py --reconnect
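
The persistent examples rely on a process that keeps a result until another
process contacts it later. One way to sketch that idea with the standard
library is a background child listening on a UNIX domain socket; start_background
and reconnect are hypothetical names, and the socket path is an assumption.
For brevity both steps run in one program here. In real use the reconnecting
step would be a separate program, and the background process would normally
also be detached from the one that started it.

```python
import os
import pickle
import socket
import tempfile


def start_background(path, func, *args):
    """Fork a background child that computes func(*args) and then hands the
    result to the first client that connects to the UNIX socket at path."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # bound before forking, so clients can connect at once
    server.listen(1)
    pid = os.fork()
    if pid == 0:
        try:
            result = func(*args)
            conn, _ = server.accept()
            with conn:
                conn.sendall(pickle.dumps(result))
        finally:
            os._exit(0)
    server.close()  # only the background child keeps listening
    return pid


def reconnect(path):
    """Contact the background process and collect its result."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    with client, client.makefile("rb") as channel:
        return pickle.load(channel)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "result.sock")
pid = start_background(path, sum, range(101))
# The starting program could do other work (or exit) at this point.
total = reconnect(path)
os.waitpid(pid, 0)
os.unlink(path)
os.rmdir(workdir)
print(total)  # 5050
```

Binding and listening before the fork means the socket already exists when
start_background returns, so a client connecting straight away cannot race
ahead of the background process.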

Parallel Raytracing with PyGmy
------------------------------

A version of the PyGmy raytracer modified to use pprocess can be run to
investigate the potential for speed increases in "real world" programs:

  cd examples/PyGmy
  PYTHONPATH=../..:. python scene.py

(This should produce a file called test.tif, a TIFF file containing a
raytraced scene image.)

Examples from the Concurrency SIG
---------------------------------

The special interest group (SIG) for concurrency in Python proposed a
particular application as a showcase for concurrency libraries. Two examples
are included which demonstrate pprocess and the use of continuous processes to
implement the application concerned:

  PYTHONPATH=. python examples/concurrency-sig/bottles.py
  PYTHONPATH=. python examples/concurrency-sig/bottles_heartbeat.py

Examples of Modifying Mutable Objects
-------------------------------------

Mutable objects can be modified in processes created by pprocess, but the
modifications will not be visible in the parent process. The following
examples illustrate the problem:

  PYTHONPATH=. python examples/simple_mutation.py
  PYTHONPATH=. python examples/simple_mutation_queue.py

The former, non-parallel program will display the expected result of the
computation, whereas the latter, parallel program will fail to do so. This is
because the latter attempts to modify the input collection in order to use it
as a result collection, but these modifications are not propagated back to the
parent process.
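
A common way to avoid the problem is to have each created process send its
result back, leaving the parent to store it in its own collection. The sketch
below shows the pattern with os.fork and os.pipe rather than pprocess (a
parallel program using pprocess would normally receive results through a
Queue, Map or Exchange); the values list and the multiply-by-ten calculation
are purely illustrative.

```python
import os
import pickle

values = [1, 2, 3, 4]
children = []

for index, value in enumerate(values):
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the new value back instead of assigning values[index],
        # which would only change the child's own copy.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(value * 10, out)
        finally:
            os._exit(0)
    os.close(write_fd)
    children.append((pid, index, read_fd))

# Parent: store each received result in its own copy of the collection.
for pid, index, read_fd in children:
    with os.fdopen(read_fd, "rb") as inp:
        values[index] = pickle.load(inp)
    os.waitpid(pid, 0)

print(values)  # [10, 20, 30, 40]
```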

Test Programs
-------------

There are some elementary tests:

  PYTHONPATH=. python tests/create_loop.py
  PYTHONPATH=. python tests/start_loop.py

(Simple loop demonstrations which use two different ways of creating and
starting the parallel processes.)

  PYTHONPATH=. python tests/start_indexer.py <directory>

(A text indexing demonstration, where <directory> should be a directory
containing text files to be indexed, although HTML files will also work well
enough. After the files have been indexed, a prompt appears at which words or
word fragments can be entered, and the matching words and their locations are
then shown. Run the program without arguments to see more information.)

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for pprocess at the time of release is:

  http://www.boddie.org.uk/python/pprocess.html

The author can be contacted at the following e-mail address:

  paul@boddie.org.uk

Copyright and licence information can be found in the docs directory; see
docs/COPYING.txt, docs/lgpl-3.0.txt and docs/gpl-3.0.txt for more information.

For the PyGmy raytracer example, different copyright and licence information
is provided in the docs directory; see docs/COPYING-PyGmy.txt and
docs/LICENCE-PyGmy.txt for more information.

Dependencies
------------

This software depends on standard library features which are stated as being
available only on "UNIX"; it has been tested repeatedly only on a GNU/Linux
system, and occasionally on systems running OpenSolaris.

New in pprocess 0.5.2 (Changes since pprocess 0.5.1)
----------------------------------------------------

  * Added examples involving mutable objects and the inability of pprocess to
    automatically propagate changes to such objects back to parent processes.
  * Added an explanatory section to the tutorial about data exchange between
    processes and the differences from "normal" Python program behaviour.

New in pprocess 0.5.1 (Changes since pprocess 0.5)
--------------------------------------------------

  * Added IOError handling when processes exit apparently without warning.

New in pprocess 0.5 (Changes since pprocess 0.4)
------------------------------------------------

  * Added proper support in the Exchange class for continuous communications
    between processes, providing examples: simple_continuous_queue.py and the
    concurrency-sig directory.
  * Changed the Map class to permit incremental access to received results
    from completed parts of the sequence of inputs, also adding an iteration
    interface.
  * Added an example, simple_pmap_iter.py, to demonstrate iteration over maps.
  * Fixed the get_number_of_cores function to work with /proc/cpuinfo where
    the "physical id" field is missing.
  * Tidied the Exchange class, adding distinct status methods: unfinished and
    busy.
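
The get_number_of_cores fix concerns /proc/cpuinfo files that lack a
"physical id" field. The function below is a hypothetical sketch of one way to
cope with that, not pprocess's actual code: it assumes that distinct
("physical id", "core id") pairs identify cores and, when those fields are
absent, falls back to counting "processor" entries. It takes the file's text
as an argument so that it can be tried on sample input; on GNU/Linux the real
file could be read with open("/proc/cpuinfo").read().

```python
def count_cores(cpuinfo_text):
    """Estimate the number of cores from /proc/cpuinfo-style text.

    Assumption: distinct ("physical id", "core id") pairs identify cores;
    if no entry has both fields, count "processor" entries instead.
    """
    pairs = set()
    processors = 0
    physical = core = None
    # An extra blank line ensures that the final entry is also recorded.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor entry.
            if physical is not None and core is not None:
                pairs.add((physical, core))
            physical = core = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            processors += 1
        elif key == "physical id":
            physical = value
        elif key == "core id":
            core = value
    return len(pairs) or processors


# Two entries without "physical id", as found on some systems.
sample = "processor\t: 0\nmodel name\t: CPU\n\nprocessor\t: 1\nmodel name\t: CPU\n"
print(count_cores(sample))  # 2
```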

New in pprocess 0.4 (Changes since pprocess 0.3.1)
--------------------------------------------------

  * Added support for persistent/background processes.
  * Added a utility function to detect and return the number of processor
    cores available.
  * Added missing documentation stylesheet.
  * Added support for Solaris using pipes instead of socket pairs, since
    the latter apparently do not work properly with poll on Solaris.

New in pprocess 0.3.1 (Changes since pprocess 0.3)
--------------------------------------------------

  * Moved the reference material out of the module docstring and into a
    separate document, converting it to XHTML in the process.
  * Fixed the project name in the setup script.

New in pprocess 0.3 (Changes since parallel 0.2.5)
--------------------------------------------------

  * Added managed callables: wrappers around callables which cause them to be
    automatically managed by the exchange from which they were acquired.
  * Added MakeParallel: a wrapper instantiated around a normal function which
    sends the result of that function over the supplied channel when invoked.
  * Added MakeReusable: a wrapper like MakeParallel which can be used in
    conjunction with the newly added reuse capability of the Exchange class in
    order to reuse processes and channels.
  * Added a Map class which attempts to emulate the built-in map function,
    along with a pmap function using this class.
  * Added a Queue class which provides a simpler iterator-style interface to
    data produced by created processes.
  * Added a create method to the Exchange class and an exit convenience
    function to the module.
  * Changed the Exchange implementation so that it does not block when
    attempting to start new processes beyond the process limit: such requests
    are queued and performed as running processes are completed. This permits
    programs using the start method to proceed to consumption of results more
    quickly.
  * Extended and updated the examples. Added a tutorial.
  * Added Ubuntu Feisty (7.04) package support.
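
The 0.3 change to the Exchange class, queueing start requests beyond the
process limit instead of blocking, amounts to a small piece of bookkeeping.
The class below is a hypothetical sketch of that bookkeeping only: it creates
no processes itself and does not reproduce the Exchange implementation, and
the launch callable stands in for whatever actually creates a process.

```python
import collections


class LimitedStarter:
    """Hypothetical sketch: start() launches a task at once if a slot is
    free and otherwise queues it; finished() frees a slot and launches the
    oldest queued task."""

    def __init__(self, limit, launch):
        self.limit = limit
        self.launch = launch  # callable that would really create a process
        self.running = 0
        self.waiting = collections.deque()

    def start(self, task):
        if self.running < self.limit:
            self.running += 1
            self.launch(task)
        else:
            self.waiting.append(task)  # return at once instead of blocking

    def finished(self):
        self.running -= 1
        if self.waiting:
            self.start(self.waiting.popleft())


launched = []
starter = LimitedStarter(2, launched.append)
for task in "abcd":
    starter.start(task)  # never blocks, even beyond the limit
print(launched)  # ['a', 'b']
starter.finished()  # one process completes, freeing a slot
print(launched)  # ['a', 'b', 'c']
```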

New in parallel 0.2.5 (Changes since parallel 0.2.4)
----------------------------------------------------

  * Added a start method to the Exchange class for more convenient creation of
    processes.
  * Relicensed under the LGPL (version 3 or later); this also fixes the
    contradictory situation where the GPL was stated in the pprocess module
    (which was not, in fact, the intention) and the LGPL was stated in the
    documentation.

New in parallel 0.2.4 (Changes since parallel 0.2.3)
----------------------------------------------------

  * Set buffer sizes to zero for the file object wrappers around sockets: this
    may prevent deadlock issues.

New in parallel 0.2.3 (Changes since parallel 0.2.2)
----------------------------------------------------

  * Added convenient message exchanges, offering methods handling common
    situations at the cost of having to define a subclass of Exchange.
  * Added a simple example of performing a parallel computation.
  * Improved the PyGmy raytracer example to use the newly added functionality.

New in parallel 0.2.2 (Changes since parallel 0.2.1)
----------------------------------------------------

  * Changed the status testing in the Exchange class, potentially fixing the
    premature closure of channels before all data was read.
  * Fixed the PyGmy raytracer example's process accounting by relying on the
    possibly more reliable Exchange behaviour, whilst also preventing
    erroneous creation of "out of bounds" processes.
  * Added an attribute, removed, to the Exchange class to record which
    channels were removed in the last call to the ready method.

New in parallel 0.2.1 (Changes since parallel 0.2)
--------------------------------------------------

  * Added a PyGmy raytracer example.
  * Updated copyright and licensing details (FSF address, additional works).

New in parallel 0.2 (Changes since parallel 0.1)
------------------------------------------------

  * Changed the name of the included module from parallel to pprocess in
    order to avoid naming conflicts with PyParallel.

Release Procedures
------------------

Update the pprocess __version__ attribute and the setup.py file version field.
Change the version number and package filename/directory in the documentation.
Update the release notes (see above).
Check the release information in the PKG-INFO file.
Tag, export.
Archive, upload.
Update PyPI.

Making Packages
---------------

To make Debian-based packages:

1. Create new package directories under packages if necessary.
2. Make a symbolic link in the distribution's root directory to keep the
   Debian tools happy:

     ln -s packages/ubuntu-hoary/python2.4-parallel-pprocess/debian/

   Or:

     ln -s packages/ubuntu-feisty/python-pprocess/debian/

3. Run the package builder:

     dpkg-buildpackage -rfakeroot

4. Locate and tidy up the packages in the parent directory of the
   distribution's root directory.