pprocess

Annotated README.txt

149:eeaa043dbfb1
2008-06-04 paulb [project @ 2008-06-04 22:13:31 by paulb] Added a link to the reference document.
Introduction
------------

The pprocess module provides elementary support for parallel programming in
Python using a fork-based process creation model in conjunction with a
channel-based communications model implemented using socketpair and poll. On
systems with multiple CPUs or multicore CPUs, processes should take advantage
of as many CPUs or cores as the operating system permits.
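
As a rough illustration of the model described above (and not the pprocess
implementation itself), the following Python 3 sketch forks one child process,
has it send a result back over a socket pair, and uses poll in the parent to
wait for data. The compute_in_child and square names, and the use of repr to
encode the result as text, are assumptions made for this example only.

```python
import os
import select
import socket


def square(number):
    """Example calculation to run in the child process."""
    return number * number


def compute_in_child(function, argument):
    """Fork a child, run function(argument) there, return the result as text."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child process: compute, send the result over the channel, and
        # leave without running any of the parent's cleanup handlers.
        try:
            parent_end.close()
            child_end.sendall(repr(function(argument)).encode("utf-8"))
        finally:
            os._exit(0)

    # Parent process: wait with poll until the channel has something to
    # report, then read until the child closes its end (an empty read).
    child_end.close()
    poller = select.poll()
    poller.register(parent_end, select.POLLIN)
    chunks = []
    while True:
        poller.poll()
        data = parent_end.recv(4096)
        if not data:
            break
        chunks.append(data)
    parent_end.close()
    os.waitpid(pid, 0)
    return b"".join(chunks).decode("utf-8")
```

Calling compute_in_child(square, 12) returns the string "144". Like pprocess
itself, this sketch relies on fork and poll and so is only expected to work on
UNIX-like systems.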

Tutorial
--------

The tutorial provides some information about the examples described below.
See the docs/tutorial.html file in the distribution for more details.

Reference
---------

A description of the different mechanisms provided by the pprocess module can
be found in the reference document. See the docs/reference.html file in the
distribution for more details.

Quick Start
-----------

Try running the simple examples. For example:

PYTHONPATH=. python examples/simple_create.py

(These examples show in different ways how a limited number of processes can
be used to perform a parallel computation. The simple.py, simple1.py,
simple2.py and simple_map.py programs are sequential versions of the other
programs.)

The following table summarises the features used in the programs:

Program (.py)         pmap  MakeParallel manage start create Map Queue Exchange
-------------         ----  ------------ ------ ----- ------ --- ----- --------
simple_create_map                                     Yes    Yes
simple_create_queue                                   Yes        Yes
simple_create                                         Yes              Yes
simple_managed_map          Yes          Yes                 Yes
simple_managed_queue        Yes          Yes                     Yes
simple_managed              Yes          Yes                           Yes
simple_pmap           Yes
simple_start_queue          Yes                 Yes              Yes
simple_start                                    Yes                    Yes

The simplest parallel program is simple_pmap.py, which employs the pmap
function resembling the built-in map function in Python.
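
To show what "resembling the built-in map function" can mean in practice, here
is a minimal Python 3 sketch of a map-like function which evaluates each call
in a forked child and keeps at most a fixed number of children running. It is
not the pprocess pmap implementation: fork_map, its limit parameter (assumed
to be at least 1) and the use of pipes with pickle are assumptions chosen for
illustration.

```python
import os
import pickle


def square(number):
    """Example calculation applied to each item."""
    return number * number


def fork_map(function, sequence, limit=2):
    """Like list(map(function, sequence)), with each call in a forked child.

    At most `limit` children are alive at any time (limit must be >= 1).
    """
    items = list(sequence)
    results = [None] * len(items)
    running = {}  # child pid -> (index of its item, read end of its pipe)

    def collect_oldest():
        # Read the oldest child's pickled result until it closes its pipe,
        # then reap the child so that it does not linger as a zombie.
        pid = next(iter(running))
        index, reader = running.pop(pid)
        with os.fdopen(reader, "rb") as stream:
            results[index] = pickle.loads(stream.read())
        os.waitpid(pid, 0)

    for index, item in enumerate(items):
        if len(running) >= limit:
            collect_oldest()
        reader, writer = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: write the pickled result and exit immediately.
            try:
                os.close(reader)
                with os.fdopen(writer, "wb") as stream:
                    pickle.dump(function(item), stream)
            finally:
                os._exit(0)
        os.close(writer)
        running[pid] = (index, reader)

    while running:
        collect_oldest()
    return results
```

As with map, results come back in the order of the inputs, so
fork_map(square, range(6), limit=2) gives the same list as
list(map(square, range(6))).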

Other simple programs are those employing the Queue class, together with those
using the manage method, which associates functions or callables with Queue or
Exchange objects for convenient invocation of those functions and the
management of their communications.

The most technically involved program is simple_start.py, which uses the
Exchange class together with a calculation function which is aware of the
parallel environment and which communicates over the supplied communications
channel directly to the creating process.

Note that, with the exception of simple_start.py, those examples employing
calculation functions (as opposed to doing a calculation inline in a loop
body) all use MakeParallel to make those functions parallel-aware, thus
permitting the conversion of "normal" functions to a form usable in the
parallel environment.
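
The conversion of a "normal" function can be pictured with the following
Python 3 sketch. It is a simplified model rather than the pprocess code:
ParallelWrapper and ListChannel are hypothetical names, and the channel is
assumed to be any object with a send method.

```python
class ListChannel:
    """Hypothetical stand-in for a channel: records sent values in a list."""

    def __init__(self):
        self.sent = []

    def send(self, value):
        self.sent.append(value)


class ParallelWrapper:
    """Wrap a normal function so that it accepts a channel as its first
    argument and sends its result over that channel instead of returning it."""

    def __init__(self, function):
        self.function = function

    def __call__(self, channel, *args, **kwargs):
        channel.send(self.function(*args, **kwargs))


def add(a, b):
    """A "normal" function which knows nothing about channels."""
    return a + b


channel = ListChannel()
ParallelWrapper(add)(channel, 2, 3)  # the result 5 is sent, not returned
```

The wrapped function itself stays unchanged, which is why only a program such
as simple_start.py, whose calculation function uses the channel directly, can
do without a wrapper of this kind.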

Reusable Processes
------------------

An additional example not listed above, simple_managed_map_reusable.py,
employs the MakeReusable class instead of MakeParallel in order to demonstrate
reusable processes and channels:

PYTHONPATH=. python examples/simple_managed_map_reusable.py

Persistent Processes
--------------------

A number of persistent variants of some of the above examples employ a
persistent or background process which can be started by one process and
contacted later by another in order to collect the results of a computation.
For example:

PYTHONPATH=. python examples/simple_persistent_managed.py --start
PYTHONPATH=. python examples/simple_persistent_managed.py --reconnect

PYTHONPATH=. python examples/simple_background_queue.py --start
PYTHONPATH=. python examples/simple_background_queue.py --reconnect

PYTHONPATH=. python examples/simple_persistent_queue.py --start
PYTHONPATH=. python examples/simple_persistent_queue.py --reconnect

Parallel Raytracing with PyGmy
------------------------------

The PyGmy raytracer modified to use pprocess can be run to investigate the
potential for speed increases in "real world" programs:

cd examples/PyGmy
PYTHONPATH=../..:. python scene.py

(This should produce a file called test.tif - a TIFF file containing a
raytraced scene image.)

Test Programs
-------------

There are some elementary tests:

PYTHONPATH=. python tests/create_loop.py
PYTHONPATH=. python tests/start_loop.py

(Simple loop demonstrations which use two different ways of creating and
starting the parallel processes.)

PYTHONPATH=. python tests/start_indexer.py <directory>

(A text indexing demonstration, where <directory> should be a directory
containing text files to be indexed, although HTML files will also work well
enough. After the files have been indexed, a prompt will appear; words or
word fragments can then be entered, and matching words and their locations
will be shown. Run the program without arguments to see more information.)
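
The lookup behaviour described for the indexing demonstration (entering a word
or word fragment and getting back matching words with their locations) can be
sketched in Python 3 as follows. This is a sequential toy model, not the code
of tests/start_indexer.py: build_index and search are hypothetical names, and
a location is assumed here to be a (file name, line number) pair.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to a sorted list of (name, line number)."""
    index = defaultdict(set)
    for name, text in documents.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            for word in re.findall(r"\w+", line.lower()):
                index[word].add((name, line_number))
    return {word: sorted(places) for word, places in index.items()}


def search(index, fragment):
    """Return every indexed word containing the fragment, with locations."""
    fragment = fragment.lower()
    return {word: places for word, places in index.items() if fragment in word}
```

In a parallel version, one plausible division of labour would be for each
process to index a subset of the files and for the creating process to merge
the partial indexes; that is an assumption about the general approach, not a
description of the test program.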

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for pprocess at the time of release is:

http://www.boddie.org.uk/python/pprocess.html

The author can be contacted at the following e-mail address:

paul@boddie.org.uk

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt, docs/lgpl-3.0.txt and docs/gpl-3.0.txt for more information.

For the PyGmy raytracer example, different copyright and licence information
is provided in the docs directory - see docs/COPYING-PyGmy.txt and
docs/LICENCE-PyGmy.txt for more information.

Dependencies
------------

This software depends on standard library features which are stated as being
available only on "UNIX"; it has only been tested on a GNU/Linux system.

New in pprocess 0.4 (Changes since pprocess 0.3.1)
--------------------------------------------------

  * Added support for persistent/background processes.
  * Added a utility function to detect and return the number of processor
    cores available.
  * Added missing documentation stylesheet.

New in pprocess 0.3.1 (Changes since pprocess 0.3)
--------------------------------------------------

  * Moved the reference material out of the module docstring and into a
    separate document, converting it to XHTML in the process.
  * Fixed the project name in the setup script.

New in pprocess 0.3 (Changes since parallel 0.2.5)
--------------------------------------------------

  * Added managed callables: wrappers around callables which cause them to be
    automatically managed by the exchange from which they were acquired.
  * Added MakeParallel: a wrapper instantiated around a normal function which
    sends the result of that function over the supplied channel when invoked.
  * Added MakeReusable: a wrapper like MakeParallel which can be used in
    conjunction with the newly-added reuse capability of the Exchange class in
    order to reuse processes and channels.
  * Added a Map class which attempts to emulate the built-in map function,
    along with a pmap function using this class.
  * Added a Queue class which provides a simpler iterator-style interface to
    data produced by created processes.
  * Added a create method to the Exchange class and an exit convenience
    function to the module.
  * Changed the Exchange implementation to not block when attempting to start
    new processes beyond the process limit: such requests are queued and
    performed as running processes are completed. This permits programs using
    the start method to proceed to consumption of results more quickly.
  * Extended and updated the examples. Added a tutorial.
  * Added Ubuntu Feisty (7.04) package support.
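
The non-blocking start behaviour listed above (requests beyond the process
limit are queued and performed as running processes complete) can be modelled
with the Python 3 sketch below. It only tracks names and creates no processes;
DeferredStarter and its methods are hypothetical and do not reflect the
Exchange class's actual interface.

```python
from collections import deque


class DeferredStarter:
    """Model of start requests under a process limit: requests over the
    limit are queued instead of making the caller wait."""

    def __init__(self, limit):
        self.limit = limit
        self.active = set()     # tasks considered to be running
        self.waiting = deque()  # tasks queued until a slot becomes free

    def start(self, name):
        """Return immediately: run the task now if there is room, otherwise
        queue the request."""
        if len(self.active) < self.limit:
            self.active.add(name)
        else:
            self.waiting.append(name)

    def finish(self, name):
        """Record a completed task and start the oldest queued request."""
        self.active.remove(name)
        if self.waiting:
            self.active.add(self.waiting.popleft())
```

Because start never waits for a free slot in this model, a caller can issue
all of its requests up front and then move straight on to consuming results.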

New in parallel 0.2.5 (Changes since parallel 0.2.4)
----------------------------------------------------

  * Added a start method to the Exchange class for more convenient creation of
    processes.
  * Relicensed under the LGPL (version 3 or later) - this also fixes the
    contradictory situation where the GPL was stated in the pprocess module
    (which was not, in fact, the intention) and the LGPL was stated in the
    documentation.

New in parallel 0.2.4 (Changes since parallel 0.2.3)
----------------------------------------------------

  * Set buffer sizes to zero for the file object wrappers around sockets: this
    may prevent deadlock issues.

New in parallel 0.2.3 (Changes since parallel 0.2.2)
----------------------------------------------------

  * Added convenient message exchanges, offering methods handling common
    situations at the cost of having to define a subclass of Exchange.
  * Added a simple example of performing a parallel computation.
  * Improved the PyGmy raytracer example to use the newly added functionality.

New in parallel 0.2.2 (Changes since parallel 0.2.1)
----------------------------------------------------

  * Changed the status testing in the Exchange class, potentially fixing the
    premature closure of channels before all data was read.
  * Fixed the PyGmy raytracer example's process accounting by relying on the
    possibly more reliable Exchange behaviour, whilst also preventing
    erroneous creation of "out of bounds" processes.
  * Added a removed attribute on the Exchange to record which channels were
    removed in the last call to the ready method.

New in parallel 0.2.1 (Changes since parallel 0.2)
--------------------------------------------------

  * Added a PyGmy raytracer example.
  * Updated copyright and licensing details (FSF address, additional works).

New in parallel 0.2 (Changes since parallel 0.1)
------------------------------------------------

  * Changed the name of the included module from parallel to pprocess in order
    to avoid naming conflicts with PyParallel.

Release Procedures
------------------

Update the pprocess __version__ attribute.
Change the version number and package filename/directory in the documentation.
Update the release notes (see above).
Check the release information in the PKG-INFO file.
Tag, export.
Archive, upload.
Update PyPI.

Making Packages
---------------

To make Debian-based packages:

  1. Create new package directories under packages if necessary.
  2. Make a symbolic link in the distribution's root directory to keep the
     Debian tools happy:

     ln -s packages/ubuntu-hoary/python2.4-parallel-pprocess/debian/

     Or:

     ln -s packages/ubuntu-feisty/python-pprocess/debian/

  3. Run the package builder:

     dpkg-buildpackage -rfakeroot

  4. Locate and tidy up the packages in the parent directory of the
     distribution's root directory.