pprocess

Annotated README.txt

147:6d1f2970de78
2008-06-02 paulb [project @ 2008-06-02 22:44:40 by paulb] Updated release information.
Introduction
------------

The pprocess module provides elementary support for parallel programming in
Python using a fork-based process creation model in conjunction with a
channel-based communications model implemented using socketpair and poll. On
systems with multiple CPUs or multicore CPUs, processes should take advantage
of as many CPUs or cores as the operating system permits.
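
As a rough illustration of the model described above (and not the pprocess
implementation itself), the following Python 3 sketch forks child processes,
gives each one a socketpair connection to the parent, and uses poll to wait
for results. The names run_in_child and collect are hypothetical and are not
part of pprocess; pickle is assumed here as the wire format.

```python
import os
import pickle
import select
import socket


def run_in_child(fn, *args):
    """Fork a child that computes fn(*args); return (pid, parent_socket)."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: send the pickled result, then exit immediately so that
        # none of the parent's code or cleanup handlers run here.
        parent_sock.close()
        try:
            child_sock.sendall(pickle.dumps(fn(*args)))
        finally:
            child_sock.close()
            os._exit(0)
    # Parent: keep only its own end of the socket pair.
    child_sock.close()
    return pid, parent_sock


def collect(children):
    """Use poll to read from all children until each closes its socket."""
    poller = select.poll()
    pending = {}  # file descriptor -> (pid, socket, received chunks)
    for pid, sock in children:
        poller.register(sock.fileno(), select.POLLIN)
        pending[sock.fileno()] = (pid, sock, [])
    results = {}
    while pending:
        for fd, _event in poller.poll():
            pid, sock, chunks = pending[fd]
            data = sock.recv(4096)
            if data:
                chunks.append(data)
                continue
            # An empty read means the child closed its end of the channel.
            poller.unregister(fd)
            sock.close()
            os.waitpid(pid, 0)
            results[pid] = pickle.loads(b"".join(chunks))
            del pending[fd]
    return results


if __name__ == "__main__":
    started = [run_in_child(pow, 2, n) for n in range(4)]
    print(sorted(collect(started).values()))
```

Error handling is omitted to keep the sketch short: if fn raises an exception
in the child, no result is sent and the parent fails when unpickling.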

Tutorial
--------

The tutorial provides some information about the examples described below.
See the docs/tutorial.html file in the distribution for more details.

Reference
---------

A description of the different mechanisms provided by the pprocess module can
be found in the reference document. See the docs/reference.html file in the
distribution for more details.

Quick Start
-----------

Try running the simple examples. For example:

PYTHONPATH=. python examples/simple_create.py

(These examples show in different ways how a limited number of processes can
be used to perform a parallel computation. The simple.py, simple1.py,
simple2.py and simple_map.py programs are sequential versions of the other
programs.)

The following table summarises the features used in the programs:

Program (.py)         pmap  MakeParallel manage start create Map Queue Exchange
-------------         ----  ------------ ------ ----- ------ --- ----- --------
simple_create_map                                     Yes    Yes
simple_create_queue                                   Yes        Yes
simple_create                                         Yes              Yes
simple_managed_map          Yes          Yes                 Yes
simple_managed_queue        Yes          Yes                     Yes
simple_managed              Yes          Yes                           Yes
simple_pmap           Yes
simple_start_queue          Yes                 Yes              Yes
simple_start                                    Yes                    Yes

The simplest parallel program is simple_pmap.py, which employs the pmap
function, a parallel counterpart of the built-in map function in Python.
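
To make the resemblance to the built-in map concrete, here is a minimal
Python 3 sketch of a map-like function that runs each call in a forked child
while keeping at most a fixed number of children alive. It is not the pmap
implementation: fork_map and its limit parameter are hypothetical names, and
pipes with pickle are assumed for returning results.

```python
import os
import pickle


def fork_map(fn, items, limit=2):
    """Apply fn to each item in a forked child, at most `limit` at a time.

    Results are returned in input order, as with the built-in map.
    """
    items = list(items)
    results = [None] * len(items)
    running = {}  # pid -> (index, file object for the pipe's read end)

    def reap_oldest():
        # Read the result of the oldest running child, then wait for it.
        pid = next(iter(running))
        index, reader = running.pop(pid)
        with reader:
            results[index] = pickle.loads(reader.read())
        os.waitpid(pid, 0)

    for index, item in enumerate(items):
        if len(running) >= limit:
            reap_oldest()
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: write the pickled result and exit without returning
            # into the caller's code.
            os.close(read_fd)
            try:
                with os.fdopen(write_fd, "wb") as writer:
                    writer.write(pickle.dumps(fn(item)))
            finally:
                os._exit(0)
        os.close(write_fd)
        running[pid] = (index, os.fdopen(read_fd, "rb"))

    while running:
        reap_oldest()
    return results


if __name__ == "__main__":
    print(fork_map(lambda x: x * x, range(6), limit=2))
```

Waiting on the oldest child is a simplification chosen for brevity; a
poll-based loop would instead service whichever child finishes first.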

Other simple programs are those employing the Queue class, together with those
using the manage method, which associates functions or callables with Queue or
Exchange objects for convenient invocation of those functions and the
management of their communications.

The most technically involved program is simple_start.py, which uses the
Exchange class together with a calculation function which is aware of the
parallel environment and which communicates over the supplied communications
channel directly to the creating process.

Note that, with the exception of simple_start.py, those examples employing
calculation functions (as opposed to doing a calculation inline in a loop
body) all use MakeParallel to make those functions parallel-aware, thus
permitting the conversion of "normal" functions to a form usable in the
parallel environment.
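
The idea behind such a wrapper can be sketched in a few lines of Python. This
is an illustration and not the MakeParallel source: SendResult and ListChannel
are hypothetical names, and it is assumed here that the channel is passed as
the first argument and offers a send method.

```python
class SendResult:
    """Hypothetical stand-in for a MakeParallel-style wrapper."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, channel, *args, **kwargs):
        # The channel is taken first (an assumption); everything else is
        # passed to the wrapped function, whose result is sent back.
        channel.send(self.fn(*args, **kwargs))


class ListChannel:
    """Hypothetical in-memory channel, used only to exercise the wrapper."""

    def __init__(self):
        self.sent = []

    def send(self, value):
        self.sent.append(value)


def area(width, height):
    # A "normal" function that knows nothing about channels.
    return width * height


channel = ListChannel()
SendResult(area)(channel, 3, 4)
print(channel.sent)
```

The wrapped function stays free of any communications code, which is what
allows the same function to be used in both sequential and parallel programs.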

Reusable Processes
------------------

An additional example not listed above, simple_managed_map_reusable.py,
employs the MakeReusable class instead of MakeParallel in order to demonstrate
reusable processes and channels:

PYTHONPATH=. python examples/simple_managed_map_reusable.py

Persistent Processes
--------------------

A number of persistent variants of some of the above examples employ a
persistent or background process which can be started by one process and
contacted later by another in order to collect the results of a computation.
For example:

PYTHONPATH=. python examples/simple_persistent_managed.py --start
PYTHONPATH=. python examples/simple_persistent_managed.py --reconnect

PYTHONPATH=. python examples/simple_background_queue.py --start
PYTHONPATH=. python examples/simple_background_queue.py --reconnect

Parallel Raytracing with PyGmy
------------------------------

The PyGmy raytracer modified to use pprocess can be run to investigate the
potential for speed increases in "real world" programs:

cd examples/PyGmy
PYTHONPATH=../..:. python scene.py

(This should produce a file called test.tif - a TIFF file containing a
raytraced scene image.)

Test Programs
-------------

There are some elementary tests:

PYTHONPATH=. python tests/create_loop.py
PYTHONPATH=. python tests/start_loop.py

(Simple loop demonstrations which use two different ways of creating and
starting the parallel processes.)

PYTHONPATH=. python tests/start_indexer.py <directory>

(A text indexing demonstration, where <directory> should be a directory
containing text files to be indexed, although HTML files will also work well
enough. After the files have been indexed, a prompt appears at which words or
word fragments can be entered; matching words and their locations are then
shown. Run the program without arguments to see more information.)

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for pprocess at the time of release is:

http://www.boddie.org.uk/python/pprocess.html

The author can be contacted at the following e-mail address:

paul@boddie.org.uk

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt, docs/lgpl-3.0.txt and docs/gpl-3.0.txt for more information.

For the PyGmy raytracer example, different copyright and licence information
is provided in the docs directory - see docs/COPYING-PyGmy.txt and
docs/LICENCE-PyGmy.txt for more information.

Dependencies
------------

This software depends on standard library features which are stated as being
available only on "UNIX"; it has only been tested on a GNU/Linux system.

New in pprocess 0.4 (Changes since pprocess 0.3.1)
--------------------------------------------------

  * Added support for persistent/background processes.
  * Added a utility function to detect and return the number of processor
    cores available.
  * Added missing documentation stylesheet.

New in pprocess 0.3.1 (Changes since pprocess 0.3)
--------------------------------------------------

  * Moved the reference material out of the module docstring and into a
    separate document, converting it to XHTML in the process.
  * Fixed the project name in the setup script.

New in pprocess 0.3 (Changes since parallel 0.2.5)
--------------------------------------------------

  * Added managed callables: wrappers around callables which cause them to be
    automatically managed by the exchange from which they were acquired.
  * Added MakeParallel: a wrapper instantiated around a normal function which
    sends the result of that function over the supplied channel when invoked.
  * Added MakeReusable: a wrapper like MakeParallel which can be used in
    conjunction with the newly-added reuse capability of the Exchange class in
    order to reuse processes and channels.
  * Added a Map class which attempts to emulate the built-in map function,
    along with a pmap function using this class.
  * Added a Queue class which provides a simpler iterator-style interface to
    data produced by created processes.
  * Added a create method to the Exchange class and an exit convenience
    function to the module.
  * Changed the Exchange implementation to not block when attempting to start
    new processes beyond the process limit: such requests are queued and
    performed as running processes are completed. This permits programs using
    the start method to proceed to consumption of results more quickly.
  * Extended and updated the examples. Added a tutorial.
  * Added Ubuntu Feisty (7.04) package support.

New in parallel 0.2.5 (Changes since parallel 0.2.4)
----------------------------------------------------

  * Added a start method to the Exchange class for more convenient creation of
    processes.
  * Relicensed under the LGPL (version 3 or later) - this also fixes the
    contradictory situation where the GPL was stated in the pprocess module
    (which was not, in fact, the intention) and the LGPL was stated in the
    documentation.

New in parallel 0.2.4 (Changes since parallel 0.2.3)
----------------------------------------------------

  * Set buffer sizes to zero for the file object wrappers around sockets: this
    may prevent deadlock issues.

New in parallel 0.2.3 (Changes since parallel 0.2.2)
----------------------------------------------------

  * Added convenient message exchanges, offering methods handling common
    situations at the cost of having to define a subclass of Exchange.
  * Added a simple example of performing a parallel computation.
  * Improved the PyGmy raytracer example to use the newly added functionality.

New in parallel 0.2.2 (Changes since parallel 0.2.1)
----------------------------------------------------

  * Changed the status testing in the Exchange class, potentially fixing the
    premature closure of channels before all data was read.
  * Fixed the PyGmy raytracer example's process accounting by relying on the
    possibly more reliable Exchange behaviour, whilst also preventing
    erroneous creation of "out of bounds" processes.
  * Added a removed attribute on the Exchange to record which channels were
    removed in the last call to the ready method.

New in parallel 0.2.1 (Changes since parallel 0.2)
--------------------------------------------------

  * Added a PyGmy raytracer example.
  * Updated copyright and licensing details (FSF address, additional works).

New in parallel 0.2 (Changes since parallel 0.1)
------------------------------------------------

  * Changed the name of the included module from parallel to pprocess in order
    to avoid naming conflicts with PyParallel.

Release Procedures
------------------

Update the pprocess __version__ attribute.
Change the version number and package filename/directory in the documentation.
Update the release notes (see above).
Check the release information in the PKG-INFO file.
Tag, export.
Archive, upload.
Update PyPI.

Making Packages
---------------

To make Debian-based packages:

  1. Create new package directories under packages if necessary.
  2. Make a symbolic link in the distribution's root directory to keep the
     Debian tools happy:

     ln -s packages/ubuntu-hoary/python2.4-parallel-pprocess/debian/

     Or:

     ln -s packages/ubuntu-feisty/python-pprocess/debian/

  3. Run the package builder:

     dpkg-buildpackage -rfakeroot

  4. Locate and tidy up the packages in the parent directory of the
     distribution's root directory.