176:c3eaa3391299
2016-12-19 Paul Boddie Updated copyright and release information.
Introduction
------------

The pprocess module provides elementary support for parallel programming in
Python using a fork-based process creation model in conjunction with a
channel-based communications model implemented using socketpair and poll. On
systems with multiple CPUs or multicore CPUs, processes should take advantage
of as many CPUs or cores as the operating system permits.
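
The following sketch is a rough illustration of the fork, socketpair and poll model described above. It uses only the Python standard library (Unix only) and does not use pprocess itself. A forked child computes a value and sends it, pickled, over its end of a socket pair. The parent polls its own end and then reads the result. This is a toy for orientation under those assumptions, not pprocess's actual implementation.

```python
import os
import pickle
import select
import socket

# One channel: a connected pair of sockets shared by parent and child.
parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:  # child process
    try:
        parent_end.close()  # the child only uses its own end
        child_end.sendall(pickle.dumps(sum(range(1000))))
        child_end.close()
    finally:
        os._exit(0)  # leave without running the parent's cleanup code

child_end.close()  # the parent only uses its own end
poller = select.poll()
poller.register(parent_end, select.POLLIN)
poller.poll()  # block until the child has written data or closed its end

with parent_end, parent_end.makefile("rb") as stream:
    result = pickle.load(stream)
os.waitpid(pid, 0)  # reap the finished child
print(result)  # 499500
```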

Since pprocess distributes work to other processes, code running in those
processes may behave differently from the same code running in a single,
ordinary Python process. For example, any mutable objects distributed to other
processes can still be modified, but the modifications will not be visible
outside the processes that make them.

Tutorial
--------

The tutorial provides some information about the examples described below.
See the docs/tutorial.html file in the distribution for more details.

Reference
---------

A description of the different mechanisms provided by the pprocess module can
be found in the reference document. See the docs/reference.html file in the
distribution for more details.

Quick Start
-----------

Try running the simple examples. For example:

PYTHONPATH=. python examples/simple_create.py

(These examples show in different ways how a limited number of processes can
be used to perform a parallel computation. The simple.py, simple1.py,
simple2.py and simple_map.py programs are sequential versions of the other
programs.)

The following table summarises the features used in the programs:

Program (.py)         pmap  MakeParallel manage start create Map Queue Exchange
-------------         ----  ------------ ------ ----- ------ --- ----- --------
simple_create_map                                     Yes    Yes
simple_create_queue                                   Yes        Yes
simple_create                                         Yes              Yes
simple_managed_map          Yes          Yes                 Yes
simple_managed_queue        Yes          Yes                     Yes
simple_managed              Yes          Yes                           Yes
simple_pmap           Yes
simple_pmap_iter      Yes
simple_start_queue          Yes                 Yes              Yes
simple_start                                    Yes                    Yes

The simplest parallel programs are simple_pmap.py and simple_pmap_iter.py
which employ the pmap function resembling the built-in map function in
Python.
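
The sketch below suggests what a map-like parallel function does. It defines a hypothetical toy_pmap using only the standard library (Unix only). It is not pprocess's pmap. It starts one process per item with no process limit, and it returns results in input order, as the built-in map does.

```python
import os
import pickle
import socket


def toy_pmap(func, sequence):
    """Hypothetical parallel map: one forked child per item (no process
    limit), with results returned in the same order as the inputs."""
    channels = []
    for item in sequence:
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:  # child: compute one item, send it, then exit
            try:
                parent_end.close()
                child_end.sendall(pickle.dumps(func(item)))
            finally:
                os._exit(0)
        child_end.close()
        channels.append((pid, parent_end))

    results = []
    for pid, sock in channels:  # read in input order, as map() would
        with sock, sock.makefile("rb") as stream:
            results.append(pickle.load(stream))
        os.waitpid(pid, 0)
    return results


def square(x):
    return x * x


parallel = toy_pmap(square, [1, 2, 3, 4])
sequential = list(map(square, [1, 2, 3, 4]))
print(parallel)  # [1, 4, 9, 16]
```

Reading the channels in input order keeps the map-like ordering. The children still run concurrently, because all of them are forked before any result is read.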

Other simple programs are those employing the Queue class, together with those
using the manage method which associates functions or callables with Queue or
Exchange objects for convenient invocation of those functions and the
management of their communications.
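
A queue-style interface hands back results as each process finishes, rather than in input order. Below is a minimal standard-library sketch of that idea (Unix only). The names start and as_completed are hypothetical and are not pprocess's Queue API.

```python
import os
import pickle
import select
import socket
import time


def start(func, *args):
    """Fork a child running func(*args); return (pid, parent-side socket)."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_end.close()
            child_end.sendall(pickle.dumps(func(*args)))
        finally:
            os._exit(0)
    child_end.close()
    return pid, parent_end


def as_completed(children):
    """Yield each child's result as soon as its channel has been fully read."""
    poller = select.poll()
    pending = {}
    for pid, sock in children:
        poller.register(sock, select.POLLIN)
        pending[sock.fileno()] = (pid, sock, bytearray())
    while pending:
        for fd, _event in poller.poll():
            pid, sock, buf = pending[fd]
            chunk = sock.recv(65536)
            if chunk:  # more data: keep accumulating
                buf.extend(chunk)
                continue
            # An empty read means the child closed its end: message complete.
            poller.unregister(fd)
            del pending[fd]
            sock.close()
            os.waitpid(pid, 0)
            yield pickle.loads(bytes(buf))


def slow_identity(n):
    time.sleep(0.1 * (3 - n))  # later inputs finish sooner
    return n


children = [start(slow_identity, n) for n in range(3)]
finished = list(as_completed(children))
print(finished)  # most likely [2, 1, 0], though timing is not guaranteed
```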

The most technically involved program is simple_start.py which uses the
Exchange class together with a calculation function which is aware of the
parallel environment and which communicates over the supplied communications
channel directly to the creating process.

It should be noted that with the exception of simple_start.py, those examples
employing calculation functions (as opposed to doing a calculation inline in a
loop body) all use MakeParallel to make those functions parallel-aware, thus
permitting the conversion of "normal" functions to a form usable in the
parallel environment.
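
The difference between a parallel-aware function, like the one in simple_start.py, and a normal function wrapped for parallel use can be sketched with the standard library alone (Unix only). The Channel class and the make_parallel and start functions below are hypothetical stand-ins. They are modelled loosely on the description of MakeParallel as a wrapper that sends the function's result over the supplied channel. They are not the pprocess API.

```python
import os
import pickle
import socket


class Channel:
    """Hypothetical channel: sends and receives pickled objects over a socket."""

    def __init__(self, sock):
        self.sock = sock  # keep the socket alive alongside its file wrappers
        self.reader = sock.makefile("rb")
        self.writer = sock.makefile("wb")

    def send(self, obj):
        pickle.dump(obj, self.writer)
        self.writer.flush()  # push the data out before the process moves on

    def receive(self):
        return pickle.load(self.reader)


def calculate(channel, x):
    """Parallel-aware: knows about the channel and sends its own result."""
    channel.send(x * x)


def square(x):
    """A "normal" function with no knowledge of channels."""
    return x * x


def make_parallel(func):
    """Hypothetical MakeParallel-like wrapper: call func, send what it returns."""
    def wrapper(channel, *args, **kwargs):
        channel.send(func(*args, **kwargs))
    return wrapper


def start(target, *args):
    """Fork a child that runs target(channel, *args); return (pid, channel)."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_sock.close()
            target(Channel(child_sock), *args)
        finally:
            os._exit(0)
    child_sock.close()
    return pid, Channel(parent_sock)


results = []
for target in (calculate, make_parallel(square)):
    pid, channel = start(target, 7)
    results.append(channel.receive())
    os.waitpid(pid, 0)
print(results)  # [49, 49]
```

Both routes deliver the same value to the creating process. The wrapper simply moves the knowledge of the channel out of the calculation function.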

Reusable Processes
------------------

An additional example not listed above, simple_managed_map_reusable.py,
employs the MakeReusable class instead of MakeParallel in order to demonstrate
reusable processes and channels:

PYTHONPATH=. python examples/simple_managed_map_reusable.py
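
Reusable processes let one process and channel serve several invocations, instead of starting a new process per call. The sketch below illustrates this with the standard library (Unix only). The reusable_worker function and the None sentinel that stops the worker are assumptions made for this illustration. They do not describe how MakeReusable is implemented.

```python
import os
import pickle
import socket


def reusable_worker(func):
    """Fork one child that applies func to every request arriving on the same
    channel until it reads None; return (pid, reader, writer) for the parent."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_sock.close()
            reader = child_sock.makefile("rb")
            writer = child_sock.makefile("wb")
            while True:
                args = pickle.load(reader)
                if args is None:  # sentinel: no more work
                    break
                pickle.dump(func(*args), writer)
                writer.flush()
        finally:
            os._exit(0)
    child_sock.close()
    return pid, parent_sock.makefile("rb"), parent_sock.makefile("wb")


pid, from_child, to_child = reusable_worker(pow)
answers = []
for base in range(5):
    pickle.dump((base, 3), to_child)  # send a request to the same process...
    to_child.flush()
    answers.append(pickle.load(from_child))  # ...and read its reply
pickle.dump(None, to_child)  # tell the worker to finish
to_child.flush()
os.waitpid(pid, 0)
print(answers)  # [0, 1, 8, 27, 64]
```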

Continuous Process Communications
---------------------------------

Another example not listed above, simple_continuous_queue.py, employs
continuous communications to monitor output from created processes:

PYTHONPATH=. python examples/simple_continuous_queue.py
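
In continuous communication, a process sends several messages over the same channel while it runs, and the creating process reads them as they arrive. Below is a minimal standard-library sketch (Unix only). The run_and_monitor function and the ("progress", i) messages are hypothetical and made up for this illustration.

```python
import os
import pickle
import socket
import time


def run_and_monitor(steps):
    """Fork a child that reports progress several times; collect each report."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_sock.close()
            out = child_sock.makefile("wb")
            for i in range(steps):
                time.sleep(0.01)  # stand-in for a slice of real work
                pickle.dump(("progress", i), out)
                out.flush()  # make each message visible to the parent at once
            pickle.dump(("done", steps), out)
            out.flush()
        finally:
            os._exit(0)
    child_sock.close()

    messages = []
    with parent_sock, parent_sock.makefile("rb") as stream:
        while True:
            try:
                message = pickle.load(stream)  # blocks until the next message
            except EOFError:  # the child has exited and closed its end
                break
            print("received", message)
            messages.append(message)
    os.waitpid(pid, 0)
    return messages


messages = run_and_monitor(3)
```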

Persistent Processes
--------------------

A number of persistent variants of some of the above examples employ a
persistent or background process which can be started by one process and
contacted later by another in order to collect the results of a computation.
For example:

PYTHONPATH=. python examples/simple_persistent_managed.py --start
PYTHONPATH=. python examples/simple_persistent_managed.py --reconnect

PYTHONPATH=. python examples/simple_background_queue.py --start
PYTHONPATH=. python examples/simple_background_queue.py --reconnect

PYTHONPATH=. python examples/simple_persistent_queue.py --start
PYTHONPATH=. python examples/simple_persistent_queue.py --reconnect
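
A Unix domain socket at a known filesystem path is one way to let a later process collect results from a background process. The sketch below shows that pattern with the standard library (Unix only) and uses hypothetical start_background and reconnect functions. It shows the general technique under that assumption and does not describe how pprocess implements its persistent processes. For brevity, both halves run in one script here. The --start and --reconnect examples split them across two separate invocations.

```python
import os
import pickle
import socket
import tempfile


def start_background(path, func, *args):
    """Fork a background process that computes func(*args), then waits for a
    later connection on a Unix socket at `path` to hand over the result."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)
    pid = os.fork()
    if pid == 0:
        try:
            result = func(*args)
            conn, _addr = listener.accept()  # wait for whoever reconnects
            with conn:
                conn.sendall(pickle.dumps(result))
        finally:
            os._exit(0)
    listener.close()  # only the background process keeps listening
    return pid


def reconnect(path):
    """Connect to the background process and collect its result."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        with client.makefile("rb") as stream:
            return pickle.load(stream)


directory = tempfile.mkdtemp()
path = os.path.join(directory, "result.sock")
pid = start_background(path, sum, range(100))
answer = reconnect(path)  # could equally be done by a different process
os.waitpid(pid, 0)
os.unlink(path)
os.rmdir(directory)
print(answer)  # 4950
```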

Parallel Raytracing with PyGmy
------------------------------

The PyGmy raytracer modified to use pprocess can be run to investigate the
potential for speed increases in "real world" programs:

cd examples/PyGmy
PYTHONPATH=../..:. python scene.py

(This should produce a file called test.tif - a TIFF file containing a
raytraced scene image.)

Examples from the Concurrency SIG
---------------------------------

The special interest group (SIG) for concurrency in Python proposed a
particular application as a showcase for concurrency libraries. Two examples
are included which demonstrate pprocess and the use of continuous processes to
implement the application concerned:

PYTHONPATH=. python examples/concurrency-sig/bottles.py
PYTHONPATH=. python examples/concurrency-sig/bottles_heartbeat.py

Examples of Modifying Mutable Objects
-------------------------------------

Mutable objects can be modified in processes created by pprocess, but the
modifications will not be visible in the parent process. The following
examples illustrate the problem:

PYTHONPATH=. python examples/simple_mutation.py
PYTHONPATH=. python examples/simple_mutation_queue.py

The former, non-parallel program will display the expected result of the
computation, whereas the latter, parallel program will fail to do so. This is
because the latter attempts to modify the input collection in order to use it
as a result collection, but these modifications are not propagated back to the
parent process.
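
This behaviour follows from the fork-based model: a child process works on its own copy of the parent's memory. The standard-library sketch below (Unix only) is independent of pprocess and of the simple_mutation examples. A child modifies a list while the parent's list stays unchanged. The sketch then shows the usual remedy, which is to send the result back explicitly over a channel.

```python
import os
import pickle
import socket

values = [1, 2, 3]
parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:  # child: works on its own copy of the parent's memory
    try:
        parent_end.close()
        for i, value in enumerate(values):
            values[i] = value * 10  # changes only the child's copy
        child_end.sendall(pickle.dumps(values))  # so send the result back
        child_end.close()
    finally:
        os._exit(0)

child_end.close()
with parent_end, parent_end.makefile("rb") as stream:
    returned = pickle.load(stream)
os.waitpid(pid, 0)

print(values)    # [1, 2, 3], unchanged in the parent
print(returned)  # [10, 20, 30], received over the channel
```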

Test Programs
-------------

There are some elementary tests:

PYTHONPATH=. python tests/create_loop.py
PYTHONPATH=. python tests/start_loop.py

(Simple loop demonstrations which use two different ways of creating and
starting the parallel processes.)

PYTHONPATH=. python tests/start_indexer.py <directory>

(A text indexing demonstration, where <directory> should be a directory
containing text files to be indexed, although HTML files will also work well
enough. After the files have been indexed, a prompt appears at which words or
word fragments can be entered, and the matching words and their locations are
then shown. Run the program without arguments to see more information.)

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for pprocess at the time of release is:

http://www.boddie.org.uk/python/pprocess.html

The author can be contacted at the following e-mail address:

paul@boddie.org.uk

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt, docs/lgpl-3.0.txt and docs/gpl-3.0.txt for more information.

For the PyGmy raytracer example, different copyright and licence information
is provided in the docs directory - see docs/COPYING-PyGmy.txt and
docs/LICENCE-PyGmy.txt for more information.

Dependencies
------------

This software depends on standard library features which are stated as being
available only on "UNIX"; it has only been tested repeatedly on a GNU/Linux
system, and occasionally on systems running OpenSolaris.

New in pprocess 0.5.3 (Changes since pprocess 0.5.2)
----------------------------------------------------

  * Added CPU core counting for Mac OS X, based on feedback from Kai Staats.

New in pprocess 0.5.2 (Changes since pprocess 0.5.1)
----------------------------------------------------

  * Added examples involving mutable objects and the inability of pprocess to
    automatically propagate changes to such objects back to parent processes.
  * Added an explanatory section to the tutorial about data exchange between
    processes and the differences from "normal" Python program behaviour.

New in pprocess 0.5.1 (Changes since pprocess 0.5)
--------------------------------------------------

  * Added IOError handling when processes exit apparently without warning.

New in pprocess 0.5 (Changes since pprocess 0.4)
------------------------------------------------

  * Added proper support in the Exchange class for continuous communications
    between processes, providing examples: simple_continuous_queue.py and the
    concurrency-sig directory.
  * Changed the Map class to permit incremental access to received results
    from completed parts of the sequence of inputs, also adding an iteration
    interface.
  * Added an example, simple_pmap_iter.py, to demonstrate iteration over maps.
  * Fixed the get_number_of_cores function to work with /proc/cpuinfo where
    the "physical id" field is missing.
  * Tidied the Exchange class, adding distinct status methods: unfinished and
    busy.

New in pprocess 0.4 (Changes since pprocess 0.3.1)
--------------------------------------------------

  * Added support for persistent/background processes.
  * Added a utility function to detect and return the number of processor
    cores available.
  * Added missing documentation stylesheet.
  * Added support for Solaris using pipes instead of socket pairs, since
    the latter do not apparently work properly with poll on Solaris.

New in pprocess 0.3.1 (Changes since pprocess 0.3)
--------------------------------------------------

  * Moved the reference material out of the module docstring and into a
    separate document, converting it to XHTML in the process.
  * Fixed the project name in the setup script.

New in pprocess 0.3 (Changes since parallel 0.2.5)
--------------------------------------------------

  * Added managed callables: wrappers around callables which cause them to be
    automatically managed by the exchange from which they were acquired.
  * Added MakeParallel: a wrapper instantiated around a normal function which
    sends the result of that function over the supplied channel when invoked.
  * Added MakeReusable: a wrapper like MakeParallel which can be used in
    conjunction with the newly-added reuse capability of the Exchange class in
    order to reuse processes and channels.
  * Added a Map class which attempts to emulate the built-in map function,
    along with a pmap function using this class.
  * Added a Queue class which provides a simpler iterator-style interface to
    data produced by created processes.
  * Added a create method to the Exchange class and an exit convenience
    function to the module.
  * Changed the Exchange implementation to not block when attempting to start
    new processes beyond the process limit: such requests are queued and
    performed as running processes are completed. This permits programs using
    the start method to proceed to consumption of results more quickly.
  * Extended and updated the examples. Added a tutorial.
  * Added Ubuntu Feisty (7.04) package support.

New in parallel 0.2.5 (Changes since parallel 0.2.4)
----------------------------------------------------

  * Added a start method to the Exchange class for more convenient creation of
    processes.
  * Relicensed under the LGPL (version 3 or later) - this also fixes the
    contradictory situation where the GPL was stated in the pprocess module
    (which was not, in fact, the intention) and the LGPL was stated in the
    documentation.

New in parallel 0.2.4 (Changes since parallel 0.2.3)
----------------------------------------------------

  * Set buffer sizes to zero for the file object wrappers around sockets: this
    may prevent deadlock issues.

New in parallel 0.2.3 (Changes since parallel 0.2.2)
----------------------------------------------------

  * Added convenient message exchanges, offering methods handling common
    situations at the cost of having to define a subclass of Exchange.
  * Added a simple example of performing a parallel computation.
  * Improved the PyGmy raytracer example to use the newly added functionality.

New in parallel 0.2.2 (Changes since parallel 0.2.1)
----------------------------------------------------

  * Changed the status testing in the Exchange class, potentially fixing the
    premature closure of channels before all data was read.
  * Fixed the PyGmy raytracer example's process accounting by relying on the
    possibly more reliable Exchange behaviour, whilst also preventing
    erroneous creation of "out of bounds" processes.
  * Added a removed attribute on the Exchange to record which channels were
    removed in the last call to the ready method.

New in parallel 0.2.1 (Changes since parallel 0.2)
--------------------------------------------------

  * Added a PyGmy raytracer example.
  * Updated copyright and licensing details (FSF address, additional works).

New in parallel 0.2 (Changes since parallel 0.1)
------------------------------------------------

  * Changed the name of the included module from parallel to pprocess in order
    to avoid naming conflicts with PyParallel.

Release Procedures
------------------

Update the pprocess __version__ attribute and the setup.py file version field.
Change the version number and package filename/directory in the documentation.
Update the release notes (see above).
Check the release information in the PKG-INFO file.
Tag, export.
Archive, upload.
Update PyPI.

Making Packages
---------------

To make Debian-based packages:

  1. Create new package directories under packages if necessary.
  2. Make a symbolic link in the distribution's root directory to keep the
     Debian tools happy:

     ln -s packages/ubuntu-hoary/python2.4-parallel-pprocess/debian/

     Or:

     ln -s packages/ubuntu-feisty/python-pprocess/debian/

  3. Run the package builder:

     dpkg-buildpackage -rfakeroot

  4. Locate and tidy up the packages in the parent directory of the
     distribution's root directory.