paulb@40 | 1 | #!/usr/bin/env python |
paulb@40 | 2 | |
paulb@40 | 3 | """ |
paulb@40 | 4 | A simple parallel processing API for Python, inspired somewhat by the thread |
paulb@40 | 5 | module, slightly less by pypar, and slightly less still by pypvm. |
paulb@40 | 6 | |
paulb@67 | 7 | Copyright (C) 2005, 2006, 2007 Paul Boddie <paul@boddie.org.uk> |
paulb@41 | 8 | |
paulb@79 | 9 | This program is free software; you can redistribute it and/or modify it under |
paulb@79 | 10 | the terms of the GNU Lesser General Public License as published by the Free |
paulb@79 | 11 | Software Foundation; either version 3 of the License, or (at your option) any |
paulb@79 | 12 | later version. |
paulb@41 | 13 | |
paulb@79 | 14 | This program is distributed in the hope that it will be useful, but WITHOUT |
paulb@79 | 15 | ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
paulb@79 | 16 | FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more |
paulb@79 | 17 | details. |
paulb@41 | 18 | |
paulb@79 | 19 | You should have received a copy of the GNU Lesser General Public License along |
paulb@79 | 20 | with this program. If not, see <http://www.gnu.org/licenses/>. |
paulb@41 | 21 | |
paulb@41 | 22 | -------- |
paulb@41 | 23 | |
paulb@40 | 24 | Thread-style Processing |
paulb@40 | 25 | ----------------------- |
paulb@40 | 26 | |
paulb@40 | 27 | To create new processes to run a function or any callable object, specify the |
paulb@40 | 28 | "callable" and any arguments as follows: |
paulb@40 | 29 | |
paulb@79 | 30 | channel = pprocess.start(fn, arg1, arg2, named1=value1, named2=value2) |
paulb@40 | 31 | |
paulb@40 | 32 | This returns a channel which can then be used to communicate with the created |
paulb@40 | 33 | process. Meanwhile, in the created process, the given callable will be invoked |
paulb@40 | 34 | with another channel as its first argument followed by the specified arguments: |
paulb@40 | 35 | |
paulb@40 | 36 | def fn(channel, arg1, arg2, named1, named2): |
paulb@40 | 37 | # Read from and write to the channel. |
paulb@40 | 38 | # Return value is ignored. |
paulb@40 | 39 | ... |
paulb@40 | 40 | |
paulb@40 | 41 | Fork-style Processing |
paulb@40 | 42 | --------------------- |
paulb@40 | 43 | |
paulb@40 | 44 | To create new processes in a similar way to that employed when using os.fork |
paulb@40 | 45 | (ie. the fork system call on various operating systems), use the following |
paulb@40 | 46 | method: |
paulb@40 | 47 | |
paulb@97 | 48 | channel = pprocess.create() |
paulb@40 | 49 | if channel.pid == 0: |
paulb@40 | 50 | # This code is run by the created process. |
paulb@40 | 51 | # Read from and write to the channel to communicate with the |
paulb@40 | 52 | # creating/calling process. |
paulb@40 | 53 | # An explicit exit of the process may be desirable to prevent the process |
paulb@40 | 54 | # from running code which is intended for the creating/calling process. |
paulb@40 | 55 | ... |
paulb@97 | 56 | pprocess.exit(channel) |
paulb@40 | 57 | else: |
paulb@40 | 58 | # This code is run by the creating/calling process. |
paulb@40 | 59 | # Read from and write to the channel to communicate with the created |
paulb@40 | 60 | # process. |
paulb@40 | 61 | ... |
paulb@40 | 62 | |
paulb@40 | 63 | Message Exchanges |
paulb@40 | 64 | ----------------- |
paulb@40 | 65 | |
paulb@40 | 66 | When creating many processes, each providing results for the consumption of the |
paulb@40 | 67 | main process, the collection of those results in an efficient fashion can be |
paulb@40 | 68 | problematic: if some processes take longer than others, and if we decide to read |
paulb@40 | 69 | from those processes when they are not ready instead of other processes which |
paulb@40 | 70 | are ready, the whole activity will take much longer than necessary. |
paulb@40 | 71 | |
paulb@40 | 72 | One solution to the problem of knowing when to read from channels is to create |
paulb@40 | 73 | an Exchange object, optionally initialising it with a list of channels through |
paulb@40 | 74 | which data is expected to arrive: |
paulb@40 | 75 | |
paulb@79 | 76 | exchange = pprocess.Exchange() # populate the exchange later |
paulb@79 | 77 | exchange = pprocess.Exchange(channels) # populate the exchange with channels |
paulb@40 | 78 | |
paulb@40 | 79 | We can add channels to the exchange using the add method: |
paulb@40 | 80 | |
paulb@40 | 81 | exchange.add(channel) |
paulb@40 | 82 | |
paulb@40 | 83 | To test whether an exchange is active - that is, whether it is actually |
paulb@40 | 84 | monitoring any channels - we can use the active method which returns all |
paulb@40 | 85 | channels being monitored by the exchange: |
paulb@40 | 86 | |
paulb@40 | 87 | channels = exchange.active() |
paulb@40 | 88 | |
paulb@40 | 89 | We may then check the exchange to see whether any data is ready to be received; |
paulb@40 | 90 | for example: |
paulb@40 | 91 | |
paulb@40 | 92 | for channel in exchange.ready(): |
paulb@40 | 93 | # Read from and write to the channel. |
paulb@40 | 94 | ... |
paulb@40 | 95 | |
paulb@40 | 96 | If we do not wish to wait indefinitely for a list of channels, we can set a |
paulb@40 | 97 | timeout value as an argument to the ready method (as a floating point number |
paulb@40 | 98 | specifying the timeout in seconds, where 0 means a non-blocking poll as stated |
paulb@40 | 99 | in the select module's select function documentation). |
paulb@40 | 100 | |
paulb@67 | 101 | Convenient Message Exchanges |
paulb@67 | 102 | ---------------------------- |
paulb@67 | 103 | |
paulb@67 | 104 | A convenient form of message exchanges can be adopted by defining a subclass of |
paulb@67 | 105 | the Exchange class and defining a particular method: |
paulb@67 | 106 | |
paulb@79 | 107 | class MyExchange(pprocess.Exchange): |
paulb@67 | 108 | def store_data(self, channel): |
paulb@67 | 109 | data = channel.receive() |
paulb@67 | 110 | # Do something with data here. |
paulb@67 | 111 | |
paulb@67 | 112 | The exact operations performed on the received data might be as simple as |
paulb@67 | 113 | storing it on an instance attribute. To make use of the exchange, we would |
paulb@67 | 114 | instantiate it as usual: |
paulb@67 | 115 | |
paulb@67 | 116 | exchange = MyExchange() # populate the exchange later |
paulb@67 | 117 | exchange = MyExchange(limit=10) # set a limit for later population |
paulb@67 | 118 | |
paulb@67 | 119 | The exchange can now be used in a simpler fashion than that shown above. We can |
paulb@67 | 120 | add channels as before using the add method, or we can choose to only add |
paulb@67 | 121 | channels if the specified limit of channels is not exceeded: |
paulb@67 | 122 | |
paulb@67 | 123 | exchange.add(channel) # add a channel as normal |
paulb@67 | 124 | exchange.add_wait(channel) # add a channel, waiting if the limit would be |
paulb@67 | 125 | # exceeded |
paulb@67 | 126 | |
paulb@97 | 127 | Or we can request that the exchange create a channel on our behalf: |
paulb@97 | 128 | |
paulb@97 | 129 | channel = exchange.create() |
paulb@97 | 130 | |
paulb@79 | 131 | We can even start processes and monitor channels without ever handling the |
paulb@79 | 132 | channel ourselves: |
paulb@79 | 133 | |
paulb@79 | 134 | exchange.start(fn, arg1, arg2, named1=value1, named2=value2) |
paulb@79 | 135 | |
paulb@79 | 136 | We can explicitly wait for "free space" for channels by calling the wait method, |
paulb@79 | 137 | although the start and add_wait methods make this less interesting: |
paulb@67 | 138 | |
paulb@67 | 139 | exchange.wait() |
paulb@67 | 140 | |
paulb@67 | 141 | Finally, when finishing the computation, we can choose to merely call the finish |
paulb@67 | 142 | method and have the remaining data processed automatically: |
paulb@67 | 143 | |
paulb@67 | 144 | exchange.finish() |
paulb@67 | 145 | |
paulb@67 | 146 | Clearly, this approach is less flexible but more convenient than the raw message |
paulb@67 | 147 | exchange API as described above. However, it permits much simpler and clearer |
paulb@67 | 148 | code. |
paulb@67 | 149 | |
paulb@97 | 150 | Exchanges as Queues |
paulb@97 | 151 | ------------------- |
paulb@97 | 152 | |
paulb@97 | 153 | Instead of having to subclass the pprocess.Exchange class and to define the |
paulb@97 | 154 | store_data method, it might be more desirable to let the exchange manage the |
paulb@97 | 155 | communications between created and creating processes and to let the creating |
paulb@97 | 156 | process just consume received data as it arrives, without particular regard for |
paulb@97 | 157 | the order of the received data - perhaps the creating process has its own way of |
paulb@97 | 158 | managing such issues. |
paulb@97 | 159 | |
paulb@97 | 160 | For such situations, the Queue class may be instantiated and channels added to |
paulb@97 | 161 | the queue using the various methods provided: |
paulb@97 | 162 | |
paulb@97 | 163 | queue = pprocess.Queue(limit=10) |
paulb@97 | 164 | channel = queue.create() |
paulb@97 | 165 | if channel: |
paulb@97 | 166 | # Do some computation. |
paulb@97 | 167 | pprocess.exit(channel) |
paulb@97 | 168 | |
paulb@97 | 169 | The results can then be consumed by treating the queue like an iterator: |
paulb@97 | 170 | |
paulb@97 | 171 | for result in queue: |
paulb@97 | 172 | # Capture each result. |
paulb@97 | 173 | |
paulb@97 | 174 | This approach does not, of course, require the direct handling of channels. One |
paulb@97 | 175 | could instead use the start method on the queue to create processes and to |
paulb@97 | 176 | initiate computations (since a queue is merely an enhanced exchange with a |
paulb@97 | 177 | specific implementation of the store_data method). |
paulb@97 | 178 | |
paulb@119 | 179 | Exchanges as Maps |
paulb@119 | 180 | ----------------- |
paulb@119 | 181 | |
paulb@119 | 182 | Where the above Queue class appears like an attractive solution for the |
paulb@119 | 183 | management of the results of computations, but where the order of their |
paulb@119 | 184 | consumption by the creating process remains important, the Map class may offer a |
paulb@119 | 185 | suitable way of collecting and accessing results: |
paulb@119 | 186 | |
paulb@119 | 187 | results = pprocess.Map(limit=10) |
paulb@119 | 188 | for value in inputs: |
paulb@119 | 189 | results.start(fn, args) |
paulb@119 | 190 | |
paulb@119 | 191 | The results can then be consumed in an order corresponding to the order of the |
paulb@119 | 192 | computations which produced them: |
paulb@119 | 193 | |
paulb@119 | 194 | for result in results: |
paulb@119 | 195 | # Process each result. |
paulb@119 | 196 | |
paulb@119 | 197 | Internally, the Map object records a particular ordering of channels, ensuring |
paulb@119 | 198 | that the received results can be mapped to this ordering, and that the results |
paulb@119 | 199 | can be made available with this ordering preserved. |
paulb@119 | 200 | |
paulb@84 | 201 | Managed Callables |
paulb@84 | 202 | ----------------- |
paulb@84 | 203 | |
paulb@84 | 204 | A further simplification of the above convenient use of message exchanges |
paulb@84 | 205 | involves the creation of callables (eg. functions) which are automatically |
paulb@88 | 206 | monitored by an exchange. We create such a callable by calling the manage method |
paulb@84 | 207 | on an exchange: |
paulb@84 | 208 | |
paulb@88 | 209 | myfn = exchange.manage(fn) |
paulb@84 | 210 | |
paulb@84 | 211 | This callable can then be invoked instead of using the exchange's start method: |
paulb@84 | 212 | |
paulb@84 | 213 | myfn(arg1, arg2, named1=value1, named2=value2) |
paulb@84 | 214 | |
paulb@84 | 215 | The exchange's finish method can be used as usual to process incoming data. |
paulb@84 | 216 | |
paulb@92 | 217 | Making Existing Functions Parallel |
paulb@92 | 218 | ---------------------------------- |
paulb@92 | 219 | |
paulb@92 | 220 | In making a program parallel, existing functions which only return results can |
paulb@92 | 221 | be manually modified to accept and use channels to communicate results back to |
paulb@92 | 222 | the main process. However, a simple alternative is to use the MakeParallel class |
paulb@92 | 223 | to provide a wrapper around unmodified functions which will return the results |
paulb@92 | 224 | from those functions in the channels provided. For example: |
paulb@92 | 225 | |
paulb@92 | 226 | fn = pprocess.MakeParallel(originalfn) |
paulb@92 | 227 | |
paulb@84 | 228 | Map-style Processing |
paulb@84 | 229 | -------------------- |
paulb@84 | 230 | |
paulb@84 | 231 | In situations where a callable would normally be used in conjunction with the |
paulb@84 | 232 | Python built-in map function, an alternative solution can be adopted by using |
paulb@84 | 233 | the pmap function: |
paulb@84 | 234 | |
paulb@84 | 235 | pprocess.pmap(fn, sequence) |
paulb@84 | 236 | |
paulb@84 | 237 | Here, the sequence would have to contain elements that each contain the required |
paulb@92 | 238 | parameters of the specified callable, fn. Note that the callable does not need |
paulb@92 | 239 | to be a parallel-aware function which has a channel argument: the pmap function |
paulb@92 | 240 | automatically wraps the given callable internally. |
paulb@84 | 241 | |
paulb@119 | 242 | Reusing Processes and Channels |
paulb@119 | 243 | ------------------------------ |
paulb@119 | 244 | |
paulb@119 | 245 | So far, all parallel computations have been done with newly-created processes. |
paulb@119 | 246 | However, this can seem somewhat inefficient, especially if processes are being |
paulb@119 | 247 | continually created and destroyed (although if this happens too often, the |
paulb@119 | 248 | amount of work done by each process may be too little, anyway). One solution is |
paulb@119 | 249 | to retain processes after they have done their work and request that they |
paulb@119 | 250 | perform more work for each new parallel task or invocation. To enable the reuse |
paulb@119 | 251 | of processes in this way, a special keyword argument may be specified when |
paulb@119 | 252 | creating Exchange objects (and subclasses such as Map and Queue). For example: |
paulb@119 | 253 | |
paulb@119 | 254 | exchange = MyExchange(limit=10, reuse=1) # reuse up to 10 processes |
paulb@119 | 255 | |
paulb@119 | 256 | Code invoked through such exchanges must be aware of channels and be constructed |
paulb@119 | 257 | in such a way that it does not terminate after sending a result back to the |
paulb@119 | 258 | creating process. Instead, it should repeatedly wait for subsequent sets of |
paulb@119 | 259 | parameters (compatible with those either in the signature of a callable or with |
paulb@119 | 260 | the original values read from the channel). Reusable code is terminated when the |
paulb@119 | 261 | special value of None is sent from the creating process to the created process, |
paulb@119 | 262 | indicating that no more parameters will be sent; this should cause the code to |
paulb@119 | 263 | terminate. |
paulb@119 | 264 | |
paulb@119 | 265 | Making Existing Functions Parallel and Reusable |
paulb@119 | 266 | ----------------------------------------------- |
paulb@119 | 267 | |
paulb@119 | 268 | An easier way of making reusable code sections for parallel use is to employ the |
paulb@119 | 269 | MakeReusable class to wrap an existing callable: |
paulb@119 | 270 | |
paulb@119 | 271 | fn = pprocess.MakeReusable(originalfn) |
paulb@119 | 272 | |
paulb@119 | 273 | This wraps the callable in a similar fashion to MakeParallel, but provides the |
paulb@119 | 274 | necessary mechanisms described above for reusable code. |
paulb@119 | 275 | |
paulb@40 | 276 | Signals and Waiting |
paulb@40 | 277 | ------------------- |
paulb@40 | 278 | |
paulb@40 | 279 | When created/child processes terminate, one would typically want to be informed |
paulb@40 | 280 | of such conditions using a signal handler. Unfortunately, Python seems to have |
paulb@40 | 281 | issues with restartable reads from file descriptors when interrupted by signals: |
paulb@40 | 282 | |
paulb@40 | 283 | http://mail.python.org/pipermail/python-dev/2002-September/028572.html |
paulb@40 | 284 | http://twistedmatrix.com/bugs/issue733 |
paulb@40 | 285 | |
paulb@40 | 286 | Select and Poll |
paulb@40 | 287 | --------------- |
paulb@40 | 288 | |
paulb@40 | 289 | The exact combination of conditions indicating closed pipes remains relatively |
paulb@40 | 290 | obscure. Here is a message/thread describing them (in the context of another |
paulb@40 | 291 | topic): |
paulb@40 | 292 | |
paulb@40 | 293 | http://twistedmatrix.com/pipermail/twisted-python/2005-February/009666.html |
paulb@40 | 294 | |
paulb@47 | 295 | It would seem, from using sockets and from studying the asyncore module, that |
paulb@40 | 296 | sockets are more predictable than pipes. |
paulb@58 | 297 | |
paulb@58 | 298 | Notes about poll implementations can be found here: |
paulb@58 | 299 | |
paulb@58 | 300 | http://www.greenend.org.uk/rjk/2001/06/poll.html |
paulb@40 | 301 | """ |
paulb@40 | 302 | |
paulb@99 | 303 | __version__ = "0.3" |
paulb@40 | 304 | |
paulb@40 | 305 | import os |
paulb@40 | 306 | import sys |
paulb@40 | 307 | import select |
paulb@40 | 308 | import socket |
paulb@40 | 309 | |
paulb@40 | 310 | try: |
paulb@40 | 311 | import cPickle as pickle |
paulb@40 | 312 | except ImportError: |
paulb@40 | 313 | import pickle |
paulb@40 | 314 | |
paulb@84 | 315 | # Communications. |
paulb@84 | 316 | |
paulb@40 | 317 | class AcknowledgementError(Exception): |
paulb@40 | 318 | pass |
paulb@40 | 319 | |
paulb@40 | 320 | class Channel: |
paulb@40 | 321 | |
paulb@40 | 322 | "A communications channel." |
paulb@40 | 323 | |
paulb@40 | 324 | def __init__(self, pid, read_pipe, write_pipe): |
paulb@40 | 325 | |
paulb@40 | 326 | """ |
paulb@40 | 327 | Initialise the channel with a process identifier 'pid', a 'read_pipe' |
paulb@40 | 328 | from which messages will be received, and a 'write_pipe' into which |
paulb@40 | 329 | messages will be sent. |
paulb@40 | 330 | """ |
paulb@40 | 331 | |
paulb@40 | 332 | self.pid = pid |
paulb@40 | 333 | self.read_pipe = read_pipe |
paulb@40 | 334 | self.write_pipe = write_pipe |
paulb@40 | 335 | self.closed = 0 |
paulb@40 | 336 | |
paulb@40 | 337 | def __del__(self): |
paulb@40 | 338 | |
paulb@40 | 339 | # Since signals don't work well with I/O, we close pipes and wait for |
paulb@40 | 340 | # created processes upon finalisation. |
paulb@40 | 341 | |
paulb@40 | 342 | self.close() |
paulb@40 | 343 | |
paulb@40 | 344 | def close(self): |
paulb@40 | 345 | |
paulb@40 | 346 | "Explicitly close the channel." |
paulb@40 | 347 | |
paulb@40 | 348 | if not self.closed: |
paulb@40 | 349 | self.closed = 1 |
paulb@40 | 350 | self.read_pipe.close() |
paulb@40 | 351 | self.write_pipe.close() |
paulb@40 | 352 | #self.wait(os.WNOHANG) |
paulb@40 | 353 | |
paulb@40 | 354 | def wait(self, options=0): |
paulb@40 | 355 | |
paulb@40 | 356 | "Wait for the created process, if any, to exit." |
paulb@40 | 357 | |
paulb@40 | 358 | if self.pid != 0: |
paulb@40 | 359 | try: |
paulb@40 | 360 | os.waitpid(self.pid, options) |
paulb@40 | 361 | except OSError: |
paulb@40 | 362 | pass |
paulb@40 | 363 | |
paulb@40 | 364 | def _send(self, obj): |
paulb@40 | 365 | |
paulb@40 | 366 | "Send the given object 'obj' through the channel." |
paulb@40 | 367 | |
paulb@40 | 368 | pickle.dump(obj, self.write_pipe) |
paulb@40 | 369 | self.write_pipe.flush() |
paulb@40 | 370 | |
paulb@40 | 371 | def send(self, obj): |
paulb@40 | 372 | |
paulb@40 | 373 | """ |
paulb@40 | 374 | Send the given object 'obj' through the channel. Then wait for an |
paulb@40 | 375 | acknowledgement. (The acknowledgement makes the caller wait, thus |
paulb@40 | 376 | preventing processes from exiting and disrupting the communications |
paulb@40 | 377 | channel and losing data.) |
paulb@40 | 378 | """ |
paulb@40 | 379 | |
paulb@40 | 380 | self._send(obj) |
paulb@40 | 381 | if self._receive() != "OK": |
paulb@40 | 382 | raise AcknowledgementError, obj |
paulb@40 | 383 | |
paulb@40 | 384 | def _receive(self): |
paulb@40 | 385 | |
paulb@40 | 386 | "Receive an object through the channel, returning the object." |
paulb@40 | 387 | |
paulb@40 | 388 | obj = pickle.load(self.read_pipe) |
paulb@40 | 389 | if isinstance(obj, Exception): |
paulb@40 | 390 | raise obj |
paulb@40 | 391 | else: |
paulb@40 | 392 | return obj |
paulb@40 | 393 | |
paulb@40 | 394 | def receive(self): |
paulb@40 | 395 | |
paulb@40 | 396 | """ |
paulb@40 | 397 | Receive an object through the channel, returning the object. Send an |
paulb@40 | 398 | acknowledgement of receipt. (The acknowledgement makes the sender wait, |
paulb@40 | 399 | thus preventing processes from exiting and disrupting the communications |
paulb@40 | 400 | channel and losing data.) |
paulb@40 | 401 | """ |
paulb@40 | 402 | |
paulb@40 | 403 | try: |
paulb@40 | 404 | obj = self._receive() |
paulb@40 | 405 | return obj |
paulb@40 | 406 | finally: |
paulb@40 | 407 | self._send("OK") |
paulb@40 | 408 | |
paulb@84 | 409 | # Management of processes and communications. |
paulb@84 | 410 | |
paulb@40 | 411 | class Exchange: |
paulb@40 | 412 | |
paulb@40 | 413 | """ |
paulb@40 | 414 | A communications exchange that can be used to detect channels which are |
paulb@67 | 415 | ready to communicate. Subclasses of this class can define the 'store_data' |
paulb@67 | 416 | method in order to enable the 'add_wait', 'wait' and 'finish' methods. |
paulb@40 | 417 | """ |
paulb@40 | 418 | |
paulb@116 | 419 | def __init__(self, channels=None, limit=None, reuse=0, autoclose=1): |
paulb@40 | 420 | |
paulb@40 | 421 | """ |
paulb@67 | 422 | Initialise the exchange with an optional list of 'channels'. |
paulb@67 | 423 | |
paulb@67 | 424 | If the optional 'limit' is specified, restrictions on the addition of |
paulb@67 | 425 | new channels can be enforced and observed through the 'add_wait', 'wait' |
paulb@67 | 426 | and 'finish' methods. To make use of these methods, create a subclass of |
paulb@67 | 427 | this class and define a working 'store_data' method. |
paulb@67 | 428 | |
paulb@116 | 429 | If the optional 'reuse' parameter is set to a true value, channels and |
paulb@116 | 430 | processes will be reused for waiting computations. |
paulb@116 | 431 | |
paulb@67 | 432 | If the optional 'autoclose' parameter is set to a false value, channels |
paulb@67 | 433 | will not be closed automatically when they are removed from the exchange |
paulb@67 | 434 | - by default they are closed when removed. |
paulb@40 | 435 | """ |
paulb@40 | 436 | |
paulb@67 | 437 | self.limit = limit |
paulb@116 | 438 | self.reuse = reuse |
paulb@116 | 439 | self.autoclose = autoclose |
paulb@99 | 440 | self.waiting = [] |
paulb@40 | 441 | self.readables = {} |
paulb@58 | 442 | self.removed = [] |
paulb@40 | 443 | self.poller = select.poll() |
paulb@40 | 444 | for channel in channels or []: |
paulb@40 | 445 | self.add(channel) |
paulb@40 | 446 | |
paulb@40 | 447 | def add(self, channel): |
paulb@40 | 448 | |
paulb@40 | 449 | "Add the given 'channel' to the exchange." |
paulb@40 | 450 | |
paulb@40 | 451 | self.readables[channel.read_pipe.fileno()] = channel |
paulb@40 | 452 | self.poller.register(channel.read_pipe.fileno(), select.POLLIN | select.POLLHUP | select.POLLNVAL | select.POLLERR) |
paulb@40 | 453 | |
paulb@40 | 454 | def active(self): |
paulb@40 | 455 | |
paulb@40 | 456 | "Return a list of active channels." |
paulb@40 | 457 | |
paulb@40 | 458 | return self.readables.values() |
paulb@40 | 459 | |
paulb@40 | 460 | def ready(self, timeout=None): |
paulb@40 | 461 | |
paulb@40 | 462 | """ |
paulb@40 | 463 | Wait for a period of time specified by the optional 'timeout' (or until |
paulb@40 | 464 | communication is possible) and return a list of channels which are ready |
paulb@40 | 465 | to be read from. |
paulb@40 | 466 | """ |
paulb@40 | 467 | |
paulb@40 | 468 | fds = self.poller.poll(timeout) |
paulb@40 | 469 | readables = [] |
paulb@58 | 470 | self.removed = [] |
paulb@58 | 471 | |
paulb@40 | 472 | for fd, status in fds: |
paulb@40 | 473 | channel = self.readables[fd] |
paulb@55 | 474 | removed = 0 |
paulb@40 | 475 | |
paulb@40 | 476 | # Remove ended/error channels. |
paulb@40 | 477 | |
paulb@40 | 478 | if status & (select.POLLHUP | select.POLLNVAL | select.POLLERR): |
paulb@40 | 479 | self.remove(channel) |
paulb@58 | 480 | self.removed.append(channel) |
paulb@55 | 481 | removed = 1 |
paulb@40 | 482 | |
paulb@40 | 483 | # Record readable channels. |
paulb@40 | 484 | |
paulb@55 | 485 | if status & select.POLLIN: |
paulb@55 | 486 | if not (removed and self.autoclose): |
paulb@55 | 487 | readables.append(channel) |
paulb@40 | 488 | |
paulb@40 | 489 | return readables |
paulb@40 | 490 | |
paulb@40 | 491 | def remove(self, channel): |
paulb@40 | 492 | |
paulb@40 | 493 | """ |
paulb@40 | 494 | Remove the given 'channel' from the exchange. |
paulb@40 | 495 | """ |
paulb@40 | 496 | |
paulb@40 | 497 | del self.readables[channel.read_pipe.fileno()] |
paulb@40 | 498 | self.poller.unregister(channel.read_pipe.fileno()) |
paulb@40 | 499 | if self.autoclose: |
paulb@40 | 500 | channel.close() |
paulb@40 | 501 | channel.wait() |
paulb@40 | 502 | |
paulb@67 | 503 | # Enhanced exchange methods involving channel limits. |
paulb@67 | 504 | |
paulb@67 | 505 | def add_wait(self, channel): |
paulb@67 | 506 | |
paulb@67 | 507 | """ |
paulb@67 | 508 | Add the given 'channel' to the exchange, waiting if the limit on active |
paulb@67 | 509 | channels would be exceeded by adding the channel. |
paulb@67 | 510 | """ |
paulb@67 | 511 | |
paulb@67 | 512 | self.wait() |
paulb@67 | 513 | self.add(channel) |
paulb@67 | 514 | |
paulb@67 | 515 | def wait(self): |
paulb@67 | 516 | |
paulb@67 | 517 | """ |
paulb@67 | 518 | Test for the limit on channels, blocking and reading incoming data until |
paulb@67 | 519 | the number of channels is below the limit. |
paulb@67 | 520 | """ |
paulb@67 | 521 | |
paulb@67 | 522 | # If limited, block until channels have been closed. |
paulb@67 | 523 | |
paulb@67 | 524 | while self.limit is not None and len(self.active()) >= self.limit: |
paulb@67 | 525 | self.store() |
paulb@67 | 526 | |
paulb@116 | 527 | def start_waiting(self, channel): |
paulb@99 | 528 | |
paulb@99 | 529 | """ |
paulb@116 | 530 | Start a waiting process given the reception of data on the given |
paulb@116 | 531 | 'channel'. |
paulb@99 | 532 | """ |
paulb@99 | 533 | |
paulb@99 | 534 | if self.waiting: |
paulb@99 | 535 | callable, args, kw = self.waiting.pop() |
paulb@116 | 536 | |
paulb@116 | 537 | # Try and reuse existing channels if possible. |
paulb@116 | 538 | |
paulb@116 | 539 | if self.reuse: |
paulb@119 | 540 | |
paulb@119 | 541 | # Re-add the channel - this may update information related to |
paulb@119 | 542 | # the channel in subclasses. |
paulb@119 | 543 | |
paulb@119 | 544 | self.add(channel) |
paulb@116 | 545 | channel.send((args, kw)) |
paulb@116 | 546 | else: |
paulb@116 | 547 | self.add(start(callable, *args, **kw)) |
paulb@116 | 548 | |
paulb@116 | 549 | # Where channels are being reused, but where no processes are waiting |
paulb@116 | 550 | # any more, send a special value to tell them to quit. |
paulb@116 | 551 | |
paulb@116 | 552 | elif self.reuse: |
paulb@116 | 553 | channel.send(None) |
paulb@99 | 554 | |
paulb@67 | 555 | def finish(self): |
paulb@67 | 556 | |
paulb@67 | 557 | """ |
paulb@67 | 558 | Finish the use of the exchange by waiting for all channels to complete. |
paulb@67 | 559 | """ |
paulb@67 | 560 | |
paulb@67 | 561 | while self.active(): |
paulb@67 | 562 | self.store() |
paulb@67 | 563 | |
paulb@67 | 564 | def store(self): |
paulb@67 | 565 | |
paulb@67 | 566 | "For each ready channel, process the incoming data." |
paulb@67 | 567 | |
paulb@67 | 568 | for channel in self.ready(): |
paulb@67 | 569 | self.store_data(channel) |
paulb@116 | 570 | self.start_waiting(channel) |
paulb@67 | 571 | |
paulb@67 | 572 | def store_data(self, channel): |
paulb@67 | 573 | |
paulb@67 | 574 | """ |
paulb@67 | 575 | Store incoming data from the specified 'channel'. In subclasses of this |
paulb@67 | 576 | class, such data could be stored using instance attributes. |
paulb@67 | 577 | """ |
paulb@67 | 578 | |
paulb@67 | 579 | raise NotImplementedError, "store_data" |
paulb@67 | 580 | |
paulb@79 | 581 | # Convenience methods. |
paulb@79 | 582 | |
paulb@84 | 583 | def start(self, callable, *args, **kw): |
paulb@79 | 584 | |
paulb@79 | 585 | """ |
paulb@79 | 586 | Using pprocess.start, create a new process for the given 'callable' |
paulb@79 | 587 | using any additional arguments provided. Then, monitor the channel |
paulb@79 | 588 | created between this process and the created process. |
paulb@79 | 589 | """ |
paulb@79 | 590 | |
paulb@99 | 591 | if self.limit is not None and len(self.active()) >= self.limit: |
paulb@99 | 592 | self.waiting.insert(0, (callable, args, kw)) |
paulb@99 | 593 | return |
paulb@99 | 594 | |
paulb@84 | 595 | self.add_wait(start(callable, *args, **kw)) |
paulb@84 | 596 | |
paulb@97 | 597 | def create(self): |
paulb@97 | 598 | |
paulb@97 | 599 | """ |
paulb@97 | 600 | Using pprocess.create, create a new process and return the created |
paulb@97 | 601 | communications channel to the created process. In the creating process, |
paulb@97 | 602 | return None - the channel receiving data from the created process will |
paulb@97 | 603 | be automatically managed by this exchange. |
paulb@97 | 604 | """ |
paulb@97 | 605 | |
paulb@97 | 606 | channel = create() |
paulb@97 | 607 | if channel.pid == 0: |
paulb@97 | 608 | return channel |
paulb@97 | 609 | else: |
paulb@97 | 610 | self.add_wait(channel) |
paulb@97 | 611 | return None |
paulb@97 | 612 | |
paulb@84 | 613 | def manage(self, callable): |
paulb@84 | 614 | |
paulb@84 | 615 | """ |
paulb@84 | 616 | Wrap the given 'callable' in an object which can then be called in the |
paulb@84 | 617 | same way as 'callable', but with new processes and communications |
paulb@84 | 618 | managed automatically. |
paulb@84 | 619 | """ |
paulb@84 | 620 | |
paulb@84 | 621 | return ManagedCallable(callable, self) |
paulb@84 | 622 | |
paulb@84 | 623 | class ManagedCallable: |
paulb@84 | 624 | |
paulb@84 | 625 | "A callable managed by an exchange." |
paulb@84 | 626 | |
paulb@84 | 627 | def __init__(self, callable, exchange): |
paulb@84 | 628 | |
paulb@84 | 629 | """ |
paulb@84 | 630 | Wrap the given 'callable', using the given 'exchange' to monitor the |
paulb@84 | 631 | channels created for communications between this and the created |
paulb@94 | 632 | processes. Note that the 'callable' must be parallel-aware (that is, |
paulb@94 | 633 | have a 'channel' parameter). Use the MakeParallel class to wrap other |
paulb@94 | 634 | kinds of callable objects. |
paulb@84 | 635 | """ |
paulb@84 | 636 | |
paulb@84 | 637 | self.callable = callable |
paulb@84 | 638 | self.exchange = exchange |
paulb@84 | 639 | |
paulb@84 | 640 | def __call__(self, *args, **kw): |
paulb@84 | 641 | |
paulb@84 | 642 | "Invoke the callable with the supplied arguments." |
paulb@84 | 643 | |
paulb@84 | 644 | self.exchange.start(self.callable, *args, **kw) |
paulb@84 | 645 | |
paulb@84 | 646 | # Abstractions and utilities. |
paulb@84 | 647 | |
paulb@84 | 648 | class Map(Exchange): |
paulb@84 | 649 | |
paulb@84 | 650 | "An exchange which can be used like the built-in 'map' function." |
paulb@84 | 651 | |
paulb@107 | 652 | def __init__(self, *args, **kw): |
paulb@107 | 653 | Exchange.__init__(self, *args, **kw) |
paulb@107 | 654 | self.init() |
paulb@107 | 655 | |
paulb@107 | 656 | def init(self): |
paulb@107 | 657 | |
paulb@107 | 658 | "Remember the channel addition order to order output." |
paulb@107 | 659 | |
paulb@107 | 660 | self.channel_number = 0 |
paulb@107 | 661 | self.channels = {} |
paulb@107 | 662 | self.results = [] |
paulb@107 | 663 | |
paulb@84 | 664 | def add(self, channel): |
paulb@84 | 665 | |
paulb@84 | 666 | "Add the given 'channel' to the exchange." |
paulb@84 | 667 | |
paulb@84 | 668 | Exchange.add(self, channel) |
paulb@92 | 669 | self.channels[channel] = self.channel_number |
paulb@92 | 670 | self.channel_number += 1 |
paulb@84 | 671 | |
paulb@107 | 672 | def start(self, callable, *args, **kw): |
paulb@107 | 673 | |
paulb@107 | 674 | """ |
paulb@107 | 675 | Using pprocess.start, create a new process for the given 'callable' |
paulb@107 | 676 | using any additional arguments provided. Then, monitor the channel |
paulb@107 | 677 | created between this process and the created process. |
paulb@107 | 678 | """ |
paulb@107 | 679 | |
paulb@107 | 680 | self.results.append(None) # placeholder |
paulb@107 | 681 | Exchange.start(self, callable, *args, **kw) |
paulb@107 | 682 | |
paulb@110 | 683 | def create(self): |
paulb@110 | 684 | |
paulb@110 | 685 | """ |
paulb@110 | 686 | Using pprocess.create, create a new process and return the created |
paulb@110 | 687 | communications channel to the created process. In the creating process, |
paulb@110 | 688 | return None - the channel receiving data from the created process will |
paulb@110 | 689 | be automatically managed by this exchange. |
paulb@110 | 690 | """ |
paulb@110 | 691 | |
paulb@110 | 692 | self.results.append(None) # placeholder |
paulb@110 | 693 | return Exchange.create(self) |
paulb@110 | 694 | |
paulb@84 | 695 | def __call__(self, callable, sequence): |
paulb@84 | 696 | |
paulb@89 | 697 | "Wrap and invoke 'callable' for each element in the 'sequence'." |
paulb@89 | 698 | |
paulb@92 | 699 | if not isinstance(callable, MakeParallel): |
paulb@92 | 700 | wrapped = MakeParallel(callable) |
paulb@92 | 701 | else: |
paulb@92 | 702 | wrapped = callable |
paulb@84 | 703 | |
paulb@107 | 704 | self.init() |
paulb@84 | 705 | |
paulb@107 | 706 | # Start processes for each element in the sequence. |
paulb@84 | 707 | |
paulb@84 | 708 | for i in sequence: |
paulb@92 | 709 | self.start(wrapped, i) |
paulb@107 | 710 | |
paulb@107 | 711 | # Access to the results occurs through this object. |
paulb@107 | 712 | |
paulb@107 | 713 | return self |
paulb@84 | 714 | |
paulb@107 | 715 | def __getitem__(self, i): |
paulb@107 | 716 | self.finish() |
paulb@107 | 717 | return self.results[i] |
paulb@84 | 718 | |
paulb@107 | 719 | def __iter__(self): |
paulb@107 | 720 | self.finish() |
paulb@107 | 721 | return iter(self.results) |
paulb@84 | 722 | |
paulb@84 | 723 | def store_data(self, channel): |
paulb@84 | 724 | |
paulb@84 | 725 | "Accumulate the incoming data, associating results with channels." |
paulb@84 | 726 | |
paulb@84 | 727 | data = channel.receive() |
paulb@92 | 728 | self.results[self.channels[channel]] = data |
paulb@92 | 729 | del self.channels[channel] |
paulb@84 | 730 | |
paulb@97 | 731 | class Queue(Exchange): |
paulb@97 | 732 | |
paulb@97 | 733 | """ |
paulb@97 | 734 | An exchange acting as a queue, making data from created processes available |
paulb@97 | 735 | in the order in which it is received. |
paulb@97 | 736 | """ |
paulb@97 | 737 | |
paulb@97 | 738 | def __init__(self, *args, **kw): |
paulb@97 | 739 | Exchange.__init__(self, *args, **kw) |
paulb@97 | 740 | self.queue = [] |
paulb@97 | 741 | |
paulb@97 | 742 | def store_data(self, channel): |
paulb@97 | 743 | |
paulb@97 | 744 | "Accumulate the incoming data, associating results with channels." |
paulb@97 | 745 | |
paulb@97 | 746 | data = channel.receive() |
paulb@97 | 747 | self.queue.insert(0, data) |
paulb@97 | 748 | |
paulb@97 | 749 | def __iter__(self): |
paulb@97 | 750 | return self |
paulb@97 | 751 | |
paulb@97 | 752 | def next(self): |
paulb@97 | 753 | |
paulb@97 | 754 | "Return the next element in the queue." |
paulb@97 | 755 | |
paulb@97 | 756 | if self.queue: |
paulb@97 | 757 | return self.queue.pop() |
paulb@97 | 758 | while self.active(): |
paulb@97 | 759 | self.store() |
paulb@97 | 760 | if self.queue: |
paulb@97 | 761 | return self.queue.pop() |
paulb@97 | 762 | else: |
paulb@97 | 763 | raise StopIteration |
paulb@97 | 764 | |
paulb@84 | 765 | class MakeParallel: |
paulb@84 | 766 | |
paulb@84 | 767 | "A wrapper around functions making them able to communicate results." |
paulb@84 | 768 | |
paulb@84 | 769 | def __init__(self, callable): |
paulb@84 | 770 | |
paulb@94 | 771 | """ |
paulb@94 | 772 | Initialise the wrapper with the given 'callable'. This object will then |
paulb@94 | 773 | be able to accept a 'channel' parameter when invoked, and to forward the |
paulb@94 | 774 | result of the given 'callable' via the channel provided back to the |
paulb@94 | 775 | invoking process. |
paulb@94 | 776 | """ |
paulb@84 | 777 | |
paulb@84 | 778 | self.callable = callable |
paulb@84 | 779 | |
paulb@84 | 780 | def __call__(self, channel, *args, **kw): |
paulb@84 | 781 | |
paulb@84 | 782 | "Invoke the callable and return its result via the given 'channel'." |
paulb@84 | 783 | |
paulb@84 | 784 | channel.send(self.callable(*args, **kw)) |
paulb@84 | 785 | |
paulb@119 | 786 | class MakeReusable(MakeParallel): |
paulb@119 | 787 | |
paulb@119 | 788 | """ |
paulb@119 | 789 | A wrapper around functions making them able to communicate results in a |
paulb@119 | 790 | reusable fashion. |
paulb@119 | 791 | """ |
paulb@119 | 792 | |
paulb@119 | 793 | def __call__(self, channel, *args, **kw): |
paulb@119 | 794 | |
paulb@119 | 795 | "Invoke the callable and return its result via the given 'channel'." |
paulb@119 | 796 | |
paulb@119 | 797 | channel.send(self.callable(*args, **kw)) |
paulb@119 | 798 | t = channel.receive() |
paulb@119 | 799 | while t is not None: |
paulb@119 | 800 | args, kw = t |
paulb@119 | 801 | channel.send(self.callable(*args, **kw)) |
paulb@119 | 802 | t = channel.receive() |
paulb@119 | 803 | |
paulb@84 | 804 | # Utility functions. |
paulb@79 | 805 | |
paulb@40 | 806 | def create(): |
paulb@40 | 807 | |
paulb@40 | 808 | """ |
paulb@40 | 809 | Create a new process, returning a communications channel to both the |
paulb@40 | 810 | creating process and the created process. |
paulb@40 | 811 | """ |
paulb@40 | 812 | |
paulb@40 | 813 | parent, child = socket.socketpair() |
paulb@40 | 814 | for s in [parent, child]: |
paulb@40 | 815 | s.setblocking(1) |
paulb@40 | 816 | |
paulb@40 | 817 | pid = os.fork() |
paulb@40 | 818 | if pid == 0: |
paulb@40 | 819 | parent.close() |
paulb@73 | 820 | return Channel(pid, child.makefile("r", 0), child.makefile("w", 0)) |
paulb@40 | 821 | else: |
paulb@40 | 822 | child.close() |
paulb@73 | 823 | return Channel(pid, parent.makefile("r", 0), parent.makefile("w", 0)) |
paulb@40 | 824 | |
paulb@97 | 825 | def exit(channel): |
paulb@97 | 826 | |
paulb@97 | 827 | """ |
paulb@97 | 828 | Terminate a created process, closing the given 'channel'. |
paulb@97 | 829 | """ |
paulb@97 | 830 | |
paulb@97 | 831 | channel.close() |
paulb@97 | 832 | os._exit(0) |
paulb@97 | 833 | |
paulb@84 | 834 | def start(callable, *args, **kw): |
paulb@40 | 835 | |
paulb@40 | 836 | """ |
paulb@40 | 837 | Create a new process which shall start running in the given 'callable'. |
paulb@94 | 838 | Additional arguments to the 'callable' can be given as additional arguments |
paulb@94 | 839 | to this function. |
paulb@94 | 840 | |
paulb@94 | 841 | Return a communications channel to the creating process. For the created |
paulb@94 | 842 | process, supply a channel as the 'channel' parameter in the given 'callable' |
paulb@94 | 843 | so that it may send data back to the creating process. |
paulb@40 | 844 | """ |
paulb@40 | 845 | |
paulb@40 | 846 | channel = create() |
paulb@40 | 847 | if channel.pid == 0: |
paulb@40 | 848 | try: |
paulb@40 | 849 | try: |
paulb@84 | 850 | callable(channel, *args, **kw) |
paulb@40 | 851 | except: |
paulb@40 | 852 | exc_type, exc_value, exc_traceback = sys.exc_info() |
paulb@40 | 853 | channel.send(exc_value) |
paulb@40 | 854 | finally: |
paulb@99 | 855 | exit(channel) |
paulb@40 | 856 | else: |
paulb@40 | 857 | return channel |
paulb@40 | 858 | |
paulb@40 | 859 | def waitall(): |
paulb@40 | 860 | |
paulb@40 | 861 | "Wait for all created processes to terminate." |
paulb@40 | 862 | |
paulb@40 | 863 | try: |
paulb@40 | 864 | while 1: |
paulb@40 | 865 | os.wait() |
paulb@40 | 866 | except OSError: |
paulb@40 | 867 | pass |
paulb@40 | 868 | |
paulb@89 | 869 | def pmap(callable, sequence, limit=None): |
paulb@84 | 870 | |
paulb@89 | 871 | """ |
paulb@89 | 872 | A parallel version of the built-in map function with an optional process |
paulb@94 | 873 | 'limit'. The given 'callable' should not be parallel-aware (that is, have a |
paulb@94 | 874 | 'channel' parameter) since it will be wrapped for parallel communications |
paulb@94 | 875 | before being invoked. |
paulb@94 | 876 | |
paulb@94 | 877 | Return the processed 'sequence' where each element in the sequence is |
paulb@94 | 878 | processed by a different process. |
paulb@89 | 879 | """ |
paulb@84 | 880 | |
paulb@89 | 881 | mymap = Map(limit=limit) |
paulb@84 | 882 | return mymap(callable, sequence) |
paulb@84 | 883 | |
paulb@40 | 884 | # vim: tabstop=4 expandtab shiftwidth=4 |