paulb@40 | 1 | #!/usr/bin/env python |
paulb@40 | 2 | |
paulb@40 | 3 | """ |
paulb@40 | 4 | A simple parallel processing API for Python, inspired somewhat by the thread |
paulb@40 | 5 | module, slightly less by pypar, and slightly less still by pypvm. |
paulb@40 | 6 | |
paulb@41 | 7 | Copyright (C) 2005 Paul Boddie <paul@boddie.org.uk> |
paulb@41 | 8 | |
paulb@41 | 9 | This software is free software; you can redistribute it and/or |
paulb@41 | 10 | modify it under the terms of the GNU General Public License as |
paulb@41 | 11 | published by the Free Software Foundation; either version 2 of |
paulb@41 | 12 | the License, or (at your option) any later version. |
paulb@41 | 13 | |
paulb@41 | 14 | This software is distributed in the hope that it will be useful, |
paulb@41 | 15 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
paulb@41 | 16 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
paulb@41 | 17 | GNU General Public License for more details. |
paulb@41 | 18 | |
paulb@41 | 19 | You should have received a copy of the GNU General Public |
paulb@41 | 20 | License along with this library; see the file LICENCE.txt |
paulb@41 | 21 | If not, write to the Free Software Foundation, Inc., |
paulb@41 | 22 | 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. |
paulb@41 | 23 | |
paulb@41 | 24 | -------- |
paulb@41 | 25 | |
paulb@40 | 26 | Thread-style Processing |
paulb@40 | 27 | ----------------------- |
paulb@40 | 28 | |
paulb@40 | 29 | To create new processes to run a function or any callable object, specify the |
paulb@40 | 30 | "callable" and any arguments as follows: |
paulb@40 | 31 | |
paulb@40 | 32 | channel = start(fn, arg1, arg2, named1=value1, named2=value2) |
paulb@40 | 33 | |
paulb@40 | 34 | This returns a channel which can then be used to communicate with the created |
paulb@40 | 35 | process. Meanwhile, in the created process, the given callable will be invoked |
paulb@40 | 36 | with another channel as its first argument followed by the specified arguments: |
paulb@40 | 37 | |
paulb@40 | 38 | def fn(channel, arg1, arg2, named1, named2): |
paulb@40 | 39 | # Read from and write to the channel. |
paulb@40 | 40 | # Return value is ignored. |
paulb@40 | 41 | ... |
paulb@40 | 42 | |
paulb@40 | 43 | Fork-style Processing |
paulb@40 | 44 | --------------------- |
paulb@40 | 45 | |
paulb@40 | 46 | To create new processes in a similar way to that employed when using os.fork |
paulb@40 | 47 | (ie. the fork system call on various operating systems), use the following |
paulb@40 | 48 | method: |
paulb@40 | 49 | |
paulb@40 | 50 | channel = create() |
paulb@40 | 51 | if channel.pid == 0: |
paulb@40 | 52 | # This code is run by the created process. |
paulb@40 | 53 | # Read from and write to the channel to communicate with the |
paulb@40 | 54 | # creating/calling process. |
paulb@40 | 55 | # An explicit exit of the process may be desirable to prevent the process |
paulb@40 | 56 | # from running code which is intended for the creating/calling process. |
paulb@40 | 57 | ... |
paulb@40 | 58 | else: |
paulb@40 | 59 | # This code is run by the creating/calling process. |
paulb@40 | 60 | # Read from and write to the channel to communicate with the created |
paulb@40 | 61 | # process. |
paulb@40 | 62 | ... |
paulb@40 | 63 | |
paulb@40 | 64 | Message Exchanges |
paulb@40 | 65 | ----------------- |
paulb@40 | 66 | |
paulb@40 | 67 | When creating many processes, each providing results for the consumption of the |
paulb@40 | 68 | main process, the collection of those results in an efficient fashion can be |
paulb@40 | 69 | problematic: if some processes take longer than others, and if we decide to read |
paulb@40 | 70 | from those processes when they are not ready instead of other processes which |
paulb@40 | 71 | are ready, the whole activity will take much longer than necessary. |
paulb@40 | 72 | |
paulb@40 | 73 | One solution to the problem of knowing when to read from channels is to create |
paulb@40 | 74 | an Exchange object, optionally initialising it with a list of channels through |
paulb@40 | 75 | which data is expected to arrive: |
paulb@40 | 76 | |
paulb@40 | 77 | exchange = Exchange() # populate the exchange later |
paulb@40 | 78 | exchange = Exchange(channels) # populate the exchange with channels |
paulb@40 | 79 | |
paulb@40 | 80 | We can add channels to the exchange using the add method: |
paulb@40 | 81 | |
paulb@40 | 82 | exchange.add(channel) |
paulb@40 | 83 | |
paulb@40 | 84 | To test whether an exchange is active - that is, whether it is actually |
paulb@40 | 85 | monitoring any channels - we can use the active method which returns all |
paulb@40 | 86 | channels being monitored by the exchange: |
paulb@40 | 87 | |
paulb@40 | 88 | channels = exchange.active() |
paulb@40 | 89 | |
paulb@40 | 90 | We may then check the exchange to see whether any data is ready to be received; |
paulb@40 | 91 | for example: |
paulb@40 | 92 | |
paulb@40 | 93 | for channel in exchange.ready(): |
paulb@40 | 94 | # Read from and write to the channel. |
paulb@40 | 95 | ... |
paulb@40 | 96 | |
paulb@40 | 97 | If we do not wish to wait indefinitely for a list of channels, we can set a |
paulb@40 | 98 | timeout value as an argument to the ready method (as a floating point number |
paulb@40 | 99 | specifying the timeout in seconds, where 0 means a non-blocking poll as stated |
paulb@40 | 100 | in the select module's select function documentation). |
paulb@40 | 101 | |
paulb@40 | 102 | Signals and Waiting |
paulb@40 | 103 | ------------------- |
paulb@40 | 104 | |
paulb@40 | 105 | When created/child processes terminate, one would typically want to be informed |
paulb@40 | 106 | of such conditions using a signal handler. Unfortunately, Python seems to have |
paulb@40 | 107 | issues with restartable reads from file descriptors when interrupted by signals: |
paulb@40 | 108 | |
paulb@40 | 109 | http://mail.python.org/pipermail/python-dev/2002-September/028572.html |
paulb@40 | 110 | http://twistedmatrix.com/bugs/issue733 |
paulb@40 | 111 | |
paulb@40 | 112 | Select and Poll |
paulb@40 | 113 | --------------- |
paulb@40 | 114 | |
paulb@40 | 115 | The exact combination of conditions indicating closed pipes remains relatively |
paulb@40 | 116 | obscure. Here is a message/thread describing them (in the context of another |
paulb@40 | 117 | topic): |
paulb@40 | 118 | |
paulb@40 | 119 | http://twistedmatrix.com/pipermail/twisted-python/2005-February/009666.html |
paulb@40 | 120 | |
paulb@40 | 121 | It would seem, from using sockets and from studying the asycore module, that |
paulb@40 | 122 | sockets are more predictable than pipes. |
paulb@40 | 123 | """ |
paulb@40 | 124 | |
paulb@40 | 125 | __version__ = "0.2" |
paulb@40 | 126 | |
paulb@40 | 127 | import os |
paulb@40 | 128 | import sys |
paulb@40 | 129 | import select |
paulb@40 | 130 | import socket |
paulb@40 | 131 | |
paulb@40 | 132 | try: |
paulb@40 | 133 | import cPickle as pickle |
paulb@40 | 134 | except ImportError: |
paulb@40 | 135 | import pickle |
paulb@40 | 136 | |
paulb@40 | 137 | class AcknowledgementError(Exception): |
paulb@40 | 138 | pass |
paulb@40 | 139 | |
paulb@40 | 140 | class Channel: |
paulb@40 | 141 | |
paulb@40 | 142 | "A communications channel." |
paulb@40 | 143 | |
paulb@40 | 144 | def __init__(self, pid, read_pipe, write_pipe): |
paulb@40 | 145 | |
paulb@40 | 146 | """ |
paulb@40 | 147 | Initialise the channel with a process identifier 'pid', a 'read_pipe' |
paulb@40 | 148 | from which messages will be received, and a 'write_pipe' into which |
paulb@40 | 149 | messages will be sent. |
paulb@40 | 150 | """ |
paulb@40 | 151 | |
paulb@40 | 152 | self.pid = pid |
paulb@40 | 153 | self.read_pipe = read_pipe |
paulb@40 | 154 | self.write_pipe = write_pipe |
paulb@40 | 155 | self.closed = 0 |
paulb@40 | 156 | |
paulb@40 | 157 | def __del__(self): |
paulb@40 | 158 | |
paulb@40 | 159 | # Since signals don't work well with I/O, we close pipes and wait for |
paulb@40 | 160 | # created processes upon finalisation. |
paulb@40 | 161 | |
paulb@40 | 162 | self.close() |
paulb@40 | 163 | |
paulb@40 | 164 | def close(self): |
paulb@40 | 165 | |
paulb@40 | 166 | "Explicitly close the channel." |
paulb@40 | 167 | |
paulb@40 | 168 | if not self.closed: |
paulb@40 | 169 | self.closed = 1 |
paulb@40 | 170 | self.read_pipe.close() |
paulb@40 | 171 | self.write_pipe.close() |
paulb@40 | 172 | #self.wait(os.WNOHANG) |
paulb@40 | 173 | |
paulb@40 | 174 | def wait(self, options=0): |
paulb@40 | 175 | |
paulb@40 | 176 | "Wait for the created process, if any, to exit." |
paulb@40 | 177 | |
paulb@40 | 178 | if self.pid != 0: |
paulb@40 | 179 | try: |
paulb@40 | 180 | os.waitpid(self.pid, options) |
paulb@40 | 181 | except OSError: |
paulb@40 | 182 | pass |
paulb@40 | 183 | |
paulb@40 | 184 | def _send(self, obj): |
paulb@40 | 185 | |
paulb@40 | 186 | "Send the given object 'obj' through the channel." |
paulb@40 | 187 | |
paulb@40 | 188 | pickle.dump(obj, self.write_pipe) |
paulb@40 | 189 | self.write_pipe.flush() |
paulb@40 | 190 | |
paulb@40 | 191 | def send(self, obj): |
paulb@40 | 192 | |
paulb@40 | 193 | """ |
paulb@40 | 194 | Send the given object 'obj' through the channel. Then wait for an |
paulb@40 | 195 | acknowledgement. (The acknowledgement makes the caller wait, thus |
paulb@40 | 196 | preventing processes from exiting and disrupting the communications |
paulb@40 | 197 | channel and losing data.) |
paulb@40 | 198 | """ |
paulb@40 | 199 | |
paulb@40 | 200 | self._send(obj) |
paulb@40 | 201 | if self._receive() != "OK": |
paulb@40 | 202 | raise AcknowledgementError, obj |
paulb@40 | 203 | |
paulb@40 | 204 | def _receive(self): |
paulb@40 | 205 | |
paulb@40 | 206 | "Receive an object through the channel, returning the object." |
paulb@40 | 207 | |
paulb@40 | 208 | obj = pickle.load(self.read_pipe) |
paulb@40 | 209 | if isinstance(obj, Exception): |
paulb@40 | 210 | raise obj |
paulb@40 | 211 | else: |
paulb@40 | 212 | return obj |
paulb@40 | 213 | |
paulb@40 | 214 | def receive(self): |
paulb@40 | 215 | |
paulb@40 | 216 | """ |
paulb@40 | 217 | Receive an object through the channel, returning the object. Send an |
paulb@40 | 218 | acknowledgement of receipt. (The acknowledgement makes the sender wait, |
paulb@40 | 219 | thus preventing processes from exiting and disrupting the communications |
paulb@40 | 220 | channel and losing data.) |
paulb@40 | 221 | """ |
paulb@40 | 222 | |
paulb@40 | 223 | try: |
paulb@40 | 224 | obj = self._receive() |
paulb@40 | 225 | return obj |
paulb@40 | 226 | finally: |
paulb@40 | 227 | self._send("OK") |
paulb@40 | 228 | |
paulb@40 | 229 | class Exchange: |
paulb@40 | 230 | |
paulb@40 | 231 | """ |
paulb@40 | 232 | A communications exchange that can be used to detect channels which are |
paulb@40 | 233 | ready to communicate. |
paulb@40 | 234 | """ |
paulb@40 | 235 | |
paulb@40 | 236 | def __init__(self, channels=None, autoclose=1): |
paulb@40 | 237 | |
paulb@40 | 238 | """ |
paulb@40 | 239 | Initialise the exchange with an optional list of 'channels'. If the |
paulb@40 | 240 | optional 'autoclose' parameter is set to a false value, channels will |
paulb@40 | 241 | not be closed automatically when they are removed from the exchange - by |
paulb@40 | 242 | default they are closed when removed. |
paulb@40 | 243 | """ |
paulb@40 | 244 | |
paulb@40 | 245 | self.autoclose = autoclose |
paulb@40 | 246 | self.readables = {} |
paulb@40 | 247 | self.poller = select.poll() |
paulb@40 | 248 | for channel in channels or []: |
paulb@40 | 249 | self.add(channel) |
paulb@40 | 250 | |
paulb@40 | 251 | def add(self, channel): |
paulb@40 | 252 | |
paulb@40 | 253 | "Add the given 'channel' to the exchange." |
paulb@40 | 254 | |
paulb@40 | 255 | self.readables[channel.read_pipe.fileno()] = channel |
paulb@40 | 256 | self.poller.register(channel.read_pipe.fileno(), select.POLLIN | select.POLLHUP | select.POLLNVAL | select.POLLERR) |
paulb@40 | 257 | |
paulb@40 | 258 | def active(self): |
paulb@40 | 259 | |
paulb@40 | 260 | "Return a list of active channels." |
paulb@40 | 261 | |
paulb@40 | 262 | return self.readables.values() |
paulb@40 | 263 | |
paulb@40 | 264 | def ready(self, timeout=None): |
paulb@40 | 265 | |
paulb@40 | 266 | """ |
paulb@40 | 267 | Wait for a period of time specified by the optional 'timeout' (or until |
paulb@40 | 268 | communication is possible) and return a list of channels which are ready |
paulb@40 | 269 | to be read from. |
paulb@40 | 270 | """ |
paulb@40 | 271 | |
paulb@40 | 272 | fds = self.poller.poll(timeout) |
paulb@40 | 273 | readables = [] |
paulb@40 | 274 | for fd, status in fds: |
paulb@40 | 275 | channel = self.readables[fd] |
paulb@40 | 276 | |
paulb@40 | 277 | # Remove ended/error channels. |
paulb@40 | 278 | |
paulb@40 | 279 | if status & (select.POLLHUP | select.POLLNVAL | select.POLLERR): |
paulb@40 | 280 | self.remove(channel) |
paulb@40 | 281 | |
paulb@40 | 282 | # Record readable channels. |
paulb@40 | 283 | |
paulb@40 | 284 | elif status & select.POLLIN: |
paulb@40 | 285 | readables.append(channel) |
paulb@40 | 286 | |
paulb@40 | 287 | return readables |
paulb@40 | 288 | |
paulb@40 | 289 | def remove(self, channel): |
paulb@40 | 290 | |
paulb@40 | 291 | """ |
paulb@40 | 292 | Remove the given 'channel' from the exchange. |
paulb@40 | 293 | """ |
paulb@40 | 294 | |
paulb@40 | 295 | del self.readables[channel.read_pipe.fileno()] |
paulb@40 | 296 | self.poller.unregister(channel.read_pipe.fileno()) |
paulb@40 | 297 | if self.autoclose: |
paulb@40 | 298 | channel.close() |
paulb@40 | 299 | channel.wait() |
paulb@40 | 300 | |
paulb@40 | 301 | def create(): |
paulb@40 | 302 | |
paulb@40 | 303 | """ |
paulb@40 | 304 | Create a new process, returning a communications channel to both the |
paulb@40 | 305 | creating process and the created process. |
paulb@40 | 306 | """ |
paulb@40 | 307 | |
paulb@40 | 308 | parent, child = socket.socketpair() |
paulb@40 | 309 | for s in [parent, child]: |
paulb@40 | 310 | s.setblocking(1) |
paulb@40 | 311 | |
paulb@40 | 312 | pid = os.fork() |
paulb@40 | 313 | if pid == 0: |
paulb@40 | 314 | parent.close() |
paulb@40 | 315 | return Channel(pid, child.makefile("r"), child.makefile("w")) |
paulb@40 | 316 | else: |
paulb@40 | 317 | child.close() |
paulb@40 | 318 | return Channel(pid, parent.makefile("r"), parent.makefile("w")) |
paulb@40 | 319 | |
paulb@40 | 320 | def start(callable, *args, **kwargs): |
paulb@40 | 321 | |
paulb@40 | 322 | """ |
paulb@40 | 323 | Create a new process which shall start running in the given 'callable'. |
paulb@40 | 324 | Return a communications channel to the creating process, and supply such a |
paulb@40 | 325 | channel to the created process as the 'channel' parameter in the given |
paulb@40 | 326 | 'callable'. Additional arguments to the 'callable' can be given as |
paulb@40 | 327 | additional arguments to this function. |
paulb@40 | 328 | """ |
paulb@40 | 329 | |
paulb@40 | 330 | channel = create() |
paulb@40 | 331 | if channel.pid == 0: |
paulb@40 | 332 | try: |
paulb@40 | 333 | try: |
paulb@40 | 334 | callable(channel, *args, **kwargs) |
paulb@40 | 335 | except: |
paulb@40 | 336 | exc_type, exc_value, exc_traceback = sys.exc_info() |
paulb@40 | 337 | channel.send(exc_value) |
paulb@40 | 338 | finally: |
paulb@40 | 339 | channel.close() |
paulb@40 | 340 | sys.exit(0) |
paulb@40 | 341 | else: |
paulb@40 | 342 | return channel |
paulb@40 | 343 | |
paulb@40 | 344 | def waitall(): |
paulb@40 | 345 | |
paulb@40 | 346 | "Wait for all created processes to terminate." |
paulb@40 | 347 | |
paulb@40 | 348 | try: |
paulb@40 | 349 | while 1: |
paulb@40 | 350 | os.wait() |
paulb@40 | 351 | except OSError: |
paulb@40 | 352 | pass |
paulb@40 | 353 | |
paulb@40 | 354 | # vim: tabstop=4 expandtab shiftwidth=4 |