pprocess (annotate docs/reference.html in 1d66900174c8)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

paulb@129

2

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-gb">

paulb@129

3

<head>

paulb@129

4

  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

paulb@129

5

  <title>pprocess - Reference</title>

paulb@129

6

  <link href="styles.css" rel="stylesheet" type="text/css" />

paulb@129

7

</head>

paulb@129

8

<body>

<h1>pprocess - Reference</h1>

<p>The <code>pprocess</code> module defines a simple parallel processing API

paulb@129

13

for Python, inspired somewhat by the <code>thread</code> module, slightly less

paulb@129

14

by <a href="http://datamining.anu.edu.au/~ole/pypar/">pypar</a>, and slightly

paulb@129

15

less still by <a href="http://pypvm.sourceforge.net/">pypvm</a>.</p>

<p>This document complements the <a href="tutorial.html">tutorial</a> by

paulb@129

18

providing an overview of the different styles of parallel programming supported

paulb@129

19

by the module. For an introduction and in order to get a clearer idea of the

paulb@129

20

most suitable styles for your own programs, consult the

paulb@129

21

<a href="tutorial.html">tutorial</a>.</p>

<ul>

paulb@129

24

<li><a href="#Thread">Thread-style Processing</a></li>

paulb@129

25

<li><a href="#Fork">Fork-style Processing</a></li>

paulb@129

26

<li><a href="#Exchanges">Message Exchanges</a></li>

paulb@129

27

<li><a href="#Convenient">Convenient Message Exchanges</a></li>

paulb@129

28

<li><a href="#Queues">Exchanges as Queues</a></li>

paulb@129

29

<li><a href="#Maps">Exchanges as Maps</a></li>

paulb@129

30

<li><a href="#Managed">Managed Callables</a></li>

paulb@129

31

<li><a href="#MakeParallel">Making Existing Functions Parallel</a></li>

paulb@129

32

<li><a href="#Map">Map-style Processing</a></li>

paulb@129

33

<li><a href="#Reusing">Reusing Processes and Channels</a></li>

paulb@129

34

<li><a href="#MakeReusable">Making Existing Functions Parallel and Reusable</a></li>

paul@159

35

<li><a href="#Continuous">Continuous Processes and Channels</a></li>

paulb@145

36

<li><a href="#BackgroundCallable">Background Processes and Callables</a></li>

paulb@145

37

<li><a href="#PersistentQueue">Background and Persistent Queues</a></li>

paulb@129

38

<li><a href="#Implementation">Implementation Notes</a></li>

paulb@129

39

</ul>

<h2 id="Thread">Thread-style Processing</h2>

<p>To create new processes to run a function or any callable object, specify the

paulb@129

44

"callable" and any arguments as follows:</p>

<pre>

paulb@129

47

channel = pprocess.start(fn, arg1, arg2, named1=value1, named2=value2)

paulb@129

48

</pre>

<p>This returns a channel which can then be used to communicate with the created

paulb@129

51

process. Meanwhile, in the created process, the given callable will be invoked

paulb@129

52

with another channel as its first argument followed by the specified arguments:</p>

<pre>

paulb@129

55

def fn(channel, arg1, arg2, named1, named2):

paulb@129

56

    # Read from and write to the channel.

paulb@129

57

    # Return value is ignored.

paulb@129

58

...

paulb@129

59

</pre>

<h2 id="Fork">Fork-style Processing</h2>

<p>To create new processes in a similar way to that employed when using <code>os.fork</code>

paulb@129

64

(ie. the <code>fork</code> system call on various operating systems), use the following

paulb@129

65

method:</p>

<pre>

paulb@129

68

channel = pprocess.create()

paulb@129

69

if channel.pid == 0:

paulb@129

70

    # This code is run by the created process.

paulb@129

71

    # Read from and write to the channel to communicate with the

paulb@129

72

    # creating/calling process.

paulb@129

73

    # An explicit exit of the process may be desirable to prevent the process

paulb@129

74

    # from running code which is intended for the creating/calling process.

paulb@129

75

...

paulb@129

76

    pprocess.exit(channel)

paulb@129

77

else:

paulb@129

78

    # This code is run by the creating/calling process.

paulb@129

79

    # Read from and write to the channel to communicate with the created

paulb@129

80

    # process.

paulb@129

81

...

paulb@129

82

</pre>

<h2 id="Exchanges">Message Exchanges</h2>

<p>When creating many processes, each providing results for the consumption of the

paulb@129

87

main process, the collection of those results in an efficient fashion can be

paulb@129

88

problematic: if some processes take longer than others, and if we decide to read

paulb@129

89

from those processes when they are not ready instead of other processes which

paulb@129

90

are ready, the whole activity will take much longer than necessary.</p>

<p>One solution to the problem of knowing when to read from channels is to create

paulb@129

93

an <code>Exchange</code> object, optionally initialising it with a list of channels

paulb@129

94

through which data is expected to arrive:</p>

<pre>

paulb@129

97

exchange = pprocess.Exchange()           # populate the exchange later

paulb@129

98

exchange = pprocess.Exchange(channels)   # populate the exchange with channels

paulb@129

99

</pre>

<p>We can add channels to the exchange using the <code>add</code> method:</p>

<pre>

paulb@129

104

exchange.add(channel)

paulb@129

105

</pre>

<p>To test whether an exchange is active - that is, whether it is actually

paulb@129

108

monitoring any channels - we can use the <code>active</code> method which

paulb@129

109

returns all channels being monitored by the exchange:</p>

<pre>

paulb@129

112

channels = exchange.active()

paulb@129

113

</pre>

<p>We may then check the exchange to see whether any data is ready to be received;

paulb@129

116

for example:</p>

<pre>

paulb@129

119

for channel in exchange.ready():

paulb@129

120

    # Read from and write to the channel.

paulb@129

121

...

paulb@129

122

</pre>

<p>If we do not wish to wait indefinitely for a list of channels, we can set a

paulb@129

125

timeout value as an argument to the <code>ready</code> method (as a floating

paulb@129

126

point number specifying the timeout in seconds, where <code>0</code> means a

paulb@129

127

non-blocking poll as stated in the <code>select</code> module's <code>select</code>

paulb@129

128

function documentation).</p>

<h2 id="Convenient">Convenient Message Exchanges</h2>

<p>A convenient form of message exchanges can be adopted by defining a subclass of

paulb@129

133

the <code>Exchange</code> class and defining a particular method:</p>

<pre>

paulb@129

136

class MyExchange(pprocess.Exchange):

paulb@129

137

    def store_data(self, channel):

paulb@129

138

        data = channel.receive()

paulb@129

139

        # Do something with data here.

paulb@129

140

</pre>

<p>The exact operations performed on the received data might be as simple as

paulb@129

143

storing it on an instance attribute. To make use of the exchange, we would

paulb@129

144

instantiate it as usual:</p>

<pre>

paulb@129

147

exchange = MyExchange()         # populate the exchange later

paulb@129

148

exchange = MyExchange(limit=10) # set a limit for later population

paulb@129

149

</pre>

<p>The exchange can now be used in a simpler fashion than that shown above. We can

paulb@129

152

add channels as before using the <code>add</code> method, or we can choose to only

paulb@129

153

add channels if the specified limit of channels is not exceeded:</p>

<pre>

paulb@129

156

exchange.add(channel)           # add a channel as normal

paulb@129

157

exchange.add_wait(channel)      # add a channel, waiting if the limit would be

paulb@129

158

                                # exceeded

paulb@129

159

</pre>

<p>Or we can request that the exchange create a channel on our behalf:</p>

<pre>

paulb@129

164

channel = exchange.create()

paulb@129

165

</pre>

<p>We can even start processes and monitor channels without ever handling the

paulb@129

168

channel ourselves:</p>

<pre>

paulb@129

171

exchange.start(fn, arg1, arg2, named1=value1, named2=value2)

paulb@129

172

</pre>

<p>We can explicitly wait for "free space" for channels by calling the

paulb@129

175

<code>wait</code> method, although the <code>start</code> and <code>add_wait</code>

paulb@129

176

methods make this less interesting:</p>

<pre>

paulb@129

179

exchange.wait()

paulb@129

180

</pre>

<p>Finally, when finishing the computation, we can choose to merely call the

paulb@129

183

<code>finish</code> method and have the remaining data processed automatically:</p>

<pre>

paulb@129

186

exchange.finish()

paulb@129

187

</pre>

<p>Clearly, this approach is less flexible but more convenient than the raw message

paulb@129

190

exchange API as described above. However, it permits much simpler and clearer

paulb@129

191

code.</p>

<h2 id="Queues">Exchanges as Queues</h2>

<p>Instead of having to subclass the <code>pprocess.Exchange</code> class and

paulb@129

196

to define the <code>store_data</code> method, it might be more desirable to let

paulb@129

197

the exchange manage the communications between created and creating processes

paulb@129

198

and to let the creating process just consume received data as it arrives,

paulb@129

199

without particular regard for the order of the received data - perhaps the

paulb@129

200

creating process has its own way of managing such issues.</p>

<p>For such situations, the <code>Queue</code> class may be instantiated and

paulb@129

203

channels added to the queue using the various methods provided:</p>

<pre>

paulb@129

206

queue = pprocess.Queue(limit=10)

paulb@129

207

channel = queue.create()

paulb@129

208

if channel:

paulb@129

209

    # Do some computation.

paulb@129

210

    pprocess.exit(channel)

paulb@129

211

</pre>

<p>The results can then be consumed by treating the queue like an iterator:</p>

<pre>

paulb@129

216

for result in queue:

paulb@129

217

    # Capture each result.

paulb@129

218

</pre>

<p>This approach does not, of course, require the direct handling of channels.

paulb@129

221

One could instead use the <code>start</code> method on the queue to create

paulb@129

222

processes and to initiate computations (since a queue is merely an enhanced

paulb@129

223

exchange with a specific implementation of the <code>store_data</code>

paulb@129

224

method).</p>

<h2 id="Maps">Exchanges as Maps</h2>

<p>Where the above <code>Queue</code> class appears like an attractive solution

paulb@129

229

for the management of the results of computations, but where the order of their

paulb@129

230

consumption by the creating process remains important, the <code>Map</code>

paulb@129

231

class may offer a suitable way of collecting and accessing results:</p>

<pre>

paulb@129

234

results = pprocess.Map(limit=10)

paulb@129

235

for value in inputs:

paulb@129

236

    results.start(fn, args)

paulb@129

237

</pre>

<p>The results can then be consumed in an order corresponding to the order of the

paulb@129

240

computations which produced them:</p>

<pre>

paulb@129

243

for result in results:

paulb@129

244

    # Process each result.

paulb@129

245

</pre>

<p>Internally, the <code>Map</code> object records a particular ordering of

paulb@129

248

channels, ensuring that the received results can be mapped to this ordering,

paulb@129

249

and that the results can be made available with this ordering preserved.</p>

<h2 id="Managed">Managed Callables</h2>

<p>A further simplification of the above convenient use of message exchanges

paulb@129

254

involves the creation of callables (eg. functions) which are automatically

paulb@129

255

monitored by an exchange. We create such a callable by calling the

paulb@129

256

<code>manage</code> method on an exchange:</p>

<pre>

paulb@129

259

myfn = exchange.manage(fn)

paulb@129

260

</pre>

<p>This callable can then be invoked instead of using the exchange's

paulb@129

263

<code>start</code> method:</p>

<pre>

paulb@129

266

myfn(arg1, arg2, named1=value1, named2=value2)

paulb@129

267

</pre>

<p>The exchange's <code>finish</code> method can be used as usual to process

paulb@129

270

incoming data.</p>

<h2 id="MakeParallel">Making Existing Functions Parallel</h2>

<p>In making a program parallel, existing functions which only return results can

paulb@129

275

be manually modified to accept and use channels to communicate results back to

paulb@129

276

the main process. However, a simple alternative is to use the <code>MakeParallel</code>

paulb@129

277

class to provide a wrapper around unmodified functions which will return the results

paulb@129

278

from those functions in the channels provided. For example:</p>

<pre>

paulb@129

281

fn = pprocess.MakeParallel(originalfn)

paulb@129

282

</pre>

<h2 id="Map">Map-style Processing</h2>

<p>In situations where a callable would normally be used in conjunction with the

paulb@129

287

Python built-in <code>map</code> function, an alternative solution can be adopted by using

paulb@129

288

the <code>pmap</code> function:</p>

<pre>

paulb@129

291

pprocess.pmap(fn, sequence)

paulb@129

292

</pre>

<p>Here, the sequence would have to contain elements that each contain the

paulb@129

295

required parameters of the specified callable, <code>fn</code>. Note that the

paulb@129

296

callable does not need to be a parallel-aware function which has a

paulb@129

297

<code>channel</code> argument: the <code>pmap</code> function automatically

paulb@129

298

wraps the given callable internally.</p>

<h2 id="Reusing">Reusing Processes and Channels</h2>

<p>So far, all parallel computations have been done with newly-created

paulb@129

303

processes. However, this can seem somewhat inefficient, especially if processes

paulb@129

304

are being continually created and destroyed (although if this happens too

paulb@129

305

often, the amount of work done by each process may be too little, anyway). One

paulb@129

306

solution is to retain processes after they have done their work and request

paulb@129

307

that they perform more work for each new parallel task or invocation. To enable

paulb@129

308

the reuse of processes in this way, a special keyword argument may be specified

paulb@129

309

when creating <code>Exchange</code> instances (and instances of subclasses such

paulb@129

310

as <code>Map</code> and <code>Queue</code>). For example:</p>

<pre>

paulb@129

313

exchange = MyExchange(limit=10, reuse=1) # reuse up to 10 processes

paulb@129

314

</pre>

<p>Code invoked through such exchanges must be aware of channels and be

paulb@129

317

constructed in such a way that it does not terminate after sending a result

paulb@129

318

back to the creating process. Instead, it should repeatedly wait for subsequent

paulb@129

319

sets of parameters (compatible with those either in the signature of a callable

paulb@129

320

or with the original values read from the channel). Reusable code is terminated

paulb@129

321

when the special value of <code>None</code> is sent from the creating process

paulb@129

322

to the created process, indicating that no more parameters will be sent; this

paulb@129

323

should cause the code to terminate.</p>

<h2 id="MakeReusable">Making Existing Functions Parallel and Reusable</h2>

<p>An easier way of making reusable code sections for parallel use is to employ the

paulb@129

328

<code>MakeReusable</code> class to wrap an existing callable:</p>

<pre>

paulb@129

331

fn = pprocess.MakeReusable(originalfn)

paulb@129

332

</pre>

<p>This wraps the callable in a similar fashion to <code>MakeParallel</code>, but

paulb@129

335

provides the necessary mechanisms described above for reusable code.</p>

<h2 id="Continuous">Continuous Processes and Channels</h2>

<p>Much of the usage of exchanges so far has concentrated on processes which

paul@158

340

are created, whose callables are invoked, and then, once those callables have

paul@158

341

returned, either they are invoked again in the same process (when reused) or

paul@158

342

in a new process (when not reused). However, the underlying mechanisms

paul@158

343

actually support processes whose callables not only receive input at the start

paul@158

344

of their execution and send output at the end of their execution, but may

paul@158

345

provide output on a continuous basis (similar to iterator or generator

paul@158

346

objects).</p>

<p>To enable support for continuous communications between processes, a

paul@158

349

keyword argument must be specified when creating an <code>Exchange</code>

paul@158

350

instance (or an instance of a subclass of <code>Exchange</code> such as

paul@158

351

<code>Map</code> or <code>Queue</code>):</p>

<pre>

paul@158

354

exchange = MyExchange(limit=10, continuous=1) # support up to 10 processes

paul@158

355

</pre>

<p>Code invoked in this mode of communication must be aware of channels, since

paul@158

358

it will need to explicitly send data via a channel to the creating process,

paul@158

359

instead of terminating and sending data only once (as would be done

paul@158

360

automatically using convenience classes such as

paul@158

361

<code>MakeParallel</code>).</p>

<h2 id="BackgroundCallable">Background Processes and Callables</h2>

<p>So far, all parallel computations have involved created processes which

paulb@145

366

depend on the existence of the created process to collect results and to

paulb@145

367

communicate with these created processes, preventing the created process from

paulb@145

368

terminating, even if the created processes actually perform work and potentially

paulb@145

369

create output which need not concern the process which created them. In order to

paulb@145

370

separate creating and created processes, the concept of a background process

paulb@145

371

(also known as a daemon process) is introduced.</p>

<p>The <code>BackgroundCallable</code> class acts somewhat like the

paulb@145

374

<code>manage</code> method on exchange-based objects, although no exchange is

paulb@145

375

immediately involved, and instances of <code>BackgroundCallable</code> provide

paulb@145

376

wrappers around existing parallel-aware callables which then be invoked in order

paulb@145

377

to initiate a background computation in a created process. For example:</p>

<pre>

paulb@145

380

backgroundfn = pprocess.BackgroundCallable(address, fn)

paulb@145

381

</pre>

<p>This wraps the supplied callable (which can itself be the result of using

paulb@145

384

<code>MakeParallel</code>), with the resulting wrapper lending itself to

paulb@145

385

invocation like any other function. One distinguishing feature is that of the

paulb@145

386

<code>address</code>: in order to contact the background process after

paulb@145

387

invocation to (amongst other things) receive any result, a specific address

paulb@145

388

must be given to define the contact point between the created process and any

paulb@145

389

processes seeking to connect to it. Since these "persistent" communications

paulb@145

390

employ special files (specifically UNIX-domain sockets), the address must be a

paulb@145

391

suitable filename.</p>

<h2 id="PersistentQueue">Background and Persistent Queues</h2>

<p>Background processes employing persistent communications require adaptations

paulb@145

396

of the facilities described in the sections above. For a single background

paulb@145

397

process, the <code>BackgroundQueue</code> function is sufficient to create a

paulb@145

398

queue-like object which can monitor the communications channel between the

paulb@145

399

connecting process and a background process. For example:</p>

<pre>

paulb@145

402

queue = pprocess.BackgroundQueue(address)

paulb@145

403

</pre>

<p>This code will cause the process reachable via the given <code>address</code>

paulb@145

406

to be contacted and any results made available via the created queue-like

paulb@145

407

object.</p>

<p>Where many background processes have been created, a single

paulb@145

410

<code>PersistentQueue</code> object can monitor their communications by being

paulb@145

411

connected to them all, as in the following example:</p>

<pre>

paulb@145

414

queue = pprocess.PersistentQueue()

paulb@145

415

for address in addresses:

paulb@145

416

    queue.connect(address)

paulb@145

417

</pre>

<p>Here, the queue monitors all previously created processes whose addresses

paulb@145

420

reside in the <code>addresses</code> sequence. Upon iterating over the queue,

paulb@145

421

results will be taken from whichever process happens to have data available in

paulb@145

422

no particular pre-defined order.</p>

<h2 id="Implementation">Implementation Notes</h2>

<h3>Signals and Waiting</h3>

<p>When created/child processes terminate, one would typically want to be informed

paulb@129

429

of such conditions using a signal handler. Unfortunately, Python seems to have

paulb@129

430

issues with restartable reads from file descriptors when interrupted by signals:</p>

<ul>

paulb@129

433

<li><a href="http://mail.python.org/pipermail/python-dev/2002-September/028572.html">http://mail.python.org/pipermail/python-dev/2002-September/028572.html</a></li>

paulb@129

434

<li><a href="http://twistedmatrix.com/bugs/issue733">http://twistedmatrix.com/bugs/issue733</a></li>

paulb@129

435

</ul>

<h3>Select and Poll</h3>

<p>The exact combination of conditions indicating closed pipes remains relatively

paulb@129

440

obscure. Here is a message/thread describing them (in the context of another

paulb@129

441

topic):</p>

<ul>

paulb@129

444

<li><a href="http://twistedmatrix.com/pipermail/twisted-python/2005-February/009666.html">http://twistedmatrix.com/pipermail/twisted-python/2005-February/009666.html</a></li>

paulb@129

445

</ul>

<p>It would seem, from using sockets and from studying the <code>asyncore</code>

paulb@129

448

module, that sockets are more predictable than pipes.</p>

<p>Notes about <code>poll</code> implementations can be found here:</p>

<ul>

paulb@129

453

<li><a href="http://www.greenend.org.uk/rjk/2001/06/poll.html">http://www.greenend.org.uk/rjk/2001/06/poll.html</a></li>

paulb@129

454

</ul>

</body>

paulb@129

457

</html>

pprocess

Annotated docs/reference.html