Copyedit tasks tutorial

Tim Chevalier 2012-10-09 16:14:55 -07:00
parent 28cf16a304
commit 4b3be853af


```diff
@@ -2,38 +2,37 @@
 # Introduction

-The Rust language is designed from the ground up to support pervasive
+The designers of Rust designed the language from the ground up to support pervasive
 and safe concurrency through lightweight, memory-isolated tasks and
 message passing.

-Rust tasks are not the same as traditional threads - they are what are
-often referred to as _green threads_, cooperatively scheduled by the
-Rust runtime onto a small number of operating system threads. Being
-significantly cheaper to create than traditional threads, Rust can
-create hundreds of thousands of concurrent tasks on a typical 32-bit
-system.
+Rust tasks are not the same as traditional threads: rather, they are more like
+_green threads_. The Rust runtime system schedules tasks cooperatively onto a
+small number of operating system threads. Because tasks are significantly
+cheaper to create than traditional threads, Rust can create hundreds of
+thousands of concurrent tasks on a typical 32-bit system.

-Tasks provide failure isolation and recovery. When an exception occurs
-in rust code (either by calling `fail` explicitly or by otherwise performing
-an invalid operation) the entire task is destroyed - there is no way
-to `catch` an exception as in other languages. Instead tasks may monitor
-each other to detect when failure has occurred.
+Tasks provide failure isolation and recovery. When an exception occurs in Rust
+code (as a result of an explicit call to `fail`, an assertion failure, or
+another invalid operation), the runtime system destroys the entire
+task. Unlike in languages such as Java and C++, there is no way to `catch` an
+exception. Instead, tasks may monitor each other for failure.

-Rust tasks have dynamically sized stacks. When a task is first created
-it starts off with a small amount of stack (currently in the low
-thousands of bytes, depending on platform) and more stack is acquired as
-needed. A Rust task will never run off the end of the stack as is
-possible in many other languages, but they do have a stack budget, and
-if a Rust task exceeds its stack budget then it will fail safely.
+Rust tasks have dynamically sized stacks. A task begins its life with a small
+amount of stack space (currently in the low thousands of bytes, depending on
+platform), and acquires more stack as needed. Unlike in languages such as C, a
+Rust task cannot run off the end of the stack. However, tasks do have a stack
+budget. If a Rust task exceeds its stack budget, then it will fail safely:
+with a checked exception.

-Tasks make use of Rust's type system to provide strong memory safety
-guarantees, disallowing shared mutable state. Communication between
-tasks is facilitated by the transfer of _owned_ data through the
-global _exchange heap_.
+Tasks use Rust's type system to provide strong memory safety guarantees. In
+particular, the type system guarantees that tasks cannot share mutable state
+with each other. Tasks communicate with each other by transferring _owned_
+data through the global _exchange heap_.

-This tutorial will explain the basics of tasks and communication in Rust,
-explore some typical patterns in concurrent Rust code, and finally
-discuss some of the more exotic synchronization types in the standard
+This tutorial explains the basics of tasks and communication in Rust,
+explores some typical patterns in concurrent Rust code, and finally
+discusses some of the more unusual synchronization types in the standard
 library.

 > ***Warning:*** This tutorial is incomplete
```
```diff
@@ -45,23 +44,23 @@ and efficient tasks, all of the task functionality itself is implemented
 in the core and standard libraries, which are still under development
 and do not always present a consistent interface.

-In particular, there are currently two independent modules that provide
-a message passing interface to Rust code: `core::comm` and `core::pipes`.
-`core::comm` is an older, less efficient system that is being phased out
-in favor of `pipes`. At some point the existing `core::comm` API will
-be removed and the user-facing portions of `core::pipes` will be moved
-to `core::comm`. In this tutorial we will discuss `pipes` and ignore
-the `comm` API.
+In particular, there are currently two independent modules that provide a
+message passing interface to Rust code: `core::comm` and `core::pipes`.
+`core::comm` is an older, less efficient system that is being phased out in
+favor of `pipes`. At some point, we will remove the existing `core::comm` API
+and move the user-facing portions of `core::pipes` to `core::comm`. In this
+tutorial, we discuss `pipes` and ignore the `comm` API.

 For your reference, these are the standard modules involved in Rust
-concurrency at the moment.
+concurrency at this writing.

 * [`core::task`] - All code relating to tasks and task scheduling
 * [`core::comm`] - The deprecated message passing API
 * [`core::pipes`] - The new message passing infrastructure and API
 * [`std::comm`] - Higher level messaging types based on `core::pipes`
 * [`std::sync`] - More exotic synchronization tools, including locks
-* [`std::arc`] - The ARC type, for safely sharing immutable data
+* [`std::arc`] - The ARC (atomic reference counted) type, for safely sharing
+  immutable data
 * [`std::par`] - Some basic tools for implementing parallel algorithms

 [`core::task`]: core/task.html
```
```diff
@@ -74,11 +73,11 @@ concurrency at the moment.
 # Basics

-The programming interface for creating and managing tasks is contained
-in the `task` module of the `core` library, making it available to all
-Rust code by default. At it's simplest, creating a task is a matter of
-calling the `spawn` function, passing a closure to run in the new
-task.
+The programming interface for creating and managing tasks lives
+in the `task` module of the `core` library, and is thus available to all
+Rust code by default. At its simplest, creating a task is a matter of
+calling the `spawn` function with a closure argument. `spawn` executes the
+closure in the new task.

 ~~~~
 # use io::println;
```
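In present-day Rust, the closest counterpart to the `spawn` call described in this hunk is `std::thread::spawn`. This is an analogy, not part of the commit. The 2012 task API shown here no longer exists in this form, and `std::thread::spawn` starts an operating-system thread rather than a cooperatively scheduled green task. The following minimal sketch uses `run_in_new_thread`, a hypothetical helper name:

```rust
use std::thread;

/// Hypothetical helper: runs `f` in a newly spawned thread and waits for it.
/// `std::thread::spawn` plays the role of the tutorial's `spawn`, but it
/// creates an OS thread, not a cooperatively scheduled green task.
fn run_in_new_thread<F>(f: F) -> i32
where
    F: FnOnce() -> i32 + Send + 'static,
{
    // `spawn` takes ownership of the closure and starts running it right away.
    let handle = thread::spawn(f);
    // `join` blocks until the thread finishes; it returns Err if the thread panicked.
    handle.join().expect("spawned thread panicked")
}
```

Calling `run_in_new_thread(|| 6 * 7)` returns `42`. The `join` call is what makes the caller wait. Without it, a present-day thread simply runs detached.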
```diff
@@ -97,17 +96,17 @@ do spawn {
 }
 ~~~~

-In Rust, there is nothing special about creating tasks - the language
-itself doesn't know what a 'task' is. Instead, Rust provides in the
-type system all the tools necessary to implement safe concurrency,
-_owned types_ in particular, and leaves the dirty work up to the
-core library.
+In Rust, there is nothing special about creating tasks: a task is not a
+concept that appears in the language semantics. Instead, Rust's type system
+provides all the tools necessary to implement safe concurrency: particularly,
+_owned types_. The language leaves the implementation details to the core
+library.

 The `spawn` function has a very simple type signature: `fn spawn(f:
 ~fn())`. Because it accepts only owned closures, and owned closures
-contained only owned data, `spawn` can safely move the entire closure
+contain only owned data, `spawn` can safely move the entire closure
 and all its associated state into an entirely different task for
-execution. Like any closure, the function passed to spawn may capture
+execution. Like any closure, the function passed to `spawn` may capture
 an environment that it carries across tasks.

 ~~~
```
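The hunk's restriction to owned closures has a rough present-day counterpart. `std::thread::spawn` requires its closure to satisfy `F: FnOnce() -> T + Send + 'static`. The sketch below uses a `move` closure, so the captured vector is transferred into the new thread. This mirrors the text's description of a closure carrying its environment across tasks. The name `sum_in_thread` is hypothetical:

```rust
use std::thread;

/// Sums a vector in a separate thread. The `move` closure takes ownership of
/// `numbers`, so the whole captured environment travels to the new thread.
fn sum_in_thread(numbers: Vec<i64>) -> i64 {
    let handle = thread::spawn(move || {
        // `numbers` now belongs to this closure; the caller can no longer use it.
        numbers.iter().sum::<i64>()
    });
    handle.join().expect("worker thread panicked")
}
```

Without `move`, the closure would only borrow `numbers`. The `'static` bound on `thread::spawn` would then reject it at compile time.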
```diff
@@ -123,8 +122,8 @@ do spawn {
 }
 ~~~

-By default tasks will be multiplexed across the available cores, running
-in parallel, thus on a multicore machine, running the following code
+By default, the scheduler multiplexes tasks across the available cores, running
+in parallel. Thus, on a multicore machine, running the following code
 should interleave the output in vaguely random order.

 ~~~
```
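To observe the interleaving this hunk mentions, a present-day sketch can spawn several threads that each print a few lines. The helper name `spawn_chatty_threads` is hypothetical. Neither the code nor the text guarantees any particular output order:

```rust
use std::thread;

/// Spawns `n` threads that each print a few lines. On a multicore machine the
/// lines typically interleave in an unpredictable order. Returns how many
/// threads finished without panicking.
fn spawn_chatty_threads(n: usize) -> usize {
    // Collect first, so every thread is started before any of them is joined.
    let handles: Vec<_> = (0..n)
        .map(|id| {
            thread::spawn(move || {
                for step in 0..3 {
                    println!("thread {} step {}", id, step);
                }
            })
        })
        .collect();
    // Wait for every thread and count the ones that completed normally.
    handles.into_iter().filter_map(|h| h.join().ok()).count()
}
```

Collecting the handles before joining matters. If `spawn` and `join` ran in the same lazy iterator chain, each thread would finish before the next one started, which would serialize the output.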
```diff
@@ -145,17 +144,16 @@ communicate with it. Recall that Rust does not have shared mutable
 state, so one task may not manipulate variables owned by another task.
 Instead we use *pipes*.

-Pipes are simply a pair of endpoints, with one for sending messages
-and another for receiving messages. Pipes are low-level communication
-building-blocks and so come in a variety of forms, appropriate for
-different use cases, but there are just a few varieties that are most
-commonly used, which we will cover presently.
+A pipe is simply a pair of endpoints: one for sending messages and another for
+receiving messages. Pipes are low-level communication building-blocks and so
+come in a variety of forms, each one appropriate for a different use case. In
+what follows, we cover the most commonly used varieties.

 The simplest way to create a pipe is to use the `pipes::stream`
-function to create a `(Chan, Port)` pair. In Rust parlance a 'channel'
-is a sending endpoint of a pipe, and a 'port' is the receiving
-endpoint. Consider the following example of performing two calculations
-concurrently.
+function to create a `(Chan, Port)` pair. In Rust parlance, a *channel*
+is a sending endpoint of a pipe, and a *port* is the receiving
+endpoint. Consider the following example of calculating two results
+concurrently:

 ~~~~
 use task::spawn;
```
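In present-day Rust, `std::sync::mpsc::channel` returns a `(Sender, Receiver)` pair that plays roughly the role of the `(Chan, Port)` pair from `pipes::stream`. Under that assumption, the sketch below reproduces the shape of the tutorial's two-calculation example. `some_expensive_computation` and `some_other_expensive_computation` have placeholder bodies standing in for the tutorial's hidden helpers. `compute_concurrently` is a hypothetical name:

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder stand-ins for the tutorial's expensive computations.
fn some_expensive_computation() -> i32 {
    42
}
fn some_other_expensive_computation() {}

fn compute_concurrently() -> i32 {
    // Destructuring `let`: split the (Sender, Receiver) tuple into its endpoints.
    let (chan, port) = mpsc::channel::<i32>();

    // The `move` closure captures `chan`, transferring it to the child thread.
    thread::spawn(move || {
        let result = some_expensive_computation();
        chan.send(result).expect("receiver hung up");
    });

    // Meanwhile the parent does its own work, then blocks waiting for the result.
    some_other_expensive_computation();
    port.recv().expect("sender hung up before sending")
}
```

As in the tutorial, the parent only blocks at the final `recv`. The child's work and the parent's other work can therefore overlap.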
```diff
@@ -174,17 +172,17 @@ let result = port.recv();
 # fn some_other_expensive_computation() {}
 ~~~~

-Let's examine this example in detail. The `let` statement first creates a
-stream for sending and receiving integers (recall that `let` can be
-used for destructuring patterns, in this case separating a tuple into
-its component parts).
+Let's examine this example in detail. First, the `let` statement creates a
+stream for sending and receiving integers (the left-hand side of the `let`,
+`(chan, port)`, is an example of a *destructuring let*: the pattern separates
+a tuple into its component parts).

 ~~~~
 # use pipes::{stream, Chan, Port};
 let (chan, port): (Chan<int>, Port<int>) = stream();
 ~~~~

-The channel will be used by the child task to send data to the parent task,
+The child task will use the channel to send data to the parent task,
 which will wait to receive the data on the port. The next statement
 spawns the child task.
```
```diff
@@ -200,14 +198,14 @@ do spawn {
 }
 ~~~~

-Notice that `chan` was transferred to the child task implicitly by
-capturing it in the task closure. Both `Chan` and `Port` are sendable
-types and may be captured into tasks or otherwise transferred between
-them. In the example, the child task performs an expensive computation
-then sends the result over the captured channel.
+Notice that the creation of the task closure transfers `chan` to the child
+task implicitly: the closure captures `chan` in its environment. Both `Chan`
+and `Port` are sendable types and may be captured into tasks or otherwise
+transferred between them. In the example, the child task runs an expensive
+computation, then sends the result over the captured channel.

-Finally, the parent continues by performing some other expensive
-computation and then waiting for the child's result to arrive on the
+Finally, the parent continues with some other expensive
+computation, then waits for the child's result to arrive on the
 port:

 ~~~~
```
```diff
@@ -219,12 +217,11 @@ some_other_expensive_computation();
 let result = port.recv();
 ~~~~

-The `Port` and `Chan` pair created by `stream` enable efficient
-communication between a single sender and a single receiver, but
-multiple senders cannot use a single `Chan`, nor can multiple
-receivers use a single `Port`. What if our example needed to perform
-multiple computations across a number of tasks? The following cannot
-be written:
+The `Port` and `Chan` pair created by `stream` enables efficient communication
+between a single sender and a single receiver, but multiple senders cannot use
+a single `Chan`, and multiple receivers cannot use a single `Port`. What if our
+example needed to compute multiple results across a number of tasks? The
+following program is ill-typed:

 ~~~ {.xfail-test}
 # use task::{spawn};
```
```diff
@@ -265,18 +262,18 @@ let result = port.recv() + port.recv() + port.recv();
 # fn some_expensive_computation(_i: uint) -> int { 42 }
 ~~~

-Here we transfer ownership of the channel into a new `SharedChan`
-value. Like `Chan`, `SharedChan` is a non-copyable, owned type
-(sometimes also referred to as an 'affine' or 'linear' type). Unlike
-`Chan` though, `SharedChan` may be duplicated with the `clone()`
-method. A cloned `SharedChan` produces a new handle to the same
-channel, allowing multiple tasks to send data to a single port.
-Between `spawn`, `stream` and `SharedChan` we have enough tools
-to implement many useful concurrency patterns.
+Here we transfer ownership of the channel into a new `SharedChan` value. Like
+`Chan`, `SharedChan` is a non-copyable, owned type (sometimes also referred to
+as an *affine* or *linear* type). Unlike with `Chan`, though, the programmer
+may duplicate a `SharedChan`, with the `clone()` method. A cloned
+`SharedChan` produces a new handle to the same channel, allowing multiple
+tasks to send data to a single port. Between `spawn`, `stream` and
+`SharedChan`, we have enough tools to implement many useful concurrency
+patterns.

 Note that the above `SharedChan` example is somewhat contrived since
 you could also simply use three `stream` pairs, but it serves to
-illustrate the point. For reference, written with multiple streams it
+illustrate the point. For reference, written with multiple streams, it
 might look like the example below.

 ~~~
```
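Present-day `std::sync::mpsc` covers both variants described in this hunk, again as an analogy rather than the commit's API. `Sender` implements `Clone`, so cloned senders behave much like cloned `SharedChan` handles. `Receiver` cannot be cloned, which matches the single-receiver restriction. The sketch below shows the shared-sender version and the one-channel-per-child version with a fold. The function names and the body of `some_expensive_computation` are placeholders:

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder stand-in for the tutorial's per-task computation.
fn some_expensive_computation(init_val: u32) -> i32 {
    42 + init_val as i32
}

/// Shared-sender version: one receiver, several cloned senders.
fn sum_with_shared_sender() -> i32 {
    let (chan, port) = mpsc::channel::<i32>();
    for init_val in 0..3u32 {
        // Each clone is a new handle to the same channel.
        let child_chan = chan.clone();
        thread::spawn(move || {
            child_chan.send(some_expensive_computation(init_val)).unwrap();
        });
    }
    // Receive exactly one value per child, in whatever order they arrive.
    port.recv().unwrap() + port.recv().unwrap() + port.recv().unwrap()
}

/// Multiple-streams version: one channel per child, combined with a fold.
fn sum_with_separate_channels() -> i32 {
    let ports: Vec<mpsc::Receiver<i32>> = (0..3u32)
        .map(|init_val| {
            let (chan, port) = mpsc::channel();
            thread::spawn(move || {
                chan.send(some_expensive_computation(init_val)).unwrap();
            });
            port
        })
        .collect();
    ports.iter().fold(0, |accum, port| accum + port.recv().unwrap())
}
```

Both functions return `129` (`42 + 43 + 44`). Addition does not depend on the order in which the children's results arrive, so the result is the same every time.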
```diff
@@ -299,15 +296,17 @@ let result = ports.foldl(0, |accum, port| *accum + port.recv() );
 # Handling task failure

-Rust has a built-in mechanism for raising exceptions, written `fail`
-(or `fail ~"reason"`, or sometimes `assert expr`), and it causes the
-task to unwind its stack, running destructors and freeing memory along
-the way, and then exit itself. Unlike C++, exceptions in Rust are
-unrecoverable within a single task - once a task fails there is no way
-to "catch" the exception.
+Rust has a built-in mechanism for raising exceptions. The `fail` construct
+(which can also be written with an error string as an argument: `fail
+~reason`) and the `assert` construct (which effectively calls `fail` if a
+boolean expression is false) are both ways to raise exceptions. When a task
+raises an exception the task unwinds its stack---running destructors and
+freeing memory along the way---and then exits. Unlike exceptions in C++,
+exceptions in Rust are unrecoverable within a single task: once a task fails,
+there is no way to "catch" the exception.

-All tasks are, by default, _linked_ to each other, meaning their fate
-is intertwined, and if one fails so do all of them.
+All tasks are, by default, _linked_ to each other. That means that the fates
+of all tasks are intertwined: if one fails, so do all the others.

 ~~~
 # use task::spawn;
```
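The unwinding behavior described here survives in present-day Rust as `panic!`, along with `assert!`, which panics when its condition is false. The sketch below uses the hypothetical names `Guard` and `panic_runs_destructors`. It checks that a destructor runs while a panicking thread unwinds, assuming the default `panic = "unwind"` strategy.

One important difference from the text: present-day threads are not linked. The panic does not kill the parent, which instead sees an `Err` from `join`:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Sets a shared flag when dropped, so we can observe destructors running
/// while a panicking thread unwinds its stack.
struct Guard {
    dropped: Arc<AtomicBool>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Spawns a thread that panics while holding a `Guard`. Returns whether the
/// thread failed and whether the guard's destructor ran during unwinding.
fn panic_runs_destructors() -> (bool, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&dropped);
    let handle = thread::spawn(move || {
        let _guard = Guard { dropped: flag };
        let computed = 1 + 1;
        // A deliberately false assertion: `assert!` panics, like the old `assert` calling `fail`.
        assert!(computed == 3, "arithmetic is broken");
    });
    // The parent survives the child's panic and observes it as an Err.
    let failed = handle.join().is_err();
    (failed, dropped.load(Ordering::SeqCst))
}
```

The call returns `(true, true)`: the child failed, and its guard was dropped during unwinding. The default panic hook also prints the panic message to standard error.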
```diff
@@ -321,11 +320,15 @@ do_some_work();
 # };
 ~~~

-While it isn't possible for a task to recover from failure,
-tasks may be notified when _other_ tasks fail. The simplest way
-of handling task failure is with the `try` function, which is
-similar to spawn, but immediately blocks waiting for the child
-task to finish.
+While it isn't possible for a task to recover from failure, tasks may notify
+each other of failure. The simplest way of handling task failure is with the
+`try` function, which is similar to `spawn`, but immediately blocks waiting
+for the child task to finish. `try` returns a value of type `Result<int,
+()>`. `Result` is an `enum` type with two variants: `Ok` and `Err`. In this
+case, because the type arguments to `Result` are `int` and `()`, callers can
+pattern-match on a result to check whether it's an `Ok` result with an `int`
+field (representing a successful result) or an `Err` result (representing
+termination with an error).

 ~~~
 # fn some_condition() -> bool { false }
```
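A present-day approximation of `try` is to spawn a thread and immediately `join` it. `join` returns a `Result` whose `Err` variant signals that the child panicked. The helper below is hypothetical. It is named `try_task` because `try` is a reserved keyword in current editions of Rust. It discards the error payload to match the `Result<int, ()>` type described in the text:

```rust
use std::thread;

/// Hypothetical analogue of `task::try`: runs `f` in a child thread, blocks
/// until it finishes, and collapses any failure to `Err(())`.
fn try_task<F>(f: F) -> Result<i32, ()>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    thread::spawn(f).join().map_err(|_| ())
}

/// Callers pattern-match on the result to tell success from failure.
fn describe(result: Result<i32, ()>) -> String {
    match result {
        Ok(value) => format!("child returned {}", value),
        Err(()) => "child failed".to_string(),
    }
}
```

`try_task(|| 42)` yields `Ok(42)`. A closure that panics yields `Err(())`, which `describe` turns into `"child failed"`.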
```diff
@@ -349,8 +352,8 @@ an `Error` result.
 [`Result`]: core/result.html

 > ***Note:*** A failed task does not currently produce a useful error
-> value (all error results from `try` are equal to `Err(())`). In the
-> future it may be possible for tasks to intercept the value passed to
+> value (`try` always returns `Err(())`). In the
+> future, it may be possible for tasks to intercept the value passed to
 > `fail`.

 TODO: Need discussion of `future_result` in order to make failure
```
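The note anticipates a way to recover the value passed to `fail`. In present-day Rust, the `Err` returned by `JoinHandle::join` carries the panic payload as a `Box<dyn Any + Send>`. In the common cases, that payload can be downcast to a string. The following sketch uses the hypothetical helper names `panic_message` and `try_with_message`:

```rust
use std::any::Any;
use std::thread;

/// Extracts a readable message from a panic payload, if it is a string.
fn panic_message(payload: &(dyn Any + Send)) -> Option<String> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        // Payload produced by a panic with a plain string literal.
        Some((*s).to_string())
    } else if let Some(s) = payload.downcast_ref::<String>() {
        // Payload produced by a formatted panic message.
        Some(s.clone())
    } else {
        None
    }
}

/// Runs `f` in a child thread; on failure, returns the panic message (if any)
/// instead of an uninformative `()`.
fn try_with_message<F>(f: F) -> Result<i32, Option<String>>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    thread::spawn(f)
        .join()
        .map_err(|payload| panic_message(&*payload))
}
```

Payloads that are neither `&str` nor `String` come back as `None` here. One example is a non-string value passed to `std::panic::panic_any`.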
```diff
@@ -362,11 +365,11 @@ it trips, indicates an unrecoverable logic error); in other cases you
 might want to contain the failure at a certain boundary (perhaps a
 small piece of input from the outside world, which you happen to be
 processing in parallel, is malformed and its processing task can't
-proceed). Hence the need for different _linked failure modes_.
+proceed). Hence, you will need different _linked failure modes_.

 ## Failure modes

-By default, task failure is _bidirectionally linked_, which means if
+By default, task failure is _bidirectionally linked_, which means that if
 either task dies, it kills the other one.

 ~~~
```
```diff
@@ -382,8 +385,8 @@ sleep_forever(); // Will get woken up by force, then fail
 # };
 ~~~

-If you want parent tasks to kill their children, but not for a child
-task's failure to kill the parent, you can call
+If you want parent tasks to be able to kill their children, but do not want a
+parent to die automatically if one of its child tasks dies, you can call
 `task::spawn_supervised` for _unidirectionally linked_ failure. The
 function `task::try`, which we saw previously, uses `spawn_supervised`
 internally, with additional logic to wait for the child task to finish
```
```diff
@@ -411,7 +414,7 @@ do try { // Unidirectionally linked
 Supervised failure is useful in any situation where one task manages
 multiple fallible child tasks, and the parent task can recover
-if any child files. On the other hand, if the _parent_ (supervisor) fails
+if any child fails. On the other hand, if the _parent_ (supervisor) fails,
 then there is nothing the children can do to recover, so they should
 also fail.
```
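Present-day Rust has no built-in linked or supervised task failure. The following is therefore only an approximation of the supervisor pattern described here. The parent spawns one child per input, joins each one, and recovers from failed children by recording `None`. The name `supervise` and the rule that negative inputs are "malformed" are hypothetical. The other half of the pattern is not modeled here: children failing when their supervisor fails has no automatic equivalent in `std::thread`.

```rust
use std::thread;

/// Hypothetical supervisor: spawns one child per input, waits for all of them,
/// and recovers from any child that fails by recording `None` for it.
fn supervise(inputs: Vec<i32>) -> Vec<Option<i32>> {
    let children: Vec<_> = inputs
        .into_iter()
        .map(|n| {
            thread::spawn(move || {
                // A malformed input makes this child fail; its siblings are unaffected.
                if n < 0 {
                    panic!("malformed input: {}", n);
                }
                n * 2
            })
        })
        .collect();
    // The parent keeps running no matter how many children fail.
    children.into_iter().map(|child| child.join().ok()).collect()
}
```

`supervise(vec![1, -2, 3])` returns `[Some(2), None, Some(6)]`. The failed middle child affects neither its siblings nor the parent.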
```diff
@@ -456,11 +459,11 @@ fail;
 A very common thing to do is to spawn a child task where the parent
 and child both need to exchange messages with each other. The
 function `std::comm::DuplexStream()` supports this pattern. We'll
-look briefly at how it is used.
+look briefly at how to use it.

 To see how `spawn_conversation()` works, we will create a child task
-that receives `uint` messages, converts them to a string, and sends
-the string in response. The child terminates when `0` is received.
+that repeatedly receives a `uint` message, converts it to a string, and sends
+the string in response. The child terminates when it receives `0`.
 Here is the function that implements the child task:

 ~~~~
```
```diff
@@ -470,8 +473,8 @@ fn stringifier(channel: &DuplexStream<~str, uint>) {
     let mut value: uint;
     loop {
         value = channel.recv();
-        channel.send(uint::to_str(value, 10u));
-        if value == 0u { break; }
+        channel.send(uint::to_str(value, 10));
+        if value == 0 { break; }
     }
 }
 ~~~~
```
```diff
@@ -481,7 +484,7 @@ receiving. The `stringifier` function takes a `DuplexStream` that can
 send strings (the first type parameter) and receive `uint` messages
 (the second type parameter). The body itself simply loops, reading
 from the channel and then sending its response back. The actual
-response itself is simply the strified version of the received value,
+response itself is simply the stringified version of the received value,
 `uint::to_str(value)`.

 Here is the code for the parent task:
```
```diff
@@ -506,11 +509,11 @@ do spawn || {
     stringifier(&to_child);
 };
-from_child.send(22u);
+from_child.send(22);
 assert from_child.recv() == ~"22";
-from_child.send(23u);
-from_child.send(0u);
+from_child.send(23);
+from_child.send(0);
 assert from_child.recv() == ~"23";
 assert from_child.recv() == ~"0";
```
```diff
@@ -518,6 +521,8 @@ assert from_child.recv() == ~"0";
 # }
 ~~~~

-The parent task first calls `DuplexStream` to create a pair of bidirectional endpoints. It then uses `task::spawn` to create the child task, which captures one end of the communication channel. As a result, both parent
-and child can send and receive data to and from the other.
+The parent task first calls `DuplexStream` to create a pair of bidirectional
+endpoints. It then uses `task::spawn` to create the child task, which captures
+one end of the communication channel. As a result, both parent and child can
+send and receive data to and from the other.
```
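The `DuplexStream` conversation can be sketched in present-day Rust by pairing two one-way `mpsc` channels. The `DuplexStream` struct and `duplex` constructor below are hypothetical stand-ins written for this sketch, not a standard-library API. `u32` and `String` replace the old `uint` and `~str`:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for `std::comm::DuplexStream`: one endpoint that can
/// send values of type `S` and receive values of type `R`.
struct DuplexStream<S, R> {
    chan: Sender<S>,
    port: Receiver<R>,
}

impl<S, R> DuplexStream<S, R> {
    fn send(&self, value: S) {
        self.chan.send(value).expect("peer hung up");
    }
    fn recv(&self) -> R {
        self.port.recv().expect("peer hung up")
    }
}

/// Builds two connected endpoints from a pair of one-way channels:
/// whatever one endpoint sends, the other receives.
fn duplex<A, B>() -> (DuplexStream<A, B>, DuplexStream<B, A>) {
    let (a_tx, a_rx) = channel::<A>();
    let (b_tx, b_rx) = channel::<B>();
    (
        DuplexStream { chan: a_tx, port: b_rx },
        DuplexStream { chan: b_tx, port: a_rx },
    )
}

/// Child: repeatedly receives a number, replies with its decimal string,
/// and stops after replying to 0.
fn stringifier(channel: &DuplexStream<String, u32>) {
    loop {
        let value = channel.recv();
        channel.send(value.to_string());
        if value == 0 {
            break;
        }
    }
}

/// Parent: spawns the child with one endpoint and converses over the other.
fn converse() -> Vec<String> {
    let (from_child, to_child) = duplex::<u32, String>();
    let child = thread::spawn(move || stringifier(&to_child));

    let mut replies = Vec::new();
    from_child.send(22);
    replies.push(from_child.recv());
    from_child.send(23);
    from_child.send(0);
    replies.push(from_child.recv());
    replies.push(from_child.recv());

    child.join().expect("stringifier panicked");
    replies
}
```

`converse()` returns `["22", "23", "0"]`. The parent sends `23` and `0` before reading either reply. This works because `mpsc::channel` buffers each direction independently, so the child can reply to `23` before it reads the `0`.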