Start moving to the rustc guide!
This commit is contained in:
parent
27a046e933
commit
a05c5538d4
10 changed files with 21 additions and 1291 deletions
|
@ -623,6 +623,7 @@ For people new to Rust, and just starting to contribute, or even for
|
|||
more seasoned developers, some useful places to look for information
|
||||
are:
|
||||
|
||||
* The [rustc guide] contains information about how various parts of the compiler work
|
||||
* [Rust Forge][rustforge] contains additional documentation, including write-ups of how to achieve common tasks
|
||||
* The [Rust Internals forum][rif], a place to ask questions and
|
||||
discuss Rust's internals
|
||||
|
@ -635,6 +636,7 @@ are:
|
|||
* **Google!** ([search only in Rust Documentation][gsearchdocs] to find types, traits, etc. quickly)
|
||||
* Don't be afraid to ask! The Rust community is friendly and helpful.
|
||||
|
||||
[rustc guide]: https://rust-lang-nursery.github.io/rustc-guide/about-this-guide.html
|
||||
[gdfrustc]: http://manishearth.github.io/rust-internals-docs/rustc/
|
||||
[gsearchdocs]: https://www.google.com/search?q=site:doc.rust-lang.org+your+query+here
|
||||
[rif]: http://internals.rust-lang.org
|
||||
|
|
|
@ -227,9 +227,13 @@ variety of channels on Mozilla's IRC network, irc.mozilla.org. The
|
|||
most popular channel is [#rust], a venue for general discussion about
|
||||
Rust. And a good place to ask for help would be [#rust-beginners].
|
||||
|
||||
Also, the [rustc guide] might be a good place to start if you want to
|
||||
find out how various parts of the compiler work.
|
||||
|
||||
[IRC]: https://en.wikipedia.org/wiki/Internet_Relay_Chat
|
||||
[#rust]: irc://irc.mozilla.org/rust
|
||||
[#rust-beginners]: irc://irc.mozilla.org/rust-beginners
|
||||
[rustc guide]: https://rust-lang-nursery.github.io/rustc-guide/about-this-guide.html
|
||||
|
||||
## License
|
||||
[license]: #license
|
||||
|
|
15
src/README.md
Normal file
|
@ -0,0 +1,15 @@
|
|||
This directory contains the source code of the Rust project, including:
|
||||
- `rustc` and its tests
|
||||
- `libstd`
|
||||
- Various submodules for tools, like rustdoc, rls, etc.
|
||||
|
||||
For more information on how various parts of the compiler work, see the [rustc guide].
|
||||
|
||||
There is also useful content in the following READMEs, which are gradually being moved over to the guide:
|
||||
- https://github.com/rust-lang/rust/tree/master/src/librustc/ty/maps
|
||||
- https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
|
||||
- https://github.com/rust-lang/rust/blob/master/src/librustc/infer/region_constraints
|
||||
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/higher_ranked
|
||||
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/lexical_region_resolve
|
||||
|
||||
[rustc guide]: https://rust-lang-nursery.github.io/rustc-guide/about-this-guide.html
|
|
@ -1,204 +0,0 @@
|
|||
An informal guide to reading and working on the rustc compiler.
|
||||
==================================================================
|
||||
|
||||
If you wish to expand on this document, or have a more experienced
|
||||
Rust contributor add anything else to it, please get in touch:
|
||||
|
||||
* https://internals.rust-lang.org/
|
||||
* https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust
|
||||
|
||||
or file a bug:
|
||||
|
||||
https://github.com/rust-lang/rust/issues
|
||||
|
||||
Your concerns are probably the same as someone else's.
|
||||
|
||||
You may also be interested in the
|
||||
[Rust Forge](https://forge.rust-lang.org/), which includes a number of
|
||||
interesting bits of information.
|
||||
|
||||
Finally, at the end of this file is a GLOSSARY defining a number of
|
||||
common (and not necessarily obvious!) names that are used in the Rust
|
||||
compiler code. If you see some funky name and you'd like to know what
|
||||
it stands for, check there!
|
||||
|
||||
The crates of rustc
|
||||
===================
|
||||
|
||||
Rustc consists of a number of crates, including `syntax`,
|
||||
`rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and
|
||||
many more. The source for each crate can be found in a directory
|
||||
like `src/libXXX`, where `XXX` is the crate name.
|
||||
|
||||
(NB. The names and divisions of these crates are not set in
|
||||
stone and may change over time -- for the time being, we tend towards
|
||||
a finer-grained division to help with compilation time, though as
|
||||
incremental compilation improves, that may change.)
|
||||
|
||||
The dependency structure of these crates is roughly a diamond:
|
||||
|
||||
```
|
||||
rustc_driver
|
||||
/ | \
|
||||
/ | \
|
||||
/ | \
|
||||
/ v \
|
||||
rustc_trans rustc_borrowck ... rustc_metadata
|
||||
\ | /
|
||||
\ | /
|
||||
\ | /
|
||||
\ v /
|
||||
rustc
|
||||
|
|
||||
v
|
||||
syntax
|
||||
/ \
|
||||
/ \
|
||||
syntax_pos syntax_ext
|
||||
```
|
||||
|
||||
The `rustc_driver` crate, at the top of this lattice, is effectively
|
||||
the "main" function for the rust compiler. It doesn't have much "real
|
||||
code", but instead ties together all of the code defined in the other
|
||||
crates and defines the overall flow of execution. (As we transition
|
||||
more and more to the [query model](ty/maps/README.md), however, the
|
||||
"flow" of compilation is becoming less centrally defined.)
|
||||
|
||||
At the other extreme, the `rustc` crate defines the common and
|
||||
pervasive data structures that all the rest of the compiler uses
|
||||
(e.g., how to represent types, traits, and the program itself). It
|
||||
also contains some amount of the compiler itself, although that is
|
||||
relatively limited.
|
||||
|
||||
Finally, all the crates in the bulge in the middle define the bulk of
|
||||
the compiler -- they all depend on `rustc`, so that they can make use
|
||||
of the various types defined there, and they export public routines
|
||||
that `rustc_driver` will invoke as needed (more and more, what these
|
||||
crates export are "query definitions", but those are covered later
|
||||
on).
|
||||
|
||||
Below `rustc` lie various crates that make up the parser and error
|
||||
reporting mechanism. For historical reasons, these crates do not have
|
||||
the `rustc_` prefix, but they are really just as much an internal part
|
||||
of the compiler and not intended to be stable (though they do wind up
|
||||
getting used by some crates in the wild; a practice we hope to
|
||||
gradually phase out).
|
||||
|
||||
Each crate has a `README.md` file that describes, at a high-level,
|
||||
what it contains, and tries to give some kind of explanation (some
|
||||
better than others).
|
||||
|
||||
The compiler process
|
||||
====================
|
||||
|
||||
The Rust compiler is in a bit of transition right now. It used to be a
|
||||
purely "pass-based" compiler, where we ran a number of passes over the
|
||||
entire program, and each did a particular check or transformation.
|
||||
|
||||
We are gradually replacing this pass-based code with an alternative
|
||||
setup based on on-demand **queries**. In the query-model, we work
|
||||
backwards, executing a *query* that expresses our ultimate goal (e.g.,
|
||||
"compile this crate"). This query in turn may make other queries
|
||||
(e.g., "get me a list of all modules in the crate"). Those queries
|
||||
make other queries that ultimately bottom out in the base operations,
|
||||
like parsing the input, running the type-checker, and so forth. This
|
||||
on-demand model permits us to do exciting things like only do the
|
||||
minimal amount of work needed to type-check a single function. It also
|
||||
helps with incremental compilation. (For details on defining queries,
|
||||
check out `src/librustc/ty/maps/README.md`.)
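The on-demand idea can be illustrated with a toy sketch. This is not rustc's query machinery: `Query`, `Engine`, and `run` are hypothetical names, and results are plain strings. It only shows that a query pulls in the sub-queries it needs, and that cached results are reused instead of recomputed.

```rust
use std::collections::HashMap;

/// Hypothetical query keys (rustc's real queries look nothing like this).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Query {
    Parse,
    TypeCheck(String),
    CompileCrate,
}

/// A toy engine that memoizes results and logs which providers actually ran.
#[derive(Default)]
pub struct Engine {
    cache: HashMap<Query, String>,
    pub executed: Vec<Query>,
}

impl Engine {
    /// Answer `q`, running its provider only if no cached result exists.
    pub fn run(&mut self, q: Query) -> String {
        if let Some(hit) = self.cache.get(&q) {
            return hit.clone();
        }
        let result = match &q {
            Query::Parse => "ast".to_string(),
            // Type-checking one function demands parsing, but nothing else.
            Query::TypeCheck(name) => {
                let ast = self.run(Query::Parse);
                format!("typeck({}, {})", name, ast)
            }
            // The "ultimate goal" query works backwards through sub-queries.
            Query::CompileCrate => {
                let a = self.run(Query::TypeCheck("foo".to_string()));
                let b = self.run(Query::TypeCheck("bar".to_string()));
                format!("obj[{}; {}]", a, b)
            }
        };
        self.executed.push(q.clone());
        self.cache.insert(q, result.clone());
        result
    }
}
```

Asking only for `Query::TypeCheck("foo")` runs two providers (parsing and that one type-check); a later `Query::CompileCrate` reuses both results.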
|
||||
|
||||
Regardless of the general setup, the basic operations that the
|
||||
compiler must perform are the same. The only thing that changes is
|
||||
whether these operations are invoked front-to-back, or on demand. In
|
||||
order to compile a Rust crate, these are the general steps that we
|
||||
take:
|
||||
|
||||
1. **Parsing input**
|
||||
- this processes the `.rs` files and produces the AST ("abstract syntax tree")
|
||||
- the AST is defined in `syntax/ast.rs`. It is intended to match the lexical
|
||||
syntax of the Rust language quite closely.
|
||||
2. **Name resolution, macro expansion, and configuration**
|
||||
- once parsing is complete, we process the AST recursively, resolving paths
|
||||
and expanding macros. This same process also processes `#[cfg]` nodes, and hence
|
||||
may strip things out of the AST as well.
|
||||
3. **Lowering to HIR**
|
||||
- Once name resolution completes, we convert the AST into the HIR,
|
||||
or "high-level IR". The HIR is defined in `src/librustc/hir/`; that module also includes
|
||||
the lowering code.
|
||||
- The HIR is a lightly desugared variant of the AST. It is more processed than the
|
||||
AST and more suitable for the analyses that follow. It is **not** required to match
|
||||
the syntax of the Rust language.
|
||||
- As a simple example, in the **AST**, we preserve the parentheses
|
||||
that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse
|
||||
into distinct trees, even though they are equivalent. In the
|
||||
HIR, however, parentheses nodes are removed, and those two
|
||||
expressions are represented in the same way.
|
||||
3. **Type-checking and subsequent analyses**
|
||||
- An important step in processing the HIR is to perform type
|
||||
checking. This process assigns types to every HIR expression,
|
||||
for example, and also is responsible for resolving some
|
||||
"type-dependent" paths, such as field accesses (`x.f` -- we
|
||||
can't know what field `f` is being accessed until we know the
|
||||
type of `x`) and associated type references (`T::Item` -- we
|
||||
can't know what type `Item` is until we know what `T` is).
|
||||
- Type checking creates "side-tables" (`TypeckTables`) that include
|
||||
the types of expressions, the way to resolve methods, and so forth.
|
||||
- After type-checking, we can do other analyses, such as privacy checking.
|
||||
4. **Lowering to MIR and post-processing**
|
||||
- Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which
|
||||
is a **very** desugared version of Rust, well suited to the borrowck but also
|
||||
certain high-level optimizations.
|
||||
5. **Translation to LLVM and LLVM optimizations**
|
||||
- From MIR, we can produce LLVM IR.
|
||||
- LLVM then runs its various optimizations, which produces a number of `.o` files
|
||||
(one for each "codegen unit").
|
||||
6. **Linking**
|
||||
- Finally, those `.o` files are linked together.
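To make the parentheses example from the HIR lowering step concrete, here is a minimal sketch. The `AstExpr` and `HirExpr` enums and the `lower` function are invented for illustration and are far simpler than the real definitions in `syntax/ast.rs` and `src/librustc/hir/`.

```rust
/// Toy AST: keeps the parentheses the user wrote.
#[derive(Debug, PartialEq)]
pub enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>),
}

/// Toy HIR: has no node for parentheses at all.
#[derive(Debug, PartialEq)]
pub enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
}

/// Lower an AST expression, dropping `Paren` nodes along the way.
pub fn lower(e: &AstExpr) -> HirExpr {
    match e {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(l, r) => HirExpr::Add(Box::new(lower(l)), Box::new(lower(r))),
        AstExpr::Paren(inner) => lower(inner),
    }
}

fn lit(n: i64) -> Box<AstExpr> {
    Box::new(AstExpr::Lit(n))
}

/// The AST for `((1 + 2) + 3)`.
pub fn with_parens() -> AstExpr {
    let inner = AstExpr::Paren(Box::new(AstExpr::Add(lit(1), lit(2))));
    AstExpr::Paren(Box::new(AstExpr::Add(Box::new(inner), lit(3))))
}

/// The AST for `1 + 2 + 3` (addition is left-associative).
pub fn without_parens() -> AstExpr {
    AstExpr::Add(Box::new(AstExpr::Add(lit(1), lit(2))), lit(3))
}
```

The two ASTs compare unequal, but `lower` maps both to the same `HirExpr`.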
|
||||
|
||||
Glossary
|
||||
========
|
||||
|
||||
The compiler uses a number of...idiosyncratic abbreviations and
|
||||
things. This glossary attempts to list them and give you a few
|
||||
pointers for understanding them better.
|
||||
|
||||
- AST -- the **abstract syntax tree** produced by the `syntax` crate; reflects user syntax
|
||||
very closely.
|
||||
- codegen unit -- when we produce LLVM IR, we group the Rust code into a number of codegen
|
||||
units. Each of these units is processed by LLVM independently from one another,
|
||||
enabling parallelism. They are also the unit of incremental re-use.
|
||||
- cx -- we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc.
|
||||
- `DefId` -- an index identifying a **definition** (see `librustc/hir/def_id.rs`). Uniquely
|
||||
identifies a `DefPath`.
|
||||
- HIR -- the **High-level IR**, created by lowering and desugaring the AST. See `librustc/hir`.
|
||||
- `HirId` -- identifies a particular node in the HIR by combining a
|
||||
def-id with an "intra-definition offset".
|
||||
- `'gcx` -- the lifetime of the global arena (see `librustc/ty`).
|
||||
- generics -- the set of generic type parameters defined on a type or item
|
||||
- ICE -- internal compiler error. When the compiler crashes.
|
||||
- ICH -- incremental compilation hash.
|
||||
- infcx -- the inference context (see `librustc/infer`)
|
||||
- MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans.
|
||||
Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is
|
||||
found in `src/librustc_mir`.
|
||||
- obligation -- something that must be proven by the trait system; see `librustc/traits`.
|
||||
- local crate -- the crate currently being compiled.
|
||||
- node-id or `NodeId` -- an index identifying a particular node in the
|
||||
AST or HIR; gradually being phased out and replaced with `HirId`.
|
||||
- query -- a sub-computation performed on demand during compilation; see `librustc/ty/maps`.
|
||||
- provider -- the function that executes a query; see `librustc/ty/maps`.
|
||||
- sess -- the **compiler session**, which stores global data used throughout compilation
|
||||
- side tables -- because the AST and HIR are immutable once created, we often carry extra
|
||||
information about them in the form of hashtables, indexed by the id of a particular node.
|
||||
- span -- a location in the user's source code, used for error
|
||||
reporting primarily. These are like a file-name/line-number/column
|
||||
tuple on steroids: they carry a start/end point, and also track
|
||||
macro expansions and compiler desugaring. All while being packed
|
||||
into a few bytes (really, it's an index into a table). See the
|
||||
`Span` datatype for more.
|
||||
- substs -- the **substitutions** for a given generic type or item
|
||||
(e.g., the `i32, u32` in `HashMap<i32, u32>`)
|
||||
- tcx -- the "typing context", main data structure of the compiler (see `librustc/ty`).
|
||||
- trans -- the code to **translate** MIR into LLVM IR.
|
||||
- trait reference -- a trait and values for its type parameters (see `librustc/ty`).
|
||||
- ty -- the internal representation of a **type** (see `librustc/ty`).
|
|
@ -1,119 +0,0 @@
|
|||
# Introduction to the HIR
|
||||
|
||||
The HIR -- "High-level IR" -- is the primary IR used in most of
|
||||
rustc. It is a desugared version of the "abstract syntax tree" (AST)
|
||||
that is generated after parsing, macro expansion, and name resolution
|
||||
have completed. Many parts of HIR resemble Rust surface syntax quite
|
||||
closely, with the exception that some of Rust's expression forms have
|
||||
been desugared away (as an example, `for` loops are converted into a
|
||||
`loop` and do not appear in the HIR).
|
||||
|
||||
This README covers the main concepts of the HIR.
|
||||
|
||||
### Out-of-band storage and the `Crate` type
|
||||
|
||||
The top-level data-structure in the HIR is the `Crate`, which stores
|
||||
the contents of the crate currently being compiled (we only ever
|
||||
construct HIR for the current crate). Whereas in the AST the crate
|
||||
data structure basically just contains the root module, the HIR
|
||||
`Crate` structure contains a number of maps and other things that
|
||||
serve to organize the content of the crate for easier access.
|
||||
|
||||
For example, the contents of individual items (e.g., modules,
|
||||
functions, traits, impls, etc) in the HIR are not immediately
|
||||
accessible in the parents. So, for example, if you had a module item `foo`
|
||||
containing a function `bar()`:
|
||||
|
||||
```
|
||||
mod foo {
|
||||
fn bar() { }
|
||||
}
|
||||
```
|
||||
|
||||
Then in the HIR the representation of module `foo` (the `Mod`
|
||||
struct) would have only the **`ItemId`** `I` of `bar()`. To get the
|
||||
details of the function `bar()`, we would look up `I` in the
|
||||
`items` map.
|
||||
|
||||
One nice result from this representation is that one can iterate
|
||||
over all items in the crate by iterating over the key-value pairs
|
||||
in these maps (without the need to trawl through the IR in total).
|
||||
There are similar maps for things like trait items and impl items,
|
||||
as well as "bodies" (explained below).
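The out-of-band layout described above might be sketched as follows. All types here (`ItemId`, `Item`, `ItemKind`, `Crate`) are simplified stand-ins that only share names with the HIR; the real structures carry much more data.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub u32);

#[derive(Debug)]
pub enum ItemKind {
    /// A module stores only the ids of its children, not their contents.
    Mod(Vec<ItemId>),
    Fn,
}

#[derive(Debug)]
pub struct Item {
    pub name: &'static str,
    pub kind: ItemKind,
}

/// Toy crate: every item lives in one flat map keyed by id.
#[derive(Default)]
pub struct Crate {
    pub items: BTreeMap<ItemId, Item>,
}

impl Crate {
    /// Look up the contents of an item from its id.
    pub fn item(&self, id: ItemId) -> &Item {
        &self.items[&id]
    }
}

/// Builds the toy equivalent of `mod foo { fn bar() { } }`.
pub fn example_crate() -> Crate {
    let mut krate = Crate::default();
    let foo = Item { name: "foo", kind: ItemKind::Mod(vec![ItemId(1)]) };
    let bar = Item { name: "bar", kind: ItemKind::Fn };
    krate.items.insert(ItemId(0), foo);
    krate.items.insert(ItemId(1), bar);
    krate
}
```

Reaching `bar` from `foo` requires a lookup through `Crate::item`, while visiting every item is a flat iteration over `krate.items`.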
|
||||
|
||||
The other reason to set up the representation this way is for better
|
||||
integration with incremental compilation. This way, if you gain access
|
||||
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
|
||||
gain access to the contents of the function `bar()`. Instead, you only
|
||||
gain access to the **id** for `bar()`, and you must invoke some
|
||||
function to look up the contents of `bar()` given its id; this gives us
|
||||
a chance to observe that you accessed the data for `bar()` and record
|
||||
the dependency.
|
||||
|
||||
### Identifiers in the HIR
|
||||
|
||||
Most of the code that has to deal with things in HIR tends not to
|
||||
carry around references into the HIR, but rather to carry around
|
||||
*identifier numbers* (or just "ids"). Right now, you will find four
|
||||
sorts of identifiers in active use:
|
||||
|
||||
- `DefId`, which primarily names "definitions" or top-level items.
|
||||
- You can think of a `DefId` as being shorthand for a very explicit
|
||||
and complete path, like `std::collections::HashMap`. However,
|
||||
these paths are able to name things that are not nameable in
|
||||
normal Rust (e.g., impls), and they also include extra information
|
||||
about the crate (such as its version number, as two versions of
|
||||
the same crate can co-exist).
|
||||
- A `DefId` really consists of two parts, a `CrateNum` (which
|
||||
identifies the crate) and a `DefIndex` (which indexes into a list
|
||||
of items that is maintained per crate).
|
||||
- `HirId`, which combines the index of a particular item with an
|
||||
offset within that item.
|
||||
- the key point of a `HirId` is that it is *relative* to some item (which is named
|
||||
via a `DefId`).
|
||||
- `BodyId`, this is an absolute identifier that refers to a specific
|
||||
body (definition of a function or constant) in the crate. It is currently
|
||||
effectively a "newtype'd" `NodeId`.
|
||||
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
|
||||
- While these are still in common use, **they are being slowly phased out**.
|
||||
- Since they are absolute within the crate, adding a new node
|
||||
anywhere in the tree causes the node-ids of all subsequent code in
|
||||
the crate to change. This is terrible for incremental compilation,
|
||||
as you can perhaps imagine.
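A small sketch shows why item-relative ids hold up better than absolute ones. The field names and the `assign_ids` helper are assumptions made for this example; only the type names come from the text above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CrateNum(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefIndex(pub u32);

/// A `DefId` pairs the crate with an index into that crate's definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

/// Absolute within the crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub u32);

/// Relative to an owning item.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HirId {
    pub owner: DefIndex,
    pub local_id: u32,
}

/// Assign both kinds of id to every node, given the node count of each item.
pub fn assign_ids(nodes_per_item: &[u32]) -> Vec<(NodeId, HirId)> {
    let mut out = Vec::new();
    let mut next_absolute = 0u32;
    for (owner, &count) in nodes_per_item.iter().enumerate() {
        for local_id in 0..count {
            let hir_id = HirId { owner: DefIndex(owner as u32), local_id };
            out.push((NodeId(next_absolute), hir_id));
            next_absolute += 1;
        }
    }
    out
}
```

Comparing `assign_ids(&[2, 1])` with `assign_ids(&[3, 1])` models adding one node to the first item. The node in the second item gets a different `NodeId` but keeps the same `HirId`.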
|
||||
|
||||
### HIR Map
|
||||
|
||||
Most of the time when you are working with the HIR, you will do so via
|
||||
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
|
||||
the `hir::map` module). The HIR map contains a number of methods to
|
||||
convert between ids of various kinds and to look up data associated
|
||||
with a HIR node.
|
||||
|
||||
For example, if you have a `DefId`, and you would like to convert it
|
||||
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
|
||||
returns an `Option<NodeId>` -- this will be `None` if the def-id
|
||||
refers to something outside of the current crate (since then it has no
|
||||
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
|
||||
the definition.
|
||||
|
||||
Similarly, you can use `tcx.hir.find(n)` to look up the node for a
|
||||
`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum
|
||||
defined in the map; by matching on this you can find out what sort of
|
||||
node the node-id referred to and also get a pointer to the data
|
||||
itself. Often, you know what sort of node `n` is -- e.g., if you know
|
||||
that `n` must be some HIR expression, you can do
|
||||
`tcx.hir.expect_expr(n)`, which will extract and return the
|
||||
`&hir::Expr`, panicking if `n` is not in fact an expression.
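The shape of these lookups can be mimicked in a self-contained toy. `ToyHirMap` below is a hypothetical stand-in whose method names echo the ones mentioned above; it does not reproduce the real signatures or the real `Node` enum.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Toy node kinds; the payload is just a description of the node.
#[derive(Debug, PartialEq)]
pub enum Node {
    Item(&'static str),
    Expr(&'static str),
}

/// Assumption for this sketch: crate number 0 is the crate being compiled.
pub const LOCAL_CRATE: u32 = 0;

#[derive(Default)]
pub struct ToyHirMap {
    def_to_node: HashMap<DefId, NodeId>,
    nodes: HashMap<NodeId, Node>,
}

impl ToyHirMap {
    /// `None` for definitions from other crates: they have no HIR node here.
    pub fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.def_to_node.get(&def_id).copied()
    }

    pub fn find(&self, n: NodeId) -> Option<&Node> {
        self.nodes.get(&n)
    }

    /// Panics unless `n` refers to an expression.
    pub fn expect_expr(&self, n: NodeId) -> &'static str {
        match self.find(n) {
            Some(Node::Expr(e)) => *e,
            other => panic!("expected expression, found {:?}", other),
        }
    }
}

pub fn example_map() -> ToyHirMap {
    let mut map = ToyHirMap::default();
    map.def_to_node.insert(DefId { krate: LOCAL_CRATE, index: 7 }, NodeId(42));
    map.nodes.insert(NodeId(42), Node::Item("fn bar"));
    map.nodes.insert(NodeId(43), Node::Expr("1 + 2"));
    map
}
```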
|
||||
|
||||
Finally, you can use the HIR map to find the parents of nodes, via
|
||||
calls like `tcx.hir.get_parent_node(n)`.
|
||||
|
||||
### HIR Bodies
|
||||
|
||||
A **body** represents some kind of executable code, such as the body
|
||||
of a function/closure or the definition of a constant. Bodies are
|
||||
associated with an **owner**, which is typically some kind of item
|
||||
(e.g., a `fn()` or `const`), but could also be a closure expression
|
||||
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
|
||||
associated with a given def-id (`maybe_body_owned_by()`) or to find
|
||||
the owner of a body (`body_owner_def_id()`).
|
|
@ -1,4 +0,0 @@
|
|||
The HIR map, accessible via `tcx.hir`, allows you to quickly navigate the
|
||||
HIR and convert between various forms of identifiers. See [the HIR README] for more information.
|
||||
|
||||
[the HIR README]: ../README.md
|
|
@ -1,227 +0,0 @@
|
|||
# Type inference engine
|
||||
|
||||
The type inference is based on standard HM-type inference, but
|
||||
extended in various ways to accommodate subtyping, region inference,
|
||||
and higher-ranked types.
|
||||
|
||||
## A note on terminology
|
||||
|
||||
We use the notation `?T` to refer to inference variables, also called
|
||||
existential variables.
|
||||
|
||||
We use the terms "region" and "lifetime" interchangeably. Both refer to
|
||||
the `'a` in `&'a T`.
|
||||
|
||||
The term "bound region" refers to regions bound in a function
|
||||
signature, such as the `'a` in `for<'a> fn(&'a u32)`. A region is
|
||||
"free" if it is not bound.
|
||||
|
||||
## Creating an inference context
|
||||
|
||||
You create and "enter" an inference context by doing something like
|
||||
the following:
|
||||
|
||||
```rust
|
||||
tcx.infer_ctxt().enter(|infcx| {
|
||||
// use the inference context `infcx` in here
|
||||
})
|
||||
```
|
||||
|
||||
Each inference context creates a short-lived type arena to store the
|
||||
fresh types and things that it will create, as described in
|
||||
[the README in the ty module][ty-readme]. This arena is created by the `enter`
|
||||
function and disposed of after it returns.
|
||||
|
||||
[ty-readme]: src/librustc/ty/README.md
|
||||
|
||||
Within the closure, the infcx will have the type `InferCtxt<'cx, 'gcx,
|
||||
'tcx>` for some fresh `'cx` and `'tcx` -- the latter corresponds to
|
||||
the lifetime of this temporary arena, and the `'cx` is the lifetime of
|
||||
the `InferCtxt` itself. (Again, see [that ty README][ty-readme] for
|
||||
more details on this setup.)
|
||||
|
||||
The `tcx.infer_ctxt` method actually returns a builder, which means
|
||||
there are some kinds of configuration you can do before the `infcx` is
|
||||
created. See `InferCtxtBuilder` for more information.
|
||||
|
||||
## Inference variables
|
||||
|
||||
The main purpose of the inference context is to house a bunch of
|
||||
**inference variables** -- these represent types or regions whose precise
|
||||
value is not yet known, but will be uncovered as we perform type-checking.
|
||||
|
||||
If you're familiar with the basic ideas of unification from H-M type
|
||||
systems, or logic languages like Prolog, this is the same concept. If
|
||||
you're not, you might want to read a tutorial on how H-M type
|
||||
inference works, or perhaps this blog post on
|
||||
[unification in the Chalk project].
|
||||
|
||||
[Unification in the Chalk project]: http://smallcultfollowing.com/babysteps/blog/2017/03/25/unification-in-chalk-part-1/
|
||||
|
||||
All told, the inference context stores four kinds of inference variables as of this
|
||||
writing:
|
||||
|
||||
- Type variables, which come in three varieties:
|
||||
- General type variables (the most common). These can be unified with any type.
|
||||
- Integral type variables, which can only be unified with an integral type, and
|
||||
arise from an integer literal expression like `22`.
|
||||
- Float type variables, which can only be unified with a float type, and
|
||||
arise from a float literal expression like `22.0`.
|
||||
- Region variables, which represent lifetimes, and arise all over the dang place.
|
||||
|
||||
All the type variables work in much the same way: you can create a new
|
||||
type variable, and what you get is a `Ty<'tcx>` representing an
|
||||
unresolved type `?T`. Then later you can apply the various operations
|
||||
that the inferencer supports, such as equality or subtyping, and it
|
||||
will possibly **instantiate** (or **bind**) that `?T` to a specific
|
||||
value as a result.
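For readers new to unification, the following toy shows how a type variable gets instantiated. The `Ty` enum, the `Infer` struct, and its `equate` method are invented for this sketch; the real inference context is far richer. Regions and the occurs check are left out, so equating `?T` with `&?T` would loop.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    /// An inference variable `?T`, identified by its index.
    Var(usize),
    /// `&T`, ignoring regions.
    Ref(Box<Ty>),
}

/// Toy inference context: one optional binding per variable.
#[derive(Default)]
pub struct Infer {
    bindings: Vec<Option<Ty>>,
}

impl Infer {
    /// Create a fresh, unbound variable.
    pub fn new_var(&mut self) -> Ty {
        self.bindings.push(None);
        Ty::Var(self.bindings.len() - 1)
    }

    /// Replace bound variables with their values, as far as currently known.
    pub fn resolve(&self, t: &Ty) -> Ty {
        match t {
            Ty::Var(v) => match &self.bindings[*v] {
                Some(bound) => self.resolve(bound),
                None => t.clone(),
            },
            Ty::Ref(inner) => Ty::Ref(Box::new(self.resolve(inner))),
            _ => t.clone(),
        }
    }

    /// Force `a` and `b` to be equal, binding variables where needed.
    pub fn equate(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(v), other) | (other, Ty::Var(v)) => {
                self.bindings[*v] = Some(other.clone());
                Ok(())
            }
            (Ty::Ref(x), Ty::Ref(y)) => self.equate(x, y),
            _ if a == b => Ok(()),
            _ => Err(format!("mismatched types: {:?} vs {:?}", a, b)),
        }
    }
}
```

Equating `&?T` with `&Int` binds `?T` to `Int`, after which equating `?T` with `Bool` fails.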
|
||||
|
||||
The region variables work somewhat differently, and are described
|
||||
below in a separate section.
|
||||
|
||||
## Enforcing equality / subtyping
|
||||
|
||||
The most basic operation you can perform in the type inferencer is
|
||||
**equality**, which forces two types `T` and `U` to be the same. The
|
||||
recommended way to add an equality constraint is using the `at`
|
||||
method, roughly like so:
|
||||
|
||||
```
|
||||
infcx.at(...).eq(t, u);
|
||||
```
|
||||
|
||||
The first `at()` call provides a bit of context, i.e., why you are
|
||||
doing this unification, and in what environment, and the `eq` method
|
||||
performs the actual equality constraint.
|
||||
|
||||
When you equate things, you force them to be precisely equal. Equating
|
||||
returns an `InferResult` -- if it returns `Err(err)`, then equating
|
||||
failed, and the enclosed `TypeError` will tell you what went wrong.
|
||||
|
||||
The success case is perhaps more interesting. The "primary" return
|
||||
type of `eq` is `()` -- that is, when it succeeds, it doesn't return a
|
||||
value of any particular interest. Rather, it is executed for its
|
||||
side-effects of constraining type variables and so forth. However, the
|
||||
actual return type is not `()`, but rather `InferOk<()>`. The
|
||||
`InferOk` type is used to carry extra trait obligations -- your job is
|
||||
to ensure that these are fulfilled (typically by enrolling them in a
|
||||
fulfillment context). See the [trait README] for more background here.
|
||||
|
||||
[trait README]: ../traits/README.md
|
||||
|
||||
You can also enforce subtyping through `infcx.at(..).sub(..)`. The same
|
||||
basic concepts apply as above.
|
||||
|
||||
## "Trying" equality
|
||||
|
||||
Sometimes you would like to know if it is *possible* to equate two
|
||||
types without error. You can test that with `infcx.can_eq` (or
|
||||
`infcx.can_sub` for subtyping). If this returns `Ok`, then equality
|
||||
is possible -- but in all cases, any side-effects are reversed.
|
||||
|
||||
Be aware though that the success or failure of these methods is always
|
||||
**modulo regions**. That is, two types `&'a u32` and `&'b u32` will
|
||||
return `Ok` for `can_eq`, even if `'a != 'b`. This falls out from the
|
||||
"two-phase" nature of how we solve region constraints.
|
||||
|
||||
## Snapshots
|
||||
|
||||
As described in the previous section on `can_eq`, often it is useful
|
||||
to be able to do a series of operations and then roll back their
|
||||
side-effects. This is done for various reasons: one of them is to be
|
||||
able to backtrack, trying out multiple possibilities before settling
|
||||
on which path to take. Another is in order to ensure that a series of
|
||||
smaller changes take place atomically or not at all.
|
||||
|
||||
To allow for this, the inference context supports a `snapshot` method.
|
||||
When you call it, it will start recording changes that occur from the
|
||||
operations you perform. When you are done, you can either invoke
|
||||
`rollback_to`, which will undo those changes, or else `confirm`, which
|
||||
will make them permanent. Snapshots can be nested as long as you follow
|
||||
a stack-like discipline.
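One common way to implement this kind of snapshot is an undo log. The sketch below assumes that design for illustration and does not claim to mirror the inference context's internals. `Table`, `Snapshot`, and `bind` are hypothetical, and variable creation is not undone in this toy.

```rust
/// Toy table of variables, each either unbound or bound to an `i32`.
#[derive(Default)]
pub struct Table {
    values: Vec<Option<i32>>,
    /// Indices of variables bound so far, in order.
    undo_log: Vec<usize>,
}

/// Remembers how long the undo log was when the snapshot was taken.
pub struct Snapshot {
    undo_len: usize,
}

impl Table {
    pub fn new_var(&mut self) -> usize {
        self.values.push(None);
        self.values.len() - 1
    }

    /// Bind an unbound variable and record the change in the undo log.
    pub fn bind(&mut self, var: usize, value: i32) {
        assert!(self.values[var].is_none(), "variable already bound");
        self.values[var] = Some(value);
        self.undo_log.push(var);
    }

    pub fn get(&self, var: usize) -> Option<i32> {
        self.values[var]
    }

    pub fn snapshot(&self) -> Snapshot {
        Snapshot { undo_len: self.undo_log.len() }
    }

    /// Undo every binding made since the snapshot was taken.
    pub fn rollback_to(&mut self, snapshot: Snapshot) {
        while self.undo_log.len() > snapshot.undo_len {
            let var = self.undo_log.pop().expect("undo log is non-empty");
            self.values[var] = None;
        }
    }

    /// Keep the changes; the snapshot is simply consumed.
    pub fn confirm(&mut self, _snapshot: Snapshot) {}
}
```

Because a rollback pops the log back to a recorded length, nested snapshots work only if they are released in reverse order of creation, which is the stack-like discipline mentioned above.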
|
||||
|
||||
Rather than use snapshots directly, it is often helpful to use the
|
||||
methods like `commit_if_ok` or `probe` that encapsulate higher-level
|
||||
patterns.
|
||||
|
||||
## Subtyping obligations
|
||||
|
||||
One thing worth discussing is subtyping obligations. When you force
|
||||
one type to be a subtype of another, like `?T <: i32`, we can often convert those
|
||||
into equality constraints. This follows from Rust's rather limited notion
|
||||
of subtyping: so, in the above case, `?T <: i32` is equivalent to `?T = i32`.
|
||||
|
||||
However, in some cases we have to be more careful. For example, when
|
||||
regions are involved. So if you have `?T <: &'a i32`, what we would do
|
||||
is to first "generalize" `&'a i32` into a type with a region variable:
|
||||
`&'?b i32`, and then unify `?T` with that (`?T = &'?b i32`). We then
|
||||
relate this new variable with the original bound:
|
||||
|
||||
&'?b i32 <: &'a i32
|
||||
|
||||
This will result in a region constraint (see below) of `'?b: 'a`.
|
||||
|
||||
One final interesting case is relating two unbound type variables,
|
||||
like `?T <: ?U`. In that case, we can't make progress, so we enqueue
|
||||
an obligation `Subtype(?T, ?U)` and return it via the `InferOk`
|
||||
mechanism. You'll have to try again when more details about `?T` or
|
||||
`?U` are known.
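The "enqueue an obligation and try again later" behaviour can be sketched with a toy `sub` operation. Everything here is a simplified assumption: the types have no regions, so subtyping collapses to equality, and `Infer`, `InferOk`, and `Obligation` only borrow their names from the text.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    Var(u32),
}

#[derive(Debug, PartialEq)]
pub enum Obligation {
    Subtype(Ty, Ty),
}

/// Success value carrying any work that could not be done yet.
#[derive(Debug, PartialEq)]
pub struct InferOk {
    pub obligations: Vec<Obligation>,
}

#[derive(Default)]
pub struct Infer {
    bindings: HashMap<u32, Ty>,
}

impl Infer {
    /// Replace a bound variable with its value (one level is enough here).
    pub fn shallow_resolve(&self, t: &Ty) -> Ty {
        match t {
            Ty::Var(v) => self.bindings.get(v).cloned().unwrap_or_else(|| t.clone()),
            _ => t.clone(),
        }
    }

    /// Require `a <: b`. Without regions this toy treats it as `a = b`.
    pub fn sub(&mut self, a: &Ty, b: &Ty) -> Result<InferOk, String> {
        let done = InferOk { obligations: Vec::new() };
        match (self.shallow_resolve(a), self.shallow_resolve(b)) {
            // Two distinct unbound variables: no progress, defer the work.
            (Ty::Var(x), Ty::Var(y)) if x != y => Ok(InferOk {
                obligations: vec![Obligation::Subtype(Ty::Var(x), Ty::Var(y))],
            }),
            (Ty::Var(_), Ty::Var(_)) => Ok(done),
            (Ty::Var(v), t) | (t, Ty::Var(v)) => {
                self.bindings.insert(v, t);
                Ok(done)
            }
            (a, b) if a == b => Ok(done),
            (a, b) => Err(format!("{:?} is not a subtype of {:?}", a, b)),
        }
    }
}
```

Once `?0` has been bound to `Int`, retrying `?0 <: ?1` no longer returns an obligation and binds `?1` as well.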
|
||||
|
||||
## Region constraints
|
||||
|
||||
Regions are inferred somewhat differently from types. Rather than
|
||||
eagerly unifying things, we simply collect constraints as we go, but
|
||||
make (almost) no attempt to solve regions. These constraints have the
|
||||
form of an outlives constraint:
|
||||
|
||||
'a: 'b
|
||||
|
||||
Actually the code tends to view them as a subregion relation, but it's the same
|
||||
idea:
|
||||
|
||||
'b <= 'a
|
||||
|
||||
(There are various other kinds of constraints, such as "verifys"; see
|
||||
the `region_constraints` module for details.)
|
||||
|
||||
There is one case where we do some amount of eager unification. If you have an equality constraint
|
||||
between two regions
|
||||
|
||||
'a = 'b
|
||||
|
||||
we will record that fact in a unification table. You can then use
|
||||
`opportunistic_resolve_var` to convert `'b` to `'a` (or vice
|
||||
versa). This is sometimes needed to ensure termination of fixed-point
|
||||
algorithms.
|
||||
|
||||
## Extracting region constraints
|
||||
|
||||
Ultimately, region constraints are only solved at the very end of
|
||||
type-checking, once all other constraints are known. There are two
|
||||
ways to solve region constraints right now: lexical and
|
||||
non-lexical. Eventually there will only be one.
|
||||
|
||||
To solve **lexical** region constraints, you invoke
|
||||
`resolve_regions_and_report_errors`. This will "close" the region
|
||||
constraint process and invoke the `lexical_region_resolve` code. Once
|
||||
this is done, any further attempt to equate or create a subtyping
|
||||
relationship will yield an ICE.
|
||||
|
||||
Non-lexical region constraints are not handled within the inference
|
||||
context. Instead, the NLL solver (actually, the MIR type-checker)
|
||||
invokes `take_and_reset_region_constraints` periodically. This
|
||||
extracts all of the outlives constraints from the region solver, but
|
||||
leaves the set of variables intact. This is used to get *just* the
|
||||
region constraints that resulted from some particular point in the
|
||||
program, since the NLL solver needs to know not just *what* regions
|
||||
were subregions but *where*. Finally, the NLL solver invokes
|
||||
`take_region_var_origins`, which "closes" the region constraint
|
||||
process in the same way as normal solving.
|
||||
|
||||
## Lexical region resolution
|
||||
|
||||
Lexical region resolution is done by initially assigning each region
|
||||
variable to an empty value. We then process each outlives constraint
|
||||
repeatedly, growing region variables until a fixed-point is reached.
|
||||
Region variables can be grown using a least-upper-bound relation on
|
||||
the region lattice in a fairly straight-forward fashion.
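The fixed-point iteration can be sketched under one strong simplifying assumption: a region is modelled as a set of program points, so the least upper bound is set union. The `Constraint` enum and the `resolve` function are hypothetical, and the real `lexical_region_resolve` code works on a different representation.

```rust
use std::collections::BTreeSet;

/// Toy region: a set of program points.
pub type Region = BTreeSet<u32>;

pub fn region(points: &[u32]) -> Region {
    points.iter().copied().collect()
}

#[derive(Clone, Debug)]
pub enum Constraint {
    /// `var: region`, so the variable must contain a known region.
    VarOutlivesRegion(usize, Region),
    /// `sup: sub` between two variables, i.e. `sub <= sup`.
    VarOutlivesVar(usize, usize),
}

/// Start every variable empty and grow until nothing changes.
pub fn resolve(num_vars: usize, constraints: &[Constraint]) -> Vec<Region> {
    let mut values = vec![Region::new(); num_vars];
    let mut changed = true;
    while changed {
        changed = false;
        for c in constraints {
            let (target, required) = match c {
                Constraint::VarOutlivesRegion(v, r) => (*v, r.clone()),
                Constraint::VarOutlivesVar(sup, sub) => (*sup, values[*sub].clone()),
            };
            let before = values[target].len();
            // Least upper bound in this toy lattice is set union.
            values[target].extend(required);
            if values[target].len() != before {
                changed = true;
            }
        }
    }
    values
}
```

Each pass can only add points to a variable, and the set of points is finite, so the loop terminates.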
|
|
@ -1,90 +0,0 @@
|
|||
# MIR definition and pass system
|
||||
|
||||
This file contains the definition of the MIR datatypes along with the
|
||||
various types for the "MIR Pass" system, which lets you easily
|
||||
register and define new MIR transformations and analyses.
|
||||
|
||||
Most of the code that operates on MIR can be found in the
|
||||
`librustc_mir` crate or other crates. The code found here in
|
||||
`librustc` is just the datatype definitions, along with the functions
|
||||
which operate on MIR to be placed everywhere else.
|
||||
|
||||
## MIR Data Types and visitor
|
||||
|
||||
The main MIR data type is `rustc::mir::Mir`, defined in `mod.rs`.
|
||||
There is also the MIR visitor (in `visit.rs`) which allows you to walk
|
||||
the MIR and override what actions will be taken at various points (you
|
||||
can visit in either shared or mutable mode; the latter allows changing
|
||||
the MIR in place). Finally `traverse.rs` contains various traversal
|
||||
routines for visiting the MIR CFG in [different standard orders][traversal]
|
||||
(e.g. pre-order, reverse post-order, and so forth).
|
||||
|
||||
[traversal]: https://en.wikipedia.org/wiki/Tree_traversal
|
||||
|
||||
## MIR pass suites and their integration into the query system
|
||||
|
||||
As a MIR *consumer*, you are expected to use one of the queries that
|
||||
returns a "final MIR". As of the time of this writing, there is only
|
||||
one: `optimized_mir(def_id)`, but more are expected to come in the
|
||||
future. For foreign def-ids, we simply read the MIR from the other
|
||||
crate's metadata. But for local def-ids, the query will construct the
|
||||
MIR and then iteratively optimize it by putting it through various
|
||||
pipeline stages. This section describes those pipeline stages and how
|
||||
you can extend them.
|
||||
|
||||
To produce the `optimized_mir(D)` for a given def-id `D`, the MIR
|
||||
passes through several suites of optimizations, each represented by a
|
||||
query. Each suite consists of multiple optimizations and
|
||||
transformations. These suites represent useful intermediate points
|
||||
where we want to access the MIR for type checking or other purposes:
|
||||
|
||||
- `mir_build(D)` -- not a query, but this constructs the initial MIR
|
||||
- `mir_const(D)` -- applies some simple transformations to make MIR ready for constant evaluation;
|
||||
- `mir_validated(D)` -- applies some more transformations, making MIR ready for borrow checking;
|
||||
- `optimized_mir(D)` -- the final state, after all optimizations have been performed.
|
||||
|
||||
### Stealing
|
||||
|
||||
The intermediate queries `mir_const()` and `mir_validated()` yield up
|
||||
a `&'tcx Steal<Mir<'tcx>>`, allocated using
|
||||
`tcx.alloc_steal_mir()`. This indicates that the result may be
|
||||
**stolen** by the next suite of optimizations -- this is an
|
||||
optimization to avoid cloning the MIR. Attempting to use a stolen
|
||||
result will cause a panic in the compiler. Therefore, it is important
|
||||
that you do not read directly from these intermediate queries except as
|
||||
part of the MIR processing pipeline.
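To make the stealing behavior concrete, here is a minimal sketch of a "stealable" container. It is a toy stand-in built on `RefCell<Option<T>>`, not the compiler's `Steal` type, and the method `is_stolen` is an addition for the illustration.

```rust
use std::cell::{Ref, RefCell};

/// Toy stand-in for a stealable value (not the compiler's actual type).
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Read access; panics if the value has already been stolen.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read from stolen value")
        })
    }

    /// Takes ownership of the value without cloning it.
    fn steal(&self) -> T {
        self.value
            .borrow_mut()
            .take()
            .expect("value was already stolen")
    }

    /// Helper for this sketch only: has the value been taken?
    fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}
```

In this model, every reader has to call `borrow()` before the next stage calls `steal()`, which is the ordering requirement described above.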
|
||||
|
||||
Because of this stealing mechanism, some care must also be taken to
|
||||
ensure that, before the MIR at a particular phase in the processing
|
||||
pipeline is stolen, anyone who may want to read from it has already
|
||||
done so. Concretely, this means that if you have some query `foo(D)`
|
||||
that wants to access the result of `mir_const(D)` or
|
||||
`mir_validated(D)`, you need to have the successor pass "force"
|
||||
`foo(D)` using `ty::queries::foo::force(...)`. This will force a query
|
||||
to execute even though you don't directly require its result.
|
||||
|
||||
As an example, consider MIR const qualification. It wants to read the
|
||||
result produced by the `mir_const()` suite. However, that result will
|
||||
be **stolen** by the `mir_validated()` suite. If nothing was done,
|
||||
then `mir_const_qualif(D)` would succeed if it came before
|
||||
`mir_validated(D)`, but fail otherwise. Therefore, `mir_validated(D)`
|
||||
will **force** `mir_const_qualif` before it actually steals, thus
|
||||
ensuring that the reads have already happened:
|
||||
|
||||
```
|
||||
mir_const(D) --read-by--> mir_const_qualif(D)
|
||||
| ^
|
||||
stolen-by |
|
||||
| (forces)
|
||||
v |
|
||||
mir_validated(D) ------------+
|
||||
```
|
||||
|
||||
### Implementing and registering a pass
|
||||
|
||||
To create a new MIR pass, you simply implement the `MirPass` trait for
|
||||
some fresh singleton type `Foo`. Once you have implemented the trait for
|
||||
your type `Foo`, you then have to insert `Foo` into one of the suites;
|
||||
this is done in `librustc_driver/driver.rs` by invoking `push_pass(S,
|
||||
Foo)` with the appropriate suite substituted for `S`.
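The overall shape of "implement a trait for a singleton type, then push it into a suite" might look like the following sketch. Everything here is hypothetical: `Body`, `Pass`, `Suite` and `RemoveNops` are toy types, and the real `MirPass` trait and `push_pass` take compiler-internal arguments that are not modeled.

```rust
/// Toy stand-in for a MIR body: just a list of statements.
struct Body {
    statements: Vec<String>,
}

/// Hypothetical pass trait; the real `MirPass` works on compiler types.
trait Pass {
    fn name(&self) -> &'static str;
    fn run(&self, body: &mut Body);
}

/// A fresh singleton type that implements the pass.
struct RemoveNops;

impl Pass for RemoveNops {
    fn name(&self) -> &'static str {
        "RemoveNops"
    }

    fn run(&self, body: &mut Body) {
        body.statements.retain(|s| s.as_str() != "nop");
    }
}

/// A suite is an ordered list of passes.
#[derive(Default)]
struct Suite {
    passes: Vec<Box<dyn Pass>>,
}

impl Suite {
    fn push_pass<P: Pass + 'static>(&mut self, pass: P) {
        self.passes.push(Box::new(pass));
    }

    /// Runs every pass in registration order and returns their names.
    fn run(&self, body: &mut Body) -> Vec<&'static str> {
        self.passes
            .iter()
            .map(|p| {
                p.run(body);
                p.name()
            })
            .collect()
    }
}
```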
|
||||
|
|
@ -1,482 +0,0 @@
|
|||
# TRAIT RESOLUTION
|
||||
|
||||
This document describes the general process and points out some non-obvious
|
||||
things.
|
||||
|
||||
## Major concepts
|
||||
|
||||
Trait resolution is the process of pairing up an impl with each
|
||||
reference to a trait. So, for example, if there is a generic function like:
|
||||
|
||||
```rust
|
||||
fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { /*...*/ }
|
||||
```
|
||||
|
||||
and then a call to that function:
|
||||
|
||||
```rust
|
||||
let v: Vec<isize> = clone_slice(&[1, 2, 3]);
|
||||
```
|
||||
|
||||
it is the job of trait resolution to figure out
|
||||
whether there exists an impl of (in this case) `isize : Clone`.
|
||||
|
||||
Note that in some cases, like generic functions, we may not be able to
|
||||
find a specific impl, but we can figure out that the caller must
|
||||
provide an impl. To see what I mean, consider the body of `clone_slice`:
|
||||
|
||||
```rust
|
||||
fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> {
|
||||
    let mut v = Vec::new();
|
||||
    for e in x {
|
||||
        v.push((*e).clone()); // (*)
|
||||
    }
|
||||
    v
|
||||
}
|
||||
```
|
||||
|
||||
The line marked `(*)` is only legal if `T` (the type of `*e`)
|
||||
implements the `Clone` trait. Naturally, since we don't know what `T`
|
||||
is, we can't find the specific impl; but based on the bound `T:Clone`,
|
||||
we can say that there exists an impl which the caller must provide.
|
||||
|
||||
We use the term *obligation* to refer to a trait reference in need of
|
||||
an impl.
|
||||
|
||||
## Overview
|
||||
|
||||
Trait resolution consists of three major parts:
|
||||
|
||||
- SELECTION: Deciding how to resolve a specific obligation. For
|
||||
example, selection might decide that a specific obligation can be
|
||||
resolved by employing an impl which matches the self type, or by
|
||||
using a parameter bound. In the case of an impl, selecting one
|
||||
obligation can create *nested obligations* because of where clauses
|
||||
on the impl itself. It may also require evaluating those nested
|
||||
obligations to resolve ambiguities.
|
||||
|
||||
- FULFILLMENT: The fulfillment code is what tracks that obligations
|
||||
are completely fulfilled. Basically it is a worklist of obligations
|
||||
to be selected: once selection is successful, the obligation is
|
||||
removed from the worklist and any nested obligations are enqueued.
|
||||
|
||||
- COHERENCE: The coherence checks are intended to ensure that there
|
||||
are never overlapping impls, where two impls could be used with
|
||||
equal precedence.
|
||||
|
||||
## Selection
|
||||
|
||||
Selection is the process of deciding whether an obligation can be
|
||||
resolved and, if so, how it is to be resolved (via impl, where clause, etc).
|
||||
The main interface is the `select()` function, which takes an obligation
|
||||
and returns a `SelectionResult`. There are three possible outcomes:
|
||||
|
||||
- `Ok(Some(selection))` -- yes, the obligation can be resolved, and
|
||||
`selection` indicates how. If the obligation was resolved via an impl,
|
||||
then `selection` may also indicate nested obligations that are required
|
||||
by the impl.
|
||||
|
||||
- `Ok(None)` -- we are not yet sure whether the obligation can be
|
||||
resolved or not. This happens most commonly when the obligation
|
||||
contains unbound type variables.
|
||||
|
||||
- `Err(err)` -- the obligation definitely cannot be resolved due to a
|
||||
type error, or because there are no impls that could possibly apply,
|
||||
etc.
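The three outcomes above, together with the fulfillment worklist from the overview, can be sketched as follows. This is a toy model under loud assumptions: obligations are plain strings, the rules inside `select` are invented for the example, and ambiguous obligations are simply collected rather than retried later.

```rust
use std::collections::VecDeque;

/// Toy obligation: "type implements trait", both as plain strings.
#[derive(Clone, Debug, PartialEq)]
struct Obligation {
    ty: String,
    trait_name: String,
}

/// How an obligation was resolved, plus any nested obligations.
#[derive(Debug, PartialEq)]
struct Selection {
    nested: Vec<Obligation>,
}

#[derive(Debug, PartialEq)]
struct SelectionError;

/// Mirrors the three outcomes: `Ok(Some(..))`, `Ok(None)`, `Err(..)`.
type SelectionResult = Result<Option<Selection>, SelectionError>;

fn obligation(ty: &str, trait_name: &str) -> Obligation {
    Obligation { ty: ty.to_string(), trait_name: trait_name.to_string() }
}

/// Hypothetical selection rules, for illustration only.
fn select(o: &Obligation) -> SelectionResult {
    match (o.ty.as_str(), o.trait_name.as_str()) {
        // An unbound type variable: we cannot decide yet.
        ("?", _) => Ok(None),
        ("isize", "Clone") => Ok(Some(Selection { nested: vec![] })),
        // Modeled on `impl<T: Clone> Clone for Vec<T>`: one nested obligation.
        (ty, "Clone") if ty.starts_with("Vec<") && ty.ends_with('>') => {
            let inner = &ty[4..ty.len() - 1];
            Ok(Some(Selection { nested: vec![obligation(inner, "Clone")] }))
        }
        _ => Err(SelectionError),
    }
}

/// Worklist loop: returns the obligations that stayed ambiguous.
fn fulfill(initial: Vec<Obligation>) -> Result<Vec<Obligation>, SelectionError> {
    let mut worklist: VecDeque<Obligation> = initial.into();
    let mut ambiguous = Vec::new();
    while let Some(o) = worklist.pop_front() {
        match select(&o)? {
            // Success: drop the obligation, enqueue its nested obligations.
            Some(selection) => worklist.extend(selection.nested),
            None => ambiguous.push(o),
        }
    }
    Ok(ambiguous)
}
```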
|
||||
|
||||
The basic algorithm for selection is broken into two big phases:
|
||||
candidate assembly and confirmation.
|
||||
|
||||
### Candidate assembly
|
||||
|
||||
Candidate assembly searches for impls/where-clauses/etc that might
|
||||
possibly be used to satisfy the obligation. Each of those is called
|
||||
a candidate. To avoid ambiguity, we want to find exactly one
|
||||
candidate that is definitively applicable. In some cases, we may not
|
||||
know whether an impl/where-clause applies or not -- this occurs when
|
||||
the obligation contains unbound inference variables.
|
||||
|
||||
The basic idea for candidate assembly is to do a first pass in which
|
||||
we identify all possible candidates. During this pass, all that we do
|
||||
is try and unify the type parameters. (In particular, we ignore any
|
||||
nested where clauses.) Presuming that this unification succeeds, the
|
||||
impl is added as a candidate.
|
||||
|
||||
Once this first pass is done, we can examine the set of candidates. If
|
||||
it is a singleton set, then we are done: this is the only impl in
|
||||
scope that could possibly apply. Otherwise, we can winnow down the set
|
||||
of candidates by using where clauses and other conditions. If this
|
||||
reduced set yields a single, unambiguous entry, we're good to go,
|
||||
otherwise the result is considered ambiguous.
|
||||
|
||||
#### The basic process: Inferring based on the impls we see
|
||||
|
||||
This process is easier if we work through some examples. Consider
|
||||
the following trait:
|
||||
|
||||
```rust
|
||||
trait Convert<Target> {
|
||||
fn convert(&self) -> Target;
|
||||
}
|
||||
```
|
||||
|
||||
This trait just has one method. It's about as simple as it gets. It
|
||||
converts from the (implicit) `Self` type to the `Target` type. If we
|
||||
wanted to permit conversion between `isize` and `usize`, we might
|
||||
implement `Convert` like so:
|
||||
|
||||
```rust
|
||||
impl Convert<usize> for isize { /*...*/ } // isize -> usize
|
||||
impl Convert<isize> for usize { /*...*/ } // usize -> isize
|
||||
```
|
||||
|
||||
Now imagine there is some code like the following:
|
||||
|
||||
```rust
|
||||
let x: isize = ...;
|
||||
let y = x.convert();
|
||||
```
|
||||
|
||||
The call to convert will generate a trait reference `Convert<$Y> for
|
||||
isize`, where `$Y` is the type variable representing the type of
|
||||
`y`. When we match this against the two impls we can see, we will find
|
||||
that only one remains: `Convert<usize> for isize`. Therefore, we can
|
||||
select this impl, which will cause the type of `$Y` to be unified to
|
||||
`usize`. (Note that while assembling candidates, we do the initial
|
||||
unifications in a transaction, so that they don't affect one another.)
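The same inference can be observed in a complete program. The sketch below fills in method bodies for the two impls (the `as` casts are an assumption, since the original elides them) and adds a helper `is_usize`, which is not part of the original example, to check which type was chosen for `y`.

```rust
use std::any::TypeId;

trait Convert<Target> {
    fn convert(&self) -> Target;
}

impl Convert<usize> for isize {
    fn convert(&self) -> usize {
        *self as usize // isize -> usize
    }
}

impl Convert<isize> for usize {
    fn convert(&self) -> isize {
        *self as isize // usize -> isize
    }
}

/// Helper for this sketch only: reports whether `T` is `usize`.
fn is_usize<T: 'static>(_: &T) -> bool {
    TypeId::of::<T>() == TypeId::of::<usize>()
}

fn infer_target() -> bool {
    let x: isize = 22;
    // `y` has no annotation. Only `Convert<usize> for isize` matches the
    // obligation `isize : Convert<$Y>`, so `$Y` is unified with `usize`.
    let y = x.convert();
    is_usize(&y)
}
```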
|
||||
|
||||
There are tests to this effect in src/test/run-pass:
|
||||
|
||||
traits-multidispatch-infer-convert-source-and-target.rs
|
||||
traits-multidispatch-infer-convert-target.rs
|
||||
|
||||
#### Winnowing: Resolving ambiguities
|
||||
|
||||
But what happens if there are multiple impls where all the types
|
||||
unify? Consider this example:
|
||||
|
||||
```rust
|
||||
trait Get {
|
||||
fn get(&self) -> Self;
|
||||
}
|
||||
|
||||
impl<T:Copy> Get for T {
|
||||
fn get(&self) -> T { *self }
|
||||
}
|
||||
|
||||
impl<T:Get> Get for Box<T> {
|
||||
fn get(&self) -> Box<T> { box get_it(&**self) }
|
||||
}
|
||||
```
|
||||
|
||||
What happens when we invoke `get_it(&box 1_u16)`, for example? In this
|
||||
case, the `Self` type is `Box<u16>` -- that unifies with both impls,
|
||||
because the first applies to all types, and the second to all
|
||||
boxes. In the olden days we'd have called this ambiguous. But what we
|
||||
do now is do a second *winnowing* pass that considers where clauses
|
||||
and attempts to remove candidates -- in this case, the first impl only
|
||||
applies if `Box<u16> : Copy`, which doesn't hold. After winnowing,
|
||||
then, we are left with just one candidate, so we can proceed. There is
|
||||
a test of this in `src/test/run-pass/traits-conditional-dispatch.rs`.
|
||||
|
||||
#### Matching
|
||||
|
||||
The subroutines that decide whether a particular impl/where-clause/etc
|
||||
applies to a particular obligation. At the moment, this amounts to
|
||||
unifying the self types, but in the future we may also recursively
|
||||
consider some of the nested obligations, in the case of an impl.
|
||||
|
||||
#### Lifetimes and selection
|
||||
|
||||
Because of how lifetime inference works, it is not possible to
|
||||
give back immediate feedback as to whether a unification or subtype
|
||||
relationship between lifetimes holds or not. Therefore, lifetime
|
||||
matching is *not* considered during selection. This is reflected in
|
||||
the fact that subregion assignment is infallible. This may yield
|
||||
lifetime constraints that will later be found to be in error (in
|
||||
contrast, the non-lifetime-constraints have already been checked
|
||||
during selection and can never cause an error, though naturally they
|
||||
may lead to other errors downstream).
|
||||
|
||||
#### Where clauses
|
||||
|
||||
Besides an impl, the other major way to resolve an obligation is via a
|
||||
where clause. The selection process is always given a *parameter
|
||||
environment* which contains a list of where clauses, which are
|
||||
basically obligations that we can assume are satisfiable. We will iterate
|
||||
over that list and check whether our current obligation can be found
|
||||
in that list, and if so it is considered satisfied. More precisely, we
|
||||
want to check whether there is a where-clause obligation that is for
|
||||
the same trait (or some subtrait) and for which the self types match,
|
||||
using the definition of *matching* given above.
|
||||
|
||||
Consider this simple example:
|
||||
|
||||
```rust
|
||||
trait A1 { /*...*/ }
|
||||
trait A2 : A1 { /*...*/ }
|
||||
|
||||
trait B { /*...*/ }
|
||||
|
||||
fn foo<X:A2+B>() { /*...*/ }
|
||||
```
|
||||
|
||||
Clearly we can use methods offered by `A1`, `A2`, or `B` within the
|
||||
body of `foo`. In each case, that will incur an obligation like `X :
|
||||
A1` or `X : A2`. The parameter environment will contain two
|
||||
where-clauses, `X : A2` and `X : B`. For each obligation, then, we
|
||||
search this list of where-clauses. To resolve an obligation `X:A1`,
|
||||
we would note that `X:A2` implies that `X:A1`.
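Here is a runnable variant of that example. The method names, the struct `S`, and the parameter `x` are additions made for the sketch. The call to `x.a1()` is the interesting one: the only where-clauses in scope are `X : A2` and `X : B`, so the obligation `X : A1` is resolved through the supertrait relationship.

```rust
trait A1 {
    fn a1(&self) -> &'static str {
        "A1"
    }
}

trait A2: A1 {
    fn a2(&self) -> &'static str {
        "A2"
    }
}

trait B {
    fn b(&self) -> &'static str {
        "B"
    }
}

/// Hypothetical type implementing all three traits.
struct S;
impl A1 for S {}
impl A2 for S {}
impl B for S {}

fn foo<X: A2 + B>(x: &X) -> [&'static str; 3] {
    // `X : A1` is not written as a bound; it follows from `X : A2`.
    [x.a1(), x.a2(), x.b()]
}
```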
|
||||
|
||||
### Confirmation
|
||||
|
||||
Confirmation unifies the output type parameters of the trait with the
|
||||
values found in the obligation, possibly yielding a type error. Suppose we
|
||||
return to our example of the `Convert` trait from the previous section,
|
||||
but declare `y` with the type `char`. Confirmation is where the error would
|
||||
be reported, because the impl specifies that `Target` is `usize`, but the
|
||||
obligation reports `char`. Hence the result of selection would be an error.
|
||||
|
||||
### Selection during translation
|
||||
|
||||
During type checking, we do not store the results of trait selection.
|
||||
We simply wish to verify that trait selection will succeed. Then
|
||||
later, at trans time, when we have all concrete types available, we
|
||||
can repeat the trait selection. In this case, we do not consider any
|
||||
where-clauses to be in scope. We therefore know that each resolution
|
||||
will resolve to a particular impl.
|
||||
|
||||
One interesting twist has to do with nested obligations. In general, in trans,
|
||||
we only need to do a "shallow" selection for an obligation. That is, we wish to
|
||||
identify which impl applies, but we do not (yet) need to decide how to select
|
||||
any nested obligations. Nonetheless, we *do* currently do a complete resolution,
|
||||
and that is because it can sometimes inform the results of type inference. That is,
|
||||
we do not have the full substitutions in terms of the type variables of the impl available
|
||||
to us, so we must run trait selection to figure everything out.
|
||||
|
||||
Here is an example:
|
||||
|
||||
```rust
|
||||
trait Foo { /*...*/ }
|
||||
impl<U,T:Bar<U>> Foo for Vec<T> { /*...*/ }
|
||||
|
||||
impl Bar<usize> for isize { /*...*/ }
|
||||
```
|
||||
|
||||
After one shallow round of selection for an obligation like `Vec<isize>
|
||||
: Foo`, we would know which impl we want, and we would know that
|
||||
`T=isize`, but we do not know the type of `U`. We must select the
|
||||
nested obligation `isize : Bar<U>` to find out that `U=usize`.
|
||||
|
||||
It would be good to only do *just as much* nested resolution as
|
||||
necessary. Currently, though, we just do a full resolution.
|
||||
|
||||
# Higher-ranked trait bounds
|
||||
|
||||
One of the more subtle concepts at work is *higher-ranked trait
|
||||
bounds*. An example of such a bound is `for<'a> MyTrait<&'a isize>`.
|
||||
Let's walk through how selection on higher-ranked trait references
|
||||
works.
|
||||
|
||||
## Basic matching and skolemization leaks
|
||||
|
||||
Let's walk through the test `compile-fail/hrtb-just-for-static.rs` to see
|
||||
how it works. The test starts with the trait `Foo`:
|
||||
|
||||
```rust
|
||||
trait Foo<X> {
|
||||
fn foo(&self, x: X) { }
|
||||
}
|
||||
```
|
||||
|
||||
Let's say we have a function `want_hrtb` that wants a type which
|
||||
implements `Foo<&'a isize>` for any `'a`:
|
||||
|
||||
```rust
|
||||
fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }
|
||||
```
|
||||
|
||||
Now we have a struct `AnyInt` that implements `Foo<&'a isize>` for any
|
||||
`'a`:
|
||||
|
||||
```rust
|
||||
struct AnyInt;
|
||||
impl<'a> Foo<&'a isize> for AnyInt { }
|
||||
```
|
||||
|
||||
And the question is, does `AnyInt : for<'a> Foo<&'a isize>`? We want the
|
||||
answer to be yes. The algorithm for figuring it out is closely related
|
||||
to the subtyping for higher-ranked types (which is described in
|
||||
`middle::infer::higher_ranked::doc`, but also in a [paper by SPJ] that
|
||||
I recommend you read).
|
||||
|
||||
1. Skolemize the obligation.
|
||||
2. Match the impl against the skolemized obligation.
|
||||
3. Check for skolemization leaks.
|
||||
|
||||
[paper by SPJ]: http://research.microsoft.com/en-us/um/people/simonpj/papers/higher-rank/
|
||||
|
||||
So let's work through our example. The first thing we would do is to
|
||||
skolemize the obligation, yielding `AnyInt : Foo<&'0 isize>` (here `'0`
|
||||
represents skolemized region #0). Note that we now have no quantifiers;
|
||||
in terms of the compiler type, this changes from a `ty::PolyTraitRef`
|
||||
to a `TraitRef`. We would then create the `TraitRef` from the impl,
|
||||
using fresh variables for its bound regions (and thus getting
|
||||
`Foo<&'$a isize>`, where `'$a` is the inference variable for `'a`). Next
|
||||
we relate the two trait refs, yielding a graph with the constraint
|
||||
that `'0 == '$a`. Finally, we check for skolemization "leaks" -- a
|
||||
leak is basically any attempt to relate a skolemized region to another
|
||||
skolemized region, or to any region that pre-existed the impl match.
|
||||
The leak check is done by searching from the skolemized region to find
|
||||
the set of regions that it is related to in any way. This is called
|
||||
the "taint" set. To pass the check, that set must consist *solely* of
|
||||
itself and region variables from the impl. If the taint set includes
|
||||
any other region, then the match is a failure. In this case, the taint
|
||||
set for `'0` is `{'0, '$a}`, and hence the check will succeed.
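The taint-set computation is essentially a graph search. The sketch below is a toy model, not the compiler's code: regions are a small hypothetical enum, the relation edges are an explicit list, and `Static` stands in for any region that existed before the impl match.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Toy region names (hypothetical representation).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Region {
    /// Skolemized region such as `'0`.
    Skolemized(u32),
    /// Fresh variable created for the impl, such as `'$a`.
    ImplVar(u32),
    /// A region that existed before the impl match.
    Static,
}

/// Every region reachable from `start` through the relation edges.
fn taint_set(start: Region, edges: &[(Region, Region)]) -> BTreeSet<Region> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(r) = queue.pop_front() {
        for &(a, b) in edges {
            // Edges are undirected here: related "in any way".
            let next = if a == r {
                b
            } else if b == r {
                a
            } else {
                continue;
            };
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}

/// Passes if the taint set holds only `skol` itself and impl variables.
fn leak_check(skol: Region, edges: &[(Region, Region)]) -> bool {
    taint_set(skol, edges)
        .into_iter()
        .all(|r| r == skol || matches!(r, Region::ImplVar(_)))
}
```

With the single edge `'0 == '$a` the check passes. An edge relating `'0` to a pre-existing region makes it fail.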
|
||||
|
||||
Let's consider a failure case. Imagine we also have a struct
|
||||
|
||||
```rust
|
||||
struct StaticInt;
|
||||
impl Foo<&'static isize> for StaticInt { }
|
||||
```
|
||||
|
||||
We want the obligation `StaticInt : for<'a> Foo<&'a isize>` to be
|
||||
considered unsatisfied. The check begins just as before. `'a` is
|
||||
skolemized to `'0` and the impl trait reference is instantiated to
|
||||
`Foo<&'static isize>`. When we relate those two, we get a constraint
|
||||
like `'static == '0`. This means that the taint set for `'0` is `{'0,
|
||||
'static}`, which fails the leak check.
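Both cases can be tried against the compiler directly. In this sketch `want_hrtb` returns `true` so that the successful case is observable (the original leaves its body elided), and the failing call is left commented out because it is expected to be rejected at compile time.

```rust
#[allow(dead_code)]
trait Foo<X> {
    fn foo(&self, _x: X) {}
}

/// Accepts only types that implement `Foo<&'a isize>` for every `'a`.
fn want_hrtb<T>() -> bool
where
    T: for<'a> Foo<&'a isize>,
{
    true
}

#[allow(dead_code)]
struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt {}

#[allow(dead_code)]
struct StaticInt;
impl Foo<&'static isize> for StaticInt {}

fn any_int_satisfies_bound() -> bool {
    want_hrtb::<AnyInt>()
}

// Expected to fail to compile if uncommented, because `StaticInt`
// implements `Foo` only for `'static`:
//
// fn static_int_satisfies_bound() -> bool {
//     want_hrtb::<StaticInt>()
// }
```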
|
||||
|
||||
## Higher-ranked trait obligations
|
||||
|
||||
Once the basic matching is done, we get to another interesting topic:
|
||||
how to deal with impl obligations. I'll work through a simple example
|
||||
here. Imagine we have the traits `Foo` and `Bar` and an associated impl:
|
||||
|
||||
```rust
|
||||
trait Foo<X> {
|
||||
fn foo(&self, x: X) { }
|
||||
}
|
||||
|
||||
trait Bar<X> {
|
||||
fn bar(&self, x: X) { }
|
||||
}
|
||||
|
||||
impl<X,F> Foo<X> for F
|
||||
where F : Bar<X>
|
||||
{
|
||||
}
|
||||
```
|
||||
|
||||
Now let's say we have an obligation `for<'a> Foo<&'a isize>` and we match
|
||||
this impl. What obligation is generated as a result? We want to get
|
||||
`for<'a> Bar<&'a isize>`, but how does that happen?
|
||||
|
||||
After the matching, we are in a position where we have a skolemized
|
||||
substitution like `X => &'0 isize`. If we apply this substitution to the
|
||||
impl obligations, we get `F : Bar<&'0 isize>`. Obviously this is not
|
||||
directly usable because the skolemized region `'0` cannot leak out of
|
||||
our computation.
|
||||
|
||||
What we do is to create an inverse mapping from the taint set of `'0`
|
||||
back to the original bound region (`'a`, here) that `'0` resulted
|
||||
from. (This is done in `higher_ranked::plug_leaks`). We know that the
|
||||
leak check passed, so this taint set consists solely of the skolemized
|
||||
region itself plus various intermediate region variables. We then walk
|
||||
the trait-reference and convert every region in that taint set back to
|
||||
a late-bound region, so in this case we'd wind up with `for<'a> F :
|
||||
Bar<&'a isize>`.
|
||||
|
||||
# Caching and subtle considerations therewith
|
||||
|
||||
In general we attempt to cache the results of trait selection. This
|
||||
is a somewhat complex process. Part of the reason for this is that we
|
||||
want to be able to cache results even when all the types in the trait
|
||||
reference are not fully known. In that case, it may happen that the
|
||||
trait selection process is also influencing type variables, so we have
|
||||
to be able to not only cache the *result* of the selection process,
|
||||
but *replay* its effects on the type variables.
|
||||
|
||||
## An example
|
||||
|
||||
The high-level idea of how the cache works is that we first replace
|
||||
all unbound inference variables with skolemized versions. Therefore,
|
||||
if we had a trait reference `usize : Foo<$1>`, where `$n` is an unbound
|
||||
inference variable, we might replace it with `usize : Foo<%0>`, where
|
||||
`%n` is a skolemized type. We would then look this up in the cache.
|
||||
If we found a hit, the hit would tell us the immediate next step to
|
||||
take in the selection process: i.e., apply impl #22, or apply where
|
||||
clause `X : Foo<Y>`. Let's say in this case there is no hit.
|
||||
Therefore, we search through impls and where clauses and so forth, and
|
||||
we come to the conclusion that the only possible impl is this one,
|
||||
with def-id 22:
|
||||
|
||||
```rust
|
||||
impl Foo<isize> for usize { ... } // Impl #22
|
||||
```
|
||||
|
||||
We would then record in the cache `usize : Foo<%0> ==>
|
||||
ImplCandidate(22)`. Next we would confirm `ImplCandidate(22)`, which
|
||||
would (as a side-effect) unify `$1` with `isize`.
|
||||
|
||||
Now, at some later time, we might come along and see a `usize :
|
||||
Foo<$3>`. When skolemized, this would yield `usize : Foo<%0>`, just as
|
||||
before, and hence the cache lookup would succeed, yielding
|
||||
`ImplCandidate(22)`. We would confirm `ImplCandidate(22)` which would
|
||||
(as a side-effect) unify `$3` with `isize`.
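A minimal sketch of the key construction and lookup follows. It models only the caching step, under stated assumptions: `Ty`, `skolemize` and `SelectionCache` are hypothetical toy definitions, candidates are plain integers, and the replay of side effects on type variables during confirmation is not modeled.

```rust
use std::collections::HashMap;

/// Toy type representation (hypothetical, not the compiler's).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Usize,
    Isize,
    /// Unbound inference variable `$n`.
    Infer(u32),
    /// Skolemized stand-in `%n`, used only in cache keys.
    Skolemized(u32),
}

/// Cache key for a trait reference `self_ty : Foo<arg>`.
type Key = (Ty, Ty);

/// Replaces inference variables with `%0`, `%1`, ... in order of appearance.
fn skolemize(self_ty: &Ty, arg: &Ty) -> Key {
    let mut seen: Vec<u32> = Vec::new();
    let mut fresh = |ty: &Ty| match ty {
        Ty::Infer(n) => {
            let found = seen.iter().position(|s| s == n);
            let index = match found {
                Some(i) => i,
                None => {
                    seen.push(*n);
                    seen.len() - 1
                }
            };
            Ty::Skolemized(index as u32)
        }
        other => other.clone(),
    };
    (fresh(self_ty), fresh(arg))
}

#[derive(Default)]
struct SelectionCache {
    /// Skolemized trait reference -> impl candidate id.
    map: HashMap<Key, u32>,
    hits: u32,
}

impl SelectionCache {
    /// Returns the cached candidate, or runs `search` and records its result.
    fn select(&mut self, self_ty: &Ty, arg: &Ty, search: impl FnOnce() -> u32) -> u32 {
        let key = skolemize(self_ty, arg);
        if let Some(&candidate) = self.map.get(&key) {
            self.hits += 1;
            return candidate;
        }
        let candidate = search();
        self.map.insert(key, candidate);
        candidate
    }
}
```

Because `$1` and `$3` both become `%0`, the second lookup reuses the candidate recorded by the first without searching again.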
|
||||
|
||||
## Where clauses and the local vs global cache
|
||||
|
||||
One subtle interaction is that the results of trait lookup will vary
|
||||
depending on what where clauses are in scope. Therefore, we actually
|
||||
have *two* caches, a local and a global cache. The local cache is
|
||||
attached to the `ParamEnv` and the global cache attached to the
|
||||
`tcx`. We use the local cache whenever the result might depend on the
|
||||
where clauses that are in scope. The determination of which cache to
|
||||
use is done by the method `pick_candidate_cache` in `select.rs`. At
|
||||
the moment, we use a very simple, conservative rule: if there are any
|
||||
where-clauses in scope, then we use the local cache. We used to try
|
||||
and draw finer-grained distinctions, but that led to a series of
|
||||
annoying and weird bugs like #22019 and #18290. This simple rule seems
|
||||
to be pretty clearly safe and also still retains a very high hit rate
|
||||
(~95% when compiling rustc).
|
||||
|
||||
# Specialization
|
||||
|
||||
Defined in the `specialize` module.
|
||||
|
||||
The basic strategy is to build up a *specialization graph* during
|
||||
coherence checking. Insertion into the graph locates the right place
|
||||
to put an impl in the specialization hierarchy; if there is no right
|
||||
place (due to partial overlap but no containment), you get an overlap
|
||||
error. Specialization is consulted when selecting an impl (of course),
|
||||
and the graph is consulted when propagating defaults down the
|
||||
specialization hierarchy.
|
||||
|
||||
You might expect that the specialization graph would be used during
|
||||
selection -- i.e., when actually performing specialization. This is
|
||||
not done for two reasons:
|
||||
|
||||
- It's merely an optimization: given a set of candidates that apply,
|
||||
we can determine the most specialized one by comparing them directly
|
||||
for specialization, rather than consulting the graph. Given that we
|
||||
also cache the results of selection, the benefit of this
|
||||
optimization is questionable.
|
||||
|
||||
- To build the specialization graph in the first place, we need to use
|
||||
selection (because we need to determine whether one impl specializes
|
||||
another). Dealing with this reentrancy would require some additional
|
||||
mode switch for selection. Given that there seems to be no strong
|
||||
reason to use the graph anyway, we stick with a simpler approach in
|
||||
selection, and use the graph only for propagating default
|
||||
implementations.
|
||||
|
||||
Trait impl selection can succeed even when multiple impls can apply,
|
||||
as long as they are part of the same specialization family. In that
|
||||
case, it returns a *single* impl on success -- this is the most
|
||||
specialized impl *known* to apply. However, if there are any inference
|
||||
variables in play, the returned impl may not be the actual impl we
|
||||
will use at trans time. Thus, we take special care to avoid projecting
|
||||
associated types unless either (1) the associated type does not use
|
||||
`default` and thus cannot be overridden or (2) all input types are
|
||||
known concretely.
|
|
@ -1,165 +0,0 @@
|
|||
# Types and the Type Context
|
||||
|
||||
The `ty` module defines how the Rust compiler represents types
|
||||
internally. It also defines the *typing context* (`tcx` or `TyCtxt`),
|
||||
which is the central data structure in the compiler.
|
||||
|
||||
## The tcx and how it uses lifetimes
|
||||
|
||||
The `tcx` ("typing context") is the central data structure in the
|
||||
compiler. It is the context that you use to perform all manner of
|
||||
queries. The struct `TyCtxt` defines a reference to this shared context:
|
||||
|
||||
```rust
|
||||
tcx: TyCtxt<'a, 'gcx, 'tcx>
|
||||
// -- ---- ----
|
||||
// | | |
|
||||
// | | innermost arena lifetime (if any)
|
||||
// | "global arena" lifetime
|
||||
// lifetime of this reference
|
||||
```
|
||||
|
||||
As you can see, the `TyCtxt` type takes three lifetime parameters.
|
||||
These lifetimes are perhaps the most complex thing to understand about
|
||||
the tcx. During Rust compilation, we allocate most of our memory in
|
||||
**arenas**, which are basically pools of memory that get freed all at
|
||||
once. When you see a reference with a lifetime like `'tcx` or `'gcx`,
|
||||
you know that it refers to arena-allocated data (or data that lives as
|
||||
long as the arenas, anyhow).
|
||||
|
||||
We use two distinct levels of arenas. The outer level is the "global
|
||||
arena". This arena lasts for the entire compilation: so anything you
|
||||
allocate in there is only freed once compilation is basically over
|
||||
(actually, when we shift to executing LLVM).
|
||||
|
||||
To reduce peak memory usage, when we do type inference, we also use an
|
||||
inner level of arena. These arenas get thrown away once type inference
|
||||
is over. This is done because type inference generates a lot of
|
||||
"throw-away" types that are not particularly interesting after type
|
||||
inference completes, so keeping around those allocations would be
|
||||
wasteful.
|
||||
|
||||
Often, we wish to write code that explicitly asserts that it is not
|
||||
taking place during inference. In that case, there is no "local"
|
||||
arena, and all the types that you can access are allocated in the
|
||||
global arena. To express this, the idea is to use the same lifetime
|
||||
for the `'gcx` and `'tcx` parameters of `TyCtxt`. Just to be a touch
|
||||
confusing, we tend to use the name `'tcx` in such contexts. Here is an
|
||||
example:
|
||||
|
||||
```rust
|
||||
fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) {
|
||||
// ---- ----
|
||||
// Using the same lifetime here asserts
|
||||
// that the innermost arena accessible through
|
||||
// this reference *is* the global arena.
|
||||
}
|
||||
```
|
||||
|
||||
In contrast, if you want code that can be used during type inference, then you
|
||||
need to declare a distinct `'gcx` and `'tcx` lifetime parameter:
|
||||
|
||||
```rust
|
||||
fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) {
|
||||
// ---- ----
|
||||
// Using different lifetimes here means that
|
||||
// the innermost arena *may* be distinct
|
||||
// from the global arena (but doesn't have to be).
|
||||
}
|
||||
```
|
||||
|
||||
### Allocating and working with types
|
||||
|
||||
Rust types are represented using the `Ty<'tcx>` defined in the `ty`
|
||||
module (not to be confused with the `Ty` struct from [the HIR]). This
|
||||
is in fact a simple type alias for a reference with `'tcx` lifetime:
|
||||
|
||||
```rust
|
||||
pub type Ty<'tcx> = &'tcx TyS<'tcx>;
|
||||
```
|
||||
|
||||
[the HIR]: ../hir/README.md
|
||||
|
||||
You can basically ignore the `TyS` struct -- you will almost never
|
||||
access it explicitly. We always pass it by reference using the
|
||||
`Ty<'tcx>` alias -- the only exception I think is to define inherent
|
||||
methods on types. Instances of `TyS` are only ever allocated in one of
|
||||
the rustc arenas (never e.g. on the stack).
|
||||
|
||||
One common operation on types is to **match** and see what kinds of
|
||||
types they are. This is done by doing `match ty.sty`, sort of like this:
|
||||
|
||||
```rust
|
||||
fn test_type<'tcx>(ty: Ty<'tcx>) {
|
||||
match ty.sty {
|
||||
ty::TyArray(elem_ty, len) => { ... }
|
||||
...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `sty` field (the origin of this name is unclear to me; perhaps
|
||||
structural type?) is of type `TypeVariants<'tcx>`, which is an enum
|
||||
defining all of the different kinds of types in the compiler.
|
||||
|
||||
> NB: inspecting the `sty` field on types during type inference can be
|
||||
> risky, as there may be inference variables and other things to
|
||||
> consider, or sometimes types are not yet known that will become
|
||||
> known later.
|
||||
|
||||
To allocate a new type, you can use the various `mk_` methods defined
|
||||
on the `tcx`. These have names that correpond mostly to the various kinds
|
||||
of type variants. For example:
|
||||
|
||||
```rust
|
||||
let array_ty = tcx.mk_array(elem_ty, len * 2);
|
||||
```
|
||||
|
||||
These methods all return a `Ty<'tcx>` -- note that the lifetime you
get back is the lifetime of the innermost arena that this `tcx` has
access to. In fact, types are always canonicalized and interned (so we
never allocate exactly the same type twice) and are always allocated
in the outermost arena where they can be (so, if they do not contain
any inference variables or other "temporary" types, they will be
allocated in the global arena). However, the lifetime `'tcx` is always
a safe approximation, so that is what you get back.

> NB. Because types are interned, it is possible to compare them for
> equality efficiently using `==` -- however, this is almost never what
> you want to do unless you happen to be hashing and looking for
> duplicates. This is because often in Rust there are multiple ways to
> represent the same type, particularly once inference is involved. If
> you are going to be testing for type equality, you probably need to
> start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing
`tcx.types.bool`, `tcx.types.char`, etc. (see `CommonTypes` for more).

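The effect of interning can be illustrated with a small toy model. This is only a sketch under loud assumptions: `Interner`, `Kind` and the `mk_*` methods below are invented for this example and merely echo rustc's naming, and the toy leaks its allocations with `Box::leak` (so `'static` stands in for `'tcx`) instead of using arenas. It shows that asking for the same type twice yields the same allocation.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Toy stand-in for the enum of type kinds (hypothetical, three kinds only).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Kind {
    Bool,
    Char,
    Array(Ty, u64),
}

// Toy stand-in for `TyS`.
#[derive(Debug, PartialEq, Eq, Hash)]
struct TyS {
    sty: Kind,
}

// `'static` replaces `'tcx` because this toy leaks instead of using an arena.
type Ty = &'static TyS;

// Hypothetical interner: maps each kind to its single canonical allocation.
struct Interner {
    map: RefCell<HashMap<Kind, Ty>>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: RefCell::new(HashMap::new()) }
    }

    // Return the existing allocation for `sty`, or create and remember one.
    fn intern(&self, sty: Kind) -> Ty {
        let mut map = self.map.borrow_mut();
        if let Some(&ty) = map.get(&sty) {
            return ty;
        }
        let ty: Ty = Box::leak(Box::new(TyS { sty }));
        map.insert(sty, ty);
        ty
    }

    fn mk_bool(&self) -> Ty {
        self.intern(Kind::Bool)
    }

    fn mk_char(&self) -> Ty {
        self.intern(Kind::Char)
    }

    fn mk_array(&self, elem: Ty, len: u64) -> Ty {
        self.intern(Kind::Array(elem, len))
    }
}

fn main() {
    let tcx = Interner::new();
    let a = tcx.mk_array(tcx.mk_bool(), 4);
    let b = tcx.mk_array(tcx.mk_bool(), 4);
    // Interning means the second request reuses the first allocation.
    assert!(std::ptr::eq(a, b));
    // Different types get different allocations.
    assert!(!std::ptr::eq(tcx.mk_bool(), tcx.mk_char()));
}
```

Note that this toy only models the "never allocate the same type twice" property; it says nothing about inference variables, which is why the caveat about comparing types with `==` still applies in the real compiler.
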
### Beyond types: Other kinds of arena-allocated data structures

In addition to types, there are a number of other arena-allocated data
structures that you can allocate, and which are found in this
module. Here are a few examples:

- `Substs`, allocated with `mk_substs` -- this will intern a slice of types, often used to
  specify the values to be substituted for generics (e.g., `HashMap<i32, u32>`
  would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- `TraitRef`, typically passed by value -- a **trait reference**
  consists of a reference to a trait along with its various type
  parameters (including `Self`), like `i32: Display` (here, the def-id
  would reference the `Display` trait, and the substs would contain
  `i32`).
- `Predicate` defines something the trait system has to prove (see `traits` module).

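As a rough illustration of how substs and trait references fit together, here is a toy model. Everything in it is a stand-in invented for this sketch: the two-variant `Ty` enum, the `DefId(7)` value, and the convention that `Self` is stored as the first subst are assumptions, not rustc's actual definitions.

```rust
// Toy stand-ins; none of these are rustc's real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    I32,
    U32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DefId(u32);

// Substs modelled as a borrowed slice of types.
type Substs<'tcx> = &'tcx [Ty];

// A trait reference: which trait, plus the types substituted for its
// parameters. This sketch assumes `Self` comes first in the slice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TraitRef<'tcx> {
    def_id: DefId,
    substs: Substs<'tcx>,
}

impl<'tcx> TraitRef<'tcx> {
    fn self_ty(&self) -> Ty {
        self.substs[0]
    }
}

fn main() {
    // `HashMap<i32, u32>`: the generics are filled in by a two-element slice.
    let hash_map_substs: Substs = &[Ty::I32, Ty::U32];
    assert_eq!(hash_map_substs.len(), 2);

    // `i32: Display`: the def-id names the trait, the substs hold `i32`.
    let display = DefId(7); // hypothetical id for the `Display` trait
    let display_substs = [Ty::I32];
    let trait_ref = TraitRef { def_id: display, substs: &display_substs };
    assert_eq!(trait_ref.def_id, display);
    assert_eq!(trait_ref.self_ty(), Ty::I32);
}
```
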
### Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:

```rust
use ty::{self, Ty, TyCtxt};
```

In particular, since they are so common, the `Ty` and `TyCtxt` types
are imported directly. Other types are often referenced with an
explicit `ty::` prefix (e.g., `ty::TraitRef<'tcx>`). But some modules
choose to import a larger or smaller set of names explicitly.