start to document MIR borrow check #190

Merged
merged 6 commits into from
Sep 11, 2018

2 changes: 1 addition & 1 deletion .travis.yml
@@ -3,7 +3,7 @@ cache:
- cargo
before_install:
- shopt -s globstar
- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md
- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md
install:
- source ~/.cargo/env || true
- bash ci/install.sh

2 changes: 1 addition & 1 deletion ci/check_line_lengths.sh
@@ -2,7 +2,7 @@

if [ "$1" == "--help" ]; then
echo 'Usage:'
echo ' MAX_LINE_LENGTH=80' "$0" 'src/**/*.md'
echo ' MAX_LINE_LENGTH=100' "$0" 'src/**/*.md'
exit 1
fi

7 changes: 5 additions & 2 deletions src/SUMMARY.md
@@ -53,9 +53,12 @@
- [MIR construction](./mir/construction.md)
- [MIR visitor and traversal](./mir/visitor.md)
- [MIR passes: getting the MIR for a function](./mir/passes.md)
- [MIR borrowck](./mir/borrowck.md)
- [MIR-based region checking (NLL)](./mir/regionck.md)
- [MIR optimizations](./mir/optimizations.md)
- [The borrow checker](./borrow_check.md)
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
- [Move paths](./borrow_check/moves_and_initialization/move_paths.md)
- [MIR type checker](./borrow_check/type_check.md)
- [Region inference](./borrow_check/region_inference.md)
- [Constant evaluation](./const-eval.md)
- [miri const evaluator](./miri.md)
- [Parameter Environments](./param_env.md)

4 changes: 2 additions & 2 deletions src/appendix/glossary.md
@@ -40,7 +40,7 @@ MIR | the Mid-level IR that is created after type-checking
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](./traits/associated-types.html#normalize)
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
NLL | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
NLL | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system ([see more](traits/resolution.html))
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref)
@@ -53,7 +53,7 @@ rib | a data structure in the name resolver that keeps trac
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe Rust) force a value into a variable of the wrong type. (see "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)

63 changes: 63 additions & 0 deletions src/borrow_check.md
@@ -0,0 +1,63 @@
# MIR borrow check

The borrow check is Rust's "secret sauce" – it is tasked with
enforcing a number of properties:

- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through
the reference).
- That you can't mutate a place while it is borrowed by a shared reference.
- etc.

At the time of this writing, the code is in a state of transition. The
"main" borrow checker still works by processing [the HIR](hir.html),
but that is being phased out in favor of the MIR-based borrow checker.
Accordingly, this documentation focuses on the new, MIR-based borrow
checker.

Doing borrow checking on MIR has several advantages:

- The MIR is *far* less complex than the HIR; the radical desugaring
helps prevent bugs in the borrow checker. (If you're curious, you
can see
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
which are regions derived from the control-flow graph.

[47366]: https://github.com/rust-lang/rust/issues/47366
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
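As a small illustration of the second point (a hand-written example, not one from the rustc test suite), the function below is accepted by the MIR-based borrow checker. The shared borrow held by `first` is last used on the line that copies its value, so it no longer conflicts with the mutable borrow taken by `v.push`; a lexical checker would have considered `first` borrowed until the end of the block and reported an error.

```rust
/// Accepted with non-lexical lifetimes: the shared borrow held by `first`
/// ends after its last use, so the later mutable borrow by `push` is fine.
fn nll_example() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow of `v` starts here
    let copied = *first; // last use of `first`; the borrow can end here
    v.push(copied + 3); // mutable borrow of `v`: no conflict under NLL
    v
}
```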

## Major phases of the borrow checker

The borrow checker source is found in
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
the [`mir_borrowck`] query.

[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html

- We first create a **local copy** of the MIR. In the coming steps,
we will modify this copy in place, updating its types and other data
to refer to the new regions that we are computing.
- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
the MIR with fresh [inference variables](./appendix/glossary.html).
- Next, we perform a number of
[dataflow analyses](./appendix/background.html#dataflow) that
compute what data is moved and when.
- We then do a [second type check](borrow_check/type_check.html) across the MIR:
the purpose of this type check is to determine all of the constraints between
different regions.
- Next, we do [region inference](borrow_check/region_inference.html), which computes
the value of each region (basically, a set of points in the control-flow graph).
- At this point, we can compute the "borrows in scope" at each point.
- Finally, we do a second walk over the MIR, looking at the actions it
does and reporting errors. For example, if we see a statement like
`*a + 1`, then we would check that the variable `a` is initialized
and that it is not mutably borrowed, as either of those would
require an error to be reported.
- Doing this check requires the results of all the previous analyses.
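To make the last step more concrete, here is a hypothetical sketch of the check for a read such as `*a + 1`. The names `FlowState`, `AccessError`, and `check_read` are invented for this illustration, and the results of the earlier analyses are collapsed into two plain sets of variable names; the real borrow checker works on MIR places and dataflow results.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified summary of the earlier analyses at one program
/// point: which locals are initialized and which are mutably borrowed.
struct FlowState {
    initialized: HashSet<&'static str>,
    mutably_borrowed: HashSet<&'static str>,
}

/// Errors this toy checker can report for a read of a local.
#[derive(Debug, PartialEq)]
enum AccessError {
    Uninitialized(&'static str),
    MutablyBorrowed(&'static str),
}

/// Checks a read of `local`, as needed for a statement like `*a + 1`.
fn check_read(state: &FlowState, local: &'static str) -> Result<(), AccessError> {
    // Reading an uninitialized (or moved-from) variable is an error.
    if !state.initialized.contains(local) {
        return Err(AccessError::Uninitialized(local));
    }
    // Reading a variable while it is mutably borrowed is also an error.
    if state.mutably_borrowed.contains(local) {
        return Err(AccessError::MutablyBorrowed(local));
    }
    Ok(())
}
```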

[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html
50 changes: 50 additions & 0 deletions src/borrow_check/moves_and_initialization.md
@@ -0,0 +1,50 @@
# Tracking moves and initialization

Part of the borrow checker's job is to track which variables are
"initialized" at any given point in time -- this also requires
figuring out where moves occur and tracking those.

## Initialization and moves

From a user's perspective, initialization -- giving a variable some
value -- and moves -- transferring ownership to another place -- might
seem like distinct topics. Indeed, our borrow checker error messages
often talk about them differently. But **within the borrow checker**,
they are not nearly as separate. Roughly speaking, the borrow checker
tracks the set of "initialized places" at any point in the source
code. Assigning to a previously uninitialized local variable adds it
to that set; moving from a local variable removes it from that set.

Consider this example:

```rust,ignore
fn foo() {
let a: Vec<u32>;

// a is not initialized yet

a = vec![22];

// a is initialized here

std::mem::drop(a); // a is moved here

// a is no longer initialized here

let l = a.len(); //~ ERROR
}
```

Here you can see that `a` starts off as uninitialized; once it is
assigned, it becomes initialized. But when `drop(a)` is called, that
moves `a` into the call, and hence it becomes uninitialized again.
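The "set of initialized places" idea can be modelled with a deliberately tiny sketch that tracks whole local variables by name. This is only an illustration: `InitState`, `assign`, `move_out`, and `replay_foo` are made-up names, and the real borrow checker computes this information with dataflow analysis over move paths rather than a `HashSet<String>`.

```rust
use std::collections::HashSet;

/// Toy model: the set of initialized locals at the current point.
#[derive(Default)]
struct InitState {
    initialized: HashSet<String>,
}

impl InitState {
    /// Assigning to a local (`a = ...`) adds it to the set.
    fn assign(&mut self, local: &str) {
        self.initialized.insert(local.to_string());
    }

    /// Moving out of a local (e.g. `drop(a)`) removes it from the set.
    fn move_out(&mut self, local: &str) {
        self.initialized.remove(local);
    }

    /// A use of the local is only legal if it is currently in the set.
    fn is_initialized(&self, local: &str) -> bool {
        self.initialized.contains(local)
    }
}

/// Replays the `foo` example, recording whether `a` is initialized
/// after each step.
fn replay_foo() -> Vec<bool> {
    let mut state = InitState::default();
    let mut observed = Vec::new();
    // `let a: Vec<u32>;` declares `a` without initializing it.
    observed.push(state.is_initialized("a"));
    // `a = vec![22];` initializes `a`.
    state.assign("a");
    observed.push(state.is_initialized("a"));
    // `std::mem::drop(a);` moves out of `a`.
    state.move_out("a");
    observed.push(state.is_initialized("a"));
    observed
}
```

Calling `replay_foo()` yields `[false, true, false]`: uninitialized after the declaration, initialized after the assignment, and uninitialized again after the move, which is why `a.len()` is rejected.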

## Subsections

To make it easier to peruse, this section is broken into a number of
subsections:

- [Move paths](./moves_and_initialization/move_paths.html): the
*move path* concept that we use to track which local variables (or parts of
local variables, in some cases) are initialized.
- TODO *Rest not yet written* =)
128 changes: 128 additions & 0 deletions src/borrow_check/moves_and_initialization/move_paths.md
@@ -0,0 +1,128 @@
# Move paths

In reality, it's not enough to track initialization at the granularity
of local variables. Rust also allows us to do moves and initialization
at the field granularity:

```rust,ignore
fn foo() {
let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);

// a.0 and a.1 are both initialized

let b = a.0; // moves a.0

// a.0 is not initialized, but a.1 still is

let c = a.0; // ERROR
let d = a.1; // OK
}
```
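If the erroneous `let c = a.0;` line is removed, what remains is accepted by rustc, which is a quick way to see that the two fields are tracked independently (the function name `partial_moves` is made up for this illustration):

```rust
/// Only the legal part of the example: moving `a.0` and then `a.1`
/// compiles, because each field is initialized and moved separately.
fn partial_moves() -> (Vec<u32>, Vec<u32>) {
    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
    let b = a.0; // moves a.0; a.1 is still initialized
    let d = a.1; // OK: moves a.1
    (b, d)
}
```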

To handle this, we track initialization at the granularity of a **move
path**. A [`MovePath`] represents some location that the user can
initialize, move, etc. So e.g. there is a move-path representing the
local variable `a`, and there is a move-path representing `a.0`. Move
paths roughly correspond to the concept of a [`Place`] from MIR, but
they are indexed in ways that enable us to do move analysis more
efficiently.

[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html

## Move path indices

Although there is a [`MovePath`] data structure, move paths are never
referenced directly. Instead, all the code passes around *indices* of
type
[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html). If
you need to get information about a move path, you use this index with
the [`move_paths` field of the `MoveData`][move_paths]. For example,
to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
access the [`MovePath::place`] field like so:

```rust,ignore
move_data.move_paths[mpi].place
```

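[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
[`MovePathIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html
[`MoveData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html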
[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
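Passing indices around means a move path can be referred to by a small `Copy` value. The following toy version shows the pattern behind `move_data.move_paths[mpi].place`; every type here is a simplified stand-in rather than rustc's definition (rustc uses its own index-typed vector, `IndexVec`, and its `place` is a MIR place rather than a `String`).

```rust
use std::ops::Index;

/// Newtype index (a toy analogue of `MovePathIndex`).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct MovePathIndex(usize);

/// Toy move path; the "place" is just a string such as "a.0".
#[derive(Debug)]
struct MovePath {
    place: String,
}

/// A vector that can only be indexed by `MovePathIndex`.
#[derive(Default)]
struct MovePaths(Vec<MovePath>);

impl MovePaths {
    /// Appends a new move path and returns its index.
    fn push(&mut self, place: &str) -> MovePathIndex {
        self.0.push(MovePath { place: place.to_string() });
        MovePathIndex(self.0.len() - 1)
    }
}

impl Index<MovePathIndex> for MovePaths {
    type Output = MovePath;
    fn index(&self, mpi: MovePathIndex) -> &MovePath {
        &self.0[mpi.0]
    }
}

/// Toy `MoveData` holding only the `move_paths` table.
#[derive(Default)]
struct MoveData {
    move_paths: MovePaths,
}
```

The newtype keeps a move-path index from being confused with any other kind of `usize` index, while still being cheap to copy and store.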

## Building move paths

One of the first things we do in the MIR borrow check is to construct
the set of move paths. This is done as part of the
[`MoveData::gather_moves`] function. This function uses a MIR visitor
called [`Gatherer`] to walk the MIR and look at how each [`Place`]
within is accessed. For each such [`Place`], it constructs a
corresponding [`MovePathIndex`]. It also records when/where that
particular move path is moved/initialized, but we'll get to that in a
later section.

[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
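As a rough sketch of what "constructing the set of move paths" involves, the toy code below walks a list of places (written as strings like `a.0` instead of MIR [`Place`]s) and assigns each distinct one an index, creating the parent path first so that `a.0` can point back to `a`. It is not the real [`Gatherer`]: `MovePathTable`, `intern`, and `gather` are hypothetical names, and recording *where* moves and initializations happen is left out entirely.

```rust
use std::collections::HashMap;

/// Toy move-path table: each distinct place string gets one index, and
/// every path also records the index of its parent (e.g. "a" for "a.0").
#[derive(Default)]
struct MovePathTable {
    places: Vec<String>,
    parents: Vec<Option<usize>>,
    lookup: HashMap<String, usize>,
}

impl MovePathTable {
    /// Returns the index for `place`, creating it (and its parents) on first use.
    fn intern(&mut self, place: &str) -> usize {
        if let Some(&idx) = self.lookup.get(place) {
            return idx;
        }
        // The parent of "a.0.1" is "a.0"; a bare local like "a" has none.
        let parent = place.rfind('.').map(|dot| self.intern(&place[..dot]));
        let idx = self.places.len();
        self.places.push(place.to_string());
        self.parents.push(parent);
        self.lookup.insert(place.to_string(), idx);
        idx
    }
}

/// Visits every place used by some toy "MIR" and interns it.
fn gather(places_used: &[&str]) -> MovePathTable {
    let mut table = MovePathTable::default();
    for &place in places_used {
        table.intern(place);
    }
    table
}
```

With `gather(&["a.0", "a.1", "a.0"])`, only three paths are created (`a`, `a.0`, `a.1`), and both field paths record `a` as their parent.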

### Illegal move paths

We don't actually create a move-path for **every** [`Place`] that gets
used. In particular, if it is illegal to move from a [`Place`], then
there is no need for a [`MovePathIndex`]. Some examples:

- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
for static variables.
- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
there would be no move-path for `foo[1]`.
- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`,
there would be no move-path for `*foo`.

These rules are enforced by the [`move_path_for`] function, which
converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
those just discussed, the function returns an `Err`. This in turn
means we don't have to bother tracking whether those places are
initialized (which lowers overhead).

[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
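The three rules above can be sketched with a toy place representation: a base (local or static) plus a list of projections. `Base`, `Projection`, `IllegalMove`, and `toy_move_path_for` are hypothetical; the real [`move_path_for`] operates on MIR places and produces move-path indices. The toy treats every `Deref` as going through a borrowed reference (moving out of a `Box` is allowed in Rust and is not modelled here).

```rust
/// Toy place base (hypothetical, simplified).
#[derive(Debug)]
enum Base {
    Local(&'static str),
    Static(&'static str),
}

/// Toy projections applied to a base, outermost last.
#[derive(Debug)]
enum Projection {
    Field(usize), // `foo.0`
    Index,        // `foo[i]` on an array
    Deref,        // `*foo` through a borrowed reference
}

/// Reasons the toy refuses to create a move path.
#[derive(Debug, PartialEq)]
enum IllegalMove {
    Static,
    ArrayElement,
    BorrowedContent,
}

/// Returns a (toy) path string if the place may have a move path,
/// or an error for the illegal cases listed above.
fn toy_move_path_for(base: &Base, projections: &[Projection]) -> Result<String, IllegalMove> {
    let mut path = match base {
        Base::Static(_) => return Err(IllegalMove::Static),
        Base::Local(name) => name.to_string(),
    };
    for proj in projections {
        match proj {
            Projection::Field(i) => path = format!("{}.{}", path, i),
            Projection::Index => return Err(IllegalMove::ArrayElement),
            Projection::Deref => return Err(IllegalMove::BorrowedContent),
        }
    }
    Ok(path)
}
```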

## Looking up a move-path

If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you
can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
of [`MoveData`]. There are two different methods:

[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup

- [`find_local`], which takes a [`mir::Local`] representing a local
variable. This is the easier method, because we **always** create a
[`MovePathIndex`] for every local variable.
- [`find`], which takes an arbitrary [`Place`]. This method is a bit
more annoying to use, precisely because we don't have a
[`MovePathIndex`] for **every** [`Place`] (as we just discussed in
the "illegal move paths" section). Therefore, [`find`] returns a
[`LookupResult`] indicating the closest path it was able to find
that exists (e.g., for `foo[1]`, it might return just the path for
`foo`).

[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find
[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local
[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html
[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html
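The fallback behaviour of [`find`] can be illustrated with a toy lookup over place strings, where both `.` and `[` start a projection so that `foo[1]` has `foo` as its prefix. The enum below is only loosely modelled on [`LookupResult`]; it and the `find` function here are written for illustration and are not claimed to match rustc's definitions.

```rust
use std::collections::HashMap;

/// Toy lookup result: an exact match, or the closest existing
/// ancestor path (if there is one).
#[derive(Debug, PartialEq)]
enum LookupResult {
    Exact(usize),
    Parent(Option<usize>),
}

/// Looks up a toy place like "a.b.c" or "foo[1]", falling back to the
/// longest prefix that does have a move path.
fn find(paths: &HashMap<&str, usize>, place: &str) -> LookupResult {
    if let Some(&idx) = paths.get(place) {
        return LookupResult::Exact(idx);
    }
    let mut prefix = place;
    // Strip one projection at a time until a known path is found.
    while let Some(pos) = prefix.rfind(|c: char| c == '.' || c == '[') {
        prefix = &prefix[..pos];
        if let Some(&idx) = paths.get(prefix) {
            return LookupResult::Parent(Some(idx));
        }
    }
    LookupResult::Parent(None)
}
```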

## Cross-references

As we noted above, move-paths are stored in a big vector and
referenced via their [`MovePathIndex`]. However, within this vector,
they are also structured into a tree. So for example if you have the
[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path
`a.b`. You can also iterate over all child paths: so, from `a.b`,
you might iterate to find the path `a.b.c` (here you are iterating
just over the paths that are **actually referenced** in the source,
not all **possible** paths that could have been referenced). These
references are used for example in the [`has_any_child_of`] function,
which checks whether the dataflow results contain a value for the
given move-path (e.g., `a.b`) or any child of that move-path (e.g.,
`a.b.c`).

[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of
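The tree structure can live inside the flat vector itself by giving each path a parent index, a first-child index, and a next-sibling index. The sketch below is a simplified, hypothetical version of that layout (plain `usize` indices and `&'static str` places), together with a recursive search in the spirit of [`has_any_child_of`]. Here `set` stands in for the dataflow results, and the toy returns the first matching path it finds, if any.

```rust
/// Toy move path linked into a tree by indices into `MoveTree::paths`.
struct MovePath {
    place: &'static str,
    parent: Option<usize>,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

#[derive(Default)]
struct MoveTree {
    paths: Vec<MovePath>,
}

impl MoveTree {
    /// Adds a path; a new child is pushed onto the front of its parent's child list.
    fn add(&mut self, place: &'static str, parent: Option<usize>) -> usize {
        let idx = self.paths.len();
        let next_sibling = parent.and_then(|p| self.paths[p].first_child);
        self.paths.push(MovePath { place, parent, first_child: None, next_sibling });
        if let Some(p) = parent {
            self.paths[p].first_child = Some(idx);
        }
        idx
    }

    fn parent(&self, mpi: usize) -> Option<usize> {
        self.paths[mpi].parent
    }

    /// Direct children of `mpi`, following the sibling links.
    fn children(&self, mpi: usize) -> Vec<usize> {
        let mut out = Vec::new();
        let mut child = self.paths[mpi].first_child;
        while let Some(c) = child {
            out.push(c);
            child = self.paths[c].next_sibling;
        }
        out
    }

    /// Returns a path in the subtree rooted at `mpi` (including `mpi`
    /// itself) whose entry in `set` is true, if there is one.
    fn has_any_child_of(&self, set: &[bool], mpi: usize) -> Option<usize> {
        if set[mpi] {
            return Some(mpi);
        }
        for c in self.children(mpi) {
            if let Some(found) = self.has_any_child_of(set, c) {
                return Some(found);
            }
        }
        None
    }
}
```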

4 changes: 2 additions & 2 deletions src/mir/regionck.md → src/borrow_check/region_inference.md
@@ -1,11 +1,11 @@
# MIR-based region checking (NLL)
# Region inference (NLL)

The MIR-based region checking code is located in
[the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course,
stands for "non-lexical lifetimes", a term that will hopefully be
deprecated once they become the standard kind of lifetime.)

[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll
[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html

The MIR-based region analysis consists of two major functions:

10 changes: 10 additions & 0 deletions src/borrow_check/type_check.md
@@ -0,0 +1,10 @@
# The MIR type-check

A key component of the borrow check is the
[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html).
This check walks the MIR and does a complete "type check" -- the same
kind you might find in any other language. In the process of doing
this type-check, we also uncover the region constraints that apply to
the program.

TODO -- elaborate further? Maybe? :)
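The constraints uncovered by the type-check are then solved by [region inference](./region_inference.html). As a rough mental model only (not rustc's actual solver, which handles considerably more), each region can be pictured as a set of numbered control-flow points, and an outlives constraint `'a: 'b` as requiring `'a` to contain every point of `'b`. Solving then means growing the sets until every constraint holds; `RegionValue`, `Outlives`, and `solve` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Toy region value: the set of control-flow points in the region.
type RegionValue = BTreeSet<usize>;

/// Outlives constraint `'sup: 'sub`: region `sup` must contain
/// every point of region `sub`.
struct Outlives {
    sup: usize,
    sub: usize,
}

/// Grows the region values until all constraints hold (a fixed point).
fn solve(mut values: Vec<RegionValue>, constraints: &[Outlives]) -> Vec<RegionValue> {
    let mut changed = true;
    while changed {
        changed = false;
        for c in constraints {
            // Copy the points out first so we can then mutate `values[c.sup]`.
            let sub_points: Vec<usize> = values[c.sub].iter().copied().collect();
            for p in sub_points {
                if values[c.sup].insert(p) {
                    changed = true;
                }
            }
        }
    }
    values
}
```

With constraints `'0: '1` and `'1: '2`, the points of `'2` flow into `'1` and then into `'0`, so if `'0`, `'1`, and `'2` start as `{1}`, `{2, 3}`, and `{4}`, `'0` ends up as `{1, 2, 3, 4}`.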
59 changes: 0 additions & 59 deletions src/mir/borrowck.md

This file was deleted.