Auto merge of #134268 - lqd:polonius-next, r=jackh726

Foundations of location-sensitive polonius

I'd like to land the prototype I'm describing in the [polonius project goal](https://github.com/rust-lang/rust-project-goals/issues/118). It still is incomplete and naive and terrible but it's working "well enough" to consider landing.

I'd also like to make review easier by not opening a huge PR, but have a couple small-ish ones (the +/- line change summary of this PR looks big, but >80% is moving datalog to a single place).

This PR starts laying the foundation for that work:
- it refactors and collects 99% of the old datalog fact gen, which was spread around everywhere, into a single dedicated module. It's still present at 3 small places (one of which we should revert anyways) that are kinda deep within localized components and are not as easily extractable into the rest of fact gen, so it's fine for now.
- starts introducing the localized constraints, the building blocks of the naive way of implementing the location-sensitive analysis in-tree, which is roughly sketched out in https://smallcultfollowing.com/babysteps/blog/2023/09/22/polonius-part-1/ and https://smallcultfollowing.com/babysteps/blog/2023/09/29/polonius-part-2/ but with a different vibe than per-point environments described in these posts, just `r1@p: r2@q` constraints.
- sets up the skeleton of generating these localized constraints: converting NLL typeck constraints, and creating liveness constraints
- introduces the polonius dual to NLL MIR to help development and debugging. It doesn't do much currently but is a way to see these localized constraints: it's an NLL MIR dump + a dumb listing of the constraints, that can be dumped with `-Zdump-mir=polonius -Zpolonius=next`. Its current state is not intended to be a long-term thing, just for testing purposes -- I will replace its contents in the future with a different approach (an HTML+js file where we can more easily explore/filter/trace these constraints and loan reachability, have mermaid graphs of the usual graphviz dumps, etc).

I've started documenting the approach in this PR, I'll add more in the future. It's quite simple, and should be very clear when more constraints are introduced anyways.

r? `@matthewjasper`

Best reviewed per commit so that the datalog move is less bothersome to read, but if you'd prefer we separate that into a different PR, I can do that (and michael has offered to review these more mechanical changes if it'd help).
This commit is contained in:
bors 2024-12-21 21:15:31 +00:00
commit 426d173423
8 changed files with 409 additions and 34 deletions

View file

@ -202,6 +202,7 @@ fn do_mir_borrowck<'tcx>(
polonius_output,
opt_closure_req,
nll_errors,
localized_outlives_constraints,
} = nll::compute_regions(
&infcx,
free_regions,
@ -315,6 +316,16 @@ fn do_mir_borrowck<'tcx>(
mbcx.report_move_errors();
// If requested, dump polonius MIR.
polonius::dump_polonius_mir(
&infcx,
body,
&regioncx,
&borrow_set,
localized_outlives_constraints,
&opt_closure_req,
);
// For each non-user used mutable variable, check if it's been assigned from
// a user-declared local. If so, then put that local into the used_mut set.
// Note that this set is expected to be small - only upvars from closures

View file

@ -29,6 +29,7 @@ use crate::consumers::ConsumerOptions;
use crate::diagnostics::RegionErrors;
use crate::facts::{AllFacts, AllFactsExt, RustcFacts};
use crate::location::LocationTable;
use crate::polonius::LocalizedOutlivesConstraintSet;
use crate::region_infer::RegionInferenceContext;
use crate::type_check::{self, MirTypeckResults};
use crate::universal_regions::UniversalRegions;
@ -45,6 +46,9 @@ pub(crate) struct NllOutput<'tcx> {
pub polonius_output: Option<Box<PoloniusOutput>>,
pub opt_closure_req: Option<ClosureRegionRequirements<'tcx>>,
pub nll_errors: RegionErrors<'tcx>,
/// When using `-Zpolonius=next`: the localized typeck and liveness constraints.
pub localized_outlives_constraints: Option<LocalizedOutlivesConstraintSet>,
}
/// Rewrites the regions in the MIR to use NLL variables, also scraping out the set of universal
@ -135,6 +139,15 @@ pub(crate) fn compute_regions<'a, 'tcx>(
elements,
);
// If requested for `-Zpolonius=next`, convert NLL constraints to localized outlives
// constraints.
let localized_outlives_constraints =
if infcx.tcx.sess.opts.unstable_opts.polonius.is_next_enabled() {
Some(polonius::create_localized_constraints(&mut regioncx, body))
} else {
None
};
// If requested: dump NLL facts, and run legacy polonius analysis.
let polonius_output = all_facts.as_ref().and_then(|all_facts| {
if infcx.tcx.sess.opts.unstable_opts.nll_facts {
@ -175,6 +188,7 @@ pub(crate) fn compute_regions<'a, 'tcx>(
polonius_output,
opt_closure_req: closure_region_requirements,
nll_errors,
localized_outlives_constraints,
}
}
@ -215,40 +229,7 @@ pub(super) fn dump_nll_mir<'tcx>(
&0,
body,
|pass_where, out| {
match pass_where {
// Before the CFG, dump out the values for each region variable.
PassWhere::BeforeCFG => {
regioncx.dump_mir(tcx, out)?;
writeln!(out, "|")?;
if let Some(closure_region_requirements) = closure_region_requirements {
writeln!(out, "| Free Region Constraints")?;
for_each_region_constraint(tcx, closure_region_requirements, &mut |msg| {
writeln!(out, "| {msg}")
})?;
writeln!(out, "|")?;
}
if borrow_set.len() > 0 {
writeln!(out, "| Borrows")?;
for (borrow_idx, borrow_data) in borrow_set.iter_enumerated() {
writeln!(
out,
"| {:?}: issued at {:?} in {:?}",
borrow_idx, borrow_data.reserve_location, borrow_data.region
)?;
}
writeln!(out, "|")?;
}
}
PassWhere::BeforeLocation(_) => {}
PassWhere::AfterTerminator(_) => {}
PassWhere::BeforeBlock(_) | PassWhere::AfterLocation(_) | PassWhere::AfterCFG => {}
}
Ok(())
emit_nll_mir(tcx, regioncx, closure_region_requirements, borrow_set, pass_where, out)
},
options,
);
@ -266,6 +247,51 @@ pub(super) fn dump_nll_mir<'tcx>(
};
}
/// Produces the actual NLL MIR sections to emit during the dumping process.
pub(crate) fn emit_nll_mir<'tcx>(
tcx: TyCtxt<'tcx>,
regioncx: &RegionInferenceContext<'tcx>,
closure_region_requirements: &Option<ClosureRegionRequirements<'tcx>>,
borrow_set: &BorrowSet<'tcx>,
pass_where: PassWhere,
out: &mut dyn io::Write,
) -> io::Result<()> {
match pass_where {
// Before the CFG, dump out the values for each region variable.
PassWhere::BeforeCFG => {
regioncx.dump_mir(tcx, out)?;
writeln!(out, "|")?;
if let Some(closure_region_requirements) = closure_region_requirements {
writeln!(out, "| Free Region Constraints")?;
for_each_region_constraint(tcx, closure_region_requirements, &mut |msg| {
writeln!(out, "| {msg}")
})?;
writeln!(out, "|")?;
}
if borrow_set.len() > 0 {
writeln!(out, "| Borrows")?;
for (borrow_idx, borrow_data) in borrow_set.iter_enumerated() {
writeln!(
out,
"| {:?}: issued at {:?} in {:?}",
borrow_idx, borrow_data.reserve_location, borrow_data.region
)?;
}
writeln!(out, "|")?;
}
}
PassWhere::BeforeLocation(_) => {}
PassWhere::AfterTerminator(_) => {}
PassWhere::BeforeBlock(_) | PassWhere::AfterLocation(_) | PassWhere::AfterCFG => {}
}
Ok(())
}
#[allow(rustc::diagnostic_outside_of_impl)]
#[allow(rustc::untranslatable_diagnostic)]
pub(super) fn dump_annotation<'tcx, 'infcx>(

View file

@ -0,0 +1,45 @@
use rustc_middle::ty::RegionVid;
use rustc_mir_dataflow::points::PointIndex;
/// A localized outlives constraint reifies the CFG location where the outlives constraint holds,
/// within the origins themselves as if they were different from point to point: from `a: b`
/// outlives constraints to `a@p: b@p`, where `p` is the point in the CFG.
///
/// This models two sources of constraints:
/// - constraints that traverse the subsets between regions at a given point, `a@p: b@p`. These
/// depend on typeck constraints generated via assignments, calls, etc. (In practice there are
/// subtleties where a statement's effect only starts being visible at the successor point, via
/// the "result" of that statement).
/// - constraints that traverse the CFG via the same region, `a@p: a@q`, where `p` is a predecessor
/// of `q`. These depend on the liveness of the regions at these points, as well as their
/// variance.
///
/// The `source` origin at `from` flows into the `target` origin at `to`.
///
/// This dual of NLL's [crate::constraints::OutlivesConstraint] therefore encodes the
/// position-dependent outlives constraints used by Polonius, to model the flow-sensitive loan
/// propagation via reachability within a graph of localized constraints.
#[derive(Copy, Clone, PartialEq, Eq, Debug, Hash)]
pub(crate) struct LocalizedOutlivesConstraint {
pub source: RegionVid,
pub from: PointIndex,
pub target: RegionVid,
pub to: PointIndex,
}
/// A container of [LocalizedOutlivesConstraint]s that can be turned into a traversable
/// `rustc_data_structures` graph.
#[derive(Clone, Default, Debug)]
pub(crate) struct LocalizedOutlivesConstraintSet {
pub outlives: Vec<LocalizedOutlivesConstraint>,
}
impl LocalizedOutlivesConstraintSet {
pub(crate) fn push(&mut self, constraint: LocalizedOutlivesConstraint) {
if constraint.source == constraint.target && constraint.from == constraint.to {
// 'a@p: 'a@p is pretty uninteresting
return;
}
self.outlives.push(constraint);
}
}

View file

@ -0,0 +1,104 @@
use std::io;
use rustc_middle::mir::pretty::{PrettyPrintMirOptions, dump_mir_with_options};
use rustc_middle::mir::{Body, ClosureRegionRequirements, PassWhere};
use rustc_middle::ty::TyCtxt;
use rustc_session::config::MirIncludeSpans;
use crate::borrow_set::BorrowSet;
use crate::polonius::{LocalizedOutlivesConstraint, LocalizedOutlivesConstraintSet};
use crate::{BorrowckInferCtxt, RegionInferenceContext};
/// `-Zdump-mir=polonius` dumps MIR annotated with NLL and polonius specific information.
// Note: this currently duplicates most of NLL MIR, with some additions for the localized outlives
// constraints. This is ok for now as this dump will change in the near future to an HTML file to
// become more useful.
pub(crate) fn dump_polonius_mir<'tcx>(
infcx: &BorrowckInferCtxt<'tcx>,
body: &Body<'tcx>,
regioncx: &RegionInferenceContext<'tcx>,
borrow_set: &BorrowSet<'tcx>,
localized_outlives_constraints: Option<LocalizedOutlivesConstraintSet>,
closure_region_requirements: &Option<ClosureRegionRequirements<'tcx>>,
) {
let tcx = infcx.tcx;
if !tcx.sess.opts.unstable_opts.polonius.is_next_enabled() {
return;
}
let localized_outlives_constraints = localized_outlives_constraints
.expect("missing localized constraints with `-Zpolonius=next`");
// We want the NLL extra comments printed by default in NLL MIR dumps (they were removed in
// #112346). Specifying `-Z mir-include-spans` on the CLI still has priority: for example,
// they're always disabled in mir-opt tests to make working with blessed dumps easier.
let options = PrettyPrintMirOptions {
include_extra_comments: matches!(
tcx.sess.opts.unstable_opts.mir_include_spans,
MirIncludeSpans::On | MirIncludeSpans::Nll
),
};
dump_mir_with_options(
tcx,
false,
"polonius",
&0,
body,
|pass_where, out| {
emit_polonius_mir(
tcx,
regioncx,
closure_region_requirements,
borrow_set,
&localized_outlives_constraints,
pass_where,
out,
)
},
options,
);
}
/// Produces the actual NLL + Polonius MIR sections to emit during the dumping process.
fn emit_polonius_mir<'tcx>(
tcx: TyCtxt<'tcx>,
regioncx: &RegionInferenceContext<'tcx>,
closure_region_requirements: &Option<ClosureRegionRequirements<'tcx>>,
borrow_set: &BorrowSet<'tcx>,
localized_outlives_constraints: &LocalizedOutlivesConstraintSet,
pass_where: PassWhere,
out: &mut dyn io::Write,
) -> io::Result<()> {
// Emit the regular NLL front-matter
crate::nll::emit_nll_mir(
tcx,
regioncx,
closure_region_requirements,
borrow_set,
pass_where.clone(),
out,
)?;
let liveness = regioncx.liveness_constraints();
// Add localized outlives constraints
match pass_where {
PassWhere::BeforeCFG => {
if localized_outlives_constraints.outlives.len() > 0 {
writeln!(out, "| Localized constraints")?;
for constraint in &localized_outlives_constraints.outlives {
let LocalizedOutlivesConstraint { source, from, target, to } = constraint;
let from = liveness.location_from_point(*from);
let to = liveness.location_from_point(*to);
writeln!(out, "| {source:?} at {from:?} -> {target:?} at {to:?}")?;
}
writeln!(out, "|")?;
}
}
_ => {}
}
Ok(())
}

View file

@ -1 +1,180 @@
//! Polonius analysis and support code:
//! - dedicated constraints
//! - conversion from NLL constraints
//! - debugging utilities
//! - etc.
//!
//! The current implementation models the flow-sensitive borrow-checking concerns as a graph
//! containing both information about regions and information about the control flow.
//!
//! Loan propagation is seen as a reachability problem (with some subtleties) between where the loan
//! is introduced and a given point.
//!
//! Constraints arising from type-checking allow loans to flow from region to region at the same CFG
//! point. Constraints arising from liveness allow loans to flow within from point to point, between
//! live regions at these points.
//!
//! Edges can be bidirectional to encode invariant relationships, and loans can flow "back in time"
//! to traverse these constraints arising earlier in the CFG.
//!
//! When incorporating kills in the traversal, the loans reaching a given point are considered live.
//!
//! After this, the usual NLL process happens. These live loans are fed into a dataflow analysis
//! combining them with the points where loans go out of NLL scope (the frontier where they stop
//! propagating to a live region), to yield the "loans in scope" or "active loans", at a given
//! point.
//!
//! Illegal accesses are still computed by checking whether one of these resulting loans is
//! invalidated.
//!
//! More information on this simple approach can be found in the following links, and in the future
//! in the rustc dev guide:
//! - <https://smallcultfollowing.com/babysteps/blog/2023/09/22/polonius-part-1/>
//! - <https://smallcultfollowing.com/babysteps/blog/2023/09/29/polonius-part-2/>
//!
mod constraints;
pub(crate) use constraints::*;
mod dump;
pub(crate) use dump::dump_polonius_mir;
pub(crate) mod legacy;
use rustc_middle::mir::{Body, Location};
use rustc_mir_dataflow::points::PointIndex;
use crate::RegionInferenceContext;
use crate::constraints::OutlivesConstraint;
use crate::region_infer::values::LivenessValues;
use crate::type_check::Locations;
use crate::universal_regions::UniversalRegions;
/// Creates a constraint set for `-Zpolonius=next` by:
/// - converting NLL typeck constraints to be localized
/// - encoding liveness constraints
pub(crate) fn create_localized_constraints<'tcx>(
regioncx: &mut RegionInferenceContext<'tcx>,
body: &Body<'tcx>,
) -> LocalizedOutlivesConstraintSet {
let mut localized_outlives_constraints = LocalizedOutlivesConstraintSet::default();
convert_typeck_constraints(
body,
regioncx.liveness_constraints(),
regioncx.outlives_constraints(),
&mut localized_outlives_constraints,
);
create_liveness_constraints(
body,
regioncx.liveness_constraints(),
regioncx.universal_regions(),
&mut localized_outlives_constraints,
);
// FIXME: here, we can trace loan reachability in the constraint graph and record this as loan
// liveness for the next step in the chain, the NLL loan scope and active loans computations.
localized_outlives_constraints
}
/// Propagate loans throughout the subset graph at a given point (with some subtleties around the
/// location where effects start to be visible).
fn convert_typeck_constraints<'tcx>(
body: &Body<'tcx>,
liveness: &LivenessValues,
outlives_constraints: impl Iterator<Item = OutlivesConstraint<'tcx>>,
localized_outlives_constraints: &mut LocalizedOutlivesConstraintSet,
) {
for outlives_constraint in outlives_constraints {
match outlives_constraint.locations {
Locations::All(_) => {
// For now, turn logical constraints holding at all points into physical edges at
// every point in the graph.
// FIXME: encode this into *traversal* instead.
for (block, bb) in body.basic_blocks.iter_enumerated() {
let statement_count = bb.statements.len();
for statement_index in 0..=statement_count {
let current_location = Location { block, statement_index };
let current_point = liveness.point_from_location(current_location);
localized_outlives_constraints.push(LocalizedOutlivesConstraint {
source: outlives_constraint.sup,
from: current_point,
target: outlives_constraint.sub,
to: current_point,
});
}
}
}
_ => {}
}
}
}
/// Propagate loans throughout the CFG: for each statement in the MIR, create localized outlives
/// constraints for loans that are propagated to the next statements.
pub(crate) fn create_liveness_constraints<'tcx>(
body: &Body<'tcx>,
liveness: &LivenessValues,
universal_regions: &UniversalRegions<'tcx>,
localized_outlives_constraints: &mut LocalizedOutlivesConstraintSet,
) {
for (block, bb) in body.basic_blocks.iter_enumerated() {
let statement_count = bb.statements.len();
for statement_index in 0..=statement_count {
let current_location = Location { block, statement_index };
let current_point = liveness.point_from_location(current_location);
if statement_index < statement_count {
// Intra-block edges, straight line constraints from each point to its successor
// within the same block.
let next_location = Location { block, statement_index: statement_index + 1 };
let next_point = liveness.point_from_location(next_location);
propagate_loans_between_points(
current_point,
next_point,
liveness,
universal_regions,
localized_outlives_constraints,
);
} else {
// Inter-block edges, from the block's terminator to each successor block's entry
// point.
for successor_block in bb.terminator().successors() {
let next_location = Location { block: successor_block, statement_index: 0 };
let next_point = liveness.point_from_location(next_location);
propagate_loans_between_points(
current_point,
next_point,
liveness,
universal_regions,
localized_outlives_constraints,
);
}
}
}
}
}
/// Propagate loans within a region between two points in the CFG, if that region is live at both
/// the source and target points.
fn propagate_loans_between_points(
current_point: PointIndex,
next_point: PointIndex,
_liveness: &LivenessValues,
universal_regions: &UniversalRegions<'_>,
localized_outlives_constraints: &mut LocalizedOutlivesConstraintSet,
) {
// Universal regions are semantically live at all points.
// Note: we always have universal regions but they're not always (or often) involved in the
// subset graph. For now, we emit all their edges unconditionally, but some of these subgraphs
// will be disconnected from the rest of the graph and thus, unnecessary.
// FIXME: only emit the edges of universal regions that existential regions can reach.
for region in universal_regions.universal_regions_iter() {
localized_outlives_constraints.push(LocalizedOutlivesConstraint {
source: region,
from: current_point,
target: region,
to: next_point,
});
}
}

View file

@ -2229,6 +2229,10 @@ impl<'tcx> RegionInferenceContext<'tcx> {
fn scc_representative(&self, scc: ConstraintSccIndex) -> RegionVid {
self.constraint_sccs.annotation(scc).representative
}
pub(crate) fn liveness_constraints(&self) -> &LivenessValues {
&self.liveness_constraints
}
}
impl<'tcx> RegionDefinition<'tcx> {

View file

@ -199,6 +199,11 @@ impl LivenessValues {
self.elements.point_from_location(location)
}
#[inline]
pub(crate) fn location_from_point(&self, point: PointIndex) -> Location {
self.elements.to_location(point)
}
/// When using `-Zpolonius=next`, returns whether the `loan_idx` is live at the given `point`.
pub(crate) fn is_loan_live_at(&self, loan_idx: BorrowIndex, point: PointIndex) -> bool {
self.loans

View file

@ -23,6 +23,7 @@ pub(crate) const ALIGN: usize = 40;
/// An indication of where we are in the control flow graph. Used for printing
/// extra information in `dump_mir`
#[derive(Clone)]
pub enum PassWhere {
/// We have not started dumping the control flow graph, but we are about to.
BeforeCFG,