Introduce deduced parameter attributes, and use them for deducing readonly on indirect immutable freeze by-value function parameters.

Right now, `rustc` only examines function signatures and the platform ABI when
determining the LLVM attributes to apply to parameters. This results in missed
optimizations, because there are some attributes that can be determined via
analysis of the MIR making up the function body. In particular, `readonly`
could be applied to most indirectly-passed by-value function arguments
(specifically, those that are freeze and are observed not to be mutated), but
it currently is not.

This patch introduces the machinery that allows `rustc` to determine those
attributes. It consists of a query, `deduced_param_attrs`, that, when
evaluated, analyzes the MIR of the function to determine supplementary
attributes. The results of this query for each function are written into the
crate metadata so that the deduced parameter attributes can be applied to
cross-crate functions. In this patch, we simply check the parameter for
mutations to determine whether the `readonly` attribute should be applied to
parameters that are indirect immutable freeze by-value.  More attributes could
conceivably be deduced in the future: `nocapture` and `noalias` come to mind.
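
As a rough illustration of the deduction described above, here is a minimal, self-contained sketch. It is a toy model and not the code in this patch: `Use`, `ParamAttrs`, and `deduce` are hypothetical names, and the real analysis walks MIR with a `Visitor` rather than a flat list of uses.

```rust
/// Hypothetical, simplified stand-in for the ways a function body can use an argument.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Use {
    Read,
    Move,
    Write,
}

/// Mirrors the shape of `DeducedParamAttrs`, for illustration only.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct ParamAttrs {
    read_only: bool,
}

/// `arg_is_freeze[i]` says whether argument `i` contains no `UnsafeCell`.
/// `uses` pairs a zero-based argument index with how the body uses that argument.
fn deduce(arg_is_freeze: &[bool], uses: &[(usize, Use)]) -> Vec<ParamAttrs> {
    let mut mutated = vec![false; arg_is_freeze.len()];
    for &(arg, kind) in uses {
        // Moves are treated like copies in this sketch; only writes count as mutations.
        if kind == Use::Write {
            if let Some(slot) = mutated.get_mut(arg) {
                *slot = true;
            }
        }
    }

    // An argument is read-only if it is freeze and was never observed to be mutated.
    let mut attrs: Vec<ParamAttrs> = arg_is_freeze
        .iter()
        .zip(&mutated)
        .map(|(&freeze, &was_mutated)| ParamAttrs { read_only: freeze && !was_mutated })
        .collect();

    // Trailing default entries carry no information, so drop them to keep the result small.
    while attrs.last() == Some(&ParamAttrs::default()) {
        attrs.pop();
    }
    attrs
}
```

For example, with three arguments where the first is only read, the second is written, and the third is not freeze, `deduce` would return a single entry with `read_only: true`; the two trailing default entries are trimmed.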

Adding `readonly` to indirect function parameters where applicable enables some
potential optimizations in LLVM that are discussed in [issue 103103] and [PR
103070] around avoiding stack-to-stack memory copies that appear in functions
like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These
functions pass a large structure unchanged by value to a subfunction that also
doesn't mutate it. Since the structure in this case is passed as an indirect
parameter, it's a pointer from LLVM's perspective. As a result, the
intermediate copy of the structure that our codegen emits could be optimized
away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly
nocapture noalias` in both the caller and callee. We already pass `nocapture
noalias`, but we're missing `readonly`, as we can't determine whether a
by-value parameter is mutated by examining the signature in Rust. I didn't have
much success with having LLVM infer the `readonly` attribute, even with fat
LTO; it seems that deducing it at the MIR level is necessary.
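
To make the pattern concrete, here's a minimal sketch of the kind of forwarding involved. `Big`, `outer`, and `inner` are hypothetical names, and whether `Big` is actually passed indirectly depends on the target ABI; the sketch only shows the source-level shape, not the generated code.

```rust
/// Hypothetical large structure, chosen to be big enough that it is typically passed
/// indirectly (i.e. via a pointer to a stack slot) rather than in registers.
pub struct Big {
    pub data: [u64; 128],
}

/// The callee only reads its by-value parameter.
#[inline(never)]
pub fn inner(big: Big) -> u64 {
    big.data.iter().sum()
}

/// The caller forwards its by-value parameter unchanged. Without `readonly` on the
/// incoming pointer, the copy made for the call to `inner` is hard for LLVM to remove.
#[inline(never)]
pub fn outer(big: Big) -> u64 {
    inner(big)
}
```

Neither function writes to `big`, so both parameters are candidates for the deduced `readonly` attribute.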

No large benefits should be expected from this optimization *now*; LLVM needs
some changes (discussed in [PR 103070]) to more aggressively use the `noalias
nocapture readonly` combination in its alias analysis. I have some LLVM patches
for these optimizations and have had them looked over. With all the patches
applied locally, I enabled LLVM to remove all the `memcpy`s from the following
code:

```rust
fn main() {
    println!("Hello {}", 3);
}
```

which is a significant codegen improvement over the status quo. I expect that
if this optimization kicks in in multiple places even for such a simple
program, then it will apply to Rust code all over the place.

[issue 103103]: https://github.com/rust-lang/rust/issues/103103

[PR 103070]: https://github.com/rust-lang/rust/pull/103070
Patrick Walton 2022-10-17 19:42:15 -07:00
parent b1ab3b738a
commit da630ac79d
14 changed files with 393 additions and 9 deletions

View file

@ -224,6 +224,7 @@ provide! { tcx, def_id, other, cdata,
fn_arg_names => { table }
generator_kind => { table }
trait_def => { table }
deduced_param_attrs => { table }
collect_trait_impl_trait_tys => {
Ok(cdata
.root

View file

@ -30,7 +30,7 @@ use rustc_middle::ty::query::Providers;
use rustc_middle::ty::{self, SymbolName, Ty, TyCtxt};
use rustc_middle::util::common::to_readable_str;
use rustc_serialize::{opaque, Decodable, Decoder, Encodable, Encoder};
use rustc_session::config::CrateType;
use rustc_session::config::{CrateType, OptLevel};
use rustc_session::cstore::{ForeignModule, LinkagePreference, NativeLib};
use rustc_span::hygiene::{ExpnIndex, HygieneEncodeContext, MacroKind};
use rustc_span::symbol::{sym, Symbol};
@ -1441,6 +1441,21 @@ impl<'a, 'tcx> EncodeContext<'a, 'tcx> {
record!(self.tables.unused_generic_params[def_id.to_def_id()] <- unused);
}
}
// Encode all the deduced parameter attributes for everything that has MIR, even for items
// that can't be inlined. But don't if we aren't optimizing in non-incremental mode, to
// save the query traffic.
if tcx.sess.opts.output_types.should_codegen()
&& tcx.sess.opts.optimize != OptLevel::No
&& tcx.sess.opts.incremental.is_none()
{
for &local_def_id in tcx.mir_keys(()) {
if let DefKind::AssocFn | DefKind::Fn = tcx.def_kind(local_def_id) {
record_array!(self.tables.deduced_param_attrs[local_def_id.to_def_id()] <-
self.tcx.deduced_param_attrs(local_def_id.to_def_id()));
}
}
}
}
fn encode_stability(&mut self, def_id: DefId) {

View file

@ -23,7 +23,7 @@ use rustc_middle::mir;
use rustc_middle::ty::fast_reject::SimplifiedType;
use rustc_middle::ty::query::Providers;
use rustc_middle::ty::{self, ReprOptions, Ty};
use rustc_middle::ty::{GeneratorDiagnosticData, ParameterizedOverTcx, TyCtxt};
use rustc_middle::ty::{DeducedParamAttrs, GeneratorDiagnosticData, ParameterizedOverTcx, TyCtxt};
use rustc_serialize::opaque::FileEncoder;
use rustc_session::config::SymbolManglingVersion;
use rustc_session::cstore::{CrateDepKind, ForeignModule, LinkagePreference, NativeLib};
@ -402,6 +402,7 @@ define_tables! {
macro_definition: Table<DefIndex, LazyValue<ast::MacArgs>>,
proc_macro: Table<DefIndex, MacroKind>,
module_reexports: Table<DefIndex, LazyArray<ModChild>>,
deduced_param_attrs: Table<DefIndex, LazyArray<DeducedParamAttrs>>,
trait_impl_trait_tys: Table<DefIndex, LazyValue<FxHashMap<DefId, Ty<'static>>>>,
}

View file

@ -2127,4 +2127,9 @@ rustc_queries! {
) -> Result<(), ErrorGuaranteed> {
desc { |tcx| "checking assoc const `{}` has the same type as trait item", tcx.def_path_str(key.0.to_def_id()) }
}
query deduced_param_attrs(def_id: DefId) -> &'tcx [ty::DeducedParamAttrs] {
desc { |tcx| "deducing parameter attributes for {}", tcx.def_path_str(def_id) }
separate_provide_extern
}
}

View file

@ -455,6 +455,7 @@ impl_arena_copy_decoder! {<'tcx>
rustc_span::def_id::DefId,
rustc_span::def_id::LocalDefId,
(rustc_middle::middle::exported_symbols::ExportedSymbol<'tcx>, rustc_middle::middle::exported_symbols::SymbolExportInfo),
ty::DeducedParamAttrs,
}
#[macro_export]

View file

@ -2954,6 +2954,21 @@ impl<'tcx> TyCtxtAt<'tcx> {
}
}
/// Parameter attributes that can only be determined by examining the body of a function instead
/// of just its signature.
///
/// These can be useful for optimization purposes when a function is directly called. We compute
/// them and store them into the crate metadata so that downstream crates can make use of them.
///
/// Right now, we only have `read_only`, but `no_capture` and `no_alias` might be useful in the
/// future.
#[derive(Clone, Copy, PartialEq, Debug, Default, TyDecodable, TyEncodable, HashStable)]
pub struct DeducedParamAttrs {
/// The parameter is marked immutable in the function and contains no `UnsafeCell` (i.e. its
/// type is freeze).
pub read_only: bool,
}
// We are comparing types with different invariant lifetimes, so `ptr::eq`
// won't work for us.
fn ptr_eq<T, U>(t: *const T, u: *const U) -> bool {

View file

@ -78,7 +78,7 @@ pub use self::consts::{
};
pub use self::context::{
tls, CanonicalUserType, CanonicalUserTypeAnnotation, CanonicalUserTypeAnnotations,
CtxtInterners, DelaySpanBugEmitted, FreeRegionInfo, GeneratorDiagnosticData,
CtxtInterners, DeducedParamAttrs, DelaySpanBugEmitted, FreeRegionInfo, GeneratorDiagnosticData,
GeneratorInteriorTypeCause, GlobalCtxt, Lift, OnDiskCache, TyCtxt, TypeckResults, UserType,
UserTypeAnnotationIndex,
};

View file

@ -61,6 +61,7 @@ trivially_parameterized_over_tcx! {
crate::middle::resolve_lifetime::ObjectLifetimeDefault,
crate::mir::ConstQualifs,
ty::AssocItemContainer,
ty::DeducedParamAttrs,
ty::Generics,
ty::ImplPolarity,
ty::ReprOptions,

View file

@ -0,0 +1,249 @@
//! Deduces supplementary parameter attributes from MIR.
//!
//! Deduced parameter attributes are those that can only be soundly determined by examining the
//! body of the function instead of just the signature. These can be useful for optimization
//! purposes on a best-effort basis. We compute them here and store them into the crate metadata so
//! dependent crates can use them.
use rustc_hir::def_id::DefId;
use rustc_index::bit_set::BitSet;
use rustc_middle::mir::visit::{NonMutatingUseContext, PlaceContext, Visitor};
use rustc_middle::mir::{Body, Local, Location, Operand, Terminator, TerminatorKind, RETURN_PLACE};
use rustc_middle::ty::{self, DeducedParamAttrs, ParamEnv, Ty, TyCtxt};
use rustc_session::config::OptLevel;
use rustc_span::DUMMY_SP;
/// A visitor that determines which arguments have been mutated. We can't use the mutability field
/// on LocalDecl for this because it has no meaning post-optimization.
struct DeduceReadOnly {
/// Each bit is indexed by argument number, starting at zero (so 0 corresponds to local decl
/// 1). The bit is true if the argument may have been mutated or false if we know it hasn't
/// been up to the point we're at.
mutable_args: BitSet<usize>,
}
impl DeduceReadOnly {
/// Returns a new DeduceReadOnly instance.
fn new(arg_count: usize) -> Self {
Self { mutable_args: BitSet::new_empty(arg_count) }
}
}
impl<'tcx> Visitor<'tcx> for DeduceReadOnly {
fn visit_local(&mut self, local: Local, mut context: PlaceContext, _: Location) {
// We're only interested in arguments.
if local == RETURN_PLACE || local.index() > self.mutable_args.domain_size() {
return;
}
// Replace place contexts that are moves with copies. This is safe in all cases except
// function argument position, which we already handled in `visit_terminator()` by using the
// ArgumentChecker. See the comment in that method for more details.
//
// In the future, we might want to move this out into a separate pass, but for now let's
// just do it on the fly because that's faster.
if matches!(context, PlaceContext::NonMutatingUse(NonMutatingUseContext::Move)) {
context = PlaceContext::NonMutatingUse(NonMutatingUseContext::Copy);
}
match context {
PlaceContext::MutatingUse(..)
| PlaceContext::NonMutatingUse(NonMutatingUseContext::Move) => {
// This is a mutation, so mark it as such.
self.mutable_args.insert(local.index() - 1);
}
PlaceContext::NonMutatingUse(..) | PlaceContext::NonUse(..) => {
// Not mutating, so it's fine.
}
}
}
fn visit_terminator(&mut self, terminator: &Terminator<'tcx>, location: Location) {
// OK, this is subtle. Suppose that we're trying to deduce whether `x` in `f` is read-only
// and we have the following:
//
// fn f(x: BigStruct) { g(x) }
// fn g(mut y: BigStruct) { y.foo = 1 }
//
// If, at the generated MIR level, `f` turned into something like:
//
// fn f(_1: BigStruct) -> () {
// let mut _0: ();
// bb0: {
// _0 = g(move _1) -> bb1;
// }
// ...
// }
//
// then it would be incorrect to mark `x` (i.e. `_1`) as `readonly`, because `g`'s write to
// its copy of the indirect parameter would actually be a write directly to the pointer that
// `f` passes. Note that function arguments are the only situation in which this problem can
// arise: every other use of `move` in MIR doesn't actually write to the value it moves
// from.
//
// Anyway, right now this situation doesn't actually arise in practice. Instead, the MIR for
// that function looks like this:
//
// fn f(_1: BigStruct) -> () {
// let mut _0: ();
// let mut _2: BigStruct;
// bb0: {
// _2 = move _1;
// _0 = g(move _2) -> bb1;
// }
// ...
// }
//
// Because of that extra move that MIR construction inserts, `x` (i.e. `_1`) can *in
// practice* safely be marked `readonly`.
//
// To handle the possibility that other optimizations (for example, destination propagation)
// might someday generate MIR like the first example above, we panic upon seeing an argument
// to *our* function that is directly moved into *another* function as an argument. Having
// eliminated that problematic case, we can safely treat moves as copies in this analysis.
//
// In the future, if MIR optimizations cause arguments of a caller to be directly moved into
// the argument of a callee, we can just add that argument to `mutable_args` instead of
// panicking.
//
// Note that, because the problematic MIR is never actually generated, we can't add a test
// case for this.
if let TerminatorKind::Call { ref args, .. } = terminator.kind {
for arg in args {
if let Operand::Move(_) = *arg {
// ArgumentChecker panics if a direct move of an argument from a caller to a
// callee was detected.
//
// If, in the future, MIR optimizations cause arguments to be moved directly
// from callers to callees, change the panic to instead add the argument in
// question to `mutable_args`.
ArgumentChecker::new(self.mutable_args.domain_size())
.visit_operand(arg, location)
}
}
};
self.super_terminator(terminator, location);
}
}
/// A visitor that simply panics if a direct move of an argument from a caller to a callee was
/// detected.
struct ArgumentChecker {
/// The number of arguments to the calling function.
arg_count: usize,
}
impl ArgumentChecker {
/// Creates a new ArgumentChecker.
fn new(arg_count: usize) -> Self {
Self { arg_count }
}
}
impl<'tcx> Visitor<'tcx> for ArgumentChecker {
fn visit_local(&mut self, local: Local, context: PlaceContext, _: Location) {
// Check to make sure that, if this local is an argument, we didn't move directly from it.
if matches!(context, PlaceContext::NonMutatingUse(NonMutatingUseContext::Move))
&& local != RETURN_PLACE
&& local.index() <= self.arg_count
{
// If, in the future, MIR optimizations cause arguments to be moved directly from
// callers to callees, change this panic to instead add the argument in question to
// `DeduceReadOnly::mutable_args`.
panic!("Detected a direct move from a caller's argument to a callee's argument!")
}
}
}
/// Returns true if values of a given type will never be passed indirectly, regardless of ABI.
fn type_will_always_be_passed_directly<'tcx>(ty: Ty<'tcx>) -> bool {
matches!(
ty.kind(),
ty::Bool
| ty::Char
| ty::Float(..)
| ty::Int(..)
| ty::RawPtr(..)
| ty::Ref(..)
| ty::Slice(..)
| ty::Uint(..)
)
}
/// Returns the deduced parameter attributes for a function.
///
/// Deduced parameter attributes are those that can only be soundly determined by examining the
/// body of the function instead of just the signature. These can be useful for optimization
/// purposes on a best-effort basis. We compute them here and store them into the crate metadata so
/// dependent crates can use them.
pub fn deduced_param_attrs<'tcx>(tcx: TyCtxt<'tcx>, def_id: DefId) -> &'tcx [DeducedParamAttrs] {
// This computation is unfortunately rather expensive, so don't do it unless we're optimizing.
// Also skip it in incremental mode.
if tcx.sess.opts.optimize == OptLevel::No || tcx.sess.opts.incremental.is_some() {
return &[];
}
// If the Freeze language item isn't present, then don't bother.
if tcx.lang_items().freeze_trait().is_none() {
return &[];
}
// Codegen won't use this information for anything if all the function parameters are passed
// directly. Detect that and bail, for compilation speed.
let fn_ty = tcx.type_of(def_id);
if matches!(fn_ty.kind(), ty::FnDef(..)) {
if fn_ty
.fn_sig(tcx)
.inputs()
.skip_binder()
.iter()
.cloned()
.all(type_will_always_be_passed_directly)
{
return &[];
}
}
// Don't deduce any attributes for functions that have no MIR.
if !tcx.is_mir_available(def_id) {
return &[];
}
// Deduced attributes for other crates should be read from the metadata instead of via this
// function.
debug_assert!(def_id.is_local());
// Grab the optimized MIR. Analyze it to determine which arguments have been mutated.
let body: &Body<'tcx> = tcx.optimized_mir(def_id);
let mut deduce_read_only = DeduceReadOnly::new(body.arg_count);
deduce_read_only.visit_body(body);
// Set the `readonly` attribute for every argument that we concluded is immutable and that
// contains no UnsafeCells.
//
// FIXME: This is overly conservative around generic parameters: `is_freeze()` will always
// return false for them. For a description of alternatives that could do a better job here,
// see [1].
//
// [1]: https://github.com/rust-lang/rust/pull/103172#discussion_r999139997
let mut deduced_param_attrs = tcx.arena.alloc_from_iter(
body.local_decls.iter().skip(1).take(body.arg_count).enumerate().map(
|(arg_index, local_decl)| DeducedParamAttrs {
read_only: !deduce_read_only.mutable_args.contains(arg_index)
&& local_decl.ty.is_freeze(tcx.at(DUMMY_SP), ParamEnv::reveal_all()),
},
),
);
// Trailing parameters past the size of the `deduced_param_attrs` array are assumed to have the
// default set of attributes, so we don't have to store them explicitly. Pop them off to save a
// few bytes in metadata.
while deduced_param_attrs.last() == Some(&DeducedParamAttrs::default()) {
let last_index = deduced_param_attrs.len() - 1;
deduced_param_attrs = &mut deduced_param_attrs[0..last_index];
}
deduced_param_attrs
}

View file

@ -56,6 +56,7 @@ mod const_prop_lint;
mod coverage;
mod dead_store_elimination;
mod deaggregator;
mod deduce_param_attrs;
mod deduplicate_blocks;
mod deref_separator;
mod dest_prop;
@ -139,6 +140,7 @@ pub fn provide(providers: &mut Providers) {
promoted_mir_of_const_arg: |tcx, (did, param_did)| {
promoted_mir(tcx, ty::WithOptConstParam { did, const_param_did: Some(param_did) })
},
deduced_param_attrs: deduce_param_attrs::deduced_param_attrs,
..*providers
};
}

View file

@ -848,6 +848,7 @@ impl_ref_decoder! {<'tcx>
rustc_span::def_id::DefId,
rustc_span::def_id::LocalDefId,
(rustc_middle::middle::exported_symbols::ExportedSymbol<'tcx>, rustc_middle::middle::exported_symbols::SymbolExportInfo),
ty::DeducedParamAttrs,
}
//- ENCODING -------------------------------------------------------------------

View file

@ -4,6 +4,7 @@ use rustc_middle::ty::layout::{
fn_can_unwind, FnAbiError, HasParamEnv, HasTyCtxt, LayoutCx, LayoutOf, TyAndLayout,
};
use rustc_middle::ty::{self, Ty, TyCtxt};
use rustc_session::config::OptLevel;
use rustc_span::def_id::DefId;
use rustc_target::abi::call::{
ArgAbi, ArgAttribute, ArgAttributes, ArgExtension, Conv, FnAbi, PassMode, Reg, RegKind,
@ -384,7 +385,7 @@ fn fn_abi_new_uncached<'tcx>(
conv,
can_unwind: fn_can_unwind(cx.tcx(), fn_def_id, sig.abi),
};
fn_abi_adjust_for_abi(cx, &mut fn_abi, sig.abi)?;
fn_abi_adjust_for_abi(cx, &mut fn_abi, sig.abi, fn_def_id)?;
debug!("fn_abi_new_uncached = {:?}", fn_abi);
Ok(cx.tcx.arena.alloc(fn_abi))
}
@ -394,6 +395,7 @@ fn fn_abi_adjust_for_abi<'tcx>(
cx: &LayoutCx<'tcx, TyCtxt<'tcx>>,
fn_abi: &mut FnAbi<'tcx, Ty<'tcx>>,
abi: SpecAbi,
fn_def_id: Option<DefId>,
) -> Result<(), FnAbiError<'tcx>> {
if abi == SpecAbi::Unadjusted {
return Ok(());
@ -404,7 +406,18 @@ fn fn_abi_adjust_for_abi<'tcx>(
|| abi == SpecAbi::RustIntrinsic
|| abi == SpecAbi::PlatformIntrinsic
{
let fixup = |arg: &mut ArgAbi<'tcx, Ty<'tcx>>| {
// Look up the deduced parameter attributes for this function, if we have its def ID and
// we're optimizing in non-incremental mode. We'll tag its parameters with those attributes
// as appropriate.
let deduced_param_attrs = if cx.tcx.sess.opts.optimize != OptLevel::No
&& cx.tcx.sess.opts.incremental.is_none()
{
fn_def_id.map(|fn_def_id| cx.tcx.deduced_param_attrs(fn_def_id)).unwrap_or_default()
} else {
&[]
};
let fixup = |arg: &mut ArgAbi<'tcx, Ty<'tcx>>, arg_idx: Option<usize>| {
if arg.is_ignore() {
return;
}
@ -451,10 +464,30 @@ fn fn_abi_adjust_for_abi<'tcx>(
// so we pick an appropriately sized integer type instead.
arg.cast_to(Reg { kind: RegKind::Integer, size });
}
// If we deduced that this parameter was read-only, add that to the attribute list now.
//
// The `readonly` attribute only applies to pointers, so we can only do this if the
// argument was passed indirectly. (If the argument is passed directly, it's an SSA
// value, so it's implicitly immutable.)
if let (Some(arg_idx), &mut PassMode::Indirect { ref mut attrs, .. }) =
(arg_idx, &mut arg.mode)
{
// The `deduced_param_attrs` list could be empty if this is a type of function
// we can't deduce any parameter attributes for, so make sure the argument index is in
// bounds.
if let Some(deduced_param_attrs) = deduced_param_attrs.get(arg_idx) {
if deduced_param_attrs.read_only {
attrs.regular.insert(ArgAttribute::ReadOnly);
debug!("added deduced read-only attribute");
}
}
}
};
fixup(&mut fn_abi.ret);
for arg in fn_abi.args.iter_mut() {
fixup(arg);
fixup(&mut fn_abi.ret, None);
for (arg_idx, arg) in fn_abi.args.iter_mut().enumerate() {
fixup(arg, Some(arg_idx));
}
} else {
fn_abi.adjust_for_foreign_abi(cx, abi)?;

View file

@ -0,0 +1,60 @@
// compile-flags: -O
#![crate_type = "lib"]
#![allow(incomplete_features)]
#![feature(unsized_locals, unsized_fn_params)]
use std::cell::Cell;
use std::hint;
// Check to make sure that we can deduce the `readonly` attribute from function bodies for
// parameters passed indirectly.
pub struct BigStruct {
blah: [i32; 1024],
}
pub struct BigCellContainer {
blah: [Cell<i32>; 1024],
}
// The by-value parameter for this big struct can be marked readonly.
//
// CHECK: @use_big_struct_immutably({{.*}} readonly {{.*}} %big_struct)
#[no_mangle]
pub fn use_big_struct_immutably(big_struct: BigStruct) {
hint::black_box(&big_struct);
}
// The by-value parameter for this big struct can't be marked readonly, because we mutate it.
//
// CHECK-NOT: @use_big_struct_mutably({{.*}} readonly {{.*}} %big_struct)
#[no_mangle]
pub fn use_big_struct_mutably(mut big_struct: BigStruct) {
big_struct.blah[987] = 654;
hint::black_box(&big_struct);
}
// The by-value parameter for this big struct can't be marked readonly, because it contains
// UnsafeCell.
//
// CHECK-NOT: @use_big_cell_container({{.*}} readonly {{.*}} %big_cell_container)
#[no_mangle]
pub fn use_big_cell_container(big_cell_container: BigCellContainer) {
hint::black_box(&big_cell_container);
}
// Make sure that we don't mistakenly mark a big struct as `readonly` when passed through a generic
// type parameter if it contains UnsafeCell.
//
// CHECK-NOT: @use_something({{.*}} readonly {{.*}} %something)
#[no_mangle]
#[inline(never)]
pub fn use_something<T>(something: T) {
hint::black_box(&something);
}
#[no_mangle]
pub fn forward_big_cell_container(big_cell_container: BigCellContainer) {
use_something(big_cell_container)
}

View file

@ -127,7 +127,7 @@ pub fn mutable_notunpin_borrow(_: &mut NotUnpin) {
pub fn notunpin_borrow(_: &NotUnpin) {
}
// CHECK: @indirect_struct({{%S\*|ptr}} noalias nocapture noundef dereferenceable(32) %_1)
// CHECK: @indirect_struct({{%S\*|ptr}} noalias nocapture noundef readonly dereferenceable(32) %_1)
#[no_mangle]
pub fn indirect_struct(_: S) {
}