Auto merge of #95689 - lqd:self-profiler, r=wesleywiser

Allow self-profiler to only record potentially costly arguments when argument recording is turned on

As discussed [on zulip](https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Identifying.20proc-macro.20slowdowns/near/277304909) with `@wesleywiser,` I'd like to record proc-macro expansions in the self-profiler, with some detailed data (per-expansion spans for example, to follow #95473).

At the same time, I'd also like to avoid doing expensive things when tracking a generic activity's arguments, if they were not specifically opted into the event filter mask, to allow the self-profiler to be used in hotter contexts.

This PR tries to offer:
- a way to ensure a closure to record arguments will only be called in that situation, so that potentially costly arguments can still be recorded when needed. With the additional requirement that, if possible, it would offer a way to record non-owned data without adding many `generic_activity_with_arg_{...}`-style methods. This lead to the `generic_activity_with_arg_recorder` single entry-point, and the closure parameter would offer the new methods, able to be executed in a context where costly argument could be created without disturbing the profiled piece of code.
- some facilities/patterns allowing to record more rustc specific data in this situation, without making `rustc_data_structures`  where the self-profiler is defined, depend on other rustc crates (causing circular dependencies): in particular, spans. They are quite tricky to turn into strings (if the default `Debug` impl output does not match the context one needs them for), and since I'd also like to avoid the allocation there when arg recording is turned off today, that has turned into another flexibility requirement for the API in this PR (separating the span-specific recording into an extension trait). **edit**: I've removed this from the PR so that it's easier to review, and opened https://github.com/rust-lang/rust/pull/95739.
- allow for extensibility in the future: other ways to record arguments, or additional data attached to them could be added in the future (e.g. recording the argument's name as well as its data).

Some areas where I'd love feedback:
- the API and names: the `EventArgRecorder` and its method for example. As well as the verbosity that comes from the increased flexibility.
- if I should convert the existing `generic_activity_with_arg{s}` to just forward to `generic_activity_with_arg_recorder` + `recorder.record_arg` (or remove them altogether ? Probably not): I've used the new API in the simple case I could find of allocating for an arg that may not be recorded, and the rest don't seem costly.
- [x] whether this API should panic if no arguments were recorded by the user-provided closure (like this PR currently does: it seems like an error to use an API dedicated to record arguments but not call the methods to then do so) or if this should just record a generic activity without arguments ?
- whether the `record_arg` function should be `#[inline(always)]`, like the `generic_activity_*` functions ?

As mentioned, r? `@wesleywiser` following our recent discussion.
This commit is contained in:
bors 2022-04-16 11:43:28 +00:00
commit febce1fc31
5 changed files with 103 additions and 13 deletions

View file

@ -11,7 +11,7 @@ use rustc_target::spec::SplitDebuginfo;
use crate::{GccCodegenBackend, GccContext};
pub(crate) unsafe fn codegen(cgcx: &CodegenContext<GccCodegenBackend>, _diag_handler: &Handler, module: ModuleCodegen<GccContext>, config: &ModuleConfig) -> Result<CompiledModule, FatalError> {
let _timer = cgcx.prof.generic_activity_with_arg("LLVM_module_codegen", &module.name[..]);
let _timer = cgcx.prof.generic_activity_with_arg("LLVM_module_codegen", &*module.name);
{
let context = &module.module_llvm.context;

View file

@ -313,7 +313,9 @@ fn fat_lto(
for (bc_decoded, name) in serialized_modules {
let _timer = cgcx
.prof
.generic_activity_with_arg("LLVM_fat_lto_link_module", format!("{:?}", name));
.generic_activity_with_arg_recorder("LLVM_fat_lto_link_module", |recorder| {
recorder.record_arg(format!("{:?}", name))
});
info!("linking {:?}", name);
let data = bc_decoded.data();
linker.add(data).map_err(|()| {

View file

@ -721,8 +721,7 @@ pub(crate) fn link(
let mut linker = Linker::new(first.module_llvm.llmod());
for module in elements {
let _timer =
cgcx.prof.generic_activity_with_arg("LLVM_link_module", format!("{:?}", module.name));
let _timer = cgcx.prof.generic_activity_with_arg("LLVM_link_module", &*module.name);
let buffer = ModuleBuffer::new(module.module_llvm.llmod());
linker.add(buffer.data()).map_err(|()| {
let msg = format!("failed to serialize module {:?}", module.name);

View file

@ -74,10 +74,11 @@ pub fn compile_codegen_unit(tcx: TyCtxt<'_>, cgu_name: Symbol) -> (ModuleCodegen
fn module_codegen(tcx: TyCtxt<'_>, cgu_name: Symbol) -> ModuleCodegen<ModuleLlvm> {
let cgu = tcx.codegen_unit(cgu_name);
let _prof_timer = tcx.prof.generic_activity_with_args(
"codegen_module",
&[cgu_name.to_string(), cgu.size_estimate().to_string()],
);
let _prof_timer =
tcx.prof.generic_activity_with_arg_recorder("codegen_module", |recorder| {
recorder.record_arg(cgu_name.to_string());
recorder.record_arg(cgu.size_estimate().to_string());
});
// Instantiate monomorphizations without filling out definitions yet...
let llvm_module = ModuleLlvm::new(tcx, cgu_name.as_str());
{

View file

@ -97,6 +97,7 @@ use std::time::{Duration, Instant};
pub use measureme::EventId;
use measureme::{EventIdBuilder, Profiler, SerializableString, StringId};
use parking_lot::RwLock;
use smallvec::SmallVec;
bitflags::bitflags! {
struct EventFilter: u32 {
@ -183,11 +184,11 @@ impl SelfProfilerRef {
}
}
// This shim makes sure that calls only get executed if the filter mask
// lets them pass. It also contains some trickery to make sure that
// code is optimized for non-profiling compilation sessions, i.e. anything
// past the filter check is never inlined so it doesn't clutter the fast
// path.
/// This shim makes sure that calls only get executed if the filter mask
/// lets them pass. It also contains some trickery to make sure that
/// code is optimized for non-profiling compilation sessions, i.e. anything
/// past the filter check is never inlined so it doesn't clutter the fast
/// path.
#[inline(always)]
fn exec<F>(&self, event_filter: EventFilter, f: F) -> TimingGuard<'_>
where
@ -288,6 +289,66 @@ impl SelfProfilerRef {
})
}
/// Start profiling a generic activity, allowing costly arguments to be recorded. Profiling
/// continues until the `TimingGuard` returned from this call is dropped.
///
/// If the arguments to a generic activity are cheap to create, use `generic_activity_with_arg`
/// or `generic_activity_with_args` for their simpler API. However, if they are costly or
/// require allocation in sufficiently hot contexts, then this allows for a closure to be called
/// only when arguments were asked to be recorded via `-Z self-profile-events=args`.
///
/// In this case, the closure will be passed a `&mut EventArgRecorder`, to help with recording
/// one or many arguments within the generic activity being profiled, by calling its
/// `record_arg` method for example.
///
/// This `EventArgRecorder` may implement more specific traits from other rustc crates, e.g. for
/// richer handling of rustc-specific argument types, while keeping this single entry-point API
/// for recording arguments.
///
/// Note: recording at least one argument is *required* for the self-profiler to create the
/// `TimingGuard`. A panic will be triggered if that doesn't happen. This function exists
/// explicitly to record arguments, so it fails loudly when there are none to record.
///
#[inline(always)]
pub fn generic_activity_with_arg_recorder<F>(
&self,
event_label: &'static str,
mut f: F,
) -> TimingGuard<'_>
where
F: FnMut(&mut EventArgRecorder<'_>),
{
// Ensure this event will only be recorded when self-profiling is turned on.
self.exec(EventFilter::GENERIC_ACTIVITIES, |profiler| {
let builder = EventIdBuilder::new(&profiler.profiler);
let event_label = profiler.get_or_alloc_cached_string(event_label);
// Ensure the closure to create event arguments will only be called when argument
// recording is turned on.
let event_id = if profiler.event_filter_mask.contains(EventFilter::FUNCTION_ARGS) {
// Set up the builder and call the user-provided closure to record potentially
// costly event arguments.
let mut recorder = EventArgRecorder { profiler, args: SmallVec::new() };
f(&mut recorder);
// It is expected that the closure will record at least one argument. If that
// doesn't happen, it's a bug: we've been explicitly called in order to record
// arguments, so we fail loudly when there are none to record.
if recorder.args.is_empty() {
panic!(
"The closure passed to `generic_activity_with_arg_recorder` needs to \
record at least one argument"
);
}
builder.from_label_and_args(event_label, &recorder.args)
} else {
builder.from_label(event_label)
};
TimingGuard::start(profiler, profiler.generic_activity_event_kind, event_id)
})
}
/// Record the size of an artifact that the compiler produces
///
/// `artifact_kind` is the class of artifact (e.g., query_cache, object_file, etc.)
@ -443,6 +504,33 @@ impl SelfProfilerRef {
}
}
/// A helper for recording costly arguments to self-profiling events. Used with
/// `SelfProfilerRef::generic_activity_with_arg_recorder`.
pub struct EventArgRecorder<'p> {
/// The `SelfProfiler` used to intern the event arguments that users will ask to record.
profiler: &'p SelfProfiler,
/// The interned event arguments to be recorded in the generic activity event.
///
/// The most common case, when actually recording event arguments, is to have one argument. Then
/// followed by recording two, in a couple places.
args: SmallVec<[StringId; 2]>,
}
impl EventArgRecorder<'_> {
/// Records a single argument within the current generic activity being profiled.
///
/// Note: when self-profiling with costly event arguments, at least one argument
/// needs to be recorded. A panic will be triggered if that doesn't happen.
pub fn record_arg<A>(&mut self, event_arg: A)
where
A: Borrow<str> + Into<String>,
{
let event_arg = self.profiler.get_or_alloc_cached_string(event_arg);
self.args.push(event_arg);
}
}
pub struct SelfProfiler {
profiler: Profiler,
event_filter_mask: EventFilter,