Make intrinsic fallback bodies cross-crate inlineable
This change was prompted by the stage1 compiler spending 4% of its time in `unlikely` when compiling the polymorphic-recursion MIR opt test.
Intrinsic fallback bodies like `unlikely` should always be inlined; it's very silly if they are not. To make that possible, we allow the fallback bodies to be cross-crate inlineable. Not that this matters much for our workloads, since the compiler never actually _uses_ the "fallback bodies": it just uses whatever was `cfg(bootstrap)`ped, so I've also added `#[inline]` to those.
See the comments for more information.
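As a rough sketch of the shape involved (simplified; the real definitions in `core::intrinsics` carry more attributes, so treat this as illustrative only):

```rust
// Simplified sketch, not the actual core source: an intrinsic with a
// fallback body. Backends normally replace calls with their own lowering;
// the body below is only used when a backend has no special handling, and
// with this change it is also eligible for cross-crate inlining.
#[rustc_intrinsic]
#[inline]
pub const fn unlikely(b: bool) -> bool {
    b
}
```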
r? oli-obk
intrinsics::simd: add missing functions, avoid UB-triggering fast-math
Turns out stdarch declares a bunch more SIMD intrinsics that are still missing from libcore.
I hope I got the docs and in particular the safety requirements right for these "unordered" and "nanless" intrinsics.
Many of these are unused even in stdarch, but they are implemented in the codegen backend, so we may as well list them here.
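For illustration, the rough shape of such declarations, using names from the "unordered"/"nanless" family; the exact ABI string and attributes used in `core::intrinsics::simd` have shifted over time, so this is only a sketch:

```rust
// Illustrative shape only, not the exact set of declarations added here.
#![feature(intrinsics)]

extern "rust-intrinsic" {
    /// Adds all lanes together; "unordered" means the summation order is
    /// unspecified, so float results may differ from a left-to-right fold.
    fn simd_reduce_add_unordered<T, U>(x: T) -> U;

    /// Minimum across lanes; for the "nanless" variant, NaN inputs are the
    /// case whose safety requirements need spelling out in the docs.
    fn simd_reduce_min_nanless<T, U>(x: T) -> U;
}
```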
r? `@Amanieu`
Cc `@calebzulawski` `@workingjubilee`
Remove useless `'static` bounds on `Box` allocator
#79327 added `'static` bounds to the allocator parameter of various `Box` + `Pin` APIs to ensure soundness. But it was a bit overzealous: some of those bounds aren't actually needed.
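To illustrate the kind of bound in question (a hypothetical free function, not one of the std APIs in the diff; which APIs keep or lose the bound is determined by the PR itself):

```rust
// Illustrative only: an `A: 'static` bound like this one is what the PR
// audits; some of these bounds are needed for soundness, some are not.
#![feature(allocator_api)]
use std::alloc::Allocator;
use std::pin::Pin;

fn pin_in<T, A: Allocator + 'static>(value: T, alloc: A) -> Pin<Box<T, A>> {
    Box::into_pin(Box::new_in(value, alloc))
}
```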
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. The "algebraic" set of flags still permits common algebraic transformations, and according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the `nnan` (no NaNs) and `ninf` (no infs) flags can produce poison, so these new intrinsics do not set those two.
And this uses the algebraic float ops to fix https://github.com/rust-lang/rust/issues/120720
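A small example of the kind of code these ops are meant for; `fadd_algebraic` is one of the intrinsics added here, while the dot product itself is just an illustration (not the code touched for #120720):

```rust
#![feature(core_intrinsics)]
use std::intrinsics::fadd_algebraic;

// The compiler may reassociate and vectorize this reduction, but unlike the
// `fast` intrinsics, no input (NaN, infinity, ...) makes the result poison.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| fadd_algebraic(acc, x * y))
}
```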
cc `@orlp`
Currently, all architecture-specific memchr code is only used in
`std::io`. Most of the actual `memchr` functionality exposed to users
through the slice API is instead implemented in `core::slice::memchr`.
Hence this commit deletes memchr from `std::sys[_common]` and replaces
calls to it with calls to the `core::slice::memchr` functions. This removes
(r)memchr from the list of symbols that std links from libc.
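A hedged sketch of the resulting call shape, assuming the `slice_internals` gate that std uses for this module:

```rust
#![feature(slice_internals)]

// Sketch only: std::io-style code now calls the core implementation directly
// instead of going through a per-target wrapper in std::sys.
fn find_newline(buf: &[u8]) -> Option<usize> {
    core::slice::memchr::memchr(b'\n', buf)
}
```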
Refactor trait implementations in `core::convert::num`.
Tracking issue: https://github.com/rust-lang/rust/issues/120257
Implement conversion traits using generic `NonZero` type, and refactor all macros to use a consistent format/order of parameters.
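For illustration of the user-facing shape (this shows an existing widening conversion expressed through the generic type, not a new impl):

```rust
use std::num::NonZero;

// The conversions are now written against the generic `NonZero<T>` rather
// than the per-width aliases like `NonZeroU8`/`NonZeroU16`.
fn widen(x: NonZero<u8>) -> NonZero<u16> {
    NonZero::<u16>::from(x)
}
```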
r? `@dtolnay`
Correct the simd_masked_{load,store} intrinsic docs
Explains that a single base pointer is used for these two operations and how the individual elements are offset from it.
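A plain-Rust model of the documented behavior (an illustrative scalar loop, not the real intrinsic or its exact signature):

```rust
// One base pointer; lane i is read from `base.add(i)` only where the mask is
// set, and masked-off lanes come from the fallthrough vector instead.
unsafe fn masked_load_model<const N: usize>(
    mask: [bool; N],
    base: *const f32,
    fallthrough: [f32; N],
) -> [f32; N] {
    let mut out = fallthrough;
    for i in 0..N {
        if mask[i] {
            out[i] = unsafe { *base.add(i) };
        }
    }
    out
}
```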
Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)
The current complexities in `assert_unsafe_precondition` delicately balance several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making a compiler built with debug assertions a lot slower, which is very annoying.
To avoid this, we always inline the check when building with debug assertions.
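A minimal sketch of the idea, with a hypothetical check function rather than the real macro internals:

```rust
// Sketch only: the out-of-line helper becomes #[inline(always)] whenever the
// library itself is built with debug assertions, so enabled checks are
// inlined instead of paid for as a function call at every use site.
#[cfg_attr(debug_assertions, inline(always))]
#[cfg_attr(not(debug_assertions), inline)]
fn precondition_check(ptr_is_aligned: bool, ptr_is_nonnull: bool) {
    if !(ptr_is_aligned && ptr_is_nonnull) {
        panic!("unsafe precondition(s) violated");
    }
}
```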
Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` intrinsic, which is equivalent to a bootstrap bump (#121112) (uhh, I just realized I was on a slightly outdated master, so this bump might have happened already): 67s
- this: 54s
So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` #121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".
Funny consequence: this now means that compiling the standard library with debug assertions makes it faster (than without, when using debug assertions downstream)!
r? `@saethlin` (or anyone else if someone wants to review this)
fixes #121110, supposedly
Use intrinsics::debug_assertions in debug_assert_nounwind
This is the first item in https://github.com/rust-lang/rust/issues/120848.
Based on the benchmarking in this PR, it looks like, for the programs in our benchmark suite, enabling all these additional checks does not introduce significant compile-time overhead, with the single exception of `Alignment::new_unchecked`. Therefore, I've added `#[cfg(debug_assertions)]` to that one call site, so that it remains compiled out in the distributed standard library.
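A hedged sketch of that gating pattern, using a hypothetical helper rather than the real `Alignment` source:

```rust
// Sketch only: the check exists at all only when the library itself is built
// with debug assertions, so it stays compiled out of the distributed std.
pub unsafe fn assume_power_of_two(align: usize) -> usize {
    #[cfg(debug_assertions)]
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    align
}
```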
The trailing commas in the previous calls to `debug_assert_nounwind!` were causing the macro to expand to `panic_nounwind_fmt`, which requires more work to set up its arguments; that overhead alone is what is measured between this perf run and the next: https://github.com/rust-lang/rust/pull/120863#issuecomment-1937423502
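An illustrative macro sketch of that pitfall (a hypothetical macro, not the real `debug_assert_nounwind!`):

```rust
macro_rules! check_nounwind_sketch {
    // Cheap arm: a plain string literal message, no formatting machinery.
    ($cond:expr, $msg:literal) => {
        if cfg!(debug_assertions) && !$cond {
            panic!($msg);
        }
    };
    // Format arm: in the real macro this is the path that sets up format
    // arguments. A trailing comma in the call prevents the first arm from
    // matching, so `check_nounwind_sketch!(cond, "msg",)` lands here.
    ($cond:expr, $($fmt:tt)+) => {
        if cfg!(debug_assertions) && !$cond {
            panic!($($fmt)+);
        }
    };
}
```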