22bcb81c66
improve codegen of fmt_num to delete unreachable panic

it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`.

adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.

in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper:

<img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef">

where after this change the function is broadly similar, but smaller, with one fewer register updated in each pass through the loop, in addition to the never-taken `cmp/ja` being gone:

<img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa">

this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.

the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after.

---

i initially measured slightly better outcomes using `unreachable_unchecked()` here instead, but that was hacking on std and rebuilding with `-Z build-std` on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.

<details>
<summary>initial notes about that, seemingly irrelevant on modern rustc</summary>

i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like

```diff
                 if x == zero {
                     // No more digits left to accumulate.
                     break;
                 };
             }
         }
+
+        if curr >= buf.len() {
+            unsafe { core::hint::unreachable_unchecked(); }
+        }
         let buf = &buf[curr..];
```

posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like

```
put_char(x & 0xf);
let mut len = 1;
if x > 0xf {
    put_char((x >> 4) & 0xf);
    len = 2;
}
pad_integral(buf[buf.len() - len..]);
```

it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to a 6% difference in my use case.

</details>

---

i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_.

(i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
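
as a rough illustration of the loop shape described above (not the actual `fmt_int` from `core`), here is a minimal sketch. the `lower_hex` helper, its fixed 8-byte buffer and the `u32` input are all assumptions made for the example. the point is only that the digit-writing body runs before the zero check, so `curr` is decremented at least once before `&buf[curr..]` is taken; whether a given LLVM version then drops the bounds check is not something this sketch guarantees.

```rust
/// Hypothetical helper for illustration only; this is not `fmt_int` from core.
/// Writes the lowercase hex digits of `x` into the tail of `buf` and returns them.
fn lower_hex(mut x: u32, buf: &mut [u8; 8]) -> &str {
    let mut curr = buf.len();
    // Do-while shape: one digit is always written before the zero check,
    // so even `x == 0` produces the single character "0".
    loop {
        curr -= 1;
        let d = (x & 0xf) as u8;
        buf[curr] = if d < 10 { b'0' + d } else { b'a' + (d - 10) };
        x >>= 4;
        if x == 0 {
            // No more digits left to accumulate.
            break;
        }
    }
    // At this point `curr` is at least one below `buf.len()`. A u32 has at
    // most 8 hex digits, so `curr` also never underflows.
    core::str::from_utf8(&buf[curr..]).expect("only ASCII hex digits were written")
}
```

calling it with a scratch buffer, e.g. `lower_hex(0x1f, &mut [0u8; 8])`, yields the digits as a `&str` borrowed from that buffer.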
165 lines
3.5 KiB
Rust
use std::fmt::{self, Write as FmtWrite};
use std::io::{self, Write as IoWrite};

use test::{Bencher, black_box};

#[bench]
fn write_vec_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        for _ in 0..1000 {
            mem.write_all(black_box("abc").as_bytes()).unwrap();
        }
    });
}

#[bench]
fn write_vec_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut dyn io::Write;
        for _ in 0..1000 {
            wr.write_all(black_box("abc").as_bytes()).unwrap();
        }
    });
}

#[bench]
fn write_vec_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut dyn io::Write;
        for _ in 0..1000 {
            write!(wr, "{}", black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_vec_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut dyn io::Write;
        for _ in 0..1000 {
            write!(wr, "{}", black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_vec_macro_debug(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut dyn io::Write;
        for _ in 0..1000 {
            write!(wr, "{:?}", black_box("☃")).unwrap();
        }
    });
}

#[bench]
fn write_str_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        for _ in 0..1000 {
            mem.write_str(black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_str_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        let wr = &mut mem as &mut dyn fmt::Write;
        for _ in 0..1000 {
            wr.write_str(black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_str_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        for _ in 0..1000 {
            write!(mem, "{}", black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_str_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        let wr = &mut mem as &mut dyn fmt::Write;
        for _ in 0..1000 {
            write!(wr, "{}", black_box("abc")).unwrap();
        }
    });
}

#[bench]
fn write_str_macro_debug(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        let wr = &mut mem as &mut dyn fmt::Write;
        for _ in 0..1000 {
            write!(wr, "{:?}", black_box("☃")).unwrap();
        }
    });
}

#[bench]
fn write_str_macro_debug_ascii(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = String::new();
        let wr = &mut mem as &mut dyn fmt::Write;
        for _ in 0..1000 {
            write!(wr, "{:?}", black_box("Hello, World!")).unwrap();
        }
    });
}

#[bench]
fn write_u128_max(bh: &mut Bencher) {
    bh.iter(|| {
        test::black_box(format!("{}", u128::MAX));
    });
}

#[bench]
fn write_u128_min(bh: &mut Bencher) {
    bh.iter(|| {
        let s = format!("{}", 0u128);
        test::black_box(s);
    });
}

#[bench]
fn write_u64_max(bh: &mut Bencher) {
    bh.iter(|| {
        test::black_box(format!("{}", u64::MAX));
    });
}

#[bench]
fn write_u64_min(bh: &mut Bencher) {
    bh.iter(|| {
        test::black_box(format!("{}", 0u64));
    });
}

#[bench]
fn write_u8_max(bh: &mut Bencher) {
    bh.iter(|| {
        test::black_box(format!("{}", u8::MAX));
    });
}

#[bench]
fn write_u8_min(bh: &mut Bencher) {
    bh.iter(|| {
        test::black_box(format!("{}", 0u8));
    });
}