Optimize SipHash by reordering compress instructions
This PR optimizes hashing by changing the order of instructions in the sip.rs `compress` macro so the CPU can parallelize it better. The new order is taken directly from Fig 2.1 in [the SipHash paper](https://eprint.iacr.org/2012/351.pdf) (but with the xors moved which makes it a little faster). I attempted to optimize it some more after this, but I think this might be the optimal instruction order. Note that this shouldn't change the behavior of hashing at all, only statements that don't depend on each other were reordered.
It appears like the current order hasn't changed since its [original implementation from 2012](fada46c421 (diff-b751133c229259d7099bbbc7835324e5504b91ab1aded9464f0c48cd22e5e420R35)) which doesn't look like it was written with data dependencies in mind.
Running `./x bench library/core --stage 0 --test-args hash` before and after this change shows the following results:
Before:
```
benchmarks:
hash::sip::bench_bytes_4 7.20/iter +/- 0.70
hash::sip::bench_bytes_7 9.01/iter +/- 0.35
hash::sip::bench_bytes_8 8.12/iter +/- 0.10
hash::sip::bench_bytes_a_16 10.07/iter +/- 0.44
hash::sip::bench_bytes_b_32 13.46/iter +/- 0.71
hash::sip::bench_bytes_c_128 37.75/iter +/- 0.48
hash::sip::bench_long_str 121.18/iter +/- 3.01
hash::sip::bench_str_of_8_bytes 11.20/iter +/- 0.25
hash::sip::bench_str_over_8_bytes 11.20/iter +/- 0.26
hash::sip::bench_str_under_8_bytes 9.89/iter +/- 0.59
hash::sip::bench_u32 9.57/iter +/- 0.44
hash::sip::bench_u32_keyed 6.97/iter +/- 0.10
hash::sip::bench_u64 8.63/iter +/- 0.07
```
After:
```
benchmarks:
hash::sip::bench_bytes_4 6.64/iter +/- 0.14
hash::sip::bench_bytes_7 8.19/iter +/- 0.07
hash::sip::bench_bytes_8 8.59/iter +/- 0.68
hash::sip::bench_bytes_a_16 9.73/iter +/- 0.49
hash::sip::bench_bytes_b_32 12.70/iter +/- 0.06
hash::sip::bench_bytes_c_128 32.38/iter +/- 0.20
hash::sip::bench_long_str 102.99/iter +/- 0.82
hash::sip::bench_str_of_8_bytes 10.71/iter +/- 0.21
hash::sip::bench_str_over_8_bytes 11.73/iter +/- 0.17
hash::sip::bench_str_under_8_bytes 10.33/iter +/- 0.41
hash::sip::bench_u32 10.41/iter +/- 0.29
hash::sip::bench_u32_keyed 9.50/iter +/- 0.30
hash::sip::bench_u64 8.44/iter +/- 1.09
```
I ran this on my computer so there's some noise, but you can tell at least `bench_long_str` is significantly faster (~18%).
Also, I noticed the same compress function from the library is used in the compiler as well, so I took the liberty of copy-pasting this change to there as well.
Thanks `@semisol` for porting SipHash for another project which led me to notice this issue in Rust, and for helping investigate. <3
rustdoc: update to pulldown-cmark 0.11
r? rustdoc
This pull request updates rustdoc to the latest version of pulldown-cmark. Along with adding new markdown extensions (which this PR doesn't enable), the new pulldown-cmark version also fixes a large number of bugs. Because all text files successfully parse as markdown, these bugfixes change the output, which can break people's existing docs.
A crater run, https://github.com/rust-lang/rust/pull/121659, has already been run for this change.
The first commit upgrades and fixes rustdoc. The second commit adds a lint for the footnote and block quote parser changes, which break the largest numbers of docs in the Crater run. The strikethrough change was mitigated in pulldown-cmark itself.
Unblocks https://github.com/rust-lang/rust-clippy/pull/12876
Rollup of 6 pull requests
Successful merges:
- #127092 (Change return-type-notation to use `(..)`)
- #127184 (More refactorings to rustc_interface)
- #127190 (Update LLVM submodule)
- #127253 (Fix incorrect suggestion for extra argument with a type error)
- #127280 (Disable rmake test rustdoc-io-error on riscv64gc-gnu)
- #127294 (Less magic number for corountine)
r? `@ghost`
`@rustbot` modify labels: rollup
Disable rmake test rustdoc-io-error on riscv64gc-gnu
In https://github.com/rust-lang/rust/pull/126917 we disabled `inaccessible-temp-dir` on `riscv64gc-gnu` because the container runs the build as `root` (just like the `armhf-gnu` builds). Tests creating an inaccessible test directory are not possible, since `root` can always touch those directories.
553a69030e/src/ci/docker/host-x86_64/disabled/riscv64gc-gnu/Dockerfile (L99)
This means the tests are run as `root`. As `root`, it's perfectly normal and reasonable to violate permission checks this way:
```bash
$ sudo mkdir scratch
$ sudo chmod o-w scratch
$ sudo mkdir scratch/backs
$
```
Because of this, this PR makes the test ignored on `riscv64gc` (just like on `armhf-gnu`) for now.
As an alternative, I believe the best long-term strategy would be to not run the tests as `root` for this job. Some preliminary exploration was done in https://github.com/rust-lang/rust/pull/126917#issuecomment-2189933970, however that appears a larger lift.
## Testing
> [!NOTE]
> `riscv64gc-unknown-linux-gnu` is a [**Tier 2 with Host Tools** platform](https://doc.rust-lang.org/beta/rustc/platform-support.html), all tests may not necessarily pass! This change should only ignore `inaccessible-temp-dir` and not affect other tests.
You can test out the job locally:
```sh
DEPLOY=1 ./src/ci/docker/run.sh riscv64gc-gnu
```
r? `@jieyouxu`
Fix incorrect suggestion for extra argument with a type error
Fixes#126246
I tried to fix it in the `find_errors` of ArgMatrix, but seems it's hard to avoid breaking some other test cases.
The root cause is we eliminate the first argument even with a type error at here:
6292b2af62/compiler/rustc_hir_typeck/src/fn_ctxt/checks.rs (L664)
So the left argument is always treated as extra one.
But if there is already a type error, an error message will be generated firstly, which make this issue a trivial one.
If you misused a count command like `@count $some.selector '"T'"`, you would panic with OOB:
```
thread 'main' panicked at src/tools/jsondocck/src/main.rs:76:92:
index out of bounds: the len is 2 but the index is 2
```
Fixing this typo, we now get.
```
Invalid command: Second argument to @count must be a valid usize (got `"T"`) on line 20
```
As some point I want to rewrite this code to avoid indexing in general, but this is a nice small fix.
Make jump threading state sparse
Continuation of https://github.com/rust-lang/rust/pull/127024
Both dataflow const-prop and jump threading involve cloning the state vector a lot. This PR replaces the data structure by a sparse vector, considering:
- that jump threading state is typically very sparse (at most 1 or 2 set entries);
- that dataflow const-prop is disabled by default;
- that place/value map is very eager, and prone to creating an overly large state.
The first commit is shared with the previous PR to avoid needless conflicts.
r? `@oli-obk`
Add parse fail test using safe trait/impl trait
Added 2 more tests to be sure that nothing weird happens using `safe` on items.
Needed to do this in separate tests as they give parsing errors.
Remove global error count checks from typeck
Some of these are not reachable anymore, others can now rely on information local to the current typeck run. One check was actually invalid, because it was relying on wfcheck running before typeck, which is not guaranteed in the query system and usually easy to create ICEing examples for via const eval (which runs typeck before wfcheck)
Add `as_lang_item` to `LanguageItems`, new trait solver
Add `as_lang_item` which turns `DefId` into a `TraitSolverLangItem` in the new trait solver, so we can turn the large chain of if statements in `assemble_builtin_impl_candidates` into a match instead.
r? lcnr
Make mtime of reproducible tarballs dependent on git commit
Since https://github.com/rust-lang/rust/pull/123246, our tarballs should be fully reproducible. That means that the mtime of all files and directories in the tarballs is set to the date of the first Rust commit (from 2006). However, this is causing some mtime invalidation issues (https://github.com/rust-lang/rust/issues/125578#issuecomment-2141068906).
Ideally, we would like to keep the mtime reproducible, but still update it with new versions of Rust. That's what this PR does. It modifies the tarball installer bootstrap invocation so that if the current rustc directory is managed by git, we will set the UTC timestamp of the latest commit as the mtime for all files in the archive. This means that the archive should be still fully reproducible from a given commit SHA, but it will also be changed with new beta bumps and `download-rustc` versions.
Note that only files are set to this mtime, directories are still set to the year 2006, because the `tar` library used by `rust-installer` doesn't allow us to selectively override mtime for directories (or at least I haven't found it). We could work around that by doing all the mtime modifications in bootstrap, but that would require more changes. I think/hope that just modifying the file mtimes should be enough. It should at least fix cargo `rustc` mtime invalidation.
Fixes: https://github.com/rust-lang/rust/issues/125578
r? ``@onur-ozkan``
try-job: x86_64-gnu-distcheck
linker: Link dylib crates by path
Linkers seem to support linking dynamic libraries by path.
Not sure why the previous scheme with splitting the path into a directory (passed with `-L`) and a name (passed with `-l`) was used (upd: likely due to https://github.com/rust-lang/rust/pull/126094#issuecomment-2155063414).
When we split a library path `some/dir/libfoo.so` into `-L some/dir` and `-l foo` we add `some/dir` to search directories for *all* libraries looked up by the linker, not just `foo`, and `foo` is also looked up in *all* search directories not just `some/dir`.
Technically we may find some unintended libraries this way.
Therefore linking dylibs via a full path is both simpler and more reliable.
It also makes the set of search directories more easily reproducible when we need to lookup some native library manually (like in https://github.com/rust-lang/rust/pull/123436).
Re-implement a type-size based limit
r? lcnr
This PR reintroduces the type length limit added in #37789, which was accidentally made practically useless by the caching changes to `Ty::walk` in #72412, which caused the `walk` function to no longer walk over identical elements.
Hitting this length limit is not fatal unless we are in codegen -- so it shouldn't affect passes like the mir inliner which creates potentially very large types (which we observed, for example, when the new trait solver compiles `itertools` in `--release` mode).
This also increases the type length limit from `1048576 == 2 ** 20` to `2 ** 24`, which covers all of the code that can be reached with craterbot-check. Individual crates can increase the length limit further if desired.
Perf regression is mild and I think we should accept it -- reinstating this limit is important for the new trait solver and to make sure we don't accidentally hit more type-size related regressions in the future.
Fixes#125460
Implement the `_mm256_zeroupper` and `_mm256_zeroall` intrinsics
These two intrinsics were missing from the original implementation of the AVX intrinsics.
Fortunately their implementation is trivial.
Give remote-test-client a longer timeout
In https://github.com/rust-lang/rust/pull/126959, ``@Mark-Simulacrum`` suggested we simply extend the timeout of the `remote-test-client` instead of making it configurable. This is acceptable for my use case.
I'm doing some work with a remote host running tests using `x.py`'s remote test runner system.
After building the `remote-test-server` with:
```bash
./x build src/tools/remote-test-server --target aarch64-unknown-linux-gnu
```
I can transfer the `remote-test-server` to the remote and set up a port forwarded SSH connection, then I run:
```bash
TEST_DEVICE_ADDR=127.0.0.1:12345 ./x.py test library/core --target aarch64-unknown-linux-gnu
```
This works great if the host is nearby, however if the average round trip time is over 100ms (the timeout), it creates problems as it silently trips the timeout.
This PR extends that timeout to a less strict 2s.
r? ``@Mark-Simulacrum``