Don't make statement nonterminals match pattern nonterminals
Right now, the heuristic we use to check if a token may begin a pattern nonterminal falls back to `may_be_ident`:
ef71f1047e/compiler/rustc_parse/src/parser/nonterminal.rs (L21-L37)
This has the unfortunate side effect that a `stmt` nonterminal eagerly matches against a `pat` nonterminal, leading to a parse error:
```rust
macro_rules! m {
    ($pat:pat) => {};
    ($stmt:stmt) => {};
}

macro_rules! m2 {
    ($stmt:stmt) => {
        m! { $stmt }
    };
}

m2! { let x = 1 }
```
This PR fixes it by more accurately reflecting the set of nonterminals that may begin a pattern nonterminal.
As a side-effect, I modified `Token::can_begin_pattern` to work correctly and used that in `Parser::nonterminal_may_begin_with`.
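To illustrate why the coarse check causes the failure, here is a deliberately simplified model. The `FragmentKind` enum and both functions are hypothetical stand-ins, not the rustc code, and the exact set accepted by `can_begin_pattern` below is an assumption made for illustration only.

```rust
// Hypothetical stand-in for the kinds of already-parsed macro fragments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FragmentKind {
    Stmt,
    Pat,
    Literal,
    Path,
    Item,
}

// Coarse heuristic: "could this fragment be a single identifier?"
// A statement such as `x` could be, so `Stmt` is accepted.
fn may_be_ident(kind: FragmentKind) -> bool {
    !matches!(kind, FragmentKind::Item)
}

// Narrower check: "could this fragment begin a pattern?"
// The accepted set is an assumption chosen for this sketch.
fn can_begin_pattern(kind: FragmentKind) -> bool {
    matches!(
        kind,
        FragmentKind::Pat | FragmentKind::Literal | FragmentKind::Path
    )
}

fn main() {
    // `m! { $stmt }` forwards an already-parsed statement fragment.
    let forwarded = FragmentKind::Stmt;
    // Coarse check: the `$pat:pat` arm is attempted and then fails to parse.
    assert!(may_be_ident(forwarded));
    // Narrow check: the `$pat:pat` arm is skipped, so `$stmt:stmt` can match.
    assert!(!can_begin_pattern(forwarded));
}
```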
By keeping track of attributes that have been previously processed.
This fixes the `macro-rules-derive-cfg.stdout` test, and is necessary
for #124141 which removes nonterminals.
Also shrink the `SmallVec` inline size used in `IntervalSet`. 2 gives
slightly better perf than 4 now that there's an `IntervalSet` in
`Parser`, which is cloned reasonably often.
In a case like this:
```
mod a {
    mod b {
        #[cfg_attr(unix, inline)]
        fn f() {
            #[cfg_attr(linux, inline)]
            fn g1() {}

            #[cfg_attr(linux, inline)]
            fn g2() {}
        }
    }
}
```
We currently end up with the following replacement ranges.
- The lazy tokens for `f` have replacement ranges for `g1` and `g2`.
- The lazy tokens for `a` have replacement ranges for `f`, `g1`, and
  `g2`.
I.e. the replacement ranges for `g1` and `g2` are duplicated. In
general, replacement ranges for inner AST nodes are duplicated up the
chain for each nested `collect_tokens` call. And the code that processes
the replacements is careful about the ordering in which the replacements
are applied, to ensure that inner replacements are applied before outer
replacements.
But all of this is unnecessary. If you apply an inner replacement and
then an outer replacement, the outer replacement completely overwrites
the inner replacement.
This commit avoids the duplication by removing replacements from
`self.capture_state.parser_replacements` when they are used. (The effect
on the example above is that the lazy tokens for `a` no longer include
replacement ranges for `g1` and `g2`.) This eliminates the possibility
of nested replacements on individual AST nodes, which avoids the need
for careful ordering of replacements.
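The claim that an outer replacement overwrites an inner one can be checked with a small model. This is only a sketch: `Replacement` and `apply_all` are hypothetical, tokens are plain strings, and every replacement preserves the token count.

```rust
// Hypothetical replacement: swap the tokens in `start..end` for `with`.
// For simplicity every replacement here preserves the token count.
struct Replacement {
    start: usize,
    end: usize,
    with: Vec<&'static str>,
}

// Apply the replacements in the given order and return the new token list.
fn apply_all(tokens: &[&'static str], replacements: &[&Replacement]) -> Vec<&'static str> {
    let mut out = tokens.to_vec();
    for r in replacements {
        // Dropping the `Splice` iterator performs the replacement.
        drop(out.splice(r.start..r.end, r.with.iter().copied()));
    }
    out
}

fn main() {
    let tokens = ["a", "b", "c", "d", "e"];
    // `inner` lies entirely within the range covered by `outer`.
    let inner = Replacement { start: 2, end: 3, with: vec!["X"] };
    let outer = Replacement { start: 1, end: 4, with: vec!["_", "_", "_"] };

    let both = apply_all(&tokens, &[&inner, &outer]);
    let outer_only = apply_all(&tokens, &[&outer]);

    // The inner replacement leaves no trace once the outer one is applied.
    assert_eq!(both, outer_only);
    assert_eq!(both, ["a", "_", "_", "_", "e"]);
}
```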
This example triggers an assertion failure:
```
fn f() -> u32 {
    #[cfg_eval] #[cfg(not(FALSE))] 0
}
```
The sequence of events:
- `configure_annotatable` calls `parse_expr_force_collect`, which calls
`collect_tokens`.
- Within that, we end up in `parse_expr_dot_or_call`, which again calls
`collect_tokens`.
- The return value of the `f` call is the expression `0`.
- This inner call collects tokens for `0` (parser range 10..11) and
creates a replacement covering `#[cfg(not(FALSE))] 0` (parser range
0..11).
- We return to the outer `collect_tokens` call. The return value of the
`f` call is *again* the expression `0`, again with the range 10..11,
but the replacement from earlier covers the range 0..11. The code
mistakenly assumes that any attributes from an inner `collect_tokens`
call fit entirely within the body of the result of an outer
`collect_tokens` call. So it adjusts the replacement parser range
0..11 to a node range by subtracting 10, resulting in -10..1. This is
an invalid range and triggers an assertion failure.
It's tricky to follow, but basically things get complicated when an AST
node is returned from an inner `collect_tokens` call and then returned
again from an outer `collect_tokens` call without being wrapped in any
kind of additional layer.
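The range arithmetic from the example above can be reproduced in isolation. This is a minimal sketch: `to_node_range` is a hypothetical helper, not the rustc code, and it uses checked subtraction so the invalid case shows up as `None` instead of an assertion failure.

```rust
use std::ops::Range;

// Convert a range over the parser's token stream into a range relative to
// a node that starts at `node_start`. Returns `None` if the range begins
// before the node. Hypothetical helper, not the rustc code.
fn to_node_range(parser_range: Range<u32>, node_start: u32) -> Option<Range<u32>> {
    let start = parser_range.start.checked_sub(node_start)?;
    let end = parser_range.end.checked_sub(node_start)?;
    Some(start..end)
}

fn main() {
    // The node `0` occupies parser range 10..11.
    assert_eq!(to_node_range(10..11, 10), Some(0..1));
    // The replacement for `#[cfg(not(FALSE))] 0` occupies 0..11, which
    // starts before the node, so it has no valid node-relative form.
    assert_eq!(to_node_range(0..11, 10), None);
}
```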
This commit changes `collect_tokens` to return early in some extra cases,
avoiding the construction of lazy tokens. In the example above, the
outer `collect_tokens` returns earlier because the `0` token already has
tokens and `self.capture_state.capturing` is `Capturing::No`. This early
return avoids the creation of the invalid range and the assertion
failure.
Fixes #129166. Note: these invalid ranges have been happening for a long
time. #128725 looks like it's at fault only because it introduced the
assertion that catches the invalid ranges.
Stabilize opaque type precise capturing (RFC 3617)
This PR partially stabilizes opaque type *precise capturing*, which was specified in [RFC 3617](https://github.com/rust-lang/rfcs/pull/3617), and whose syntax was amended by FCP in [#125836](https://github.com/rust-lang/rust/issues/125836).
This feature, as stabilized here, gives us a way to explicitly specify the generic lifetime parameters that an RPIT-like opaque type captures. This solves the problem of overcapturing, for lifetime parameters in these opaque types, and will allow the Lifetime Capture Rules 2024 ([RFC 3498](https://github.com/rust-lang/rfcs/pull/3498)) to be fully stabilized for RPIT in Rust 2024.
### What are we stabilizing?
This PR stabilizes the use of a `use<'a, T>` bound in return-position impl Trait opaque types. Such a bound fully specifies the set of generic parameters captured by the RPIT opaque type, entirely overriding the implicit default behavior. E.g.:
```rust
fn does_not_capture<'a, 'b>() -> impl Sized + use<'a> {}
//                               ~~~~~~~~~~~~~~~~~~~~
//                               This RPIT opaque type does not capture `'b`.
```
The way we would suggest thinking of `impl Trait` types *without* an explicit `use<..>` bound is that the `use<..>` bound has been *elided*, and that the bound is filled in automatically by the compiler according to the edition-specific capture rules.
All non-`'static` lifetime parameters, named (i.e. non-APIT) type parameters, and const parameters in scope are valid to name, including an elided lifetime if such a lifetime would also be valid in an outlives bound, e.g.:
```rust
fn elided(x: &u8) -> impl Sized + use<'_> { x }
```
Lifetimes must be listed before type and const parameters, but otherwise the ordering is not relevant to the `use<..>` bound. Captured parameters may not be duplicated. For now, only one `use<..>` bound may appear in a bounds list. It may appear anywhere within the bounds list.
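As a small illustration of what not capturing a lifetime buys the caller, consider the sketch below. The function `label` and its arguments are made up for this example, and it assumes a toolchain on which `use<..>` bounds are accepted.

```rust
use std::fmt::Display;

// The returned opaque type captures only `'a`, so the value it hides
// is not tied to the lifetime of `_other`.
fn label<'a, 'b>(name: &'a str, _other: &'b str) -> impl Display + use<'a> {
    name
}

fn main() {
    let name = String::from("kept");
    let shown;
    {
        let other = String::from("dropped early");
        shown = label(&name, &other);
    } // `other` is dropped here; `shown` does not borrow from it.
    assert_eq!(shown.to_string(), "kept");
}
```

If the opaque type also captured `'b`, the use of `shown` after the inner block would be rejected, because `other` would still be considered borrowed.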
### How does this differ from the RFC?
This stabilization differs from the RFC in one respect: the RFC originally specified `use<'a, T>` as syntactically part of the RPIT type itself, e.g.:
```rust
fn capture<'a>() -> impl use<'a> Sized {}
```
However, settling on the final syntax was left as an open question. T-lang later decided via FCP in [#125836](https://github.com/rust-lang/rust/issues/125836) to treat `use<..>` as a syntactic bound instead, e.g.:
```rust
fn capture<'a>() -> impl Sized + use<'a> {}
```
### What aren't we stabilizing?
The key goal of this PR is to stabilize the parts of *precise capturing* that are needed to enable the migration to Rust 2024.
There are some capabilities of *precise capturing* that the RFC specifies but that we're not stabilizing here, as these require further work on the type system. We hope to lift these limitations later.
The limitations that are part of this PR were specified in the [RFC's stabilization strategy](https://rust-lang.github.io/rfcs/3617-precise-capturing.html#stabilization-strategy).
#### Not capturing type or const parameters
The RFC addresses the overcapturing of type and const parameters; that is, it allows for them to not be captured in opaque types. We're not stabilizing that in this PR. Since all in scope generic type and const parameters are implicitly captured in all editions, this is not needed for the migration to Rust 2024.
For now, when using `use<..>`, all in scope type and const parameters must be nameable (i.e., APIT cannot be used) and included as arguments. For example, this is an error because `T` is in scope and not included as an argument:
```rust
fn test<T>() -> impl Sized + use<> {}
//~^ ERROR `impl Trait` must mention all type parameters in scope in `use<...>`
```
This is due to certain current limitations in the type system related to how generic parameters are represented as captured (i.e. bivariance) and how inference operates.
We hope to relax this in the future, and this stabilization is forward compatible with doing so.
#### Precise capturing for return-position impl Trait **in trait** (RPITIT)
The RFC specifies precise capturing for RPITIT. We're not stabilizing that in this PR. Since RPITIT already adheres to the Lifetime Capture Rules 2024, this isn't needed for the migration to Rust 2024.
The effect of this is that the anonymous associated types created by RPITITs must continue to capture all of the lifetime parameters in scope, e.g.:
```rust
trait Foo<'a> {
    fn test() -> impl Sized + use<Self>;
    //~^ ERROR `use<...>` precise capturing syntax is currently not allowed in return-position `impl Trait` in traits
}
```
To allow this involves a meaningful amount of type system work related to adding variance to GATs or reworking how generics are represented in RPITITs. We plan to do this work separately from the stabilization. See:
- https://github.com/rust-lang/rust/pull/124029
Supporting precise capturing for RPITIT will also require us to implement a new algorithm for detecting refining capture behavior. This may involve looking through type parameters to detect cases where the impl Trait type in an implementation captures fewer lifetimes than the corresponding RPITIT in the trait definition, e.g.:
```rust
trait Foo {
    fn rpit() -> impl Sized + use<Self>;
}

impl<'a> Foo for &'a () {
    // This is "refining" due to not capturing `'a` which
    // is implied by the trait's `use<Self>`.
    fn rpit() -> impl Sized + use<>;

    // This is not "refining".
    fn rpit() -> impl Sized + use<'a>;
}
```
This stabilization is forward compatible with adding support for this later.
### The technical details
This bound is purely syntactical and does not lower to a [`Clause`](https://doc.rust-lang.org/1.79.0/nightly-rustc/rustc_middle/ty/type.ClauseKind.html) in the type system. For the purposes of the type system (and for the types team's curiosity regarding this stabilization), we have no current need to represent this as a `ClauseKind`.
Since opaques already capture a variable set of lifetimes depending on edition and their syntactical position (e.g. RPIT vs RPITIT), a `use<..>` bound is just a way to explicitly rather than implicitly specify that set of lifetimes, and this only affects opaque type lowering from AST to HIR.
### FCP plan
While there's much discussion of the type system here, the feature in this PR is implemented internally as a transformation that happens before lowering to the type system layer. We already support impl Trait types partially capturing the in scope lifetimes; we just currently only expose that implicitly.
So, in my (errs's) view as a types team member, there's nothing for types to weigh in on here with respect to the implementation being stabilized, and I'd suggest a lang-only proposed FCP (though we'll of course CC the team below).
### Authorship and acknowledgments
This stabilization report was coauthored by compiler-errors and TC.
TC would like to acknowledge the outstanding and speedy work that compiler-errors has done to make this feature happen.
compiler-errors thanks TC for authoring the RFC, for all of his involvement in this feature's development, and pushing the Rust 2024 edition forward.
### Open items
We're doing some things in parallel here. In signaling the intention to stabilize, we want to uncover any latent issues so we can be sure they get addressed. We want to give the maximum time for discussion here to happen by starting it while other remaining miscellaneous work proceeds. That work includes:
- [x] Look into `syn` support.
- https://github.com/dtolnay/syn/issues/1677
- https://github.com/dtolnay/syn/pull/1707
- [x] Look into `rustfmt` support.
- https://github.com/rust-lang/rust/pull/126754
- [x] Look into `rust-analyzer` support.
- https://github.com/rust-lang/rust-analyzer/issues/17598
- https://github.com/rust-lang/rust-analyzer/pull/17676
- [x] Look into `rustdoc` support.
- https://github.com/rust-lang/rust/issues/127228
- https://github.com/rust-lang/rust/pull/127632
- https://github.com/rust-lang/rust/pull/127658
- [x] Suggest this feature to RfL (a known nightly user).
- [x] Add a chapter to the edition guide.
- https://github.com/rust-lang/edition-guide/pull/316
- [x] Update the Reference.
- https://github.com/rust-lang/reference/pull/1577
### (Selected) implementation history
* https://github.com/rust-lang/rfcs/pull/3498
* https://github.com/rust-lang/rfcs/pull/3617
* https://github.com/rust-lang/rust/pull/123468
* https://github.com/rust-lang/rust/issues/125836
* https://github.com/rust-lang/rust/pull/126049
* https://github.com/rust-lang/rust/pull/126753

Closes #123432.
cc `@rust-lang/lang` `@rust-lang/types`
`@rustbot` labels +T-lang +I-lang-nominated +A-impl-trait +F-precise_capturing
Tracking:
- https://github.com/rust-lang/rust/issues/123432
----
For the compiler reviewer, I'll leave some inline comments about diagnostics fallout :^)
r? compiler
Stabilize `unsafe_attributes`
# Stabilization report
## Summary
This is a tracking issue for the RFC 3325: unsafe attributes
We are stabilizing `#![feature(unsafe_attributes)]`, which makes certain attributes considered 'unsafe', meaning that they must be surrounded by an `unsafe(...)`, as in `#[unsafe(no_mangle)]`.
RFC: rust-lang/rfcs#3325
Tracking issue: #123757
## What is stabilized
### Summary of stabilization
Certain attributes will now be designated as unsafe attributes, namely `no_mangle`, `export_name`, and `link_section` (stable only), and these attributes will need to be written wrapped in `unsafe(...)` syntax. On editions prior to 2024, this is simply an edition lint, but it will become a hard error in 2024. This also works in `cfg_attr`, but `unsafe` is not allowed for any other attributes, including proc-macro ones.
```rust
#[unsafe(no_mangle)]
fn a() {}
#[cfg_attr(any(), unsafe(export_name = "c"))]
fn b() {}
```
For a table of the attributes that were considered for the list that requires `unsafe`, and the reasoning for why each attribute was or was not included, see [this comment](https://github.com/rust-lang/rust/pull/124214#issuecomment-2124753464).
## Tests
The relevant tests are in `tests/ui/rust-2024/unsafe-attributes` and `tests/ui/attributes/unsafe`.
This commit does the following.
- Renames `collect_tokens_trailing_token` as `collect_tokens`, because
(a) it's annoyingly long, and (b) the `_trailing_token` bit is less
accurate now that its types have changed.
- In `collect_tokens`, adds a `Option<CollectPos>` argument and a
`UsePreAttrPos` in the return type of `f`. These are used in
`parse_expr_force_collect` (for vanilla expressions) and in
`parse_stmt_without_recovery` (for two different cases of expression
statements). Together these are enough to fix all the problems
with token collection and assoc expressions. The changes to the
`stringify.rs` test demonstrate some of these.
- Adds a new test. The code in this test was causing an assertion
failure prior to this commit, due to an invalid `NodeRange`.
The extra complexity is annoying, but necessary to fix the existing
problems.
This pre-existing type is suitable for use with the return value of the
`f` parameter in `collect_tokens_trailing_token`. The more descriptive
name will be useful because the next commit will add another boolean
value to the return value of `f`.
Fix bug in `Parser::look_ahead`.
The special case was failing to handle invisible delimiters on one path.
Fixes (but doesn't close until beta backported) #128895.
r? `@davidtwco`
Use more slice patterns inside the compiler
Nothing super noteworthy. Just replacing the common 'fragile' pattern of "length check followed by indexing or unwrap" with slice patterns for legibility and 'robustness'.
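A generic illustration of the pattern being replaced follows. These functions are invented for this sketch and are not taken from the compiler.

```rust
// Fragile: the length check and the indexing can drift apart during edits.
fn single_indexed(args: &[i32]) -> Option<i32> {
    if args.len() == 1 { Some(args[0]) } else { None }
}

// Slice pattern: the shape check and the binding are one construct.
fn single_pattern(args: &[i32]) -> Option<i32> {
    match args {
        [only] => Some(*only),
        _ => None,
    }
}

// Patterns can also bind the first element and the rest of the slice.
fn split_head(args: &[i32]) -> Option<(i32, &[i32])> {
    if let [head, rest @ ..] = args {
        Some((*head, rest))
    } else {
        None
    }
}

fn main() {
    assert_eq!(single_indexed(&[7]), single_pattern(&[7]));
    assert_eq!(single_pattern(&[1, 2]), None);
    assert_eq!(split_head(&[1, 2, 3]), Some((1, &[2, 3][..])));
}
```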
r? ghost
Previously we would try to issue a suggestion for `let x <op>= 1`, i.e.
a compound assignment within a `let` binding, to remove the `<op>`. The
suggestion code unfortunately incorrectly assumed that the `<op>` is an
exactly-1-byte ASCII character, but this assumption is incorrect because
we also recover Unicode-confusables like `➖=` as `-=`. In this example,
the suggestion code used a `+ BytePos(1)` to calculate the span of the
`<op>` codepoint that looks like `-`, but with the multi-byte Unicode
look-alike the end of the suggested removal span would fall inside a
multi-byte codepoint, triggering a codepoint boundary
assertion.
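The boundary problem can be shown with plain strings. This is a sketch using only standard library string methods; `first_char_span` is a hypothetical helper and says nothing about how the compiler computes its spans.

```rust
use std::ops::Range;

// Byte range of the first character of `src`, computed from its real
// UTF-8 width rather than an assumed width of one byte.
fn first_char_span(src: &str) -> Option<Range<usize>> {
    let c = src.chars().next()?;
    Some(0..c.len_utf8())
}

fn main() {
    let ascii = "-= 1";
    let confusable = "\u{2796}= 1"; // U+2796 HEAVY MINUS SIGN, then `= 1`

    // Offset 1 is a valid character boundary only for the ASCII operator.
    assert!(ascii.is_char_boundary(1));
    assert!(!confusable.is_char_boundary(1));

    // Using the character's own width gives a span that can be sliced safely.
    let span = first_char_span(confusable).unwrap();
    assert_eq!(span, 0..3);
    assert_eq!(&confusable[span], "\u{2796}");
}
```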
Issue: <https://github.com/rust-lang/rust/issues/128845>
More unsafe attr verification
This code denies unsafe on attributes such as `#[test]` and `#[ignore]`, while also changing the `MetaItem` parsing so `unsafe` in args like `#[allow(unsafe(dead_code))]` is not accidentally allowed.
Tracking:
- https://github.com/rust-lang/rust/issues/123757
When collecting tokens there are two kinds of range:
- a range relative to the parser's full token stream (which we get when
we are parsing);
- a range relative to a single AST node's token stream (which we use
within `LazyAttrTokenStreamImpl` when replacing tokens).
These are currently both represented with `Range<u32>` and it's easy to
mix them up -- until now I hadn't properly understood the difference.
This commit introduces `ParserRange` and `NodeRange` to distinguish
them. This also requires splitting `ReplaceRange` in two, giving the new
types `ParserReplacement` and `NodeReplacement`. (These latter two names
reduce the overloading of the word "range".)
The commit also rewrites some comments to be clearer.
The end result is a little more verbose, but much clearer.
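A minimal sketch of the idea is shown below. It reuses the names `ParserRange` and `NodeRange` from this commit message, but the definitions, the constructor, and `node_slice` are assumptions written for illustration and should not be read as the actual rustc code.

```rust
use std::ops::Range;

// Range relative to the parser's full token stream.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParserRange(Range<u32>);

// Range relative to the start of a single AST node's token stream.
#[derive(Debug, Clone, PartialEq, Eq)]
struct NodeRange(Range<u32>);

impl NodeRange {
    // Building a `NodeRange` requires stating where the node starts,
    // which makes the conversion explicit at every call site.
    fn new(parser_range: ParserRange, node_start: u32) -> NodeRange {
        assert!(parser_range.0.start >= node_start, "range starts before node");
        NodeRange((parser_range.0.start - node_start)..(parser_range.0.end - node_start))
    }
}

// Accepts only node-relative ranges; passing a `ParserRange` is a type error.
fn node_slice<'a>(node_tokens: &'a [&'static str], range: &NodeRange) -> &'a [&'static str] {
    &node_tokens[range.0.start as usize..range.0.end as usize]
}

fn main() {
    // A node whose tokens begin at position 7 of the parser's stream.
    let node_tokens = ["x", "+", "y"];
    let node_range = NodeRange::new(ParserRange(8..10), 7);
    assert_eq!(node_range, NodeRange(1..3));
    assert_eq!(node_slice(&node_tokens, &node_range), ["+", "y"]);
}
```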