tree-sitter-qmd - allow multi-line attribute lists on inline images/spans #209
rundel wants to merge 3 commits
…es/spans

Multi-line `{...}` attribute lists on inline images, spans, and other
inline constructs that share `_pandoc_attr_specifier` were rejected at
the first attribute. The same form rendered fine in TS Quarto and was
common in quarto-web sources, leaving q2 as the lone holdout.

Root cause was twofold in `tree-sitter-markdown/grammar.js`:

- `attribute_specifier` / `_pandoc_attr_specifier` had no tolerance
  for whitespace between `{` and the first specifier (or between
  `_commonmark_specifier_start_with_class` and `}`, which also broke
  the single-line `{ .cls }` form).
- `_inline_whitespace` is `choice($._whitespace, $._soft_line_break)`,
  a single token, so a `\n` followed by indent on the next line
  couldn't match as one inter-attribute separator.

Introduce `_attr_ws: prec(-1, repeat1(choice($._whitespace, $._soft_line_break)))`
and use it where attribute lists need to span lines:

- `optional($._attr_ws)` immediately after `{` in both
  `attribute_specifier` and `_pandoc_attr_specifier`.
- Add trailing `optional($._attr_ws)` inside
  `_commonmark_specifier_start_with_class` (already present on
  `_commonmark_specifier_start_with_kv`). Keeping the trailer inside
  the specifier (rather than around the choice at the wrapper)
  avoids an LR conflict with `language_specifier`.
- Swap inner `_inline_whitespace` → `_attr_ws` at the four sites
  that join successive classes, classes-to-kv, and successive kvs.

`commonmark_specifier`'s leading `optional($._inline_whitespace)` is
left untouched: changing it triggered the `language_specifier` conflict,
and the wrapper-level `_attr_ws` already absorbs anything that would
have leaked through.
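
To illustrate why a single-token separator cannot cover "newline plus indent", here is a toy Rust model of the two separator shapes. It is only a sketch and not the tree-sitter grammar: the `Tok` enum and both `accepts_*` functions are hypothetical names introduced for this example, and the token streams are assumed to be what sits between `{` and `}`.

```rust
/// Toy token kinds; stand-ins for grammar symbols, not the real parser's types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Whitespace,    // stands in for `$._whitespace`
    SoftLineBreak, // stands in for `$._soft_line_break`
    Attr,          // one class or key=value specifier
}

fn is_sep(t: Tok) -> bool {
    matches!(t, Tok::Whitespace | Tok::SoftLineBreak)
}

/// Old shape: attributes joined by exactly one separator token,
/// with no separator allowed after `{` or before `}`.
fn accepts_single_sep(toks: &[Tok]) -> bool {
    let mut i = 0;
    loop {
        if toks.get(i) != Some(&Tok::Attr) {
            return false;
        }
        i += 1;
        if i == toks.len() {
            return true;
        }
        if !is_sep(toks[i]) {
            return false;
        }
        i += 1;
    }
}

/// New shape: an optional separator run after `{`, one or more separator
/// tokens between attributes, and an optional trailing run before `}`.
fn accepts_repeated_sep(toks: &[Tok]) -> bool {
    // Advance past a (possibly empty) run of separator tokens.
    let skip_seps = |mut i: usize| {
        while i < toks.len() && is_sep(toks[i]) {
            i += 1;
        }
        i
    };
    let mut i = skip_seps(0);
    if i == toks.len() {
        return false; // at least one attribute is required
    }
    loop {
        i += 1; // consume the attribute at `i`
        let next = skip_seps(i);
        if next == toks.len() {
            return true;
        }
        if next == i {
            return false; // two attributes with nothing between them
        }
        i = next;
    }
}
```

In this model a multi-line list tokenizes as `Attr, SoftLineBreak, Whitespace, Attr`: the single-separator matcher stops at the second separator token, while the repeated-separator matcher accepts the whole run.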

Regenerated artifacts (`parser.c`, `grammar.json`, `node-types.json`)
are produced by `tree-sitter generate`; do not hand-edit. Also
regenerated `crates/pampa/resources/error-corpus/_autogen-table.json`
via `deno run -A scripts/build_error_table.ts` because parser-state
IDs shifted. Without this regen, 7 `apostrophe-quotes` tests in
`qmd-syntax-helper` fail (the rule keys off Q-2-10 diagnostics, which
map through `(lr_state, sym)` pairs).
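
The dependence on parser-state IDs can be sketched as a lookup keyed by `(lr_state, sym)`. This is a minimal model, not pampa's actual table format: `ErrorTable`, `build_table`, and `lookup` are hypothetical names, and state `2601` is an invented example of a shifted ID.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the autogenerated table:
/// (LR state, lookahead symbol) -> diagnostic code.
type ErrorTable = HashMap<(u32, String), String>;

/// Build a table from (state, symbol, code) triples.
fn build_table(entries: &[(u32, &str, &str)]) -> ErrorTable {
    entries
        .iter()
        .map(|&(state, sym, code)| ((state, sym.to_string()), code.to_string()))
        .collect()
}

/// Look up the diagnostic code for a parse failure, if the pair is known.
fn lookup<'a>(table: &'a ErrorTable, lr_state: u32, sym: &str) -> Option<&'a str> {
    table
        .get(&(lr_state, sym.to_string()))
        .map(String::as_str)
}
```

If a grammar change renumbers the state behind an entry (say from `2587` to a hypothetical `2601`), a table built before the change no longer matches the failure, and any rule that relies on the resulting code stops firing until the table is regenerated.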

Tests:

- 6 new tree-sitter corpus cases in `inline-multiline-attrs.txt`
  cover multi-line class-only, multi-line class + key=value,
  `{ .cls }` symmetry, span multi-line class, span multi-line kv,
  and `# Heading { .cls }` for the block-level path.
- 1 new pampa integration test in `test_attr_source_parsing.rs`
  verifies that the AST and `attr_source` byte offsets are correct
  for the quarto-web-style multi-line form.

Full tree-sitter corpus: 492/492 pass (was 486). Full workspace:
8943/8943 pass. `cargo xtask verify` Rust legs all green.

End-to-end check:

    $ pampa repro.qmd
    [ Header 1 ( "test" , [] , [] ) [Str "Test"]
    , Para [Image ( "" , ["hero-banner", "img-fluid"]
                  , [("fig-align", "center"), ("width", "600px")] )
                  [] ("featured.png" , "")]
    , Para [Str "Done."] ]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
I'm not against this fix, but anything involving line breaks in Markdown gives me a case of the heebie-jeebies. I think it's prudent to add tests that exercise those spans inside bulleted lists and block quotes before declaring success. (Operational annoyance: every time a grammar fix makes it in, we need to regenerate the error corpus table and …
…e error

Adds edge-case coverage for the multi-line inline-attribute fix:

* tree-sitter corpus (`inline-multiline-attrs.txt`): six new cases for
  multi-line `{...}` attribute lists inside list items: bulleted,
  ordered, nested, image-in-list, plus two sibling-bullet variants
  (top-level and nested) that exercise list continuation across
  preceding and following items.
* pampa integration tests (`test_attr_source_parsing.rs`): five new
  AST-level tests mirroring the corpus cases, verifying that classes,
  key/value pairs, and `attr_source` byte offsets survive the
  block→inline boundary for each list shape.

Adds a new error code Q-2-37 for the one shape the grammar fix
cannot reach: multi-line attribute lists inside a blockquote. The
tree-sitter external scanner short-circuits SOFT_LINE_ENDING when the
next line begins with `>` (`scanner.c:2380-2407`), so the inline pass
only sees the first physical line of the attribute list and Q-2-2
fires at the same `(state=2587, sym="_close_block")` pair as a
plain top-level unclosed `{`. The two cases are indistinguishable at
the error-table lookup level but distinguishable in the source text.

* `resources/error-corpus/Q-2-37.json`: documents the new code with
  `cases: []` (no state mapping; this entry is emitted manually).
* `readers/qmd_error_messages.rs::upgrade_q22_to_q237_if_in_blockquote`:
  post-processes each Q-2-2 diagnostic. If the line of the failing
  `{` (after stripping leading whitespace) begins with `>`, it rewrites
  `code`, `title`, `problem`, and `hints`, and clears inherited
  `details` so the message reads cleanly without the Q-2-2 anchor
  note.
* `tests/test_q_2_37_blockquote_multiline_attrs.rs` has four tests:
  image-in-blockquote upgrades to Q-2-37; span-in-blockquote upgrades;
  blockquote with leading indent still upgrades; top-level `[attr]{[`
  stays Q-2-2 (negative control).
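
The post-processing step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `qmd_error_messages.rs`: the `Diagnostic` struct, its field types, and the `offset` field (byte offset of the failing `{`) are hypothetical, and only the function name and message strings are taken from this PR.

```rust
/// Hypothetical diagnostic shape; pampa's real type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Diagnostic {
    code: String,
    title: String,
    problem: String,
    hints: Vec<String>,
    details: Vec<String>,
    /// Assumed: byte offset of the failing `{` in the source text.
    offset: usize,
}

/// Rewrite a Q-2-2 diagnostic to Q-2-37 when the failing `{` sits on a
/// line that, after leading whitespace, begins with `>`.
fn upgrade_q22_to_q237_if_in_blockquote(diag: &mut Diagnostic, source: &str) {
    if diag.code != "Q-2-2" {
        return;
    }
    // `get` yields None for an out-of-range or non-char-boundary offset.
    let Some(before) = source.get(..diag.offset) else {
        return;
    };
    let after = &source[diag.offset..];
    // Locate the physical line that contains the offset.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let line_end = after.find('\n').map_or(source.len(), |i| diag.offset + i);
    let line = &source[line_start..line_end];
    if !line.trim_start().starts_with('>') {
        return; // plain top-level unclosed `{`: keep Q-2-2
    }
    diag.code = "Q-2-37".to_string();
    diag.title = "Multi-line inline attribute list inside blockquote".to_string();
    diag.problem =
        "Inside a blockquote, an inline `{...}` attribute list cannot span multiple lines."
            .to_string();
    diag.hints = vec![
        "Put the attribute list on a single line, or move this construct out of the blockquote."
            .to_string(),
    ];
    // Drop details inherited from Q-2-2 so the anchor note does not leak through.
    diag.details.clear();
}
```

The check is purely textual, which is what lets it separate two failures that share the same `(state, sym)` pair; it is also why it amounts to a context-dependent exception rather than a grammar-level rule.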

Full tree-sitter corpus: 495/495 (was 489 pre-fix, 493 after the
initial commit on this branch). Full workspace: 8952/8952.

End-to-end check on a real blockquote case:

    Error: [Q-2-37] Multi-line inline attribute list inside blockquote
    1 │ > {
      │   ╰── Inside a blockquote, an inline `{...}`
      │       attribute list cannot span multiple lines.
    ℹ Put the attribute list on a single line, or move this construct
      out of the blockquote.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ne-attrs

# Conflicts:
#   crates/pampa/resources/error-corpus/Q-2-37.json
#   crates/pampa/resources/error-corpus/_autogen-table.json
#   crates/tree-sitter-qmd/tree-sitter-markdown/src/parser.c
|
Good call - bulleted lists look like they work fine, and block quotes are a nightmare. For now I punted by just throwing an error on multi-line attributes within a block quote. I'm not sure whether upgrading Q-2-2 to Q-2-38 based on context is acceptable. Claude's summary is below.

Builds on the earlier multi-line attribute-list fix by adding edge-case coverage and a new diagnostic for the one shape the grammar fix can't reach.

List-context tests

Multi-line `{...}` attribute lists already work inside list items thanks to list-continuation handling; these tests lock that in against regression.

Tree-sitter corpus (…

Pampa integration tests (`test_attr_source_parsing.rs`, 5 new cases): AST-level mirrors of the above, verifying classes, key/value pairs, and `attr_source` byte offsets survive the block→inline boundary for each shape.

Q-2-38: blockquote-specific error

Multi-line attribute lists inside a `>` blockquote don't work: the tree-sitter external scanner short-circuits …
|
I think I want to leave this as an open issue until we can handle it uniformly. Creating these syntax exceptions is sort of opening the door for future trouble. I'd rather we reject those attributes uniformly even if it's annoying.
Not sure if this is even something you want to support or not - if the latter feel free to close this. Otherwise this is an attempt at a semi-minimally invasive fix for multi-line attributes.