Contributor

@lutejka lutejka commented Aug 28, 2025

Emit the correct error code when a named interpolation placeholder contains an invalid character.

This fixes #2247.

Summary by CodeRabbit

  • Bug Fixes

    • Stricter validation for named placeholders: detects trailing invalid characters, reports clearer errors, and handles whitespace (including full-width spaces) more accurately during parsing.
  • Documentation

    • Added a formal rule for placeholder names with examples ({msg}, {_userName}, {user-id}, {total$}); minor formatting cleanup.
  • Tests

    • Expanded parameterized tests for invalid placeholders to broaden coverage without behavioral changes.


coderabbitai bot commented Aug 28, 2025

Walkthrough

Adds a formalized placeholder name rule to docs, tightens tokenizer parsing to detect trailing invalid characters inside named placeholders and emit a specific error, and expands tests to cover symbol and accented-character invalid cases. No public API changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: Placeholder syntax (docs/guide/essentials/syntax.md) | Adds a formal rule for named interpolation placeholders (start with letter/_; continue with letters, digits, _, -, $) with examples and minor whitespace formatting. |
| Compiler: Tokenizer validation (packages/message-compiler/src/tokenizer.ts) | After assembling a named identifier, adds an inline check for trailing invalid characters (non-}/non-whitespace) that reads the invalid portion, emits INVALID_TOKEN_IN_PLACEHOLDER, and returns the combined invalid token; retains existing EOF/unterminated-brace handling. |
| Tests: Named placeholder errors (packages/message-compiler/test/tokenizer/named.test.ts) | Refactors imports and replaces looped error assertions with test.each, adding cases for '$', '-', 'àaa', and 'aàa', asserting the same error outcomes with explicit locations. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Src as Source
  participant Tok as Tokenizer
  participant Err as ErrorEmitter

  rect rgb(250,250,255)
    note over Tok: parseNamedPlaceholder()
    Tok->>Src: read initial char
    alt valid start (A-Z, a-z, _)
      Tok->>Src: consume valid continuation (letters/digits/_/-/$)
      Tok->>Src: peek next char
      alt next is "}"
        Tok-->>Src: return name token
      else next is EOF or whitespace
        Tok-->>Src: return name token
      else invalid trailing content
        Tok->>Src: readInvalidIdentifier()
        Tok->>Err: emit INVALID_TOKEN_IN_PLACEHOLDER
        Tok-->>Src: return combined invalid token
      end
    else invalid start or EOF
      Tok->>Err: emit existing UNTERMINATED/INVALID error
    end
  end
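
The flow in the diagram can be approximated in a few lines of TypeScript. This is a minimal sketch, not the code in packages/message-compiler/src/tokenizer.ts: readNamedPlaceholder, NamedResult, and the string-literal error code are hypothetical names chosen for illustration, the input is assumed to start just after the opening brace, and the invalid-start and EOF paths are folded into the same branches for brevity.

```typescript
// Hypothetical, simplified model of the named-placeholder flow described above.
// It is NOT the real tokenizer; names and return shapes are illustrative only.
type NamedResult = { name: string; error?: 'INVALID_TOKEN_IN_PLACEHOLDER' }

const isStart = (ch: string): boolean => /^[A-Za-z_]$/.test(ch)
const isContinue = (ch: string): boolean => /^[A-Za-z0-9_\-$]$/.test(ch)
const isWhitespace = (ch: string): boolean =>
  ch === ' ' || ch === '\n' || ch === '\u3000'

// `src` is the text right after the opening brace, e.g. "name}" for "{name}".
function readNamedPlaceholder(src: string): NamedResult {
  let i = 0
  let name = ''
  // Consume a valid start character, then any valid continuation characters.
  // charAt returns '' past the end of the string, which fails both checks.
  if (isStart(src.charAt(i))) {
    name += src.charAt(i++)
    while (isContinue(src.charAt(i))) name += src.charAt(i++)
  }
  // Stopped at "}", whitespace, or end of input: the name is complete.
  const next = src.charAt(i)
  if (next === '' || next === '}' || isWhitespace(next)) return { name }
  // Otherwise read the invalid trailer up to the next brace or whitespace
  // and report the combined text as a single invalid token.
  let invalid = ''
  while (i < src.length) {
    const ch = src.charAt(i)
    if (ch === '{' || ch === '}' || isWhitespace(ch)) break
    invalid += ch
    i++
  }
  return { name: name + invalid, error: 'INVALID_TOKEN_IN_PLACEHOLDER' }
}
```

Under these assumptions, readNamedPlaceholder('name}') returns the name with no error, while readNamedPlaceholder('aàa}') returns the combined token 'aàa' with the error set, mirroring the "return combined invalid token" step.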

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Allow acute accents within placeholder names [#2247] | | Changes explicitly treat accented characters as invalid and tests expect errors for à cases. |
| Prevent tokenizer failure by handling accented names gracefully [#2247] | | Tokenizer now emits an INVALID_TOKEN_IN_PLACEHOLDER for non-ASCII sequences instead of accepting them; no handling to accept accents was added. |

Assessment against linked issues: Out-of-scope changes

| Code Change | Explanation |
| --- | --- |
| Formalizing ASCII-focused placeholder syntax (docs/guide/essentials/syntax.md) | The linked issue requests accepting accented characters; documenting an ASCII-restricted rule (allowing - and $) does not implement acceptance of accents and is unrelated to the requested fix. |

Poem

I hop through braces, keen and bright,
I sniff each token day and night—
If strange accents start to prance,
I thump my foot and halt the dance.
Errors flagged with tidy cheer, 🥕


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b74058b and 703b67b.

📒 Files selected for processing (3)
  • docs/guide/essentials/syntax.md (2 hunks)
  • packages/message-compiler/src/tokenizer.ts (1 hunks)
  • packages/message-compiler/test/tokenizer/named.test.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/message-compiler/test/tokenizer/named.test.ts
  • packages/message-compiler/src/tokenizer.ts
  • docs/guide/essentials/syntax.md


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
docs/guide/essentials/syntax.md (1)

274-274: Wording nit

Tighten phrasing.

-The `message.greeting` is linked to the locale message key `message.name`.
+The `message.greeting` links to the `message.name` key.
packages/message-compiler/test/tokenizer/named.test.ts (1)

649-723: Accented placeholders are asserted invalid; this conflicts with Issue #2247’s expectation

The test suite codifies àaa and aàa as invalid, but the linked issue expected placeholders with acute accents to parse without errors. Please confirm product direction (ASCII-only vs. Unicode letters). If Unicode is desired, these cases should move to a “valid” set and tokenizer should allow \p{L}.

If keeping ASCII-only, consider adding an explicit positive test documenting rejection of non-ASCII:

test('rejects non-ASCII letters in named placeholder', () => {
  const errors: CompileError[] = []
  parse(`{àaa}`, { onError: e => errors.push(e) } as TokenizeOptions)
  expect(errors[0].code).toBe(CompileErrorCodes.INVALID_TOKEN_IN_PLACEHOLDER)
})

Also suggest adding coverage for full‑width space inside braces to lock intended behavior:

test('full-width space inside placeholder', () => {
  const errors: CompileError[] = []
  parse(`{name\u3000}`, { onError: e => errors.push(e) } as TokenizeOptions)
  // Decide and assert: either allowed (no errors) or unterminated/error.
})
packages/message-compiler/src/tokenizer.ts (1)

491-508: Minor correctness/consistency nits in invalid-trailer check

  • Typo: “stoped” → “stopped”.
  • Use TokenChars.BraceRight instead of raw '}' for consistency.
  • currentChar !== EOF is redundant given the truthy guard.
-    // Check if takeNamedIdentifierChar stoped because of invalid characters
+    // Check if takeNamedIdentifierChar stopped because of invalid characters
     const currentChar = scnr.currentChar()
     if (
-      currentChar &&
-      currentChar !== '}' &&
-      currentChar !== EOF &&
-      !isWhitespace(currentChar)
+      currentChar &&
+      currentChar !== TokenChars.BraceRight &&
+      !isWhitespace(currentChar)
     ) {
       const invalidPart = readInvalidIdentifier(scnr)
       emitError(
         CompileErrorCodes.INVALID_TOKEN_IN_PLACEHOLDER,
         currentPosition(),
         0,
         name + invalidPart
       )
       return name + invalidPart
     }
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f809bb3 and b74058b.

📒 Files selected for processing (3)
  • docs/guide/essentials/syntax.md (2 hunks)
  • packages/message-compiler/src/tokenizer.ts (2 hunks)
  • packages/message-compiler/test/tokenizer/named.test.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
packages/message-compiler/test/tokenizer/named.test.ts (2)
packages/message-compiler/src/tokenizer.ts (2)
  • parse (1014-1025)
  • ERROR_DOMAIN (40-40)
packages/message-compiler/src/errors.ts (4)
  • CompileErrorCodes (25-46)
  • CompileErrorCodes (53-54)
  • errorMessages (57-78)
  • CompileError (14-17)
packages/message-compiler/src/tokenizer.ts (1)
packages/message-compiler/src/errors.ts (2)
  • CompileErrorCodes (25-46)
  • CompileErrorCodes (53-54)
🔇 Additional comments (3)
docs/guide/essentials/syntax.md (1)

29-31: Clarify ASCII-only placeholder identifiers and fix grammar
Update to:

The variable name inside `{}` must start with an ASCII letter (a–z, A–Z) or underscore (`_`), followed by any combination of ASCII letters, digits, underscores (`_`), hyphens (`-`), or dollar signs (`$`).

Note: Non-ASCII letters (e.g. `é`, `à`) are not supported and will raise `INVALID_TOKEN_IN_PLACEHOLDER`.

Confirm whether accented identifiers should be allowed per Issue #2247; if so, adjust the parser and tests accordingly.
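
To make the ASCII-versus-Unicode decision concrete, the two candidate rules can be written as regular expressions. This is only a sketch: ASCII_NAME and UNICODE_NAME are hypothetical names, nothing in this thread shows the tokenizer using regexes like these, and the Unicode variant is one possible reading of "allow \p{L}", not an agreed design.

```typescript
// ASCII-only rule as documented in this PR: start with a letter or "_",
// then continue with letters, digits, "_", "-", or "$".
const ASCII_NAME = /^[A-Za-z_][A-Za-z0-9_\-$]*$/

// Hypothetical Unicode-aware alternative using property escapes.
// Built with the RegExp constructor because \p{...} requires the "u" flag.
const UNICODE_NAME = new RegExp('^[\\p{L}_][\\p{L}\\p{N}_\\-$]*$', 'u')

// "\u00e0" is the precomposed character "à".
const samples = ['msg', '_userName', 'user-id', 'total$', '\u00e0aa', 'a\u00e0a']

for (const name of samples) {
  // Prints each name with the verdict of both rules.
  console.log(name, ASCII_NAME.test(name), UNICODE_NAME.test(name))
}
```

One design point to keep in mind if the Unicode route is chosen: \p{L} matches precomposed letters such as U+00E0, but a decomposed "a" followed by the combining grave accent U+0300 would still be rejected, because combining marks belong to the Mark category rather than Letter.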

packages/message-compiler/test/tokenizer/named.test.ts (1)

6-7: LGTM – import re-org

No behavioral change; reads clearer.

packages/message-compiler/src/tokenizer.ts (1)

793-799: Follow-up: ensure downstream doesn’t assume Named tokens are always valid

After emitting INVALID_TOKEN_IN_PLACEHOLDER you still return a Named token containing invalid chars. Verify parser/codegen paths won’t mis-handle this (e.g., assuming a following BraceRight or using the invalid name as an object key without guarding).

Comment on lines 168 to 171
function isWhitespace(ch: string): boolean {
  return ch === SPACE || ch === NEW_LINE || ch === '\u3000' // includes full-width space
}


💡 Verification agent

🧩 Analysis chain

Unify whitespace handling (include U+3000 everywhere spaces are skipped)

You introduced isWhitespace including U+3000, but peekSpaces, skipSpaces, isTextStart, and isInvalidIdentifier still only consider SPACE/NEW_LINE. This can produce inconsistent states (e.g., a placeholder with a full-width space before the closing brace, such as {name\u3000}, may still trip UNTERMINATED_CLOSING_BRACE). Use isWhitespace in those helpers.

Proposed changes (outside this hunk) for consistency:

// Replace implementations to rely on isWhitespace:

function peekSpaces(scnr: Scanner): string {
  let buf = ''
  while (true) {
    const ch = scnr.currentPeek()
    if (!ch || !isWhitespace(ch)) break
    buf += ch
    scnr.peek()
  }
  return buf
}

function isTextStart(scnr: Scanner, reset = true): boolean {
  const fn = (hasSpace = false, prev = ''): boolean => {
    const ch = scnr.currentPeek()
    if (ch === TokenChars.BraceLeft) return hasSpace
    if (ch === TokenChars.LinkedAlias || !ch) return hasSpace
    if (ch === TokenChars.Pipe) return !(isWhitespace(prev))
    if (isWhitespace(ch)) { scnr.peek(); return fn(true, ch) }
    return true
  }
  const ret = fn()
  reset && scnr.resetPeek()
  return ret
}

function isInvalidIdentifier(ch: string): boolean {
  return (
    ch !== TokenChars.BraceLeft &&
    ch !== TokenChars.BraceRight &&
    !isWhitespace(ch)
  )
}

🏁 Script executed:

#!/bin/bash
# Search for whitespace handling in helper functions
rg -n "function peekSpaces" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function skipSpaces" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function isTextStart" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function isInvalidIdentifier" -C3 packages/message-compiler/src/tokenizer.ts

Length of output: 1154


🏁 Script executed:

#!/bin/bash
# Show isWhitespace definition
rg -n "function isWhitespace" -C3 packages/message-compiler/src/tokenizer.ts
# Show any uses of isWhitespace
rg -n "isWhitespace" -C3 packages/message-compiler/src/tokenizer.ts
# Show broader context for isTextStart
rg -n "function isTextStart" -C10 packages/message-compiler/src/tokenizer.ts
# Show context for skipSpaces/peekSpaces
rg -n "function skipSpaces" -C10 packages/message-compiler/src/tokenizer.ts
rg -n "function peekSpaces" -C10 packages/message-compiler/src/tokenizer.ts
# Show context for isInvalidIdentifier
rg -n "function isInvalidIdentifier" -C10 packages/message-compiler/src/tokenizer.ts

Length of output: 3541


Replace SPACE/NEW_LINE checks with isWhitespace in these helpers

  • peekSpaces & skipSpaces
  • isTextStart
  • isInvalidIdentifier
    This ensures full-width space (U+3000) is treated as whitespace everywhere.
🤖 Prompt for AI Agents
In packages/message-compiler/src/tokenizer.ts around lines 168 to 171, several
helper functions (peekSpaces, skipSpaces, isTextStart, isInvalidIdentifier)
still check for SPACE or NEW_LINE directly; replace those direct comparisons
with calls to the isWhitespace(ch) helper so that full-width space (U+3000) is
recognized everywhere. Update each occurrence to use isWhitespace(ch) (or the
logical equivalent when checking sequences) and run tests to confirm no behavior
change other than treating U+3000 as whitespace.
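
The inconsistency described above can be reproduced in isolation. The following is a standalone illustration, not the tokenizer's real Scanner or skipSpaces: skipSpacesFrom and skipAsciiSpacesFrom are hypothetical helpers that operate on a plain string index, assumed here only to show how the two whitespace definitions diverge on U+3000.

```typescript
// Standalone illustration; these helpers are hypothetical and do not
// reproduce the real Scanner-based functions in tokenizer.ts.
const SPACE = ' '
const NEW_LINE = '\n'

function isWhitespace(ch: string): boolean {
  return ch === SPACE || ch === NEW_LINE || ch === '\u3000' // full-width space
}

// Returns the index of the first non-whitespace character at or after `from`,
// using the shared isWhitespace helper.
function skipSpacesFrom(text: string, from: number): number {
  let i = from
  while (i < text.length && isWhitespace(text.charAt(i))) i++
  return i
}

// Same walk, but with the narrower SPACE/NEW_LINE check for comparison.
function skipAsciiSpacesFrom(text: string, from: number): number {
  let i = from
  while (
    i < text.length &&
    (text.charAt(i) === SPACE || text.charAt(i) === NEW_LINE)
  ) {
    i++
  }
  return i
}

const input = 'name\u3000}'
// With isWhitespace, the walk reaches the closing brace at index 5.
console.log(skipSpacesFrom(input, 4))
// With the narrow check, the walk stays on the full-width space at index 4.
console.log(skipAsciiSpacesFrom(input, 4))
```

A caller that expects to land on the closing brace after skipping spaces sees it only in the first case, which is the kind of mismatch that routing every helper through isWhitespace is meant to remove.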

…CE when invalid token is in placeholder and update docs
@lutejka lutejka force-pushed the fix-error-on-invalid-token-in-placeholder branch from b74058b to 703b67b Compare August 29, 2025 12:19
@kazupon kazupon added Type: Improvement Includes backwards-compatible fixes 🍰 p2-nice-to-have Priority 2: nothing is broken but it's worth addressing labels Sep 5, 2025

github-actions bot commented Sep 5, 2025

Size Report

Bundles

| File | Size | Gzip | Brotli |
| --- | --- | --- | --- |
| core.esm-browser.prod.js | 38.22 kB (+0.16 kB) | 11.67 kB (+0.05 kB) | 10.41 kB (+0.03 kB) |
| core.global.prod.js | 31.24 kB (+0.13 kB) | 10.82 kB (+0.04 kB) | 9.70 kB (-0.01 kB) |
| core.runtime.esm-browser.prod.js | 23.77 kB | 7.70 kB | 6.90 kB |
| core.runtime.global.prod.js | 18.08 kB | 7.03 kB | 6.31 kB |
| message-compiler.esm-browser.prod.js | 19.45 kB (+0.16 kB) | 5.81 kB (+0.04 kB) | 5.18 kB (+0.04 kB) |
| message-compiler.global.prod.js | 17.46 kB (+0.13 kB) | 5.59 kB (+0.04 kB) | 5.01 kB (+0.03 kB) |
| petite-vue-i18n-core.esm-browser.prod.js | 20.70 kB | 6.98 kB | 6.27 kB |
| petite-vue-i18n-core.global.prod.js | 15.66 kB | 6.08 kB | 5.51 kB |
| petite-vue-i18n.esm-browser.prod.js | 37.17 kB (+0.16 kB) | 11.41 kB (+0.05 kB) | 10.23 kB (+0.04 kB) |
| petite-vue-i18n.global.prod.js | 30.00 kB (+0.13 kB) | 10.26 kB (+0.05 kB) | 9.24 kB (+0.03 kB) |
| petite-vue-i18n.runtime.esm-browser.prod.js | 22.58 kB | 7.35 kB | 6.64 kB |
| petite-vue-i18n.runtime.global.prod.js | 16.86 kB | 6.42 kB | 5.82 kB |
| vue-i18n.esm-browser.prod.js | 50.87 kB (+0.16 kB) | 15.22 kB (+0.05 kB) | 13.58 kB (+0.03 kB) |
| vue-i18n.global.prod.js | 40.71 kB (+0.13 kB) | 13.67 kB (+0.04 kB) | 12.29 kB (+0.04 kB) |
| vue-i18n.runtime.esm-browser.prod.js | 36.28 kB | 11.17 kB | 10.03 kB |
| vue-i18n.runtime.global.prod.js | 27.56 kB | 9.88 kB | 8.91 kB |

Usages

| Name | Size | Gzip | Brotli |
| --- | --- | --- | --- |
| packages/size-check-core (@intlify/core) | 9.22 kB | 3.82 kB | 3.46 kB |
| packages/size-check-petite-vue-i18n (petite-vue-i18n) | 78.12 kB | 30.46 kB | 27.44 kB |
| packages/size-check-vue-i18n (vue-i18n) | 83.34 kB | 31.94 kB | 28.72 kB |


pkg-pr-new bot commented Sep 5, 2025


@intlify/core

npm i https://pkg.pr.new/@intlify/core@2252

@intlify/core-base

npm i https://pkg.pr.new/@intlify/core-base@2252

@intlify/devtools-types

npm i https://pkg.pr.new/@intlify/devtools-types@2252

@intlify/message-compiler

npm i https://pkg.pr.new/@intlify/message-compiler@2252

petite-vue-i18n

npm i https://pkg.pr.new/petite-vue-i18n@2252

@intlify/shared

npm i https://pkg.pr.new/@intlify/shared@2252

vue-i18n

npm i https://pkg.pr.new/vue-i18n@2252

@intlify/vue-i18n-core

npm i https://pkg.pr.new/@intlify/vue-i18n-core@2252

commit: 703b67b

Member

@kazupon kazupon left a comment


Thank you for your contribution!
I've just checked your code.

LGTM!

@kazupon kazupon merged commit fdc9541 into intlify:master Sep 5, 2025
29 of 30 checks passed
kazupon pushed a commit that referenced this pull request Sep 5, 2025
…CE when invalid token is in placeholder and update docs (#2252)
kazupon added a commit that referenced this pull request Sep 5, 2025
…G_BRACE when invalid token is in placeholder and update docs (#2255)

* Emit INVALID_TOKEN_IN_PLACEHOLDER instead of UNTERMINATED_CLOSING_BRACE when invalid token is in placeholder and update docs (#2252)

* chore: drop node v18

---------

Co-authored-by: lutejka <[email protected]>
Labels
🍰 p2-nice-to-have Priority 2: nothing is broken but it's worth addressing Type: Improvement Includes backwards-compatible fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Acute accent in placeholder name breaks tokenization
2 participants