-
-
Notifications
You must be signed in to change notification settings - Fork 377
Emit INVALID_TOKEN_IN_PLACEHOLDER on invalid palceholder token #2252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit INVALID_TOKEN_IN_PLACEHOLDER on invalid palceholder token #2252
Conversation
WalkthroughAdds a formalized placeholder name rule to docs, tightens tokenizer parsing to detect trailing invalid characters inside named placeholders and emit a specific error, and expands tests to cover symbol and accented-character invalid cases. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Src as Source
participant Tok as Tokenizer
participant Err as ErrorEmitter
rect rgb(250,250,255)
note over Tok: parseNamedPlaceholder()
Tok->>Src: read initial char
alt valid start (A-Z, a-z, _)
Tok->>Src: consume valid continuation (letters/digits/_/-/$)
Tok->>Src: peek next char
alt next is "}"
Tok-->>Src: return name token
else next is EOF or whitespace
Tok-->>Src: return name token
else invalid trailing content
Tok->>Src: readInvalidIdentifier()
Tok->>Err: emit INVALID_TOKEN_IN_PLACEHOLDER
Tok-->>Src: return combined invalid token
end
else invalid start or EOF
Tok->>Err: emit existing UNTERMINATED/INVALID error
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
docs/guide/essentials/syntax.md (1)
274-274
: Wording nitTighten phrasing.
-The `message.greeting` is linked to the locale message key `message.name`. +The `message.greeting` links to the `message.name` key.packages/message-compiler/test/tokenizer/named.test.ts (1)
649-723
: Accented placeholders are asserted invalid; this conflicts with Issue #2247’s expectationThe test suite codifies
àaa
andaàa
as invalid, but the linked issue expected placeholders with acute accents to parse without errors. Please confirm product direction (ASCII-only vs. Unicode letters). If Unicode is desired, these cases should move to a “valid” set and tokenizer should allow\p{L}
.If keeping ASCII-only, consider adding an explicit positive test documenting rejection of non-ASCII:
test('rejects non-ASCII letters in named placeholder', () => { const errors: CompileError[] = [] parse(`{àaa}`, { onError: e => errors.push(e) } as TokenizeOptions) expect(errors[0].code).toBe(CompileErrorCodes.INVALID_TOKEN_IN_PLACEHOLDER) })Also suggest adding coverage for full‑width space inside braces to lock intended behavior:
test('full-width space inside placeholder', () => { const errors: CompileError[] = [] parse(`{name }`, { onError: e => errors.push(e) } as TokenizeOptions) // Decide and assert: either allowed (no errors) or unterminated/error. })packages/message-compiler/src/tokenizer.ts (1)
491-508
: Minor correctness/consistency nits in invalid-trailer check
- Typo: “stoped” → “stopped”.
- Use
TokenChars.BraceRight
instead of raw'}'
for consistency.currentChar !== EOF
is redundant given the truthy guard.- // Check if takeNamedIdentifierChar stoped because of invalid characters + // Check if takeNamedIdentifierChar stopped because of invalid characters const currentChar = scnr.currentChar() if ( - currentChar && - currentChar !== '}' && - currentChar !== EOF && - !isWhitespace(currentChar) + currentChar && + currentChar !== TokenChars.BraceRight && + !isWhitespace(currentChar) ) { const invalidPart = readInvalidIdentifier(scnr) emitError( CompileErrorCodes.INVALID_TOKEN_IN_PLACEHOLDER, currentPosition(), 0, name + invalidPart ) return name + invalidPart }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
docs/guide/essentials/syntax.md
(2 hunks)packages/message-compiler/src/tokenizer.ts
(2 hunks)packages/message-compiler/test/tokenizer/named.test.ts
(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
packages/message-compiler/test/tokenizer/named.test.ts (2)
packages/message-compiler/src/tokenizer.ts (2)
parse
(1014-1025)ERROR_DOMAIN
(40-40)packages/message-compiler/src/errors.ts (4)
CompileErrorCodes
(25-46)CompileErrorCodes
(53-54)errorMessages
(57-78)CompileError
(14-17)
packages/message-compiler/src/tokenizer.ts (1)
packages/message-compiler/src/errors.ts (2)
CompileErrorCodes
(25-46)CompileErrorCodes
(53-54)
🔇 Additional comments (3)
docs/guide/essentials/syntax.md (1)
29-31
: Clarify ASCII-only placeholder identifiers and fix grammar
Update to:The variable name inside `{}` must start with an ASCII letter (a–z, A–Z) or underscore (`_`), followed by any combination of ASCII letters, digits, underscores (`_`), hyphens (`-`), or dollar signs (`$`). Note: Non-ASCII letters (e.g. `é`, `à`) are not supported and will raise `INVALID_TOKEN_IN_PLACEHOLDER`.Confirm whether accented identifiers should be allowed per Issue #2247; if so, adjust the parser and tests accordingly.
packages/message-compiler/test/tokenizer/named.test.ts (1)
6-7
: LGTM – import re-orgNo behavioral change; reads clearer.
packages/message-compiler/src/tokenizer.ts (1)
793-799
: Follow-up: ensure downstream doesn’t assume Named tokens are always validAfter emitting INVALID_TOKEN_IN_PLACEHOLDER you still return a Named token containing invalid chars. Verify parser/codegen paths won’t mis-handle this (e.g., assuming a following BraceRight or using the invalid name as an object key without guarding).
function isWhitespace(ch: string): boolean { | ||
return ch === SPACE || ch === NEW_LINE || ch === '\u3000' // includes full-width space | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Unify whitespace handling (include U+3000 everywhere spaces are skipped)
You introduced isWhitespace
including U+3000, but peekSpaces
, skipSpaces
, isTextStart
, and isInvalidIdentifier
still only consider SPACE/NEW_LINE. This can produce inconsistent states (e.g., {name }
may still trip UNTERMINATED_CLOSING_BRACE). Use isWhitespace
in those helpers.
Proposed changes (outside this hunk) for consistency:
// Replace implementations to rely on isWhitespace:
function peekSpaces(scnr: Scanner): string {
let buf = ''
while (true) {
const ch = scnr.currentPeek()
if (!ch || !isWhitespace(ch)) break
buf += ch
scnr.peek()
}
return buf
}
function isTextStart(scnr: Scanner, reset = true): boolean {
const fn = (hasSpace = false, prev = ''): boolean => {
const ch = scnr.currentPeek()
if (ch === TokenChars.BraceLeft) return hasSpace
if (ch === TokenChars.LinkedAlias || !ch) return hasSpace
if (ch === TokenChars.Pipe) return !(isWhitespace(prev))
if (isWhitespace(ch)) { scnr.peek(); return fn(true, ch) }
return true
}
const ret = fn()
reset && scnr.resetPeek()
return ret
}
function isInvalidIdentifier(ch: string): boolean {
return (
ch !== TokenChars.BraceLeft &&
ch !== TokenChars.BraceRight &&
!isWhitespace(ch)
)
}
🏁 Script executed:
#!/bin/bash
# Search for whitespace handling in helper functions
rg -n "function peekSpaces" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function skipSpaces" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function isTextStart" -C3 packages/message-compiler/src/tokenizer.ts
rg -n "function isInvalidIdentifier" -C3 packages/message-compiler/src/tokenizer.ts
Length of output: 1154
🏁 Script executed:
#!/bin/bash
# Show isWhitespace definition
rg -n "function isWhitespace" -C3 packages/message-compiler/src/tokenizer.ts
# Show any uses of isWhitespace
rg -n "isWhitespace" -C3 packages/message-compiler/src/tokenizer.ts
# Show broader context for isTextStart
rg -n "function isTextStart" -C10 packages/message-compiler/src/tokenizer.ts
# Show context for skipSpaces/peekSpaces
rg -n "function skipSpaces" -C10 packages/message-compiler/src/tokenizer.ts
rg -n "function peekSpaces" -C10 packages/message-compiler/src/tokenizer.ts
# Show context for isInvalidIdentifier
rg -n "function isInvalidIdentifier" -C10 packages/message-compiler/src/tokenizer.ts
Length of output: 3541
Replace SPACE/NEW_LINE checks with isWhitespace in these helpers
- peekSpaces & skipSpaces
- isTextStart
- isInvalidIdentifier
This ensures full-width space (U+3000) is treated as whitespace everywhere.
🤖 Prompt for AI Agents
In packages/message-compiler/src/tokenizer.ts around lines 168 to 171, several
helper functions (peekSpaces, skipSpaces, isTextStart, isInvalidIdentifier)
still check for SPACE or NEW_LINE directly; replace those direct comparisons
with calls to the isWhitespace(ch) helper so that full-width space (U+3000) is
recognized everywhere. Update each occurrence to use isWhitespace(ch) (or the
logical equivalent when checking sequences) and run tests to confirm no behavior
change other than treating U+3000 as whitespace.
…CE when invalid token is in placeholder and update docs
b74058b
to
703b67b
Compare
Size ReportBundles
Usages
|
@intlify/core
@intlify/core-base
@intlify/devtools-types
@intlify/message-compiler
petite-vue-i18n
@intlify/shared
vue-i18n
@intlify/vue-i18n-core
commit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution!
I've just checked your codes.
LGTM!
…CE when invalid token is in placeholder and update docs (#2252)
…G_BRACE when invalid token is in placeholder and update docs (#2255) * Emit INVALID_TOKEN_IN_PLACEHOLDER instead of UNTERMINATED_CLOSING_BRACE when invalid token is in placeholder and update docs (#2252) * chore: drop node v18 --------- Co-authored-by: lutejka <[email protected]>
Emit the correct error code when a named interpolation placeholder contains an invalid char
this fixes #2247
Summary by CodeRabbit
Bug Fixes
Documentation
Tests