Commit a611c88

Merge pull request #106 from Earlopain/fix-utf8-escapes-in-sets
Fix UTF8 escapes in character classes
2 parents 1500106 + 2efa904 commit a611c88

3 files changed: 6 additions & 1 deletion

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+
+- correctly emit backslash-escaped UTF8 characters in character classes as one token (#104)
+
 ## [2.11.2] - 2025-08-12 - Janosch Müller
 
 ### Added

lib/regexp_parser/scanner/scanner.rl

Lines changed: 1 addition & 1 deletion
@@ -247,7 +247,7 @@
 # Treat all remaining escapes - those not supported in sets - as literal.
 # (This currently includes \^, \-, \&, \:, although these could potentially
 # be meta chars when not escaped, depending on their position in the set.)
-any > (escaped_set_alpha, 1) {
+any > (escaped_set_alpha, 1) | utf8_multibyte > (escaped_set_alpha, 1) {
   emit(:escape, :literal, copy(data, ts-1, te))
   fret;
 };
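
The change above can be illustrated with a toy sketch in plain Ruby. It assumes the scanner walks the input byte by byte, so that a bare `any` consumes only a single byte after the backslash. `scan_set_escape` and its `multibyte_aware` flag are hypothetical names made up for this illustration; they are not part of regexp_parser, and the real logic lives in the Ragel machine.

```ruby
# Toy byte-oriented scanner for the first backslash escape in a pattern.
# Hypothetical helper for illustration only, not part of regexp_parser.
def scan_set_escape(source, multibyte_aware:)
  bytes = source.b                  # binary copy, so indexing is per byte
  start = bytes.index('\\'.b)       # byte position of the backslash
  lead  = bytes.getbyte(start + 1)  # first byte after the backslash

  # Width of the escaped character, derived from its UTF-8 leading byte.
  width =
    if !multibyte_aware then 1      # assumed old behaviour: one byte only
    elsif lead >= 0xF0 then 4
    elsif lead >= 0xE0 then 3
    elsif lead >= 0xC0 then 2
    else 1
    end

  bytes.byteslice(start, 1 + width).force_encoding(Encoding::UTF_8)
end

broken = scan_set_escape('[\💎]', multibyte_aware: false)
fixed  = scan_set_escape('[\💎]', multibyte_aware: true)

puts broken.valid_encoding? # false: the token stops after the first byte of 💎
puts fixed                  # \💎
```

With the single-byte rule the token ends in the middle of the character, which is the kind of split the `utf8_multibyte` alternative is meant to avoid.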

spec/scanner/sets_spec.rb

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@
 include_examples 'scan', '[\R]', 1 => [:escape, :literal, '\R', 1, 3]
 include_examples 'scan', '[\X]', 1 => [:escape, :literal, '\X', 1, 3]
 include_examples 'scan', '[\B]', 1 => [:escape, :literal, '\B', 1, 3]
+include_examples 'scan', '[\💎]', 1 => [:escape, :literal, '\💎', 1, 3]
 
 include_examples 'scan', /[\d]/, 1 => [:type, :digit, '\d', 1, 3]
 include_examples 'scan', /[\da-z]/, 1 => [:type, :digit, '\d', 1, 3]
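
The new example expects the same `1, 3` positions as the ASCII escapes next to it. Judging from these expected values, the positions appear to be end-exclusive character offsets rather than byte offsets; that reading is an inference from the spec, not something the diff states. A small plain-Ruby check of the arithmetic, independent of the library:

```ruby
pattern = '[\💎]'
token   = '\💎'

puts pattern.length    # 4 characters: "[", "\", "💎", "]"
puts pattern.bytesize  # 7 bytes, because 💎 takes 4 bytes in UTF-8

# Character range 1...3 (end-exclusive) selects exactly the escape token.
puts pattern[1...3] == token # true

# The same range counted in bytes would cut the character apart.
puts pattern.byteslice(1...3).valid_encoding? # false
```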
