Fix UTF8 escapes in character classes #106

Earlopain · 2025-08-27T14:13:45Z

Closes #104, I think I found the fix myself 💪

Otherwise, an escaped multibyte character inside a character class gets split up into its individual bytes.
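To illustrate the symptom, here is a minimal Ruby sketch. It assumes the scanner walks the input byte by byte, so that a bare `any` consumes only the first byte of a multibyte character. The helper `escaped_unit` is hypothetical and not part of regexp_parser; it only mimics the difference between taking one byte and taking one whole character after the backslash.

```ruby
# Hypothetical helper for illustration only, not regexp_parser code.
# Returns what follows the backslash, taken either as a single byte
# (roughly what matching one byte would consume) or as one whole
# UTF-8 character (roughly what a multibyte-aware match would consume).
def escaped_unit(input, bytewise:)
  rest = input.delete_prefix("\\")
  bytewise ? rest.byteslice(0, 1) : rest[0]
end

char  = "\u00FC"      # "ü", encoded as two bytes in UTF-8
input = "\\" + char   # a backslash followed by the multibyte character

single = escaped_unit(input, bytewise: true)
whole  = escaped_unit(input, bytewise: false)

p single.bytesize         # only the first byte was consumed
p single.valid_encoding?  # a lone lead byte is not valid UTF-8
p whole == char           # the character-wise match keeps the character intact
```

The byte-wise result is a lone lead byte, which is the kind of fragment that shows up when one character is split into its bytes.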

Earlopain · 2025-08-27T14:22:11Z

lib/regexp_parser/scanner/scanner.rl

    # (This currently includes \^, \-, \&, \:, although these could potentially
    # be meta chars when not escaped, depending on their position in the set.)
-    any > (escaped_set_alpha, 1) {
+    any > (escaped_set_alpha, 1) | utf8_multibyte > (escaped_alpha, 1) {


The simpler form below also works, and no tests fail with it. I copied the priority annotation over to utf8_multibyte for consistency, but I'm not really sure what it is supposed to do:

Suggested change

-    any > (escaped_set_alpha, 1) | utf8_multibyte > (escaped_alpha, 1) {
+    any | utf8_multibyte {

In fact, Ragel doesn't seem to produce different output with or without it.


Commit 2efa904: Fix UTF8 escapes in character classes

Earlopain force-pushed the fix-utf8-escapes-in-sets branch from bc5f0e4 to 2efa904 Compare August 27, 2025 14:24

jaynetics merged commit a611c88 into ammar:master Sep 15, 2025
12 checks passed
