MichalStrehovsky
Member

@MichalStrehovsky MichalStrehovsky commented Sep 5, 2025

My original motivation for this was just to stop computing our stable hashcodes from UTF-16 and use UTF-8 instead. We compute hashcodes from names both at compile time and at runtime. I.e. the motivation was not so much compiler perf but run time perf. But I didn't want to regress compiler perf. So this got a bit out of hand.

We should see improvements in reflection performance since now we can hash strings in metadata directly. It should also help working set at runtime because we no longer need to convert strings in metadata format (UTF-8) to strings in System.String format (UTF-16).
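To illustrate the point above, here is a minimal sketch in Python (not the C# code from this PR). It assumes a stand-in hash, 32-bit FNV-1a, which is not the algorithm in VersionResilientHashCode.cs. The function names and the sample type name are hypothetical. It contrasts decoding a metadata name to UTF-16 before hashing with hashing the UTF-8 bytes as stored.

```python
# Illustrative only: FNV-1a is a stand-in hash, NOT the algorithm used by
# VersionResilientHashCode.cs. All names here are hypothetical.
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Hash a byte sequence with 32-bit FNV-1a."""
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def hash_via_utf16(metadata_name: bytes) -> int:
    """Old shape: decode the UTF-8 name to a string, then hash its UTF-16 form."""
    text = metadata_name.decode("utf-8")       # extra allocation and conversion
    return fnv1a_32(text.encode("utf-16-le"))  # hash the UTF-16 code units


def hash_utf8_direct(metadata_name: bytes) -> int:
    """New shape: hash the UTF-8 bytes exactly as they sit in metadata."""
    return fnv1a_32(metadata_name)


name = b"List`1"  # hypothetical type name as stored in metadata
print(hex(hash_via_utf16(name)), hex(hash_utf8_direct(name)))
```

The two functions hash different byte sequences, so in general they produce different values for the same name. That is why the compile-time and runtime sides have to switch encodings together.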

The compiler perf improvements are a mixed bag and mostly a wash. We still end up converting pretty much everything to string because of name mangling being string based, and dataflow analysis (shared with ILLinker) being string based.

I think we can address both of those and also get a compiler throughput win. But not in this PR - I did just enough here so that we don't have a regression in ILC or crossgen2. All the places that call GetName()/GetNamespace() are optimization opportunities.

Cc @dotnet/ilc-contrib

@github-actions github-actions bot added the needs-area-label label Sep 5, 2025
@PaulusParssinen
Contributor

Personally, I'm a big fan of this direction, as it would also make switching ILC/R2R name mangling to UTF-8 more straightforward later on (I have an attempt at this in PaulusParssinen#3).

@MichalStrehovsky MichalStrehovsky added area-NativeAOT-coreclr and removed needs-area-label labels Sep 9, 2025
@MichalStrehovsky
Member Author

Personally, I'm a big fan of this direction, as it would also make switching ILC/R2R name mangling to UTF-8 more straightforward later on (I have an attempt at this in PaulusParssinen#3).

Yep, we'll definitely want to do that!

@MichalStrehovsky MichalStrehovsky marked this pull request as ready for review September 9, 2025 13:34
@Copilot Copilot AI review requested due to automatic review settings September 9, 2025 13:34
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR switches the managed type system and compilers from UTF-16 to UTF-8 string representation for better runtime performance and working set improvements. The primary motivation is to eliminate conversions between metadata format (UTF-8) and System.String format (UTF-16) during reflection operations and hashcode computation.

  • Updates type system APIs to use ReadOnlySpan<byte> for names and namespaces instead of string
  • Modifies hashcode algorithms to work directly with UTF-8 bytes
  • Adds helper methods and extensions to support UTF-8 string comparisons and conversions

Reviewed Changes

Copilot reviewed 258 out of 258 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/libraries/Common/src/Internal/VersionResilientHashCode.cs | Core hashcode algorithm implementation for UTF-8 strings |
| src/coreclr/tools/Common/TypeSystem/Canon/CanonTypes.Metadata.cs | Updates canon type method signatures to use UTF-8 spans |
| src/coreclr/tools/Common/TypeSystem/Canon/CanonTypes.Diagnostic.cs | Modifies diagnostic name properties to call GetName() methods |
| src/coreclr/tools/Common/JitInterface/*.cs | Updates JIT interface formatters and instruction set lookups to use UTF-8 |
| src/coreclr/tools/Common/Internal/Runtime/EETypeBuilderHelpers.cs | Converts type name checks to use UTF-8 byte comparisons |
| src/coreclr/tools/Common/Internal/Metadata/NativeFormat/*.cs | Updates metadata readers and hashcode algorithms for UTF-8 |
| src/coreclr/tools/Common/Compiler/*.cs | Updates compiler helpers, name manglers, and layout algorithms |
| src/coreclr/nativeaot/System.Private.TypeLoader/src/**/*.cs | Updates TypeLoader components for UTF-8 string handling |
Comments suppressed due to low confidence (1)

@jkotas
Member

jkotas commented Sep 9, 2025

We should see improvements in reflection performance

Any numbers to support this?

@MichalStrehovsky
Member Author

We should see improvements in reflection performance

Any numbers to support this?

Yep, the example in #66620 is seeing 5+% improvement:

Before:

First Example: 1524123
First Example: 1636400
First Example: 1491577
First Example: 1457558
First Example: 1537521
First Example: 1471351
First Example: 1600548
First Example: 1465896
First Example: 1466326
First Example: 1537594

After:

First Example: 1415612
First Example: 1373304
First Example: 1369793
First Example: 1408123
First Example: 1359021
First Example: 1390025
First Example: 1384851
First Example: 1451764
First Example: 1387562
First Example: 1383689
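
As a quick check of the "5+%" figure, the sketch below averages the two sets of runs listed above and computes the relative change. The thread does not state the unit, so this assumes the numbers are an elapsed-time style measurement where lower is better. The variable names are illustrative.

```python
from statistics import mean

# "First Example" values copied from the Before/After runs above.
# Assumption: lower is better (the unit is not stated in the thread).
before = [1524123, 1636400, 1491577, 1457558, 1537521,
          1471351, 1600548, 1465896, 1466326, 1537594]
after = [1415612, 1373304, 1369793, 1408123, 1359021,
         1390025, 1384851, 1451764, 1387562, 1383689]

mean_before = mean(before)
mean_after = mean(after)

# Relative reduction of the mean, as a percentage of the "before" mean.
improvement_pct = (mean_before - mean_after) / mean_before * 100

print(f"mean before: {mean_before:.1f}")
print(f"mean after:  {mean_after:.1f}")
print(f"improvement: {improvement_pct:.1f}%")
```

On these ten runs per side the mean drops by roughly 8%, which is consistent with the "5+%" claim. With this few samples and visible run-to-run noise, the figure is only a rough indication.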

Member

@jkotas jkotas left a comment

This can use crossgen2 and naot outer loop runs before merging.

@MichalStrehovsky
Member Author

/azp run runtime-nativeaot-outerloop, runtime-coreclr crossgen2 outerloop

Azure Pipelines successfully started running 2 pipeline(s).

@MichalStrehovsky
Member Author

Thank you for reviewing, this one was a bit longer!

@MichalStrehovsky
Member Author

/azp run runtime-nativeaot-outerloop, runtime-coreclr crossgen2 outerloop

Azure Pipelines successfully started running 2 pipeline(s).

@MichalStrehovsky
Member Author

/ba-g failures are unrelated and fixed by #119574

@MichalStrehovsky MichalStrehovsky merged commit e7da680 into dotnet:main Sep 11, 2025
195 of 213 checks passed
@MichalStrehovsky MichalStrehovsky deleted the utf8 branch September 12, 2025 10:01
kg added a commit that referenced this pull request Sep 16, 2025
Restore missing Append call that was removed during a past refactoring in #119385