Optimize CrlCollectionAccessor.Contains for large lists #36319
Merged
I faced a performance issue in prod where an endpoint takes 3 minutes to respond. The CPU time is spent almost entirely in `CrlCollectionAccessor.Contains`. The problem is that the app does an `Include` on a collection navigation of ~300K items. We will try fixing it by making the collection a `HashSet`, as mentioned in Relationship navigations, but that is inconvenient because it breaks some code and it's hard to tell whether the `List`'s ordering is relied on. In this change, I'm removing the use of the `List` enumerator to get an easy speedup for large collections. I've benchmarked this using this code:
Before:
After:
This gives a significant performance boost when there are more than 1000 dependent entities.
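The diff itself isn't reproduced here, so the following is only a rough analogue of the idea, written in Java rather than C#. It assumes the lookup is a reference search (as the `IEnumerable.Contains` note below states). `containsViaIterator`, `containsByReference` and `newIdentitySet` are hypothetical names, not EF Core APIs. The sketch contrasts an enumerator-style search with an indexed loop that is used when the collection is a random-access list, and shows the identity-based set alternative (the `HashSet` route) for comparison.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.RandomAccess;
import java.util.Set;

// Hypothetical Java analogue; not the EF Core implementation.
final class ReferenceContains {
    // "Before" analogue: walk the collection through its iterator,
    // paying hasNext()/next() calls for every element.
    static <T> boolean containsViaIterator(Collection<T> collection, T entity) {
        for (T item : collection) {
            if (item == entity) { // reference equality, as for tracked entity instances
                return true;
            }
        }
        return false;
    }

    // "After" analogue: if the collection is an indexable list, loop over it by
    // index instead of creating an iterator; otherwise fall back to the iterator.
    static <T> boolean containsByReference(Collection<T> collection, T entity) {
        if (collection instanceof List && collection instanceof RandomAccess) {
            List<T> list = (List<T>) collection;
            for (int i = 0, n = list.size(); i < n; i++) {
                if (list.get(i) == entity) {
                    return true;
                }
            }
            return false;
        }
        return containsViaIterator(collection, entity);
    }

    // The set alternative: O(1) lookups with reference semantics,
    // at the cost of losing the list's ordering.
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        List<String> dependents = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            dependents.add("e" + i); // each concatenation creates a distinct String object
        }
        String tracked = dependents.get(999);
        String lookalike = new String("e999"); // equal by value, different reference

        System.out.println(containsByReference(dependents, tracked));   // true
        System.out.println(containsByReference(dependents, lookalike)); // false

        Set<String> identitySet = newIdentitySet();
        identitySet.addAll(dependents);
        System.out.println(identitySet.contains(tracked));   // true
        System.out.println(identitySet.contains(lookalike)); // false
    }
}
```

The indexed loop keeps the collection type (and its ordering) unchanged, which is why it is an easier fix than switching the navigation to a set. The set only pays off when lookups dominate.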
I'm a little confused about the Gen 0 column, which seems to give unreliable results. Maybe someone with more BenchmarkDotNet experience could explain why it's sometimes `-`.
I tried using `IEnumerable.Contains`. I thought it would use a vectorized path, but I learned that this isn't possible for reference search (dotnet/runtime#117178). After some type checks, it falls back to a slow path that is much slower than the manual loop (206 ops/s for Count=1000). I believe this is because of the virtual `IEqualityComparer.Equals` calls.
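To make the last point concrete, here is a hedged Java analogue of the two loop shapes. `BiPredicate` stands in for .NET's `IEqualityComparer<T>.Equals`, and `containsWithComparer` / `containsByReference` are hypothetical names. The only intended difference is that the generic path dispatches every element comparison through an interface call, while the manual loop compares references inline. Whether a given JIT devirtualizes that call is runtime-specific and not claimed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical Java analogue of the two search shapes; not the .NET implementation.
final class ComparerContains {
    // Generic-path shape: one interface call per element to decide equality.
    static <T> boolean containsWithComparer(List<T> list, T value,
                                            BiPredicate<? super T, ? super T> comparer) {
        for (int i = 0, n = list.size(); i < n; i++) {
            if (comparer.test(list.get(i), value)) {
                return true;
            }
        }
        return false;
    }

    // Manual-loop shape: the reference comparison is a plain inline check.
    static <T> boolean containsByReference(List<T> list, T value) {
        for (int i = 0, n = list.size(); i < n; i++) {
            if (list.get(i) == value) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Object> dependents = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            dependents.add(new Object());
        }
        Object last = dependents.get(dependents.size() - 1);

        // A reference-equality "comparer" passed as a lambda.
        BiPredicate<Object, Object> byReference = (a, b) -> a == b;
        System.out.println(containsWithComparer(dependents, last, byReference)); // true
        System.out.println(containsByReference(dependents, last));               // true
    }
}
```

Both methods return the same answers. Any speed difference comes from the per-element indirection, not from the result.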