Dynamic PGO Microbenchmark Regressions #87194

@AndyAyersMS

This issue tracks investigation into microbenchmarks that have reported regressions with Dynamic PGO enabled. It is a continuation of #84264, which tracked regressions from PGO before it was enabled.

The report below is collated from the following autofiling reports.

The table is auto-generated by a tool written by @EgorBo but may be edited by hand as regression analysis produces results. The "Score" is the geomean regression across all architectures; benchmarks that did not regress (or get reported) on some architectures are assumed to have produced the same results with and without PGO. "Recent Score" is the current performance (as of 2023-06-06) versus the non-PGO result; "Orig Score" is based on the results of auto filing. They will differ if benchmark performance has improved or regressed since the auto filing ran (see for example the results for System.Text.Json.Tests.Perf_Get.GetByte, which has improved already).
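As a rough illustration of the scoring described above, here is a minimal Python sketch of a geomean score. It is not the actual tool: the function name `geomean_score` and the `total_configs` parameter are hypothetical, and it assumes each per-arch value is a PGO / non-PGO ratio where values above 1.0 mean a regression. Passing `total_configs` pads unreported architectures with 1.0 ("same results with and without PGO"); leaving it out averages only the reported cells. For the System.Text.Json.Tests.Perf_Get.GetInt16 row, the five listed recent ratios give roughly 1.75 without padding, which is that row's Recent Score.

```python
import math


def geomean_score(ratios, total_configs=None):
    """Geometric mean of per-architecture regression ratios.

    ratios: reported ratios (assumed: > 1.0 means slower with PGO).
    total_configs: hypothetical knob; if given, architectures with no
    reported result are counted as 1.0 (no change with PGO).
    """
    values = list(ratios)
    if total_configs is not None:
        if total_configs < len(values):
            raise ValueError("total_configs is smaller than the number of ratios")
        values += [1.0] * (total_configs - len(values))
    if not values:
        raise ValueError("need at least one ratio")
    # exp(mean(log x)) is the geometric mean and avoids a large product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Recent ratios listed for System.Text.Json.Tests.Perf_Get.GetInt16.
getint16_recent = [1.33, 1.35, 1.90, 2.29, 2.10]

reported_only = geomean_score(getint16_recent)
padded = geomean_score(getint16_recent, total_configs=6)
print(f"reported only: {reported_only:.2f}, padded to 6 archs: {padded:.2f}")
```

A geometric mean is the usual choice for averaging ratios, since a 2x regression on one architecture and a 2x improvement on another cancel out to 1.0.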

Only the 36 entries with recent scores >= 1.3 are included; this leaves off approximately 220 more rows with scores below 1.3. Our plan is to prioritize investigation of these benchmarks initially, as they have the largest aggregate regressions. If time permits, we will regenerate this chart to pick up the impact of any fixes and see how much of the remainder we can tackle.

Each arch/os result is a hyperlink to the performance data graph for that benchmark. ~~Note we currently have no autofiling data for win-x64-intel. If/when that shows up we will regenerate the table.~~

[edit: had to regenerate the table once already, as the scoring logic was off]
[edit: have x64 win intel data now, new table. Note current results have shifted, so the table is somewhat different...]

cc @dotnet/jit-contrib

Per-arch columns, in order: arm64-lin-ampere, arm64-win-surface, arm64-win-ampere, x64-lin-intel, x64-win-intel, x64-win-amd. Each per-arch result is a recent / orig pair; architectures with no reported result are omitted.

| Notes | Recent Score | Orig Score | Per-arch results (recent / orig) | Benchmark |
| --- | --- | --- | --- | --- |
| noise | 3.38 | 1.37 | 3.37 / 1.36 | System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "zqj", Options: None) |
| noise | 3.36 | 1.37 | 3.36 / 1.37 | System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "zqj", Options: NonBacktracking) |
| notes | 2.71 | 3.39 | 2.71 / 3.39 | System.Memory.Span(Int32).EndsWith(Size: 4) |
| likely same as above | 2.62 | 3.03 | 2.55 / 2.27, 2.59 / 3.04 | System.Memory.Span(Int32).SequenceEqual(Size: 4) |
| likely same as above | 1.87 | 1.76 | 1.87 / 1.76 | System.Memory.Span(Int32).SequenceCompareToDifferent(Size: 512) |
| (lack of) if conversion | 1.82 | 1.80 | 1.67 / 1.63, 1.93 / 1.92, 1.86 / 1.85 | System.Tests.Perf_Random.NextSingle |
| budget | 1.75 | 1.88 | 1.33 / 1.47, 1.35 / 1.49, 1.90 / 1.99, 2.29 / 2.43, 2.10 / 2.19 | System.Text.Json.Tests.Perf_Get.GetInt16 |
| BDN | 1.73 | 2.81 | 3.55 / 3.54, 1.89 / 2.00, 1.28 / 4.73, 1.32 / 2.01, 1.39 / 2.68 | System.Buffers.Text.Tests.Base64EncodeDecodeInPlaceTests.Base64EncodeInPlace(NumberOfBytes: 200000000) |
| notes | 1.64 | 1.63 | 1.84 / 1.82, 1.65 / 1.64 | System.Tests.Perf_UInt32.TryParseHex(value: "0") |
| budget | 1.61 | 1.70 | 1.27 / 1.44, 1.28 / 1.46, 1.24 / 1.18, 2.09 / 2.17, 2.25 / 2.33, 1.86 / 1.94 | System.Text.Json.Tests.Perf_Get.GetSByte |
| bimodal | 1.61 | 1.59 | 1.60 / 1.58 | System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "Sherlock Holmes", Options: Compiled) |
| cast expansion | 1.60 | 1.64 | 1.82 / 1.87, 1.41 / 1.43 | System.Buffers.Tests.ReadOnlySequenceTests(Char).FirstSingleSegment |
| cast expansion | 1.58 | 1.62 | 1.58 / 1.62 | System.Buffers.Tests.ReadOnlySequenceTests(Byte).FirstSpanTenSegments |
| cast expansion | 1.52 | 1.65 | 1.48 / 1.81, 1.56 / 1.50 | System.Buffers.Tests.ReadOnlySequenceTests(Byte).FirstSingleSegment |
| cast expansion | 1.50 | 1.73 | 1.88 / 2.13, 1.20 / 1.41 | System.Buffers.Tests.ReadOnlySequenceTests(Char).FirstTenSegments |
| likely same as span cases above | 1.48 | 1.28 | 1.48 / 1.28 | System.Memory.Span(Int32).Reverse(Size: 4) |
| cast expansion | 1.47 | 1.44 | 1.47 / 1.44 | System.Buffers.Tests.ReadOnlySequenceTests(Byte).FirstSpanSingleSegment |
| notes | 1.47 | 1.42 | 1.46 / 1.42 | Benchstone.BenchF.InvMt.Test |
| unclear | 1.46 | 1.15 | 1.46 / 1.15 | MicroBenchmarks.Serializers.Json_FromStream(MyEventsListerViewModel).DataContractJsonSerializer_ |
| fixed itself | 1.45 | 1.09 | 1.45 / 1.09 | System.Tests.Perf_Uri.EscapeDataString(input: "{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ |
| unclear | 1.44 | 1.44 | 1.44 / 1.44 | Burgers.Test1 |
| unclear | 1.43 | 1.27 | 1.43 / 1.27 | System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfNumbers) |
| unclear, linux arm64 only | 1.41 | 1.58 | 1.41 / 1.58 | System.Text.Tests.Perf_StringBuilder.Append_Char_Capacity(length: 100000) |
| unclear, linux arm64 only | 1.39 | 1.62 | 1.39 / 1.62 | BenchmarksGame.RegexRedux_5.RunBench(options: Compiled) |
| bimodal | 1.39 | 1.39 | 1.39 / 1.39 | System.MathBenchmarks.Single.Min |
| bimodal | 1.39 | 1.39 | 1.39 / 1.39 | System.MathBenchmarks.Single.Max |
| unclear, linux arm64 only | 1.39 | 1.32 | 1.39 / 1.32 | System.IO.Pipes.Tests.Perf_NamedPipeStream.ReadWriteAsync(size: 1000000, Options: Asynchronous) |
| noise | 1.38 | 1.29 | 1.38 / 1.29 | System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateFromFile_Read(capacity: 10000000) |
| bimodal | 1.37 | 1.37 | 1.37 / 1.37 | System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock.Count(Pattern: "zqj", Options: Compiled) |
| notes | 1.37 | 1.36 | 1.26 / 1.29, 1.42 / 1.43, 1.24 / 1.28, 1.60 / 1.48 | System.Collections.Sort(IntStruct).Array(Size: 512) |
| budget | 1.36 | 1.93 | 1.15 / 1.56, 1.15 / 1.58, 1.27 / 1.66, 1.42 / 2.14, 1.80 / 2.67, 1.49 / 2.24 | System.Text.Json.Tests.Perf_Get.GetByte |
| noise | 1.35 | 1.31 | 1.36 / 1.33 | System.Memory.Span(Char).IndexOfAnyTwoValues(Size: 512) |
| arm64 only; ldar vs dmb | 1.35 | 1.36 | 1.35 / 1.34, 1.38 / 1.40 | System.Collections.CtorFromCollection(Int32).ConcurrentBag(Size: 512) |
| fixed by physical promotion | 1.35 | 1.36 | 1.35 / 1.36 | Devirtualization.EqualityComparer.ValueTupleCompareWrapped |
| budget | 1.34 | 1.42 | 1.42 / 1.26, 1.28 / 1.38, 1.35 / 1.44, 1.35 / 1.42, 1.35 / 1.55, 1.31 / 1.46 | System.Text.Json.Serialization.Tests.WriteJson(ImmutableDictionary(String, String)).SerializeToStream(Mode: SourceGen) |
| notes | 1.34 | 1.45 | 1.18 / 1.29, 1.40 / 1.44, 1.13 / 1.41, 1.71 / 1.71 | System.Collections.Sort(IntStruct).List(Size: 512) |
| notes | 1.33 | 1.33 | 1.33 / 1.33 | System.Tests.Perf_HashCode.Combine_1 |
| inlining different; exposed local | 1.33 | 1.32 | 1.34 / 1.33, 1.32 / 1.32 | System.Memory.ReadOnlySequence.Slice_Repeat(Segment: Multiple) |
| notes | 1.33 | 1.18 | 1.33 / 1.18 | System.Text.Json.Document.Tests.Perf_EnumerateArray.EnumerateUsingIndexer(TestCase: ArrayOfStrings) |
| budget | 1.32 | 1.37 | 1.24 / 1.39, 1.20 / 1.28, 1.37 / 1.15, 1.39 / 1.46, 1.45 / 1.57, 1.27 / 1.39 | System.Text.Json.Serialization.Tests.WriteJson(ImmutableDictionary(String, String)).SerializeToWriter(Mode: SourceGen) |
| budget | 1.32 | 1.39 | 1.37 / 1.28, 1.22 / 1.42, 1.34 / 1.31, 1.32 / 1.38, 1.30 / 1.50, 1.34 / 1.38 | System.Text.Json.Serialization.Tests.WriteJson(ImmutableDictionary(String, String)).SerializeToUtf8Bytes(Mode: SourceGen) |
| budget | 1.31 | 1.88 | 1.15 / 1.59, 1.18 / 1.62, 1.03 / 1.37, 1.49 / 2.22, 1.66 / 2.49, 1.49 / 2.24 | System.Text.Json.Tests.Perf_Get.GetUInt16 |
| budget | 1.31 | 1.33 | 1.38 / 1.25, 1.20 / 1.23, 1.23 / 1.26, 1.35 / 1.46, 1.40 / 1.40, 1.41 / 1.43 | System.Text.Json.Serialization.Tests.WriteJson(ImmutableDictionary(String, String)).SerializeToString(Mode: SourceGen) |
| jcc errata | 1.31 | 1.39 | 1.31 / 1.39 | Span.Sorting.QuickSortSpan(Size: 512) |
| lack of cold inline exposes local | 1.31 | 1.29 | 1.31 / 1.31, 1.31 / 1.27 | System.Memory.ReadOnlySequence.Slice_Start_And_Length(Segment: Multiple) |
| budget | 1.31 | 1.39 | 1.32 / 1.19, 1.20 / 1.50, 1.31 / 1.37, 1.40 / 1.50, 1.31 / 1.34 | System.Text.Json.Serialization.Tests.WriteJson(ImmutableDictionary(String, String)).SerializeObjectProperty(Mode: SourceGen) |
| lack of ldapr | 1.30 | 1.30 | 1.29 / 1.30, 1.30 / 1.30 | System.Collections.CtorFromCollection(String).ConcurrentBag(Size: 512) |

Labels: Priority:2 (Work that is important, but not critical for the release), area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)