KAFKA-19606: Fix anomaly of JMX metrics RequestHandlerAvgIdlePercent in kraft combined mode #20481
base: trunk
Thanks for the changes @0xffff-zhiyan. Left a review of the code changes.
@@ -93,7 +94,8 @@ class KafkaRequestHandler(
   val requestChannel: RequestChannel,
   apis: ApiRequestHandler,
   time: Time,
-  nodeName: String = "broker"
+  nodeName: String = "broker",
+  val perPoolIdleMeter: Meter,
Can we group this with `aggregateIdleMeter` in the class header?
@@ -192,6 +197,10 @@ class KafkaRequestHandler(

}

object KafkaRequestHandlerPool {
  val sharedAggregateTotalThreads = new AtomicInteger(0)
`sharedAggregateTotalThreads` is redundant. We can just name this `totalThreads` or `aggregateThreads`.
private val aggregateIdleMeter = metricsGroup.newMeter(requestHandlerAvgIdleMetricName, "percent", TimeUnit.NANOSECONDS)

this.logIdent = s"[data-plane Kafka Request Handler on ${nodeName.capitalize} $brokerId] "
val runnables = new mutable.ArrayBuffer[KafkaRequestHandler](numThreads)
// when using shared aggregate counter, register this pool's threads
sharedAggregateTotalThreads.addAndGet(numThreads)
Let's move this into the synchronized method `createHandler` and call `incrementAndGet` when each thread is created.
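A rough sketch of what this suggestion could look like is below. It is not the PR's code: `HandlerPoolSketch`, `threadPoolSize`, `localThreads`, and the `shutdown` bookkeeping are hypothetical names, the counter uses the `aggregateThreads` name proposed in the review, and no real handler threads or request channel are created.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the companion object that holds the JVM-wide counter.
object HandlerPoolSketch {
  val aggregateThreads = new AtomicInteger(0)
}

// Simplified pool: only the counting logic, no real threads.
class HandlerPoolSketch(numThreads: Int) {
  private val threadPoolSize = new AtomicInteger(0)

  // Register handlers one at a time instead of calling addAndGet(numThreads) up front.
  for (i <- 0 until numThreads) createHandler(i)

  def createHandler(id: Int): Unit = synchronized {
    // Each created handler bumps both the local and the shared counter.
    threadPoolSize.incrementAndGet()
    HandlerPoolSketch.aggregateThreads.incrementAndGet()
  }

  def shutdown(): Unit = synchronized {
    // Remove this pool's handlers from the shared counter and reset the local one.
    HandlerPoolSketch.aggregateThreads.addAndGet(-threadPoolSize.getAndSet(0))
  }

  def localThreads: Int = threadPoolSize.get
}
```

Counting inside `createHandler` keeps the shared counter in step with the handlers that actually exist, which would also cover a pool that is resized after construction.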
This PR implements KIP-1207
https://issues.apache.org/jira/browse/KAFKA-19606
This PR implements a global shared thread counter to correctly calculate the `RequestHandlerAvgIdlePercent` metric across all `KafkaRequestHandlerPool` instances within the same JVM process. This keeps the idle percentage accurate in combined KRaft mode, where both broker and controller request handler pools coexist.

Previously, each `KafkaRequestHandlerPool` calculated idle percentages independently, using only its own thread count as the denominator. In combined KRaft mode, this led to:

Core Changes
- Added `sharedAggregateTotalThreads` as a global `AtomicInteger` in `KafkaRequestHandlerPool`
- Per-pool metric: uses the local thread count (`totalHandlerThreads.get`)
- Aggregate metric: uses the global thread count (`sharedAggregateTotalThreads.get`)
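To illustrate why the denominator matters, here is a small arithmetic sketch. It rests on assumptions not stated above: both pools mark the same aggregate meter, every thread is idle for the whole window, and the pool sizes (8 and 4) are made up. `IdlePercentSketch` and `aggregateIdle` are hypothetical names, not Kafka APIs.

```scala
object IdlePercentSketch {
  // Models fully idle threads: each one marks (idle nanos / denominator) on the
  // shared meter, and the meter reads as (sum of marks) / (window length).
  def aggregateIdle(poolSizes: Seq[Int], windowNanos: Long, useGlobalCount: Boolean): Double = {
    val globalCount = poolSizes.sum
    val marks = for {
      poolSize <- poolSizes
      _ <- 0 until poolSize
    } yield {
      // Local count models the old behaviour, global count the new aggregate metric.
      val denominator = if (useGlobalCount) globalCount else poolSize
      windowNanos.toDouble / denominator
    }
    marks.sum / windowNanos
  }
}
```

Under these assumptions, `aggregateIdle(Seq(8, 4), 1200000000L, useGlobalCount = false)` evaluates to 2.0, because each pool contributes a full 1.0 to the shared meter, while `useGlobalCount = true` gives 1.0.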
Test
- Added the `perPoolIdleMeter` parameter to all `KafkaRequestHandler` instantiations
- Added global counter initialization in the test class setup: `KafkaRequestHandlerPool.sharedAggregateTotalThreads.set(1)`
- Added a new unit test that verifies:
  1. Global counter accumulation across multiple pools
  2. Proper idle percentage calculation within the [0, 1.05] range
  3. Counter cleanup after pool shutdown
POC locally (in KRaft combined mode):

