@0xffff-zhiyan 0xffff-zhiyan commented Sep 4, 2025

This PR implements KIP-1207

https://issues.apache.org/jira/browse/KAFKA-19606

This PR implements a global shared thread counter so that the RequestHandlerAvgIdlePercent metric is calculated correctly across all KafkaRequestHandlerPool instances within the same JVM process. This matters in KRaft combined mode, where the broker and controller request handler pools coexist in one process.

Previously, each KafkaRequestHandlerPool calculated idle percentages independently using only its own thread count as the denominator. In combined KRaft mode, this led to:

  • Inaccurate aggregate idle percentage calculations
  • Potential metric values exceeding 100% (values > 1.0)
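
A rough worked example of the problem, under stated assumptions: both pools contribute to one aggregate value, each thread divides its idle time by the denominator its pool uses, and the thread counts (8 and 8) are made up for illustration. `IdleDenominatorSketch`, `PoolLoad`, and `aggregateIdle` are hypothetical names, not Kafka code.

```scala
object IdleDenominatorSketch {
  // One pool's contribution to the aggregate: `threads` handler threads, each idle
  // for `idleFraction` of the time, each dividing its idle time by `denominator`.
  final case class PoolLoad(threads: Int, idleFraction: Double, denominator: Int)

  // Sum of every thread's scaled idle fraction across all pools.
  def aggregateIdle(pools: Seq[PoolLoad]): Double =
    pools.map(p => p.threads * p.idleFraction / p.denominator).sum
}
```

With two fully idle pools of 8 threads each, per-pool denominators give `aggregateIdle(Seq(PoolLoad(8, 1.0, 8), PoolLoad(8, 1.0, 8)))`, which is 2.0 (200%). Using the combined count of 16 as the denominator for both pools gives 1.0.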

Core Changes

  1. Global thread counter: added sharedAggregateTotalThreads as a global AtomicInteger in KafkaRequestHandlerPool
  2. Modified KafkaRequestHandler to calculate two idle metrics:
       • Per-pool metric: uses the local thread count (totalHandlerThreads.get)
       • Aggregate metric: uses the global thread count (sharedAggregateTotalThreads.get)
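
The two calculations can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `CountingMeter` is a hypothetical stand-in for the real metrics meter, and `recordIdle` is an invented helper. Only `sharedAggregateTotalThreads`, `totalHandlerThreads`, `perPoolIdleMeter`, and `aggregateIdleMeter` are names taken from the PR.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

object TwoMeterSketch {
  // Hypothetical stand-in for a metrics meter: it only accumulates marked values.
  final class CountingMeter {
    private val total = new AtomicLong(0L)
    def mark(n: Long): Unit = total.addAndGet(n)
    def count: Long = total.get
  }

  // JVM-wide thread count shared by every pool (name from the PR description).
  val sharedAggregateTotalThreads = new AtomicInteger(0)

  // Records one idle interval against both meters. Both counters must be
  // non-zero here, otherwise the integer division throws ArithmeticException.
  def recordIdle(idleNanos: Long,
                 totalHandlerThreads: AtomicInteger,
                 perPoolIdleMeter: CountingMeter,
                 aggregateIdleMeter: CountingMeter): Unit = {
    // Per-pool metric: scaled by this pool's own thread count.
    perPoolIdleMeter.mark(idleNanos / totalHandlerThreads.get)
    // Aggregate metric: scaled by the thread count of all pools in the JVM.
    aggregateIdleMeter.mark(idleNanos / sharedAggregateTotalThreads.get)
  }
}
```

Because the divisions fail when a counter is zero, tests that construct a handler without a pool need to initialize the global counter first.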

Test

  • Added the perPoolIdleMeter parameter to all KafkaRequestHandler instantiations
  • Added global counter initialization, KafkaRequestHandlerPool.sharedAggregateTotalThreads.set(1), in the test class setup
  • Added a new unit test that verifies:
      1. Global counter accumulation across multiple pools
      2. Proper idle percentage calculation within the [0, 1.05] range
      3. Counter cleanup after pool shutdown
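
Checks 1 and 3 can be illustrated with a stripped-down pool that only tracks thread counts. This is a sketch under assumptions, not the unit test added in this PR: `SketchPool` and its `shutdown` logic are hypothetical, and the real pool may clean up the counter differently.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object PoolCounterSketch {
  // JVM-wide counter; the name follows the PR description.
  val sharedAggregateTotalThreads = new AtomicInteger(0)

  // Hypothetical, stripped-down pool that only registers its thread count.
  final class SketchPool(numThreads: Int) {
    private val isShutdown = new AtomicBoolean(false)

    // Register this pool's threads with the global counter on construction.
    sharedAggregateTotalThreads.addAndGet(numThreads)

    // Remove this pool's threads from the global counter, at most once.
    def shutdown(): Unit =
      if (isShutdown.compareAndSet(false, true))
        sharedAggregateTotalThreads.addAndGet(-numThreads)
  }
}
```

Creating a pool of 8 and a pool of 4 should leave the counter at 12, and shutting both down should return it to 0.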

POC run locally (in KRaft combined mode):
[Screenshot 2025-09-04 at 15 32 32]
[Screenshot 2025-09-04 at 15 34 26]

@github-actions github-actions bot added triage PRs from the community core Kafka Broker labels Sep 4, 2025
@0xffff-zhiyan 0xffff-zhiyan changed the title from "KIP-1207: Fix anomaly of JMX metrics RequestHandlerAvgIdlePercent in kraft combined mode" to "KAFKA-19606: Fix anomaly of JMX metrics RequestHandlerAvgIdlePercent in kraft combined mode" Sep 8, 2025

@kevin-wu24 kevin-wu24 left a comment

Thanks for the changes @0xffff-zhiyan. Left a review of the code changes.

@@ -93,7 +94,8 @@ class KafkaRequestHandler(
   val requestChannel: RequestChannel,
   apis: ApiRequestHandler,
   time: Time,
-  nodeName: String = "broker"
+  nodeName: String = "broker",
+  val perPoolIdleMeter: Meter,

Can we group this with aggregateIdleMeter in the class header?

@@ -192,6 +197,10 @@ class KafkaRequestHandler(

}

object KafkaRequestHandlerPool {
val sharedAggregateTotalThreads = new AtomicInteger(0)

The name sharedAggregateTotalThreads is redundant. We can just call this totalThreads or aggregateThreads.

private val aggregateIdleMeter = metricsGroup.newMeter(requestHandlerAvgIdleMetricName, "percent", TimeUnit.NANOSECONDS)

this.logIdent = s"[data-plane Kafka Request Handler on ${nodeName.capitalize} $brokerId] "
val runnables = new mutable.ArrayBuffer[KafkaRequestHandler](numThreads)
// when using shared aggregate counter, register this pool's threads
sharedAggregateTotalThreads.addAndGet(numThreads)

Let's move this into the synchronized method createHandler and call incrementAndGet when each thread is created.
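
A minimal sketch of what this suggestion could look like, assuming a simplified pool that stores handler ids instead of real handler threads. `SketchPool` and `handlerCount` are hypothetical, and `aggregateThreads` uses one of the names proposed in the review rather than the name currently in the PR.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

object PerThreadRegistrationSketch {
  // JVM-wide counter; uses one of the names proposed in the review.
  val aggregateThreads = new AtomicInteger(0)

  // Hypothetical pool that stores handler ids in place of real handler threads.
  final class SketchPool(numThreads: Int) {
    private val handlers = mutable.ArrayBuffer.empty[Int]

    // Create the initial handlers; each call registers one thread.
    for (i <- 0 until numThreads) createHandler(i)

    // Registration happens under the same lock that guards the handler list,
    // so the list and the global counter change together.
    def createHandler(id: Int): Unit = synchronized {
      handlers += id
      aggregateThreads.incrementAndGet()
    }

    def handlerCount: Int = synchronized(handlers.size)
  }
}
```

One consequence of counting per thread is that any handler added later through createHandler is included in the global count without a separate addAndGet call.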

@github-actions github-actions bot removed the triage PRs from the community label Sep 9, 2025