Skip to content

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Sep 8, 2025

Summary

Background

The use case is I have a fairly large table, due to some real world event, I need to scale the ingestion, so in general the way is to double the Kafka partitions, however after the event, I want to shrink the kafka thing back.

So one of the best practice is to double the kafka partition count but keep pinot side partitioning. So old partition 0 will be halved.

This can avoid heavy data rebalance and re-ingestion. ( For a table with 100+TB it's almost impossible to re-ingestion)

Since this PR: #11476 ensures all the consuming/realtime segments partition mapping follows the kafka partition way. The the key thing here is to ensure the SegmentPartitionManager able to produce the desired partitionId.

This PR implements the allowPartitionRemapping feature for segment partition metadata management in Apache Pinot. This feature enables remapping of higher Kafka partition IDs to lower Pinot partition numbers using modulo operation, which is useful when scaling Kafka partitions while maintaining fewer logical partitions in Pinot.

Feature Implementation

Core Changes:

  1. ColumnPartitionConfig.java - Added allowPartitionRemapping field and getter method

    • New boolean field _allowPartitionRemapping to control remapping behavior
    • JSON property allowPartitionRemapping for table configuration
  2. SegmentPartitionMetadataManager.java - Implemented partition remapping logic

    • Added _allowPartitionRemapping field to constructor
    • Implemented modulo-based remapping: partition_id % numPartitions when enabled
    • Enforces strict partition count matching when disabled
  3. BrokerRoutingManager.java - Integrated allowPartitionRemapping flag

    • Extracts allowPartitionRemapping from table configuration
    • Passes flag to SegmentPartitionMetadataManager constructor

Remapping Logic:

  • When allowPartitionRemapping=true: Supports modulo-based remapping (e.g., 8 Kafka partitions to 4 Pinot partitions)

    • Partition 0,4 maps to Pinot partition 0
    • Partition 1,5 maps to Pinot partition 1
    • Partition 2,6 maps to Pinot partition 2
    • Partition 3,7 maps to Pinot partition 3
  • When allowPartitionRemapping=false: Enforces exact partition count matching

    • Only segments with matching partition counts are accepted
    • Segments with different partition counts are marked invalid

Test Coverage

Added comprehensive test suite with 3 new test methods:

  1. testPartitionIdRemappingLogic() - Validates core remapping functionality
  2. testPartitionIdRemappingInvalidCases() - Tests invalid remapping scenarios
  3. testPartitionIdRemappingDisabled() - Tests strict matching when feature is disabled

Use Case

This feature addresses the common scenario where users need to:

  • Scale Kafka partitions (e.g., from 4 to 8 partitions)
  • Maintain existing Pinot partition structure (4 partitions)
  • Remap higher partition IDs using modulo operation for consistent data distribution

@xiangfu0 xiangfu0 force-pushed the loose-segment-partition-count-for-remapping branch from ef371ae to ac032b4 Compare September 8, 2025 21:05
@xiangfu0 xiangfu0 changed the title Add comprehensive tests for partition ID remapping functionality Add tests for allowPartitionRemapping feature in SegmentPartitionMetadataManager Sep 8, 2025
@xiangfu0 xiangfu0 changed the title Add tests for allowPartitionRemapping feature in SegmentPartitionMetadataManager Implement allowPartitionRemapping feature for segment partition metadata management Sep 8, 2025
@xiangfu0 xiangfu0 requested review from Copilot September 8, 2025 21:08
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements the allowPartitionRemapping feature for segment partition metadata management in Apache Pinot. The feature enables mapping segments with higher partition counts to tables with lower partition counts using modulo arithmetic, facilitating scenarios where Kafka partitions need scaling while maintaining fewer logical partitions in Pinot.

Key changes:

  • Added allowPartitionRemapping field to ColumnPartitionConfig class
  • Modified SegmentPartitionMetadataManager to support partition ID remapping logic
  • Added comprehensive test coverage with 3 new test methods validating remapping functionality

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
ColumnPartitionConfig.java Added allowPartitionRemapping boolean field and corresponding getter method
SegmentPartitionMetadataManager.java Added remapping logic with modulo-based partition assignment and validation
BrokerRoutingManager.java Updated constructor call to pass the new allowPartitionRemapping parameter
SegmentPartitionMetadataManagerTest.java Added 3 comprehensive test methods covering valid remapping, invalid cases, and disabled remapping scenarios

@xiangfu0 xiangfu0 added feature kafka query Configuration Config changes (addition/deletion/change in behavior) testing labels Sep 8, 2025
@xiangfu0 xiangfu0 force-pushed the loose-segment-partition-count-for-remapping branch from ac032b4 to e86abb4 Compare September 8, 2025 21:15
Copilot

This comment was marked as outdated.

@xiangfu0 xiangfu0 force-pushed the loose-segment-partition-count-for-remapping branch from e86abb4 to bb72456 Compare September 8, 2025 21:26
@xiangfu0 xiangfu0 requested a review from Copilot September 8, 2025 21:27
Copilot

This comment was marked as outdated.

… allowPartitionRemapping flag

- Add testPartitionIdRemappingLogic() to test modulo-based remapping when allowPartitionRemapping=true
  * Tests 8 Kafka partitions → 4 Pinot partitions remapping using _allowPartitionRemapping flag
  * Validates segments with IDs 0,4 map to partition 0; 1,5 map to partition 1; etc.
  * Verifies fully replicated server tracking with remapped partitions

- Add testPartitionIdRemappingInvalidCases() to test invalid remapping scenarios
  * Tests non-divisible partition counts (8 partitions → 3 partitions) with _allowPartitionRemapping=true
  * Validates segments are correctly marked as invalid when 8 % 3 ≠ 0

- Add testPartitionIdRemappingDisabled() to test behavior when _allowPartitionRemapping=false
  * Tests that segments with mismatched partition counts are marked invalid when flag is disabled
  * Validates only segments with exact partition count matches are accepted
  * Ensures _allowPartitionRemapping=false enforces strict partition count matching

These tests comprehensively validate the _allowPartitionRemapping flag behavior in SegmentPartitionMetadataManager,
covering both enabled (with modulo remapping) and disabled (strict matching) scenarios.

All tests pass and maintain compatibility with existing functionality.
@xiangfu0 xiangfu0 force-pushed the loose-segment-partition-count-for-remapping branch from bb72456 to 885bcd9 Compare September 8, 2025 21:38
@xiangfu0 xiangfu0 requested a review from Copilot September 8, 2025 21:41
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements the allowPartitionRemapping feature for segment partition metadata management in Apache Pinot. The feature enables remapping of higher Kafka partition IDs to lower Pinot partition numbers using modulo operation, which is useful when scaling Kafka partitions while maintaining fewer logical partitions in Pinot.

Key Changes:

  • Added allowPartitionRemapping configuration field to enable/disable partition remapping
  • Implemented modulo-based partition ID remapping logic (e.g., 8 Kafka partitions → 4 Pinot partitions)
  • Enhanced segment partition validation to support both strict matching and flexible remapping modes

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
ColumnPartitionConfig.java Added allowPartitionRemapping field and constructor overloads to support the new configuration
SegmentPartitionMetadataManager.java Implemented core partition remapping logic with modulo operation and validation
BrokerRoutingManager.java Integrated allowPartitionRemapping flag from table configuration into manager instantiation
SegmentPartitionMetadataManagerTest.java Added comprehensive test coverage for remapping functionality and edge cases

- Add @JsonInclude(JsonInclude.Include.NON_DEFAULT) to isAllowPartitionRemapping()
- This prevents serializing allowPartitionRemapping when it's false (default value)
- Fixes SegmentPartitionTest.testSegmentPartitionConfig which expected clean JSON without default fields
- Maintains backward compatibility for JSON serialization/deserialization

Fixes test failure: expected clean JSON without allowPartitionRemapping field when value is false
@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 69.56522% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.46%. Comparing base (af3dc6a) to head (ada720f).

Files with missing lines Patch % Lines
...mentpartition/SegmentPartitionMetadataManager.java 73.33% 2 Missing and 2 partials ⚠️
...che/pinot/broker/routing/BrokerRoutingManager.java 0.00% 2 Missing ⚠️
.../pinot/spi/config/table/ColumnPartitionConfig.java 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16776      +/-   ##
============================================
+ Coverage     63.42%   63.46%   +0.03%     
+ Complexity     1400     1399       -1     
============================================
  Files          3054     3054              
  Lines        178766   178779      +13     
  Branches      27399    27403       +4     
============================================
+ Hits         113378   113456      +78     
+ Misses        56656    56607      -49     
+ Partials       8732     8716      -16     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.40% <69.56%> (+<0.01%) ⬆️
java-21 63.43% <69.56%> (+0.04%) ⬆️
temurin 63.46% <69.56%> (+0.03%) ⬆️
unittests 63.45% <69.56%> (+0.03%) ⬆️
unittests1 56.55% <83.33%> (+0.06%) ⬆️
unittests2 33.40% <69.56%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) feature kafka query testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants