Compute all groups for group by queries with only filtered aggregations #14211

yashmayya · 2024-10-11T12:53:36Z

Fixes Aggregation filter is being pushed down in a way it breaks SQL semantics #12647 and Query having just "one aggregation with filter" produces no result if aggregation' filter does not match any record #13982.
Both the above issues have a good description of the problem but the TLDR is that for group by queries with only filtered aggregations, we currently only compute the groups based on the rows that match the query's main filter and the rows that match at least one of the aggregation filters.
This is good for performance, since in cases where the group by query doesn't have a main filter and there are only filtered aggregations (i.e., no non-filtered aggregations), indexes can be used and a full scan is avoided. However, this leads to results that are not SQL compliant, since all groups (computed over the results of the main query filter) are expected to be returned even for queries with only filtered aggregations.
For the single-stage query engine, this actually appears to be a regression from Fixes filter aggs perf #10356, prior to which all groups should have been returned by default.
For the multi-stage query engine's group by executor, this appears to always have been an issue.
This patch takes a different approach for both the query engines - in the single-stage engine, the default behavior remains the same, but the standard SQL behavior of returning all groups can be enabled using a new query option. In the multi-stage engine, however, all groups will be computed by default. The reasoning is that Pinot has historically chosen performance over standard SQL behavior for the v1 engine in such cases where the behavioral difference might not be too significant. For the multi-stage query engine, however, we have been moving towards standard SQL behavior regardless of potential performance impact.
This patch fixes both the query engines to follow standard SQL behavior. However, a query option is also provided to override this behavior for performance reasons - i.e., if users are okay with skipping computation of empty groups (groups not matching any aggregation filter), then this query option can be set. This can provide a large performance boost if the main query filter has high selectivity.
For the v1 engine's executor, the fix is to simply put an additional AggregationInfo with an empty AggregationFunction array and the main query filter. This ensures that the GroupByExecutor will compute all the groups (from the result of applying the main query filter) but no additional unnecessary aggregation will be done.
For the v2 engine's executor, the fix is similar. Before this patch, all the group keys are only computed if there is at least one non-filtered aggregation. For group by queries with only filtered aggregations, groups were computed over the rows matching the aggregation filters. Now, all the groups will be computed even for group by queries with only filtered aggregations.

yashmayya · 2024-10-11T13:01:31Z

The PR description explains the rationale behind choosing different default behavior for the v1 and v2 query engines here. However, this currently means that there's no way to disable this behavior for the multi-stage query engine to improve query performance (and get non-standard behavior as a tradeoff). If we want to instead keep the behavior of the two query engines consistent, we have two options:

By default, always compute all groups for group by queries with only filtered aggregations. Add an option / performance hint to disable this behavior and improve query performance at the cost of non-standard behavior. The disadvantage of this option is that performance will suffer by default.
Keep the older behavior by default. Add an option to enable this standard SQL behavior of returning all groups at the cost of performance. The disadvantage of this is that the multi-stage query engine behavior in this case won't be SQL compliant by default (which is what we've been evolving it towards).

codecov-commenter · 2024-10-11T13:31:01Z

Codecov Report

Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.81%. Comparing base (59551e4) to head (eb934a6).
Report is 1198 commits behind head on master.

Files with missing lines	Patch %	Lines
...t/core/operator/query/FilteredGroupByOperator.java	50.00%	1 Missing and 1 partial ⚠️
...ry/runtime/operator/MultistageGroupByExecutor.java	33.33%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14211      +/-   ##
============================================
+ Coverage     61.75%   63.81%   +2.05%     
- Complexity      207     1536    +1329     
============================================
  Files          2436     2623     +187     
  Lines        133233   144424   +11191     
  Branches      20636    22099    +1463     
============================================
+ Hits          82274    92157    +9883     
- Misses        44911    45462     +551     
- Partials       6048     6805     +757

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.76% <60.00%> (+2.05%)`	⬆️
java-21	`63.69% <60.00%> (+2.06%)`	⬆️
skip-bytebuffers-false	`63.80% <60.00%> (+2.05%)`	⬆️
skip-bytebuffers-true	`63.66% <60.00%> (+35.93%)`	⬆️
temurin	`63.81% <60.00%> (+2.05%)`	⬆️
unittests	`63.80% <60.00%> (+2.05%)`	⬆️
unittests1	`55.54% <60.00%> (+8.65%)`	⬆️
unittests2	`34.31% <0.00%> (+6.58%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

gortiz · 2024-10-11T13:41:33Z

...runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultistageGroupByExecutor.java

+      if (intKeys == null) {
+        // _groupIdGenerator should still have all the groups even if there are only filtered aggregates for SQL
+        // compliant results.
+        generateGroupByKeys(block);
+      }


Wouldn't be semantically the same to always initialize intKeys to generateGroupByKeys(block); in line 209?

Yup, this is actually left over from my earlier choice of making it optional for the multi-stage engine too. I can move it back if we decide we're okay with keeping this the default behavior for the multi-stage engine.

gortiz · 2024-10-11T15:44:21Z

...-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java

+    // Write the query in a way where the aggregation will not be pushed to the leaf stage, so that we can test the
+    // MultistageGroupByExecutor


nit: I think you can use the is_skip_leaf_stage_group_by hint to do so

Oh nice, I wasn't aware of the existence of this hint.

gortiz · 2024-10-11T15:46:01Z

...on-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java

@@ -3729,4 +3729,39 @@ public void testSkipIndexes(boolean useMultiStageQueryEngine)
    updateTableConfig(tableConfig);
    reloadAllSegments(TEST_UPDATED_RANGE_INDEX_QUERY, true, numTotalDocs);
  }
+
+  @Test
+  public void testFilteredAggregationWithNoValueMatchingAggregationFilter()


nit: can you create one test having the query option and another without it?

gortiz

Personally I think the current solution is fine, but in a future PR it would be cool to support filteredAggregationsComputeAllGroups query option in multi-stage as well.

yashmayya

but in a future PR it would be cool to support filteredAggregationsComputeAllGroups query option in multi-stage as well.

Do you mean keep this new default SQL compliant behavior but use the same query option to allow users to choose the other behavior (i.e., via SET filteredAggregationsComputeAllGroups=false)? That should be straightforward and I can do it in this PR too if we have consensus. I was just a little worried about introducing new behavioral differences between the two query engines and any additional confusion about using a query option to get the opposite behavior in the engines. But I guess we might be fine with the right documentation.

yashmayya · 2024-10-11T17:51:50Z

...runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultistageGroupByExecutor.java

+      if (intKeys == null) {
+        // _groupIdGenerator should still have all the groups even if there are only filtered aggregates for SQL
+        // compliant results.
+        generateGroupByKeys(block);
+      }


Yup, this is actually left over from my earlier choice of making it optional for the multi-stage engine too. I can move it back if we decide we're okay with keeping this the default behavior for the multi-stage engine.

yashmayya · 2024-10-11T17:52:33Z

...-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java

+    // Write the query in a way where the aggregation will not be pushed to the leaf stage, so that we can test the
+    // MultistageGroupByExecutor


Oh nice, I wasn't aware of the existence of this hint.

Jackie-Jiang

Wow, this is much easier than expected!

Jackie-Jiang · 2024-10-11T18:30:06Z

...src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionUtils.java

@@ -384,7 +385,12 @@ public static List<AggregationInfo> buildFilteredAggregationInfos(SegmentContext
      }
    }

-    if (!nonFilteredFunctions.isEmpty()) {
+    if (!nonFilteredFunctions.isEmpty() || QueryOptionsUtils.isFilteredAggregationsComputeAllGroups(


Wow, didn't expect this to be so simple! NIce!

Jackie-Jiang · 2024-10-11T18:38:57Z

...runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultistageGroupByExecutor.java

@@ -241,6 +241,11 @@ private void processAggregate(TransferableBlock block) {
          aggFunction.aggregateGroupBySV(numMatchedRows, filteredIntKeys, groupByResultHolder, blockValSetMap);
        }
      }
+      if (intKeys == null) {


Let's make this also controlled by query option to be consistent with v1

Yep, makes sense, that's also what Gonzalo suggested.

Jackie-Jiang · 2024-10-11T18:41:50Z

pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java

+        // are expected to be returned). If this option is set to true, the v1 query engine will compute all groups for
+        // group by queries with only filtered aggregations. This could require a full scan if the main query does not
+        // have a filter and performance could be much worse, but the result will be SQL compliant.
+        public static final String FILTERED_AGGREGATIONS_COMPUTE_ALL_GROUPS = "filteredAggregationsComputeAllGroups";


Since the overhead seems fine (no extra aggregate needed), I'd suggest doing the standard SQL behavior by default, and add an option to skip the all groups computation

Since the overhead seems fine (no extra aggregate needed), I'd suggest doing the standard SQL behavior by default, and add an option to skip the all groups computation

I think it makes sense to have standard SQL behavior by default and have an option to use the more performant method of only computing required groups. But won't the overhead potentially be quite high for a query like SELECT SUM(X) FILTER (WHERE Y = 1) FROM mytable and there's an inverted index on the filter column Y? If we want to compute all groups, it'll involve a scan over all docs whereas with the previous default behavior, no scan would be required and we'd only need to do the aggregation over the docs matching the filter computed using the inverted index. There won't be any additional aggregation being done but the scan would still be expensive right?

That is true if the selectivity of the filter is very high. I'm okay with the current approach to ensure no performance regression. We can have a separate single option in the future to turn everything into standard SQL

Yes, I think we can have a startup config option to choose and the query option to change the default. It is true that having different behaviors by default on each query engine is strange and by having a startup config option each deployment can customize what do they want (current performance or sql compliance)

I've pushed a change to allow overriding the multi-stage engine behavior in order to prefer performance over SQL compliant results using the same query option filteredAggregationsComputeAllGroups.

Yes, I think we can have a startup config option to choose and the query option to change the default.

I think we can add such a config as a follow up if there's demand for such behavior?

Actually, thinking about this some more, I really don't like introducing additional behavioral differences between the two query engines. Also, using the same query option but with effectively opposite default values in the query engines is fairly unintuitive. I think it makes more sense to do the right thing by default, and allow users to override the behavior for performance reasons using a query option (which will have the same behavior in both query engines).

… in order to prefer performance over SQL compliant results using same query option

… allow overriding behavior for performance

Jackie-Jiang

LGTM! Please add the new query option to the documentation

Compute all groups for group by queries with only filtered aggregations

1b4c8fd

yashmayya added bugfix multi-stage Related to the multi-stage query engine SQL Compliance query labels Oct 11, 2024

yashmayya requested a review from Jackie-Jiang October 11, 2024 12:55

yashmayya marked this pull request as ready for review October 11, 2024 13:03

yashmayya requested a review from gortiz October 11, 2024 13:13

gortiz reviewed Oct 11, 2024

View reviewed changes

gortiz approved these changes Oct 11, 2024

View reviewed changes

yashmayya commented Oct 11, 2024

View reviewed changes

Jackie-Jiang reviewed Oct 11, 2024

View reviewed changes

Address review comments; allow overriding multi-stage engine behavior…

54a9b92

… in order to prefer performance over SQL compliant results using same query option

yashmayya force-pushed the filtered-aggregation-compute-all-groups branch 2 times, most recently from fc0edd3 to 68124da Compare October 12, 2024 05:02

Unify behavior of both query engines to follow standard SQL behavior;…

eb934a6

… allow overriding behavior for performance

yashmayya force-pushed the filtered-aggregation-compute-all-groups branch from 68124da to eb934a6 Compare October 12, 2024 05:57

yashmayya requested review from Jackie-Jiang and gortiz October 16, 2024 02:28

Jackie-Jiang added the documentation label Oct 17, 2024

Jackie-Jiang approved these changes Oct 17, 2024

View reviewed changes

Jackie-Jiang merged commit 76b219b into apache:master Oct 17, 2024
21 checks passed

gortiz mentioned this pull request Oct 18, 2024

Aggregation filter is being pushed down in a way it breaks SQL semantics #12647

Closed

yashmayya mentioned this pull request Oct 18, 2024

Query having just "one aggregation with filter" produces no result if aggregation' filter does not match any record #13982

Closed

Jackie-Jiang mentioned this pull request Mar 5, 2025

ArrayIndexOutOfBoundsException error for multi-stage query #15121

Closed

		// Write the query in a way where the aggregation will not be pushed to the leaf stage, so that we can test the
		// MultistageGroupByExecutor

Compute all groups for group by queries with only filtered aggregations #14211

Compute all groups for group by queries with only filtered aggregations #14211

Conversation

yashmayya commented Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yashmayya commented Oct 11, 2024

Uh oh!

codecov-commenter commented Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gortiz Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gortiz Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gortiz left a comment

Choose a reason for hiding this comment

Uh oh!

yashmayya left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yashmayya Oct 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

yashmayya commented Oct 11, 2024 •

edited

Loading

codecov-commenter commented Oct 11, 2024 •

edited

Loading

gortiz Oct 11, 2024 •

edited

Loading

gortiz Oct 11, 2024 •

edited

Loading

yashmayya Oct 12, 2024 •

edited

Loading