
Commit 5237d93

Max Kupriianov authored and facebook-github-bot committed
fix TDigest::merge using oversized vectors of centroids
Summary: When joining multiple `TDigest` instances into one using `TDigest::merge(Range<const TDigest*>)`, the `centroids` vector periodically ends up with more than `maxSize` elements. This happens because the `k_limit` increment is missing when `q_limit_times_count` is initialised; this diff adds it.

It is hard to say whether this bug affects the precision of the highest percentiles, but it does cause some performance degradation: unplanned repeated merging, additional vector resizes, and so on. It can also lead to more serious bugs, since callers do not expect the centroids vector to grow beyond the specified size.

Reviewed By: yfeldblum

Differential Revision: D71970273

fbshipit-source-id: 2fea55846426a9986d3d0854b387d321b6e31ed1
1 parent 3335307 commit 5237d93

File tree

1 file changed: +5 -3 lines changed

folly/stats/TDigest.cpp

Lines changed: 5 additions & 3 deletions
@@ -189,7 +189,7 @@ void TDigest::internalMerge(
     double nextSum = next.mean() * next.weight();
     weightSoFar += next.weight();
 
-    if (weightSoFar <= q_limit_times_count) {
+    if (weightSoFar <= q_limit_times_count || k_limit > maxSize_) {
       sumsToMerge += nextSum;
       weightsToMerge += next.weight();
     } else {
@@ -212,6 +212,7 @@ void TDigest::internalMerge(
   dst.max_ = newMax;
   dst.min_ = newMin;
 
+  DCHECK_LE(workingBuffer.size(), maxSize_);
   std::swap(dst.centroids_, workingBuffer);
 }
 
@@ -317,15 +318,15 @@ TDigest TDigest::merge(Range<const TDigest*> digests) {
   compressed.reserve(maxSize);
 
   double k_limit = 1;
-  double q_limit_times_count = k_to_q(k_limit, maxSize) * count;
+  double q_limit_times_count = k_to_q(k_limit++, maxSize) * count;
 
   Centroid cur = centroids.front();
   double weightSoFar = cur.weight();
   double sumsToMerge = 0;
   double weightsToMerge = 0;
   for (auto it = centroids.begin() + 1, e = centroids.end(); it != e; ++it) {
     weightSoFar += it->weight();
-    if (weightSoFar <= q_limit_times_count) {
+    if (weightSoFar <= q_limit_times_count || k_limit > maxSize) {
       sumsToMerge += it->mean() * it->weight();
       weightsToMerge += it->weight();
     } else {
@@ -339,6 +340,7 @@ TDigest TDigest::merge(Range<const TDigest*> digests) {
   }
   result.sum_ += cur.add(sumsToMerge, weightsToMerge);
   compressed.push_back(cur);
+  DCHECK_LE(compressed.size(), maxSize);
   compressed.shrink_to_fit();
 
   // Deal with floating point precision

0 commit comments