Conversation

Member

@yaauie yaauie commented Sep 4, 2025

Release notes

Adds support for event compression in the persisted queue, controlled by the per-pipeline queue.compression setting, which defaults to none.

What does this PR do?

Adds non-breaking support for event compression to the persisted queue, as
configured by a new per-pipeline setting queue.compression, which supports:

  • none (default): no compression is performed, but if compressed events are encountered in the queue they will be decompressed
  • speed: compression optimized for speed (minimal overhead, but less compression)
  • balanced: compression balancing speed against result size
  • size: compression optimized for maximum reduction of size (minimal size, but more resource-intensive)
  • disabled: compression support entirely disabled; if a pipeline is run in this configuration against a PQ that already contains unacked compressed events, the pipeline WILL crash.

This PR performs the necessary refactors as no-op, stand-alone commits to make reviewing more straightforward. It is best reviewed in commit order.

Why is it important/What is the impact to the user?

Disk IO is often a performance bottleneck when using the PQ. This feature allows users to spend available CPU to reduce the size of events on disk, and therefore also the disk IO.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  • Add an ndjson file named example-input.ndjson with one JSON event per line
  • Run Logstash with trace-logging enabled, using -S to set queue.type=persisted, queue.drain=true, and queue.compression=size:
    bin/logstash --log.level=trace \
    -Squeue.type=persisted \
    -Squeue.drain=true \
    -Squeue.compression=size \
    --config.string 'input { stdin { codec => json_lines } } output { sink {} }' < example-input.ndjson
    
  • Observe trace logs showing compression and decompression:
    [2025-09-04T22:36:07,055][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 9000->3645
    [2025-09-04T22:36:07,056][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 9448->3838
    [2025-09-04T22:36:07,057][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 7160->2666
    [2025-09-04T22:36:07,059][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 8642->3572
    [2025-09-04T22:36:07,060][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 8714->3739
    [2025-09-04T22:36:07,061][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 8048->3500
    [2025-09-04T22:36:07,063][TRACE][org.logstash.ackedqueue.Queue][main][454a7f73ae57dfec89e89329c9e0eba182f7780fd885c7bf2f17d8afba2bba67] serialized: 10007->3871
    

    ...

    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3594->8060
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3571->8622
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3673->9051
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3538->8343
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3729->8943
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 2683->7460
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3928->10059
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 3563->8329
    [2025-09-04T22:36:09,428][TRACE][org.logstash.ackedqueue.Queue][main] deserialized: 2708->7285
    
  • Inspect the page(s) left behind with lsq-pagedump:
    2099    3851    0EB311A1        page.0  ZSTD(9552)
    2100    3961    D497496F        page.0  ZSTD(10416)
    2101    2667    59F1903D        page.0  ZSTD(6978)
    2102    3677    B442D62D        page.0  ZSTD(9006)
    2103    3748    3EEF1737        page.0  ZSTD(8791)
    2104    2746    DE697FE9        page.0  ZSTD(7903)
    

Related issues

Use cases

  • Constrained or metered disk IO
  • Limited disk capacity

@yaauie yaauie added the enhancement, persistent queues, and backport-skip (Skip automated backport with mergify) labels on Sep 4, 2025
Contributor

github-actions bot commented Sep 4, 2025

🤖 GitHub comments

Just comment with:

  • run docs-build: Re-trigger the docs validation (use unformatted text in the comment!)

Contributor

github-actions bot commented Sep 4, 2025

🔍 Preview links for changed docs

Member

@jsvd jsvd left a comment


First pass, minor annotations; going to test this manually now.

settings.getQueueMaxBytes(), settings.getMaxUnread(), settings.getCheckpointMaxAcks(),
settings.getCheckpointMaxWrites(), settings.getCheckpointRetry()
);
return new BuilderImpl(settings);
Member


Just a suggestion, this Builder refactoring could have been a separate PR as it doesn't require the compression settings at all and is still a significant part of the changeset in this PR.

Member


Another reason to introduce this change ASAP: yet another parameter is coming in https://github.com/elastic/logstash/pull/18000/files

Member Author


pared off as #18180

@jsvd
Member

jsvd commented Sep 5, 2025

Looking at the profiling, it seems that with Zstd.compress/decompress the instance spends nearly 9% of its time doing context initializations:

[screenshot: profiler results, 2025-09-05 18:49]

profile: profile.html

Something worth investigating is the use of thread locals for the contexts.
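
A minimal sketch of that idea, assuming the PQ compresses each serialized event via zstd-jni (the class and method names below are hypothetical, not this PR's actual code): each worker thread lazily creates and reuses one long-lived ZstdCompressCtx rather than paying the context-initialization cost on every compress call.

    import com.github.luben.zstd.ZstdCompressCtx;

    // Hypothetical sketch: reuse one zstd compression context per thread.
    final class ThreadLocalZstdCompressor {
        private final ThreadLocal<ZstdCompressCtx> contexts;

        ThreadLocalZstdCompressor(final int level) {
            // Each thread creates its own context on first use and keeps it.
            this.contexts = ThreadLocal.withInitial(() -> {
                final ZstdCompressCtx ctx = new ZstdCompressCtx();
                ctx.setLevel(level);
                return ctx;
            });
        }

        byte[] compress(final byte[] serializedEvent) {
            return contexts.get().compress(serializedEvent);
        }
    }

One caveat: ZstdCompressCtx wraps native memory, so a real implementation would also need to close the per-thread contexts when the pipeline shuts down.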

yaauie and others added 7 commits September 22, 2025 17:26
Adds non-breaking support for event compression to the persisted queue, as
configured by a new per-pipeline setting `queue.compression`, which supports:

 - `none` (default): no compression is performed, but if compressed events
                     are encountered in the queue they will be decompressed
 - `speed`: compression optimized for speed
 - `balanced`: compression balancing speed against result size
 - `size`: compression optimized for maximum reduction of size
 - `disabled`: compression support entirely disabled; if a pipeline is run
               in this configuration against a PQ that already contains
               unacked compressed events, the pipeline WILL crash.

To accomplish this, we then provide an abstract base implementation of the
CompressionCodec whose decode method is capable of _detecting_ and decoding
zstd-encoded payload while letting other payloads through unmodified.
The detection is done with an operation on the first four bytes of the
payload, so no additional context is needed.

An instance of this zstd-aware compression codec is provided with a
pass-through encode operation when configured with `queue.compression: none`,
which is the default, ensuring that by default logstash is able to decode any
event that had previously been written.

We provide an additional implementation that is capable of _encoding_ events
with a configurable goal: speed, size, or a balance of the two.
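
To make the detection described above concrete, here is an illustrative sketch of such a decode path, assuming zstd-jni; the class name and structure are invented for illustration and are not the PR's actual CompressionCodec. Every zstd frame begins with the magic number 0xFD2FB528 stored little-endian, so inspecting the first four bytes is enough to decide whether to decompress or pass the payload through unmodified.

    import com.github.luben.zstd.Zstd;

    // Illustrative only: detect a zstd frame by its magic number and let any
    // other payload through untouched.
    final class ZstdAwareDecoder {

        // First four bytes of every zstd frame: 0x28 0xB5 0x2F 0xFD
        // (the magic number 0xFD2FB528 in little-endian byte order).
        private static boolean isZstdFrame(final byte[] payload) {
            return payload.length >= 4
                && payload[0] == (byte) 0x28
                && payload[1] == (byte) 0xB5
                && payload[2] == (byte) 0x2F
                && payload[3] == (byte) 0xFD;
        }

        byte[] decode(final byte[] payload) {
            if (!isZstdFrame(payload)) {
                return payload; // an event serialized without compression
            }
            // For event-sized payloads the frame header records the content size.
            final int originalSize = (int) Zstd.decompressedSize(payload);
            return Zstd.decompress(payload, originalSize);
        }
    }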

@elasticmachine
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @yaauie

Member

@jsvd jsvd left a comment


Two minor doc changes, otherwise LGTM!


`queue.compression` {applies_to}`stack: ga 9.2`
: Sets the event compression level for use with the Persisted Queue. Default is `none`. Possible values are:
* `speed`: optimize for fastest compression operation
Member


`none` should be part of this list.

Member Author


Suggested change
* `speed`: optimize for fastest compression operation
* `none`: does not perform compression, but reads compressed events
* `speed`: optimize for fastest compression operation

| `queue.checkpoint.acks` | The maximum number of ACKed events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.acks: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.writes` | The maximum number of written events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.writes: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.retry` | When enabled, Logstash will retry four times per attempted checkpoint write for any checkpoint writes that fail. Any subsequent errors are not retried. This is a workaround for failed checkpoint writes that have been seen only on Windows platform, filesystems with non-standard behavior such as SANs and is not recommended except in those specific circumstances. (`queue.type: persisted`) | `true` |
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
Member

@jsvd jsvd Sep 25, 2025


Suggested change
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `none`, `speed`, `balanced`, and `size`. | `none` |

Member

@robbavey robbavey left a comment


Couple of doc comments

| `queue.checkpoint.acks` | The maximum number of ACKed events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.acks: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.writes` | The maximum number of written events before forcing a checkpoint when persistent queues are enabled (`queue.type: persisted`). Specify `queue.checkpoint.writes: 0` to set this value to unlimited. | 1024 |
| `queue.checkpoint.retry` | When enabled, Logstash will retry four times per attempted checkpoint write for any checkpoint writes that fail. Any subsequent errors are not retried. This is a workaround for failed checkpoint writes that have been seen only on Windows platform, filesystems with non-standard behavior such as SANs and is not recommended except in those specific circumstances. (`queue.type: persisted`) | `true` |
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
Member


Let's add applies_to tags

Member Author


Suggested change
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
| `queue.compression` | {applies_to}`stack: ga 9.2` Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, `size`, and `none`. | `none` |

Member


Looks like applies_to tags should go at the end of the first column

Suggested change
| `queue.compression` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |
| `queue.compression` {applies_to}`stack: ga 9.2` | Set a persisted queue compression level, which allows the pipeline to reduce the event size on disk at the cost of CPU usage. Possible values are `speed`, `balanced`, and `size`. | `none` |

* `size`: optimize for smallest possible size on disk, spending more CPU
* `balanced`: a balance between the `speed` and `size` settings
:::{important}
Enabling compression will make the PQ incompatible with previous Logstash releases that did not support compression.
Member


Do we need to provide additional guidance on the nature of the incompatibility? For example:

  • That this is a read-only incompatibility - versions of Logstash without compression support are unable to read from PQs written with compression.
  • However, when upgrading from a version of Logstash that already has a PQ, it is possible to enable compression support for that PQ going forward, and versions of Logstash with compression support will be able to read and write to the PQ.

Member Author


@robbavey how is this?

Suggested change
Enabling compression will make the PQ incompatible with previous Logstash releases that did not support compression.
Compression can be enabled for an existing PQ, but once compressed elements have been added to a PQ, that PQ cannot be read by previous Logstash releases that did not support compression.
If you need to downgrade Logstash after enabling compression, you will need to either delete the PQ or run the pipeline with `queue.drain: true` first to ensure that no compressed elements remain.

Member


👍
