Cosmetic filter fb #514

atuchin-m · 2025-08-25T13:36:09Z

The PR introduces new structures to store cosmetic filters in flatbuffer.

the algorithms to sort and apply rules shouldn't be touched, only storage level is changed.
CosmeticFilterCache is now a view for a flatbuffer data.
Old storage layer (via serde) is removed, the version is now stored in the flatbuffer,
Another container 'FlatMap' is introduced + tests
Most host-specific rules are stored in a single FlatMap (domain_hash => HostnameSpecificRules). Although, the most common rule kinds are stored as a dedicated multi-maps to save memory
a code to build flatbuffer structure is moved to dedicated files.

Perf impact:

improve memory usage by 43%
improves loading .dat file 3x times
a non-important rule-match-first-request/brave-list regression, the absolute number ~1ms is still totally fine for the first request.
a potentially bad regression with cosmetic-class-id-match/brave-list.A new code uses O(log N) binany search instead of O(1) HashSomething lookup. For large number of selectors (~1000) that can make sense. We're going to improve this in a dedicated PR.

github-actions

Rust Benchmark

Benchmark suite	Current: `abd58d9`	Previous: `3652b04`	Ratio
`rule-match-browserlike/brave-list`	`2208259493` ns/iter (`± 12738594`)	`2235196924` ns/iter (`± 17061743`)	`0.99`
`rule-match-first-request/brave-list`	`1112324` ns/iter (`± 15878`)	`1043546` ns/iter (`± 11475`)	`1.07`
`blocker_new/brave-list`	`142944318` ns/iter (`± 469872`)	`157224302` ns/iter (`± 336287`)	`0.91`
`blocker_new/brave-list-deserialize`	`20131556` ns/iter (`± 66881`)	`63494627` ns/iter (`± 260374`)	`0.32`
`memory-usage/brave-list-initial`	`10072443` ns/iter (`± 3`)	`18344519` ns/iter (`± 3`)	`0.55`
`memory-usage/brave-list-initial/max`	`64817658` ns/iter (`± 3`)	`66961309` ns/iter (`± 3`)	`0.97`
`memory-usage/brave-list-initial/alloc-count`	`1534721` ns/iter (`± 3`)	`1616130` ns/iter (`± 3`)	`0.95`
`memory-usage/brave-list-1000-requests`	`2516487` ns/iter (`± 3`)	`2551938` ns/iter (`± 3`)	`0.99`
`memory-usage/brave-list-1000-requests/alloc-count`	`66641` ns/iter (`± 3`)	`68864` ns/iter (`± 3`)	`0.97`
`url_cosmetic_resources/brave-list`	`199372` ns/iter (`± 1291`)	`207270` ns/iter (`± 3089`)	`0.96`
`cosmetic-class-id-match/brave-list`	`15273502` ns/iter (`± 4344129`)	`4335864` ns/iter (`± 1196471`)	`3.52`

This comment was automatically generated by workflow using github-action-benchmark.

github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.

Benchmark suite	Current: `abd58d9`	Previous: `3652b04`	Ratio
`cosmetic-class-id-match/brave-list`	`15273502` ns/iter (`± 4344129`)	`4335864` ns/iter (`± 1196471`)	`3.52`

This comment was automatically generated by workflow using github-action-benchmark.

boocmp · 2025-08-26T07:21:45Z

src/cosmetic_filter_cache.rs

@@ -191,34 +166,40 @@ impl CosmeticFilterCache {
    ) -> Vec<String> {
        let mut selectors = vec![];

+        let cs = self.filter_data_context.memory.root().cosmetic_filters();


cf ?, or even cosmetic_filters

antonok-edm

(partial review, will look over the rest soon)

antonok-edm · 2025-08-29T16:19:57Z

src/blocker.rs

-        let filter_data_context = FilterDataContext::new(memory);
-        Self::from_context(filter_data_context)
+        let mut filter_set = FilterSet::new(true);
+        filter_set.network_filters = network_filters;


It's a bit odd to have Blocker::new() build a FilterSet, then manually move network filters into the set, then build an Engine, and finally make the blocker from the engine. Understood this is test only, but perhaps we should consider a different constructor method for the tests.

Yeah, that test ideally these test contractors should be removed in favor of building the engine directly (or another high level test method).
Although, preserving them allow to reduce the diff in the PR (we don't need to rewrite a lot of tests). So I suggest to change it in another PR.

P.S. We can avoid using Engine completely because it's a flatbuffer root table. And we need to get serialized flatbuffer data.

antonok-edm · 2025-08-29T18:17:07Z

src/data_format/mod.rs

-/// Newer formats start with this magic byte sequence.
-/// Calculated as the leading 4 bytes of `echo -n 'brave/adblock-rust' | sha512sum`.
-const ADBLOCK_RUST_DAT_MAGIC: [u8; 4] = [0xd1, 0xd9, 0x3a, 0xaf];
-const ADBLOCK_RUST_DAT_VERSION: u8 = 1;


could we keep the byte sequence and version at the start of the serialized file? Brave iOS is currently using it.

Do we use that magic somewhere outside adblock-rust to identify the file?

Now the while file content is a flatbuffer and we verify the format before file using, so it looks like we don't need a extra magic.

When the iOS client is updated to use adblock-rust v0.11.x, it will read the older file and try to interpret it as a flatbuffer rather than looking at the first bytes and knowing to exit immediately because of a version mismatch.

If we're lucky that file will have a deserialization error, but in the worst case it may parse and be full of junk data. Keeping the separate format header will guarantee future clients are always able to early-exit even if we ever switch to something other than flatbuffers later on.

Verifying a flat buffer is quick and the migration happens once a few months. I don't think it makes sense to save some CPU cycles here.

As for correctness: theoretically it's possible to have a broken file that will be interpreted as valid flatbuffer. The same thing for the old .dat: a partially written file will pass magic header verification.

I would add some kind of CRC sum instead (to be sure that the file is consistent).
If you want to preserve magic number: let's keep it, but as the part of a flatbuffer.

added seahash as a checksum instead.

checksum doesn't address my concern, let me try to explain the consequences a bit more clearly:

There are currently millions of active iOS installations of Brave Browser with a cached file of the format [ADBLOCK_RUST_DAT_MAGIC[0..4], ADBLOCK_RUST_DAT_VERSION, <SomeMessagePackPayload>[0..]]. This data wasn't serialized as a flatbuffer, so it is unlikely to deserialize into a valid flatbuffer according to our new schema here. However, "unlikely" isn't good enough for a file that is read into memory at startup and could cause irrecoverable browser crash loops¹ if internal invariants about the data structures are not upheld properly.

Even worse, we cannot empirically verify parsing incompatibility on a historical snapshot of that cached file because each browser install may have different lists enabled and may have even serialized this file using any previous version of adblock-rust.

If you'd like to go through the effort to read the specs for both flatbuffer and messagepack's internal binary representations, as well as their respective (current and historical) library code to rule out possible implementation quirks, then you may be able to prove that it is impossible for a previous version of adblock-rust to have serialized any Engine into a payload that this new flatbuffer implementation will be able to deserialize without errors. That would make me ~70% comfortable with leaving out such a header check.

Alternatively, we could keep the header and be 100% sure it will be respected. This particular use case is exactly why it was added originally.

¹: there is ongoing (but currently incomplete) work to isolate this in a separate process to at least make it recoverable at runtime.

FWIW the checksum is probably a good idea too in the event of disk corruption, but it's not a complete solution to that migration problem.

I got your concerns and returned the magic header.

I would say the situation you're describing is almost impossible, because the current data file have to pass 3 independent checks:

the flatbuffer format verification(many bytes, >8)

version (4 bytes)

checksum (8 bytes)

But there is nothing wrong with one extra check for sure.

P.S. Chrome uses a just a checksum in about the same situation: https://chromium-review.googlesource.com/c/chromium/src/+/1093152

antonok-edm · 2025-08-29T18:17:46Z

src/engine.rs

@@ -51,16 +57,18 @@ pub struct Engine {
    filter_data_context: FilterDataContextRef,
 }

+const ADBLOCK_FLATBUFFER_VERSION: u32 = 1;


should be incremented

Because it a new format, we can start with 1 here.

src/flatbuffers/fb_network_filter.fbs

src/cosmetic_filter_cache.rs

src/cosmetic_filter_cache_builder.rs

src/data_format/mod.rs

atuchin-m added 6 commits August 18, 2025 19:26

Cosmetic filters in a flatbuffer

3419a9d

use last byte to encode permissions

c2bf91e

unstage flat_map

c9d417c

return flat map builder

50edf82

decompose network filters

16013a4

decompose cosmetic filters

d4dd188

atuchin-m self-assigned this Aug 25, 2025

atuchin-m requested a review from antonok-edm as a code owner August 25, 2025 13:36

github-actions bot reviewed Aug 25, 2025

View reviewed changes

atuchin-m added 6 commits August 25, 2025 21:38

cargo fmt

28f6c9b

Fix cargo clippy

d87eeb3

Fix cargo fmt & some review issues

db99da0

move make_flatbuffer to engine.rs

1b51596

update comments

72e699f

add more tests

35f30ff

atuchin-m force-pushed the cosmetic-filter-fb branch from f5bbf84 to 35f30ff Compare August 26, 2025 19:53

atuchin-m requested a review from boocmp August 26, 2025 20:07

cargo fmt

4ce4fb0

boocmp approved these changes Sep 2, 2025

View reviewed changes

antonok-edm reviewed Sep 2, 2025

View reviewed changes

antonok-edm reviewed Sep 9, 2025

View reviewed changes

src/flatbuffers/fb_network_filter.fbs Outdated Show resolved Hide resolved

src/cosmetic_filter_cache.rs Outdated Show resolved Hide resolved

src/cosmetic_filter_cache_builder.rs Outdated Show resolved Hide resolved

atuchin-m added 6 commits September 11, 2025 12:43

Fix doc format in .fbs

02efd49

add filter_data_context.rs

07888d9

Add hash to serialized data

c08f3f1

update hashes in tests

7fd35dd

move mod for generated code

c60139b

return ADBLOCK_RUST_DAT_MAGIC

93bb457

antonok-edm reviewed Sep 16, 2025

View reviewed changes

src/data_format/mod.rs Outdated Show resolved Hide resolved

Move version to .dat header

26af341

antonok-edm approved these changes Sep 16, 2025

View reviewed changes

src/data_format/mod.rs Outdated Show resolved Hide resolved

set ADBLOCK_FLATBUFFER_VERSION=2

abd58d9

atuchin-m merged commit cdf6eb1 into 0.11.x Sep 16, 2025
6 of 7 checks passed

atuchin-m deleted the cosmetic-filter-fb branch September 16, 2025 22:52

Cosmetic filter fb #514

Cosmetic filter fb #514

Uh oh!

Conversation

atuchin-m commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Rust Benchmark

Uh oh!

github-actions bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

⚠️ Performance Alert ⚠️

Uh oh!

Choose a reason for hiding this comment

Uh oh!

antonok-edm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atuchin-m Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atuchin-m commented Aug 25, 2025 •

edited

Loading

github-actions bot left a comment •

edited

Loading

github-actions bot left a comment •

edited

Loading

atuchin-m Sep 11, 2025 •

edited

Loading