feat: Add specialized ApifyRequestQueue clients #573

Pijukatel · 2025-08-27T14:09:20Z

Description

ApifyRequestQueueClient can be created in one of two access modes, single or shared:
- shared - the current version, which supports multiple producers/consumers and locking of requests. It makes more Apify API calls and has higher API usage, so it is more expensive and slower.
- single - a new constrained client for a single consumer and multiple constrained producers (detailed constraints are in the docs). It makes fewer Apify API calls and has lower API usage, so it is cheaper and faster.
Most of the ApifyRequestQueueClient tests were moved away from actor-based tests, so that they can be parametrized for both variants of the ApifyRequestQueueClients and to make local debugging easier.

Usage:

RequestQueue with shared:
await RequestQueue.open(storage_client=ApifyStorageClient(request_queue_access="shared"))
RequestQueue with default single:
await RequestQueue.open(storage_client=ApifyStorageClient())

Stats difference:

The shared client makes significantly more API calls (2123 vs. 1059 in the example below). In terms of API usage, it performs 50% more RequestQueue writes (3000 vs. 2000) and far more RequestQueue reads.

Example RQ-related stats for a crawler started with 1000 requests:
shared:
API calls: 2123
API usage: {'readCount': 1000, 'writeCount': 3000, 'deleteCount': 0, 'headItemReadCount': 0, 'storageBytes': 104035}

single:
API calls: 1059
API usage: {'readCount': 3, 'writeCount': 2000, 'deleteCount': 0, 'headItemReadCount': 14, 'storageBytes': 103826}
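
For a quick sanity check of the claims above, the ratios can be recomputed from the two example runs. This is only arithmetic on the figures already listed; the `ratio` helper and the dictionary layout are made up for illustration.

```python
# Figures copied from the example run above (crawler started with 1000 requests).
shared = {"api_calls": 2123, "readCount": 1000, "writeCount": 3000, "headItemReadCount": 0}
single = {"api_calls": 1059, "readCount": 3, "writeCount": 2000, "headItemReadCount": 14}


def ratio(key: str) -> float:
    """How many times more of `key` the shared client used than the single client."""
    return shared[key] / single[key]


# Reads of both kinds (request reads + head item reads) that the single client avoided.
extra_reads = (shared["readCount"] + shared["headItemReadCount"]) - (
    single["readCount"] + single["headItemReadCount"]
)

print(f"API calls: {ratio('api_calls'):.2f}x")
print(f"RQ writes: {ratio('writeCount'):.2f}x")
print(f"Extra RQ reads (incl. head reads): {extra_reads}")
```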

Issues

Part of: Add specialized/parametrized RequestQueueClients #513

Migrate most actor based tests to normal force cloud rq tests (for future parametrization of the Apify clients)

vdusek

high-level things

vdusek · 2025-09-19T11:29:31Z

src/apify/storage_clients/_apify/_request_queue_client_simple.py

+    - Only one client is consuming the request queue at the time.
+    - Multiple producers can put requests to the queue, but their forefront requests are not guaranteed to be handled
+      so quickly as this client does not aggressively fetch the forefront and relies on local head estimation.
+    - Requests are only added to the queue, never deleted. (Marking as handled is ok.)


Requests are only added to the queue, never deleted. (Marking as handled is ok.)

? I don't get it.

Well, the API has a delete endpoint. We do not expose it in RQ, but if someone calls it while we use this client to work on that RQ, the behavior will be unpredictable.
https://docs.apify.com/api/v2/request-queue-request-delete

It is not a normal use case, but it is better to be explicit about it.

vdusek · 2025-09-19T11:31:01Z

src/apify/storage_clients/_apify/_request_queue_client_simple.py

+    - Multiple producers can put requests to the queue, but their forefront requests are not guaranteed to be handled
+      so quickly as this client does not aggressively fetch the forefront and relies on local head estimation.
+    - Requests are only added to the queue, never deleted. (Marking as handled is ok.)
+    - Other producers can add new requests, but not modify existing ones (otherwise caching can miss the updates)


Other producers can add new requests, but not modify existing ones (otherwise caching can miss the updates)

Modify existing ones? What do you mean by that?

The API has an update endpoint. If other producers update existing requests after this client has already cached them locally, the client will use the outdated request.
https://docs.apify.com/api/v2/request-queue-request-put

It is not a normal use case, but it is better to be explicit about it.
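
To illustrate the staleness problem described above, here is a minimal toy sketch. `FakeRemoteQueue` and `CachingConsumer` are hypothetical stand-ins invented for this example; they do not mirror the Apify API or the real client's caching logic.

```python
class FakeRemoteQueue:
    """Hypothetical stand-in for the platform-side queue (not the Apify API)."""

    def __init__(self) -> None:
        self._requests: dict[str, dict] = {}

    def put(self, request_id: str, data: dict) -> None:
        # Adds a new request or overwrites (updates) an existing one.
        self._requests[request_id] = dict(data)

    def get(self, request_id: str) -> dict:
        return dict(self._requests[request_id])


class CachingConsumer:
    """Toy consumer that reads each request from the remote queue only once."""

    def __init__(self, remote: FakeRemoteQueue) -> None:
        self._remote = remote
        self._cache: dict[str, dict] = {}

    def get(self, request_id: str) -> dict:
        if request_id not in self._cache:
            self._cache[request_id] = self._remote.get(request_id)
        return self._cache[request_id]


remote = FakeRemoteQueue()
remote.put("r1", {"url": "https://example.com", "method": "GET"})

consumer = CachingConsumer(remote)
first = consumer.get("r1")  # fetched from the remote queue and cached

# Another producer updates the existing request behind the consumer's back.
remote.put("r1", {"url": "https://example.com", "method": "POST"})

stale = consumer.get("r1")  # served from the cache, so the update is missed
print(stale["method"], remote.get("r1")["method"])
```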

docs/04_upgrading/upgrading_to_v3.md

vdusek · 2025-09-19T11:34:26Z

tests/integration/conftest.py

+        # Reset the Actor class state.
+        apify._actor.Actor.__wrapped__.__class__._is_any_instance_initialized = False  # type: ignore[attr-defined]
+        apify._actor.Actor.__wrapped__.__class__._is_rebooting = False  # type: ignore[attr-defined]


Why do we need this now?

We don't strictly need it, but I saw a warning log in some tests and realized that the tests were not well isolated, because _is_any_instance_initialized and _is_rebooting were leaking from previous tests.

This warning could be observed in tests
WARN Repeated Actor initialization detected - this is non-standard usage, proceed with care
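
A toy sketch of the leak, assuming nothing about the real Actor internals: `ToyActor` is a hypothetical class that only mimics a class-level "already initialized" flag, and `reset_toy_actor_state` plays the role of the reset added to the test fixture.

```python
class ToyActor:
    """Hypothetical stand-in that mimics a class-level initialization flag."""

    _is_any_instance_initialized = False

    @classmethod
    def init(cls) -> bool:
        """Initialize and return True when a repeated initialization was detected."""
        repeated = cls._is_any_instance_initialized
        cls._is_any_instance_initialized = True
        return repeated


def reset_toy_actor_state() -> None:
    """Reset the class-level state, as a per-test fixture would."""
    ToyActor._is_any_instance_initialized = False


# Without a reset, the second "test" sees the flag set by the first one.
leaked = [ToyActor.init(), ToyActor.init()]

# With a reset after each "test", every run starts from a clean state.
reset_toy_actor_state()
isolated = []
for _ in range(2):
    isolated.append(ToyActor.init())
    reset_toy_actor_state()

print(leaked, isolated)
```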

src/apify/storage_clients/_apify/_storage_client.py

tests/integration/conftest.py

tests/integration/test_request_queue.py

docs/04_upgrading/upgrading_to_v3.md

janbuchar · 2025-09-23T09:48:18Z

src/apify/storage_clients/_apify/_request_queue_client.py

-
-    _DEFAULT_LOCK_TIME: Final[timedelta] = timedelta(minutes=3)
-    """The default lock time for requests in the queue."""
+    """Base class for Apify platform implementations of the request queue client."""


I'm pretty sure that we don't want to do it like this. Using inheritance to share code makes it very hard to understand the difference between the two child classes - I'm speaking from experience with the JS counterpart which does the same and it's confusing as hell.

I'd suggest either having a single class with several if-else blocks, or if that proves too spaghettific, some approach based on composition (strategy pattern?)

I think this is not the case of hard-to-understand inheritance. These two classes inherit from "intermediate base class" only the identical methods. The reader can then focus on the implementations specific to the specialized class. There is no overriding of the intermediate class, so there should be no confusion.

There is no overriding of the intermediate class, so there should be no confusion.

it isn't there now, who knows, what the future will bring... please consider the alternatives I suggested

Ok, using composition instead
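
For readers following the thread, a rough sketch of what "composition instead of inheritance" can look like here. All class names below are hypothetical toys; the actual implementation in this PR may be structured differently.

```python
from typing import Literal, Optional, Protocol


class FetchStrategy(Protocol):
    """Interface that both access-mode implementations satisfy."""

    def add(self, url: str) -> None: ...

    def fetch_next(self) -> Optional[str]: ...


class SingleStrategy:
    """Toy single-consumer variant: keeps a local head, no locking."""

    def __init__(self) -> None:
        self._pending: list[str] = []

    def add(self, url: str) -> None:
        self._pending.append(url)

    def fetch_next(self) -> Optional[str]:
        return self._pending.pop(0) if self._pending else None


class SharedStrategy:
    """Toy shared variant: records a 'lock' for every request it hands out."""

    def __init__(self) -> None:
        self._pending: list[str] = []
        self.locked: set[str] = set()

    def add(self, url: str) -> None:
        self._pending.append(url)

    def fetch_next(self) -> Optional[str]:
        if not self._pending:
            return None
        url = self._pending.pop(0)
        self.locked.add(url)
        return url


class ToyRequestQueueClient:
    """One public class that delegates mode-specific behavior to a strategy."""

    def __init__(self, access: Literal["single", "shared"] = "single") -> None:
        self._impl: FetchStrategy = SingleStrategy() if access == "single" else SharedStrategy()

    def add(self, url: str) -> None:
        self._impl.add(url)

    def fetch_next(self) -> Optional[str]:
        return self._impl.fetch_next()


client = ToyRequestQueueClient(access="shared")
client.add("https://example.com")
print(client.fetch_next())
```

The point of this shape is that the two variants never override each other's methods, so the difference between them stays visible in one place each.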

src/apify/storage_clients/_apify/_storage_client.py

janbuchar · 2025-09-23T11:41:55Z

tests/integration/conftest.py


    async with Actor:
-        rq = await Actor.open_request_queue(name=request_queue_name, force_cloud=True)
+        rq = await RequestQueue.open(storage_client=ApifyStorageClient(access=request.param))


Please put the yield inside a finally block so that we minimize the chance of leaving a dangling RQ.

Could you please describe the scenario it solves?

(Btw., now that unnamed storages will be used in tests, the platform will not keep them forever even if they somehow leak from a test.)

If the test throws an error, it will be propagated through that yield. For that reason, it's better to put cleanup in a finally block. I agree that in this case it's unlikely that it'd cause a serious issue, but there's no good reason not to add that block.

I am still not sure I follow. In a pytest fixture, the code after the yield will run even if there was an exception in the test. It will not run if there was an exception in the fixture code itself, but if there was an exception during RQ creation, then the RQ was not created and there is nothing to clean up. Or am I missing something?
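
To make the question concrete, here is a simplified model of how a yield-style fixture is driven, as far as I understand pytest's behavior: setup runs up to the yield, the test runs separately, and teardown resumes the generator afterwards. `run_like_a_test_runner` is a hypothetical stand-in written for this example, not pytest's code.

```python
events: list[str] = []


def queue_fixture():
    """Generator-style fixture without try/finally around the yield."""
    events.append("setup")
    yield "rq"
    events.append("cleanup")


def run_like_a_test_runner(fixture, test):
    """Drive the fixture generator roughly the way a test runner would."""
    gen = fixture()
    value = next(gen)  # run setup up to the yield
    error = None
    try:
        test(value)
    except Exception as exc:  # the failure is recorded, not thrown into the generator
        error = exc
    try:
        next(gen)  # resume after the yield to run the teardown part
    except StopIteration:
        pass
    return error


def failing_test(rq: str) -> None:
    raise AssertionError("boom")


error = run_like_a_test_runner(queue_fixture, failing_test)
print(events, type(error).__name__)
```

Under this model the cleanup runs even though the test failed. A finally block would still matter with a driver that throws the exception into the generator at the yield, which is how `contextlib.contextmanager` behaves.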

tests/integration/conftest.py

tests/integration/test_request_queue.py


janbuchar · 2025-09-24T07:44:37Z

tests/integration/test_apify_storages.py

+            cloud_storage_client=ApifyStorageClient(access='shared'),
+        )
+    )
+    async with Actor():


I sure hope that the parentheses aren't required here 🙂

Not required. Removed

janbuchar · 2025-09-24T08:29:59Z

src/apify/storage_clients/_hybrid_apify/_storage_client.py

+
+
+@docs_group('Storage clients')
+class ApifyHybridStorageClient(StorageClient):


Not sure about the name - after all, this is the default behavior, and "hybrid" does not sound like something that should be a default. Let's try to think of something better. Maybe we could even toot our own horn with something like SmartApifyStorageClient.

Ok, renamed

janbuchar · 2025-09-24T08:34:00Z

src/apify/storage_clients/_hybrid_apify/_storage_client.py

+            storage_client=self._get_suitable_storage_client(force_cloud=force_cloud),
+        )
+
+    @cached_property


I'd skip the caching here. IMO the subtle bugs it could introduce are not worth the tiny performance gain

Well, I was not thinking about the performance. More like, this should never change during runtime. If it changes during runtime, then a lot of assumptions made in the code are no longer valid.

Agreed. But caching the value only further obscures the problem, if one appears

Ok, removed
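
A small sketch of the concern raised above: when the underlying condition does change at runtime, a cached value silently keeps the old answer. `ToyEnv`, `is_at_home`, and both chooser classes are hypothetical names used only for this illustration.

```python
from functools import cached_property


class ToyEnv:
    """Hypothetical runtime environment flag."""

    def __init__(self) -> None:
        self.is_at_home = False


class CachedChooser:
    def __init__(self, env: ToyEnv) -> None:
        self._env = env

    @cached_property
    def client_name(self) -> str:
        # Computed on first access only, then stored on the instance.
        return "cloud" if self._env.is_at_home else "local"


class PlainChooser:
    def __init__(self, env: ToyEnv) -> None:
        self._env = env

    @property
    def client_name(self) -> str:
        # Recomputed on every access.
        return "cloud" if self._env.is_at_home else "local"


env = ToyEnv()
cached, plain = CachedChooser(env), PlainChooser(env)

before = (cached.client_name, plain.client_name)
env.is_at_home = True  # the assumption "this never changes" is violated
after = (cached.client_name, plain.client_name)

print(before, after)
```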

src/apify/_actor.py

janbuchar · 2025-09-24T08:55:30Z

src/apify/_actor.py

+            # The client was manually set to the right type in the service locator. This is the explicit way.
+            return storage_client
+
+        if isinstance(storage_client, ApifyStorageClient):


So if I do service_locator.set_storage_client(ApifyStorageClient()) before Actor init, this will force using the filesystem locally? That doesn't sound too desirable.

Well, at this point, we can only guess what the user wants. Only a fully explicit setting of the SmartApifyStorageClient tells us what the user is really trying to do. We can guess in the other direction, or throw an exception and allow only one of two options: a fully implicit (default) client or a fully explicit client (SmartApifyStorageClient).

It is kind of an edge case, so I am fine with any of those. If you have a strong preference, I will do it that way.

In this particular case, I would throw instead of guessing, but include detailed instructions for setting a custom client.

Ok, throwing
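
Roughly, the agreed behavior could look like the sketch below. Every name here is a hypothetical toy (the `Toy*` classes, `resolve_storage_client`, and the error wording), and the handling of the unset case is an assumption; the real check in `_actor.py` may differ.

```python
from typing import Optional


class ToyStorageClient:
    """Hypothetical base class for storage clients."""


class ToyCloudStorageClient(ToyStorageClient):
    """Hypothetical stand-in for a plain cloud-only client."""


class ToySmartStorageClient(ToyStorageClient):
    """Hypothetical stand-in for the client that picks local or cloud storage."""


def resolve_storage_client(configured: Optional[ToyStorageClient]) -> ToyStorageClient:
    """Accept the fully implicit or fully explicit setup; refuse to guess otherwise."""
    if configured is None:
        # Fully implicit: nothing was set, so fall back to the default (assumption).
        return ToySmartStorageClient()
    if isinstance(configured, ToySmartStorageClient):
        # Fully explicit: the client of the right type was set manually.
        return configured
    if isinstance(configured, ToyCloudStorageClient):
        # Ambiguous: raise with instructions instead of guessing the user's intent.
        raise RuntimeError(
            "A plain cloud storage client was set before initialization. "
            "Either set no client at all, or set a ToySmartStorageClient explicitly."
        )
    raise TypeError(f"Unsupported storage client: {type(configured).__name__}")


explicit = ToySmartStorageClient()
print(resolve_storage_client(explicit) is explicit)
```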

janbuchar · 2025-09-24T09:32:50Z

src/apify/storage_clients/_apify/_request_queue_client.py

-
-    _DEFAULT_LOCK_TIME: Final[timedelta] = timedelta(minutes=3)
-    """The default lock time for requests in the queue."""
+    """Base class for Apify platform implementations of the request queue client."""


There is no overriding of the intermediate class, so there should be no confusion.

it isn't there now, who knows, what the future will bring... please consider the alternatives I suggested

src/apify/storage_clients/_hybrid_apify/_storage_client.py

Co-authored-by: Jan Buchar <[email protected]>

janbuchar

LGTM, thanks!

Approved by Honza; Vlada is on vacation now.

Mantisus

I like it.

Just one question.

Mantisus · 2025-09-26T00:48:28Z

src/apify/storage_clients/_apify/_request_queue_single_client.py

+
+        if request.handled_at is None:
+            request.handled_at = datetime.now(tz=timezone.utc)
+            self.metadata.handled_request_count += 1


The value of pending_request_count in metadata is not updated anywhere. Only total_request_count and handled_request_count are updated. Is this correct?

Thanks. I added tracking of pending requests in local metadata estimation.
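
A minimal sketch of what keeping all three counters in sync might look like, assuming a simple local estimate. `ToyRequest`, `ToyMetadata`, `add_request`, and `mark_handled` are hypothetical names and not the client's real code; only the three counter names come from the discussion above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ToyRequest:
    url: str
    handled_at: Optional[datetime] = None


@dataclass
class ToyMetadata:
    total_request_count: int = 0
    handled_request_count: int = 0
    pending_request_count: int = 0


def add_request(meta: ToyMetadata, request: ToyRequest) -> None:
    """Count a newly added request as either pending or already handled."""
    meta.total_request_count += 1
    if request.handled_at is None:
        meta.pending_request_count += 1
    else:
        meta.handled_request_count += 1


def mark_handled(meta: ToyMetadata, request: ToyRequest) -> None:
    """Move a request from pending to handled; a repeated call is a no-op."""
    if request.handled_at is None:
        request.handled_at = datetime.now(tz=timezone.utc)
        meta.handled_request_count += 1
        meta.pending_request_count -= 1


meta = ToyMetadata()
requests = [ToyRequest(f"https://example.com/{i}") for i in range(3)]
for r in requests:
    add_request(meta, r)

mark_handled(meta, requests[0])
mark_handled(meta, requests[0])  # second call must not change the counters

print(meta)
```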

Pijukatel added 5 commits August 26, 2025 13:26

Draft for tests

8e2f5d4

Updated draft

1d869a4

Try to use list_head

08df986

Locks not needed with in_progress

6131fff

Add alternate client

553663a

github-actions bot assigned Pijukatel Aug 27, 2025

github-actions bot added this to the 122nd sprint - Tooling team milestone Aug 27, 2025

github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Aug 27, 2025

Pijukatel changed the title ~~No locking queue~~ feat: No locking queue Aug 27, 2025

Pijukatel added 2 commits August 28, 2025 13:03

WIP

eadab26

Find the chacing problem.

249f8f5

Migrate most actor based tests to normal force cloud rq tests (for future parametrization of the Apify clients)

Pijukatel changed the title ~~feat: No locking queue~~ feat: Add specialized ApifyRequestQueueClientSimple Aug 28, 2025

Pijukatel requested a review from vdusek August 28, 2025 13:09

Merge remote-tracking branch 'origin/master' into no-locking-queue

4ada123

github-actions bot added the tested Temporary label used only programatically for some analytics. label Aug 28, 2025

Pijukatel requested a review from janbuchar August 28, 2025 13:15

Pijukatel added 4 commits August 28, 2025 15:51

Wip changes

10e0652

Add init cache test, update upgrading guide

359c46e

Merge remote-tracking branch 'origin/master' into no-locking-queue

ce090c0

Finalize change and add few more tests

b511011

Pijukatel marked this pull request as ready for review September 19, 2025 08:57

Pijukatel added 2 commits September 19, 2025 11:06

Merge remote-tracking branch 'origin/master' into no-locking-queue

fb32861

Remove unnecesary methods from the specialized client

7ec13ef

Pijukatel mentioned this pull request Sep 19, 2025

Investigate caching options in ApifyRequestQueueClient #550

Open

vdusek previously requested changes Sep 19, 2025


Pijukatel added 4 commits September 19, 2025 16:14

Merge remote-tracking branch 'origin/master' into no-locking-queue

10bc7e2

Rename default_request_queue_apify

7712410

Use single and shared literals and rename the RQ client classes

e63f546

Merge remote-tracking branch 'origin/master' into no-locking-queue

ffa70ff

Pijukatel requested a review from vdusek September 19, 2025 14:58

Update upgrading guide

79c02f5

Pijukatel changed the title ~~feat: Add specialized ApifyRequestQueueClientSimple~~ feat: Add specialized ApifyRequestQueue clients Sep 23, 2025

janbuchar reviewed Sep 23, 2025


Pijukatel added 4 commits September 24, 2025 08:34

Extract storage related complexity from Actor to dedicated storage cl…

d29a534

…ient

Merge remote-tracking branch 'origin/master' into no-locking-queue

506b770

Update log test

1cc80bb

Rename access to request_queue_access

860b0ec

Pijukatel requested a review from janbuchar September 24, 2025 08:28

janbuchar reviewed Sep 24, 2025


Pijukatel and others added 2 commits September 24, 2025 13:53

Update src/apify/_actor.py

e6c6fc5

Co-authored-by: Jan Buchar <[email protected]>

Review comments

da2f5df

Pijukatel force-pushed the no-locking-queue branch from f06eb70 to da2f5df Compare September 24, 2025 12:22

Pijukatel added 2 commits September 24, 2025 16:58

Merge remote-tracking branch 'origin/master' into no-locking-queue

8861c5e

Review comments

1e8a834

Pijukatel force-pushed the no-locking-queue branch from 0d16af0 to 1e8a834 Compare September 24, 2025 15:34

Pijukatel requested a review from janbuchar September 24, 2025 15:45

Pijukatel added 2 commits September 25, 2025 10:58

Update based on Crawlee update

de941d4

Merge remote-tracking branch 'origin/master' into no-locking-queue

b4a588d

vdusek requested review from Mantisus and vdusek and removed request for vdusek September 25, 2025 09:27

Pijukatel added 3 commits September 25, 2025 13:25

Use composition instead of inheritance

c5968bc

Polish some docs

49c357e

More docs polishing

6edb093

janbuchar approved these changes Sep 25, 2025


Mantisus reviewed Sep 26, 2025


Track pending_request_count in local metadata estimation

b17ebef

Pijukatel merged commit f830ab0 into master Sep 26, 2025
23 checks passed

Pijukatel deleted the no-locking-queue branch September 26, 2025 08:19


