Batch Support #142
Conversation
Could that also support adding jobs to an already existing batch? Like `SleepyJob` enqueuing another job that would also be added to the batch.
@mbajur Definitely. As long as you're in a job that's part of the batch, adding another job to the batch would work fine. It'd be pretty simple to extend the existing code to handle that - something like this would solve your use-case, I think?

```ruby
class SleepyJob < ApplicationJob
  queue_as :background

  def perform(seconds_to_sleep)
    Rails.logger.info "Feeling #{seconds_to_sleep} seconds sleepy..."
    sleep seconds_to_sleep
    batch.enqueue { AnotherJob.perform_later }
  end
end
```

I can update the PR to support this. This would only work safely inside of the job - if you were outside of the job, it's possible the batch would finish before the job gets created.
Yes that would absolutely do the trick for me :) Thank you!
Hi @rosa 👋🏼 Congrats on getting SolidQueue past incubation and officially under the Rails umbrella! I'm sure you've got a lot on your plate! Are there any questions I can answer in regards to this PR? I can take the interface/functionality further, but I wanted to discuss it a bit before doing that. If there's anything additional you'd like me to tighten up or try out before discussing it, I'm happy to do so. It's also okay for this to just be on hold if you're not ready to discuss it further at the moment. Since it's been a couple of months, I figured I'd check in.
Hey @jpcamara, so sorry for the delay and the silence here. I just haven't had the proper time to dedicate to this, and I think it requires and deserves more than a quick look. Thank you so much for putting this together! My instinct with this kind of feature is that it requires someone to use it before it's ready. From just looking at the code, I'm not quite sure what kinds of edge cases and race conditions could arise here. This is how most of Solid Queue has been built: we've used it in HEY before making it "official" for everyone, seeing how it behaves under certain loads and what kinds of problems we encounter. We caught and improved quite a few things that way. We don't have a use case for batches right now, so I'm afraid I won't be able to take this feature to that "production-test" point on my side. Do you see yourself using this in a production setting?
That makes sense! It was a bit of a chicken-and-egg issue for me - I wanted to have batches in SolidQueue before starting to transition some things over, because I have code using Sidekiq Pro batches. But I can start experimenting with it now and report back. In that case, I'll continue to work on this PR as well.
Thanks for all the hard work here. I'm a big fan of batch jobs, so I've been keeping my eye on this PR for a while. Sidekiq Pro and others support the notion of child batches. Like batch jobs, child batches let you work with a higher-level abstraction that conceptually simplifies your background work. While implementing that in this PR would be feature creep, I wanted to raise awareness of it in hopes that we design with its extensibility in mind. Thanks again @jpcamara
Hey @dimroc! I couldn't agree more. I used child batches in Sidekiq recently on a project and it highlighted the need to add them to this PR - it's an important feature. I've been putting a lot of work into releasing a blog series on Ruby concurrency and it's been eating up my free coding-related time, but I'm prioritizing getting back to this soon. Thanks for the feedback!
I've added child batches to this PR @dimroc
Added support to this PR for enqueueing within a job, using the syntax I suggested above.
@jpcamara just an idea here. Could this functionality become its own gem? Like an add-on for solid_queue?
One place I think this could be useful is rolling back changes. Since the move to SQLite means jobs aren't created in the secondary database until after records are committed to the main database, there is no atomic guarantee that both records will commit to both databases, leaving us needing manual rollback logic. (This also reminds me that Sidekiq had some guarantee around atomic writing of batch jobs, in either Pro or Enterprise.)

The same thing can happen if you make a change in your database and call an external API. With SQLite you no longer want to make those database changes and the [long-running] external API call inside one transaction. Say you have to call Stripe twice and want to keep a record of what stage you're at, but something goes wrong with the second Stripe call (say the charge is declined) and you need to roll back your database so it's not in an inconsistent state. In this case you can queue the rollback as part of the batch. If memory serves, the pattern for this sort of thing is called Sagas, or a DAG such as Elixir has with GenStage.

I also found this pattern extremely useful in the past because you can parallelize work more effectively. Say I need to make 1000 remote API calls. Each call should really be in its own job so that it can be retried if it hits API limits, and I need to perform some final job once all 1000 API calls have been made and the batch is complete.
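To make that fan-out/fan-in shape concrete, here is a minimal sketch. It assumes the block-style `SolidQueue::JobBatch.enqueue(on_finish:)` interface shown elsewhere in this thread; `ChargeCustomerJob`, `ReconcileChargesJob`, `ExternalBillingApi`, and `Customer` are hypothetical names.

```ruby
# Hypothetical example jobs; the JobBatch.enqueue(on_finish:) interface is
# assumed from usage shown elsewhere in this PR.
class ChargeCustomerJob < ApplicationJob
  # Each remote call lives in its own job so it can be retried independently
  # when the API rate-limits or fails transiently.
  retry_on StandardError, wait: 30.seconds, attempts: 10

  def perform(customer_id)
    ExternalBillingApi.charge(customer_id) # stand-in for the real API call
  end
end

class ReconcileChargesJob < ApplicationJob
  def perform(batch)
    # Runs once, only after every job in the batch has finished.
    Rails.logger.info "Batch #{batch.id} finished; reconciling charges"
  end
end

SolidQueue::JobBatch.enqueue(on_finish: ReconcileChargesJob) do
  Customer.where(pending_charge: true).find_each do |customer|
    ChargeCustomerJob.perform_later(customer.id)
  end
end
```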
@mariochavez it's true, it probably could be a gem! Sidekiq has batches as a Pro feature, but there is also an open-source gem that mostly supports the same API: https://github.com/breamware/sidekiq-batch. There are some gotchas with approaching it that way (one, for instance, around jobs being automatically cleaned up and not being able to do anything but warn users against it). But my main motivation is that I personally want it as a first-class feature of SolidQueue. It's a first-class (albeit paid) feature of Sidekiq, and it's a first-class feature of GoodJob. Being built in means it's more likely to get use/support, and it alleviates concerns that it may be abandoned at some point. I also think it's a great core feature of a job library.

Gush is awesome! It definitely works similarly, though this being backed by a DB in SolidQueue means it has more ACID-type guarantees. Something I would like to see is an even more sophisticated "workflow" type layer that works with any ActiveJob system, and that's something I've toyed around with over the past year. That kind of system, I think, goes a step beyond, is more complicated, and is better served as a separate gem. I think batch support being included in the job server is a good fit.
Thanks for all this work @jpcamara! I'm curious what the expected behavior is with concurrency controls. An argument for keeping this in the main gem is to ease integration testing with the other features and to increase cohesion and stability. I can see it getting hairy when you stack a few different configurations on top of nested batches and have to assert proper behavior. https://github.com/rails/solid_queue?tab=readme-ov-file#concurrency-controls
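For reference, SolidQueue's concurrency controls (from the README section linked above) are declared on the job class, so the open question is how a batch behaves while its jobs sit blocked behind a limit. A sketch of that combination, with hypothetical job names:

```ruby
class ImportRowJob < ApplicationJob
  # SolidQueue concurrency controls: at most one of these jobs runs per
  # account at a time; the rest wait as blocked executions.
  limits_concurrency to: 1, key: ->(account_id, _row) { account_id }, duration: 5.minutes

  def perform(account_id, row)
    Rails.logger.info "Importing row #{row.inspect} for account #{account_id}"
  end
end

class ImportFinishedJob < ApplicationJob
  def perform(batch)
    Rails.logger.info "Import batch #{batch.id} finished"
  end
end

# Most of these jobs will be blocked by the concurrency limit for a while;
# presumably the batch should only complete once they have all actually run.
SolidQueue::JobBatch.enqueue(on_finish: ImportFinishedJob) do
  100.times { |i| ImportRowJob.perform_later(42, i) }
end
```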
I would love to see batches in Solid Queue! My use case is primarily creating workflows.
Just wanna thank @jpcamara for the work he's done and is doing here. Would love to use this 🙏
Hey @kaka-ruto, I think I saw a message about you being willing to try this out? That would be great! Most of my free time is going toward a RubyConf talk I have in a couple weeks, but I'll be shifting back to this right after that and will give you an update.
looking amazing. let's ship this please <3
@jpcamara are you planning on finishing this PR? We're in the process of migrating to SolidQueue and require this functionality, so would be interested in testing or maintaining this branch.
* Introduces a "batch" concept, similar to batches present in Sidekiq Pro and GoodJob
* Batches monitor a set of jobs, and when those jobs are completed they can fire off a final job
* This introduces a SolidQueue::JobBatch model, as well as the ability to enqueue jobs and associate them with the batch
* There are still more ideas to figure out, but this provides a basic batch scaffolding to spark discussion
```ruby
wait_for_job_batches_to_finish_for(2.seconds)
wait_for_jobs_to_finish_for(2.seconds)

expected_values = [ "1: 1 jobs succeeded!", "1.1: 1 jobs succeeded!", "2: 1 jobs succeeded!", "3: 1 jobs succeeded!" ]
```
Suggested change:

```ruby
expected_values = [ "1: 1 jobs succeeded!", "1.1: 1 jobs succeeded!", "2: 2 jobs succeeded!", "3: 1 jobs succeeded!" ]
```

Batch 2 has 2 child jobs, no?
Actually, you might want to be more lenient here since it seems to depend on timing.
As you can see below, you verify that 4 jobs were finished, which means that batch 2 should have processed 2 jobs.
@rosa @jpcamara What are the next steps for this PR, and what's the likelihood/timeline of having it merged? Are you just waiting on some production testing? We do 100k jobs a minute on good_job right now, probably about 5% of that with batches. It would be a significant lift for us to migrate to solid_queue, but if this is all you're waiting on, we'd be willing to commit to it. I just don't want to rewrite everything to run on this branch only to have it closed or taken in a different direction.
* This means we can store the arguments and settings by letting the user do `BatchJob.new(arguments).set(options)`
* Yield the batch in `enqueue` in case someone needs info from it
* When you serialize and then deserialize an ActiveJob instance, the arguments live in the serialized_arguments field and can only be transferred over by the private method `deserialize_arguments_if_needed`. This is pretty janky, so there is probably something I'm missing
* `perform_all_later` lets us do a perform_later even with instances, which does not seem to be possible on the instances themselves
* Make sure `batch` is still the first arg of the batch callback
* Add spec for adding arguments and options to the batch callback
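A sketch of what that describes, with hypothetical job names: the completion job is passed as an instance so its extra arguments and `set` options are stored with it, the batch stays the first argument of the callback, and the batch is yielded from `enqueue`.

```ruby
class NotifyAdminJob < ApplicationJob
  def perform(batch, admin_email)
    # `batch` is still the first argument; `admin_email` and the queue option
    # were captured when the instance was built below.
    Rails.logger.info "Reporting batch #{batch.id} results to #{admin_email}"
  end
end

# Hypothetical usage of the interface described in this commit.
SolidQueue::JobBatch.enqueue(
  on_finish: NotifyAdminJob.new("admin@example.com").set(queue: :background)
) do |batch|
  Rails.logger.info "Enqueueing jobs into batch #{batch.id}"
  SleepyJob.perform_later(1)
  SleepyJob.perform_later(5)
end
```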
* Support Ruby < 3.2 by removing the implicit key/variable syntax
* on_failure fires the first time any of the jobs fail, even once
* on_success only fires if all jobs work (after retries)
* remove unused job_id
* Allows enqueueing a job within a job, as part of the batch
* Support nested batches
* Parent batches will not complete until all child batches have been completed
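A sketch of that nesting with hypothetical job names, assuming the block-style enqueue used elsewhere in this thread: a job running inside a parent batch opens its own batch, which becomes a child batch, and the parent does not complete until the child does.

```ruby
class ParentFinishedJob < ApplicationJob
  def perform(batch)
    Rails.logger.info "Parent batch #{batch.id} finished"
  end
end

class ChildFinishedJob < ApplicationJob
  def perform(batch)
    Rails.logger.info "Child batch #{batch.id} finished"
  end
end

class FanOutJob < ApplicationJob
  def perform
    # Opening a batch while running inside a batch creates a child batch;
    # the parent batch will not complete until this child batch completes.
    SolidQueue::JobBatch.enqueue(on_finish: ChildFinishedJob) do
      3.times { |i| SleepyJob.perform_later(i) }
    end
  end
end

SolidQueue::JobBatch.enqueue(on_finish: ParentFinishedJob) do
  FanOutJob.perform_later
end
```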
* Attach success jobs to the parent batch, not to the current batch (which has already finished at this point)
* Thanks to Mikael Henriksson for his work in rails#590. His work decentralizes management of batch status by moving it to the BatchUpdateJob, and tracks status using counts rather than querying specific job statuses after the fact. This is a much simpler approach to tracking the jobs, and allows us to avoid a constantly polling set of queries in the dispatcher. Also add arbitrary metadata to allow tracking data from start to end of execution. This also means enqueueing a BatchUpdateJob based on callbacks in two different kinds of Batchable, which are included when a job is updated and finished, or when a FailedExecution is created (since failed jobs never "finish").
* This batch feature already took some inspiration from the GoodJob batch implementation (https://github.com/bensheldon/good_job). But now we also increase that by adopting some of the buffering and abstractions in a similar form as GoodJob. To discourage heavy reliance on the JobBatch model, it has been renamed to BatchRecord, and a separate Batch interface is how you interact with batches, with some delegation to the core model.
* A new Buffer class (also modeled after GoodJob) was added specifically for batches. This was primarily added to support enqueue_after_transaction_commit. We now override the ActiveJob #enqueue method so we can keep track of which jobs are attempting to enqueue. When enqueue_after_transaction_commit is on, those jobs do not enqueue until all transactions commit. By tracking them at the high-level enqueue and keeping a buffer of jobs, we can ensure that the jobs get tracked even when their creation is deferred until the transaction is committed. A side benefit is that we get to enqueue all the jobs together, probably offering some performance advantage. This buffer also keeps track of child batches for the same reason.
* To support triggering a callback/BatchUpdateJob when a job finishes, the update to finished_at needed to become an update! call
* As a simplification, on_failure is now only fired after all jobs finish, rather than the first time a job fails
* The adapter logic itself also needed to be updated to support the buffer and enqueue_after_transaction_commit. If a job is coming from a batch enqueue, we ignore it here and allow the batching process to enqueue_all at the end of the enqueue block. If the job is originally from a batch but is retrying, we make sure the job counts in the batch stay updated. I don't love this addition, since it adds a lot of complication to the adapter code, all solely oriented around batches
* Batches benefit from keeping jobs until the batch has finished. As such, we ignore the preserve jobs setting, but if it is set to false, we enqueue a cleanup job once the batch has finished and clear out finished jobs
* Implement preserved jobs test and remove todo
* Idempotent updates with pessimistic locks
* Check if it finished before we acquired the lock
* Use enqueue_all directly rather than passing through ActiveJob for completion jobs

Co-authored-by: Mikael Henriksson <[email protected]>
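To illustrate the count-based idea at the heart of that change, here is a simplified sketch. The column and method names are assumptions, and in the PR this bookkeeping happens via the BatchUpdateJob rather than directly on the model.

```ruby
# Simplified, hypothetical sketch: counters are bumped as jobs finish or fail,
# so deciding whether the batch is done is a comparison of counts rather than
# a polling query over individual job rows.
class Batch < ApplicationRecord # assumed columns: total_jobs, finished_jobs, failed_jobs, finished_at
  def job_finished!(failed: false)
    with_lock do # reload with a row lock so concurrent updates serialize
      increment(failed ? :failed_jobs : :finished_jobs)
      save!
      finish! if finished_at.nil? && finished_jobs + failed_jobs >= total_jobs
    end
  end

  def finish!
    update!(finished_at: Time.current)
    # ...enqueue the on_finish / on_success / on_failure jobs here...
  end
end
```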
* BatchExecution allows us to know for sure we only ever run completion on a job once. We destroy it and update the counts in a transaction. We can also remove the batch_processed_at field from jobs, which are meant to be touched as little as possible, with relevant states reflected in *_execution models
* It also gives us a slightly cleaner interface in the batchable classes
* Updated some table naming and pruned unused fields/indexes
* Increase child batch count as new batches are enqueued, even in existing batches
* Refactor to a unified Batch interface
* It was overly complicated to split Batch and BatchRecord apart just to keep a stricter interface
* That concept was taken from GoodJob, but it didn't feel in the spirit of the simplicity of the SolidQueue project. It was a lot of concepts to juggle in your head
* Also moved around some files (like the cleanup and empty jobs) to the more appropriate app/jobs
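A sketch of the "destroy the execution and update counts in one transaction" idea, with model and column names assumed rather than taken from this branch:

```ruby
# Hypothetical sketch: BatchExecution is the row tying a job to its batch.
# Because that row can only be found and destroyed once, the batch counter is
# only ever bumped once per job, even if completion is attempted twice.
class BatchExecution < ApplicationRecord
  belongs_to :batch
  belongs_to :job

  def self.complete!(job, failed: false)
    transaction do
      execution = lock.find_by(job_id: job.id) # row lock serializes racers
      next unless execution                    # already completed elsewhere

      execution.destroy!
      execution.batch.increment!(failed ? :failed_jobs : :finished_jobs)
    end
  end
end
```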
* Use BatchPreparable module to add batch executions and update the batch totals after creating ready/claimed records
* Remove buffer logic - this makes it harder to implement empty batches, but results in much simpler to understand code. Also batches will always exist now, whether the jobs inside ever get enqueued or not
* Move logic for getting status and querying by status scopes to Trackable module
* total_child_batches wasn't being used for much, so remove it
* Making it explicit is the easiest option, and the most in alignment with Solid Queue
* Fix errors around upserting across providers. SQLite and Postgres share identical syntax (at least for this use case) and MySQL works differently
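For context on that portability note (this is general Rails `upsert_all` behavior, not this PR's exact code, and the model and column names below are illustrative): PostgreSQL and SQLite generate `INSERT ... ON CONFLICT ... DO UPDATE` and let you name the conflict target with `unique_by:`, while MySQL generates `INSERT ... ON DUPLICATE KEY UPDATE` and resolves conflicts against any unique index on its own, so `unique_by:` isn't supported there.

```ruby
rows = [ { batch_id: 1, finished_jobs: 3 } ]

# PostgreSQL / SQLite: the conflict target is named explicitly (it must match
# a unique index on the table).
SolidQueue::JobBatch.upsert_all(rows, unique_by: :batch_id)

# MySQL: unique_by is not supported; the database picks the conflicting
# unique index itself, so the same call is written without it.
SolidQueue::JobBatch.upsert_all(rows)
```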
Support for batches in SolidQueue!
Batches are a powerful feature in Sidekiq Pro and GoodJob that help with job coordination. A "batch" is a collection of jobs; when those jobs meet certain completion criteria, the batch can optionally trigger another job with the batch record as an argument.
The goal of this feature is to:
This PR provides a functional batch implementation. The following scenarios will work:
You should see the following in your logs:
Here is the full interface, demonstrating a few different options:
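A sketch of what such usage might look like, assembled from the pieces shown elsewhere in this thread (the callback options and `metadata:` come from the commit notes above; the completion job classes are placeholders and would be defined like the jobs earlier in the conversation):

```ruby
SolidQueue::JobBatch.enqueue(
  on_finish: BatchFinishedJob,         # runs once the whole batch finishes
  on_success: BatchSucceededJob,       # runs only if every job succeeded (after retries)
  on_failure: BatchFailedJob,          # runs if any job in the batch fails
  metadata: { invoked_by: "nightly_import" } # arbitrary tracking data
) do |batch|
  Rails.logger.info "Enqueueing jobs into batch #{batch.id}"
  SleepyJob.perform_later(1)

  # A batch opened here becomes a child batch; the parent batch will not
  # complete until the child batch has completed.
  SolidQueue::JobBatch.enqueue(on_finish: ChildFinishedJob) do
    SleepyJob.perform_later(2)
  end
end
```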
Here are the things that are open questions and missing implementation details:
* Is `JobBatch` the right name? General feedback on naming in the feature
* `on_success`, `on_finish`, `on_failure`
* `discard_on` works by marking the job as finished. That means the batch cannot identify that the job actually failed.