Versioning Problem of (likely badly written) Dynamically Created DAGS #54348
Replies: 4 comments 9 replies
-
Does using dynamic task mapping help here?
-
That's not true in general.

If your intention is to generate a different dag every time it is parsed, then yes, it will be dynamically changing, and as far as I know you can disable versioning ( https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#disable-bundle-versioning ) for that bundle if it contains Dags that change frequently and this is expected (but this is generally a bad practice).

Dynamically generated dags are NOT supposed to change every time they are parsed. They MAY change slower or faster, depending on how often the "input" to those dynamic dags changes. For example, if you have a list of countries and you generate dynamic dags based on it, your parsed dag should change only when you add a new country. Every time (provided that your code to generate the dynamic dag is well written), parsing the same Dag should produce the same dynamically generated parsed representation.

So in your case, likely either:

a) your intention is to change your dag every time it is parsed (which is quite strange, but then you should disable versioning for that bundle), or
b) there is a problem with the dynamic dag generation code: it should produce the same parsed representation every time the Dag is parsed.

There might be two reasons for b)

Most likely it's 1), and you have to fix your dynamic dag generation to be "stable" if you want to version the dag. This means that the serialized version of your Dag should be THE SAME whenever your Dag is parsed, as long as none of the inputs changed. The result should remain stable over time, so it should be independent of the time when the Dag is parsed.

It could also be 2), but in order to know whether it's 1) or 2) you have to look at the serialized dag table (see https://airflow.apache.org/docs/apache-airflow/stable/database-erd-ref.html ), find two versions of your Dag, and find out what the difference between them is and what kind of "stability" issue in dynamic dag generation causes it. The "compressed" representation in this table is compressed with

Also, it might be easier for you to review your dynamic dag generation, looking for potential stability issues like those described above.

Once you find the reasons, please let us know. It would be interesting to find out what stability issues people like you might have, or maybe you will find some problems coming from our serialization. So we are looking forward to your investigation.

Converting it into a discussion until you investigate it and provide more information.
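To make the "stable" property a bit more concrete, here is a minimal sketch in plain Python, with no Airflow imports. The helper names `build_task_specs`, `build_task_specs_unstable` and `fingerprint` are hypothetical, and the hash is only a stand-in for "the serialized representation is the same"; it is not how Airflow itself compares versions. The unstable variant shows one possible kind of issue (a parse-time timestamp leaking into task arguments); it is an assumption for illustration, not a diagnosis of the Dag in this thread.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_task_specs(tables):
    """Return a deterministic description of the tasks to generate."""
    # Sort and de-duplicate so the result does not depend on input order.
    return [
        {"task_id": f"load_{name}", "op_kwargs": {"table": name}}
        for name in sorted(set(tables))
    ]


def build_task_specs_unstable(tables):
    """Same idea, but the parse time leaks into the task arguments."""
    # Evaluated on every parse, so every parse yields different arguments.
    parsed_at = datetime.now(timezone.utc).isoformat()
    return [
        {"task_id": f"load_{name}",
         "op_kwargs": {"table": name, "parsed_at": parsed_at}}
        for name in sorted(set(tables))
    ]


def fingerprint(specs):
    """Hash a canonical JSON dump of the specs (illustrative only)."""
    payload = json.dumps(specs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    first = fingerprint(build_task_specs(["orders", "customers"]))
    second = fingerprint(build_task_specs(["customers", "orders"]))
    print("stable generation, same fingerprint:", first == second)
```

With the stable variant, two parses with the same inputs give the same fingerprint regardless of input order. With the unstable variant, the timestamp ends up in `op_kwargs`, so repeated parses would normally produce different representations.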
-
I also updated the title to properly reflect it. This is not a "huge problem with dag versioning"; it's either a lack of understanding that "dynamically generated dags" should have the "stable" property if you want to version them, or some specific bug in the serialization code. I hope that as a result of this discussion, @gbonazzoli, once you find the reasons for those stability issues, you can help us write good documentation describing what the "stability" property of such dynamically generated Dags is, or maybe we will even find some bugs in the implementation. But this is NOT a huge problem with versioning in general. It works as intended, and it should work well with dynamically generated Dags, provided of course that those Dags are properly generated and that there are no bugs. Looking forward to the results of your investigation @gbonazzoli
-
Maybe eventually we might even come up with better tooling to analyse such dynamic dag "stability" issues. I find this an interesting case study to see what the reasons could be, so that we can write our tooling to be useful for quickly analysing such issues.
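As a rough idea of what such an analysis could look like, here is a small sketch that walks two already-decoded serialized representations and reports the paths where they differ. The function name `diff_paths` and the layout of the sample dictionaries are hypothetical; they do not reflect Airflow's real serialized schema or any existing Airflow tool.

```python
MISSING = "<missing>"  # placeholder for keys present on one side only


def diff_paths(old, new, path=""):
    """Yield (path, old_value, new_value) for every value that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Visit keys in sorted order so the report itself is deterministic.
        for key in sorted(set(old) | set(new)):
            yield from diff_paths(
                old.get(key, MISSING), new.get(key, MISSING), f"{path}/{key}"
            )
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (a, b) in enumerate(zip(old, new)):
            yield from diff_paths(a, b, f"{path}/{index}")
    elif old != new:
        yield path or "/", old, new


if __name__ == "__main__":
    # Hypothetical, simplified stand-ins for two versions of the same Dag.
    v1 = {"dag": {"tasks": [
        {"task_id": "load_a", "op_kwargs": {"ts": "2025-08-11T10:00:00"}}]}}
    v2 = {"dag": {"tasks": [
        {"task_id": "load_a", "op_kwargs": {"ts": "2025-08-11T10:00:30"}}]}}
    for where, before, after in diff_paths(v1, v2):
        print(where, before, "->", after)
```

Lists of different lengths are reported as a single difference rather than aligned element by element, which keeps the sketch short. A path that changes on every parse (a timestamp, a random ID, an ordering that flips) would point to the unstable part of the generation code.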
-
Apache Airflow version
3.0.4
If "Other Airflow 2 version" selected, which one?
No response
What happened?
I have a Python source that generates PythonOperator tasks on a DAG. It is the load of the staging area in a data warehouse, so during this phase there are hundreds of tables that we need to load from the transactional database to the analytical one. Airflow limits the load through the max_active_tasks=32 DAG parameter.
The code is:
The problem with this code is that every day Airflow 3.x registers hundreds of different versions of the DAG.
The reality is that it is dynamically generated, so it is obvious that it changes dynamically.
Is there any way to limit the problem?
What you think should happen instead?
No response
How to reproduce
I think this is a general problem affecting Airflow 3.x with DAGs that are dynamically generated through code.
Operating System
Ubuntu 24.04.3 LTS
Versions of Apache Airflow Providers
Deployment
Virtualenv installation
Deployment details
A vanilla Ubuntu installation in a Python 3.12 virtual env.
Anything else?
no
Are you willing to submit PR?
Code of Conduct