A lot depends on the resources you have and on your deployment. You have not explained which executor you use, how many machines you have, what the parallelism settings for those machines are, or what kind of tasks you run on them. Airflow, as you can see, is correctly scheduling the tasks; now it's your job as deployment manager to make sure you have enough workers (for Celery) with enough resources (CPU, machines, memory, sometimes GPUs, etc.) to run as many parallel processes as you might want. In Kubernetes you need enough resources to run as many pods as you have. The number of things that can execute in parallel depends on the capacity of your system. It's also your job to allocate resources appropriately (for example via K8s resource requests and limits), so I suggest you look at your deployment and make sure those resources are not blocking you.
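For reference, here is a minimal sketch, assuming Airflow 2.x, of inspecting the configuration knobs that cap parallel execution. All of the keys below are real Airflow options, and their defaults (32 for instance-wide parallelism, 16 for the per-DAG and per-worker limits) are a common reason tasks stall at exactly those numbers:

```python
# Minimal sketch (assuming Airflow 2.x): read the config values that
# cap how many task instances can run at once.
from airflow.configuration import conf

# Instance-wide cap on running task instances (default: 32).
print(conf.getint("core", "parallelism"))
# Per-DAG cap on concurrently running tasks (default: 16).
print(conf.getint("core", "max_active_tasks_per_dag"))
# CeleryExecutor only: tasks one worker process runs at once (default: 16).
print(conf.getint("celery", "worker_concurrency"))
```

Raising these (for example by setting the AIRFLOW__CORE__PARALLELISM environment variable) is only half of the story; the workers still need enough CPU and memory to actually run that many processes.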
-
Hi,
We have Airflow version 2.5.0 and we're trying to execute a job that contains 60 tasks which can run in parallel. Despite trying several concurrency variable values in the Airflow service and environment settings, Airflow is still unable to execute all 60 tasks in parallel.
While executing, it starts out with 32 tasks (screenshots 1, 2, 3).
After a few moments, 16 of the 32 change status to the 'up_for_retry' state (screenshots 4, 5) and eventually get queued again.
On the other hand, the 32 tasks that were initiated at once are still executing in ECS, as shown in screenshot 6. So it seems those tasks are still running while Airflow is unable to keep track of them and marks them to be rerun.
Any help would be appreciated. Thank you.
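For anyone hitting the same thing: exactly 32 tasks starting matches Airflow's default [core] parallelism of 32, and 16 matches the default per-DAG and per-worker limits, so the defaults may still be in effect. Below is a hedged sketch, assuming Airflow 2.4+ and using a hypothetical DAG id, of a DAG explicitly sized for 60 parallel tasks; the instance-wide [core] parallelism must also be raised to at least 60 for all of them to run at once:

```python
# Hedged sketch (assuming Airflow 2.4+): a DAG whose per-DAG concurrency
# cap allows all 60 tasks to run at once. max_active_tasks is a real DAG
# parameter; the dag_id is a placeholder.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="parallel_60_example",  # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule=None,
    max_active_tasks=60,  # per-DAG concurrency cap
) as dag:
    # 60 independent tasks with no dependencies, so the scheduler is
    # free to run all of them concurrently (subject to global caps).
    tasks = [EmptyOperator(task_id=f"task_{i}") for i in range(60)]
```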