Allow to install custom python libraries #592

@Maleware

Current Situation

If you want to use non-standard Python libraries in an Airflow job, you need to build a custom image, pip install the libraries into it, and then use that custom image in your cluster.

Preferred Situation

You can configure a requirements.txt, which is then installed in the Airflow deployment.

Example

For example, if you want to use pandas==2.2.2 in a DAG, you currently need to set up a CI/CD pipeline that builds and deploys a custom Airflow image. The Dockerfile would look like:

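# Build arguments used in the base image tag; pass them with --build-arg
ARG AIRFLOW_VERSION
ARG STACKABLE_VERSION
FROM oci.stackable.tech/sdp/airflow:${AIRFLOW_VERSION}-stackable${STACKABLE_VERSION}

ARG PYTHON_VERSION=3.9

# Install custom Python libraries
RUN pip install \
    --no-cache-dir \
    --upgrade \
    pandas==2.2.2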

Although this is fairly easy to do, it requires ongoing maintenance and resources. I consider this a fairly common use case, so we should think about whether we could cover it with something like the following (no strong opinion on the naming, or on where and how it should appear in the CRD):

---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: 2.9.3
  clusterConfig:
    loadExamples: false
    exposeConfig: false
    credentialsSecret: simple-airflow-credentials
    requirements:
      configMap:
        name: custom-requirements

and a ConfigMap

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-requirements
data:
  requirements.txt: |
    pandas==2.2.2
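
To make the idea a bit more concrete, here is a minimal Python sketch of a check that could run inside the Airflow container to see whether the pins from such a requirements.txt are already satisfied. It is only an illustration and not something the operator does today: the helper names `parse_pins` and `unmet_pins` are hypothetical, and the sketch assumes the file contains only exact `name==version` pins.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for simple `name==version` lines.

    Hypothetical helper: anything other than an exact pin is rejected.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"only exact pins are supported in this sketch: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def unmet_pins(pins, installed_version=metadata.version):
    """Return {package: (wanted, found)} for pins that are missing or differ.

    `installed_version` defaults to importlib.metadata.version and can be
    swapped out, for example in tests.
    """
    unmet = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            unmet[name] = (wanted, found)
    return unmet
```

An init step could call `unmet_pins(parse_pins(text))` on the mounted file and only run pip when the result is non-empty, which would avoid reinstalling on every pod start. Whether that belongs in the operator or in the image entrypoint is an open design question.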

I think a solution at the operator level would remove the pain of constructing and maintaining a build pipeline for the cluster. It moves the maintenance effort into the Airflow operator, but the operator already needs attention (Stackable versions, product versions).

However, I can't evaluate how much effort we would need to put in to achieve this, or what kind of risks it would imply.
