This repository is a store of useful scripts for running manual Flink tests. These scripts are mainly used for:
- Debugging
- Compatibility/Migration testing
To increase the number of TaskManager containers (scale) with Docker Compose, you can either set the scale key directly inside the env-compose.yml
file or override it with the --scale flag of the docker-compose CLI. An example of the latter is shown below:
docker-compose up --scale taskmanager=3
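If you prefer to set it in the Compose file instead, here is a minimal sketch, assuming a Compose file format that supports the service-level scale key (2.2+); the actual service definition in env-compose.yml may differ:
# Hypothetical excerpt of env-compose.yml
taskmanager:
  image: hudi_local_tests:flink1.18__hudi_1.0.1
  scale: 3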
Once you have executed the docker-compose up command, you can navigate to the Flink web UI at localhost:8082.
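As a quick sanity check that the cluster is up (assuming the web UI port is mapped to 8082 as above), you can also hit Flink's REST API, which is served on the same port:
# Returns TaskManager, slot, and job counts when the cluster is healthy
curl http://localhost:8082/overview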
Note: the Hudi jars and the Flink Docker images all use Java 8.
Each Docker image file follows this naming convention:
flink.Dockerfile.{hudi-version}
The image tag follows this naming convention:
hudi_local_tests:flink1.18__hudi_{hudi-version}
To build the docker images:
docker build -f docker_files/flink.Dockerfile.1.0.1 -t hudi_local_tests:flink1.18__hudi_1.0.1 .
NOTE: For flink-bundle jars that have not been released but that you would like to test, you will need to manually compile the flink-bundle jar, then copy it to /opt/flink/lib
before building, as such:
# Copy local file into the image
COPY hudi_jars/hudi-flink1.18-bundle-1.0.2.jar /opt/flink/lib/
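A rough sketch of compiling the bundle from a Hudi source checkout; the module path and profile name here are assumptions and may vary across Hudi versions, so check the Hudi build docs:
# Build the Flink 1.18 bundle from the Hudi repo root, skipping tests
# (-Pflink1.18 is an assumed profile name; adjust for your Hudi version)
mvn clean package -DskipTests -pl packaging/hudi-flink-bundle -am -Pflink1.18
# Stage the jar where the Dockerfile's COPY instruction expects it
cp packaging/hudi-flink-bundle/target/hudi-flink1.18-bundle-1.0.2.jar hudi_jars/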
To keep the container lightweight, we are not including the full Hadoop dependencies, i.e. bundling the entire Hadoop library and exporting its path as an environment variable.
Instead, we copy only the required jar dependencies into /opt/flink
and build the Docker images. Should there be any dependency errors, feel free to add the required jars to the Docker image files and rebuild them.
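For example, if a Hadoop class turns out to be missing at runtime, the corresponding jar can be staged in the Dockerfile (the jar name below is hypothetical) and the image rebuilt:
# Hypothetical example: add a missing dependency jar next to the bundle
COPY hudi_jars/hadoop-common-3.3.4.jar /opt/flink/lib/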
To start the cluster, modify the Docker Compose file's image to the Hudi version you are testing, for both jobmanager and taskmanager:
image: hudi_local_tests:flink1.18__hudi_1.0.1
Then start the cluster as such:
sh start.sh
To stop the cluster:
sh teardown.sh
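To verify that the containers came up (this assumes the container names contain "manager", as with the jobmanager container used below):
docker ps --filter "name=manager"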
Mounted volumes on taskmanagers are:
- ./hudi_demo:/opt/flink/examples/hudi_demo
Mounted volumes on jobmanagers are:
- ./test_scripts:/opt/flink/examples/test_scripts
- ./hudi_demo:/opt/flink/examples/hudi_demo
The mounted folders are used as follows:
- hudi_demo: for storing savepoints and table data
- test_scripts: for storing SQL testing scripts
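For reference, a hypothetical excerpt of how these mounts would look in env-compose.yml (the actual service definitions may differ):
# Hypothetical excerpt of env-compose.yml
jobmanager:
  volumes:
    - ./test_scripts:/opt/flink/examples/test_scripts
    - ./hudi_demo:/opt/flink/examples/hudi_demo
taskmanager:
  volumes:
    - ./hudi_demo:/opt/flink/examples/hudi_demo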
Flink includes a SQL Client CLI, and it's the easiest way to submit SQL jobs directly to your cluster.
- Start your cluster (if not already running). The command below starts a cluster with 2 TaskManagers:
docker-compose up --scale taskmanager=2 -d
- Exec into the JobManager container:
docker exec -it jobmanager bash
- Start the SQL Client inside the container:
./bin/sql-client.sh
- From here, you can run Flink SQL queries interactively, or use sql-client.sh to submit .sql or .jar jobs (see below).
- Create a SQL file (e.g., myjob.sql):
-- Unbounded source backed by the datagen connector (generates random rows)
CREATE TABLE source (
  id INT,
  name STRING
) WITH (
  'connector' = 'datagen'
);
-- Sink backed by the print connector (writes rows to the TaskManager stdout)
CREATE TABLE sink (
  id INT,
  name STRING
) WITH (
  'connector' = 'print'
);
INSERT INTO sink SELECT * FROM source;
- Copy it into the container:
docker cp myjob.sql jobmanager:/opt/myjob.sql
- Run it:
docker exec -it jobmanager ./bin/sql-client.sh -f /opt/myjob.sql
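Afterwards, you can list running jobs (and grab a job ID for the commands below) with the standard Flink CLI:
# List running jobs and their job IDs from inside the jobmanager container
docker exec -it jobmanager ./bin/flink list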
Since a savepoint directory is configured, stopping a job will trigger a savepoint by default:
flink stop <job-id>
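The savepoint target can also be set explicitly with the --savepointPath flag of flink stop; the path below reuses the mounted hudi_demo volume as an example:
docker exec -it jobmanager ./bin/flink stop --savepointPath file:///opt/flink/examples/hudi_demo/savepoints <job-id>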
One-liner to Run SQL File from Savepoint:
./bin/sql-client.sh \
-Dexecution.savepoint.path=file:///opt/flink/examples/hudi_demo/savepoints/savepoint-123 \
-Dexecution.savepoint.ignore-unclaimed-state=false \
-f flink_state_demo.sql
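The savepoint-123 directory above is a placeholder; list the savepoints folder on the mounted hudi_demo volume to find the actual path:
# Savepoint directories are created under the configured savepoint directory
docker exec -it jobmanager ls /opt/flink/examples/hudi_demo/savepoints/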
If the jobmanager exits with Exited (239), the cause is likely specific to the application or entrypoint inside the container. Check the logs:
docker logs <container_id_or_name>
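To dig further, you can inspect the exit code Docker recorded and tail the logs:
# Show the container's recorded exit code
docker inspect -f '{{.State.ExitCode}}' jobmanager
# Show the last 100 log lines
docker logs --tail 100 jobmanager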