Zyra Workflow Template

Overview

This repository is a starter template for building containerized workflows with the Zyra CLI. It provides a ready‑to‑use dev container, a reusable GitHub Actions workflow, and example datasets to demonstrate a full pipeline. Clone and adapt it to your own sources, transforms, and outputs.

Included example: a real‑time image‑to‑video pipeline (FTP → metadata validate → MP4 compose → optional Vimeo upload → optional S3 update). Treat this as a reference implementation you can modify or replace with your own stages.

Quick Start: Add a Dataset

  • Create datasets/<name>.env with the required keys (see datasets/README.md; a sketch follows this list).
  • Run the GitHub Actions workflow manually with input DATASET_NAME=<name> to validate:
    • Acquire → frames under _work/images/${DATASET_NAME}.
    • Validate → metadata under _work/images/${DATASET_NAME}/metadata/frames-meta.json.
    • Compose → video at _work/output/${DATASET_NAME}.mp4.
  • Configure credentials for upload/update stages via GitHub Secrets (Vimeo and AWS). For local development, place non-secret values in your project .env and keep it untracked.
  • Schedule it: enable a cron in the per‑dataset wrapper under .github/workflows/ (uncomment the schedule: block), or create your own wrapper with the desired cron.
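
A minimal datasets/<name>.env sketch, using the drought example from the table below. Key names follow the schema in datasets/README.md; the DATE_FORMAT value is an assumption, so check datasets/README.md for the exact tokens your patterns use:

# datasets/drought.env (illustrative values)
DATASET_ID=drought                       # unique SOS identifier
FTP_HOST=ftp.nnvl.noaa.gov               # remote source host
FTP_PATH=/SOS/DroughtRisk_Weekly         # remote source path
PATTERN=DroughtRisk_Weekly_YYYYMMDD.png  # filename pattern for frames
DATE_FORMAT=YYYYMMDD                     # timestamp tokens in PATTERN (assumed form)
SINCE_PERIOD=P30D                        # ISO 8601 lookback window for acquisition
PERIOD_SECONDS=604800                    # cadence: one frame per 7 days
VIMEO_URI=/videos/900195230              # Vimeo video resource to replace
# BASEMAP_IMAGE=earth_vegetation.jpg     # optional; adds --basemap to compose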

Scheduling Tips

  • Cron examples: 30 3 * * * (daily 03:30), 5 12 * * 4 (Thu 12:05). Times are UTC in GitHub Actions.
  • Variables: set DATASET_NAME in the wrapper’s with: section; use repo variables to share defaults (e.g., ZYRA_SCHEDULER_IMAGE).
  • Concurrency: multiple wrappers can run concurrently; frames are cached per dataset.
  • Reliability: pin the container image by digest; start with a smaller SINCE_PERIOD (e.g., P30D) to seed caches faster.

Current Datasets

These are example datasets provided for demonstration. Each has a per‑dataset workflow under .github/workflows/ with its cron schedule commented out. You can still run any of them manually from GitHub → Actions by choosing the corresponding dataset workflow, or re‑enable its schedule by uncommenting the schedule: block.

The following examples are configured in datasets/*.env. Suggested crons are shown for reference if you choose to re‑enable scheduling.

How to run manually

  • GitHub → Actions → choose the per‑dataset workflow (e.g., “Dataset (drought)”).
  • Click “Run workflow” and confirm the branch (typically main).
  • The run will use the dataset’s .env and produce artifacts under _work/.
| Dataset (env) | Suggested Cron | When | Cadence | FTP (host + path) | Pattern | Basemap | Vimeo |
|---|---|---|---|---|---|---|---|
| drought (drought.env) | 5 12 * * 4 | Thu 12:05 | 7d | ftp.nnvl.noaa.gov /SOS/DroughtRisk_Weekly | DroughtRisk_Weekly_YYYYMMDD.png | | /videos/900195230 |
| fire (fire.env) | 30 0 * * * | Daily 00:30 | 1d | public.sos.noaa.gov /rt/fire/4096 | fire_YYYYMMDD.png | earth_vegetation.jpg | /videos/919356484 |
| ozone (ozone.env) | 45 1 * * * | Daily 01:45 | 1d | public.sos.noaa.gov /rt/ozone/4096 | ozone_YYYYMMDD.png | | /videos/919343002 |
| land_temp (land_temp.env) | 25 2 * * * | Daily 02:25 | 1d | public.sos.noaa.gov /rt/land_temp/4096 | land_temp_YYYYMMDD.png | | /videos/920212337 |
| sst (sst.env) | 30 3 * * * | Daily 03:30 | 1d | public.sos.noaa.gov /rt/sst/nesdis/sst/4096 | sst_YYYYMMDD.png | | /videos/920241809 |
| sst-anom (sst-anom.env) | 45 4 * * * | Daily 04:45 | 1d | public.sos.noaa.gov /rt/sst/nesdis/sst_anom/4096 | sst_anom_YYYYMMDD.png | earth_vegetation.jpg | /videos/920245845 |
| snow_ice (snow_ice.env) | 25 5 * * * | Daily 05:25 | 1d | public.sos.noaa.gov /rt/snow_ice/4096 | snow_ice_YYYYMMDD.png | | /videos/920619332 |
| clouds (clouds.env) | 10 6 * * * | Daily 06:10 | 10m | ftp.sos.noaa.gov /sosrt/rt/noaa/sat/linear/raw | linear_rgb_cyl_YYYYMMDD_HHMM.jpg | | /videos/907632335 |
| enhanced-clouds (enhanced-clouds.env) | 30 7 * * * | Daily 07:30 | 10m | ftp.sos.noaa.gov /sosrt/rt/noaa/sat/enhanced/raw | enhanced_rgb_cyl_YYYYMMDD_HHMM.jpg | | /videos/920672356 |
| precip (precip.env) | 45 8 * * * | Daily 08:45 | 30m | public.sos.noaa.gov /rt/precip/3600 | imergert_composite.YYYY-MM-DDTHH_MM_SSZ.png | | /videos/921800789 |
| precip-water (precip-water.env) | 45 9 * * * | Daily 09:45 | 1d | public.sos.noaa.gov /rt/precipitable_water/4096 | pw_YYYYMMDD.png | | /videos/923507546 |

Notes

  • Basemap: when BASEMAP_IMAGE is set in the dataset env, the compose stage includes it via --basemap <file>.
  • Cadence: derived from PERIOD_SECONDS in each env (e.g., 86400 for daily products, 600 for the 10‑minute cloud imagery); adjust to match source update intervals.

Usage & Scheduling

  • Primary CI: GitHub Actions (reusable workflow and per‑dataset wrappers).
  • Contributor guide: see AGENTS.md for structure, style, testing, and PR guidelines.
  • Local development: docker compose up --build -d and docker compose exec zyra-scheduler bash. The devcontainer layers Node.js + Codex CLI on top of a runtime image. If no runtime is provided, it falls back to python:3.11-slim for lint/tests.

Kubernetes (Rancher) Deployment

  • Example manifests under k8s/ for running datasets as CronJobs with a persistent cache/output volume.
  • Workflow (sketched as kubectl commands after this list):
    • Create a ConfigMap from datasets/<name>.env (non‑secrets).
    • Create Secrets for VIMEO_* and AWS_* if you use upload/update.
    • Create a PVC (k8s/pvc.yaml) for /data to persist frames/outputs across runs.
    • Apply a dataset CronJob (e.g., k8s/cronjob-drought.yaml) and set a schedule.
  • Start here: see k8s/README.md for a step‑by‑step guide and cronjob-template.yaml to copy for other datasets.
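
The steps above map to roughly these commands. The resource names (zyra-drought, zyra-credentials) are illustrative, and the manifest filenames should be checked against k8s/:

kubectl create configmap zyra-drought --from-env-file=datasets/drought.env
kubectl create secret generic zyra-credentials \
  --from-literal=VIMEO_ACCESS_TOKEN=... \
  --from-literal=AWS_ACCESS_KEY_ID=... \
  --from-literal=AWS_SECRET_ACCESS_KEY=...
kubectl apply -f k8s/pvc.yaml             # persistent /data volume
kubectl apply -f k8s/cronjob-drought.yaml # per-dataset CronJob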

GitHub Actions

  • Reusable workflow: .github/workflows/zyra.yml implements the example pipeline stages (acquire → validate → compose → upload → update). Modify the steps or add your own Zyra commands to fit your needs.
  • Manual run: Actions → Zyra Video Pipeline → Run workflow with input DATASET_NAME (e.g., drought). Optional inputs: ZYRA_VERBOSITY and ZYRA_SCHEDULER_IMAGE. A CLI alternative is sketched after this list.
  • Scheduling: use per‑dataset wrappers (below). The reusable workflow itself has no cron.
  • Runtime container: defaults to ghcr.io/noaa-gsl/zyra-scheduler:latest. Override with workflow input or repo variable ZYRA_SCHEDULER_IMAGE (prefer pinned digest).
  • Working paths: binds the repo to /app and a workspace data dir to /data inside the job container.
  • Cache: caches _work/images/$DATASET_NAME keyed by dataset name to speed up runs.
  • Secrets (optional stages):
    • Vimeo upload: VIMEO_CLIENT_ID, VIMEO_CLIENT_SECRET, VIMEO_ACCESS_TOKEN.
    • S3 update: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (and S3_URL in the dataset env).
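
If you prefer the GitHub CLI to the web UI, a manual trigger looks roughly like this (assumes gh is authenticated against the repo and the workflow exposes these dispatch inputs):

gh workflow run "Zyra Video Pipeline" -f DATASET_NAME=drought -f ZYRA_VERBOSITY=debug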

Per‑dataset schedules (template style)

  • The main workflow is also reusable via workflow_call. Create tiny wrappers per dataset with their own cron and fixed dataset name.
  • Example (.github/workflows/dataset-drought.yml), sketched in full after this list:
    • on.schedule: '5 12 * * 4' (weekly, Thu 12:05 UTC, matching the suggested cron in the table above)
    • jobs.run.uses: ./.github/workflows/zyra.yml
    • jobs.run.with.DATASET_NAME: drought
    • jobs.run.secrets: inherit to pass Vimeo/AWS secrets.
  • Repeat for each dataset (dataset-<name>.yml), setting the appropriate cron and dataset env stem (must match datasets/<name>.env). The top‑level zyra.yml intentionally has no cron.
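
A minimal wrapper sketch assembled from the keys above. The workflow_dispatch trigger reflects that per‑dataset workflows can also be run manually; treat the details as illustrative and compare with the real files under .github/workflows/:

# .github/workflows/dataset-drought.yml (illustrative)
name: Dataset (drought)
on:
  schedule:
    - cron: '5 12 * * 4'  # weekly, Thu 12:05 UTC; commented out in the shipped examples
  workflow_dispatch: {}   # allow manual runs from the Actions tab
jobs:
  run:
    uses: ./.github/workflows/zyra.yml
    with:
      DATASET_NAME: drought
    secrets: inherit      # pass Vimeo/AWS secrets to the reusable workflow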

Setup

  • Create a .env from .env.example and set:
    • HOST_DATA_PATH: absolute host path for /data bind mount.
    • (Optional) ZYRA_SCHEDULER_IMAGE: container image to use for CI/devcontainer. If omitted, the devcontainer uses python:3.11-slim, which is sufficient for local linting and tests.

Image source

The recommended runtime image is published on GitHub Container Registry: ghcr.io/noaa-gsl/zyra-scheduler:latest.

If your environment cannot access GHCR, set ZYRA_SCHEDULER_IMAGE to an accessible tag in your own registry and docker login to that registry. For local development only, you may omit ZYRA_SCHEDULER_IMAGE to fall back to python:3.11-slim.

Example

cp .env.example .env
# Option A: Use GHCR latest (recommended)
export ZYRA_SCHEDULER_IMAGE=ghcr.io/noaa-gsl/zyra-scheduler:latest
docker login ghcr.io

# Option B: Comment/remove ZYRA_SCHEDULER_IMAGE in .env to use python:3.11-slim

docker compose up --build -d
docker compose exec zyra-scheduler bash

Security note: never commit real secrets. Keep .env untracked (see .gitignore) and set credentials via environment or your secrets store.

Local Debugging (Dev Container)

  • Enter the container:
    • docker compose exec zyra-scheduler bash
  • Load a dataset env (example: fire):
    • export DATASET_NAME=fire
    • set -a; . datasets/$DATASET_NAME.env; set +a
    • Verify: echo "$FTP_HOST $FTP_PATH" and env | grep -E '^(DATASET_ID|FTP_|VIMEO_URI|SINCE_PERIOD|PERIOD_SECONDS|PATTERN|DATE_FORMAT|BASEMAP_IMAGE)='
  • Prepare working dirs (mirrors CI paths):
    • export DATA_ROOT="$PWD/_work"
    • export FRAMES_DIR="$DATA_ROOT/images/$DATASET_NAME"
    • export OUTPUT_DIR="$DATA_ROOT/output"
    • export OUTPUT_PATH="$OUTPUT_DIR/$DATASET_NAME.mp4"
    • mkdir -p "$FRAMES_DIR" "$OUTPUT_DIR"
  • Acquire frames from FTP (example):
    • zyra acquire ftp "ftp://${FTP_HOST}${FTP_PATH}" --sync-dir "$FRAMES_DIR" --since-period "$SINCE_PERIOD" --pattern "$PATTERN" --date-format "$DATE_FORMAT"
  • Validate frames and write metadata:
    • zyra transform metadata --frames-dir "$FRAMES_DIR" --pattern "$PATTERN" --datetime-format "$DATE_FORMAT" --period-seconds "$PERIOD_SECONDS" --output "$FRAMES_DIR/metadata/frames-meta.json"
  • Compose the video:
    • With basemap: zyra visualize compose-video --frames "$FRAMES_DIR" --output "$OUTPUT_PATH" --basemap "$BASEMAP_IMAGE"
    • No basemap: zyra visualize compose-video --frames "$FRAMES_DIR" --output "$OUTPUT_PATH"
    • Verify output: ls -lh "$OUTPUT_PATH" && (cd "$OUTPUT_DIR" && sha256sum "$DATASET_NAME.mp4" > "$DATASET_NAME.mp4.sha256")
  • Optional: Upload to Vimeo (requires creds):
    • zyra decimate vimeo --input "$OUTPUT_PATH" --replace-uri "$VIMEO_URI"
  • Optional: Update S3 dataset.json (requires AWS creds and S3_URL):
    • zyra acquire s3 --url "$S3_URL" --output "$FRAMES_DIR/metadata/dataset.json.bak"
    • zyra transform update-dataset-json --input-url "$S3_URL" --dataset-id "$DATASET_ID" --meta "$FRAMES_DIR/metadata/frames-meta.json" --vimeo-uri "$VIMEO_URI" --output - | zyra decimate s3 --read-stdin --url "$S3_URL"

Debug logging

  • Set ZYRA_VERBOSITY as a repository variable or in the Run workflow form:
    • debug: verbose logging (includes ffmpeg output and detailed steps). Also adds -v to all zyra CLI calls in CI.
    • info: default logging (general progress and summaries).
    • quiet: errors only (suppresses most logs).

Tips

  • To speed up first runs, temporarily set SINCE_PERIOD=P30D (or smaller) in the dataset env.
  • If compose fails or produces no file, list the output dir (ls -la "$OUTPUT_DIR") and confirm frames matched the PATTERN under $FRAMES_DIR.
  • Basemap must be readable inside the container; supply a full path if it isn’t bundled in the image.
  • Clean state: rm -rf _work/ to remove cached frames/outputs.

Datasets

  • See datasets/README.md for the full .env schema, examples, and CI behavior.
  • Place per‑dataset environment files in datasets/, named <name>.env (e.g., drought.env). These are sourced by CI jobs. Typical keys:
    • DATASET_ID: unique SOS identifier for the dataset.
    • FTP_HOST and FTP_PATH: remote source for frames.
    • VIMEO_URI: Vimeo video resource to replace.
    • SINCE_PERIOD, PERIOD_SECONDS: temporal window and cadence.
    • PATTERN, DATE_FORMAT: filename and timestamp parsing.

Notes

  • Prefer immutable tags or pin by digest for reliability in CI. Avoid mutating tags in place.
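
For example, in .env or a repo variable (the digest is a placeholder; resolve the current one from the GHCR package page or with docker buildx imagetools inspect):

ZYRA_SCHEDULER_IMAGE=ghcr.io/noaa-gsl/zyra-scheduler@sha256:<digest>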
