Question

How can I stop an Argo workflow when an error occurs in a step?

Answer and Explanation

To stop an Argo workflow when an error occurs in a step, you can leverage Argo's built-in features for error handling and workflow control. Here's how you can achieve this:

1. Using `failFast` in a DAG Template:

- `failFast` is a field of a `dag` template rather than of the workflow `spec`, and it defaults to `true`. When any DAG task fails, Argo stops scheduling new tasks, lets tasks that are already running finish, and then marks the DAG as failed. A `steps` template needs no flag for this: when a step fails, later step groups are not started and the workflow fails.

- Example in YAML:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: error-handling-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      failFast: true  # the default; shown explicitly for clarity
      tasks:
      - name: step1
        template: task1
      - name: step2
        template: task2
        dependencies: [step1]
  - name: task1
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: task2
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'This step will not run'"]

- In this example, `task1` exits with a non-zero code, so `step1` fails, `step2` is never scheduled, and the workflow ends as failed.
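The effect of `failFast` is easiest to see with independent branches. The sketch below assumes that `failFast` is set on the `dag` template, where it defaults to `true`; the task and template names (`fails`, `slow`, `after-slow`, `fail`, `sleep`, `echo`) are illustrative and not part of the original answer.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-failfast-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      # With the default (true), no new tasks are scheduled after a failure.
      # With false, every branch runs as far as its own dependencies allow.
      failFast: false
      tasks:
      - name: fails
        template: fail
      - name: slow
        template: sleep
      - name: after-slow
        template: echo
        dependencies: [slow]
  - name: fail
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: sleep
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["sleep 10"]
  - name: echo
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'Scheduled because failFast is false'"]
```

With `failFast: true`, `after-slow` would not be scheduled once `fails` has failed. In both cases the workflow ends as failed; the flag only controls whether the remaining branches are still run.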

2. Using `onExit` Handlers:

- You can define an `onExit` template that will be executed when a workflow or a step completes, regardless of its success or failure. This can be used to perform cleanup or logging, but it doesn't directly stop the workflow on error.

- Example in YAML:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: error-handling-
spec:
  entrypoint: main
  onExit: cleanup
  templates:
  - name: main
    steps:
    - - name: step1
        template: task1
    - - name: step2
        template: task2
  - name: task1
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: task2
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'This step will not run'"]
  - name: cleanup
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'Cleanup task executed'"]

- In this case, `cleanup` will run after the workflow completes, regardless of whether `task1` or `task2` failed. However, it doesn't stop the workflow mid-execution.
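An exit handler can also react to the final result, because the `{{workflow.status}}` variable is available inside it. The following is a minimal sketch, assuming hypothetical template names `exit-handler` and `report-failure`: cleanup always runs, and a second step runs only when the workflow did not succeed.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-handler-
spec:
  entrypoint: main
  onExit: exit-handler
  templates:
  - name: main
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: exit-handler
    steps:
    # Both steps are in the same group, so they are evaluated in parallel.
    - - name: cleanup
        template: cleanup
      - name: report-failure
        template: report-failure
        # Skipped when the workflow succeeded.
        when: "{{workflow.status}} != Succeeded"
  - name: cleanup
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'Cleanup task executed'"]
  - name: report-failure
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'Workflow ended with status {{workflow.status}}'"]
```

This keeps the failure-specific logic out of the main templates: the workflow still stops at the failing step, and the handler decides what to do afterwards.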

3. Using `when` Conditions:

- You can use `when` conditions to conditionally execute steps based on the status of previous steps. However, this doesn't stop the workflow; it just skips steps based on conditions.

- Example in YAML:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: error-handling-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: step1
        template: task1
    - - name: step2
        template: task2
        when: "{{steps.step1.status}} == Succeeded"
  - name: task1
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: task2
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'This step will not run'"]

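- In this example, `step2` runs only if `step1` succeeded. Note that a failed step in a `steps` template already fails the workflow by default, so a `when` check on a step's status mainly matters when the failing step sets `continueOn`, or inside an exit handler.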
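A status-based `when` condition is most useful together with `continueOn`, which lets the workflow move on to the next step group after a failure. The sketch below is one way to run a fallback only when `step1` failed; the `recover` step and `fallback` template are hypothetical names introduced for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: continue-on-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: step1
        template: task1
        # Without continueOn, a failure here would fail the workflow
        # before the next step group is evaluated.
        continueOn:
          failed: true
    - - name: recover
        template: fallback
        # Runs only when step1 failed; skipped otherwise.
        when: "{{steps.step1.status}} == Failed"
  - name: task1
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["exit 1"]
  - name: fallback
    container:
      image: alpine:latest
      command: ["sh", "-c"]
      args: ["echo 'step1 failed, running fallback'"]
```

This pattern is the opposite of stopping on error, so it belongs only on steps whose failure the workflow is expected to tolerate.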

Recommendation:

- For most cases, the default behavior is already what you want: a `steps` template stops at the first failed step, and a `dag` template uses `failFast: true` unless you override it, so no new tasks are scheduled after a failure. This ensures that you don't waste resources on subsequent steps that depend on the failed step.

- If you need more complex error handling, consider using `onExit` for cleanup and logging, but remember that it won't stop the workflow mid-execution. `when` conditions are useful for conditional execution but not for immediate termination on error.
