Comments (6)
That's a good question; I'd also like to know about this. I've done one POC that uses several tricks to be able to train a random regressor.
By the way, this uses a next_when implementation.
That example looks awesome @rodrigobaron! You should definitely consider writing a small document, or module header documentation, and submit a PR to add that to the examples folder 🙂
I have several steps that write outputs. These could write to volumes or artifacts, and the files need to be shared. What's the recommended way to handle this? Is this handled implicitly by hera?
Hey @tachyus-ryan! Thanks for another question 🙂 You can create a Volume, pass it to a Resources object, and pass that to a Task in order to specify a volume. This volume can be shared between the resources of multiple tasks, which makes Argo Workflows provision a single volume that is then shared between tasks! Here's an example, with the resulting Argo Workflows YAML below:
```python
from hera.v1.resources import Resources
from hera.v1.volume import Volume
from hera.v1.task import Task
from hera.v1.workflow import Workflow
from hera.v1.workflow_service import WorkflowService


def f(i: int):
    import os

    print(f'This is task: {i}')
    if i == 1:
        print('Adding stuff to /mnt/vol to test content in t2')
        with open('/mnt/vol/test.txt', 'w') as _f:
            _f.write('testing content from task 1')
    else:
        print(f"listdir output: {os.listdir('/mnt/vol')}")
        try:
            with open('/mnt/vol/test.txt', 'r') as _f:
                print(_f.read())
        except Exception as e:
            print(f'Hera is missing some features: {e}')


ws = WorkflowService('my-argo-server.com', 'my-argo-token')
w = Workflow('fv-testing', ws)  # the workflow needs the service in order to submit
v = Volume(size='5Gi', mount_path='/mnt/vol')
r = Resources(volume=v)

# you can use a different Resources object for each task, but with the same volume
t1 = Task('t1', f, func_params=[{'i': 1}], resources=r)
t2 = Task('t2', f, func_params=[{'i': 2}], resources=r)
t1.next(t2)

w.add_tasks(t1, t2)
w.submit()
```
This is the YAML template:
```yaml
- name: fv-testing-2924be2b
  inputs: {}
  outputs: {}
  metadata: {}
  dag:
    tasks:
      - name: t1
        template: t1
        arguments:
          parameters:
            - name: i
              value: '1'
      - name: t2
        template: t2
        arguments:
          parameters:
            - name: i
              value: '2'
        dependencies:
          - t1
  parallelism: 50
- name: t1
  inputs:
    parameters:
      - name: i
        value: '1'
  outputs: {}
  metadata: {}
  script:
    name: t1
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d # SAME VOLUME ID AS BELOW
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')
      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())
- name: t2
  inputs:
    parameters:
      - name: i
        value: '2'
  outputs: {}
  metadata: {}
  script:
    name: t2
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d # SAME VOLUME ID AS ABOVE
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')
      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())
```
Hope this helps! We should definitely add this as an example so thank you for bringing it up! 🙂
Two more questions on volumes:
- Can a volume be shared in a parallel workflow?
That is currently not supported for dynamically provisioned volumes. I just tested it: the workflow controller provisions workflow-independent volumes. I verified this by creating two workflows through Hera with the same tasks and the same resources in both; the tasks share a volume ID, but the workflow controller prepends the workflow name to the provisioned PVC, so each workflow gets its own volume. It is possible, however, with a non-dynamic volume. For instance, if you have a disk that's always available, you can use the ExistingVolume object to mount it in every task, across workflows, if you'd like! There's a rough sketch of that below.
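For completeness, here's a minimal sketch of the ExistingVolume approach. The exact import path and the `existing_volume` field name on `Resources` are assumptions based on the v1 API used above, so double-check them against your Hera version:

```python
from hera.v1.existing_volume import ExistingVolume  # assumed module path
from hera.v1.resources import Resources
from hera.v1.task import Task

# Mount a PVC that already exists in the cluster. Because the claim is not
# dynamically provisioned per workflow, any workflow that references it by
# name mounts the same underlying disk.
v = ExistingVolume(name='my-preprovisioned-pvc', mount_path='/mnt/vol')  # hypothetical PVC name
r = Resources(existing_volume=v)  # assumed field name

# reusing `f` from the example above
t = Task('reader', f, func_params=[{'i': 2}], resources=r)
```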
- How do you retrieve the results from a volume (or even a Task) for use in subsequent processing?
Typically, you store the results at a consistent path between tasks; the example above stores "results" at a specific path on the volume. You can either write to a common results path in each task, or have a task print to stdout the path where it stores its results, so that the next task can take that path as input and use it to retrieve and operate on those results (see the sketch below). The alternative is uploading the "results" to a cloud provider bucket.
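To illustrate the "consistent path" pattern, here's a minimal sketch of two task functions; the directory layout and function names are hypothetical conventions, not part of Hera:

```python
# Hypothetical convention: every task reads and writes under a well-known
# directory on the shared volume, keyed by the producing task's name.
RESULTS_DIR = '/mnt/vol/results'

def producer():
    import json
    import os

    os.makedirs(RESULTS_DIR, exist_ok=True)
    with open(os.path.join(RESULTS_DIR, 'producer.json'), 'w') as out:
        json.dump({'score': 0.95}, out)  # placeholder "result"

def consumer():
    import json
    import os

    # the consumer knows where to look because the path is a shared convention
    with open(os.path.join(RESULTS_DIR, 'producer.json')) as src:
        print(f'retrieved results: {json.load(src)}')
```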
@tachyus-ryan did the message above provide sufficient clarity? I am wondering if we can close this issue 🙂