
Comments (6)

flaviuvadan commented on September 28, 2024

> That's a good question, I also want to know about this. I've done a POC which uses several tricks to be able to train a random regressor.
>
> By the way, this is using a next_when implementation.

That example looks awesome @rodrigobaron! You should definitely consider writing a small document, or module header documentation, and submitting a PR to add that to the examples folder 🙂


flaviuvadan commented on September 28, 2024

> I have several steps that write outputs. These could write to volumes or artifacts, and the files need to be shared. What's the recommended way to handle this? Is this handled implicitly by hera?

Hey @tachyus-ryan! Thanks for another question 🙂 You can create a Volume, pass it to a Resources object, and pass that to a Task to specify a volume. This volume can be shared between the resources of multiple tasks, which makes Argo Workflows provision a single volume that is then shared between tasks! Here's an example, with the resulting Argo Workflows logs below:

from hera.v1.resources import Resources
from hera.v1.volume import Volume
from hera.v1.task import Task
from hera.v1.workflow import Workflow
from hera.v1.workflow_service import WorkflowService

def f(i: int):
    import os

    print(f'This is task: {i}')

    if i == 1:
        print('Adding stuff to /mnt/vol to test content in t2')
        with open('/mnt/vol/test.txt', 'w') as _f:
            _f.write('testing content from task 1')
    else:
        print(f"listdir output: {os.listdir('/mnt/vol')}")
        try:
            with open('/mnt/vol/test.txt', 'r') as _f:
                print(_f.read())
        except Exception as e:
            print(f'Hera is missing some features: {e}')

ws = WorkflowService('my-argo-server.com', 'my-argo-token')
w = Workflow('fv-testing', ws)  # the workflow needs the service in order to submit

v = Volume(size='5Gi', mount_path='/mnt/vol')
r = Resources(volume=v)

# you can use a different Resources object for each task but with the same volume
t1 = Task('t1', f, func_params=[{'i': 1}], resources=r)
t2 = Task('t2', f, func_params=[{'i': 2}], resources=r)
t1.next(t2)
w.add_tasks(t1, t2)
w.submit()

[Screenshots, 2021-11-04: Argo Workflows logs for the fv-testing workflow, tasks t1 and t2]

This is the YAML template:

- name: fv-testing-2924be2b
  inputs: {}
  outputs: {}
  metadata: {}
  dag:
    tasks:
      - name: t1
        template: t1
        arguments:
          parameters:
            - name: i
              value: '1'
      - name: t2
        template: t2
        arguments:
          parameters:
            - name: i
              value: '2'
        dependencies:
          - t1
  parallelism: 50
- name: t1
  inputs:
    parameters:
      - name: i
        value: '1'
  outputs: {}
  metadata: {}
  script:
    name: t1
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d # SAME VOLUME ID AS BELOW
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')

      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())
- name: t2
  inputs:
    parameters:
      - name: i
        value: '2'
  outputs: {}
  metadata: {}
  script:
    name: t2
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d  # SAME VOLUME ID AS ABOVE
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')

      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())

Hope this helps! We should definitely add this as an example so thank you for bringing it up! 🙂


flaviuvadan commented on September 28, 2024

> Two more questions on volumes:
>
>   • Can a volume be shared in a parallel workflow?

That is not currently supported for dynamically provisioned volumes. I just tested it, and the workflow controller provisions workflow-independent volumes. I did this by creating two workflows through Hera that used the same tasks and the same resources. They do have the same volume ID, but the workflow controller prepends the workflow name to the provisioned PVC. It is possible, however, to do this with a non-dynamic volume: if you have a disk that's always available, you can use the ExistingVolume object to mount it across workflows, in every task, if you'd like!

[Screenshot, 2021-11-09: the two test workflows provisioning separate PVCs, each prefixed with its workflow's name]
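For reference, here's a minimal sketch of the ExistingVolume approach. The import path and the Resources parameter name are assumptions based on hera's v1 layout, and 'my-existing-pvc' is a hypothetical PersistentVolumeClaim name, so adjust these for your installed version and setup:

from hera.v1.existing_volume import ExistingVolume  # assumed import path
from hera.v1.resources import Resources
from hera.v1.task import Task
from hera.v1.workflow import Workflow
from hera.v1.workflow_service import WorkflowService

def g():
    # both workflows can read the same files because the PVC already
    # exists and is mounted, rather than dynamically provisioned per workflow
    with open('/mnt/vol/shared.txt', 'r') as _f:
        print(_f.read())

ws = WorkflowService('my-argo-server.com', 'my-argo-token')

# 'my-existing-pvc' must be a PersistentVolumeClaim that already exists
# in the cluster; Argo mounts it instead of provisioning a new volume
v = ExistingVolume(name='my-existing-pvc', mount_path='/mnt/vol')
r = Resources(existing_volume=v)  # parameter name is an assumption

w1 = Workflow('shared-vol-1', ws)
w1.add_tasks(Task('t', g, resources=r))
w1.submit()

w2 = Workflow('shared-vol-2', ws)
w2.add_tasks(Task('t', g, resources=r))
w2.submit()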

>   • How do you retrieve the results from a volume (or even a Task) for use in subsequent processing?

Typically, you store results at a consistent path shared between tasks. The example above stores "results" at a specific path on the volume. You can write to a common results path in each task, or have a task print to stdout the path where it stores its results, so the next task can take that path as input and use it to retrieve and operate on those results. The alternative is uploading the "results" to a cloud provider bucket.
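As a rough sketch of the consistent-path convention, using the same API as the example above (the path and payload here are illustrative; also note that hera ships only the function body to the cluster, as the generated YAML shows, so the path has to appear literally inside each function):

from hera.v1.resources import Resources
from hera.v1.task import Task
from hera.v1.volume import Volume
from hera.v1.workflow import Workflow
from hera.v1.workflow_service import WorkflowService

def produce():
    import json
    # write results to the path both tasks agree on
    with open('/mnt/vol/results.json', 'w') as _f:
        json.dump({'score': 0.93}, _f)

def consume():
    import json
    # read the results the previous task wrote to the agreed path
    with open('/mnt/vol/results.json', 'r') as _f:
        print(json.load(_f))

ws = WorkflowService('my-argo-server.com', 'my-argo-token')
w = Workflow('results-passing', ws)

# a single volume shared by both tasks, as in the example above
v = Volume(size='1Gi', mount_path='/mnt/vol')
t1 = Task('produce', produce, resources=Resources(volume=v))
t2 = Task('consume', consume, resources=Resources(volume=v))
t1.next(t2)
w.add_tasks(t1, t2)
w.submit()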


flaviuvadan commented on September 28, 2024

@tachyus-ryan did the message above provide sufficient clarity? I am wondering if we can close this issue 🙂


rodrigobaron commented on September 28, 2024

That's a good question, I also want to know about this. I've done a POC which uses several tricks to be able to train a random regressor.

By the way, this is using a next_when implementation.


tachyus-ryan commented on September 28, 2024

Two more questions on volumes:

  • Can a volume be shared in a parallel workflow?
  • How do you retrieve the results from a volume (or even a Task) for use in subsequent processing?

