Comments (6)
That's a good question; I'd also like to know about this. I've done one POC that uses several tricks to be able to train a random regressor.
By the way, this uses a next_when implementation.
That example looks awesome @rodrigobaron! You should definitely consider writing a small document, or module header documentation, and submit a PR to add that to the examples folder 🙂
I have several steps that write outputs. These could write to volumes or artifacts, and the files need to be shared. What's the recommended way to handle this? Is this handled implicitly by hera?
Hey @tachyus-ryan! Thanks for another question 🙂 You can create a Volume, pass it to a Resources object, and pass that to a Task in order to specify a volume. This volume can be shared between the resources of multiple tasks, which makes Argo Workflows provision a single volume that is then shared between tasks! Here's an example, with the resulting Argo Workflows YAML below:
```python
from hera.v1.resources import Resources
from hera.v1.volume import Volume
from hera.v1.task import Task
from hera.v1.workflow import Workflow
from hera.v1.workflow_service import WorkflowService


def f(i: int):
    import os

    print(f'This is task: {i}')
    if i == 1:
        print('Adding stuff to /mnt/vol to test content in t2')
        with open('/mnt/vol/test.txt', 'w') as _f:
            _f.write('testing content from task 1')
    else:
        print(f"listdir output: {os.listdir('/mnt/vol')}")
        try:
            with open('/mnt/vol/test.txt', 'r') as _f:
                print(_f.read())
        except Exception as e:
            print(f'Hera is missing some features: {e}')


ws = WorkflowService('my-argo-server.com', 'my-argo-token')
w = Workflow('fv-testing', ws)  # the workflow needs the service in order to submit
v = Volume(size='5Gi', mount_path='/mnt/vol')
r = Resources(volume=v)

# you can use a different Resources object for each task, but with the same volume
t1 = Task('t1', f, func_params=[{'i': 1}], resources=r)
t2 = Task('t2', f, func_params=[{'i': 2}], resources=r)
t1.next(t2)

w.add_tasks(t1, t2)
w.submit()
```
This is the YAML template:
```yaml
- name: fv-testing-2924be2b
  inputs: {}
  outputs: {}
  metadata: {}
  dag:
    tasks:
      - name: t1
        template: t1
        arguments:
          parameters:
            - name: i
              value: '1'
      - name: t2
        template: t2
        arguments:
          parameters:
            - name: i
              value: '2'
        dependencies:
          - t1
  parallelism: 50
- name: t1
  inputs:
    parameters:
      - name: i
        value: '1'
  outputs: {}
  metadata: {}
  script:
    name: t1
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d # SAME VOLUME ID AS BELOW
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')
      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())
- name: t2
  inputs:
    parameters:
      - name: i
        value: '2'
  outputs: {}
  metadata: {}
  script:
    name: t2
    image: 'python:3.7'
    command:
      - python
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi
    volumeMounts:
      - name: c76b6c91-4bea-4865-ad3d-b2bfdc7d046d # SAME VOLUME ID AS ABOVE
        mountPath: /mnt/vol
    source: |
      import json
      i = json.loads('{{inputs.parameters.i}}')
      import os
      print(f'This is task: {i}')
      if i == 1:
          print('Adding stuff to /mnt/vol to test content in t2')
          with open('/mnt/vol/test.txt', 'w') as _f:
              _f.write('testing content from task 1')
      else:
          print(f"listdir: {os.listdir('/mnt/vol')}")
          with open('/mnt/vol/test.txt', 'r') as _f:
              print(_f.read())
```
Hope this helps! We should definitely add this as an example so thank you for bringing it up! 🙂
Two more questions on volumes:
- Can a volume be shared in a parallel workflow?
That is currently not supported for dynamically provisioned volumes. I just tested it: the workflow controller provisions workflow-independent volumes. I verified this by creating two workflows through Hera with the same tasks and the same resources in both; the tasks share a volume ID, but the workflow controller prepends the workflow name to the provisioned PVC, so each workflow gets its own volume. It is possible, however, with a non-dynamic volume. For instance, if you have a disk that's always available, you can use the ExistingVolume object to mount it in every task, across workflows, if you'd like! There's a rough sketch of that below.
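For completeness, here's a minimal sketch of the ExistingVolume approach. The exact import path and the `existing_volume` field name on `Resources` are assumptions based on the v1 API used above, so double-check them against your Hera version:

```python
from hera.v1.existing_volume import ExistingVolume  # assumed module path
from hera.v1.resources import Resources
from hera.v1.task import Task

# Mount a PVC that already exists in the cluster. Because the claim is not
# dynamically provisioned per workflow, any workflow that references it by
# name mounts the same underlying disk.
v = ExistingVolume(name='my-preprovisioned-pvc', mount_path='/mnt/vol')  # hypothetical PVC name
r = Resources(existing_volume=v)  # assumed field name

# reusing `f` from the example above
t = Task('reader', f, func_params=[{'i': 2}], resources=r)
```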
- How do you retrieve the results from a volume (or even a Task) for use in subsequent processing?
Typically, you store the results at a consistent path between tasks; the example above stores "results" at a specific path on the volume. You can either write to a common results path in each task, or have a task print to stdout the path where it stores its results, so that the next task can take that path as input and use it to retrieve and operate on those results (see the sketch below). The alternative is uploading the "results" to a cloud provider bucket.
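To illustrate the "consistent path" pattern, here's a minimal sketch of two task functions; the directory layout and function names are hypothetical conventions, not part of Hera:

```python
# Hypothetical convention: every task reads and writes under a well-known
# directory on the shared volume, keyed by the producing task's name.
RESULTS_DIR = '/mnt/vol/results'

def producer():
    import json
    import os

    os.makedirs(RESULTS_DIR, exist_ok=True)
    with open(os.path.join(RESULTS_DIR, 'producer.json'), 'w') as out:
        json.dump({'score': 0.95}, out)  # placeholder "result"

def consumer():
    import json
    import os

    # the consumer knows where to look because the path is a shared convention
    with open(os.path.join(RESULTS_DIR, 'producer.json')) as src:
        print(f'retrieved results: {json.load(src)}')
```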
@tachyus-ryan did the message above provide sufficient clarity? I am wondering if we can close this issue 🙂