Comments (3)
Hey there, @tomtou-bspace! Great to see you back diving into the intricacies of workflow orchestration. Hope you're ready for another round of troubleshooting and insights. Let's get to it!
Based on the information you've provided, the issue seems to stem from the way outputs are passed between `map_task A` and `map_task B`. Here are some suggestions on how you might resolve it:
- Ensure Task Isolation: Make sure that each sub-task in `map_task B` accesses only its corresponding output from `map_task A`. This might involve changing how outputs are passed between the tasks so they are not bundled in a way that forces every sub-task to download all outputs when it needs only one.
- Use ArrayNode for Enhanced Mapping Support: The Flyte documentation describes `ArrayNode` as an enhancement over regular map tasks, offering wider mapping support and improved subtask execution handling, including cache management and subtask failure handling. Using the `ArrayNode`-based `map_task` from the `flytekit.experimental` module might give you better control over how outputs are managed and accessed, which could help with the unnecessary data downloads. Here is a sample code snippet:

  ```python
  from flytekit import task, workflow
  from flytekit.experimental import map_task  # ArrayNode-based map_task


  @task
  def your_task(a: int) -> int:
      # Placeholder logic: replace with your real per-element computation.
      return a * 2


  @workflow
  def your_workflow(xs: list[int]) -> list[int]:
      # Runs one ArrayNode subtask per element of xs.
      return map_task(your_task)(a=xs)
  ```

- Review Task Output and Input Handling: Re-examine how the outputs from `map_task A` are structured and how `map_task B` accesses them. You may need to refactor the tasks so that each sub-task in `map_task B` explicitly accesses only its corresponding output from `map_task A`, for example by indexing or another form of direct reference.
- Optimize Data Storage and Access: If the workflow stores the outputs of `map_task A` in a shared location (e.g., cloud storage), consider storing each output as a separate, directly accessible file or object. That way, sub-tasks in `map_task B` can read only the data they need instead of downloading everything.
- Leverage Task Metadata and Resource Overrides: Although not directly related to the data-download issue, configuring map tasks with `TaskMetadata` for caching and `with_overrides` for resource allocation (as shown in the map task examples in the Flyte documentation) can improve overall workflow efficiency. This may partly offset the extra runtime caused by unnecessary downloads, but it does not eliminate the downloads themselves.
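The isolation and per-object storage suggestions above can be illustrated outside Flyte with a small plain-Python sketch. Stage A writes each element's result to its own object and returns only a reference; stage B's subtask `i` is handed reference `i` and reads nothing else. The function names (`map_a_subtask`, `map_b_subtask`, `run_pipeline`) and the JSON-file layout are hypothetical. This does not use flytekit and does not describe how Flyte itself stores map task outputs.

```python
"""Hypothetical sketch: per-element storage so each downstream subtask
reads only its own input. Plain Python stand-in, not Flyte code."""
import json
import tempfile
from pathlib import Path


def map_a_subtask(i: int, x: int, store: Path) -> str:
    # Hypothetical stage-A subtask: compute a result and persist it as
    # its own object, returning only a reference (the file path).
    out = store / f"a_output_{i}.json"
    out.write_text(json.dumps({"index": i, "value": x * x}))
    return str(out)


def map_b_subtask(ref: str) -> int:
    # Hypothetical stage-B subtask: receives exactly one reference and
    # reads only that object, never the full set of stage-A outputs.
    payload = json.loads(Path(ref).read_text())
    return payload["value"] + 1


def run_pipeline(xs: list[int]) -> list[int]:
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        # Stage A: one small output object per input element.
        refs = [map_a_subtask(i, x, store) for i, x in enumerate(xs)]
        # Stage B: subtask i is paired with refs[i] by index.
        return [map_b_subtask(ref) for ref in refs]


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

In Flyte terms, one option you might try is returning a `FlyteFile` per element from `map_task A`, so the collection passed to `map_task B` holds references rather than the data itself. Whether that reduces the download overhead in your setup is an assumption you would need to verify.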
For more information, you can refer to the Flyte documentation on map tasks.
Regarding the structure and type of the output from `map_task A` and how it is passed to `map_task B`: the output of a task execution is encapsulated in the `TaskExecutionClosure` message. `TaskExecutionClosure` contains a field named `OutputResult`, which can be one of three types: `OutputUri` (string), `OutputData` (`*core.LiteralMap`), or `Error` (`*core.ExecutionError`). Which one is set depends on the outcome of the task execution. If the task execution succeeded and produced output data, the `OutputData` field is used. If the task execution resulted in an error, the `Error` field is used. The `OutputUri` field is deprecated and not used in new implementations. You can find more details in the FlyteIDL protobuf definitions.
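The "one of three types" behaviour described above is a protobuf `oneof`: at most one member is set, and the consumer branches on which one it is. The sketch below models that with plain dataclasses. `TaskExecutionClosureSketch`, `ExecutionErrorSketch`, and `describe` are hypothetical stand-ins, not the generated flyteidl classes; with real generated protobuf messages in Python, the equivalent check is the message's `WhichOneof` method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionErrorSketch:
    # Hypothetical stand-in for core.ExecutionError.
    code: str
    message: str


@dataclass
class TaskExecutionClosureSketch:
    # Hypothetical stand-in for the OutputResult oneof; at most one is set.
    output_uri: Optional[str] = None      # deprecated, per the description above
    output_data: Optional[dict] = None    # stands in for core.LiteralMap
    error: Optional[ExecutionErrorSketch] = None

    def which_output_result(self) -> Optional[str]:
        # Return the name of the set member, mimicking protobuf's WhichOneof.
        set_fields = [
            name
            for name in ("output_uri", "output_data", "error")
            if getattr(self, name) is not None
        ]
        if len(set_fields) > 1:
            raise ValueError(f"oneof violated, multiple fields set: {set_fields}")
        return set_fields[0] if set_fields else None


def describe(closure: TaskExecutionClosureSketch) -> str:
    # Branch on whichever member of the oneof is populated.
    kind = closure.which_output_result()
    if kind == "output_data":
        return f"succeeded with {len(closure.output_data)} output(s)"
    if kind == "error":
        return f"failed: {closure.error.message}"
    if kind == "output_uri":
        return f"outputs at {closure.output_uri} (deprecated field)"
    return "no output result set"
```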
I hope these suggestions help you resolve the issue. If you need further assistance, please provide more details about your implementation, and I'll be glad to help.
Sources
- docs/user_guide/advanced_composition/map_tasks.md
- flyteidl/gen/pb-go/flyteidl/admin/task_execution.pb.go
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Thank you for opening your first issue here! 🛠
@tomtou-bspace , this is a known limitation of the current implementation of map tasks. Unfortunately, we're not going to revisit this decision in the short / medium term as the team is focused on other projects right now.