Comments (3)

dosubot commented on May 31, 2024

Hey @Tom-Newton, great to see you diving deep into Flyte's propeller again! 🚀

Thank you for bringing up this important issue related to the distribution of workflows across shards in Flyte. Your observation about the potential for a single shard to become overloaded due to the current shard-key inheritance behavior is valid, and your proposal to generate independent shard-keys for each subworkflow launched by the admin-launcher could indeed help to alleviate this issue.

To implement this, you would need to modify the shard-key generation logic in the Flyte system. Currently, the shard-key is generated based on a range-bounded hash over various components of the FlyteWorkflow metadata, with the keyspace range defined as [0,32). This deterministic approach ensures that each FlyteWorkflow evaluation is handled by a specific FlytePropeller instance, based on the project, domain, and shard-key labels.
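As a rough illustration of what a deterministic, range-bounded hash looks like, here is a minimal sketch. The function name `shardKeyFor`, the choice of SHA-256, and the metadata fields being hashed are assumptions made for the example, not Flyte's actual implementation; only the [0,32) keyspace comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// keyspaceSize mirrors the [0,32) range described above.
const keyspaceSize = 32

// shardKeyFor is a hypothetical helper: it hashes a few metadata fields
// and reduces the digest into [0, keyspaceSize).
func shardKeyFor(project, domain, name string) uint32 {
	sum := sha256.Sum256([]byte(project + "/" + domain + "/" + name))
	// Use the first four bytes of the digest as an integer, then bound it.
	return binary.BigEndian.Uint32(sum[:4]) % keyspaceSize
}

func main() {
	a := shardKeyFor("flytesnacks", "development", "exec-123")
	b := shardKeyFor("flytesnacks", "development", "exec-123")
	// Identical inputs always produce the same key, and it is always in range.
	fmt.Println(a == b, a < keyspaceSize)
}
```

Because the key depends only on its inputs, any workflow that reuses its parent's inputs (or its parent's key) lands on the same shard as the parent.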

However, as you've pointed out, this can lead to an uneven distribution of workflows across shards, especially when a large workflow spawns many subworkflows that all inherit its shard-key. To achieve a more uniform distribution, the shard-key generation logic could be updated to include a dynamic component, such as a UUID or a timestamp, in addition to the existing metadata components. Each new workflow would then get an independently derived shard-key instead of its parent's, which should spread the load more evenly across shards.

Here's a conceptual approach to implementing this:

  1. Modify the Shard-Key Generation Logic: Update the logic that generates the shard-key for each FlyteWorkflow to include a dynamic component. For example, you could concatenate the current hash components with a UUID or the current timestamp at the moment of workflow creation.

  2. Check the FlytePropeller Manager Configuration: Confirm that the FlytePropeller Manager's shard configuration covers every shard-key the new logic can emit. If the keys are still reduced into the existing [0,32) keyspace, no change should be needed; otherwise you might need to adjust the shard-count or the keyspace range to match.

  3. Test the Distribution: After implementing the changes, it's crucial to test the new shard-key generation mechanism to ensure that it achieves a more uniform distribution of workflows across shards. You could simulate a large number of workflow creations and analyze the resulting shard-key distribution to verify its uniformity.
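The distribution test in step 3 could be sketched roughly as follows. This is a self-contained simulation, not a test against a real Flyte deployment: `hashToShard`, `simulate`, `maxLoad`, and the child naming scheme are all hypothetical, and SHA-256 is used only as a convenient stand-in hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const keyspaceSize = 32

// hashToShard reduces an arbitrary string into [0, keyspaceSize).
func hashToShard(s string) int {
	sum := sha256.Sum256([]byte(s))
	return int(binary.BigEndian.Uint32(sum[:4]) % keyspaceSize)
}

// simulate returns how many of n child workflows land on each shard-key.
// With inherit=true every child reuses the parent's key; otherwise each
// child's key is derived from its own (hypothetical) execution name.
func simulate(parent string, n int, inherit bool) [keyspaceSize]int {
	var counts [keyspaceSize]int
	for i := 0; i < n; i++ {
		key := hashToShard(parent)
		if !inherit {
			key = hashToShard(fmt.Sprintf("%s-child-%d", parent, i))
		}
		counts[key]++
	}
	return counts
}

// maxLoad reports the largest per-key count.
func maxLoad(counts [keyspaceSize]int) int {
	m := 0
	for _, c := range counts {
		if c > m {
			m = c
		}
	}
	return m
}

func main() {
	fmt.Println("inherited max load:  ", maxLoad(simulate("parent-exec", 10000, true)))
	fmt.Println("independent max load:", maxLoad(simulate("parent-exec", 10000, false)))
}
```

With inherited keys, all 10000 children fall on a single key. With independent keys, the maximum load per key should be far lower, and comparing the two numbers gives a quick check on how uniform the new scheme is.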

Here is a pseudo-code example illustrating how you might implement the dynamic shard-key generation:

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "time"
)

func generateShardKey(workflowMetadata string) string {
    // Generate a dynamic component, e.g., a timestamp
    timestamp := time.Now().UnixNano()
    dynamicComponent := fmt.Sprintf("%d", timestamp)
    
    // Concatenate the workflow metadata with the dynamic component
    combinedKey := workflowMetadata + dynamicComponent
    
    // Hash the combined key to generate the shard-key
    hasher := sha256.New()
    hasher.Write([]byte(combinedKey))
    shardKey := hex.EncodeToString(hasher.Sum(nil))
    
    // Optionally, you could further process the shardKey to fit the desired keyspace range
    // For simplicity, this step is omitted
    
    return shardKey
}

This example combines the workflow metadata with a dynamic component (a timestamp in this case) to generate a unique shard-key for each workflow. You would need to integrate this logic into the part of the Flyte system responsible for initializing FlyteWorkflow CRDs, ensuring that each new workflow gets a unique shard-key based on this mechanism.
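The step omitted in the example, fitting the key into the keyspace, might look like the sketch below. It is a variant under stated assumptions: `boundedShardKey` is a hypothetical name, the keyspace size of 32 is taken from the [0,32) range mentioned earlier, and random bytes from `crypto/rand` stand in for the timestamp so that two workflows created at the same instant do not share a dynamic component.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const keyspaceSize = 32

// boundedShardKey is a hypothetical variant of generateShardKey: it salts the
// metadata with random bytes and reduces the digest into [0, keyspaceSize).
func boundedShardKey(workflowMetadata string) (uint32, error) {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return 0, err
	}
	sum := sha256.Sum256(append([]byte(workflowMetadata), salt...))
	// Interpret the first four digest bytes as an integer and bound it.
	return binary.BigEndian.Uint32(sum[:4]) % keyspaceSize, nil
}

func main() {
	key, err := boundedShardKey("project/domain/workflow")
	if err != nil {
		panic(err)
	}
	fmt.Println("shard-key:", key)
}
```

Since the result is no longer reproducible from the metadata alone, a key produced this way would have to be computed once at creation time and stored with the workflow, for example in the shard-key label, rather than recomputed later.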

Please note that this is a conceptual approach and might require adjustments based on the specific requirements and constraints of your Flyte deployment. Also, keep in mind that any changes to the shard-key generation logic should be thoroughly tested to ensure they don't introduce new issues or regressions.

As for your offer to contribute a Pull Request, that's greatly appreciated! However, as a bot, I'm unable to create or manage Pull Requests. I recommend discussing your proposal and the potential changes with the Flyte maintainers directly to ensure they align with the project's goals and guidelines. They can provide valuable feedback and guidance on how to best implement and test these changes.

I hope this information is helpful and provides a good starting point for addressing this issue. If you have any further questions or need additional clarification, please don't hesitate to ask.


from flyte.

Tom-Newton commented on May 31, 2024

I think the bot hasn't really understood. #5174 should resolve it.

kumare3 commented on May 31, 2024

I agree, this should use a different shard key.
