Comments (3)

bernstei commented on August 10, 2024

Failure scenarios:

  • abort as soon as one job fails: easy, just don't catch the xpr.get_results() exception
  • ignore failed jobs, and just make sure their configs are None or labelled in Atoms.info: easy, catch all xpr.get_results() exceptions, label the configs, maybe warn about them
  • deal separately with timed-out and crashed jobs:
    • loop over all jobs with timeout=0 and report the ones that timed out
    • if any timed out, resubmit those, hoping that RemoteInfo has been modified
    • loop over all jobs with the desired timeout
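
A minimal sketch of the first two scenarios, to make the difference concrete. `FakeJob` is a hypothetical stand-in written for this example, not the real ExPyRe class, and its `get_results()` return value is simplified to a single object; only the method name `get_results()` and the `timeout` argument come from the discussion above.

```python
class FakeJob:
    """Hypothetical stand-in for an ExPyRe job (not the real class)."""

    def __init__(self, job_id, result=None, error=None):
        self.id = job_id
        self._result = result
        self._error = error

    def get_results(self, timeout=None):
        # Raise the stored error if the job "failed", else return its result.
        if self._error is not None:
            raise self._error
        return self._result


def gather_abort_on_failure(jobs):
    # Scenario 1: nothing is caught, so the first failure propagates.
    return [job.get_results() for job in jobs]


def gather_ignore_failures(jobs):
    # Scenario 2: catch everything, store None for the failed job, and
    # keep the reason so it could later be written into Atoms.info.
    results, failures = [], {}
    for job in jobs:
        try:
            results.append(job.get_results())
        except Exception as exc:
            results.append(None)
            failures[job.id] = repr(exc)
    return results, failures
```

With one failing job out of three, `gather_ignore_failures` returns a `None` in the failed slot plus a dictionary keyed by job id, while `gather_abort_on_failure` re-raises the job's exception unchanged.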

Two flags in RemoteInfo:

  • ignore_job_failures, default False
    • just like skip_failures now, but maybe note the reason for failure, or xpr.id, in Atoms.info
  • resubmit_killed_jobs, default False
    • make ExPyRe raise a custom expyre.Timeout exception
    • if resubmit_killed_jobs:
      • loop over all jobs with timeout=0, noting the ones that raise an exception other than expyre.Timeout
      • resubmit the jobs that died
      • loop over all jobs again
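
The two-pass logic could look roughly like the sketch below. Everything here is an assumption made for illustration: `JobTimeout` stands in for the proposed `expyre.Timeout`, and `StubJob`, its `resubmit()` method and the `collect()` helper are hypothetical names, not part of ExPyRe or workflow.

```python
class JobTimeout(Exception):
    """Hypothetical stand-in for the proposed expyre.Timeout."""


class StubJob:
    """Hypothetical job whose state is 'done', 'running' or 'died'."""

    def __init__(self, job_id, state, result):
        self.id, self.state, self.result = job_id, state, result
        self.resubmit_count = 0

    def get_results(self, timeout=None):
        if self.state == "died":
            raise RuntimeError(f"job {self.id} died")
        if self.state == "running" and timeout == 0:
            raise JobTimeout(f"job {self.id} is still running")
        # Done, or running and assumed to finish within a nonzero timeout.
        return self.result

    def resubmit(self):
        self.resubmit_count += 1
        self.state = "running"


def collect(jobs, timeout, ignore_job_failures=False, resubmit_killed_jobs=False):
    if resubmit_killed_jobs:
        # First pass: poll without waiting. A timeout only means the job
        # has not finished yet; any other exception means it died.
        for job in jobs:
            try:
                job.get_results(timeout=0)
            except JobTimeout:
                continue
            except Exception:
                job.resubmit()
    # Final pass: wait for every job with the desired timeout.
    results = []
    for job in jobs:
        try:
            results.append(job.get_results(timeout=timeout))
        except Exception:
            if not ignore_job_failures:
                raise
            results.append(None)
    return results
```

In this sketch the two flags are independent: `resubmit_killed_jobs` only affects the first polling pass, and `ignore_job_failures` only decides whether the final pass re-raises or stores `None`.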

from workflow.

bernstei commented on August 10, 2024

This may be nicer once libAtoms/ExPyRe#23 is done.

bernstei commented on August 10, 2024

@gelzinyte do we want to automatically resubmit (i.e. when RemoteInfo.resubmit_killed_jobs = True) only jobs that died without creating a succeeded or failed file (i.e. were killed by the queuing system), which then raise an ExPyReJobDiedError? Or all jobs that raised any exception, whether it was raised by the remote python process and pickled so that expyre can re-raise it, or was an ExPyReJobDiedError raised directly by expyre?

The former is easier because it means that I don't have to clean up the job directory.
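
The two options differ only in which exceptions count as grounds for resubmission, which can be sketched as a single predicate. `JobDiedError` below is a local stand-in for ExPyReJobDiedError defined just for this example, and `should_resubmit` and its `only_killed` argument are hypothetical names.

```python
class JobDiedError(Exception):
    """Local stand-in for ExPyReJobDiedError (defined here for illustration)."""


def should_resubmit(exc, only_killed=True):
    # A job that died without a succeeded/failed file is resubmitted
    # under either option.
    if isinstance(exc, JobDiedError):
        return True
    # Exceptions re-raised from the remote python process are resubmitted
    # only under the broader option, which would also require cleaning up
    # the job directory first.
    return not only_killed
```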
