Comments (3)
failure scenarios:
- abort as soon as one job fails: easy - just don't catch the xpr.get_results() exception
- ignore failed jobs, just make sure configs are None or labelled in Atoms.info: easy - catch all xpr.get_results() exceptions, label configs, maybe warn about them
- deal separately with timed out or crashed jobs:
  - loop over all jobs with timeout=0. Report timed out jobs.
  - if any timed out, resubmit those hoping that RemoteInfo has been modified
  - loop over all jobs with desired timeout.

two flags in RemoteInfo:
- ignore_job_failures, default False
  - just like skip_failures now, but maybe note reason for failure, or xpr.id, in Atoms.info
- resubmit_killed_jobs, default False
  - make ExPyRe raise custom expyre.Timeout exception
  - if resubmit_killed_jobs:
    - loop over all jobs with timeout 0, note ones that raise an exception other than expyre.Timeout
    - resubmit jobs that died
    - loop over again
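A rough sketch of how the two proposed flags could drive the result-gathering loop. Everything here is a made-up stand-in for illustration: FakeJob, JobTimeout, JobDied and gather() are hypothetical names, not the expyre or wfl API, and JobTimeout only plays the role of the proposed expyre.Timeout.

```python
from dataclasses import dataclass


class JobTimeout(Exception):
    """Hypothetical stand-in for the proposed expyre.Timeout."""


class JobDied(Exception):
    """Hypothetical stand-in for a job killed by the queuing system."""


@dataclass
class FakeJob:
    """Toy job object (not a real ExPyRe); one scripted outcome per call."""
    id: str
    outcomes: list  # each entry is a result value or an Exception instance
    submissions: int = 1

    def get_results(self, timeout=None):
        outcome = self.outcomes.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    def resubmit(self):
        self.submissions += 1


def gather(jobs, ignore_job_failures=False, resubmit_killed_jobs=False,
           timeout=3600):
    """Collect results, returning (results, failures), both keyed by job id."""
    results, failures = {}, {}
    pending = list(jobs)

    if resubmit_killed_jobs:
        # First pass with timeout 0: keep finished results, leave running
        # jobs alone, and resubmit anything that raised something other
        # than the timeout exception.
        still_pending = []
        for job in pending:
            try:
                results[job.id] = job.get_results(timeout=0)
            except JobTimeout:
                still_pending.append(job)
            except Exception:
                job.resubmit()
                still_pending.append(job)
        pending = still_pending

    # Final pass with the desired timeout.
    for job in pending:
        try:
            results[job.id] = job.get_results(timeout=timeout)
        except Exception as exc:
            if not ignore_job_failures:
                raise  # abort as soon as one job fails
            results[job.id] = None          # config left as None
            failures[job.id] = repr(exc)    # reason that could go in Atoms.info
    return results, failures


jobs = [
    FakeJob("a", [42]),                               # finishes normally
    FakeJob("b", [JobDied("killed"), 7]),             # dies once, then succeeds
    FakeJob("c", [JobTimeout(), RuntimeError("boom")]),  # running, then fails
]
results, failures = gather(jobs, ignore_job_failures=True,
                           resubmit_killed_jobs=True)
```

With both flags left at False, the first exception simply propagates, which is the "just don't catch" scenario.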
from workflow.
May be nicer if libAtoms/ExPyRe#23 is done
@gelzinyte do we want to automatically resubmit (i.e. when setting RemoteInfo.resubmit_killed_jobs = True) only jobs that died without creating a succeeded or failed file (i.e. killed by the queuing system), which then raise an ExPyReJobDiedError, or all jobs that raised any exception, whether it was raised by the remote python process and pickled so it can be re-raised by expyre, or an ExPyReJobDiedError raised directly by expyre?
The former is easier because it means that I don't have to clean up the job directory.