
workflow's People

Contributors: bapatist, bernstei, felixrccs, gabor1, gelzinyte, larsschaaf, simonwengert, xuhan-96, zausinator

workflow's Issues

better handling of failed function calls in a big list of jobs

Right now if a DFT job fails, recovering is ugly. Running the operation again will note the failure during the result gather stage, but it will (by default at least) exit immediately. It will therefore not notice any other failures, and even if that failure is cleaned up manually, rerunning the workflow will resubmit that job, then get to the gather stage, and will be waiting for that job to finish before it even notices the later failures.

It would be better if the pipeline running multiple remote jobs just kept track of all the failures, and reported all of them at the end of the loop. The only downside is that it will not (necessarily) be able to raise the same exception, because different failures might have been due to different exception types.
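
A minimal sketch of that gather-and-report pattern (the job objects and method names here are illustrative, not the actual pipeline API):

errors = []
results = []
for job in jobs:
    try:
        results.append(job.get_result())
    except Exception as exc:
        # defer the failure instead of exiting immediately
        errors.append((job, exc))

if errors:
    # the original exception types cannot all be re-raised, so aggregate them
    summary = "\n".join(f"{job}: {exc!r}" for job, exc in errors)
    raise RuntimeError(f"{len(errors)} job(s) failed:\n{summary}")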

Using pandas

For brevity's and clarity's sake I like using pandas in my code. I feel this would be a useful addition to the modify_database scripts, as it would permit easy and straightforward filtering. Another use case might be the plotting functions, although I have minimal experience with those.

I can post example scripts on Monday; I am out of town at the moment.
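
For illustration in the meantime, a minimal sketch of the kind of filtering pandas would enable, assuming each config carries metadata such as config_type and energy_sigma in its Atoms.info dict:

import pandas as pd

# build a table of per-config metadata, then filter declaratively
df = pd.DataFrame([at.info for at in configs])
selected = df[(df["config_type"] == "dimer") & (df["energy_sigma"] < 0.01)]
filtered_configs = [configs[i] for i in selected.index]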

refactor gap_multistage to remove call to database modify and calculation of ref error

The call to an arbitrary database modify can happen outside of gap_multistage, since only the overall rescaling of sigmas for different stages is needed inside. Remove that functionality, and do the modification from gap_rss_iter_fit before calling the gap_multistage fit.

Remove the call that calculates ref error inside the gap_multistage fit, and instead call it from gap_rss_iter_fit after calling fit.

PR is being prepared.

rename `chunksize` and `job_chunksize`

Some ideas:

job_chunksize -> items_per_job

chunksize ->

  • serial_chunk_size
  • serial_chunksize
  • num_serial
  • num_serial_items
  • num_serial_ops
  • num_operations_in_serial
  • items_in_pool

`setup.py` doesn't install submodules.

All modules need to be listed in setup.py, not just packages=["wfl"]. Same goes for UniversalSOAP and ExPyRe. Would packages=setuptools.find_packages(exclude=["test"]) do what you wanted to do in cc4aceb?

Though I've tried that, and still no submodules get listed (though wfl itself imports fine):

In [1]: import wfl

In [2]: dir(wfl)
Out[2]:
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']
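
For reference, a minimal sketch of the find_packages variant suggested above (metadata trimmed):

# setup.py: pick up wfl and all of its subpackages automatically
import setuptools

setuptools.setup(
    name="wfl",
    packages=setuptools.find_packages(exclude=["tests", "tests.*"]),
)

Note that find_packages only discovers directories containing an __init__.py, and dir(wfl) only lists submodules that have already been imported, so an empty-looking dir(wfl) does not by itself mean the install is broken.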

Structure for autoparallelisation arguments

Collect all the arguments for iterable_loop into one structure, so they can be passed more easily to functions that use iterable_loop, and changed without modifying the signature of the calling function.
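
One possible shape, sketched as a dataclass (the class and field names are assumptions, not the current API):

from dataclasses import dataclass
from typing import Optional

@dataclass
class AutoparaInfo:  # hypothetical container for iterable_loop arguments
    npool: Optional[int] = None   # number of parallel pools
    chunksize: int = 1            # items handled per serial chunk
    remote_info: Optional[dict] = None

def run(inputs, outputs, arg1, autopara_info=None):
    # the calling function forwards one object instead of many separate kwargs
    autopara_info = autopara_info or AutoparaInfo()
    ...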

pytest shutil.which('orca') is a bad idea

On some Linux machines, orca is a system executable (e.g. the GNOME screen reader) that has nothing to do with the DFT code. Maybe all the calculator tests should require that the desired executable be given in a particular env var, similar to ASE_{CALCULATOR}_COMMAND.
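
A sketch of that env-var gate, with a hypothetical variable name:

import os
import pytest

# skip ORCA tests unless the user explicitly points at the real quantum-chemistry
# binary; WFL_ORCA_COMMAND is a made-up name, by analogy with ASE_{CALCULATOR}_COMMAND
orca_command = os.environ.get("WFL_ORCA_COMMAND")

@pytest.mark.skipif(orca_command is None, reason="WFL_ORCA_COMMAND not set")
def test_orca_singlepoint():
    ...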

conversion from dict to gap command string in gap_fit

The docstring for dict_to_gap_fit_string claims that dicts are turned into colon-separated strings (e.g. what you'd need for "config_type_sigma"). However, it doesn't seem to actually do this: there's no evidence of code that does it (it relies on ASE's key_val_dict_to_str, which doesn't appear to do it either), and the simple gap_fit pytest doesn't actually test this functionality.
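
For reference, the behavior the docstring seems to promise would look something like this (a sketch of the claimed behavior, not the existing code):

def dict_to_colon_str(d):
    """Render a dict the way the docstring describes, e.g.
    {'isolated_atom': [0.0001, 0.04]} -> '{isolated_atom:0.0001:0.04}'."""
    parts = []
    for key, val in d.items():
        vals = val if isinstance(val, (list, tuple)) else [val]
        parts.append(":".join([str(key)] + [str(v) for v in vals]))
    return "{" + ":".join(parts) + "}"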

Refactor calculators

Rewrite the interfaces to DFT calculators (e.g. castep.evaluate_op) as classes inheriting from the ASE calculator. Workflows may then be calculator-agnostic by using the generic calculator, and can choose different ways of parallelising the calculations, for example generic.run directly, or generic.run_op within a function that is then parallelised with iterable_loop. An example is in generate_configs.vib.Vibrations.derive_normal_mode_info()
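
A minimal sketch of the proposed shape, using ASE's Calculator base class (the class name and internals are illustrative):

from ase.calculators.calculator import Calculator, all_changes

class WrappedCastep(Calculator):  # hypothetical name for the refactored interface
    implemented_properties = ["energy", "forces"]

    def calculate(self, atoms=None, properties=["energy"], system_changes=all_changes):
        super().calculate(atoms, properties, system_changes)
        # run the underlying DFT code here and populate self.results,
        # e.g. self.results["energy"] = ... and self.results["forces"] = ...
        ...

A generic calculator could then accept any such class, leaving the choice of parallelisation to the caller.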

get rid of wfl.parallelization

wfl.parallelization has been entirely (I think) supplanted by wfl.pipeline. A few things (maybe only wfl.generate_configs.collision?) still use it, and those should be redone.

autoparallelize gather results other than just list(Atoms)

Currently autoparallelized functions can only gather results from individual operations if they are Atoms, so they can be saved into a ConfigSet_in. It would be helpful, and probably straightforward, to make it possible to gather arbitrary results, and just pack them into a list (i.e. only in memory, without the ConfigSet disk/ABCD backing store idea, although you could also save and/or pickle them).

refactor atoms_and_dimers.py::prepare

  • separate isolated atoms and dimers?
  • remove bad default for bond lengths: either no default, or a sensible one
  • rename max_cutoff -> min_box
  • rename _scale -> _range or vice versa

deprecate things that are not being done properly

Deprecate bits of code that don't do things the workflow way, or just don't use it. They can be updated to do it the right way (so they can be wrapped by iterable_loop, soon to be autoparallelize), or moved to some user-specific place.

  • reactions_processing/
  • plotting/
  • some things in generate, e.g. neb, ts, vib, radicals
  • some things in cli.py

Potential bug: castep geometry optimisations

I'm reporting this with regard to CASTEP geometry optimisations run through workflow, where I'm having issues with the output structures. Specifically, it seems the output structures do not fully correspond to the last structures in the castep.castep file, as they should. While the forces do correspond, the lattice parameters of the workflow output structures are identical to those of the starting structures, which I suspect are used to hash the structures.

Note that I am using workflow from the old repository, but I checked the log here and cannot see that any relevant changes have been made, so I think the problem may still stand. I can follow up once I move to the new workflow.
This may actually be more of an ASE feature that needs implementing, but I thought I'd flag it up just in case.

ASE EMT calculator for cheap testing

Hi,
While working with wfl, I implemented a simple ASE EMT calculator for fast computation, which helped me during development. The changes include an example Python script for an iterative training procedure. In case you're interested, I'd happily create a pull request.
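
For context, EMT ships with ASE, so cheap evaluation needs no external DFT code:

from ase.build import bulk
from ase.calculators.emt import EMT

atoms = bulk("Cu")
atoms.calc = EMT()
print(atoms.get_potential_energy())  # fast, approximate energy, good enough for pipeline tests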

Files affected are:

  • wfl/calculators/dft.py

New files are:

  • wfl/calculators/emt.py
  • examples/EMT_iterative_gap_fit/gap_fit.py
  • examples/EMT_iterative_gap_fit/EMT_atoms.xyz
  • examples/EMT_iterative_gap_fit/init_md.traj
  • examples/EMT_iterative_gap_fit/multistage_gap_params.json

Update ORCA calculator

@stenczelt if I understand correctly, the current ORCA calculator is outdated? If so, what's the latest version and would it be possible to merge it into workflow?

setting WFL_AUTOPARA_NPOOL

When doing single FHI-aims calculations across multiple nodes (FHI-aims requires a lot of memory), I've been specifying resources in the remoteinfo.json like this:

"resources": { "n" : [4, "nodes"], "max_time": "24h", "ncores_per_task" : 1, "partitions": "highmem" }

Is this the right way to run this calculation? When I do this, and manually set OMP_NUM_THREADS and WFL_AUTOPARA_NPOOL to 1 at the start of the job, I get 512 MPI tasks as expected (4 nodes with 128 cores each) and FHI-aims runs correctly. However, by default the code sets WFL_AUTOPARA_NPOOL equal to the number of tasks per node, which is 128. This then ends up parallelising through the chunk size, which I do not want. Before I manually set the variables, the FHI-aims output was also telling me that OMP_NUM_THREADS was not 1.

Is there a right way to do this, using more specification in the resources dictionary? I can't easily follow how this part of the code passes the information around.
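
One possibility, if the remoteinfo.json format in use supports an env_vars list (an assumption about the expyre version in question), would be to pin the variables per job rather than by hand:

"resources": {"n": [4, "nodes"], "max_time": "24h", "ncores_per_task": 1, "partitions": "highmem"},
"env_vars": ["WFL_AUTOPARA_NPOOL=1", "OMP_NUM_THREADS=1"]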

refactor fitting routines

move ace_fit.jl into ACE1pack.

move gap_multistage outside of workflow, probably into QUIP

  • move checking for completed work into separate function, to be called explicitly from workflow
  • move actual fitting work into separate function
  • get rid of input/output file massaging, restart is broken anyway. Just default to output_files=rundir
  • get rid of GAP-specific env var setting based on EXPYRE vars, people find it unexpected - make user do it manually in remote_info
  • remaining code is essentially independent of fit() - turn into decorator?
  • move descriptor_heuristics.py into QUIP as well

same for gap_simple.py, maybe also ace.py

Extend documentation

To do

  • Populate empty pages
  • Revise current content for outdated material
  • Collect minimal examples of specific tasks/operations
  • Clean up READMEs

@bernstei what else?

Docathon Task Ideas

Some ideas to get started with

Pre docathon

  • update function/variable names (especially in comments/docs examples) both workflow and expyre [NB]
  • check if @iloop works cleanly with sphinx [NB]
  • Fix docstring rendering online [EG]
  • Get rid of outdated .yml and split main yml; change trigger for building docs [NB]
  • Getting started page-stub [EG]

June 29 docathon

For some other time

  • test the tutorials/code snippets
  • update cli:
    • make sure command line (click) function explanations render correctly
    • remove (clearly) outdated entries

Reduce hardcoded values currently in wfl.fit.modify_database.*

Currently, values such as certain default_{property}_sigma values are hardcoded in files such as gap_rss_set_config_sigmas_from_convex_hull.modify. We could allow users more flexibility by replacing these hardcoded values with keyword arguments whose defaults are the previously hardcoded values.
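
A sketch of the suggested signature change (the sigma names follow the issue text; the default values and function shape are illustrative):

def modify(configs, default_energy_sigma=0.001, default_force_sigma=0.1,
           default_virial_sigma=0.1):
    # previously hardcoded sigmas become overridable keyword arguments
    for at in configs:
        at.info.setdefault("energy_sigma", default_energy_sigma)
        ...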

wider support for nested datastructure in ConfigSet

Support nested structure (list of lists) in ConfigSet under a wider range of circumstances.

In memory: already supported

Files: already supported for multi-file, with the group iterator returning one group per file. Would need to be extended to single files with special tags, num_of_group and num_in_group, probably with an additional tag last_in_group (see the sketch below).

ABCD: will need tags, like single file suggestion
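
A sketch of the single-file tagging described above, using the tag names from the issue:

# write group-structure tags into each config so a flat file can be re-nested
for i_group, group in enumerate(groups):
    for i_item, at in enumerate(group):
        at.info["num_of_group"] = i_group
        at.info["num_in_group"] = i_item
        at.info["last_in_group"] = (i_item == len(group) - 1)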

Bug: Potential issue when setting sigmas in gap_rss_set_config_sigmas_from_convex_hull.py

I have a gap-rss run started on 29th Nov 2021, with code version https://github.com/gabor1/workflow/commit/968765772e20d9bab4e8ae21633a7a90bdb08e96-dirty.
This is related to the sigma-setting method in wfl/fit/modify_database/gap_rss_set_config_sigmas_from_convex_hull.py; I noticed this method hasn't changed since, so the issue should still stand.

In my gap-rss run, I noticed that the training set for the final iteration contains structures with energy_sigma = 0.001 (with, say, 18 atoms).

However, when I recently tried to fit a GAP to my own structures, I found reasons to believe that the energy_sigma and virial_sigma values, when set per config, are in fact not per atom but are taken as a total for the atoms in that structure. So, for example, the energy_sigma = 0.001 for the 18-atom structures above corresponds to about 0.00005 per atom.

In the attached picture I show a case where I set the sigmas to around 1 meV/atom for each of the structures, but the resulting errors/atom are orders of magnitude smaller; hence the suspicion.
[figure: per-config sigmas vs. resulting per-atom errors]

This is corroborated by the bad force-component fits from the gap-rss (note: if the GitHub theme is dark, one cannot read the labels):
[figure: force-component correlation plot from the gap-rss run]
And by the table of errors, where the force data has much larger errors relative to their sigmas than energy or virials do.
[screenshot: table of errors, 2022-02-22]

I spoke with Gabor, and he concurs that when set per config, energy_sigma, virial_sigma, and hessian_sigma should be scaled by the number of atoms, while force_sigma should not be scaled.
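
A sketch of that scaling convention (illustrative; not the current modify_database code):

n_atoms = len(at)
# extensive quantities: per-config sigma grows with system size
at.info["energy_sigma"] = per_atom_energy_sigma * n_atoms
at.info["virial_sigma"] = per_atom_virial_sigma * n_atoms
at.info["hessian_sigma"] = per_atom_hessian_sigma * n_atoms
# forces are per component, so no scaling
at.info["force_sigma"] = force_sigma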

Castep iterable_loop doesn't skip failed calculations

Just making a note that the mechanism of "parallelise Castep single-point energy/forces across multiple configs, and if one of them fails, return no energy/forces results for that config but don't fail all of them" seems to be broken for Castep. Relevant to @imagdau

phonon force-constant perturbed configs

Generate perturbed configs for force constant calculations, probably using phonopy and/or phono3py. A version is being written, will be contributed as a PR when ready.

Normalize per-atom (local) descriptor in descriptors/quippy.py

Hi there, I am new to wfl (though I have been working with GAP/SOAP for quite a while now).
I was wondering why the per-atom descriptor generated in calc_autopara_wrappable (quippy.py) is not normalized even with normalize=True.
Naturally, I included the following in my personal clone:

if local:
    ...
    if combined_descs.shape[1] > 0:

and then include

        if normalize:
            # normalize each per-atom descriptor row in place, skipping near-zero rows
            for k in combined_descs:
                norm = np.linalg.norm(k)
                if norm > 0.1:
                    k /= norm

before then adding the combined descriptors to the ASE Atoms object:

        if Zcenter is None:
            use_key = key
        else:
            use_key = f'{key}_Z_{Zcenter}'
        at.new_array(use_key, combined_descs)

Maybe this could/should be included in the main branch as well. Or maybe I'm overlooking some critical thing here; as mentioned, I'm still new to this project.
Best, Jakob

generalize convex hull

Make the convex hull based selection able to operate in a space other than _x and _V.

refactor cli.py into separate source files

cli.py is too long, and also includes some things that probably don't belong in it at all. Refactor it to make the source files more modular, and deprecate or get rid of unneeded bits.

select configs with CUR of per-atom descriptors

Add functions to wfl.select_configs.by_descriptor to select configs based on per-atom descriptors, rather than global (per-config) ones. This requires a version of prep_descs_and_exclude that extracts per-atom descriptors (from Atoms.arrays), and a version of CUR_conf_global that figures out the indexing from CUR-selected per-atom descriptors back to the corresponding configs.
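
That index mapping from selected per-atom rows back to configs could look like this (a sketch; variable names are illustrative):

import numpy as np

counts = [len(at) for at in configs]    # descriptor rows (atoms) per config
offsets = np.cumsum([0] + counts)       # starting row index of each config
config_indices = np.searchsorted(offsets, selected_rows, side="right") - 1
selected_configs = [configs[i] for i in sorted(set(config_indices))]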

replace `iterable_loop` with functools.partial-based decorator

Replace the cumbersome

def run(inputs, outputs, arg1, ...):
    return iterable_loop(inputs, outputs, op, arg1, ...)

with a decorator (although not using the @ syntax), iloop, which is based on ideas here

May be able to do

def run(inputs, outputs, arg1, ...):
    # some code
run = functools.partial(iloop, run)

if we can figure out how to do run.__doc__ = run.__doc__.format(iloop_docstring_post=iloop_docstring_post)
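
A self-contained sketch of the partial-plus-docstring idea (iloop here is a trivial stand-in for the real parallelising loop):

import functools

def iloop(op, inputs, outputs, *args, **kwargs):
    """(stand-in) apply op to each input, collecting results into outputs"""
    outputs.extend(op(x, *args, **kwargs) for x in inputs)
    return outputs

def _scale_op(x, factor):
    """Multiply one item by factor."""
    return x * factor

run = functools.partial(iloop, _scale_op)  # decorator without the @ syntax
# partial objects have a writable __doc__, so the docstrings can be merged
run.__doc__ = (_scale_op.__doc__ or "") + "\n\nAutoparallelized over inputs by iloop."

print(run([1, 2, 3], [], 10))  # [10, 20, 30]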

rename or move many objects/files

  • ConfigSet_in -> ConfigSet, ConfigSet_out -> OutputSpec

  • pipeline/ -> autoparallelize/, iterable_loop -> autoparallelize

  • All low level *_op -> *_autopara_wrappable

  • generate_configs/ -> generate/

  • select_configs/ -> select/

  • move selection_space.py -> select/

  • move mpipool_support.py -> autoparallelize

  • move calc_descriptors.py -> descriptors/calc.py

closed by #64

buildcell

Replace the prep stage's writing of files with returning strings; workflow can write the files if that makes sense.

Maybe remove the separate prep stage entirely, if it can be done cheaply enough, and do it "on the fly" during the actual running of buildcell.

documentation fixes

docs:

  • move CUR from generation to selection section

docstrings:

  • make flat_histogram docstring better, since it was confusing to explain
  • add a docstring to atoms_and_dimers.py::prepare

data structures for atom size

Make all code that depends on atom "sizes" use a uniform data structure, maybe a class (so that it can be switched from the arithmetic mean of the Z1 and Z2 radii to some other expression of r_Z1 and r_Z2).

E.g. start from length_scales.yaml dict, but add functionality for per-pair things.
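
One possible shape, sketched as a small class (the names and the location of the default rule are illustrative):

from dataclasses import dataclass, field

@dataclass
class AtomSizes:  # hypothetical uniform container for atom "sizes"
    r_by_Z: dict                                        # per-element radius, e.g. seeded from length_scales.yaml
    pair_overrides: dict = field(default_factory=dict)  # optional per-pair values {(Z1, Z2): r}

    def pair_length(self, Z1, Z2):
        key = tuple(sorted((Z1, Z2)))
        if key in self.pair_overrides:
            return self.pair_overrides[key]
        # default rule: arithmetic mean of r_Z1 and r_Z2, the expression the
        # issue wants to make swappable
        return 0.5 * (self.r_by_Z[Z1] + self.r_by_Z[Z2])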

QE test xfail

The test of pwscf via the Espresso calculator, on a calculation that actually converges, is broken. The default parameters fail to converge, and pwscf returns an error status after 100 iterations. The new Espresso calculator detects this as a convergence failure. Loosening the convergence tolerance lets it stop without apparent failure, but then the hard-coded reference values are wrong.

I suggest finding some system/parameters where convergence is achieved, so the return values can be meaningfully tested.
