
acefit.jl's People

Contributors

cheukhinhojerry, cortner, jpdarby, tjjarvinen, wcwitt

acefit.jl's Issues

Notation

Probably a good time to fix a notation for ACEfit. I view this as not completely trivial since it'd be best for the (eventual) documentation to match the code. Some considerations:

  • Something appropriate for both the linear model and the various planned nonlinear models?
  • Preferences for ASCII/Unicode?

My proposals are in bold below, and I've quickly sketched out some alternatives and reasoning.

Design matrix/feature matrix

  • A. As in Ax = b. Used by ACEfit currently.
  • X. Simple; seems standard in some contexts.
  • \Phi. Another standard choice, used in the Deringer et al. Chem Review.
  • \Psi. Used by IPFitting currently.

Regression coefficients or equivalent

  • x. As in Ax = b.
  • c. Probably the best choice?
  • w. Used by some GP literature ("weight-space view"). Seems best to avoid because we use "weights" for something else.
  • \theta

Observation/target vector

  • y. Best, since its case matches that of c?
  • Y. Used currently by both IPFitting and ACEfit.
  • b
  • t
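To make the proposals concrete, here is a minimal sketch of the linear problem in the bolded notation (design matrix A, coefficients c, targets y); all names are illustrative, not ACEfit API:

```julia
using LinearAlgebra

# Illustrative only: design matrix A (n_obs × n_basis), coefficients c,
# observation/target vector y, so the linear problem reads A * c ≈ y.
A = [1.0 0.0; 0.0 2.0; 1.0 1.0]   # design/feature matrix
c_true = [0.5, -1.0]              # "true" regression coefficients
y = A * c_true                    # observation/target vector

c = A \ y                         # least-squares solve
@assert c ≈ c_true
```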

Making ACEfit.jl Problem agnostic

I wonder whether ACEfit needs the JuLIP dependency. The observation classes that need JuLIP could just be moved into ACE1.jl or ACE1pack.jl. Is there any other reason?

I would really like it if ACEfit could be entirely abstract (just as HAL should be).

Problem with distributed assembly

Reported by @CheukHinHoJerry in ACEsuit/ACE1x.jl#7.

I got this error multiple times from the ACEfit.assemble function when using multiple workers on a large least-squares system. I remember there was an issue about this, so I think it's better to post it here. It happens while I am in the middle of assembling the design matrix. This is the full error log:

Worker 18 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
 [1] (::Base.var"#wait_locked#715")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
   @ Base ./stream.jl:947
 [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
   @ Base ./stream.jl:955
 [3] unsafe_read
   @ ./io.jl:761 [inlined]
 [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
   @ Base ./io.jl:760
 [5] read!
   @ ./io.jl:762 [inlined]
 [6] deserialize_hdr_raw
   @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/messages.jl:167 [inlined]
 [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:172
 [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
   @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133
 [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
   @ Distributed ./task.jl:514
Progress:  21% | ETA: 0:52:08
ERROR: [error message truncated by interleaved progress-bar output]
Progress:  21% | ETA: 0:51:57
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:920
  [2] wait()
    @ Base ./task.jl:984
  [3] wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait
    @ ./condition.jl:125 [inlined]
  [5] take_buffered(c::Channel{Any})
    @ Base ./channels.jl:456
  [6] take!(c::Channel{Any})
    @ Base ./channels.jl:450
  [7] take!(::Distributed.RemoteValue)
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:726
  [8] remotecall_fetch(f::Function, w::Distributed.Worker, args::ACEfit.DataPacket{AtomsData}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:461
  [9] remotecall_fetch(f::Function, w::Distributed.Worker, args::ACEfit.DataPacket{AtomsData})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:454
 [10] #remotecall_fetch#162
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
 [11] remotecall_fetch(f::Function, id::Int64, args::ACEfit.DataPacket{AtomsData})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492
 [12] remotecall_pool(rc_f::Function, f::Function, pool::WorkerPool, args::ACEfit.DataPacket{AtomsData}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:126
 [13] remotecall_pool
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:123 [inlined]
 [14] #remotecall_fetch#200
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:232 [inlined]
 [15] remotecall_fetch
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:232 [inlined]
 [16] #208#209
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:288 [inlined]
 [17] #208
    @ ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/workerpool.jl:288 [inlined]
 [18] (::Base.var"#978#983"{Distributed.var"#208#210"{Distributed.var"#208#209#211"{WorkerPool, ProgressMeter.var"#56#59"{RemoteChannel{Channel{Bool}}, ACEfit.var"#3#4"{JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, SharedArrays.SharedVector{Float64}, SharedArrays.SharedVector{Float64}, SharedArrays.SharedMatrix{Float64}}}}}})(r::Base.RefValue{Any}, args::Tuple{ACEfit.DataPacket{AtomsData}})
    @ Base ./asyncmap.jl:100
 [19] macro expansion
    @ ./asyncmap.jl:234 [inlined]
 [20] (::Base.var"#994#995"{Base.var"#978#983"{Distributed.var"#208#210"{Distributed.var"#208#209#211"{WorkerPool, ProgressMeter.var"#56#59"{RemoteChannel{Channel{Bool}}, ACEfit.var"#3#4"{JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis}, SharedArrays.SharedVector{Float64}, SharedArrays.SharedVector{Float64}, SharedArrays.SharedMatrix{Float64}}}}}}, Channel{Any}, Nothing})()
    @ Base ./task.jl:514
Stacktrace:
  [1] (::Base.var"#988#990")(x::Task)
    @ Base ./asyncmap.jl:177
  [2] foreach(f::Base.var"#988#990", itr::Vector{Any})
    @ Base ./abstractarray.jl:3073
  [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::Vector{ACEfit.DataPacket{AtomsData}})
    @ Base ./asyncmap.jl:177
  [4] wrap_n_exec_twice
    @ ./asyncmap.jl:153 [inlined]
  [5] #async_usemap#973
    @ ./asyncmap.jl:103 [inlined]
  [6] async_usemap
    @ ./asyncmap.jl:84 [inlined]
  [7] #asyncmap#972
    @ ./asyncmap.jl:81 [inlined]
  [8] asyncmap
    @ ./asyncmap.jl:80 [inlined]
  [9] pmap(f::Function, p::WorkerPool, c::Vector{ACEfit.DataPacket{AtomsData}}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:126
 [10] pmap(f::Function, p::WorkerPool, c::Vector{ACEfit.DataPacket{AtomsData}})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:99
 [11] pmap(f::Function, c::Vector{ACEfit.DataPacket{AtomsData}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [12] pmap(f::Function, c::Vector{ACEfit.DataPacket{AtomsData}})
    @ Distributed ~/julia_ws/julia-1.9.0/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [13] macro expansion
    @ ~/.julia/packages/ProgressMeter/sN2xr/src/ProgressMeter.jl:1015 [inlined]
 [14] macro expansion
    @ ./task.jl:476 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/ProgressMeter/sN2xr/src/ProgressMeter.jl:1014 [inlined]
 [16] macro expansion
    @ ./task.jl:476 [inlined]
 [17] progress_map(::Function, ::Vararg{Any}; mapfun::Function, progress::ProgressMeter.Progress, channel_bufflen::Int64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ProgressMeter ~/.julia/packages/ProgressMeter/sN2xr/src/ProgressMeter.jl:1007
 [18] assemble(data::Vector{AtomsData}, basis::JuLIP.MLIPs.IPSuperBasis{JuLIP.MLIPs.IPBasis})
    @ ACEfit ~/.julia/packages/ACEfit/ID48n/src/assemble.jl:31
 [19] make_train(model::ACE1x.ACE1Model)
    @ Main ~/julia_ws/ACEworkflows/Fe_pure_jerry/asm_all_lsq.jl:54
 [20] top-level scope
    @ ~/julia_ws/ACEworkflows/Fe_pure_jerry/asm_all_lsq.jl:91
 [21] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [22] top-level scope
    @ REPL[2]:1
in expression starting at /zfs/users/jerryho528/jerryho528/julia_ws/ACEworkflows/Fe_pure_jerry/asm_all_lsq.jl:71
  [e3f9bc04] ACE1 v0.11.12
  [8c4e8d19] ACE1pack v0.4.1
  [5cc4c08c] ACE1x v0.1.4
  [ad31a8ef] ACEfit v0.1.1
  [f67ccb44] HDF5 v0.16.15
  [682c06a0] JSON v0.21.4
  [898213cb] LowRankApprox v0.5.3
  [91a5bcdd] Plots v1.38.16
  [08abe8d2] PrettyTables v2.2.5
  [de0858da] Printf

Additionally:

It happens every time, so it stops me from assembling a large least-squares system.

Add option to load stresses from xyz

Currently, only virials are supported.

From Slack: "Don't forget the negative sign (assuming you're using ASE's sign convention for the stress) and be careful about Voigt 6-vector vs. 3x3 matrix"
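A hedged sketch of the conversion the Slack comment warns about, assuming ASE's sign convention (stress = -virial / volume) and the Voigt ordering xx, yy, zz, yz, xz, xy; function names are hypothetical, not existing ACEfit API:

```julia
# Convert an ASE-style Voigt 6-vector stress into a 3×3 virial matrix.
# Assumed Voigt order: xx, yy, zz, yz, xz, xy (check against the actual
# xyz reader before relying on this).
function voigt_to_matrix(s::AbstractVector)
    @assert length(s) == 6
    [ s[1] s[6] s[5];
      s[6] s[2] s[4];
      s[5] s[4] s[3] ]
end

# ASE convention: stress = -virial / volume, hence the minus sign.
stress_to_virial(stress_voigt, volume) = -volume * voigt_to_matrix(stress_voigt)
```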

PyCall

  • make the Python/sklearn dependency a Julia 1.9-style package extension
  • replace PyCall with PythonCall

Incorporating prior knowledge

Everyone agrees that the coefficients need to be regularised (or 'small'), and this is incorporated into the BRR/ARD prior. It's also possible to incorporate other 'expert' information into the prior. One idea floating around, which I think @WillBaldwin0 is looking into, is to fit to dimer data first, incorporate that fit into the prior, and then do the full solve. We'd need an interface allowing us to provide this prior c-vector before fitting. This procedure is quite simple and turns out to be really only a change of variables.

Probably worth waiting to see what Will thinks about this idea before implementing it properly.

Regression Weights for Forces

... should be allowed to be scalars, vectors (diagonal?!), or matrices. Maybe enforce them to always be matrices? Scalars can just be represented as w * I.
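A minimal sketch of the w * I idea, using Julia's lazy UniformScaling so scalar weights never materialise a dense matrix; `promote_weight` is a hypothetical name, not existing ACEfit API:

```julia
using LinearAlgebra

# Promote any admissible weight to something that acts like a matrix,
# so downstream code only ever multiplies by a (possibly lazy) matrix.
promote_weight(w::Real) = w * I                  # lazy UniformScaling
promote_weight(w::AbstractVector) = Diagonal(w)  # per-component weights
promote_weight(W::AbstractMatrix) = W            # already a matrix

F = [1.0, 2.0, 3.0]   # one force observation (3 components)
@assert promote_weight(2.0) * F == 2.0 .* F
@assert promote_weight([1.0, 2.0, 3.0]) * F == [1.0, 4.0, 9.0]
```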

Progressmeters are not representative of progress

       @showprogress map(f, 1:length(data))

This line counts how many structures have been assembled. What it should count is the total number of atoms remaining. In a dataset with structures of vastly varying sizes, the progress meter can be very deceiving.
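A sketch of the difference, assuming a hypothetical per-structure atom count; with ProgressMeter one would roughly advance the bar by the structure's atom count rather than by one:

```julia
# With ProgressMeter this would be roughly (untested sketch):
#   p = Progress(sum(atom_counts))
#   for (d, n) in zip(data, atom_counts); assemble!(d); next!(p, step = n); end
# Plain illustration of how misleading the per-structure count can be:
atom_counts = [500, 10, 10, 10]   # vastly varying structure sizes
done = 1                          # structures assembled so far (the big one)

frac_structures = done / length(atom_counts)           # what is shown now
frac_atoms = sum(atom_counts[1:done]) / sum(atom_counts)  # actual work done

@assert frac_structures == 0.25
@assert frac_atoms ≈ 500 / 530    # ~94% of the work is already finished
```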

Proper way to add new dependencies?

After I added Distributed and DistributedArrays as dependencies, the docs part of the CI stopped working, and I can't figure out how to fix it. Sorry @cortner - is there something obvious I'm missing?

Tag new version

Hi @cortner, would you mind tagging a new version? I just fixed one of the warnings that was causing problems in ACE1pack.

MacOS

The unit tests are failing on MacOS for reasons I don't understand. I have commented the relevant line from the CI for now.

Sigmas and weights

There's a strong connection between the GAP sigmas and the ACE weights. In principle the weights are the square-root inverse of the sigmas, but I don't think it's quite that simple. One can dial the weights up for a very simple ACE model, but this will not lead to good training errors because the ACE model is too constrained to fit the underlying data. Using ARD/BRR this would become apparent, because the associated noise term would be large. I think it'd be nice to propagate the noise term through the weights matrix and display the "optimised sigmas" after an ACE fit. I think GAP users would relate to this quite well.
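The "square-root inverse" relation stated above can be sketched as a pair of helpers (illustrative names; the caveat that it isn't quite this simple stands):

```julia
# In-principle relation from the text: w = σ^(-1/2).  This is only the
# naive mapping; the issue argues the real story is more subtle.
weight_from_sigma(σ) = 1 / sqrt(σ)
sigma_from_weight(w) = 1 / w^2     # inverse mapping, for displaying
                                   # "optimised sigmas" after a fit

@assert weight_from_sigma(0.01) == 10.0
@assert sigma_from_weight(weight_from_sigma(0.05)) ≈ 0.05
```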

Show progress in assembly

Does the showprogress line

    @showprogress pmap(packets) do p

correctly represent progress? I.e. does it account for the fact that some structures are much bigger than others and more expensive to assemble?

Note in #54 I do this manually.

My evidence for this is that my distributed assembly starts with an ETA of > 1.5 h, then drops to ca. 1 h for a while, and then completes in around 25 minutes total.

Missing output

Checklist for useful output given by IPFitting that isn't yet in ACEfit

  • Pretty-printed list of configs read from .xyz file
  • Pretty-printed error tables

Package Design Fundamentals

Note I've decided in the end to start this package from scratch; IPFitting is too messy to fork from. Instead, my proposal is to maintain IPFitting purely for ACE v0.8 but to focus all work for the latest ACE version on ACEfit.jl.

This issue is to explain the design philosophy I propose, get feedback, and ask opinions on a few questions that this leaves open. None of the following is set in stone and all comments and criticism are welcome!

Maxiter for Bayesian Solvers

LSQR has a maxiter parameter but the Bayesian solvers do not. For some problems they simply seem not to converge (cf. the Slack discussion). They should all get this parameter, and on hitting it they should fail with a user-friendly message, something along the lines of "even when the solver hasn't converged the quality of the solution may be good, please test this before changing solver parameters".
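A minimal sketch of the requested guard; `step!` and `converged` are hypothetical callbacks standing in for one iteration of a Bayesian solver and its convergence check:

```julia
# Hypothetical maxiter guard around an iterative solver.
function solve_with_maxiter(step!, converged, state; maxiter = 1000)
    for _ in 1:maxiter
        step!(state)
        converged(state) && return state
    end
    @warn """Solver reached maxiter = $maxiter without converging.
             Even when the solver hasn't converged the quality of the
             solution may be good; please test this before changing
             solver parameters."""
    return state
end

# Toy usage: a "solver" that converges after 5 increments.
state = Ref(0)
solve_with_maxiter(s -> s[] += 1, s -> s[] >= 5, state)
@assert state[] == 5
```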

Weighthooks

Where will the weight hooks go in ACEfit.jl?

Cost Estimates

For parallelising loops over configurations or observations we need cost estimates. It is not clear how this should be implemented when we no longer have concrete Atoms objects, just the three basic E, F, V observations. It also needs some testing how important this really is in practice. Cf. ACEfit.jl, data.jl, function cost.

Investigate distributed linear algebra in Julia

As a step towards a distributed ACEFit for linear models, we thought it would be useful---just with toy problems to start---to investigate multiple options for distributed linear algebra in Julia.

Some todos:

  • Investigate ScaLAPACK in Julia
  • Find something for distributed QR

0.0.2

@cortner, would you please tag and publish 0.0.2? Thanks!

Lazy LSQ System

I spent a week hand-tuning an LSQ fit. To do so, I manually managed what IPFitting used to provide within LsqDB. I'd like us to re-introduce such functionality, but maybe go a step further and make this a lazy data structure that assembles the design matrix "as needed".

The "standard usage" would remain mostly unaffected by this, I think, or it could even become an option that most users need not touch.

For now this is just a note - we can discuss it before doing anything.

What happened to `Dat`?

@wcwitt -- I'm currently trying to implement a fitting script for a new project and noticed for the first time how much the structure of ACEfit has changed. The new AtomsData is now very restrictive and moreover seems to require far more code overhead than the old code that was inspired by IPFitting. I'm guessing there were good reasons for those changes, but I don't remember the discussion. Can you remind me please?

Depending on this, I may bring the old datastructures back. As far as I can tell they can easily live side-by-side with your new framework.

ACE1Pack Integration

  • Make branch in ACE1Pack that uses ACEFit
  • Identify ACE1Pack tests that we will use to declare "success"

Sampling coefficients from posterior distribution

The latest linear models in ACE.jl will allow a model to be parameterised by different c-vectors, and I think ACEfit.jl should have a function to sample these c-vectors from the posterior after the hyperparameters have been optimised.
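A stdlib-only sketch of such a sampler, assuming the fit returns a Gaussian posterior N(μ, Σ) over the coefficients (as BRR does); the function name is hypothetical:

```julia
using LinearAlgebra, Random

# Draw one c-vector from an assumed Gaussian posterior N(μ, Σ),
# via the Cholesky factor: c = μ + L*z with z ~ N(0, I).
function sample_posterior(μ::AbstractVector, Σ::AbstractMatrix;
                          rng = Random.default_rng())
    L = cholesky(Symmetric(Σ)).L
    return μ + L * randn(rng, length(μ))
end
```

For a committee of potentials one would call this repeatedly and evaluate the linear model with each sampled c-vector.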

Iteratively reweighted least squares

I think we have two usages of iteratively reweighted least squares (IRLS) in mind. The first is to optimise any p-norm, which @cortner will know a lot about. Second, I think it'd be quite interesting to use IRLS to try to "even out" the relative error on the force components in the training database. After optimising with IRLS we'd have, say, 10% relative error on both large liquid forces and small vibrational forces, hopefully resulting in both a good liquid RDF and a good phonon spectrum, without having to specify the weights.
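For the p-norm use case, a textbook IRLS iteration can be sketched in a few lines (a toy illustration, not a proposed ACEfit implementation):

```julia
using LinearAlgebra

# Minimise ‖A*c - y‖_p by repeated weighted least squares: each sweep
# scales the rows by |r|^((p-2)/2), where r is the previous residual.
function irls(A, y; p = 1.5, maxiter = 50, ϵ = 1e-8)
    c = A \ y                                 # start from the 2-norm fit
    for _ in 1:maxiter
        r = A * c - y
        w = (abs.(r) .+ ϵ) .^ ((p - 2) / 2)   # ϵ guards |r| → 0 for p < 2
        c = (w .* A) \ (w .* y)
    end
    return c
end

# For p = 1 the fit is robust to the outlier (≈ median instead of mean):
A, y = ones(5, 1), [1.0, 1.0, 1.0, 1.0, 10.0]
@assert abs(irls(A, y; p = 1.0)[1] - 1.0) < 0.01
```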

fitting to energy differences

It would be useful to be able to fit energy differences, or perhaps other more complex functions of "raw" fitting targets. A simple but still useful subset of this would be combinations like E(config_1) - E(config_bulk) * N_atom_1 / N_atom_bulk, or maybe E(config_1) - E(config_bulk) * arb_factor.

The trickiest aspect is probably coming up with a syntax for this that isn't super cumbersome. That's the motivation for suggesting the simpler forms above, which would help for things like defect energies. Those would still cancel out most of the bulk energy, bringing the fitting target much closer to the defect energy, without having to come up with a syntax that precisely specifies all the messy chemical-potential reference-structure details. The user could get away with specifying only the "bulk" config, and perhaps the arbitrary factor when the default N_atom_* ratio isn't appropriate.
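Since the model is linear in the coefficients, such a combined target reduces to a single extra design-matrix row; a hedged sketch (names hypothetical):

```julia
# If row_1 and row_bulk are the basis-energy rows for config_1 and the
# bulk config, the target E(config_1) - factor * E(config_bulk) is
# matched by the row  row_1 - factor * row_bulk, so the existing linear
# solvers need no changes.
function energy_difference_row(row_1, row_bulk, N_1, N_bulk;
                               factor = N_1 / N_bulk)
    return row_1 .- factor .* row_bulk
end

c = [2.0, -1.0]                 # toy coefficients
row_1, row_bulk = [1.0, 3.0], [0.5, 0.5]
d = energy_difference_row(row_1, row_bulk, 4, 2)   # factor = 2
@assert sum(d .* c) ≈ sum(row_1 .* c) - 2 * sum(row_bulk .* c)
```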

v0.0.4

@cortner, would you please bump ACEfit to v0.0.4? There aren't many new changes from v0.0.3, but @gelzinyte and I are trying to bring ACE1pack back to a state where the tests pass, and this is a prerequisite.

Sendto of some models fails

The line assemble.jl#L25

    (nprocs() > 1) && sendto(workers(), basis = basis)

fails for some not-entirely-standard models.

A while back we discussed serializing models to JSON, and then transferring those to the processes.

Again we may want input from a Julia expert here on how this is best done instead of hacking something together.

CC @tjjarvinen

Python dependencies

Python dependencies are ok but should not be required. Is this currently guaranteed?

Iterate in parallel (distributed)

@andresrossb I've moved and slightly edited the iteration interface from IPFitting to ACEfit. Would you be willing to put your draft for the distributed iteration in here as well? We will iterate on it a bit, so please don't push to main directly but make a PR. But you do have push access to ACEfit, so you can create a branch in this repo.

(NB -- I'm thinking we might want to design the nonlinear solvers interface first in ACEfit, since for linear problems people have IPFitting anyhow... We will want to discuss what should be here, vs in ACEflux)

Clarify naming of Bayesian solvers

I've experimented with a few ways of implementing Bayesian ridge (for example), and the proliferation of functions has gotten a bit confusing. Now that more people are starting to use them, it's crucial to reorganize and document them.
