drchainsaw / naivegaflux.jl

Evolve Flux networks from scratch!

License: MIT License

Language: Julia 100.00%
Topics: deep-learning, flux, machine-learning, neural-networks, architecture-search, hyperparameter-optimization, genetic-algorithm, evolution-strategies

naivegaflux.jl's People

Contributors: drchainsaw, github-actions[bot], juliatagbot

Forkers: laplacekorea

naivegaflux.jl's Issues

GPU + GC woes

Training and validation time increases suddenly and drastically after a few generations when using CuArrays.

GPU-Z shows that the GPU spends most of its time idling, and profiling revealed that the whole program spends >70% of its time waiting for GC.

This might very well be the exact same issue as this:
https://github.com/JuliaGPU/CuArrays.jl/issues/323

Although one should perhaps not completely rule out all the shenanigans going on in this library to swap models in and out of GPU memory (HostCandidate, I'm looking at you), or possibly incorrect "russian doll" iterators.

Anyway, I haven't figured out how to test the changes made to the GC in the above thread.

In the meantime I'll probably add some "exit when GC overhead becomes too high" kind of parameter so that training can be run from a loop in the shell instead (which is obviously not ideal due to compile times).
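Such an exit check could be sketched in plain Julia using `@timed`, which reports GC time per call; the function and parameter names below are hypothetical, not anything in the package.

```julia
# Hypothetical sketch of an "exit when GC overhead is too high" check.
# `trainstep!` is a stand-in for one generation of training.
function train_until_gc_bound(trainstep!; maxoverhead = 0.5, maxgenerations = 100)
    for gen in 1:maxgenerations
        stats = @timed trainstep!()
        # Fraction of wall time spent in garbage collection for this step.
        overhead = stats.gctime / stats.time
        if overhead > maxoverhead
            @info "GC overhead $overhead exceeded $maxoverhead; exiting at generation $gen"
            return gen
        end
    end
    return maxgenerations
end
```

A shell loop could then simply rerun the script for as long as it exits successfully.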

Avoid copying of fitness data

Due to annoying and stateful fitness functions (I'm looking at you, NanGuard!), fitness generally needs to be associated with a candidate.

One bad consequence of this is that AccuracyFitness, which typically contains the validation set, gets copied when candidates mutate and also gets serialized once for each candidate.

Bad initialization for identity residual blocks

Current master initializes all layers in a residual block as zeros if weightinit is identity. This prevents them from ever being updated.

Let's just pretend I put this in there on purpose to see if anyone was actually using this package (as it would have made them submit an issue). 😊

Add NoutMutation

Randomly change nout by a percentage.
Good to separate positive and negative changes to allow for control over memory consumption.
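A sketch of such a mutation on a plain integer, assuming a simple relative range; the name, defaults, and range parameters are hypothetical, not the package's actual API.

```julia
using Random

# Hypothetical sketch of NoutMutation: change nout by a random percentage
# drawn from [minrel, maxrel]. Keeping the positive and negative parts of
# the range separate gives control over growth (and thus memory use).
function mutate_nout(nout::Integer; minrel = -0.2, maxrel = 0.2, rng = Random.default_rng())
    scale = 1 + minrel + (maxrel - minrel) * rand(rng)
    return max(1, round(Int, nout * scale))  # never shrink below one neuron
end
```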

Searching for small models is inefficient

The default fitness function for image classifiers truncates the accuracy to two decimal digits and uses the remaining digits for the model size.

The idea is that when two models have the same accuracy the one with the smallest size is considered "fittest".

This works to some extent, but for datasets which are very easy to fit (e.g. MNIST), the search for the smallest model is very slow and inefficient. This is because all models end up with basically the same fitness in the end (e.g. 1.0000xxxxx), meaning that SUS (stochastic universal sampling) selection will basically select the whole population again each generation.

The only way a "too big" candidate can fail to be selected is if it is accidentally mutated into having a very low accuracy. However, the very same can happen to a small model as well, making elite selection the only mechanism which actually searches for smaller models, and this is very slow.
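The digit-splitting scheme described above can be sketched as follows; the exact size scaling here is an assumption for illustration, not the package's actual formula.

```julia
# Hedged sketch: accuracy truncated to two decimals, with a small size
# bonus in the remaining digits so it can never outweigh a genuine
# accuracy difference at the two-decimal level.
sizebonus(nparams) = min(0.009, 1 / max(nparams, 1))
combinedfitness(accuracy, nparams) = floor(accuracy; digits = 2) + sizebonus(nparams)
```

When every model reaches the same truncated accuracy, the fitness differences shrink to the tiny size bonus, which is exactly why SUS selection stops discriminating.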

Unnecessary restriction in AddEdgeMutation

The current implementation of no_shapechange only looks for new outputs in the output direction of the selected vertex. It should be easy to also include parallel paths by looking at the setdiff between flatten(vi) and all vertices in the graph, filtering out those which do not have the exact same delta shapes compared to the input.
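The set arithmetic can be illustrated with plain collections; every name below is a stand-in for the corresponding NaiveNASlib graph query, not a real call.

```julia
# `allvs` plays the role of all vertices in the graph, `downstream`
# the role of flatten(vi), and `sameshape` the shape-delta check.
candidate_targets(allvs, downstream, sameshape) =
    filter(sameshape, setdiff(allvs, downstream))
```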

Parallelization

The fitting procedure is trivially parallelizable, but I don't have a multi-GPU setup to test it with.

MutationShield needs improved granularity

It was sufficient back when all mutations had the potential to affect the output size.

Now things like KernelSizeMutation and ActivationFunctionMutation are often perfectly safe to apply, e.g. to vertices which must have the same size as the number of labels.

Type pirating iterators

Easy to refactor:

  • Iterators.cycle(itr, n) => IterTools.ncycle (or make our own implementation)
  • Flux.onehotbatch(itr::Base.Iterators.PartitionIterator, labels) => can probably be removed
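For the first item, a dependency-free "own implementation" is a one-liner on top of Base iterators, removing the need to pirate Iterators.cycle with an extra argument:

```julia
# Cycle an iterator n times without piracy or an IterTools dependency.
ncycle(itr, n) = Iterators.flatten(Iterators.repeated(itr, n))
```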

Resize optimiser state

AutoOptimiser would make it quite easy, but it would be good to have a design which works with the normal way too.

Add option to store population on disk

For large populations (and hard problems) it is possible to run out of (non-GPU) RAM. Some kind of FileCandidate, analogous to HostCandidate, is warranted. A challenging aspect is how to handle filenames in this case, as they will quickly collide if not mutated along with the models.
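One way to sidestep the filename problem is to mint a fresh unique name whenever a candidate is copied; a hypothetical sketch (the type and helper names are not the package's):

```julia
using UUIDs

# Each candidate owns a unique filename, and any copy made during
# mutation mints a fresh one so serialized models never collide on disk.
struct FileCandidate
    filename::String
end
FileCandidate() = FileCandidate(string(uuid4()) * ".jls")
mutated_copy(::FileCandidate) = FileCandidate()
```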

Rework Candidate-Fitness interaction

The current interplay between candidates (the things one wants to optimize, typically the model) and fitness strategies (what one wants to optimize for) is less than beautiful. Much of this is due to the instrumentation API, which is the pattern for e.g. TimeFitness, where one wants to obtain fitness in terms of training time as a byproduct of the training itself.

I think that removing the assumption that one always wants to train the model a little before evaluating fitness (e.g. to implement this) can help clear this up into a nicer and easier-to-work-with design.

In particular, having the fitness strategy separate from the candidate seems like an attractive way forward. I'm thinking of an API like fitness(fs::FitnessStrategy, c::AbstractCandidate), or perhaps even fitness(fs::FitnessStrategy, c::AbstractCandidate, generationnr::Int), i.e. tightening up the fitness API so it no longer just takes a function. The function was only used for one single fitness strategy anyway, as the instrumentation made it impossible to make any assumptions about it. The fitness strategy would then just fetch what it needs from the candidate (e.g. the model) to compute the fitness. Hopefully this removes a lot of the implicit metrics, where a fitness strategy stores the byproduct of some other operation as mutable state which it then returns; in turn, fitness becomes easier to use and does not need to be associated with a certain candidate.

To optimize some aspect other than the model itself, one would then implement new candidate types which have the things one wants to optimize as members (currently we have models and optimizers). Fitness strategies would then need to throw an error if the candidate does not provide the data needed to compute the fitness. I think it is safe to say that for something to be meaningful to search for, it has to affect the fitness somehow, although the effect does not have to be direct. I think that even creating new candidates with mapcandidate can be done somewhat automatically in the same way Functors.jl works, but perhaps I'll go for something simpler with an abstract type for "root" candidates which will have all their fields mapped (non-root candidates just pass the mapping on to their wrapped candidate).
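A minimal sketch of what this API shape could look like; all types and helpers below are hypothetical illustrations, not the package's actual definitions.

```julia
# The strategy pulls what it needs from the candidate instead of
# instrumenting the candidate's functions.
abstract type FitnessStrategy end
abstract type AbstractCandidate end

struct ModelCandidate{M} <: AbstractCandidate
    model::M
end
# Stand-in for collecting trainable parameter arrays from a model.
paramarrays(c::ModelCandidate) = c.model

struct SizeFitness <: FitnessStrategy end
fitness(::SizeFitness, c::AbstractCandidate) = 1 / max(1, sum(length, paramarrays(c)))
```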

This means working out other solutions for

  • Training: This now becomes its own fitness strategy. I'm thinking it should have NaNGuard built in and probably wrap another fitness strategy which calculates the fitness after training. This is slightly awkward as the training step is not the exact fitness one is after, but it makes sense to say "my fitness metric is the accuracy on the validation set after training for x iterations", so I think it is ok.
  • Fitness caching: I think the current CacheCandidate is sufficient. Perhaps an even more explicit method, like passing a fitness vector to evolve, is more elegant, although I do like the option to look at the last evaluated fitness of serialized candidates.
  • TimeFitness: I think this should just wrap another fitness strategy and measure the time it takes to compute.
  • SizeFitness: No problem?
  • TrainingAccuracy: Not sure; perhaps ditch it, or incorporate it in the training step with the default being to ignore it. Maybe training always produces the training accuracy and does not wrap another fitness strategy?

I think it is preferable if wrapping fitness strategies (e.g. TimeFitness) return all fitnesses as a (named?) tuple. This would pave the way for multi-objective optimization. The question is just how much of a headache it is to combine them. Does one want the tuples nested or not? Perhaps this requires some fitness combiner which takes an arbitrary method the user defines based on what the wrapped fitnesses are.

Add proper NaN-guard

Flux (rightfully) throws an exception in case the loss is NaN.

This is however bad from a GA perspective as I don't know of any way to guarantee that no model in the search space can ever end up producing NaNs.

A trivial way to avoid the exception is to wrap the loss function in a function which checks for NaNs and changes the output if so.

However, models which produce NaNs are typically very slow to evaluate and should be removed from the population as soon as a NaN is spotted.

The population should be replenished to the desired size at the next evolution instance.
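The loss-wrapping approach above could be sketched as follows; the penalty value and the callback are assumptions added for illustration.

```julia
# Wrap the loss function: replace NaN with a large penalty and notify via
# a callback so the candidate can be flagged for removal at the next
# evolution instead of dragging down training speed.
function nanguard(loss; penalty = 1e6, onnan = () -> nothing)
    return function (args...)
        l = loss(args...)
        if isnan(l)
            onnan()                      # e.g. mark candidate for removal
            return oftype(l, penalty)
        end
        return l
    end
end
```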

List of breaking changes in 0.10

  • Minimum Julia version set to 1.7
  • evolvemodel replaced by MapCandidate to facilitate mutation/crossover of more than two different types
  • Removed PostMutation as the same thing can be achieved by MutationChain

Simple example of a regression model

Hi and thank you for this package!
I'm having difficulty learning how to apply your package to regression models with a continuous Y (outcome variable).

Is it possible to show a very simple example of how to use your model on the Boston housing data?
For reference I have parsimonious Flux code here.

Small neuron values and OutSelect{Relaxed}

Adding and removing edges, as well as removing vertices, are perhaps unnecessarily constrained as they don't allow fallback to OutSelect{Relaxed}. The reason is that this sometimes leads to size mismatches due to DrChainsaw/NaiveNASlib.jl#39, even when neuron values are kept strictly positive through default_neuronselect, due to what appears to be quantization errors with small values.

Fix poking inside Iterators.Stateful for Julia 1.9 compatibility

This package uses Iterators.Stateful as a markable iterator by mutating its fields. In Julia 1.9 the field taken was changed to remaining, causing tests to fail. I suppose the correct fix is to implement our own markable iterator or find a package which has one.

Display Model Architecture

Hello, thank you for all of your work on this package! I have just started working with it and it is making the process of selecting a good architecture much easier. However, is it possible to display the model architecture of the best performing model?

For example, if we consider your 'quick tutorial' example, we can use the model with
model(newnewpopulation[bestcandnr])(datasetvalidate), but is there a way to show how the layers are constructed? I'm thinking something along the lines of summary(model(newnewpopulation[bestcandnr])) to return:

Chain( Dense(15, 8, relu), Dense(8, 2) )

Add way to remove redundant vertices

Currently concatenations and elementwise operations need to be safeguarded from removal as other layers typically don't handle multiple inputs.

However, when a concat or elemwise op is left with just one input, it is basically a no-op which just takes up space in the graph.

Tagging such ops (maybe with a new trait) would allow one to search the graph for them and remove them. Another way could be to add some kind of "edge mutation callback" mechanism.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

A case for crossover

The crossover operation does not seem very useful in combination with weight inheritance, but maybe it is still worth adding.

Apart from being fun to implement, the following arguments support adding it:

  1. One might not want to use weight inheritance
  2. Even with weight inheritance, populations will eventually consist of many close relatives which could have very similar weights. Swapping a layer for a relative's same layer, or stacking the top half of one candidate with the bottom half of another, could turn out to be better than just creating a freshly initialized model.
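On plain vectors of layers, the stacking idea in point 2 could be sketched as below; a real graph crossover would additionally have to match sizes at the cut point, and the function name is hypothetical.

```julia
# Stack the first half of one candidate's layers with the second half of
# another's.
function stackcrossover(layers1::Vector, layers2::Vector)
    cut1 = length(layers1) ÷ 2
    cut2 = length(layers2) ÷ 2
    return vcat(layers1[1:cut1], layers2[cut2+1:end])
end
```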
