drchainsaw / naivegaflux.jl

Evolve Flux networks from scratch!

License: MIT License

Language: Julia 100.00%
Topics: deep-learning, flux, machine-learning, neural-networks, architecture-search, hyperparameter-optimization, genetic-algorithm, evolution-strategies

naivegaflux.jl's People

Contributors: drchainsaw, github-actions[bot], juliatagbot

Forkers: laplacekorea

naivegaflux.jl's Issues

GPU + GC woes

Training and validation time increases suddenly and drastically after a few generations when using CuArrays.

GPU-Z shows that the GPU spends most of its time idling, and profiling revealed that the whole program spends >70% of its time waiting for GC.

This might very well be the exact same issue as this:
https://github.com/JuliaGPU/CuArrays.jl/issues/323

Although one should perhaps not completely rule out all the shenanigans going on in this library to swap models in and out of GPU memory (HostCandidate, I'm looking at you), or possibly incorrect "russian doll" iterators.

Anyway, I haven't figured out how to test the changes made to the GC in the above thread.

In the meantime I'll probably add some "exit when GC overhead becomes too high" kind of parameter so that training can be run from a loop in the shell instead (which is obviously not ideal due to compile times).
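Such an exit check could be sketched in plain Julia using `@timed`, which reports GC time per call; the function and parameter names below are hypothetical, not anything in the package.

```julia
# Hypothetical sketch of an "exit when GC overhead is too high" check.
# `trainstep!` is a stand-in for one generation of training.
function train_until_gc_bound(trainstep!; maxoverhead = 0.5, maxgenerations = 100)
    for gen in 1:maxgenerations
        stats = @timed trainstep!()
        # Fraction of wall time spent in garbage collection for this step.
        overhead = stats.gctime / stats.time
        if overhead > maxoverhead
            @info "GC overhead $overhead exceeded $maxoverhead; exiting at generation $gen"
            return gen
        end
    end
    return maxgenerations
end
```

A shell loop could then simply rerun the script for as long as it exits successfully.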

Avoid copying of fitness data

Due to annoying and stateful fitness functions (I'm looking at you, NanGuard!), fitness generally needs to be associated with a candidate.

One bad consequence of this is that AccuracyFitness, which typically contains the validation set, gets copied when candidates mutate and also gets serialized once for each candidate.

Bad initialization for identity residual blocks

Current master initializes all layers in a residual block as zeros if weightinit is identity. This prevents them from ever being updated.

Let's just pretend I put this in there on purpose to see if anyone was actually using this package (as it would have made them submit an issue). 😊

Add NoutMutation

Randomly change nout by a percentage.
Good to separate positive and negative changes to allow for control over memory consumption.
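A sketch of such a mutation on a plain integer, assuming a simple relative range; the name, defaults, and range parameters are hypothetical, not the package's actual API.

```julia
using Random

# Hypothetical sketch of NoutMutation: change nout by a random percentage
# drawn from [minrel, maxrel]. Keeping the positive and negative parts of
# the range separate gives control over growth (and thus memory use).
function mutate_nout(nout::Integer; minrel = -0.2, maxrel = 0.2, rng = Random.default_rng())
    scale = 1 + minrel + (maxrel - minrel) * rand(rng)
    return max(1, round(Int, nout * scale))  # never shrink below one neuron
end
```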

Searching for small models is inefficient

The default fitness function for image classifiers truncates the accuracy to two decimal digits and uses the remaining digits for the model size.

The idea is that when two models have the same accuracy the one with the smallest size is considered "fittest".

This works to some extent, but for datasets which are very easy to fit (e.g. MNIST), the search for the smallest model is very slow and inefficient. This is because all models end up with basically the same fitness in the end (e.g. 1.0000xxxxx), meaning that SUS (stochastic universal sampling) selection will basically select the whole population again each generation.

The only way a "too big" candidate can fail to be selected is if it is accidentally mutated into having a very low accuracy. However, the very same can happen to a small model as well, making elite selection the only mechanism which actually searches for smaller models, and this is very slow.
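The digit-splitting scheme described above can be sketched as follows; the exact size scaling here is an assumption for illustration, not the package's actual formula.

```julia
# Hedged sketch: accuracy truncated to two decimals, with a small size
# bonus in the remaining digits so it can never outweigh a genuine
# accuracy difference at the two-decimal level.
sizebonus(nparams) = min(0.009, 1 / max(nparams, 1))
combinedfitness(accuracy, nparams) = floor(accuracy; digits = 2) + sizebonus(nparams)
```

When every model reaches the same truncated accuracy, the fitness differences shrink to the tiny size bonus, which is exactly why SUS selection stops discriminating.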

Unnecessary restriction in AddEdgeMutation

The current implementation of no_shapechange only looks for new outputs in the output direction of the selected vertex. It should be easy to also include parallel paths by looking at the setdiff between flatten(vi) and all vertices in the graph, filtering out those which do not have the exact same delta shapes compared to the input.
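The set arithmetic can be illustrated with plain collections; every name below is a stand-in for the corresponding NaiveNASlib graph query, not a real call.

```julia
# `allvs` plays the role of all vertices in the graph, `downstream`
# the role of flatten(vi), and `sameshape` the shape-delta check.
candidate_targets(allvs, downstream, sameshape) =
    filter(sameshape, setdiff(allvs, downstream))
```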

Parallelization

The fitting procedure is trivially parallelizable, but I don't have a multi-GPU setup to test it with.

MutationShield needs improved granularity

It was sufficient back when all mutations had the potential to affect the output size.

Now things like KernelSizeMutation and ActivationFunctionMutation are often perfectly safe to apply, e.g. to vertices which must have the same size as the number of labels.

Type pirating iterators

Easy to refactor:

  • Iterators.cycle(itr, n) => IterTools.ncycle (or make our own implementation)
  • Flux.onehotbatch(itr::Base.Iterators.PartitionIterator, labels) => can probably be removed
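For the first item, a dependency-free "own implementation" is a one-liner on top of Base iterators, removing the need to pirate Iterators.cycle with an extra argument:

```julia
# Cycle an iterator n times without piracy or an IterTools dependency.
ncycle(itr, n) = Iterators.flatten(Iterators.repeated(itr, n))
```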

Resize optimiser state

AutoOptimiser would make it quite easy, but it would be good to have a design which works with the normal way too.

Add option to store population on disk

For large populations (and hard problems) it is possible to run out of (non-GPU) RAM. Some kind of FileCandidate, analogous to HostCandidate, is warranted. A challenging aspect is how to handle filenames in this case, as they will quickly collide if not mutated along with the models.
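One way to sidestep the filename problem is to mint a fresh unique name whenever a candidate is copied; a hypothetical sketch (the type and helper names are not the package's):

```julia
using UUIDs

# Each candidate owns a unique filename, and any copy made during
# mutation mints a fresh one so serialized models never collide on disk.
struct FileCandidate
    filename::String
end
FileCandidate() = FileCandidate(string(uuid4()) * ".jls")
mutated_copy(::FileCandidate) = FileCandidate()
```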

Rework Candidate-Fitness interaction

The current interplay between candidates (the things one wants to optimize, typically the model) and fitness strategies (what one wants to optimize for) is less than beautiful. Much of this is due to the instrumentation API, which is the pattern for e.g. TimeFitness, where one wants to obtain fitness in terms of training time as a byproduct of the training itself.

I think that removing the assumption that one always wants to train the model a little before evaluating fitness (e.g. to implement this) can help clear this up into a nicer and easier-to-work-with design.

In particular, having the fitness strategy separate from the candidate seems like an attractive way forward. I'm thinking of an API like fitness(fs::FitnessStrategy, c::AbstractCandidate), or perhaps even fitness(fs::FitnessStrategy, c::AbstractCandidate, generationnr::Int), i.e. tightening up the fitness API so it no longer just takes a function. The function was only used for one single fitness strategy anyway, as the instrumentation made it impossible to make any assumptions about it. The fitness strategy would then just fetch what it needs from the candidate (e.g. the model) to compute the fitness. Hopefully this removes a lot of the implicit metrics, where a fitness strategy stores the byproduct of some other operation as mutable state which it then returns; in turn, fitness becomes easier to use and does not need to be associated with a certain candidate.

To optimize some aspect other than the model itself, one would then implement new candidate types which have the things one wants to optimize as members (currently we have models and optimizers). Fitness strategies would then need to throw an error if the candidate does not provide the data needed to compute the fitness. I think it is safe to say that for something to be meaningful to search for, it has to affect the fitness somehow, although the effect does not have to be direct. I think that even creating new candidates with mapcandidate can be done somewhat automatically in the same way Functors.jl works, but perhaps I'll go for something simpler with an abstract type for "root" candidates which will have all their fields mapped (non-root candidates just pass the mapping on to their wrapped candidate).
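A minimal sketch of what this API shape could look like; all types and helpers below are hypothetical illustrations, not the package's actual definitions.

```julia
# The strategy pulls what it needs from the candidate instead of
# instrumenting the candidate's functions.
abstract type FitnessStrategy end
abstract type AbstractCandidate end

struct ModelCandidate{M} <: AbstractCandidate
    model::M
end
# Stand-in for collecting trainable parameter arrays from a model.
paramarrays(c::ModelCandidate) = c.model

struct SizeFitness <: FitnessStrategy end
fitness(::SizeFitness, c::AbstractCandidate) = 1 / max(1, sum(length, paramarrays(c)))
```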

This means working out other solutions for

  • Training: This now becomes its own fitness strategy. I'm thinking it should have NaNGuard built in and probably wrap another fitness strategy which calculates the fitness after training. This is slightly awkward as the training step is not the exact fitness one is after, but it makes sense to say "my fitness metric is the accuracy on the validation set after training for x iterations", so I think it is ok.
  • Fitness caching: I think the current CacheCandidate is sufficient. Perhaps an even more explicit method, like passing a fitness vector to evolve, is more elegant, although I do like the option to look at the last evaluated fitness of serialized candidates.
  • TimeFitness: I think this should just wrap another fitness strategy and measure the time it takes to compute.
  • SizeFitness: No problem?
  • TrainingAccuracy: Not sure; perhaps ditch it, or incorporate it in the training step with the default being to ignore it. Maybe training always produces the training accuracy and does not wrap another fitness strategy?

I think it is preferable if wrapping fitness strategies (e.g. TimeFitness) return all fitnesses as a (named?) tuple. This would pave the way for multi-objective optimization. The question is just how much of a headache it is to combine them. Does one want the tuples nested or not? Perhaps this requires some fitness combiner which takes an arbitrary method the user defines based on what the wrapped fitnesses are.

Add proper NaN-guard

Flux (rightfully) throws an exception in case the loss is NaN.

This is however bad from a GA perspective as I don't know of any way to guarantee that no model in the search space can ever end up producing NaNs.

A trivial way to avoid the exception is to wrap the loss function in a function which checks for NaNs and changes the output if so.

However, models which produce NaNs are typically very slow to evaluate and should be removed from the population as soon as a NaN is spotted.

The population should be replenished to the desired size at the next evolution instance.
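The loss-wrapping approach above could be sketched as follows; the penalty value and the callback are assumptions added for illustration.

```julia
# Wrap the loss function: replace NaN with a large penalty and notify via
# a callback so the candidate can be flagged for removal at the next
# evolution instead of dragging down training speed.
function nanguard(loss; penalty = 1e6, onnan = () -> nothing)
    return function (args...)
        l = loss(args...)
        if isnan(l)
            onnan()                      # e.g. mark candidate for removal
            return oftype(l, penalty)
        end
        return l
    end
end
```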

List of breaking changes in 0.10

  • Minimum Julia version set to 1.7
  • evolvemodel replaced by MapCandidate to facilitate mutation/crossover of more than two different types
  • Removed PostMutation as the same thing can be achieved by MutationChain

Simple example of a regression model

Hi and thank you for this package!
I'm having difficulty learning how to apply your package to regression models with a continuous Y (outcome variable).

Is it possible to show a very simple example of how to use your model on the Boston housing data?
For reference I have parsimonious Flux code here.

Small neuron values and OutSelect{Relaxed}

Adding and removing edges, as well as removing vertices, are perhaps unnecessarily constrained as they don't allow fallback to OutSelect{Relaxed}. The reason is that this sometimes leads to size mismatches due to DrChainsaw/NaiveNASlib.jl#39, even when neuron values are kept strictly positive through default_neuronselect, due to what appears to be quantization errors with small values.

Fix poking inside Iterators.Stateful for Julia 1.9 compatibility

This package uses Iterators.Stateful as a markable iterator by mutating its fields. In Julia 1.9 the field taken was changed to remaining, causing tests to fail. I suppose the correct fix is to implement our own markable iterator or find a package which has one.

Display Model Architecture

Hello, thank you for all of your work on this package! I have just started working with it and it is making the process of selecting a good architecture much easier. However, is it possible to display the model architecture of the best performing model?

For example, if we consider your 'quick tutorial' example, we can use the model with
model(newnewpopulation[bestcandnr])(datasetvalidate), but is there a way to show how the layers are constructed? I'm thinking something along the lines of summary(model(newnewpopulation[bestcandnr])) to return:

Chain( Dense(15, 8, relu), Dense(8, 2) )

Add way to remove redundant vertices

Currently concatenations and elementwise operations need to be safeguarded from removal as other layers typically don't handle multiple inputs.

However, when a concat or elemwise op is left with just one input, it is basically a no-op which just takes up space in the graph.

Tagging such ops (maybe with a new trait) would allow one to search the graph for them and remove them. Another way could be to add some kind of "edge mutation callback" mechanism.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

A case for crossover

The crossover operation does not seem very useful in combination with weight inheritance, but maybe it is still worth adding.

Apart from being fun to implement, the following arguments support adding it:

  1. One might not want to use weight inheritance
  2. Even with weight inheritance, populations will eventually consist of many close relatives which could have very similar weights. Swapping a layer for a relative's same layer, or stacking the top half of one candidate with the bottom half of another, could turn out to be better than just creating a freshly initialized model.
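On plain vectors of layers, the stacking idea in point 2 could be sketched as below; a real graph crossover would additionally have to match sizes at the cut point, and the function name is hypothetical.

```julia
# Stack the first half of one candidate's layers with the second half of
# another's.
function stackcrossover(layers1::Vector, layers2::Vector)
    cut1 = length(layers1) ÷ 2
    cut2 = length(layers2) ÷ 2
    return vcat(layers1[1:cut1], layers2[cut2+1:end])
end
```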
