strangedev / neat_pygenetics

An implementation of NeuroEvolution of Augmenting Topologies (NEAT)

Python 100.00%

neat_pygenetics's Introduction

Hi there! Visit our website for more information.

neat_pygenetics's Issues

Better error handling for networking

A list of possible networking errors should be assembled, and appropriate countermeasures should be put in place.

SimulationStructure for Genomes

We need a dedicated genome structure for simulations.
The structure should be fast to use; storage efficiency is the lesser priority.

This is a preliminary design:

Naming Consistency

We should decide on some naming conventions.

Endpoints of Nodes:
source - target vs head - tail

tbd

Genome Selector

The Genome Selector selects two genomes from the Genome Repository for breeding.
The selection is based on cluster membership and on further criteria that are yet to be defined.
The Genome Selector will be a main part of a continuous evolutionary process.

Database

We are currently experimenting with MongoDB.
MongoDB stores objects in JSON representation.

In the module DatabaseConnector we have custom JSON encoders and decoders, which are used to encode StorageGenomes and their AnalysisResults into JSON and decode them back.
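
As a rough illustration of what such an encoder/decoder pair could look like, here is a minimal sketch using only the standard `json` module. The `StorageGenome` fields, the `__type__` tag and the names `GenomeEncoder` and `decode_genome` are assumptions made for this example, not the actual DatabaseConnector code.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StorageGenome:
    # Hypothetical fields; the real StorageGenome may look different.
    genome_id: int
    gene_ids: list = field(default_factory=list)


class GenomeEncoder(json.JSONEncoder):
    """Encode StorageGenome objects as tagged JSON objects."""

    def default(self, obj):
        if isinstance(obj, StorageGenome):
            return {
                "__type__": "StorageGenome",  # assumed tag name
                "genome_id": obj.genome_id,
                "gene_ids": obj.gene_ids,
            }
        return super().default(obj)


def decode_genome(dct):
    """object_hook that rebuilds a StorageGenome from its tagged form."""
    if dct.get("__type__") == "StorageGenome":
        return StorageGenome(dct["genome_id"], dct["gene_ids"])
    return dct


genome = StorageGenome(genome_id=1, gene_ids=[3, 5, 8])
text = json.dumps(genome, cls=GenomeEncoder)
restored = json.loads(text, object_hook=decode_genome)
```

The tagged-object approach keeps the stored document plain JSON, so it should fit MongoDB's document model, but that has not been tested here.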

I want to build the repositories on top of that and then test performance with many (100,000+) objects.

Addition of types for graph nodes

In order to solve more complex problems, different node types should be added.
Each type should be able to specify its own transport function and its minimum and maximum number of inputs and outputs.
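
One possible shape for this, as a sketch only: a small immutable record per node type. The class name `NodeType`, its fields and the two example types are assumptions for illustration; output limits are left out and could be handled the same way as the input limits.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class NodeType:
    name: str
    # Maps the list of incoming values to the node's output value.
    transport: Callable[[Sequence[float]], float]
    min_inputs: int = 0
    max_inputs: Optional[int] = None  # None means "no upper limit"

    def accepts(self, n_inputs: int) -> bool:
        """Check whether a node of this type may have n_inputs incoming edges."""
        if n_inputs < self.min_inputs:
            return False
        return self.max_inputs is None or n_inputs <= self.max_inputs


# Two hypothetical node types.
SIGMOID = NodeType("sigmoid", lambda xs: 1.0 / (1.0 + math.exp(-sum(xs))), min_inputs=1)
PRODUCT = NodeType("product", lambda xs: math.prod(xs), min_inputs=2, max_inputs=2)
```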

Simulator

The simulator uses an arbitrary simulation (in strangedev's case his prg-1 Python project; I want to use 2048) to receive inputs for a genome and to test the outputs of the genome.
The simulation then has to provide a fitness for the tested genome.

For this we need a really abstract structure.
A Simulator should have this interface:

run(genome) // genome in SimulationStructure
where run returns the fitness of the genome.

Very abstract ;D
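
In Python, the interface above could be sketched as an abstract base class. Only `run(genome)` comes from the proposal; the toy subclass `ConstantInputSimulator` and the assumption that a genome is callable on a list of inputs are made up for the example.

```python
from abc import ABC, abstractmethod


class Simulator(ABC):
    @abstractmethod
    def run(self, genome) -> float:
        """Evaluate the genome and return its fitness."""


class ConstantInputSimulator(Simulator):
    """Toy simulation: feeds fixed inputs and rewards outputs near a target."""

    def __init__(self, inputs, target):
        self.inputs = inputs
        self.target = target

    def run(self, genome) -> float:
        # Assumption: the genome can be called with the inputs and
        # returns a single number.
        output = genome(self.inputs)
        return -abs(output - self.target)
```

A real simulation (prg-1, 2048) would subclass `Simulator` the same way and hide all of its game logic behind `run`.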

Cluster Repository

Since we need to store a bit of information about clusters, namely

- shared fitness
- maximum population size
- possibly more

we need to store them in the database.
Therefore, we need a repository for them.

AnalysisStructure for Genomes

We need a dedicated structure for the analysis of genomes.
The structure should be lightweight, but easy to traverse and easy to check for cycles etc.
Currently, the best idea for this is an adjacency list.
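
To show why an adjacency list makes the cycle check easy, here is a minimal sketch. It assumes the graph is a plain dict mapping each node to a list of its successors; the function name `has_cycle` is hypothetical.

```python
def has_cycle(adjacency):
    """Return True if the directed graph contains a cycle (iterative DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in adjacency}
    for start in adjacency:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(adjacency[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                state = colour.get(nxt, WHITE)
                if state == GREY:
                    return True  # edge back into the current path
                if state == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(adjacency.get(nxt, ()))))
                    break
            else:
                # All successors handled: the node is finished.
                colour[node] = BLACK
                stack.pop()
    return False
```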

Genome Analyzer

The Genome Analyzer performs different tasks on Genomes in the AnalysisStructure.
The currently planned analysis tasks are:

- test for coherency
  - finding sources and sinks in the hidden layer (and removing them)
- finding cycles (and marking them)
- marking Genes as disabled

The Genome Analyzer should return a diff object on request. This diff object should contain information about what has to change in the graph according to the given rules. If sources or sinks are found, for example, they should be contained in the diff object as "to be removed", so that they can be removed from the StorageStructure afterwards.
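
A minimal sketch of the source/sink part of such a diff, assuming edges are given as `(source, target)` pairs and the diff is a plain dict. The function name and the dict keys are invented for this example.

```python
def find_hidden_sources_and_sinks(edges, input_nodes, output_nodes):
    """Return a diff listing hidden nodes without incoming or outgoing edges."""
    edges = list(edges)
    has_incoming, has_outgoing, nodes = set(), set(), set()
    for source, target in edges:
        nodes.update((source, target))
        has_outgoing.add(source)
        has_incoming.add(target)
    hidden = nodes - set(input_nodes) - set(output_nodes)
    # A hidden node is dead if nothing flows into it or out of it.
    dead = {n for n in hidden if n not in has_incoming or n not in has_outgoing}
    return {
        "nodes_to_remove": sorted(dead),
        "edges_to_remove": [e for e in edges if e[0] in dead or e[1] in dead],
    }


# Node 4 is a sink and node 5 is a source in the hidden layer.
diff = find_hidden_sources_and_sinks(
    [(1, 3), (3, 2), (1, 4), (5, 2)], input_nodes=[1], output_nodes=[2]
)
# diff["nodes_to_remove"] == [4, 5]
# diff["edges_to_remove"] == [(1, 4), (5, 2)]
```

This is a single pass. Removing the reported edges can expose new sources or sinks, so a caller would have to repeat the analysis until the diff is empty.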

Gene Repository

The Gene Repository collects and distributes Genes.
How exactly these are stored is not yet decided, but the storage should be persistent.

Genes should be accessible via getGene(geneID) or something like that. The repository should also make it possible to search for Genes starting or ending in a specific node (for reuse).

I recommend sqlite for testing and prototyping, then switching to MySQL later on.
I haven't toyed with NoSQL databases yet; maybe they are a viable option in this case.
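
For the sqlite prototype, the repository could look roughly like the sketch below. It uses only the standard `sqlite3` module; the table layout, the method names and the choice of `source`/`target` as column names are assumptions, since the naming question is still open.

```python
import sqlite3


class GeneRepository:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS genes ("
            "gene_id INTEGER PRIMARY KEY, "
            "source INTEGER NOT NULL, "
            "target INTEGER NOT NULL, "
            "UNIQUE (source, target))"
        )

    def add_gene(self, source, target):
        """Return the id of the gene source->target, creating it if needed."""
        row = self.conn.execute(
            "SELECT gene_id FROM genes WHERE source = ? AND target = ?",
            (source, target),
        ).fetchone()
        if row:
            return row[0]  # reuse the existing gene
        cur = self.conn.execute(
            "INSERT INTO genes (source, target) VALUES (?, ?)", (source, target)
        )
        self.conn.commit()
        return cur.lastrowid

    def get_gene(self, gene_id):
        """Return (gene_id, source, target) or None."""
        return self.conn.execute(
            "SELECT gene_id, source, target FROM genes WHERE gene_id = ?",
            (gene_id,),
        ).fetchone()

    def genes_starting_in(self, node):
        return self.conn.execute(
            "SELECT gene_id, source, target FROM genes WHERE source = ?", (node,)
        ).fetchall()

    def genes_ending_in(self, node):
        return self.conn.execute(
            "SELECT gene_id, source, target FROM genes WHERE target = ?", (node,)
        ).fetchall()
```

Passing a file path instead of `":memory:"` makes the store persistent.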

Thoughts and opinions?

MainDirector Class

The MainDirector class is called from main and does everything in an automated run.
It creates databases as necessary and does everything the program flow contains.
It currently contains pseudocode-like notes detailing how the program should work. See #10
It outsources a lot of its work: there are plans for objects that encapsulate selection or decision making.

This is by no means perfect; it is currently a proposal for how things should work once everything is implemented.

Licensing

Because I like to think that this is going to be a massive success, I am thinking a bit about licensing.
In my opinion it should be something open, of course, with free access and usage.
But I feel the need to extend whatever license we use (and I don't know many) with a clause protecting this project from any military use, which is a potential threat with AI.

Genome Repository

The Genome Repository is basically the container for the current population. Old Genomes should not be thrown away, but marked as old and out of use.

Implementation of distributed simulation

NEAT should be able to outsource the calculation of graph outputs over the network.
This would allow for the integration of other programs which use graph structures for their calculations. (NEAT would then only perform graph optimization)

Also, NEAT should be able to distribute calculation nodes over the network when output calculation isn't outsourced.

Web Frontend

I suggest building a web frontend. Mainly for statistics and status tracking, but maybe also as a control panel.
Python can do a lot for that. Especially for statistics, things like matplotlib and scipy can go a long way for us.
This is a long-term goal, as it is only useful once the application is done/usable.

AnalysisResult Refactor

The AnalysisResult currently contains an adjacency list of cycle-closing edges and an adjacency list of the rest of the edges.
This will be changed into a single dictionary mapping each gene_id of the StorageGenome to true if the gene closes a cycle and to false if it doesn't.
This way, creating SimulationGenomes requires a GeneRepository, but no longer depends on the AnalysisGenome.

Also the set of disabled nodes is to be removed, since genes are not disabled during the analysis.
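
The new shape is small enough to show directly. This is only a sketch; the helper name `build_analysis_result` is hypothetical.

```python
def build_analysis_result(gene_ids, cycle_closing_ids):
    """Map every gene id to True if it closes a cycle, else False."""
    closing = set(cycle_closing_ids)
    return {gene_id: gene_id in closing for gene_id in gene_ids}


result = build_analysis_result(gene_ids=[1, 2, 3, 4], cycle_closing_ids=[3])
# result == {1: False, 2: False, 3: True, 4: False}

# The two old adjacency lists can still be derived by looking the ids up
# in a GeneRepository; here we only split the ids.
cycle_closing = [g for g, closes in result.items() if closes]
regular = [g for g, closes in result.items() if not closes]
```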

Collection of Exceptions

We should have a module dedicated to several necessary Exceptions.
Things like SuccessorAlreadyExistsInNode etc.
These Exceptions should be configurable and verbose, i.e. they should be parameterized with information about the incident.
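
A parameterized exception could be sketched like this. `SuccessorAlreadyExistsInNode` is the name from above; the base class `NeatError` and the two parameters are assumptions.

```python
class NeatError(Exception):
    """Hypothetical common base class for all project exceptions."""


class SuccessorAlreadyExistsInNode(NeatError):
    def __init__(self, node_id, successor_id):
        # Keep the details as attributes so handlers can inspect them.
        self.node_id = node_id
        self.successor_id = successor_id
        super().__init__(
            f"node {node_id} already has successor {successor_id}"
        )
```

A shared base class lets callers catch every project-specific error with a single `except NeatError`.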

Rework of the flow structure wiki page

Customization of Doxygen for python

At the moment, Doxygen is used for creating doc pages.
For some reason, Doxygen is missing some of our docstrings. This should be fixed.

Implementation of message queue in NEATClient

At the moment, only NEATServer uses a message queue for communication.
For better (and more robust) communication, NEATClient should also implement a threaded message queue.
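
A threaded message queue can be built from the standard `queue` and `threading` modules. The sketch below is not based on the NEATServer code; the class name `MessageQueueWorker`, the handler callback and the `None` shutdown sentinel are assumptions.

```python
import queue
import threading


class MessageQueueWorker:
    """Processes messages one at a time on a background thread."""

    def __init__(self, handler):
        self._queue = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, message):
        """Enqueue a message; returns immediately."""
        self._queue.put(message)

    def _run(self):
        while True:
            message = self._queue.get()  # blocks until a message arrives
            if message is None:  # sentinel: stop the worker
                break
            self._handler(message)

    def close(self):
        """Process everything queued so far, then stop the thread."""
        self._queue.put(None)
        self._thread.join()
```

Because a single thread drains the queue, messages are handled in the order they were enqueued. An exception raised by the handler would end the worker thread, so a robust version would need to catch and report handler errors.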

Genome class / Breeding and Innovation numbers

Problem:

When following the program flow, there has to be a main class which ties all three graph representations together, in order to be able to access all the required methods and information at the right time.

Analysis of the situation:

For most steps, like clustering, breeding, selecting and mutating, the graph has to be examined in order to pick a spot for mutations to happen, compare two genomes, etc.
The AnalysisStructure (AnalysisGenome) is the most appropriate class for these operations, because of the intuitive graph representation (adjacency list).

The simulation structure is only needed for a single step, which is computing the output of the NN.
The storage structure is also only needed when storing the genomes in persistent storage.
(The storage structure could potentially handle innovation numbers / ids, but it shouldn't)

Original proposal:

The Simulation structure should be the main structure tying together all three different data structures.

Objection:

However, the SimulationStructure (SimulationGenome) does not store information on genes (edges) but on neurons (nodes), which makes the simulation structure unfit for performing breeding, mutation and analysis. This is not a design flaw, because the simulation structure ought to be a specialized class that only carries the information it needs.
The simulation structure is therefore unfit for the purpose of being our main data structure and should only be used for computation of the NN.

New proposal:

The analysis structure could potentially carry all of the needed information, but for design reasons it should remain a specialized class and not take the role of the main class combining the three structures.
Instead, we should wrap the three structures into a Genome class, which will perform graph operations using the analysis structure and create the simulation and storage structures lazily as needed, because converting to either of them loses information or computational efficiency.

This will also clear up responsibility issues and make the program design more structured.
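
The lazy creation could be sketched with `functools.cached_property` (Python 3.8+). Everything concrete here is an assumption for illustration: the analysis structure is a dict of dicts, the storage structure is a sorted list of gene ids, and the simulation structure maps each node to its incoming nodes.

```python
from functools import cached_property


class Genome:
    def __init__(self):
        # Analysis structure: adjacency list, source -> {target: gene_id}.
        self.analysis = {}

    def add_gene(self, gene_id, source, target):
        """Graph operations work on the analysis structure only."""
        self.analysis.setdefault(source, {})[target] = gene_id
        self.analysis.setdefault(target, {})
        # The derived structures are stale now; drop the cached copies.
        self.__dict__.pop("storage", None)
        self.__dict__.pop("simulation", None)

    @cached_property
    def storage(self):
        """Storage structure: built on first access, then cached."""
        return sorted(
            gene_id
            for targets in self.analysis.values()
            for gene_id in targets.values()
        )

    @cached_property
    def simulation(self):
        """Simulation structure: node-centric view, built on first access."""
        incoming = {node: [] for node in self.analysis}
        for source, targets in self.analysis.items():
            for target in targets:
                incoming[target].append(source)
        return incoming
```

Dropping the cached values on every mutation keeps the three views consistent without rebuilding anything that is never read.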

Session management for NEATClient

At the moment, no authentication is required for communication between server and client.
An appropriate method for authentication and session management should be added.

Genome Clustering

We want to cluster our population into species to encourage inbreeding.
Inbreeding is used to prevent "good ideas" from being killed before they can develop. Without it, a change in a genome that will eventually lead to greatness would be discarded if it initially decreases the fitness.
Because of that, we take similar genomes and cluster them, leaving the genomes in clusters alive even if their fitness is low. Clusters with a low overall fitness produce fewer offspring, so we can focus our resources on better clusters, but they continue to exist, so they may improve their "ideas".

Therefore, we need a way to cluster our genomes. After breeding for a while inside these clusters, we breed two whole clusters together, thus (probably) creating a new, third cluster. Afterwards, we recluster everything and keep on inbreeding.
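
How "less offspring for weaker clusters" could work is sketched below: offspring slots are handed out in proportion to each cluster's fitness. The function name, the use of one non-negative fitness value per cluster and the rounding rule are all assumptions, not a decided design.

```python
def allocate_offspring(cluster_fitness, total_offspring):
    """Split total_offspring among clusters in proportion to their fitness.

    cluster_fitness maps a cluster id to a non-negative fitness value.
    """
    total = sum(cluster_fitness.values())
    if total <= 0:
        raise ValueError("at least one cluster needs a positive fitness")
    exact = {c: total_offspring * f / total for c, f in cluster_fitness.items()}
    quota = {c: int(x) for c, x in exact.items()}  # round down first
    leftover = total_offspring - sum(quota.values())
    # Hand the remaining slots to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_fraction[:leftover]:
        quota[c] += 1
    return quota


quota = allocate_offspring({"a": 3.0, "b": 1.0}, total_offspring=8)
# quota == {"a": 6, "b": 2}
```

With pure proportional allocation a very weak cluster can end up with zero offspring. If every cluster is supposed to keep existing, a minimum of one slot per cluster would have to be added.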

StorageStructure for Genomes

We need a dedicated structure for storing genomes.
This structure should be storage-efficient and easy to use for mutation and breeding.

The current concept is to use a list/set of Genes, where each Gene is unique, stored in a separate repository and referenced via its id. (Flyweight pattern)
Genes are basically graph edges; they contain a head, a tail and an innovation value. Additional Gene information that is needed in the Genome is stored there, in combination with the gene id.
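
A minimal sketch of this flyweight layout follows. The shared `Gene` carries head, tail and innovation value as described above; the per-genome `weight` and `enabled` entries are hypothetical examples of "additional Gene information", and the field layout of `StorageGenome` is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """Shared, immutable edge description; lives in the gene repository."""
    gene_id: int
    head: int
    tail: int
    innovation: int


@dataclass
class StorageGenome:
    # gene_id -> per-genome information about that gene.
    genes: dict = field(default_factory=dict)

    def add(self, gene, weight, enabled=True):
        # Only the id is stored; the Gene object itself stays shared.
        self.genes[gene.gene_id] = {"weight": weight, "enabled": enabled}


edge = Gene(gene_id=1, head=5, tail=2, innovation=1)
first, second = StorageGenome(), StorageGenome()
first.add(edge, weight=0.5)
second.add(edge, weight=-1.25, enabled=False)
```

Both genomes refer to the same gene by id but keep their own weight and enabled flag, which keeps the stored genome small.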
