
neat_pygenetics's Introduction

neat_pygenetics's People

Contributors

l4v13, strangedev, ypj95

neat_pygenetics's Issues

SimulationStructure for Genomes

We need a dedicated genome structure for simulations.
The structure should be fast to use, while storage efficiency is the lesser priority.

This is a preliminary design:
simulationgenome

Naming Consistency

We should decide on some naming conventions.

Endpoints of Nodes:
source - target vs head - tail

tbd

Genome Selector

The Genome selector selects two genomes from the Genome-Repository for breeding.
The selection is based on cluster membership, among other criteria.
The Genome Selector will be a main part of a continuous evolutionary process.

Database

We are currently experimenting with MongoDB.
MongoDB stores objects in JSON representation.

In the module DatabaseConnector we have custom JSON encoders and decoders, which encode StorageGenomes and their AnalysisResults into JSON and decode them back.
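As a rough illustration of the encode/decode round trip, here is a minimal sketch using only the standard `json` module. The `StorageGenome` class and its fields are stand-ins invented for this example, and the decoder is written as an `object_hook` function rather than a `JSONDecoder` subclass; the real DatabaseConnector may look quite different.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageGenome:
    """Stand-in for the real class; the fields are assumptions."""
    genome_id: int
    gene_ids: List[int] = field(default_factory=list)


class GenomeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, StorageGenome):
            # Tag the dict so the decoder can recognise it again.
            return {"__type__": "StorageGenome",
                    "genome_id": o.genome_id,
                    "gene_ids": o.gene_ids}
        return super().default(o)


def decode_genome(d):
    """object_hook for json.loads: rebuild tagged dicts, pass others through."""
    if d.get("__type__") == "StorageGenome":
        return StorageGenome(d["genome_id"], d["gene_ids"])
    return d


text = json.dumps(StorageGenome(1, [4, 7, 9]), cls=GenomeEncoder)
restored = json.loads(text, object_hook=decode_genome)
```

The tagged dictionary is plain JSON, so it could be handed to a document store unchanged.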

I want to build the repositories on top of that and then test performance with many (100,000+) objects.

Addition of types for graph nodes

In order to solve more complex problems, different node types should be added.
Different types should be able to specify their own transport functions, minimum and maximum number of in-/outputs.
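One possible shape for such a node type, as a sketch only: the class name `NodeType`, its fields and the two example types are assumptions made for illustration, not part of the current code.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class NodeType:
    """Hypothetical description of one kind of graph node."""
    name: str
    transport: Callable[[float], float]  # applied to the summed inputs
    min_inputs: int = 0
    max_inputs: Optional[int] = None     # None means "no upper bound"
    min_outputs: int = 0
    max_outputs: Optional[int] = None

    def accepts(self, n_inputs: int, n_outputs: int) -> bool:
        """Check whether a node of this type may have this many edges."""
        def in_range(n, low, high):
            return n >= low and (high is None or n <= high)
        return (in_range(n_inputs, self.min_inputs, self.max_inputs)
                and in_range(n_outputs, self.min_outputs, self.max_outputs))


# Two example types (assumed): a sigmoid neuron and a pure input node.
SIGMOID = NodeType("sigmoid", lambda x: 1.0 / (1.0 + math.exp(-x)), min_inputs=1)
INPUT = NodeType("input", lambda x: x, max_inputs=0, min_outputs=1)
```

A mutation step could call `accepts` before adding or removing an edge, so that invalid graphs are never created in the first place.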

Simulator

The simulator uses an arbitrary simulation (in strangedev's case his prg-1 Python project; I want to use 2048) to receive inputs for a genome and to test the outputs of the genome.
The simulation then has to provide a fitness for the tested genome.

For this we need a really abstract structure.
A Simulator should have this interface:

  • run(genome) // genome in SimulationStructure
    where run returns the fitness of the genome.

Very abstract ;D
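In Python the interface could be written as an abstract base class. This is a sketch: the toy `ConstantTargetSimulator` and the assumption that a genome in SimulationStructure can be called on a list of inputs are invented for the example.

```python
from abc import ABC, abstractmethod


class Simulator(ABC):
    @abstractmethod
    def run(self, genome) -> float:
        """Feed inputs to the genome, test its outputs, return its fitness."""


class ConstantTargetSimulator(Simulator):
    """Toy simulation (hypothetical): reward outputs close to a target value."""

    def __init__(self, target: float):
        self.target = target

    def run(self, genome) -> float:
        # Assumption: a genome in SimulationStructure is callable on a list of inputs.
        output = genome([1.0])
        return -abs(output - self.target)
```

Any concrete simulation (prg-1, 2048, ...) would then only have to implement `run`.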

Cluster Repository

We need to store a bit of information about clusters, namely

  • shared fitness
  • maximum population size
  • possibly more

This information has to go into the database, so we need a repository for clusters.

AnalysisStructure for Genomes

We need a dedicated structure for the analysis of genomes.
The structure should be lightweight, but easily traversable and easy to check for cycles etc.
The currently best idea for this is an adjacency list.
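To show why an adjacency list makes cycle checks easy, here is a small sketch of a depth-first search over a plain dictionary. Representing the graph as `dict` of node to successor list is an assumption; the real AnalysisGenome may wrap this differently.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def has_cycle(graph: Graph) -> bool:
    """Depth-first search with three colours; reaching a grey node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node) -> bool:
        colour[node] = GREY
        for succ in graph.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))
```

The recursion is fine for small genomes; for very deep graphs an explicit stack would avoid Python's recursion limit.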

Genome Analyzer

The Genome Analyzer performs different tasks on Genomes in the AnalysisStructure.
The currently planned analysis tasks are:

  • test for coherency
    • finding sources and sinks in the hidden layer (and removing them)
  • finding cycles (and marking them)
  • marking Genes as disabled

The Genome Analyzer should return a diff object on request. This diff object should contain information about what has to change in the graph according to the given rules. If sources or sinks are found, for example, they should be contained in the diff object as "to be removed", so that they can be removed from the StorageStructure afterwards.
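A sketch of what the source/sink check and the diff object could look like. The names `AnalysisDiff` and `find_dead_hidden_nodes` are hypothetical, and the graph is assumed to be an adjacency list mapping each node to its successors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AnalysisDiff:
    """Hypothetical diff object: what should change in the stored genome."""
    nodes_to_remove: Set[str] = field(default_factory=set)


def find_dead_hidden_nodes(graph: Dict[str, List[str]],
                           inputs: Set[str], outputs: Set[str]) -> AnalysisDiff:
    """Mark hidden nodes that have no incoming or no outgoing edge."""
    has_incoming = {s for succs in graph.values() for s in succs}
    has_outgoing = {n for n, succs in graph.items() if succs}
    nodes = set(graph) | has_incoming
    diff = AnalysisDiff()
    for node in nodes - inputs - outputs:
        if node not in has_incoming or node not in has_outgoing:
            diff.nodes_to_remove.add(node)
    return diff


graph = {"in": ["h1", "h2"], "h1": ["out"], "h2": [], "h3": ["out"]}
diff = find_dead_hidden_nodes(graph, inputs={"in"}, outputs={"out"})
```

This is a single pass. Removing a node can turn one of its neighbours into a new source or sink, so a real analyzer would probably repeat the check until nothing changes.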

Gene Repository

The Gene Repository collects and distributes Genes.
How exactly these are stored is not yet decided, but the storage should be persistent.

Genes should be accessible via getGene(geneID) or something like that. The repository should also make it possible to search for Genes starting or ending in a specific node (for reuse).
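For prototyping, the standard-library `sqlite3` module would be enough. The following is a sketch under assumptions: the table layout, the column names `head` and `tail`, and the method names `add_gene`, `get_gene` and `genes_touching` are invented for illustration.

```python
import sqlite3


class GeneRepository:
    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a file path to persist.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS genes ("
            "gene_id INTEGER PRIMARY KEY, head TEXT NOT NULL, tail TEXT NOT NULL, "
            "UNIQUE (head, tail))"
        )

    def add_gene(self, head, tail):
        """Return the id of the gene for this node pair, creating it if needed."""
        row = self._db.execute(
            "SELECT gene_id FROM genes WHERE head = ? AND tail = ?", (head, tail)
        ).fetchone()
        if row:
            return row[0]
        cursor = self._db.execute(
            "INSERT INTO genes (head, tail) VALUES (?, ?)", (head, tail))
        self._db.commit()
        return cursor.lastrowid

    def get_gene(self, gene_id):
        return self._db.execute(
            "SELECT gene_id, head, tail FROM genes WHERE gene_id = ?", (gene_id,)
        ).fetchone()

    def genes_touching(self, node):
        """All genes that start or end in the given node."""
        return self._db.execute(
            "SELECT gene_id, head, tail FROM genes WHERE head = ? OR tail = ? "
            "ORDER BY gene_id", (node, node)
        ).fetchall()
```

Because `add_gene` returns the existing id for a known node pair, the same structural change always maps to the same Gene, which is what makes reuse possible.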

I recommend sqlite for testing and prototyping, then switching to MySQL later on.
I haven't toyed with NoSQL databases yet; maybe they are a viable option in this case.

Thoughts and opinions?

MainDirector Class

The MainDirector class is called from main and does everything in an automated run.
It creates databases as necessary and does everything the program flow contains.
It currently contains pseudocode-like notes detailing how the program should work. See #10
It outsources a lot of its work: there are plans for objects that encapsulate selection or decision making.

It is by no means perfect; currently it is a proposal for how things should work once everything is implemented.

Licensing

Because I like to think that this is going to be a massive success, I am thinking a bit about licensing.
In my opinion it should be something open, of course, with free access and usage.
But I feel the need to extend whatever license we use (and I don't know many) with a clause protecting this project from any military use, which is a potential threat with AI.

Genome Repository

The Genome Repository is basically the container for the current population. Old Genomes should not be thrown away, but marked as old and out of use.

Implementation of distributed simulation

NEAT should be able to outsource the calculation of graph outputs over the network.
This would allow for the integration of other programs which use graph structures for their calculations. (NEAT would then only perform graph optimization)

Also, NEAT should be able to distribute calculation nodes over the network when output calculation isn't outsourced.

Web Frontend

I suggest building a web frontend. Mainly for statistics and status tracking, but maybe also as a control panel.
Python can do a lot for that. Especially for statistics, things like matplotlib and scipy can go a long way for us.
This is a long-term goal, as it is only useful once the application is done/usable.

AnalysisResult Refactor

The AnalysisResult currently contains an adjacency list of cycle closing edges and an adjacency list of the rest of the edges.
This will be changed into a single dictionary mapping the gene_ids of the StorageGenome to true if they close a cycle and to false if they don't.
This way, creating SimulationGenomes requires a GeneRepository, but it no longer depends on the AnalysisGenome.

Also the set of disabled nodes is to be removed, since genes are not disabled during the analysis.
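A small sketch of how the new dictionary could be consumed. The `genes` lookup table stands in for a GeneRepository, and the helper `split_edges` is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (start node, end node)

# Hypothetical stand-in for a GeneRepository lookup: gene_id -> edge
genes: Dict[int, Edge] = {1: ("in", "h"), 2: ("h", "out"), 3: ("out", "h")}

# New-style AnalysisResult: gene_id -> True if the gene closes a cycle
closes_cycle: Dict[int, bool] = {1: False, 2: False, 3: True}


def split_edges(closes_cycle: Dict[int, bool],
                genes: Dict[int, Edge]) -> Tuple[List[Edge], List[Edge]]:
    """Resolve gene ids to edges and separate cycle-closing edges from the rest."""
    forward, cycle_closing = [], []
    for gene_id, is_closing in sorted(closes_cycle.items()):
        (cycle_closing if is_closing else forward).append(genes[gene_id])
    return forward, cycle_closing


forward_edges, cycle_edges = split_edges(closes_cycle, genes)
```

Only the gene ids and the repository are needed here, which is the point of the refactor: the AnalysisGenome is no longer involved.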

Collection of Exceptions

We should have a module dedicated to several necessary Exceptions.
Things like SuccessorAlreadyExistsInNode etc.
These Exceptions should be configurable and verbose, i.e. they should be parameterized with information about the incident.
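For example, a parameterized exception could look like the sketch below. The base class `NeatError` and the constructor parameters are assumptions; only the name SuccessorAlreadyExistsInNode comes from this issue.

```python
class NeatError(Exception):
    """Hypothetical common base class for the project's exceptions."""


class SuccessorAlreadyExistsInNode(NeatError):
    def __init__(self, node_id, successor_id):
        # Keep the details as attributes so callers can react programmatically.
        self.node_id = node_id
        self.successor_id = successor_id
        super().__init__(
            f"node {node_id!r} already has successor {successor_id!r}")
```

A shared base class lets callers catch every project-specific error with a single `except NeatError:` clause.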

Genome class / Breeding and Innovation numbers

Problem:

When following the program flow, there has to be a main class which ties all three graph representations together, in order to be able to access all the required methods and information at the right time.

Analysis of the situation:

For most steps, like clustering, breeding, selecting and mutating, the graph has to be examined in order to pick a spot for mutations to happen, compare two genomes, etc.
The AnalysisStructure (AnalysisGenome) is the most appropriate class for these operations, because of the intuitive graph representation (adjacency list).

The simulation structure is only needed for a single step, which is computing the output of the NN.
The storage structure is also only needed when storing the genomes in persistent storage.
(The storage structure could potentially handle innovation numbers / ids, but it shouldn't)

Original proposal:

The Simulation structure should be the main structure tying together all three different data structures.

Objection:

However, the SimulationStructure (SimulationGenome) does not store information on genes (edges) but on neurons (nodes), which makes the simulation structure unfit for performing breeding/mutation/analysis. This is not a design flaw, because the simulation structure ought to be a specialized class, only carrying the information it needs.
The simulation structure is therefore unfit for the purpose of being our main data structure and should only be used for computation of the NN.

New proposal:

The analysis structure could potentially carry all of the needed information, but for design reasons it should remain a specialized class and not take the role of the main class combining the three structures.
Instead, we should wrap the three structures into a Genome class, which will perform graph operations using the analysis structure and create the simulation and storage structures lazily as needed, because converting to either of them loses either information or suitability for computation.

This will also clear out responsibility issues and make the program design more structured.
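The lazy creation could be sketched with `functools.cached_property` (Python 3.8+). Everything below is an assumption for illustration: the analysis structure is a plain adjacency dictionary, and the "storage" and "simulation" forms are simple stand-ins for the real StorageGenome and SimulationGenome.

```python
from functools import cached_property
from typing import Dict, List


class Genome:
    def __init__(self, adjacency: Dict[str, List[str]]):
        self.analysis = adjacency  # main structure for graph operations

    @cached_property
    def storage(self):
        """Edge list, built on first access (stand-in for StorageGenome)."""
        return sorted((n, s) for n, succs in self.analysis.items() for s in succs)

    @cached_property
    def simulation(self):
        """Predecessors per node (stand-in for SimulationGenome)."""
        preds = {}
        for node, succs in self.analysis.items():
            preds.setdefault(node, [])
            for succ in succs:
                preds.setdefault(succ, []).append(node)
        return preds

    def add_edge(self, source, target):
        self.analysis.setdefault(source, []).append(target)
        # Drop cached conversions; they are rebuilt on next access.
        self.__dict__.pop("storage", None)
        self.__dict__.pop("simulation", None)
```

Every mutating operation has to invalidate the cached conversions, which is one more reason to route all graph changes through the Genome class.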

Session management for NEATClient

At the moment, no authentication is required for communication between server and client.
An appropriate method for authentication and session management should be added.

Genome Clustering

We want to cluster our population into species to encourage inbreeding.
Inbreeding is used to prevent "good ideas" from being killed before they can develop. If a change in a genome that will eventually lead to greatness initially decreases the fitness, it would otherwise be discarded.
Because of that, we take similar genomes and cluster them, leaving the genomes in clusters alive, even if their fitness is low. Clusters with a low overall fitness produce fewer offspring, so we can focus our resources on better clusters, but they continue to exist, so they may improve their "ideas".

Therefore, we need a way to cluster our genomes. After breeding for a while inside these clusters, we breed two whole clusters together, thus (probably) creating a new, third, cluster. Afterwards, we recluster everything and keep on inbreeding.
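How exactly a cluster's fitness translates into offspring is not fixed above. One simple option, shown here purely as a sketch, is to split a fixed offspring budget in proportion to each cluster's fitness. The function name `allocate_offspring` is hypothetical, and the sketch assumes non-negative fitness values.

```python
def allocate_offspring(cluster_fitness, total_offspring):
    """Split total_offspring between clusters in proportion to their fitness."""
    total = sum(cluster_fitness.values())
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    shares = {c: total_offspring * f / total for c, f in cluster_fitness.items()}
    counts = {c: int(s) for c, s in shares.items()}
    # Hand out the slots lost to rounding down, largest remainder first.
    leftover = total_offspring - sum(counts.values())
    by_remainder = sorted(shares, key=lambda c: shares[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


counts = allocate_offspring({"a": 2.0, "b": 1.0}, 10)
```

A weak cluster gets few slots this way but is not removed outright; whether a cluster should be guaranteed a minimum is a separate decision.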

StorageStructure for Genomes

We need a dedicated structure for storing genomes.
This structure should be storage efficient and easy to use for mutation and breeding.

The current concept is to use a list/set of Genes. These Genes are unique, stored in a separate repository and referenced via id (Flyweight pattern).
Genes basically are graph edges; they contain a head, a tail and an innovation value. Additional Gene information that is needed in the Genome is stored there, in combination with the gene id.
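A minimal sketch of this flyweight idea follows. The per-genome data is assumed to be a weight and an enabled flag, which is not specified above; the `gene_pool` dictionary stands in for the separate repository.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Gene:
    """Shared, immutable edge; stored once and referenced by id."""
    gene_id: int
    head: str
    tail: str


@dataclass
class StorageGenome:
    # gene_id -> per-genome data; (weight, enabled) is an assumed example
    genes: Dict[int, Tuple[float, bool]] = field(default_factory=dict)


# Stand-in for the separate Gene repository.
gene_pool = {1: Gene(1, "in", "out")}

# Two genomes share gene 1 but carry their own data for it.
parent = StorageGenome({1: (0.5, True)})
child = StorageGenome({1: (-0.2, True)})
```

Comparing two genomes for breeding then reduces to comparing their sets of gene ids, without touching the Gene objects themselves.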
