What is the reason for the restriction on introducing new edges (source node must be in same or lower layer)? (brain-tokyo-workshop, closed)

google commented on July 17, 2024
What is the reason for the restriction on introducing new edges (source node must be in same or lower layer)?

from brain-tokyo-workshop.

Comments (5)

agaier commented on July 17, 2024

I'll be damned, you're right. In a network like this:

       O-O-O-O-O
      /         \
(in) O           O (output)
      \         /
       O-O-O-O-O

Any of those intermediate nodes should be able to connect to each other, but there is no layer assignment that makes that possible under the rule that an edge may only go to the same or a later layer.

The answer to your question is that it was implemented this way to reduce complexity. My original implementation added an edge, then did a topological sort to make sure there were no cycles. This is inefficient, but it was not a problem in the RL tasks, as the networks were never really that big and evaluating the network dominated the cost of the whole algorithm by far. Once it came to MNIST, though, the situation was reversed: evaluation took almost no time at all, the networks were very big, and most of the computation went into topological sorting. Looks like I missed this case.
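For readers who want to see what "add an edge, then topologically sort" amounts to, here is a minimal sketch using Kahn's algorithm. It is not the repository's code; the function name `creates_cycle` and the edge-list representation are assumptions made for illustration.

```python
from collections import deque


def creates_cycle(n_nodes, edges, new_edge):
    """Return True if adding new_edge to the directed graph produces a cycle.

    Hypothetical helper: edges is a list of (src, dest) index pairs.
    """
    successors = [[] for _ in range(n_nodes)]
    in_degree = [0] * n_nodes
    for src, dest in list(edges) + [new_edge]:
        successors[src].append(dest)
        in_degree[dest] += 1

    # Kahn's algorithm: repeatedly remove nodes without remaining incoming edges.
    queue = deque(node for node in range(n_nodes) if in_degree[node] == 0)
    n_sorted = 0
    while queue:
        node = queue.popleft()
        n_sorted += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Nodes that could never be removed lie on, or downstream of, a cycle.
    return n_sorted < n_nodes


chain = [(0, 1), (1, 2), (2, 3)]
print(creates_cycle(4, chain, (0, 3)))  # False: a forward shortcut is fine
print(creates_cycle(4, chain, (3, 1)))  # True: closes the loop 1 -> 2 -> 3 -> 1
```

Every candidate edge costs a full pass over all nodes and edges, which is the overhead described above for large networks.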

Thanks for pointing it out, I will definitely revisit this. Any clever ideas for testing for recurrence with low complexity? Topological sort does more than we need for this.

plonerma commented on July 17, 2024

Edit: Sorry for the wall of text. The second comment might provide a feasible solution, though I am not sure whether it is still too complex. It might be worth testing whether the benefit of increasing the number of possible edges is worth the time spent on mutation (especially with regard to classification tasks).

That's a perfect example where the restriction might be quite drastic!

> My original implementation added an edge, then did a topological sort to make sure there were no cycles.

That makes sense! I assume that's why you are explicitly checking for cycles (now, it would probably make sense to throw an exception there since it is not something that should occur during normal usage).

I am trying to figure out a way right now. Your approach has the great advantage that once you picked two nodes, it is guaranteed to not produce cycles.

The only other possibility I came up with is to randomly select two nodes (the source node can be selected from input and hidden nodes; the destination node from hidden and output nodes) and then distinguish the following cases:

  1. ANY of the two nodes is NOT a hidden node => there is no issue, introduce the edge.
  2. There is an indirect directed path from the src node to the dest node (a directed path with at least one intermediate node) => not an issue, introduce the edge.
  3. There is an indirect directed path from the destination node to the source node.
  4. There is already a direct edge between the two nodes => either way, we cannot introduce a new edge here.

Case 3 could simply be resolved by switching the src and dest nodes (so we arrive at case 2). Of course, this would change the probabilities a bit: not all possible edges are equally probable; instead, shortcuts (edges where a directed path already exists) will be twice as likely, as they will be introduced in either case 2 or case 3. I'm not sure whether that's such a big issue or whether there is a way to mitigate it.

I'm not sure what the best way to handle case 4 is. There is not really much we can do here. Some options include:

  • Just redraw the random numbers (this might lead to long loops; if no edge is possible, we need to make sure we don't get stuck).
  • Just don't do anything: the mutation had its chance. Alternatively, allow redrawing the numbers a finite number of times.
  • Restrict destination nodes to those where no direct connection exists. This would also mess with the probabilities, though: if a node has connections to all other nodes but one, then the probability of an edge between these two nodes being introduced is much higher than for any other. This is probably not something we want.

I don't really see a way of making every possible edge equally probable without figuring out which edges are in fact possible (and that's obviously too complex)...
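The case distinction above can be sketched as follows. This is only an illustration: the names `reaches` and `classify_candidate`, the adjacency-dict representation and the string labels are all assumptions. It also adds a fifth outcome that the list does not name, two hidden nodes with no path in either direction, and treats it as safe, since that is exactly the cross-chain situation from the diagram.

```python
def reaches(successors, start, goal):
    """True if a directed path with at least one edge leads from start to goal."""
    stack, seen = list(successors.get(start, ())), set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(successors.get(node, ()))
    return False


def classify_candidate(successors, hidden, src, dest):
    """Map a drawn (src, dest) pair to a label and the edge to add (or None)."""
    if dest in successors.get(src, ()) or src in successors.get(dest, ()):
        return "exists", None             # case 4: direct edge already present
    if src not in hidden or dest not in hidden:
        return "boundary", (src, dest)    # case 1: input source or output destination
    if reaches(successors, dest, src):
        return "swapped", (dest, src)     # case 3: flip the edge to stay acyclic
    if reaches(successors, src, dest):
        return "shortcut", (src, dest)    # case 2: path already exists
    return "unrelated", (src, dest)       # assumption: no path either way is safe


# Input 0, output 6, hidden chains 1 -> 2 -> 3 and 4 -> 5.
successors = {0: [1, 4], 1: [2], 2: [3], 3: [6], 4: [5], 5: [6]}
hidden = {1, 2, 3, 4, 5}
print(classify_candidate(successors, hidden, 3, 1))  # ('swapped', (1, 3))
print(classify_candidate(successors, hidden, 2, 4))  # ('unrelated', (2, 4))
```

Note that (1, 3) and (3, 1) both end up as the edge 1 -> 3, which is the doubled probability for shortcuts mentioned above.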

For figuring out whether there is a directed path, I suggest a breadth-first search on the connectivity matrix, something along the lines of this (draw the random node indices with respect to the weight matrix, not the node ids in the genes, and do a lookup later):

import numpy as np

def path_exists(weight_matrix, src, dest):
    # Indices follow the (topological) ordering of the weight matrix,
    # so no path can lead from a later node to an earlier one.
    if src > dest:
        return False

    # nan != 0 is True, so disabled (nan) edges still count as connections:
    # a path between the two nodes exists in the same direction anyway.
    connectivity_matrix = weight_matrix != 0

    visited = np.zeros(len(weight_matrix), dtype=bool)
    visited[src] = True

    n_visited = 0
    while n_visited < np.sum(visited):  # loop as long as we are reaching new nodes
        n_visited = np.sum(visited)  # np.sum counts the number of True values
        # rows are sources, columns are destinations: mark every node that
        # any visited node points to, and keep the nodes visited so far
        visited |= np.any(connectivity_matrix[visited], axis=0)
        if visited[dest]:  # dest can be reached from src
            return True

    return False  # no more nodes can be reached from src & dest wasn't one of them

Obviously this is still quite complex, but might be a little bit faster than a complete topological sorting. I don't see another way to ensure all non-recurrent edges are possible...

plonerma commented on July 17, 2024

Maybe a suitable alternative would be to build up a reachability matrix:

connectivity_matrix = weight_matrix != 0
np.fill_diagonal(connectivity_matrix, True)
reachability_matrix = np.copy(connectivity_matrix)

n_paths = 0
while n_paths < np.sum(reachability_matrix):
    n_paths = np.sum(reachability_matrix)
    reachability_matrix = np.dot(connectivity_matrix, reachability_matrix)

# edges are possible where src cannot be reached from dest and there is no direct connection from src to dest
possible_edges = ~reachability_matrix.T & ~connectivity_matrix

# Disallow edges from outputs (last n_out rows) and to inputs and bias (first n_inputs + 1 columns)
possible_edges[-n_out:, :] = False
possible_edges[:, :n_inputs + 1] = False

Using np.where and this matrix we get all valid choices:

valid_src, valid_dest = np.where(possible_edges)
i = np.random.randint(len(valid_src))  # raises ValueError if no edge is possible
src, dest = valid_src[i], valid_dest[i]

P.S. This approach would also allow for more sophisticated edge selection strategies (e.g. increasing the probability of edges to and from nodes that have a small degree and decreasing the probability for edges to and from nodes that already have many connections).
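One way such a degree-aware selection might look is sketched below. The weighting formula (inverse of the product of the two degrees plus one) is purely an assumption for illustration, as is the function name `sample_edge_by_degree`; the connectivity matrix is expected without the filled diagonal.

```python
import numpy as np


def sample_edge_by_degree(possible_edges, connectivity_matrix, rng):
    """Draw a valid (src, dest) pair, favouring sparsely connected nodes."""
    src_idx, dest_idx = np.nonzero(possible_edges)
    if len(src_idx) == 0:
        return None  # no valid edge left

    # Degree of a node = outgoing (row sum) + incoming (column sum) connections.
    degree = connectivity_matrix.sum(axis=1) + connectivity_matrix.sum(axis=0)

    # Hypothetical weighting: the more connections both endpoints have,
    # the less likely the edge becomes.
    weights = 1.0 / ((1 + degree[src_idx]) * (1 + degree[dest_idx]))
    i = rng.choice(len(src_idx), p=weights / weights.sum())
    return int(src_idx[i]), int(dest_idx[i])


rng = np.random.default_rng(0)
connectivity = np.zeros((4, 4), dtype=bool)
connectivity[0, 1] = connectivity[0, 2] = connectivity[1, 3] = True
possible = np.zeros((4, 4), dtype=bool)
possible[1, 2] = possible[2, 3] = True
edge = sample_edge_by_degree(possible, connectivity, rng)
```

Setting all weights to the same value recovers the uniform choice made with np.random.randint above.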

agaier commented on July 17, 2024

I agree that using a reachability matrix is a good idea. The cost of checking for cycles on MNIST came not only from having to sort a large graph, but also from the large number of possible edges to check (and to find that there are no valid edges you have to check every one...). Precomputing the valid edges is a nicer solution, especially if we can find a way to update it rather than start over once an edge is added.

I haven't looked at it too closely, but are you sure the reachability matrix can be computed this elegantly? A quick search shows methods that seem much more involved.

> P.S. This approach would also allow for more sophisticated edge selection strategies (e.g. increasing the probability of edges to and from nodes that have a small degree and decreasing the probability for edges to and from nodes that already have many connections).

This is a nice idea, especially for WANNs where having a ton of connections to the same node with random weights can easily saturate an activation function.

plonerma commented on July 17, 2024

I think caching is a good idea. Changing the activation function (obviously) and re-enabling an edge do not change the reachability matrix at all, since re-enabling just makes the path shorter (it had to exist before anyway). Adding a node does not change the existing entries either; we just need to add a row and a column: the row can be copied from the old destination (for now, all paths will go through that node), and the column can be copied from the old source (for essentially the same reason).
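A minimal sketch of that row-and-column update, assuming the convention reach[i, j] == True when node j can be reached from node i (diagonal set to True, as in the reachability code earlier) and assuming the new node is simply appended as the last index. The name `add_split_node` is hypothetical, and a real implementation would have to respect the repository's node ordering.

```python
import numpy as np


def add_split_node(reach, src, dest):
    """Grow the reachability matrix for a node inserted on the edge src -> dest."""
    n = len(reach)
    grown = np.zeros((n + 1, n + 1), dtype=bool)
    grown[:n, :n] = reach           # existing reachability is unchanged
    grown[n, :n] = reach[dest]      # the new node reaches whatever dest reaches
    grown[:n, n] = reach[:, src]    # whatever reaches src also reaches the new node
    grown[n, n] = True              # keep the diagonal convention
    return grown


# Chain 0 -> 1 -> 2; split the edge 0 -> 1 with a new node 3.
reach = np.triu(np.ones((3, 3), dtype=bool))
grown = add_split_node(reach, 0, 1)
print(grown[3])     # [False  True  True  True]
print(grown[:, 3])  # [ True False False  True]
```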

Adding an edge should not be too complicated. Though it might require a larger update, we don't need an outer loop: all nodes that reach the source now also reach all nodes the destination reaches, and this can also be calculated with np.dot.
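The edge update could be sketched like this, again under the assumption that reach[i, j] means "j is reachable from i" with a True diagonal. The function name `add_edge` is hypothetical; the np.dot of a boolean column and row vector acts as an outer product with logical and.

```python
import numpy as np


def add_edge(reach, src, dest):
    """Return the reachability matrix after adding the edge src -> dest."""
    if reach[dest, src]:
        raise ValueError("edge would close a cycle")
    reaches_src = reach[:, [src]]          # column vector: nodes that reach src
    reached_from_dest = reach[[dest], :]   # row vector: nodes dest reaches
    # Every node that reaches src now also reaches everything dest reaches.
    return reach | np.dot(reaches_src, reached_from_dest)


# Two separate chains 0 -> 1 and 2 -> 3, then connect them with 1 -> 2.
reach = np.eye(4, dtype=bool)
reach[0, 1] = reach[2, 3] = True
updated = add_edge(reach, 1, 2)
print(updated[0])  # [ True  True  True  True]
```

Because the diagonal is True, the column for src contains src itself and the row for dest contains dest itself, so the new direct connection is covered by the same product.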

> I haven't looked at it too closely, but are you sure the reachability matrix can be computed this elegantly? A quick search shows methods that seem much more involved.

I haven't tested the implementation yet, but I was aiming for a Breadth-first search (which is also mentioned in the Wikipedia article you referenced). The reason why it is so simple is that np.dot does most of the work and no preprocessing needs to be done beforehand.

I am assuming that edges are only disabled when they are replaced by a new node (0 -> 1 is replaced by 0 -> 2 -> 1). If edges are removed from the network without replacement, it will be hard to figure out which reachability-values relied on the edge without either saving a lot more information or recomputing the matrix from scratch.
