abojchevski / graph2gauss

Gaussian node embeddings. Implementation of "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking".

Home Page: https://www.kdd.in.tum.de/g2g

License: MIT License

Languages: Jupyter Notebook 13.49%, Python 86.51%
Topics: embeddings, graphs, tensorflow

graph2gauss's People

Contributors: abojchevski, alanjohnvarghese

graph2gauss's Issues

Possible error in sample_last_hop() function

Thank you for open sourcing the project, and for adding a lot of comments in it!

I think there is a logical error at line 397 of utils.py: nnz = A[nnz, new_sample].nonzero()[1]. The issue is that .nonzero() gives the non-zero indices relative to the sub-matrix A[nnz, new_sample], not relative to the original matrix A.

This results in generating incorrect triplets. For example, for the following simple graph:

(figure: a simple example graph)

the triplets generated (with seed = 0) are:
0,1,3
1,0,2
2,3,4
3,2,0
4,3,2
3,2,4
3,0,4

Here the first column is the reference node, and the other columns are such that shortest_path(col0, col1) < shortest_path(col0, col2). The last triplet is incorrect because node 4 is closer to node 3 than node 0 is. Changing line 397 to nnz = A[nodes, sampled].nonzero()[1] seems to fix the error.
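
For context, here is a minimal sketch (using a toy chain graph, not the repo's sampling code) of why .nonzero() on a fancy-indexed scipy sparse matrix returns positions relative to the selection rather than node ids of A:

    import numpy as np
    import scipy.sparse as sp

    # Toy undirected chain 0-1-2-3-4.
    rows, cols = [0, 1, 2, 3], [1, 2, 3, 4]
    A = sp.csr_matrix((np.ones(4), (rows, cols)), shape=(5, 5))
    A = (A + A.T).tocsr()

    nodes = np.array([0, 2, 3])
    sampled = np.array([1, 0, 4])

    # A[nodes, sampled] is a 1 x 3 matrix holding A[0,1], A[2,0], A[3,4];
    # .nonzero()[1] therefore returns positions into that selection,
    # NOT node indices of A. Mapping back through `nodes` recovers node ids.
    pos = A[nodes, sampled].nonzero()[1]
    print(pos)         # [0 2] -- positions within the selection
    print(nodes[pos])  # [0 3] -- the actual node ids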

pubmed dataset loading data error

I am running the demo code just as in example.ipynb, but I changed the dataset to pubmed:

g = load_dataset('data/pubmed.npz')

When I initialize the Graph2Gauss class with

g2g = Graph2Gauss(A=A, X=X, L=128, verbose=True)

the following error occurs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-6c49a6402f64> in <module>()
----> 1 g2g = Graph2Gauss(A=A, X=X, L=128, verbose=True)

/mnt/hp_raid/john-works/weibodata/graph2gauss/g2g/model.py in __init__(self, A, X, L, K, p_val, p_test, n_hidden, max_iter, tolerance, scale, seed, verbose)
     61         train_ones, val_ones, val_zeros, test_ones, test_zeros = train_val_test_split_adjacency(
     62             A=A, p_val=p_val, p_test=p_test, seed=seed, neg_mul=1, every_node=True, connected=False,
---> 63             undirected=(A != A.T).nnz == 0)
     64
     65         # pre-compute the hops for each node for more efficient sampling

/mnt/hp_raid/john-works/weibodata/graph2gauss/g2g/utils.py in train_val_test_split_adjacency(A, p_val, p_test, seed, neg_mul, every_node, connected, undirected)
    120             hold_edges_d1 = np.column_stack((not_in_cover[d_nic > 0],
    121                                              np.row_stack(map(np.random.choice,
--> 122                                                               A[not_in_cover[d_nic > 0]].tolil().rows))))
    123
    124             if np.any(d_nic == 0):

/home/aduman/anaconda3/lib/python3.6/site-packages/numpy/core/shape_base.py in vstack(tup)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

ValueError: need at least one array to concatenate


tensorflow version: 1.4.0,
numpy version: 1.14.2,
scipy version: 1.0.0
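
For what it's worth, the ValueError itself is easy to reproduce: np.row_stack (an alias of np.vstack) raises it whenever it receives an empty sequence, which would happen here if the mask d_nic > 0 selects no rows. This is an assumption about the failure mode, not a confirmed diagnosis:

    import numpy as np

    # If the boolean mask selects no rows, map() yields nothing and
    # row_stack gets an empty list, reproducing the traceback's error.
    surviving_rows = []  # stands in for A[not_in_cover[d_nic > 0]].tolil().rows
    np.row_stack(list(map(np.random.choice, surviving_rows)))
    # ValueError: need at least one array to concatenate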

Is there an implementation of inductive link prediction?

There seems to be code only for transductive learning on seen nodes? Please correct me if I'm wrong.
I guess I could add a function that takes a new, unseen node as a placeholder input, replacing self.X in the computation of self.mu and self.sigma; still, I'd be grateful if you could share your implementation of the inductive link prediction part.
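
A minimal sketch of that idea in TF1 style (the W_mu/b_mu/W_sigma/b_sigma names appear in model.py; the hidden-layer variable names and the relu nonlinearity are assumptions): reuse the trained weights and feed the unseen nodes' attributes through the same encoder.

    import tensorflow as tf

    def encode_new_nodes(sess, X_new, n_layers):
        """Sketch: push unseen nodes' attributes through the trained encoder."""
        X_ph = tf.placeholder(tf.float32, [None, X_new.shape[1]])
        encoded = X_ph
        with tf.variable_scope(tf.get_variable_scope(), reuse=True):
            for i in range(n_layers):
                W = tf.get_variable('W_{}'.format(i))  # hypothetical layer names
                b = tf.get_variable('b_{}'.format(i))
                encoded = tf.nn.relu(tf.matmul(encoded, W) + b)
            mu = tf.matmul(encoded, tf.get_variable('W_mu')) + tf.get_variable('b_mu')
            log_sigma = tf.matmul(encoded, tf.get_variable('W_sigma')) + tf.get_variable('b_sigma')
            sigma = tf.nn.elu(log_sigma) + 1 + 1e-14
        return sess.run([mu, sigma], {X_ph: X_new})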

X matrix

Hi! Can you describe the process used to create the X matrix contained in the different npz files? How do you transform the Cora papers into features?
Thank you!
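
Not an answer from the authors, but for citation datasets like Cora the attribute matrix is commonly a binary bag-of-words over the paper texts; a toy sketch of that construction:

    from sklearn.feature_extraction.text import CountVectorizer

    # Toy "abstracts"; real pipelines also filter stop words and rare terms.
    docs = ["gaussian embedding of graphs",
            "unsupervised inductive learning via ranking"]
    vectorizer = CountVectorizer(binary=True)  # 1 if the word occurs, else 0
    X = vectorizer.fit_transform(docs)         # scipy sparse matrix
    print(X.shape)  # (n_papers, vocabulary_size)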

Hello, I have a question.

I'm trying to use this model with my own data

In model.py:

    # additionally add any dangling nodes to the hidden ones since we can't learn from them
    nodes_dangling = np.where(A_hidden.sum(0).A1 + A_hidden.sum(1).A1 == 0)[0]
    if len(nodes_dangling) > 0:
        nodes_hide = np.concatenate((nodes_hide, nodes_dangling))

This code concatenates two arrays, nodes_hide and nodes_dangling.

nodes_hide is a set of randomly selected nodes from all nodes, nodes_dangling is the set of dangling nodes, and the two sets may contain duplicate nodes.

Therefore, I think it is necessary to take the unique union of the two sets rather than simply concatenating them, such as:

    nodes_hide = np.concatenate((nodes_hide, np.setdiff1d(nodes_dangling, nodes_hide)))

i.e., S = A ∪ B = A + (B \ (A ∩ B))

Is that right?
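
To illustrate the duplicate issue with a quick numpy check (np.union1d also returns the sorted unique union directly, which may be simpler):

    import numpy as np

    nodes_hide = np.array([2, 5, 7])
    nodes_dangling = np.array([5, 9])  # node 5 is in both sets

    print(np.concatenate((nodes_hide, nodes_dangling)))  # [2 5 7 5 9] -- 5 twice
    print(np.union1d(nodes_hide, nodes_dangling))        # [2 5 7 9]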

Something about the parameters (mu and sigma) of the Gaussian distribution

Hi~
I have read your paper "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking". To capture uncertainty, a Gaussian distribution is used for each node. I have some questions about how its parameters are computed in ./g2g/model.py:

    ...
    W_mu = tf.get_variable(name='W_mu', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
    b_mu = tf.get_variable(name='b_mu', shape=[self.L], dtype=tf.float32, initializer=w_init())
    self.mu = tf.matmul(encoded, W_mu) + b_mu
    W_sigma = tf.get_variable(name='W_sigma', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
    b_sigma = tf.get_variable(name='b_sigma', shape=[self.L], dtype=tf.float32, initializer=w_init())
    log_sigma = tf.matmul(encoded, W_sigma) + b_sigma
    self.sigma = tf.nn.elu(log_sigma) + 1 + 1e-14
    ...

1. The first question is about the parameter 'mu'. In your code, 'mu' is computed by an extra output layer of the neural network. Is that right? If so, would you mind providing a reference for this approach? I am just wondering why 'mu' can be computed this way.

2. The second is about the parameter 'sigma'. It is computed in the same way as 'mu'. Shouldn't 'sigma' be a covariance matrix of dimension self.L × self.L? In the code above it is a vector of dimension self.L.

I am a beginner in network representation learning, so if I have misunderstood your work, please point it out. I would appreciate it. Thanks a lot!
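
For the second point, a small numeric sketch may help: the paper uses diagonal covariance matrices, so sigma only needs to store the L diagonal entries, and the ELU(x) + 1 transform keeps them strictly positive (numpy stand-in for tf.nn.elu):

    import numpy as np

    def elu(x):
        # ELU: x for x > 0, exp(x) - 1 otherwise
        return np.where(x > 0, x, np.expm1(x))

    # elu(x) + 1 maps any real network output into (0, inf), so the L values
    # can safely serve as the diagonal of a diagonal covariance matrix.
    log_sigma = np.array([-5.0, -0.3, 0.0, 2.5])
    sigma = elu(log_sigma) + 1 + 1e-14
    print(sigma)  # all entries strictly positive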

HELP!!!!

I am training node classification on my own dataset.

The shape of A is [14000, 14000]; X is [14000, 240].

I set L=64, p_val=0.0, p_test=0.0.

But the loss stays NaN throughout the training process.

I have tried changing the learning rate from 1 down to 1e-8.

Please give some suggestions or ideas.

orz
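
Not an official answer, but a few input sanity checks that commonly explain NaN losses; the candidate causes listed here (non-finite attributes, all-zero attribute rows, self-loops) are assumptions, not a diagnosis:

    import numpy as np
    import scipy.sparse as sp

    def sanity_check(A, X):
        X = sp.csr_matrix(X)
        # Non-finite attributes propagate NaN through the encoder.
        assert np.all(np.isfinite(X.data)), "X contains NaN/inf entries"
        # All-zero attribute rows produce degenerate embeddings.
        print("all-zero rows in X:", np.where(X.sum(1).A1 == 0)[0])
        # Self-loops can distort the hop-based triplet sampling.
        print("number of self-loops in A:", int(A.diagonal().sum()))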

Bug with early stopping on loss

Thanks for open sourcing your code!
This line:

early_stopping_score_max = -1.0

is problematic for some datasets when training with p_val = 0 and p_test = 0 (as in the case of node classification), because the loss might start at values much higher than 1 and can take more than 100 epochs to drop below it. In that case self.__save_vars() is never called, and an exception is then raised here:

self.__restore_vars(sess)

because the attribute self.saved_vars does not exist. I worked around this by changing the first line above to

early_stopping_score_max = -float('inf')
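
A self-contained sketch of the pattern (toy loss values; best_epoch stands in for the save/restore calls, and "score = -loss" is an assumption about what is tracked), showing why the -inf initialization guarantees at least one save:

    losses = [5.2, 4.8, 4.9, 4.95, 5.0]       # toy trajectory, all values above 1
    early_stopping_score_max = -float('inf')  # with -1.0, no epoch here would save
    tolerance = patience = 2
    best_epoch = None
    for epoch, loss in enumerate(losses):
        score = -loss                         # higher score is better
        if score > early_stopping_score_max:
            early_stopping_score_max = score
            best_epoch = epoch                # stands in for self.__save_vars()
            patience = tolerance
        else:
            patience -= 1
            if patience == 0:
                break
    print("restoring from epoch", best_epoch)  # stands in for self.__restore_vars()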
