abojchevski / graph2gauss

Gaussian node embeddings. Implementation of "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking".

Home Page: https://www.kdd.in.tum.de/g2g

License: MIT License

Languages: Jupyter Notebook 13.49%, Python 86.51%
Topics: embeddings, graphs, tensorflow

graph2gauss's People

Contributors: abojchevski, alanjohnvarghese

graph2gauss's Issues

Possible error in sample_last_hop() function

Thank you for open sourcing the project, and for adding a lot of comments in it!

I think there is a logical error at line 397 of utils.py: nnz = A[nnz, new_sample].nonzero()[1]. The issue is that .nonzero() gives the non-zero indices relative to the sub-matrix A[nnz, new_sample], not relative to the original matrix A.

This results in generating incorrect triplets. For example, for the following simple graph:

(figure: a simple example graph)

the triplets generated (with seed = 0) are:
0,1,3
1,0,2
2,3,4
3,2,0
4,3,2
3,2,4
3,0,4

Here the first column is the reference node, and the other columns are such that shortest_path(col0, col1) < shortest_path(col0, col2). The last triplet is incorrect because node 4 is closer to node 3 than node 0 is. Changing line 397 to nnz = A[nodes, sampled].nonzero()[1] seems to fix the error.
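
For context, here is a minimal sketch (using a toy chain graph, not the repo's sampling code) of why .nonzero() on a fancy-indexed scipy sparse matrix returns positions relative to the selection rather than node ids of A:

    import numpy as np
    import scipy.sparse as sp

    # Toy undirected chain 0-1-2-3-4.
    rows, cols = [0, 1, 2, 3], [1, 2, 3, 4]
    A = sp.csr_matrix((np.ones(4), (rows, cols)), shape=(5, 5))
    A = (A + A.T).tocsr()

    nodes = np.array([0, 2, 3])
    sampled = np.array([1, 0, 4])

    # A[nodes, sampled] is a 1 x 3 matrix holding A[0,1], A[2,0], A[3,4];
    # .nonzero()[1] therefore returns positions into that selection,
    # NOT node indices of A. Mapping back through `nodes` recovers node ids.
    pos = A[nodes, sampled].nonzero()[1]
    print(pos)         # [0 2] -- positions within the selection
    print(nodes[pos])  # [0 3] -- the actual node ids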

pubmed dataset loading data error

I am running the demo code just as in example.ipynb, but I changed the dataset to pubmed:

g = load_dataset('data/pubmed.npz')

When I initialize the Graph2Gauss class with

g2g = Graph2Gauss(A=A, X=X, L=128, verbose=True)

the following error occurs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-6c49a6402f64> in <module>()
----> 1 g2g = Graph2Gauss(A=A, X=X, L=128, verbose=True)

/mnt/hp_raid/john-works/weibodata/graph2gauss/g2g/model.py in __init__(self, A, X, L, K, p_val, p_test, n_hidden, max_iter, tolerance, scale, seed, verbose)
     61         train_ones, val_ones, val_zeros, test_ones, test_zeros = train_val_test_split_adjacency(
     62             A=A, p_val=p_val, p_test=p_test, seed=seed, neg_mul=1, every_node=True, connected=False,
---> 63             undirected=(A != A.T).nnz == 0)
     64
     65         # pre-compute the hops for each node for more efficient sampling

/mnt/hp_raid/john-works/weibodata/graph2gauss/g2g/utils.py in train_val_test_split_adjacency(A, p_val, p_test, seed, neg_mul, every_node, connected, undirected)
    120             hold_edges_d1 = np.column_stack((not_in_cover[d_nic > 0],
    121                                              np.row_stack(map(np.random.choice,
--> 122                                                               A[not_in_cover[d_nic > 0]].tolil().rows))))
    123
    124             if np.any(d_nic == 0):

/home/aduman/anaconda3/lib/python3.6/site-packages/numpy/core/shape_base.py in vstack(tup)
    232
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235
    236 def hstack(tup):

ValueError: need at least one array to concatenate


tensorflow version: 1.4.0,
numpy version: 1.14.2,
scipy version: 1.0.0
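
For what it's worth, the ValueError itself is easy to reproduce: np.row_stack (an alias of np.vstack) raises it whenever it receives an empty sequence, which would happen here if the mask d_nic > 0 selects no rows. This is an assumption about the failure mode, not a confirmed diagnosis:

    import numpy as np

    # If the boolean mask selects no rows, map() yields nothing and
    # row_stack gets an empty list, reproducing the traceback's error.
    surviving_rows = []  # stands in for A[not_in_cover[d_nic > 0]].tolil().rows
    np.row_stack(list(map(np.random.choice, surviving_rows)))
    # ValueError: need at least one array to concatenate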

Is there an implementation of inductive link prediction?

There seems to be code only for transductive learning on seen nodes? Please correct me if I'm wrong.
I guess I could add a function that takes a new, unseen node as a placeholder input, replacing self.X in the computation of self.mu and self.sigma; still, I'd be grateful if you could share your implementation of the inductive link prediction part.
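
A minimal sketch of that idea in TF1 style (the W_mu/b_mu/W_sigma/b_sigma names appear in model.py; the hidden-layer variable names and the relu nonlinearity are assumptions): reuse the trained weights and feed the unseen nodes' attributes through the same encoder.

    import tensorflow as tf

    def encode_new_nodes(sess, X_new, n_layers):
        """Sketch: push unseen nodes' attributes through the trained encoder."""
        X_ph = tf.placeholder(tf.float32, [None, X_new.shape[1]])
        encoded = X_ph
        with tf.variable_scope(tf.get_variable_scope(), reuse=True):
            for i in range(n_layers):
                W = tf.get_variable('W_{}'.format(i))  # hypothetical layer names
                b = tf.get_variable('b_{}'.format(i))
                encoded = tf.nn.relu(tf.matmul(encoded, W) + b)
            mu = tf.matmul(encoded, tf.get_variable('W_mu')) + tf.get_variable('b_mu')
            log_sigma = tf.matmul(encoded, tf.get_variable('W_sigma')) + tf.get_variable('b_sigma')
            sigma = tf.nn.elu(log_sigma) + 1 + 1e-14
        return sess.run([mu, sigma], {X_ph: X_new})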

X matrix

Hi! Can you describe the process used to create the X matrix contained in the different npz files? How do you transform the Cora papers into features?
Thank you!
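
Not an answer from the authors, but for citation datasets like Cora the attribute matrix is commonly a binary bag-of-words over the paper texts; a toy sketch of that construction:

    from sklearn.feature_extraction.text import CountVectorizer

    # Toy "abstracts"; real pipelines also filter stop words and rare terms.
    docs = ["gaussian embedding of graphs",
            "unsupervised inductive learning via ranking"]
    vectorizer = CountVectorizer(binary=True)  # 1 if the word occurs, else 0
    X = vectorizer.fit_transform(docs)         # scipy sparse matrix
    print(X.shape)  # (n_papers, vocabulary_size)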

Hello, I have a question.

I'm trying to use this model with my own data

In model.py:

    # additionally add any dangling nodes to the hidden ones since we can't learn from them
    nodes_dangling = np.where(A_hidden.sum(0).A1 + A_hidden.sum(1).A1 == 0)[0]
    if len(nodes_dangling) > 0:
        nodes_hide = np.concatenate((nodes_hide, nodes_dangling))

This code concatenates two arrays, nodes_hide and nodes_dangling.

nodes_hide is a set of randomly selected nodes from all nodes, nodes_dangling is the set of dangling nodes, and the two sets may contain duplicate nodes.

Therefore, I think it is necessary to take the unique union of the two sets rather than simply concatenating them, such as:

    nodes_hide = np.concatenate((nodes_hide, np.setdiff1d(nodes_dangling, nodes_hide)))

i.e., S = A ∪ B = A + (B \ (A ∩ B))

Is that right?
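
To illustrate the duplicate issue with a quick numpy check (np.union1d also returns the sorted unique union directly, which may be simpler):

    import numpy as np

    nodes_hide = np.array([2, 5, 7])
    nodes_dangling = np.array([5, 9])  # node 5 is in both sets

    print(np.concatenate((nodes_hide, nodes_dangling)))  # [2 5 7 5 9] -- 5 twice
    print(np.union1d(nodes_hide, nodes_dangling))        # [2 5 7 9]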

Something about the parameters (mu and sigma) of the Gaussian distribution

Hi~
I have read your paper "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking". To capture uncertainty, a Gaussian distribution is used for each node. I have some questions about how its parameters are computed in ./g2g/model.py:

    ...
    W_mu = tf.get_variable(name='W_mu', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
    b_mu = tf.get_variable(name='b_mu', shape=[self.L], dtype=tf.float32, initializer=w_init())
    self.mu = tf.matmul(encoded, W_mu) + b_mu
    W_sigma = tf.get_variable(name='W_sigma', shape=[sizes[-1], self.L], dtype=tf.float32, initializer=w_init())
    b_sigma = tf.get_variable(name='b_sigma', shape=[self.L], dtype=tf.float32, initializer=w_init())
    log_sigma = tf.matmul(encoded, W_sigma) + b_sigma
    self.sigma = tf.nn.elu(log_sigma) + 1 + 1e-14
    ...

1. The first question is about the parameter 'mu'. In your code, 'mu' is computed by an extra output layer of the neural network. Is that right? If so, would you mind providing a reference for this approach? I am just wondering why 'mu' can be computed this way.

2. The second is about the parameter 'sigma'. It is computed in the same way as 'mu'. Shouldn't 'sigma' be a covariance matrix of dimension self.L × self.L? In the code above it is a vector of dimension self.L.

I am a beginner in network representation learning, so if I have misunderstood your work, please point it out. I would appreciate it. Thanks a lot!
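
For the second point, a small numeric sketch may help: the paper uses diagonal covariance matrices, so sigma only needs to store the L diagonal entries, and the ELU(x) + 1 transform keeps them strictly positive (numpy stand-in for tf.nn.elu):

    import numpy as np

    def elu(x):
        # ELU: x for x > 0, exp(x) - 1 otherwise
        return np.where(x > 0, x, np.expm1(x))

    # elu(x) + 1 maps any real network output into (0, inf), so the L values
    # can safely serve as the diagonal of a diagonal covariance matrix.
    log_sigma = np.array([-5.0, -0.3, 0.0, 2.5])
    sigma = elu(log_sigma) + 1 + 1e-14
    print(sigma)  # all entries strictly positive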

HELP!!!!

I am training node classification on my own dataset.

The shape of A is [14000, 14000]; X is [14000, 240].

I set L=64, p_val=0.0, p_test=0.0.

But the loss stays NaN throughout the training process.

I have tried changing the learning rate from 1 down to 1e-8.

Please give some suggestions or ideas.

orz
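
Not an official answer, but a few input sanity checks that commonly explain NaN losses; the candidate causes listed here (non-finite attributes, all-zero attribute rows, self-loops) are assumptions, not a diagnosis:

    import numpy as np
    import scipy.sparse as sp

    def sanity_check(A, X):
        X = sp.csr_matrix(X)
        # Non-finite attributes propagate NaN through the encoder.
        assert np.all(np.isfinite(X.data)), "X contains NaN/inf entries"
        # All-zero attribute rows produce degenerate embeddings.
        print("all-zero rows in X:", np.where(X.sum(1).A1 == 0)[0])
        # Self-loops can distort the hop-based triplet sampling.
        print("number of self-loops in A:", int(A.diagonal().sum()))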

Bug with early stopping on loss

Thanks for open sourcing your code!
This line:

early_stopping_score_max = -1.0

is problematic for some datasets when training with p_val = 0 and p_test = 0 (as in the case of node classification), because the loss might start at values much higher than 1 and can take more than 100 epochs to drop below it. In that case self.__save_vars() is never called, and an exception is then raised here:

self.__restore_vars(sess)

because the attribute self.saved_vars does not exist. I worked around this by changing the first line above to

early_stopping_score_max = -float('inf')
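
A self-contained sketch of the pattern (toy loss values; best_epoch stands in for the save/restore calls, and "score = -loss" is an assumption about what is tracked), showing why the -inf initialization guarantees at least one save:

    losses = [5.2, 4.8, 4.9, 4.95, 5.0]       # toy trajectory, all values above 1
    early_stopping_score_max = -float('inf')  # with -1.0, no epoch here would save
    tolerance = patience = 2
    best_epoch = None
    for epoch, loss in enumerate(losses):
        score = -loss                         # higher score is better
        if score > early_stopping_score_max:
            early_stopping_score_max = score
            best_epoch = epoch                # stands in for self.__save_vars()
            patience = tolerance
        else:
            patience -= 1
            if patience == 0:
                break
    print("restoring from epoch", best_epoch)  # stands in for self.__restore_vars()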
