
Loss function in optimizer.py (gae) · OPEN · 24 comments

zzheyu commented on June 6, 2024
Loss function in optimizer.py


Comments (24)

tkipf commented on June 6, 2024

I think you are right, this does make the KL term quite small in comparison. The model can still be seen as a beta-VAE with a very small beta parameter. I'll have to check what scale these terms have in practice when running the model to see if there's an issue.

Overall, a Gaussian prior is in any case not a good choice in combination with a dot-product decoder, as mentioned in our follow-up paper: https://nicola-decao.github.io/s-vae.html

I would recommend running the GAE model in its non-probabilistic variant or using a hyperspherical posterior/prior.
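
For concreteness, here is a minimal NumPy sketch of the objective seen this way (illustrative shapes and names, not the exact code in optimizer.py); recon_logits and adj are N x N, mu and log_std are N x D:

import numpy as np

def vgae_loss(recon_logits, adj, mu, log_std, beta=1.0):
    # Bernoulli reconstruction term (numerically stable sigmoid cross-entropy),
    # averaged over all N^2 potential edges
    recon = np.mean(np.maximum(recon_logits, 0) - recon_logits * adj
                    + np.log1p(np.exp(-np.abs(recon_logits))))
    # KL(q(z_i | X, A) || N(0, I)), summed over latent dims, averaged over the N nodes
    kl = np.mean(-0.5 * np.sum(1 + 2 * log_std - mu ** 2 - np.exp(2 * log_std), axis=1))
    # beta << 1 gives the "very small beta" regime discussed above
    return recon + beta * kl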


zzheyu commented on June 6, 2024

Hi @tkipf,

Thank you for your suggestion. I have tried various thresholds, but the reconstructed adjacency matrix is still very noisy even when the threshold is above 0.9.

However, I think the problem is not with the threshold. The performance on the validation and test sets reported in the paper is very good because those sets are balanced (half edges and half non-edges). To test this, I increased the number of non-edges in the validation and test sets (to 5, 30 and 100 times the number of edges) to simulate the sparsity of the graph, and the average precision score dropped significantly as the number of non-edges increased. I also evaluated the F1 score on the val/test sets, and it is usually quite low.

Below is the code for the reconstruction. It is identical to the code you used to evaluate the validation and test performance.

feed_dict.update({placeholders['dropout']: 0})  # disable dropout at evaluation time
pred = sess.run(model.reconstructions, feed_dict=feed_dict)
A_recon_prob_vec = 1 / (1 + np.exp(-pred))                    # sigmoid -> edge probabilities
A_recon_prob = A_recon_prob_vec.reshape(num_nodes, -1)        # back to an N x N matrix
A_recon_prob = A_recon_prob - np.diag(np.diag(A_recon_prob))  # zero out the diagonal

And here is the modification to sample more non-edges in val and test sets.

# Original
val_edges_false = []
while len(val_edges_false) < len(val_edges):
# Modified
val_edges_false = []
while len(val_edges_false) < non_edges * len(val_edges):
# non_edges is an integer multiplier (5, 30 or 100 above)
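
For reference, a sketch of how AP and F1 can then be computed on the enlarged validation set (illustrative, using scikit-learn; A_recon_prob, val_edges and val_edges_false follow the snippets above):

import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Scores for the positive (real) and sampled negative validation pairs
pos_scores = np.array([A_recon_prob[i, j] for i, j in val_edges])
neg_scores = np.array([A_recon_prob[i, j] for i, j in val_edges_false])

scores = np.concatenate([pos_scores, neg_scores])
labels = np.concatenate([np.ones(len(pos_scores), dtype=int),
                         np.zeros(len(neg_scores), dtype=int)])

ap = average_precision_score(labels, scores)       # drops as more non-edges are added
f1 = f1_score(labels, (scores > 0.5).astype(int))  # low at a fixed 0.5 threshold
print('AP: %.4f  F1: %.4f' % (ap, f1))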

Thanks


tkipf commented on June 6, 2024


tkipf commented on June 6, 2024

This is because the cross-entropy loss applies to N^2 terms, i.e. all potential edges, while the KL term only applies to N terms, i.e. all nodes. This normalization makes sure they have a comparable scale.
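
As a rough back-of-the-envelope illustration (Cora-sized numbers, purely illustrative):

# N nodes -> N^2 potential edges, but only N KL terms
N = 2708                     # e.g. Cora
recon_terms = N * N          # one cross-entropy term per potential edge
kl_terms = N                 # one KL term per node

# Dividing both sums by N^2 puts them on the same per-potential-edge scale;
# the KL contribution then scales like 1/N relative to the reconstruction term.
print(recon_terms / N ** 2)  # 1.0
print(kl_terms / N ** 2)     # ~0.00037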


zzheyu commented on June 6, 2024

Thank you.

But why would you divide the KL term by N^2 when it is the sum of N terms? Would it make more sense to divide it by N (so that we have the average of the sum, like in the cross-entropy term)?


zzheyu commented on June 6, 2024

Thanks for the suggestions:)


YH-learning commented on June 6, 2024

Hi @tkipf,
Thanks for sharing your work. When running the code with the large beta parameter (i.e., kl_loss + ent_loss), the performance is not as good as with kl_loss / num_nodes + ent_loss. I am quite confused about this.

Could you explain this further, or have you solved this issue?

Thanks again for sharing.


zzheyu commented on June 6, 2024

Hi @tkipf,

Have you tried outputting the reconstructed adjacency matrix on the training data? I found that the reconstruction deviates a lot from the original matrix (for both the Cora and Citeseer graphs); the number of edges in the reconstructed matrix is about 300 times that of the original adjacency matrix (Cora).

Of course this could be a mistake on my side; please do correct me if I am wrong.

Thanks


tkipf commented on June 6, 2024


dawnranger commented on June 6, 2024

Can you paste the code you used to reconstruct the adjacency matrix?

I'm also confused about this. The results of the reconstructed matrix look like this:

Epoch: 0010 TP=0013246 FN=0000018 FP=4508390 TN=2811610 Precision=0.0029 Recall=0.9986
Epoch: 0020 TP=0013238 FN=0000026 FP=3539226 TN=3780774 Precision=0.0037 Recall=0.9980
Epoch: 0030 TP=0013248 FN=0000016 FP=3282812 TN=4037188 Precision=0.0040 Recall=0.9988
Epoch: 0040 TP=0013254 FN=0000010 FP=3176094 TN=4143906 Precision=0.0042 Recall=0.9992
Epoch: 0050 TP=0013256 FN=0000008 FP=3168532 TN=4151468 Precision=0.0042 Recall=0.9994
Epoch: 0060 TP=0013258 FN=0000006 FP=3138802 TN=4181198 Precision=0.0042 Recall=0.9995
Epoch: 0070 TP=0013264 FN=0000000 FP=3110030 TN=4209970 Precision=0.0042 Recall=1.0000
Epoch: 0080 TP=0013264 FN=0000000 FP=3082102 TN=4237898 Precision=0.0043 Recall=1.0000
Epoch: 0090 TP=0013264 FN=0000000 FP=3063600 TN=4256400 Precision=0.0043 Recall=1.0000
Epoch: 0100 TP=0013264 FN=0000000 FP=3061570 TN=4258430 Precision=0.0043 Recall=1.0000
Epoch: 0110 TP=0013264 FN=0000000 FP=3065990 TN=4254010 Precision=0.0043 Recall=1.0000
Epoch: 0120 TP=0013264 FN=0000000 FP=3069514 TN=4250486 Precision=0.0043 Recall=1.0000
Epoch: 0130 TP=0013264 FN=0000000 FP=3075558 TN=4244442 Precision=0.0043 Recall=1.0000
Epoch: 0140 TP=0013264 FN=0000000 FP=3084226 TN=4235774 Precision=0.0043 Recall=1.0000
Epoch: 0150 TP=0013264 FN=0000000 FP=3092052 TN=4227948 Precision=0.0043 Recall=1.0000
Epoch: 0160 TP=0013264 FN=0000000 FP=3097308 TN=4222692 Precision=0.0043 Recall=1.0000
Epoch: 0170 TP=0013264 FN=0000000 FP=3100544 TN=4219456 Precision=0.0043 Recall=1.0000
Epoch: 0180 TP=0013264 FN=0000000 FP=3101718 TN=4218282 Precision=0.0043 Recall=1.0000
Epoch: 0190 TP=0013264 FN=0000000 FP=3103924 TN=4216076 Precision=0.0043 Recall=1.0000
Epoch: 0200 TP=0013264 FN=0000000 FP=3106388 TN=4213612 Precision=0.0043 Recall=1.0000

The code is:

    # Threshold the sigmoid outputs at 0.5 to get hard edge predictions
    preds = tf.cast(tf.greater_equal(tf.sigmoid(adj_preds), 0.5), tf.int32)
    labels = tf.cast(adj_out, tf.int32)
    self.accuracy = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
    # Confusion-matrix counts over all N^2 entries
    self.TP = tf.count_nonzero(preds * labels)
    self.FP = tf.count_nonzero(preds * (labels - 1))
    self.FN = tf.count_nonzero((preds - 1) * labels)
    self.TN = tf.count_nonzero((preds - 1) * (labels - 1))
    self.precision = self.TP / (self.TP + self.FP)
    self.recall = self.TP / (self.TP + self.FN)

It seems that the inner-product decoder tends to reconstruct far more edges than expected. The optimization procedure reduces FP and increases TN, but does essentially nothing for TP.
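
Reading the numbers off the last row of the table above:

TP, FP = 13264, 3106388
precision = TP / (TP + FP)  # ~0.0043: roughly 235 predicted edges for every true edge
print(precision)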


tkipf commented on June 6, 2024


dawnranger commented on June 6, 2024

In the case of adjacency matrix reconstruction, I think the training set is always unbalanced. For example, in the Cora dataset the adjacency matrix has 2708*2708 = 7,333,264 entries, but only 5429 edges exist. The positive-to-negative ratio is 5429*2 : (2708*2708 - 5429*2) = 1 : 674.37, which is the pos_weight in your code; that's why you use tf.nn.weighted_cross_entropy_with_logits rather than tf.nn.sigmoid_cross_entropy_with_logits.
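
A quick check of that ratio (illustrative Python with the Cora numbers quoted above):

n_nodes, n_edges = 2708, 5429           # Cora: nodes and undirected edges
n_pos = 2 * n_edges                     # each edge appears twice in the adjacency matrix
n_total = n_nodes * n_nodes             # all potential entries, 7,333,264
pos_weight = (n_total - n_pos) / n_pos  # ~674.4, the 1:674 ratio quoted above
print(pos_weight)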

Maybe your training set is unbalanced? It looks like this is not necessarily a problem with this code release...


tkipf commented on June 6, 2024


tkipf commented on June 6, 2024


zzheyu commented on June 6, 2024

Thanks for clarifying and your further suggestions.


jlevy44 commented on June 6, 2024

I've been exploring a few real-world social networks using GAE/VGAE and am running into this very problem. Is there a consensus on a solution? The number of positive edges is far smaller than the number of negative edges. I've been adjusting my threshold, which has helped somewhat, but ideally I should at least be able to overfit on the adjacency matrix itself and recover the original adjacency, and that is not the case.

I have a 33-node network and a 71-node network.


tkipf commented on June 6, 2024


jlevy44 commented on June 6, 2024

Is there literature to support the use of this?


tkipf commented on June 6, 2024


XuanHeIIIS commented on June 6, 2024

Hi @tkipf
Thanks for your nice work!
I noticed that most GCN-based autoencoders are trained by optimizing the structure (A) reconstruction error. Did you try training the GCN by minimizing the feature (X) reconstruction error instead? If you did, can you share more details about it?
Thanks!


tkipf commented on June 6, 2024


XuanHeIIIS commented on June 6, 2024

Can you share some examples that reconstruct the node features? I am a little confused about the reconstruction process. Thanks!


tkipf commented on June 6, 2024


Yumlembam commented on June 6, 2024

What should I use as the activation on z*z^T if I use exp(-d(x, y)) as the loss, @tkipf?

