
Comments (5)

zjzijielu commented on June 21, 2024

Yes, this setup is on purpose.


xxmen commented on June 21, 2024

Yes, this setup is on purpose.

Could you explain the reason for using this setup? Other papers seem to use a fixed input for the one-hot initialization while making the randomly initialized features learnable, but yours (the code attached below) is the opposite.

if initializer == "1hot":
    features = nn.Embedding(num_nodes, feature_dim)
else:
    features = nn.Embedding(num_nodes, feature_dim).requires_grad_(False)


zjzijielu commented on June 21, 2024

For the random initialization being fixed: if you look at all the other initializations, they all have requires_grad = False set at https://github.com/zjzijielu/graphsage-simple/blob/00c18149d17b602c2e3b5f04219f1bb2b4a1f6b9/graphsage/model.py#L706. We fix these features instead of co-training them with the model so that we can compare them more objectively.
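A minimal sketch (assuming PyTorch; names and sizes are illustrative, not taken from the repository) of what fixing the features means in practice: an embedding frozen with requires_grad_(False) receives no gradient, so it is never co-trained with the model.

import torch
import torch.nn as nn

num_nodes, feature_dim = 5, 8

# Frozen features, as used for the non-1hot initializers: excluded from training.
frozen = nn.Embedding(num_nodes, feature_dim).requires_grad_(False)

# Trainable features: updated together with the rest of the model.
trainable = nn.Embedding(num_nodes, feature_dim)

idx = torch.arange(num_nodes)
loss = frozen(idx).sum() + trainable(idx).sum()
loss.backward()

print(frozen.weight.grad)     # None: the fixed features never change
print(trainable.weight.grad)  # a gradient tensor: these features are co-trained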

For the 1hot initialization, from the paper:

"This feature is essentially equivalent to the random feature, when the parameters in the first linear layer of the GNN are randomly initialized."

In the code I skipped the step of explicitly multiplying by the one-hot encoding and directly look up a trainable embedding instead. We do this so that the 1hot feature dimension can be kept the same as that of the other features, but essentially it is a set of trainable random features.
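A small sketch (not from the repository; PyTorch, illustrative names) of the equivalence described above: multiplying an explicit one-hot matrix by a trainable, randomly initialized weight gives exactly the rows of that weight, so the one-hot step can be replaced by a trainable embedding lookup without changing anything.

import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, feature_dim = 4, 3

# Trainable embedding, as in the "1hot" branch of the code above.
emb = nn.Embedding(num_nodes, feature_dim)

# Explicit route: one-hot inputs times the trainable weight matrix.
one_hot = torch.eye(num_nodes)           # I, shape (num_nodes, num_nodes)
explicit = one_hot @ emb.weight          # I @ W0 = W0

# Direct route: the embedding lookup returns the same rows.
lookup = emb(torch.arange(num_nodes))

print(torch.allclose(explicit, lookup))  # True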


cqhoneybear commented on June 21, 2024

Thanks for the explanation.

Random-fixed (implemented in your code):
H = σ(D^{-1} A W_{0,fixed} W)

Random-learnable:
H = σ(D^{-1} A W_{0,learnable} W)

One-hot (implemented in your code):
H = σ(D^{-1} A I W_{0,learnable} W) = σ(D^{-1} A W_{0,learnable} W)

I fully agree that “This feature is essentially equivalent to the random feature, when the parameters in the first linear layer of the GNN are randomly initialized.” However, the one-hot setting implemented in your code seems to be equivalent to the random-learnable setting, not the random-fixed one.

So I am not sure whether the explanation of the performance in your paper is exact and correct: “1. random and one-hot initialization achieve comparable results. This is because they are essentially the same: after passing through the first layer of neural network where the parameters are randomly initialized, one hot initialization is equivalent to random initialization except for possible differences in dimensions (e.g., on Pubmed).”

Please correct me if my understanding is wrong. Merry Christmas and Happy New Year!
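To make the three settings above concrete, here is a toy PyTorch sketch (random adjacency, illustrative names; not the paper's code). It follows the formulas H = σ(D^{-1} A W_0 W) above and shows that the one-hot variant coincides with the random-learnable one, since I W_0 = W_0.

import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, hidden_dim, out_dim = 4, 3, 2

A = torch.rand(num_nodes, num_nodes)                 # toy adjacency matrix
D_inv = torch.diag(1.0 / A.sum(dim=1))               # D^{-1}
W = nn.Parameter(torch.randn(hidden_dim, out_dim))   # shared GNN weight

# Random-fixed: W0 is random but frozen (never updated).
W0_fixed = torch.randn(num_nodes, hidden_dim)

# Random-learnable: W0 is random and trained with the model.
W0_learn = nn.Parameter(torch.randn(num_nodes, hidden_dim))

# One-hot: the identity I times a learnable W0, which is just W0 again.
I = torch.eye(num_nodes)

H_fixed = torch.sigmoid(D_inv @ A @ W0_fixed @ W)
H_learn = torch.sigmoid(D_inv @ A @ W0_learn @ W)
H_1hot  = torch.sigmoid(D_inv @ A @ I @ W0_learn @ W)

print(torch.allclose(H_1hot, H_learn))  # True: one-hot behaves like random-learnable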

@zjzijielu


HennyJie commented on June 21, 2024

Hi @cqhoneybear
The motivation for making the one-hot feature learnable is to enhance the expressiveness of each node's identity with a continuous vector, compared with the initial discrete vector (which has only one non-zero element). After passing through the first layer, the one-hot feature is similar to the random initialization, except that one is learnable while the other is not. Since the random initialization also generates a distinct representation for each node from the very start, both approaches preserve node identity through the iterations, which is why the results of the two methods are comparable.

