
knowledgegraphembedding's Introduction

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space

Introduction

This is the PyTorch implementation of the RotatE model for knowledge graph embedding (KGE). We provide a toolkit that gives state-of-the-art performance for several popular KGE models. The toolkit is efficient enough to train a large KGE model within a few hours on a single GPU.

A faster multi-GPU implementation of RotatE and other KGE models is available in GraphVite.

Implemented features

Models:

  • RotatE
  • pRotatE
  • TransE
  • ComplEx
  • DistMult

Evaluation Metrics:

  • MRR, MR, HITS@1, HITS@3, HITS@10 (filtered)
  • AUC-PR (for Countries data sets)

Loss Function:

  • Uniform Negative Sampling
  • Self-Adversarial Negative Sampling
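
For reference, here is a minimal sketch of the self-adversarial negative sampling loss from the RotatE paper. The toolkit computes this inside KGEModel.train_step in codes/model.py; the standalone function and argument names below are illustrative only:

import torch
import torch.nn.functional as F

def self_adversarial_loss(positive_score, negative_score, adversarial_temperature=1.0):
    # positive_score: (batch,) scores of true triples, higher is better
    # negative_score: (batch, num_negatives) scores of corrupted triples
    # Each negative is weighted by a softmax over its score, treated as a constant
    # (detach), then the log-sigmoid terms are combined as in the paper.
    weights = F.softmax(negative_score * adversarial_temperature, dim=1).detach()
    negative_loss = -(weights * F.logsigmoid(-negative_score)).sum(dim=1).mean()
    positive_loss = -F.logsigmoid(positive_score).mean()
    return (positive_loss + negative_loss) / 2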

Usage

Knowledge Graph Data:

  • entities.dict: a dictionary mapping entities to unique ids
  • relations.dict: a dictionary mapping relations to unique ids
  • train.txt: the KGE model is trained to fit this data set
  • valid.txt: create a blank file if no validation data is available
  • test.txt: the KGE model is evaluated on this data set
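
A minimal sketch of how these files can be read, mirroring the loading code in codes/run.py. The tab-separated layout (one "id<TAB>name" pair per line in the .dict files, one "head<TAB>relation<TAB>tail" triple per line in the .txt files) is an assumption that should be verified against the files in data/:

import os

def read_dict(path):
    # entities.dict / relations.dict: one "id<TAB>name" pair per line (assumed format)
    name2id = {}
    with open(path) as fin:
        for line in fin:
            idx, name = line.strip().split('\t')
            name2id[name] = int(idx)
    return name2id

def read_triples(path, entity2id, relation2id):
    # train.txt / valid.txt / test.txt: one "head<TAB>relation<TAB>tail" triple per line
    triples = []
    with open(path) as fin:
        for line in fin:
            h, r, t = line.strip().split('\t')
            triples.append((entity2id[h], relation2id[r], entity2id[t]))
    return triples

data_path = 'data/FB15k'
entity2id = read_dict(os.path.join(data_path, 'entities.dict'))
relation2id = read_dict(os.path.join(data_path, 'relations.dict'))
train_triples = read_triples(os.path.join(data_path, 'train.txt'), entity2id, relation2id)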

Train

For example, the following command trains a RotatE model on the FB15k dataset using GPU 0.

CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train \
 --cuda \
 --do_valid \
 --do_test \
 --data_path data/FB15k \
 --model RotatE \
 -n 256 -b 1024 -d 1000 \
 -g 24.0 -a 1.0 -adv \
 -lr 0.0001 --max_steps 150000 \
 -save models/RotatE_FB15k_0 --test_batch_size 16 -de

Check the argparse configuration in codes/run.py for more arguments and details.

Test

CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_test --cuda -init $SAVE

Reproducing the best results

To reproduce the results in the ICLR 2019 paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, you can run the bash commands in best_config.sh to get the best performance of RotatE, TransE, and ComplEx on five widely used datasets (FB15k, FB15k-237, wn18, wn18rr, Countries).

The run.sh script provides an easy way to search hyper-parameters:

bash run.sh train RotatE FB15k 0 0 1024 256 1000 24.0 1.0 0.0001 200000 16 -de
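
Judging from this command, the training command above, and the run.sh excerpt quoted in the issues below, the positional arguments appear to correspond to

bash run.sh MODE MODEL DATASET GPU_DEVICE SAVE_ID BATCH_SIZE NEGATIVE_SAMPLE_SIZE HIDDEN_DIM GAMMA ALPHA LEARNING_RATE MAX_STEPS TEST_BATCH_SIZE [extra flags such as -de]

Check run.sh itself for the authoritative order.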

Speed

With the default configuration, the KGE models usually take about half an hour to run 10,000 steps on a single GeForce GTX 1080 Ti GPU. The number of max_steps needed to converge differs across data sets:

Dataset   | FB15k  | FB15k-237 | wn18  | wn18rr | Countries S*
MAX_STEPS | 150000 | 100000    | 80000 | 80000  | 40000
TIME      | 9 h    | 6 h       | 4 h   | 4 h    | 2 h

Results of the RotatE model

Dataset | FB15k       | FB15k-237   | wn18        | wn18rr
MRR     | .797 ± .001 | .337 ± .001 | .949 ± .000 | .477 ± .001
MR      | 40          | 177         | 309         | 3340
HITS@1  | .746        | .241        | .944        | .428
HITS@3  | .830        | .375        | .952        | .492
HITS@10 | .884        | .533        | .959        | .571

Using the library

The Python library is organized around three objects:

  • TrainDataset (dataloader.py): prepares the data stream for training
  • TestDataSet (dataloader.py): prepares the data stream for evaluation
  • KGEModel (model.py): calculates triple scores and provides the train/test API

The run.py file contains the main function, which parses arguments, reads the data, initializes the model, and runs the training loop.
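
A rough sketch of how these pieces fit together, condensed from what run.py does. The constructor signatures are approximations; args stands for the parsed command-line arguments, and train_triples, nentity and nrelation come from the data-loading step:

import torch
from torch.utils.data import DataLoader
from dataloader import TrainDataset, BidirectionalOneShotIterator
from model import KGEModel

# Two dataloaders, one corrupting heads ('head-batch') and one corrupting tails
# ('tail-batch'); BidirectionalOneShotIterator alternates between them.
train_dataloader_head = DataLoader(
    TrainDataset(train_triples, nentity, nrelation, args.negative_sample_size, 'head-batch'),
    batch_size=args.batch_size, shuffle=True, collate_fn=TrainDataset.collate_fn)
train_dataloader_tail = DataLoader(
    TrainDataset(train_triples, nentity, nrelation, args.negative_sample_size, 'tail-batch'),
    batch_size=args.batch_size, shuffle=True, collate_fn=TrainDataset.collate_fn)
train_iterator = BidirectionalOneShotIterator(train_dataloader_head, train_dataloader_tail)

kge_model = KGEModel(
    model_name=args.model, nentity=nentity, nrelation=nrelation,
    hidden_dim=args.hidden_dim, gamma=args.gamma,
    double_entity_embedding=args.double_entity_embedding)
if args.cuda:
    kge_model = kge_model.cuda()
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, kge_model.parameters()), lr=args.learning_rate)

for step in range(args.max_steps):
    # train_step draws the next batch from the iterator, computes the loss and back-propagates
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)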

You can add your own model to model.py, for example:

def TransE(self, head, relation, tail, mode):
    # In 'head-batch' mode the negative samples replace the head entity, so the
    # expression is grouped to broadcast over the candidate heads; in 'tail-batch'
    # mode it broadcasts over the candidate tails. The two groupings are equivalent.
    if mode == 'head-batch':
        score = head + (relation - tail)
    else:
        score = (head + relation) - tail

    # Higher score means a more plausible triple: margin gamma minus the L1 distance.
    score = self.gamma.item() - torch.norm(score, p=1, dim=2)
    return score
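
The forward pass in model.py dispatches through a dictionary of scoring functions (the tracebacks quoted in the issues below show the call model_func[self.model_name](head, relation, tail, mode)), so a new scoring function presumably also needs an entry there. A sketch, with 'MyModel' as a hypothetical name:

# Inside KGEModel.forward in codes/model.py (sketch)
model_func = {
    'TransE': self.TransE,
    'DistMult': self.DistMult,
    'ComplEx': self.ComplEx,
    'RotatE': self.RotatE,
    'pRotatE': self.pRotatE,
    'MyModel': self.MyModel,   # register your new scoring function here
}
score = model_func[self.model_name](head, relation, tail, mode)

The model name may also need to be added to the model-name checks in codes/run.py.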

Citation

If you use this code, please cite the following paper:

@inproceedings{
 sun2018rotate,
 title={RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space},
 author={Zhiqing Sun and Zhi-Hong Deng and Jian-Yun Nie and Jian Tang},
 booktitle={International Conference on Learning Representations},
 year={2019},
 url={https://openreview.net/forum?id=HkgEQnRqYQ},
}

knowledgegraphembedding's People

Contributors

clpl, edward-sun, kiddozhu, tangjianpku


knowledgegraphembedding's Issues

the early stop when setting the max step

Sorry again for troubling you. I set the max step to 100000 for RotatE on FB15k-237. When I run your best_config.sh, the code stops early at step 4900. Can you help me solve this issue? Thanks!!

Can we initialize entity embeddings using GLoVE embeddings?

Thanks for making the work public!

It has been reported in the literature that, in the case of common-sense knowledge graphs, initializing entity embeddings as averaged word embeddings leads to faster convergence and better results. Have you tried this, and do you provide functionality for it? I can implement it for my own use case, but I wanted to know whether your work already handles this.

Memory consumption issue

I use the command:

bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de

to train RotatE on an 11 GB GPU. I have ensured the GPU is completely free.
I still get the following error:

2022-03-31 19:32:37,370 INFO     negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO     learning_rate = 0
2022-03-31 19:32:39,079 INFO     Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO     Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO     Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO     Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO     Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO     Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO     Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO     Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO     Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO     Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO     Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO     Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
  File "codes/run.py", line 371, in <module>
    main(parse_args())
  File "codes/run.py", line 315, in main
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
  File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
    loss.backward()
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
run.sh: line 79: 
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
    --cuda \
    --do_valid \
    --do_test \
    --data_path $FULL_DATA_PATH \
    --model $MODEL \
    -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
    -g $GAMMA -a $ALPHA -adv \
    -lr $LEARNING_RATE --max_steps $MAX_STEPS \
    -save $SAVE --test_batch_size $TEST_BATCH_SIZE \
    ${14} ${15} ${16} ${17} ${18} ${19} ${20}

: No such file or directory

I get similar errors when trying to train on FB15k using the command in the best_config.sh file.
I reduced the batch size to 500 and it worked, but the performance is much lower than the numbers reported in the paper.

I am not sure what the issue is.

Issue of the evaluation results on WN18RR

Thank you for your excellent project, but I cannot obtain the evaluation results on WN18RR with the best configuration. I get about 4500 on MR and 0.4 on MRR with the source code. Can you please tell me why I get this result? Thank you.

The dimension of entity vector and relation vector

re_head, im_head = torch.chunk(head, 2, dim=2)
re_tail, im_tail = torch.chunk(tail, 2, dim=2)
#Make phases of relations uniformly distributed in [-pi, pi]
phase_relation = relation/(self.embedding_range.item()/pi)
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)

Here re_head and im_head are hidden_size vectors, but the lines quoted above do not change the dimension of the relation vector, so re_relation and im_relation are both hidden_size*2 vectors. How can you operate on them when their dimensions differ?

Pretrained models

Hello all,

Thank you for the great work. I was wondering whether you plan to make pretrained RotatE models publicly available. I reckon such pretrained models would be very helpful for everyone, including Mother Nature :), as training RotatE on a suitable machine requires at least 19 hours.

Cheers

Loss function of TransE and RotatE in the code

Thank you for your excellent research and code. However, I am confused about why you use the same loss function for TransE and RotatE. I think the loss functions of TransE and RotatE are different according to their definitions in the original papers. I hope you can explain this. Thank you.

Why do you separate negative head samples and negative tail samples?

Thanks first for such a good job.

I see in the code that you implement two data iterators, train_dataloader_head and train_dataloader_tail, which generate negative head samples and negative tail samples respectively. During training, these two iterators are fed into the model alternately. If my understanding is right, the model trains on each positive sample twice, once with negative heads and once with negative tails. I want to know why you do negative sampling this way, instead of training the negative head and tail samples together and back-propagating each positive sample once, which seems more intuitive to me.

Thanks a lot for your reply.

How to split the train/valid/test

Hi, did you randomly split the knowledge graph into train/valid/test? How do you make sure the training set contains all entities?

Scoring functions and Adversarial Loss Parameters

I had 3 queries related to loss function and scoring functions:

a. Why is the margin part of the scoring function for TransE and RotatE? Doesn't it actually change the scores when you do prediction? (E.g., if the margin is 1, then (1 - score) is different from score; each approach would result in totally different ranks during prediction.)

b. The margin is not directly part of the adversarial loss; it is part of the scoring function, as described above. However, this is not the case for ComplEx and DistMult in this implementation. Is this equivalent to saying that for these two models you are setting the hyperparameter margin = 0 in the loss function? What does this mean, given that you are using a margin-based loss for optimisation?

c. Have you tried RotatE with other margin-based losses? How does it perform compared to ComplEx/HolE?

RuntimeError: CUDA out of memory.

Dear author, I am very lucky to be able to read such a good paper with open-source code. When I run the program on the same server, it works with the FB15k data set, but when I change to wn18 I get RuntimeError: CUDA out of memory.

My command:
CUDA_VISIBLE_DEVICES=1 python -u codes/run.py --do_train --cuda --do_valid --do_test --data_path data/wn18 --model RotatE -n 256 -b 256 -d 1000 -g 24.0 -a 1.0 -adv -lr 0.0001 --max_steps 80000 -save models/RotatE_wn18_0 --test_batch_size 16 -de
I have tried -b values of 64, 128, 256, and 512, and I am still asking for help.

Script for finding the best hyperparameters

Hi Zhiqing, thanks for making your code available for reproducibility. I am just wondering whether you could also share the script that you use for tuning the hyperparameters. This would make your approach even more reproducible. Thanks.

Regarding the direction of the rotation

In the paper, you mention that the rotation direction must be counter-clockwise. However, the figure shows the direction from head to tail as clockwise. Is this a typo, or have I made a mistake? It would be great if you could give some insight on this. Thank you.

confused about implementation of 1-1, 1-n, n-1, n-n

Hi, thanks a lot for your work on knowledge graph completion, but I am still confused about the implementation of Table 9 in your paper.

  1. As for the relation categories: following Wang et al. (2014), for each relation r we compute the average number of tails per head (tphr) and the average number of heads per tail (hptr). If tphr < 1.5 and hptr < 1.5, r is treated as one-to-one; if tphr ≥ 1.5 and hptr ≥ 1.5, r is treated as many-to-many; if tphr < 1.5 and hptr ≥ 1.5, r is treated as one-to-many. Should we take the valid and test sets into consideration in this process, or should we classify the relations on the training set only? (A rough sketch of this classification is given after this list.)
  2. Taking tail prediction in the 1-N relation category as an example, should we collect the prediction scores of all 1-N relations and average all the results?
    I am confused about how to re-implement this part of the experiment. It would be best if you could provide a script as an example. Thanks a lot in advance!
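
For reference, a minimal sketch of the tph/hpt computation described in point 1, over whichever set of triples one decides to use (which set to use is exactly the open question above). The 1.5 thresholds follow Wang et al. (2014); the labels chosen for the two mixed cases should be double-checked against that paper:

from collections import defaultdict

def classify_relations(triples):
    # triples: iterable of (head, relation, tail) id tuples
    tails_of = defaultdict(lambda: defaultdict(set))   # r -> head -> set of tails
    heads_of = defaultdict(lambda: defaultdict(set))   # r -> tail -> set of heads
    for h, r, t in triples:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)

    categories = {}
    for r in tails_of:
        tph = sum(len(ts) for ts in tails_of[r].values()) / len(tails_of[r])  # avg tails per head
        hpt = sum(len(hs) for hs in heads_of[r].values()) / len(heads_of[r])  # avg heads per tail
        if tph < 1.5 and hpt < 1.5:
            categories[r] = '1-1'
        elif tph >= 1.5 and hpt >= 1.5:
            categories[r] = 'n-n'
        elif tph >= 1.5:   # each head has many tails; label per the convention you follow
            categories[r] = '1-n'
        else:              # each tail has many heads
            categories[r] = 'n-1'
    return categories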

How to use embedding to compute triplet score

I don't quite understand the head-batch part of your training script. Anyway, I have pretrained entity and relation embeddings without understanding it.

Now I want to figure out: given a triple h = [x1, ..., x200], r = [r1, ..., r100] and t = [y1, ..., y200], how do I compute its score?

A detailed description will be much appreciated.
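
For what it's worth, here is a minimal sketch of the RotatE score for a single triple in this setting (2d-dimensional entity embeddings, d-dimensional relation phases), pieced together from the chunk/phase code quoted in an earlier issue and the TransE example in the README above. gamma and embedding_range are assumed to be the values the model was trained with:

import math
import torch

def rotate_score(head, relation, tail, gamma, embedding_range):
    # head, tail: 1-D tensors of size 2*d (real half followed by imaginary half)
    # relation:   1-D tensor of size d, interpreted as rotation phases
    re_head, im_head = torch.chunk(head, 2, dim=-1)
    re_tail, im_tail = torch.chunk(tail, 2, dim=-1)

    # map the raw relation embedding to a phase in [-pi, pi]
    phase = relation / (embedding_range / math.pi)
    re_rel, im_rel = torch.cos(phase), torch.sin(phase)

    # rotate the head in the complex plane and measure the distance to the tail
    re_diff = re_head * re_rel - im_head * im_rel - re_tail
    im_diff = re_head * im_rel + im_head * re_rel - im_tail
    distance = torch.stack([re_diff, im_diff], dim=0).norm(dim=0).sum(dim=-1)

    return gamma - distance   # higher score = more plausible triple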

L3 regularization

When using L3 regularization for ComplEx and DistMult, you apply the norm function twice to the relation embedding, as follows: (model.relation_embedding.norm(p=3).norm(p=3) ** 3). Can you explain why you apply the norm function twice? Thank you.

invalid syntax error

Start Training......
File "codes/run.py", line 101
**save_variable_list,
^
SyntaxError: invalid syntax


I don't know what is causing this error. Do you have any idea?

A question about the data range of negative sampling

Hi, thanks for such a good job first!

I observe that during training you generate negative samples based on the train set only, so triples that appear only in the valid or test set may be sampled as negatives, and these "false negative" samples will influence the model's performance at evaluation time. In my opinion, maybe the valid set should also be taken into account when sampling negatives during training?

Thanks for your interpretation.

Did you just use the first batch to train the model? Can you help me solve my problem?

I have a question about the (RotatE) model.py. RotatE uses the next function to generate the data; shouldn't the next call be inside a loop? With this function, it looks to me like RotatE just chooses the first batch to train on at every step, because if next is not inside a loop it generates the first item of the list/dict.... Who can help me answer my question?

class BidirectionalOneShotIterator(object):

    def __init__(self, dataloader_head, dataloader_tail):
        self.iterator_head = self.one_shot_iterator(dataloader_head)
        # print("bb", next(self.iterator_head))  # one batch
        self.iterator_tail = self.one_shot_iterator(dataloader_tail)
        self.step = 0

    def __next__(self):
        self.step += 1

        if self.step % 2 == 0:
            data = next(self.iterator_head)
        else:
            data = next(self.iterator_tail)
        print("self.step", self.step)
        return data

    @staticmethod
    def one_shot_iterator(dataloader):
        '''
        Transform a PyTorch Dataloader into python iterator
        '''
        while True:
            for data in dataloader:
                yield data

def train_step(model, optimizer, train_iterator, args):
    '''
    A single train step. Apply back-propation and return the loss
    '''
    model.train()
    optimizer.zero_grad()
    positive_sample, negative_sample, subsampling_weight, mode = next(train_iterator)

Interpreting how RotatE handles 1-N, N-N type relations...

Hi. Great work and repo.
I was curious about the geometric interpretation of how RotatE handles 1-N and N-N type relations.
For example, in TransH we know that the embeddings are projected onto a relation-specific hyperplane to bypass this issue.

How is this handled in the complex plane of RotatE?

what is L3 regularization

I am curious about the L3 regularization you use for ComplEx and DistMult. Can you give any references for it?

Some question about the test process.

When I tested the model with the command CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_test --cuda -init models/RotatE_countries_S3_0, it showed "UnboundLocalError: local variable 'current_learning_rate' referenced before assignment". I'm a novice; could you tell me how to solve it? I would appreciate any help. Looking forward to your reply.

Hi, a problem about multi-GPU training

How can I change the code to do multi-GPU training?
I am using my own dataset and get an out-of-memory error; my GPU has 40 GB of memory, but the data set is too large. How should I modify the code to train with multiple GPUs? Thanks.

About the seed setting.

Thanks for sharing the wonderful code :)
When I reproduce RotatE, I find the results are slightly different each time (loss, MR, MRR, etc.). I noticed that your code does not contain any seed-setting code, such as:

seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
if is_cuda:
    torch.cuda.manual_seed_all(seed)

This is different from the tutorials I have followed. Could you give some explanation of your thinking on this?

Looking forward to your reply; sorry for asking such a shallow question QAQ.

train error

I used the training instructions you provided, but there are some problems and I don't know how to solve them.

File "codes/run.py", line 361, in
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "codes\model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "C:\Users\Kano_Hayashi.conda\envs\rota\lib\site-packages\torch\nn\modules\module.py", line 550, in _call
_
result = self.forward(*input, **kwargs)
File "codes\model.py", line 144, in forward
index=tail_part.view(-1)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index' in call to _th_index_select

Can't find log files

Hi!

I trained the model using the following command:

CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train --cuda --do_valid --do_test --data_path data/FB15k-237 --model RotatE -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv -lr 0.0001 --max_steps 150000 -save models/RotatE_FB15k-237 --test_batch_size 16 -de --cuda

and tested it using:

time CUDA_VISIBLE_DEVICES=1 python -u codes/run.py --do_test -init models/RotatE_FB15k-237/ --cuda  

I am not able to find the log files in models/RotatE_FB15k-237/ folder. I am unsure what went wrong. Please help.

GPU Out Of Memory

My GPU is a 1080 Ti.
Is it impossible to run RotatE with this GPU?
I wonder how much memory I need, or is there another way?

RuntimeError: The size of tensor a (2000) must match the size of tensor b (1000) at non-singleton dimension 2

I am using the following parameters to train TransE and get an error. The error message is in the title.

--do_train --cuda --do_valid --do_test --data_path D:\KnowledgeGraphEmbedding\data\wn18rr --model TransE -n 256 -b 1 -d 1000 -g 24.0 -a 1.0 -adv -lr 0.001 --max_steps 1500 -save models/TransE_wn18rr_0 --test_batch_size 1 -de

The detailed error information is
Traceback (most recent call last):
File "D:/KnowledgeGraphEmbedding/codes/run.py", line 364, in
main(parse_args())
File "D:/KnowledgeGraphEmbedding/codes/run.py", line 308, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "E:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 159, in forward
score = model_func[self.model_name](head, relation, tail, mode)
File "D:\KnowledgeGraphEmbedding\codes\model.py", line 169, in TransE
score = (head + relation) - tail
RuntimeError: The size of tensor a (2000) must match the size of tensor b (1000) at non-singleton dimension 2

Compute Head and Tail Prediction on FB15K dataset

I want to check head and tail prediction (Hits@10) on different types of relations (1-to-1, 1-to-N, N-to-1 and N-to-N) for the TransD model, like the results mentioned in the paper. How can I perform head and tail prediction on the 1-to-1, 1-to-N, N-to-1 and N-to-N relations of the FB15k dataset?

TransD

Question about filter_bias = -1 for positive triplets

In dataloader.py, filter_bias is set to -1 for positive triples, and score += filter_bias in model.py is used to filter the ranking.

It seems the score of a triple can lie in a large range (at least a range much larger than 1). I noticed that the small filter_bias really does work (for both TransE and RotatE): I get the same result when setting filter_bias = -100.

But why does the small bias filter_bias = -1 work?

Question about the embedding range (weight initialization and phase normalization)

Hello, first of all many thanks for providing the source code alongside the paper.

I was comparing the implementation of RotatE against the paper and I found something which seems quite important, the embedding range, which is defined as (gamma+2)/hidden_dim.

This raises two questions that could be related to each other:

  1. The paper says "Both the real and imaginary parts of the entity embeddings are uniformly initialized, and the phases of the relation embeddings are uniformly initialized between 0 and 2π.", while in the code both the entities and relations are initialized with Uniform(-embedding_range, embedding_range).
  2. The relation phase is divided by the embedding range in the implementation, which does not seem to be explicitly mentioned in the paper, from my reading.

Could you help me understand these two points, or maybe point out in the paper the explanation behind them?

Thank you very much.

CUDA out of memory (resolved) and method to make RotatE run faster

Thank you for developing great work, RotatE. I'm really interested in your research.

  1. I ran your program as follows, but got "RuntimeError: CUDA out of memory". How did you debug this?
    I changed the batch size from 1024 to 256 and the program ran successfully, but I don't really want to change the batch size.

dl-box@DL-Box:~/Downloads/RotatE$ CUDA_VISIBLE_DEVICES=0 python -u codes/run.py --do_train \

--cuda
--do_valid
--do_test
--data_path data/FB15k
--model RotatE
-n 256 -b 1024 -d 1000
-g 24.0 -a 1.0 -adv
-lr 0.0001 --max_steps 150000
-save models/RotatE_FB15k_0 --test_batch_size 16 -de
2021-11-07 17:21:05,436 INFO Model: RotatE
2021-11-07 17:21:05,437 INFO Data Path: data/FB15k
2021-11-07 17:21:05,437 INFO #entity: 14951
2021-11-07 17:21:05,437 INFO #relation: 1345
2021-11-07 17:21:05,892 INFO #train: 483142
2021-11-07 17:21:05,941 INFO #valid: 50000
2021-11-07 17:21:06,000 INFO #test: 59071
2021-11-07 17:21:06,202 INFO Model Parameter Configuration:
2021-11-07 17:21:06,202 INFO Parameter gamma: torch.Size([1]), require_grad = False
2021-11-07 17:21:06,202 INFO Parameter embedding_range: torch.Size([1]), require_grad = False
2021-11-07 17:21:06,202 INFO Parameter entity_embedding: torch.Size([14951, 2000]), require_grad = True
2021-11-07 17:21:06,202 INFO Parameter relation_embedding: torch.Size([1345, 1000]), require_grad = True
2021-11-07 17:21:12,102 INFO Ramdomly Initializing RotatE Model...
2021-11-07 17:21:12,102 INFO Start Training...
2021-11-07 17:21:12,102 INFO init_step = 0
2021-11-07 17:21:12,102 INFO batch_size = 1024
2021-11-07 17:21:12,102 INFO negative_adversarial_sampling = 1
2021-11-07 17:21:12,102 INFO hidden_dim = 1000
2021-11-07 17:21:12,102 INFO gamma = 24.000000
2021-11-07 17:21:12,102 INFO negative_adversarial_sampling = True
2021-11-07 17:21:12,102 INFO adversarial_temperature = 1.000000
2021-11-07 17:21:12,102 INFO learning_rate = 0
Traceback (most recent call last):
File "codes/run.py", line 361, in
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 300, in train_step
loss.backward()
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/autograd/init.py", line 156, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 6.11 GiB already allocated; 866.06 MiB free; 7.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


  2. I ran the command line "bash run.sh train ComplEx FB15k 0 0 1024 256 1000 500.0 1.0 0.001 150000 16 -de -dr -r 0.000002" as follows (with dataset FB15k), and your program ran successfully on my Ubuntu server.

dl-box@DL-Box:~/Downloads/RotatE$ bash run.sh train ComplEx FB15k 0 0 1024 256 1000 500.0 1.0 0.001 150000 16 -de -dr -r 0.000002
1.10.0+cu102
Start Training......
2021-11-08 04:46:49,552 INFO Model: ComplEx
2021-11-08 04:46:49,552 INFO Data Path: data/FB15k
2021-11-08 04:46:49,552 INFO #entity: 14951
2021-11-08 04:46:49,552 INFO #relation: 1345
2021-11-08 04:46:50,009 INFO #train: 483142
2021-11-08 04:46:50,058 INFO #valid: 50000
2021-11-08 04:46:50,120 INFO #test: 59071
2021-11-08 04:46:50,336 INFO Model Parameter Configuration:
2021-11-08 04:46:50,336 INFO Parameter gamma: torch.Size([1]), require_grad = False
2021-11-08 04:46:50,336 INFO Parameter embedding_range: torch.Size([1]), require_grad = False
2021-11-08 04:46:50,336 INFO Parameter entity_embedding: torch.Size([14951, 2000]), require_grad = True
2021-11-08 04:46:50,336 INFO Parameter relation_embedding: torch.Size([1345, 2000]), require_grad = True
2021-11-08 04:46:56,318 INFO Ramdomly Initializing ComplEx Model...
2021-11-08 04:46:56,318 INFO Start Training...
2021-11-08 04:46:56,318 INFO init_step = 0
2021-11-08 04:46:56,318 INFO batch_size = 1024
2021-11-08 04:46:56,318 INFO negative_adversarial_sampling = 1
2021-11-08 04:46:56,318 INFO hidden_dim = 1000
2021-11-08 04:46:56,318 INFO gamma = 500.000000
2021-11-08 04:46:56,318 INFO negative_adversarial_sampling = True
2021-11-08 04:46:56,318 INFO adversarial_temperature = 1.000000
2021-11-08 04:46:56,318 INFO learning_rate = 0
2021-11-08 04:46:57,568 INFO Training average regularization at step 0: 2.061783
2021-11-08 04:46:57,568 INFO Training average positive_sample_loss at step 0: 0.959978
2021-11-08 04:46:57,568 INFO Training average negative_sample_loss at step 0: 2.498887
2021-11-08 04:46:57,569 INFO Training average loss at step 0: 3.791215
2021-11-08 04:46:57,569 INFO Evaluating on Valid Dataset...
2021-11-08 04:46:58,255 INFO Evaluating the model... (0/6250)
2021-11-08 04:47:40,912 INFO Evaluating the model... (1000/6250)
2021-11-08 04:48:24,477 INFO Evaluating the model... (2000/6250)
2021-11-08 04:49:08,110 INFO Evaluating the model... (3000/6250)
2021-11-08 04:49:51,826 INFO Evaluating the model... (4000/6250)
2021-11-08 04:50:35,372 INFO Evaluating the model... (5000/6250)
2021-11-08 04:51:18,479 INFO Evaluating the model... (6000/6250)
2021-11-08 04:51:29,527 INFO Valid MRR at step 0: 0.000718
2021-11-08 04:51:29,527 INFO Valid MR at step 0: 7412.979920
2021-11-08 04:51:29,527 INFO Valid HITS@1 at step 0: 0.000050
2021-11-08 04:51:29,527 INFO Valid HITS@3 at step 0: 0.000190
2021-11-08 04:51:29,527 INFO Valid HITS@10 at step 0: 0.000820
2021-11-08 04:51:44,653 INFO Training average regularization at step 100: 1.869630
2021-11-08 04:51:44,653 INFO Training average positive_sample_loss at step 100: 0.878554
2021-11-08 04:51:44,654 INFO Training average negative_sample_loss at step 100: 2.214018
2021-11-08 04:51:44,654 INFO Training average loss at step 100: 3.415917
2021-11-08 04:51:59,475 INFO Training average regularization at step 200: 1.649423
2021-11-08 04:51:59,475 INFO Training average positive_sample_loss at step 200: 0.795739
2021-11-08 04:51:59,475 INFO Training average negative_sample_loss at step 200: 1.878687
2021-11-08 04:51:59,475 INFO Training average loss at step 200: 2.986636
2021-11-08 04:52:14,330 INFO Training average regularization at step 300: 1.493370
2021-11-08 04:52:14,330 INFO Training average positive_sample_loss at step 300: 0.723991
2021-11-08 04:52:14,330 INFO Training average negative_sample_loss at step 300: 1.647611
2021-11-08 04:52:14,330 INFO Training average loss at step 300: 2.679172
2021-11-08 04:52:29,411 INFO Training average regularization at step 400: 1.364369
2021-11-08 04:52:29,411 INFO Training average positive_sample_loss at step 400: 0.668379
2021-11-08 04:52:29,411 INFO Training average negative_sample_loss at step 400: 1.480148
2021-11-08 04:52:29,411 INFO Training average loss at step 400: 2.438632
2021-11-08 04:52:44,290 INFO Training average regularization at step 500: 1.252640
2021-11-08 04:52:44,290 INFO Training average positive_sample_loss at step 500: 0.615634
2021-11-08 04:52:44,290 INFO Training average negative_sample_loss at step 500: 1.347466
2021-11-08 04:52:44,290 INFO Training average loss at step 500: 2.234190
2021-11-08 04:52:59,189 INFO Training average regularization at step 600: 1.153765
2021-11-08 04:52:59,189 INFO Training average positive_sample_loss at step 600: 0.570805
2021-11-08 04:52:59,189 INFO Training average negative_sample_loss at step 600: 1.245437
2021-11-08 04:52:59,189 INFO Training average loss at step 600: 2.061886
2021-11-08 04:53:14,166 INFO Training average regularization at step 700: 1.065076
2021-11-08 04:53:14,166 INFO Training average positive_sample_loss at step 700: 0.524925
2021-11-08 04:53:14,166 INFO Training average negative_sample_loss at step 700: 1.163066
2021-11-08 04:53:14,166 INFO Training average loss at step 700: 1.909072
2021-11-08 04:53:29,006 INFO Training average regularization at step 800: 0.984837
2021-11-08 04:53:29,006 INFO Training average positive_sample_loss at step 800: 0.489442
2021-11-08 04:53:29,006 INFO Training average negative_sample_loss at step 800: 1.097700
2021-11-08 04:53:29,006 INFO Training average loss at step 800: 1.778408
2021-11-08 04:53:43,852 INFO Training average regularization at step 900: 0.911781
2021-11-08 04:53:43,852 INFO Training average positive_sample_loss at step 900: 0.451165
2021-11-08 04:53:43,852 INFO Training average negative_sample_loss at step 900: 1.044625
2021-11-08 04:53:43,852 INFO Training average loss at step 900: 1.659676
2021-11-08 04:53:59,565 INFO Training average regularization at step 1000: 0.845027
2021-11-08 04:53:59,565 INFO Training average positive_sample_loss at step 1000: 0.363237
2021-11-08 04:53:59,565 INFO Training average negative_sample_loss at step 1000: 1.000880
2021-11-08 04:53:59,565 INFO Training average loss at step 1000: 1.527086
2021-11-08 04:54:14,571 INFO Training average regularization at step 1100: 0.783731
2021-11-08 04:54:14,571 INFO Training average positive_sample_loss at step 1100: 0.312674
2021-11-08 04:54:14,571 INFO Training average negative_sample_loss at step 1100: 0.966706
2021-11-08 04:54:14,571 INFO Training average loss at step 1100: 1.423422
2021-11-08 04:54:29,543 INFO Training average regularization at step 1200: 0.726847
2021-11-08 04:54:29,543 INFO Training average positive_sample_loss at step 1200: 0.310942
...................................................................................................

  3. However, before that, I ran the command line "bash run.sh train RotatE wn18 0 0 512 1024 500 12.0 0.5 0.0001 80000 8 -de" (with dataset wn18) and again got "RuntimeError: CUDA out of memory". Would you please explain why the program sometimes runs out of CUDA memory but sometimes runs successfully when the dataset is changed? How did you debug this problem?

dl-box@DL-Box:~/Downloads/RotatE$ bash run.sh train RotatE wn18 0 0 512 1024 500 12.0 0.5 0.0001 80000 8 -de
1.10.0+cu102
Start Training......
2021-11-08 04:46:15,756 INFO Model: RotatE
2021-11-08 04:46:15,756 INFO Data Path: data/wn18
2021-11-08 04:46:15,757 INFO #entity: 40943
2021-11-08 04:46:15,757 INFO #relation: 18
2021-11-08 04:46:15,886 INFO #train: 141442
2021-11-08 04:46:15,890 INFO #valid: 5000
2021-11-08 04:46:15,894 INFO #test: 5000
2021-11-08 04:46:16,147 INFO Model Parameter Configuration:
2021-11-08 04:46:16,147 INFO Parameter gamma: torch.Size([1]), require_grad = False
2021-11-08 04:46:16,147 INFO Parameter embedding_range: torch.Size([1]), require_grad = False
2021-11-08 04:46:16,147 INFO Parameter entity_embedding: torch.Size([40943, 1000]), require_grad = True
2021-11-08 04:46:16,147 INFO Parameter relation_embedding: torch.Size([18, 500]), require_grad = True
2021-11-08 04:46:19,692 INFO Ramdomly Initializing RotatE Model...
2021-11-08 04:46:19,692 INFO Start Training...
2021-11-08 04:46:19,692 INFO init_step = 0
2021-11-08 04:46:19,692 INFO batch_size = 512
2021-11-08 04:46:19,692 INFO negative_adversarial_sampling = 1
2021-11-08 04:46:19,692 INFO hidden_dim = 500
2021-11-08 04:46:19,692 INFO gamma = 12.000000
2021-11-08 04:46:19,692 INFO negative_adversarial_sampling = True
2021-11-08 04:46:19,692 INFO adversarial_temperature = 0.500000
2021-11-08 04:46:19,692 INFO learning_rate = 0
Traceback (most recent call last):
File "codes/run.py", line 361, in
main(parse_args())
File "codes/run.py", line 305, in main
log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 267, in train_step
negative_score = model((positive_sample, negative_sample), mode=mode)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 159, in forward
score = model_func[self.model_name](head, relation, tail, mode)
File "/home/dl-box/Downloads/RotatE/codes/model.py", line 225, in RotatE
score = score.norm(dim = 0)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/_tensor.py", line 442, in norm
return torch.norm(self, p, dim, keepdim, dtype=dtype)
File "/home/dl-box/.local/lib/python3.6/site-packages/torch/functional.py", line 1442, in norm
return _VF.frobenius_norm(input, _dim, keepdim=keepdim)
RuntimeError: CUDA out of memory. Tried to allocate 1000.00 MiB (GPU 0; 10.92 GiB total capacity; 7.00 GiB already allocated; 22.62 MiB free; 7.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Question about relaxing RotatE

Hello,

I was wondering whether you have tried removing the modulus constraint (in RotatE, the modulus of each r_i is 1) to see if that helps?

Thank you.

N-1 and 1-N relations

  Thanks for your work. I'm a beginner in KGE.
  While reading the paper, I ran into a problem: for an N-1 relation r, we should have x1 ∘ r = y and x2 ∘ r = y. Since r corresponds to a counterclockwise rotation, x1 must be close to x2, which is the same issue as in TransE.
  I'm not sure whether my understanding is correct, but it seems that RotatE can't model N-1 and 1-N relations properly.
  Looking forward to your reply, thank you.

confused about implementation of DistMult

According to the DistMult paper, the model is $$score = h_s * M_r * h_o^T$$. However, most papers on knowledge graph completion, including CompGCN, implement it in the form $$score = h_s * r * h_o^T$$. May I ask about the insight behind your implementation form?

typo MRR in README.md

Dataset | FB15k | FB15k-237 | wn18 | wn18rr
MRR | .797 ± .001 | .949 ± .000 | .337 ± .001 | .477 ± .001

The MRR of datasets FB15k-237 and wn18 should be swapped with each other.

Initialize embeddings randomly or by BERT encoding of entity description text

Hi, thanks a lot for your work on KGE, but I am still confused about embedding initialization. I tried to initialize the embeddings from BERT using the entity description text, but when training the model the negative-triple loss still keeps increasing. I have changed the LR and other hyperparameters, but it does not help.
Will different embedding initializations lead to different results for the model?
