ink-usc / re-net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)

Home Page: http://inklab.usc.edu/renet/

Topics: knowledge-graph, graph-neural-networks, reasoning, temporal-networks, dynamic-networks

re-net's Introduction

PyTorch implementation of Recurrent Event Network (RE-Net)

Paper: Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs

TL;DR: We propose an autoregressive model to infer graph structures at unobserved times on temporal knowledge graphs (extrapolation problem).

This repository contains the implementation of the RE-Net architectures described in the paper.

Knowledge graph reasoning is a critical task in natural language processing. The task becomes more challenging on temporal knowledge graphs, where each fact is associated with a timestamp. Most existing methods focus on reasoning at past timestamps and are not able to predict facts happening in the future. This paper proposes Recurrent Event Network (RE-Net), a novel autoregressive architecture for predicting future interactions. The occurrence of a fact (event) is modeled as a probability distribution conditioned on temporal sequences of past knowledge graphs. Specifically, our RE-Net employs a recurrent event encoder to encode past facts, and uses a neighborhood aggregator to model the connection of facts at the same timestamp. Future facts can then be inferred in a sequential manner based on the two modules. We evaluate our proposed method via link prediction at future times on five public datasets. Through extensive experiments we demonstrate the strength of RE-Net, especially on multi-step inference over future timestamps, and achieve state-of-the-art performance on all five datasets.

If you make use of this code or the RE-Net algorithm in your work, please cite the following paper:

@inproceedings{jin2020Renet,
	title={Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs},
	author={Jin, Woojeong and Qu, Meng and Jin, Xisen and Ren, Xiang},
	booktitle={EMNLP},
	year={2020}
}


Installation

Run the following commands to create a conda environment (assuming CUDA 10.1):

conda create -n renet python=3.6 numpy
conda activate renet
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
conda install -c dglteam "dgl-cuda10.1<0.5"
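
As a quick sanity check (a sketch, not part of the repo), you can confirm the pinned versions installed correctly:

import torch
import dgl

print(torch.__version__)          # expect 1.6.0+cu101
print(dgl.__version__)            # expect a 0.4.x build (from "dgl-cuda10.1<0.5")
print(torch.cuda.is_available())  # expect True on a CUDA 10.1 machine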

Train and Test

This code includes RE-Net with the RGCN aggregator. Before running, preprocess the datasets:

cd data/DATA_NAME
python3 get_history_graph.py

We first pretrain the global model.

python3 pretrain.py -d DATA_NAME --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

Then, train the model.

python3 train.py -d DATA_NAME --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

We are ready to test!

python3 test.py -d DATA_NAME --gpu 0 --n-hidden 200

The default hyperparameters give the best performance.
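
For example, substituting ICEWS18 for DATA_NAME, the full pipeline is:

cd data/ICEWS18
python3 get_history_graph.py
cd ../..
python3 pretrain.py -d ICEWS18 --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024
python3 train.py -d ICEWS18 --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024
python3 test.py -d ICEWS18 --gpu 0 --n-hidden 200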

Related Work

Our work addresses an extrapolation problem, on which there are only a few prior works; most studies on temporal knowledge graphs focus on the interpolation problem. We organized a list of related work covering Temporal Knowledge Graph Reasoning, Dynamic Graph Embedding, Knowledge Graph Embedding, and Static Graph Embedding.

Datasets

There are five datasets: ICEWS18, ICEWS14 (from Know-Evolve), GDELT, WIKI, and YAGO. These datasets are for the extrapolation problem: timestamps in the test set must be later than those in the train and valid sets (and timestamps in the valid set later than those in the train set). Each data folder has 'stat.txt', 'train.txt', 'valid.txt', 'test.txt', and 'get_history_graph.py'.

  • 'get_history_graph.py': generates the history and graphs for the model.
  • 'stat.txt': the first value is the number of entities, the second the number of relations.
  • 'train.txt', 'valid.txt', 'test.txt': the first column is the subject entity, the second the relation, the third the object entity, and the fourth the time. The fifth column is for Know-Evolve's data format and is ignored in RE-Net (see the loading sketch below).
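
A minimal parsing sketch (not part of the repo) for these quadruple files, assuming tab-separated integer ids:

# Read (subject, relation, object, time) quadruples; a fifth Know-Evolve
# column, when present, is ignored, as RE-Net does.
def load_quadruples(path):
    quads = []
    with open(path) as f:
        for line in f:
            parts = line.strip().split('\t')
            if len(parts) < 4:
                continue
            s, r, o, t = (int(x) for x in parts[:4])
            quads.append((s, r, o, t))
    return quads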

For relation names in GDELT, please refer to the GDELT codebook.

Baselines

We use the following public code for the baselines and their hyperparameters; we validated embedding sizes over the listed values.

Baselines                            Code   Embedding size   Batch size
TransE (Bordes et al., 2013)         Link   100, 200         1024
DistMult (Yang et al., 2015)         Link   100, 200         1024
ComplEx (Trouillon et al., 2016)     Link   50, 100, 200     100
RGCN (Schlichtkrull et al., 2018)    Link   200              Default
ConvE (Dettmers et al., 2018)        Link   200              128
Know-Evolve (Trivedi et al., 2017)   Link   Default          Default
HyTE (Dasgupta et al., 2018)         Link   128              Default

We implemented TA-TransE, TA-DistMult, and TTransE. You can run these baselines with the following command:

cd ./baselines
CUDA_VISIBLE_DEVICES=0 python3 TA-TransE.py -f 1 -d ICEWS18 -L 1 -bs 1024 -n 1000

The user can find implementations in the 'baselines' folder.

re-net's People

Contributors

changlinzhang, danny911kr, davidshumway, hhdo, shanzhenren, woojeongjin


re-net's Issues

Compilation environment problems

Hello, do I need the same environment to run? My setup is PyTorch 1.7, torchvision 0.8.1, Python 3.7, DGL-CUDA11.0 0.5.3, and numpy 1.19.2. Running the code directly produces errors. First, there was a problem when pickle.load was reading the txt data during pre-training, so I changed it to a pkl file. Then there is no keys method or number_of_nodes, etc.

problem of relation loss

It seems the loss you used in your code is "loss_sub + 0.1 * loss_sub_r + loss_ob + 0.1 * loss_ob_r" rather than "loss_sub + loss_ob". Are loss_sub_r and loss_ob_r critical to your results? Could you explain their meaning?

Running time is too long

I tried to run RE-Net on a Tesla V100, but it takes over 10 hours for one epoch.

Besides, GPU utilization has been very low; I guess it is a problem with the dataloader.

I want to know if anyone has implemented this algorithm efficiently.

Can you provide your settings and changes for running the Know-Evolve code?

Hi authors,

I am trying to reproduce the results of the Know-Evolve model in your paper.
Now I can successfully run the two example shell scripts in Know-Evolve on the original ICEWS dataset. However, when I changed to the datasets you provided, the program trained normally but suddenly quit during the test.

I also found that the ICEWS14 dataset came from Know-Evolve, with the first 2776 lines of "train.txt" deleted.
I ran Know-Evolve on all datasets including ICEWS14 and encountered the same problem; the error is described in rstriv/Know-Evolve#11 (comment).

Can you provide your modified version of the Know-Evolve code, or tell me the major changes needed to run it?
Thanks a lot!

dataset must be sorted

Just a comment on documentation that may be of interest to people trying to run this model: the input files (test, train, and valid) must be sorted on the temporal dimension for the code to work properly.
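
A hedged sketch of such a preprocessing step, assuming tab-separated files with the timestamp in the fourth column:

# Sort a quadruple file chronologically so history construction sees events
# in time order; sort() is stable, so same-time rows keep their order.
def sort_by_time(in_path, out_path):
    with open(in_path) as f:
        rows = [line.rstrip('\n').split('\t') for line in f if line.strip()]
    rows.sort(key=lambda row: int(row[3]))
    with open(out_path, 'w') as f:
        for row in rows:
            f.write('\t'.join(row) + '\n')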

Probably a typo in the paper

Hello,

I was checking your paper. In Table 4, Hits@3 of RE-Net on ICEWS14 does not seem reasonable (should it be higher?).
Could you please also provide the Hits@1 performance?

Best,
Zifeng

AttributeError: 'bytearray' object has no attribute 'contiguous'

When I run pretrain.py on GDELT, the following errors occur:

Traceback (most recent call last):
  File "/data3/zhangmengqi/RE-Net/pretrain.py", line 139, in <module>
    train(args)
  File "/data3/zhangmengqi/RE-Net/pretrain.py", line 55, in train
    graph_dict = pickle.load(f)
  File "/home/zhangmengqi/anaconda3/envs/renet/lib/python3.6/site-packages/dgl/heterograph_index.py", line 1122, in __setstate__
    num_nodes_per_type = F.zerocopy_to_dgl_ndarray(num_nodes_per_type)
  File "/home/zhangmengqi/anaconda3/envs/renet/lib/python3.6/site-packages/dgl/backend/pytorch/tensor.py", line 283, in zerocopy_to_dgl_ndarray
    return nd.from_dlpack(dlpack.to_dlpack(input.contiguous()))
AttributeError: 'bytearray' object has no attribute 'contiguous'

Do you know the reason? I guess it is a problem with the DGL version; could you tell me the specific versions of numpy and DGL you used? Thanks!

Metric calculation

Hi, I think the way the baseline metrics Hits@X, MR, and MRR are calculated has a problem, as follows:

  1. tripleDict contains ground truth about which triples are known, but this dict is agnostic of time: it is created in utils.py and totally ignores the time dimension. E.g., (s,p,o) = True regardless of the time the query is issued.

  2. When you evaluate, the ranking is based solely on the existence of (s,p,o) in tripleDict. Therefore, if the quadruple to predict is (s,p,o,t) and (s,p,o) is in tripleDict, it is classified as true even though it may be true at t1 but not at t2.

I have made a few changes to address this issue, mainly adding a time field to tripleDict and a realTime (rt) field to the quads object, so at test time we can easily search for (s,p,o,t) in tripleDict instead of (s,p,o). (A sketch is below.)

Do you have any thoughts about which one is the correct evaluation? Is there something I am missing from your implementation?
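
A sketch of the time-aware filtering proposed above (names are illustrative, not the repo's):

# Key the known-fact set by (s, p, o, t) rather than (s, p, o), so a fact
# filters candidates only at the timestamps where it actually holds.
def build_time_aware_dict(quadruples):
    known = set()
    for s, p, o, t in quadruples:
        known.add((s, p, o, t))
    return known

def is_known(known, s, p, o, t):
    return (s, p, o, t) in known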

metric,know-evolve and WIKI datasets question

Hello,

Thank you for your excellent research and for open-sourcing the RE-Net code. I have some questions about this paper.
1. I can run pretrain.py on ICEWS14 and ICEWS18, but I get CUDA out of memory on WIKI even with batch size 16, and I don't know how to update the code. My PyTorch is 1.7, Python is 3.7.9, DGL is '0.4.3post2', CUDA version 11.0.
2. ICEWS14 does not have valid.txt, so how can I run it?
3. For the metric, I think filtering from the list only the events that occur at [time] may be a good idea?


  4. If you implemented Know-Evolve in PyTorch, could you provide the code?

Thanks a lot!

entity2id, relation2id

Where can I get the mapping from ids to entities, or ids to relations for the datasets? Thank you.

A problem when training with multiple GPUs: "CUDA error: device-side assert triggered"

When I tried to train the model on multiple GPUs, the program reported an error, and I don't know how to solve it:

Traceback (most recent call last):
  File "train2.py", line 265, in <module>
    train(args)
  File "train2.py", line 157, in train
    loss_s = model(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), graph_dict, subject=True)
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ws/proj_ws/RE-Net/model.py", line 84, in forward
    reverse=reverse)
  File "/home/ws/anaconda3/envs/renet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ws/proj_ws/RE-Net/Aggregator.py", line 131, in forward
    s_len_non_zero, s_tem, r_tem, g, node_ids_graph, global_emb_list = get_sorted_s_r_embed_rgcn(s_hist, s, r, ent_embeds, graph_dict, global_emb)
  File "/home/ws/proj_ws/RE-Net/utils.py", line 222, in get_sorted_s_r_embed_rgcn
    s_hist_sorted.append(s_hist[idx])
RuntimeError: CUDA error: device-side assert triggered

The weights in the attention layer never change throughout training!!!

I was going through the code of the AttnAggregator class and realised that the self.attn_s layer never receives gradients, because it is not part of the forward pass.

If you check the forward method of that class, you will notice that the attention aggregator always outputs zeros, due to the following:

class AttnAggregator(nn.Module):
   ...
   def forward(self, s_hist, s, r, ent_embeds, rel_embeds):
      ...
      # Creates a tensor of zeros
      s_embed_seq_tensor = torch.zeros(len(s_len_non_zero), self.seq_len, 3 * self.h_dim).cuda()
      
      # Passes zeros through dropout        
      s_embed_seq_tensor = self.dropout(s_embed_seq_tensor)
      
      # pack the s_embed_seq_tensor
      s_packed_input = torch.nn.utils.rnn.pack_padded_sequence(s_embed_seq_tensor,
                                                                 s_len_non_zero,
                                                                 batch_first=True)
      return s_packed_input

Because of this, the Linear (attention) layer is never applied to the inputs, so its weights won't receive gradients; in other words, the attention layer's weights never train. You can verify this by visualising the weights in TensorBoard or simply by printing them.
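
Following the suggestion above, a quick check (a sketch; the attribute path is hypothetical and depends on how the model is assembled) that the attention weights actually move during training:

import torch

# Snapshot the attention weights, run a few optimizer steps, then compare.
before = aggregator.attn_s.weight.detach().clone()
# ... a few training steps ...
after = aggregator.attn_s.weight.detach()
print(torch.equal(before, after))  # True here would confirm the bug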

IndexError: tensors used as indices must be long, byte or bool tensors

Hi, Thanks for your nice work!

Traceback (most recent call last):
  File "train.py", line 241, in <module>
    train(args)
  File "train.py", line 136, in train
    loss_s = model(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), graph_dict, subject=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\RE-Net-master\model.py", line 90, in forward
    self.dropout(torch.cat((self.ent_embeds[s[s_idx]], s_h, rel_embeds[r[s_idx]]), dim=1)))
IndexError: tensors used as indices must be long, byte or bool tensors

My environment is Python 3.6.3 on Windows 10 with PyTorch 1.4.0.
I also tried another environment with PyTorch 0.4.0; both gave the same error. (A possible workaround is sketched below.)

I look forward to hearing from you in the comments.
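
A common workaround for this class of error (an assumption on my part, not an official fix) is to cast the index tensors to int64 before they are used to index the embedding tables:

# Embedding lookups require long (int64), byte, or bool index tensors;
# older code paths may produce int32 or float tensors, so cast explicitly.
s = s.long()
r = r.long()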

Issue of running pretrain.py

Hello, I got an error when I ran the source code pretrain.py. I hope you can help me solve this problem. The error information is shown below:

Traceback (most recent call last):
  File "pretrain.py", line 139, in <module>
    train(args)
  File "pretrain.py", line 92, in train
    model.global_emb = model.get_global_emb(train_times_origin, graph_dict)
  File "/gs/home/beihangngl/RE-Net-master/global_model.py", line 67, in get_global_emb
    emb, _, _ = self.predict(t, graph_dict)
  File "/gs/home/beihangngl/RE-Net-master/global_model.py", line 88, in predict
    rnn_inp = self.aggregator.predict(t, self.ent_embeds, graph_dict, reverse=reverse)
  File "/gs/home/beihangngl/RE-Net-master/Aggregator.py", line 96, in predict
    batched_graph = dgl.batch(g_list)
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/graph.py", line 4187, in batch
    cols = {key: F.cat([gr._node_frame[key] for gr in graph_list
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/graph.py", line 4187, in <dictcomp>
    cols = {key: F.cat([gr._node_frame[key] for gr in graph_list
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/backend/pytorch/tensor.py", line 141, in cat
    return th.cat(seq, dim=dim)
RuntimeError: Expected object of backend CUDA but got backend CPU for sequence element 1 in sequence argument at position #1 'tensors'

Hoping for your help. Thank you.

To Do to enhance the repo

  • a centralized script/command to call BOTH baselines and our model variants

  • four datasets with unified format (and link to the dataset directory on README)

  • enrich the README with motivation/intro of the work, plus the illustration figure

Something strange in your inference code!!

In line 224 of model.py, you use
global_emb_prev_t, sub, prob_sub = global_model.predict(t, self.graph_dict, subject=True) to get H_{t-1} and s_t (according to your note in line 81 of global_model.py: # Predict s at time t, so <t-1 graphs are used).
Then you sample G_t and add it to s_his_cache and o_his_cache, and in lines 301-302 of model.py,
self.data = get_data(self.s_his_cache, self.o_his_cache)
self.graph_dict[self.latest_time.item()] = get_big_graph(self.data, self.num_rels)
you add G_t to graph_dict.

I think line 302 of model.py should be:
self.graph_dict[t] = get_big_graph(self.data, self.num_rels)

And why do you add the sampled G_t to the history when predicting (s_t, r_t, ??)?

Is this code the version from your EMNLP 2020 paper? Or is there some misunderstanding of the code on my side?

Thanks very much for your reply!

some questions regarding entity id

Hello, I have some questions regarding how ids are assigned to entities.

I am assuming that the entity ids you added cover all subject and object entities together.
I have a few questions w.r.t. the WIKI and YAGO data:

  1. Is there any fixed rule/method for how you assign the ids, especially for the WIKI data?
  2. Does enumerating all the unique entities and assigning them consecutive numbers make sense?
  3. Will there be any ambiguity if ids are assigned independently for subject and object entities? For example, an entity that appears as both subject and object might get different ids.
  4. Why do we need a time period for the entities in the YAGO data?

Thank you very much!

A little error in Aggregator.py

Thanks for your efforts. A small mistake occurs in Aggregator.py line 38:

time_list.append(torch.LongTensor(times[length - self.seq_len:length]))

Maybe it should be (length - self.seq_len):length?
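
For illustration (not the repo's code): when length is smaller than self.seq_len, the start index goes negative and the Python slice wraps around from the end of the list, so clamping the start avoids that:

# Clamp the window start so short histories are taken from the beginning.
start = max(0, length - self.seq_len)
time_list.append(torch.LongTensor(times[start:length]))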

KeyError in make_subgraph in utils.py

I'm trying to run the code on my own data that I generated from a Neo4j graph. During training, when model.evaluate_filter is called in train.py, I get this error

2020-09-29T13:06:22.156137293Z Epoch 0010 | Loss 16.3989 | time 24.4101
2020-09-29T13:06:22.160739045Z Traceback (most recent call last):
2020-09-29T13:06:22.160764766Z   File "train.py", line 257, in <module>
2020-09-29T13:06:22.160772024Z     train(args)
2020-09-29T13:06:22.16077712Z   File "train.py", line 185, in train
2020-09-29T13:06:22.160782517Z     ranks, loss = model.evaluate_filter(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), global_model, total_data)
2020-09-29T13:06:22.160788149Z   File "/output/re-net/model.py", line 387, in evaluate_filter
2020-09-29T13:06:22.160793458Z     loss, sub_pred, ob_pred = self.predict(triplet, s_hist, o_hist, global_model)
2020-09-29T13:06:22.16079868Z   File "/output/re-net/model.py", line 337, in predict
2020-09-29T13:06:22.160803723Z     inp, _ = self.aggregator.predict((s_history, s_history_t), s, r, self.ent_embeds, self.rel_embeds[:self.num_rels], self.graph_dict, self.global_emb, reverse=False)
2020-09-29T13:06:22.160809008Z   File "/output/re-net/Aggregator.py", line 223, in predict
2020-09-29T13:06:22.160814046Z     graph_dict, global_emb)
2020-09-29T13:06:22.160818952Z   File "/output/re-net/utils.py", line 277, in get_s_r_embed_rgcn
2020-09-29T13:06:22.160824419Z     g_list, g_id_dict = get_g_list_id(neighs_t, graph_dict)
2020-09-29T13:06:22.16082952Z   File "/output/re-net/utils.py", line 169, in get_g_list_id
2020-09-29T13:06:22.160834744Z     g_list.append(make_subgraph(graph_dict[tim], neighs_t[tim]))
2020-09-29T13:06:22.160839784Z   File "/output/re-net/utils.py", line 121, in make_subgraph
2020-09-29T13:06:22.16084499Z     relabeled_nodes.append(g.ids[node])
2020-09-29T13:06:22.16084988Z KeyError: 3685
2020-09-29T13:06:26.799168428Z 

What could be the cause of this? Is there any special assumption on how the input data should be formed?

I've put in this gist the train, test, valid and stat files I used.

cuda error triggered when --seq-len is set smaller than 10 using model 3

Command (taking --seq-len as 2 for example):
py train.py -d YAGO --gpu 0 --model 3 --dropout 0.5 --n-hidden 200 --lr 1e-3 --seq-len 2

Traceback:
  File "train.py", in <module>
    train(args)
  File "train.py", in train
    loss = model.get_loss(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), graph_dict)
  File "model.py", in get_loss
    loss, _, _, _, _ = self.forward(triplets, s_hist, o_hist, graph_dict)
  File "model.py", in forward
    s_packed_input = self.aggregators_s(s_hist, s, r, self.ent_embeds, self.rel_embeds[:self.num_rels], graph_dict, reverse=False)
  File "module.py", in __call__
    result = self.forward(*input, **kwargs)
  File "Aggregator.py", in forward
    (embeds, ent_embeds[s_tem[i]].repeat(len(embeds), 1)
RuntimeError: CUDA error: device-side assert triggered

Settings:
Python 3.6.7
Pytorch 1.1.0
Cuda 9.0
DGL 0.3

the time for training

Hello!
I am interested in your excellent work!
But when I run your code, I notice that the training process is time-consuming.
I would like to know how much time per epoch is needed on each dataset.
Thanks!

Running on Google Colab: "dgl._ffi.base.DGLError: Cannot assign node feature..."

Running on Google Colab, the following issue occurs in pretrain.py. Perhaps it is an installation issue?

Traceback (most recent call last):
  File "pretrain.py", line 141, in <module>
    train(args)
  File "pretrain.py", line 85, in train
    loss = model(batch_data, true_s, true_o, graph_dict)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/test/RE-Net/global_model.py", line 47, in forward
    packed_input = self.aggregator(sorted_t, self.ent_embeds, graph_dict, reverse=reverse)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/gdrive/MyDrive/test/RE-Net/Aggregator.py", line 54, in forward
    batched_graph.ndata['h'] = ent_embeds[batched_graph.ndata['id']].view(-1, ent_embeds.shape[1])
  File "/usr/local/lib/python3.7/dist-packages/dgl/view.py", line 81, in __setitem__
    self._graph._set_n_repr(self._ntid, self._nodes, {key : val})
  File "/usr/local/lib/python3.7/dist-packages/dgl/heterograph.py", line 3997, in _set_n_repr
    ' same device.'.format(key, F.context(val), self.device))
dgl._ffi.base.DGLError: Cannot assign node feature "h" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.
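
The error message itself suggests the fix; a sketch of applying it at the failing line in Aggregator.py (variable names taken from the traceback):

# Move the batched graph onto the embeddings' device before assigning
# node features, as the DGLError suggests.
batched_graph = batched_graph.to(ent_embeds.device)
batched_graph.ndata['h'] = ent_embeds[batched_graph.ndata['id']].view(-1, ent_embeds.shape[1])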

Some questions in the inference

In the predict function of model.py, the global variable s and the local variable s are used together. Could this cause a problem in the ob_pred prediction result?

RuntimeError: tensors used as indices must be long or byte tensors

Traceback (most recent call last):
  File "train.py", line 223, in <module>
    train(args)
  File "train.py", line 126, in train
    loss = model.get_loss(batch_data, s_hist, o_hist, None)
  File "X:\GitRepo\RE-Net\model.py", line 131, in get_loss
    loss, _, _, _, _ = self.forward(triplets, s_hist, o_hist, graph_dict)
  File "X:\GitRepo\RE-Net\model.py", line 101, in forward
    self.dropout(torch.cat((self.ent_embeds[s[s_idx]], s_h, self.rel_embeds[:self.num_rels][r[s_idx]]), dim=1)))
RuntimeError: tensors used as indices must be long or byte tensors

Any ideas?

Clarification for HyTE baseline

Hi,
Since all test/valid triples in your datasets are for timestamps not seen yet, which hyperplane do you use for HyTE predictions? The most recent one?


How to generate ICEWS18 dataset from raw ICEWS data?

Dear authors, I want to generate id-to-string files for the ICEWS18 dataset from the raw ICEWS data, but I noticed something odd. As stated in your paper, "ICEWS is collected from 1/1/2018 to 10/31/2018". However, there are 635305 triples in this period in the raw ICEWS data, and after deleting duplicate triples there are 468559, which differs from the size of the ICEWS18 dataset you provide (373018 + 45995 + 69514 = 488527). Could you tell me whether something is wrong in my preprocessing of the raw ICEWS data? Or could you provide the id-to-string files for your datasets? Thanks very much!

The results of the three aggregators in your paper?

In your paper, is the next-to-last row of Table 1 the multi-step inference result with the Multi-Relational Aggregator? The result, 42.93, does not seem to improve on RE-Net with a one-hop subgraph and mean-pooling aggregator (42.38) from your previous version presented at the ICLR 2019 workshop.

Multi-step inference over time in the valid data

In section 2.4 of your paper: "the encoder state is updated based on current predictions, and will be used for making next predictions. That is, for each time step we rank the candidate entities and select top-m entities as current predictions. We maintain the history as a sliding window of length k, so the oldest interaction set will be detached and the new predicted entity set will be added to the history."
In get_history.py, the generated train_history_ob1.txt, dev_history_ob1.txt, and test_history_ob1.txt always save the ground-truth history in s_hist and o_hist.
In my opinion, the history used to predict results on the valid data should consist of two parts: 1. the ground-truth history from the training data; 2. the predictions for every sample that happened before the current valid sample. However, in your code (lines 216-224 in model.py):

if len(self.s_hist_test[s][r]) == 0:
    self.s_hist_test[s][r] = s_hist.copy()
s_history = self.s_hist_test[s][r]

If self.s_hist_test is the generated history, this code means: if there is no generated history, s_hist_test equals the ground-truth history; once generated history exists, you only use the generated history to predict the result.
I am confused about this code: if s_hist is the ground-truth history, then following your paper, shouldn't the code replace the ground-truth history in the valid data with the outputs of your model?
Looking forward to your reply!

Training time is too long and GPU utilization is low

Every training epoch takes a very long time on each dataset, and GPU utilization is low. Has anyone run into the same problem? Does the dataloader part need to be modified?

Relation names for GDELT

I'm interested in your work and really appreciate your effort in making the datasets public!

Could you please provide the relation names of GDELT (the rel2id file only contains numbers), as well as entity2id and rel2id files for WIKI and YAGO? Thanks a lot!

Which version of dgl-cu100 do you use?

pip install dgl-cu100 will install version 0.5.2

There will be some runtime errors on this version, like:

AttributeError: 'DGLHeteroGraph' object has no attribute 'parent_nid'

TypeError in pretrain.py

torch.__version__
'1.3.1'
dgl.__version__
'0.4.2'

Calling python get_history_graph.py in ./data/ICEWS14 seems to work.

However:
python3 pretrain.py -d ICEWS14

Namespace(batch_size=1024, dataset='ICEWS14', dropout=0.5, gpu=0, grad_norm=1.0, lr=0.01, max_epochs=100, maxpool=1, model=0, n_hidden=200, num_k=10, rnn_layers=1, seq_len=10)
start training...
Traceback (most recent call last):
  File "pretrain.py", line 139, in <module>
    train(args)
  File "pretrain.py", line 83, in train
    loss = model(batch_data, true_s, true_o, graph_dict)
  File "/home/martin/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/RE-Net-master/global_model.py", line 47, in forward
    packed_input = self.aggregator(sorted_t, self.ent_embeds, graph_dict, reverse=reverse)
  File "/home/martin/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/RE-Net-master/Aggregator.py", line 38, in forward
    time_list.append(torch.LongTensor(times[length - self.seq_len:length]))
TypeError: only integer tensors of a single element can be converted to an index

Seems to be the same as: #22

Problems with RuntimeError:

Hello, I'm having trouble running the training on an NVIDIA GPU. I always get the same error about the tensors not all being on the same device (cpu and gpu). I saw there were already similar issues, so I pulled in the changes two hours ago and retried; still the same problem when running train.py (on the YAGO dataset, but same results with WIKI).

Python is 3.6; with PyTorch 1.4.0 (CUDA 10.1) I get

Traceback (most recent call last):
  File "train.py", line 256, in <module>
    train(args)
  File "train.py", line 184, in train
    ranks, loss = model.evaluate_filter(batch_data, (s_hist, s_hist_t), (o_hist, o_hist_t), global_model, total_data)
  File "/output/re-net/model.py", line 387, in evaluate_filter
    loss, sub_pred, ob_pred = self.predict(triplet, s_hist, o_hist, global_model)
  File "/output/re-net/model.py", line 239, in predict
    probs = prob_s * self.pred_r_rank2(ss, rr, subject=True)
  File "/output/re-net/model.py", line 193, in pred_r_rank2
    reverse=reverse)
  File "/output/re-net/Aggregator.py", line 178, in predict_batch
    s_len_non_zero, s_tem, r_tem, g, node_ids_graph, global_emb_list = get_s_r_embed_rgcn(s_hist, s, r, ent_embeds, graph_dict, global_emb)
  File "/output/re-net/utils.py", line 273, in get_s_r_embed_rgcn
    batched_graph = dgl.batch(g_list)
  File "/usr/local/lib/python3.6/dist-packages/dgl/graph.py", line 4189, in batch
    for key in node_attrs}
  File "/usr/local/lib/python3.6/dist-packages/dgl/graph.py", line 4189, in <dictcomp>
    for key in node_attrs}
  File "/usr/local/lib/python3.6/dist-packages/dgl/backend/pytorch/tensor.py", line 141, in cat
    return th.cat(seq, dim=dim)
RuntimeError: Expected object of backend CUDA but got backend CPU for sequence element 0 in sequence argument at position #1 'tensors'

With PyTorch 1.5.1 (CUDA 10.1), I get the following at the exact same line in utils.py:

 RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0  

AttributeError in pretrain.py

torch.__version__
'1.4.0'
dgl.__version__ (dgl-cu100)
'0.4.2'

when run

python3 pretrain.py -d DATA_NAME --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

as in README:

Traceback (most recent call last):
  File "pretrain.py", line 139, in <module>
    train(args)
  File "pretrain.py", line 104, in train
    's_hist': model.s_hist_test, 's_cache': model.s_his_cache,
  File "/DIRECTORY/TO/RE-Net/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'RENet_global' object has no attribute 's_hist_test'

RE-Net/pretrain.py, lines 103 to 107 in 9923608:

torch.save({'state_dict': model.state_dict(), 'epoch': epoch,
            's_hist': model.s_hist_test, 's_cache': model.s_his_cache,
            'o_hist': model.o_hist_test, 'o_cache': model.o_his_cache,
            'latest_time': model.latest_time},
           model_state_file)

Errors with ICEWS18 scripts from top of repo

Not sure if I'm missing something, but in RE-Net/data/ICEWS18/get_history_graph.py there are two load_quadruples functions that seem to do the same thing.

When I run python data/ICEWS18/get_history_graph.py from the top of the repo directory, I get the following error and traceback:

Traceback (most recent call last):
  File "data/ICEWS18/get_history_graph.py", line 112, in <module>
    train_data, train_times = load_quadruples('', 'train.txt')
  File "data/ICEWS18/get_history_graph.py", line 47, in load_quadruples
    with open(os.path.join(inPath, fileName), 'r') as fr:
FileNotFoundError: [Errno 2] No such file or directory: 'train.txt'

I get the same error with python data/ICEWS18/get_history.py:

Traceback (most recent call last):
  File "data/ICEWS18/get_history.py", line 86, in <module>
    train_data, train_times = load_quadruples('','train.txt')
  File "data/ICEWS18/get_history.py", line 42, in load_quadruples
    with open(os.path.join(inPath, fileName), 'r') as fr:
FileNotFoundError: [Errno 2] No such file or directory: 'train.txt'

Whereas running these scripts from the top of the repo directory (python data/ICEWS18/get_history.py, etc.) does not work, running python get_history.py, etc. from within the data/ICEWS18/ directory succeeds. You may want to update the preprocessing notes in https://github.com/INK-USC/re-net#train-and-test to tell people to navigate to the respective data folder first and then run the script.
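
Alternatively, one way to make the scripts location-independent (a sketch, not the repo's code) is to resolve the data files relative to the script itself, using the load_quadruples signature shown in the traceback:

# Resolve train.txt next to this script instead of the current working directory.
import os
here = os.path.dirname(os.path.abspath(__file__))
train_data, train_times = load_quadruples(here, 'train.txt')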

Thanks for making this code public!

Out of memory when pretraining

Hello Woojeong,

When I run pretrain.py as described in the instructions:

python pretrain.py -d ICEWS18 --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

It gets an OOM error in the first batch:

Traceback (most recent call last):
  File "pretrain.py", line 139, in <module>
    train(args)
  File "pretrain.py", line 83, in train
    loss = model(batch_data, true_s, true_o, graph_dict)
  File "/home/liux/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/liux/RE-Net/global_model.py", line 47, in forward
    packed_input = self.aggregator(sorted_t, self.ent_embeds, graph_dict, reverse=reverse)
  File "/home/liux/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/liux/RE-Net/Aggregator.py", line 57, in forward
    self.rgcn1(batched_graph, reverse)
  File "/home/liux/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/liux/RE-Net/RGCN.py", line 39, in forward
    self.propagate(g, reverse)
  File "/data1/liux/RE-Net/RGCN.py", line 91, in propagate
    g.update_all(lambda x: self.msg_func(x, reverse), fn.sum(msg='msg', out='h'), self.apply_func)
  File "/home/liux/.local/lib/python3.6/site-packages/dgl/heterograph.py", line 4501, in update_all
    ndata = core.message_passing(g, message_func, reduce_func, apply_node_func)
  File "/home/liux/.local/lib/python3.6/site-packages/dgl/core.py", line 291, in message_passing
    msgdata = invoke_edge_udf(g, ALL, g.canonical_etypes[0], mfunc, orig_eid=orig_eid)
  File "/home/liux/.local/lib/python3.6/site-packages/dgl/core.py", line 82, in invoke_edge_udf
    return func(ebatch)
  File "/data1/liux/RE-Net/RGCN.py", line 91, in <lambda>
    g.update_all(lambda x: self.msg_func(x, reverse), fn.sum(msg='msg', out='h'), self.apply_func)
  File "/data1/liux/RE-Net/RGCN.py", line 84, in msg_func
    weight = self.weight.index_select(0, edges.data['type_s']).view(
RuntimeError: CUDA out of memory. Tried to allocate 11.06 GiB (GPU 1; 10.92 GiB total capacity; 578.54 MiB already allocated; 8.13 GiB free; 756.00 MiB reserved in total by PyTorch)

I wonder why the update step needs so much memory.
Could you please help me? Thanks a lot!
By the way, my DGL version is dgl-cu102 (I don't know whether this difference causes the error).
