
cripac-dig / dgsr

60 stars · 2 watchers · 12 forks · 13.99 MB

[TKDE 2022] The source code of "Dynamic Graph Neural Networks for Sequential Recommendation"

Languages: Python 96.24%, Shell 3.76%
Topics: recommender-system, sequential-recommendation, graph-neural-networks, dynamic-graph-embedding

dgsr's People

Contributors: opilgrim, zm7


dgsr's Issues

Error occurs when setting the long term to orgat and the short term to last

Here is the command I used and the execution result:
%run new_main.py --data=Games \
  --gpu=0 \
  --epoch=20 \
  --hidden_size=50 \
  --batchSize=50 \
  --user_long=orgat \
  --user_short=last \
  --item_long=orgat \
  --item_short=last \
  --user_update=rnn \
  --item_update=rnn \
  --lr=0.001 \
  --l2=0.0001 \
  --layer_num=2 \
  --item_max_length=50 \
  --user_max_length=50 \
  --attn_drop=0.3 \
  --feat_drop=0.3 \
  --record


RuntimeError Traceback (most recent call last)
/content/drive/MyDrive/TCSS556/DGSR-master/new_main.py in <module>()
112 for user, batch_graph, label, last_item in train_data:
113 iter += 1
--> 114 score = model(batch_graph.to(device), user.to(device), last_item.to(device), is_training=True)
115 loss = loss_func(score, label.to(device))
116 optimizer.zero_grad()

8 frames
/content/drive/MyDrive/TCSS556/DGSR-master/DGSR.py in user_reduce_func(self, nodes)
272 return {'user_h': h[0]}
273 else:
--> 274 return {'user_h': self.agg_gate_u(torch.cat(h,-1))}
275
276 def graph_user(bg, user_index, user_embedding):

RuntimeError: Tensors must have same number of dimensions: got 2 and 1
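
For context, a minimal standalone sketch of this failure mode (the shapes and variable names are hypothetical, not the repository's exact tensors): concatenating a 2-D long-term representation with a 1-D "last item" representation reproduces the error, and broadcasting the 1-D tensor first is one possible workaround, not the authors' fix.

'''
import torch

# Assumed shapes: the 'orgat' long-term branch yields a 2-D tensor (nodes x hidden),
# while the 'last' short-term branch yields a 1-D tensor (hidden,).
h_long = torch.randn(4, 50)    # one row per neighbour node
h_short = torch.randn(50)      # embedding of the last interacted item

try:
    torch.cat([h_long, h_short], dim=-1)
except RuntimeError as err:
    print(err)                 # "Tensors must have same number of dimensions: got 2 and 1"

# Possible workaround (an assumption): broadcast the short-term vector to the same
# rank before concatenating along the feature axis.
h_short_2d = h_short.unsqueeze(0).expand(h_long.size(0), -1)
fused = torch.cat([h_long, h_short_2d], dim=-1)
print(fused.shape)             # torch.Size([4, 100])
'''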

dgl version question

Hello, the latest dgl release I can find is 0.6.1 and I cannot find dgl 0.7.2. Where did you download it from?

About a question in new_data.py

Hello, I noticed that in new_data.py, when it starts to create the dynamic sub-graphs, it skips the first timestamp (i.e. if j == 0: continue). I don't understand why it should be skipped. I would appreciate it if you could answer my question!
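
For what it's worth, a small illustrative sketch (not the repository's code) of why the first timestamp presumably cannot yield a training sample, which would explain the j == 0 skip:

'''
# Hypothetical illustration: for the j-th interaction, the items at positions
# 0 .. j-1 form the history sub-graph and item j is the prediction target.
# At j == 0 the history is empty, so no (sub-graph, label) pair can be built.
interactions = ["i3", "i7", "i1", "i9"]   # one user's items in time order

for j, target in enumerate(interactions):
    if j == 0:
        continue                          # empty history, nothing to condition on
    history = interactions[:j]
    print(f"history={history} -> predict {target}")
'''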

Out-of-memory problem when building the graphs

Hello, when I run new_data.py I get the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Using backend: pytorch
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
exception calling callback for <Future at 0x7f068bb4c610 state=finished raised TerminatedWorkerError>
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 347, in call
self.parallel.dispatch_next()
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 780, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 847, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 765, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 529, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1102, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGABRT(-6)}
Traceback (most recent call last):
File "new_data.py", line 161, in
all_num = generate_data(data, graph, opt.item_max_length, opt.user_max_length, train_path, test_path, val_path, job=opt.job, k_hop=opt.job)
File "new_data.py", line 129, in generate_data
a = Parallel(n_jobs=job)(delayed(lambda u: generate_user(u, data, graph, item_max_length, user_max_length, train_path, test_path, k_hop, val_path))(u) for u in user)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 1042, in call
self.retrieve()
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 921, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 540, in wrap_future_result
return future.result(timeout=timeout)
File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/_base.py", line 625, in _invoke_callbacks
callback(self)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 347, in call
self.parallel.dispatch_next()
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 780, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 847, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/conda/lib/python3.7/site-packages/joblib/parallel.py", line 765, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/conda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 529, in apply_async
future = self._workers.submit(SafeFunction(func))
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/reusable_executor.py", line 178, in submit
fn, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 1102, in submit
raise self._flags.broken
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGABRT(-6)}
This seems to be an out-of-memory problem. Building the graph for a single user at a time succeeds, but that is far too inefficient. Do you have any suggestions for solving this?
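
Not an official answer, but a hedged sketch of one way to bound memory with joblib: lower the worker count and feed users in chunks so that only a limited number of sub-graphs are in flight at once. Here generate_user_subgraph and all_users are placeholders standing in for the repository's generate_user call and user id list.

'''
from joblib import Parallel, delayed

# Placeholders: in the repository these would be generate_user(...) and the user id list.
def generate_user_subgraph(u):
    return u  # stand-in for the real sub-graph construction

all_users = list(range(10_000))

def process_chunk(users, n_jobs=2):
    # Fewer workers plus bounded chunks keep peak memory lower than dispatching
    # every user at once with a large --job value.
    return Parallel(n_jobs=n_jobs)(
        delayed(generate_user_subgraph)(u) for u in users
    )

chunk_size = 1000
results = []
for start in range(0, len(all_users), chunk_size):
    results.extend(process_chunk(all_users[start:start + chunk_size]))
'''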

KeyError while training and predicting

I received the following error during training and predicting based on the Games dataset.
'''
start training: 2022-04-25 23:07:12.871280
Iter 400, loss 9.3309 2022-04-25 23:17:20.633740
Iter 800, loss 9.1787 2022-04-25 23:27:15.086462
Epoch 0, loss 9.1725 =============================================
start predicting: 2022-04-25 23:27:29.583910

KeyError Traceback (most recent call last)
/content/drive/MyDrive/TCSS556/DGSR-master/new_main.py in <module>()
145 all_loss = []
146 with torch.no_grad():
--> 147 for user, batch_graph, label, last_item, neg_tar in test_data:
148 iter+=1
149 score, top = model(batch_graph.to(device), user.to(device), last_item.to(device), neg_tar=torch.cat([label.unsqueeze(1), neg_tar],-1).to(device), is_training=False)

3 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
455 # instantiate since we don't know how to
456 raise RuntimeError(msg) from None
--> 457 raise exception
458
459

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 105, in pandas._libs.index.Int64Engine._check_type
File "pandas/_libs/index_class_helper.pxi", line 105, in pandas._libs.index.Int64Engine._check_type
KeyError: tensor([0])

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/content/drive/MyDrive/TCSS556/DGSR-master/new_main.py", line 92, in
test_data = DataLoader(dataset=test_set, batch_size=opt.batchSize, collate_fn=lambda x: collate_test(x, data_neg), pin_memory=True, num_workers=8)
File "/content/drive/MyDrive/TCSS556/DGSR-master/DGSR.py", line 341, in collate_test
return torch.tensor(user).long(), dgl.batch(graph), torch.tensor(label).long(), torch.tensor(last_item).long(), torch.Tensor(neg_generate(user, user_neg)).long()
File "/content/drive/MyDrive/TCSS556/DGSR-master/DGSR.py", line 326, in neg_generate
neg[i] = np.random.choice(data_neg[u], neg_num, replace=False)
File "/usr/local/lib/python3.7/dist-packages/pandas/core/series.py", line 942, in getitem
return self._get_value(key)
File "/usr/local/lib/python3.7/dist-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: tensor([0])
'''
Here are all the steps I took to produce this error. I ran the code in Google Colab.
'''
%run new_data.py --data=Games --job=10 --item_max_length=50 --user_max_length=50 --k_hop=2
%run generate_neg.py
%run new_main.py --data=Games \
  --gpu=0 \
  --epoch=20 \
  --hidden_size=50 \
  --batchSize=50 \
  --user_long=orgat \
  --user_short=att \
  --item_long=orgat \
  --item_short=att \
  --user_update=rnn \
  --item_update=rnn \
  --lr=0.001 \
  --l2=0.0001 \
  --layer_num=2 \
  --item_max_length=50 \
  --user_max_length=50 \
  --attn_drop=0.3 \
  --feat_drop=0.3 \
  --record
'''
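
A hedged guess at the failure, with a defensive sketch (not the authors' code): data_neg is a pandas Series indexed by plain integer user ids, but the collate function passes torch tensors as keys, so data_neg[u] raises KeyError: tensor([0]). Converting each key to a Python int before indexing avoids the lookup error; neg_generate_safe below is a hypothetical variant of neg_generate.

'''
import numpy as np
import pandas as pd
import torch

def neg_generate_safe(users, data_neg, neg_num=100):
    # users: iterable of user ids (plain ints or 1-element tensors)
    # data_neg: pandas Series mapping user id -> candidate negative items
    neg = np.zeros((len(users), neg_num), dtype=np.int64)
    for i, u in enumerate(users):
        uid = int(u)   # works for both ints and tensors like tensor([0])
        neg[i] = np.random.choice(data_neg[uid], neg_num, replace=False)
    return neg

# Minimal usage with toy data:
data_neg = pd.Series({0: list(range(200)), 1: list(range(200))})
print(neg_generate_safe([torch.tensor([0]), 1], data_neg).shape)   # (2, 100)
'''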

Aborted (core dumped)

Hello. My environment is Python 3.8, torch 1.7, dgl 0.7.2, a 24 GB GPU and 160 GB of RAM. When I run new_data.py from this code on the Movie dataset, sampling only 50 rows works, but with 100 rows I get the error below. Why can I only run this few samples? Has anyone run into a similar problem? I urgently need a solution, thank you.

terminate called after throwing an instance of 'dmlc::Error'
what(): [11:07:45] /opt/dgl/src/array/cpu/./rowwise_pick.h:89: Check failed: rid < mat.num_rows (2 vs. 1) :
Stack trace:
[bt] (0) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f9ab4cb332f]
[bt] (1) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(+0x59b122) [0x7f9ab4d03122]
[bt] (2) /root/miniconda3/lib/python3.8/site-packages/torch/lib/libgomp-7c85b1e2.so.1(GOMP_parallel+0x3f) [0x7f9bba37b01f]
[bt] (3) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(dgl::aten::COOMatrix dgl::aten::impl::CSRRowWisePick(dgl::aten::CSRMatrix, dgl::runtime::NDArray, long, bool, std::function<void (long, long, long, long const*, long const*, long*)>)+0x29a) [0x7f9ab4d0353a]
[bt] (4) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(dgl::aten::COOMatrix dgl::aten::impl::CSRRowWiseTopk<(DLDeviceType)1, long, long>(dgl::aten::CSRMatrix, dgl::runtime::NDArray, long, dgl::runtime::NDArray, bool)+0x133) [0x7f9ab4d0c273]
[bt] (5) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(dgl::aten::CSRRowWiseTopk(dgl::aten::CSRMatrix, dgl::runtime::NDArray, long, dgl::runtime::NDArray, bool)+0x426) [0x7f9ab4c93ef6]
[bt] (6) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(dgl::sampling::SampleNeighborsTopk(std::shared_ptr<dgl::BaseHeteroGraph>, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, std::vector<long, std::allocator<long> > const&, dgl::EdgeDir, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool)+0x1364) [0x7f9ab547fc04]
[bt] (7) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(+0xd1bf3a) [0x7f9ab5483f3a]
[bt] (8) /root/miniconda3/lib/python3.8/site-packages/dgl/libdgl.so(+0xd1c694) [0x7f9ab5484694]

Aborted (core dumped)

Is there a data leakage problem?

Hello author, while reading the code I noticed that when the dynamic graph is built for one user, other users' target items may also be included. Could this cause data leakage?

Bug regarding Train, Validation, Test split

Hello,

Thank you for the very interesting paper, really cool concept.

I had a question about how the Train/Validation/Test split happens in the code:

https://github.com/ZM7/DGSR/blob/2293cbdf6e043e9afc02d313e15458124dbd31b7/new_data.py#L112-L123

It seems to me that using this logic:

  • items at time == 0 ... time == split_point - 2 are added to the train data
  • the item at time == split_point - 3 is added to the validation data
  • items at time != split_point - 3 are added to the test data

This would mean that the items in the training data are also added to the test data.

In the following table this is illustrated further:

time | train | validation | test
  0  |   x   |            |  x
  1  |   x   |            |  x
  2  |   x   |            |  x
  3  |   x   |            |  x
  4  |   x   |            |  x
  5  |       |     x      |
  6  |       |            |  x

This seems very odd. I was wondering if this was intended or a bug?
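
For comparison, a minimal sketch (an assumption about the intended behaviour, not the repository's logic) of a leave-one-out split in which train, validation, and test never overlap:

'''
def split_user_sequence(items):
    # Last interaction -> test target, second-to-last -> validation target,
    # everything earlier -> training data; no item appears in two splits.
    if len(items) < 3:
        return items, [], []              # too short to hold out val/test targets
    return items[:-2], [items[-2]], [items[-1]]

train, val, test = split_user_sequence(["i1", "i2", "i3", "i4", "i5", "i6", "i7"])
print(train, val, test)                   # ['i1'...'i5'] ['i6'] ['i7']
'''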

Question about Weight matrix definition in paper

Hello,

Thanks again for the interesting research. We were further investigating the paper and I had a question.

Is the weight matrix 'W1' in equation 5 the same as the weight matrix 'W1' in equation 7? And is this also the same as in equation 14?

I was under the impression this was true; however, in equations 5 and 7 the weights are multiplied with the item embedding, while in equation 14 they are multiplied with the user embedding.


Thank you for your research and kind regards,
Linkerbrain

Got the following error during testing

Traceback (most recent call last):
File "new_main.py", line 120, in
for user, batch_graph, label, last_item, neg_tar in test_data:
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "new_main.py", line 92, in
test_data = DataLoader(dataset=test_set, batch_size=opt.batchSize, collate_fn=lambda x: collate_test(x, data_neg), pin_memory=True, num_workers=8)
File "/export/hdd/scratch/yjiang400/DGSR/DGSR.py", line 341, in collate_test
return torch.tensor(user).long(), dgl.batch(graph), torch.tensor(label).long(), torch.tensor(last_item).long(), torch.Tensor(neg_generate(user, user_neg)).long()
File "/export/hdd/scratch/yjiang400/DGSR/DGSR.py", line 326, in neg_generate
neg[i] = np.random.choice(data_neg[u], neg_num, replace=False)
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/pandas/core/series.py", line 871, in getitem
result = self.index.get_value(self, key)
File "/usr/scratch/yjiang400/miniconda3/envs/dgsr/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: tensor([0])
