initzhang / ducati_sigmod
Accepted paper of SIGMOD 2023, DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU
While running the run_allocate script for the Friendster dataset, the output shows a discrepancy in the number of nodes: the script reports 124 million nodes, whereas the dataset website documents about 65 million.
Here's the relevant output from the run_allocate script:
Graph(num_nodes=124836180, num_edges=1806067135, ndata_schemes={} edata_schemes={})
Inspecting the raw dataset, I found that node IDs range from 1 to about 124M, but only about 65M distinct IDs actually appear, so I renumbered the nodes to a contiguous range. Even after this preprocessing step, run_allocate still reports 124M as num_nodes.
I would appreciate some help in verifying the functional correctness of the code in this case; any pointers are welcome.
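For reference, here is a minimal relabelling sketch (not code from this repo; the edge-list file name is hypothetical) that compacts the sparse Friendster IDs before building the DGL graph. Note that num_nodes has to be passed explicitly, otherwise DGL infers it from the largest ID it sees:

import torch
import dgl

# Hypothetical preprocessing sketch: compact sparse node ids (1..124M, of which
# only ~65M are used) into a dense 0..N-1 range, then build the graph with the
# compacted node count.
src, dst = torch.load('friendster_edges.pt')               # hypothetical raw edge list
uniq, inv = torch.unique(torch.cat([src, dst]), return_inverse=True)
new_src, new_dst = inv[:src.numel()], inv[src.numel():]
g = dgl.graph((new_src, new_dst), num_nodes=uniq.numel())
print(g)  # num_nodes should now be ~65M; if 124M persists, run_allocate is likely
          # still loading the original graph or a cached preprocessed file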
Hello, following the instructions in the README I was able to set up the environment. However, when I try to run PA (ogbn-papers100M) with the commands below, something seems to go wrong. Could you tell me where the problem lies, or whether the parameters I passed are at fault?
$ CUDA_VISIBLE_DEVICES=0 python run_allocate.py --dataset ogbn-papers100M --fanouts 10,25 --fake-dim 128
2023-09-23 14:24:59,560 Namespace(adj_budget=0, adj_slope=1, batches=1000, bs=8000, dataset='ogbn-papers100M', fake_dim=128, fanouts='10,25', nfeat_budget=0, nfeat_slope=1, pre_batches=100, pre_epochs=2, runs=4, total_budget=1)
2023-09-23 14:25:00,598 loading raw dataset of ogbn-papers100M
2023-09-23 14:25:38,354 finish loading raw dataset, time elapsed: 37.76s
2023-09-23 14:26:09,762 finish preprocessing, time elapsed: 31.41s
2023-09-23 14:28:15,314 finish generating random features with dim=128, time elapsed: 124.61s
2023-09-23 14:28:16,243 Graph(num_nodes=111059956, num_edges=1615685872,
ndata_schemes={}
edata_schemes={})
2023-09-23 14:28:16,454 get 1000 seeds, 0.06GB on cuda:0
2023-09-23 14:28:16,455 start profiling and calculating slope
2023-09-23 14:28:42,581 finish calculating slope: adj(2.55) nfeat(13.45), time elapsed: 26.13s
2023-09-23 14:28:42,581 total cache budget: 1GB
2023-09-23 14:28:42,581 total adj size: 12.865GB, total nfeat size: 53.785GB
2023-09-23 14:28:44,675 finish constructing density and size array
2023-09-23 14:28:57,285 find the separate point 4565543
2023-09-23 14:28:57,325 nfeat entries: 1770741, adj entries: 2794802
2023-09-23 14:28:57,325 nfeat size: 0.858 GB, adj size: 0.142 GB
2023-09-23 14:28:57,684 dual cache allocation done, time_elapsed: 15.10s
2023-09-23 14:28:58,128 current allocation plan: 0.142GB adj cache & 0.858GB nfeat cache
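(As a sanity check on these totals, my own reading of the log rather than anything the repo prints: the sizes are consistent with 520 bytes of node data per node, i.e. 128 float32 features plus an int64 label, and an int64 CSR adjacency.)

# Back-of-the-envelope check of the logged totals (assumptions: sizes are GiB,
# an nfeat entry is fake_dim float32 values plus an int64 label, the adjacency
# is CSR with int64 neighbours and offsets).
GiB = 2**30
num_nodes, num_edges, fake_dim = 111_059_956, 1_615_685_872, 128
nfeat_bytes_per_node = fake_dim * 4 + 8
adj_bytes = num_edges * 8 + num_nodes * 8
print(num_nodes * nfeat_bytes_per_node / GiB)    # ~53.79 (log: total nfeat size 53.785GB)
print(adj_bytes / GiB)                           # ~12.87 (log: total adj size 12.865GB)
print(1_770_741 * nfeat_bytes_per_node / GiB)    # ~0.858 (log: nfeat cache 0.858 GB)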
Then I ran:
$ CUDA_VISIBLE_DEVICES=0 python run_ducati.py
2023-09-23 14:42:06,607 Namespace(adj_budget=0, batches=1024, bs=8000, dataset='ogbn-papers100M', dropout=0.5, fake_dim=128, fanouts='10,25', lr=0.003, nfeat_budget=0, num_hidden=256, pre_batches=100, pre_epochs=2, runs=10)
2023-09-23 14:42:07,654 loading raw dataset of ogbn-papers100M
2023-09-23 14:42:45,337 finish loading raw dataset, time elapsed: 37.68s
2023-09-23 14:43:16,913 finish preprocessing, time elapsed: 31.58s
2023-09-23 14:45:22,367 finish generating random features with dim=128, time elapsed: 124.49s
2023-09-23 14:45:23,257 Graph(num_nodes=111059956, num_edges=1615685872,
ndata_schemes={}
edata_schemes={})
2023-09-23 14:45:23,485 get 1024 seeds, 0.06GB on cuda:0
gpu_flag None
gpu_map None
all_cache [None, None]
2023-09-23 14:45:23,882 buffer size: 0.185 GB
Traceback (most recent call last):
  File "run_ducati.py", line 109, in <module>
    entry(args, graph, all_data, seeds_list, counts)
  File "run_ducati.py", line 63, in entry
    run_one_list(seeds_list)
  File "run_ducati.py", line 49, in run_one_list
    cur_nfeat = nfeat_loader.load(input_nodes, nfeat_buf) # fetch nfeat
  File "/home/bear/workspace/DUCATI_SIGMOD/NfeatLoader.py", line 9, in load
    gpu_mask = self.gpu_flag[idx]
TypeError: 'NoneType' object is not subscriptable
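A likely cause, from my reading of the log rather than anything confirmed by the authors: run_ducati.py was launched with its default adj_budget=0 and nfeat_budget=0 (see the Namespace line above), so no nfeat cache is built and gpu_flag stays None (hence "gpu_flag None" and "all_cache [None, None]"), yet NfeatLoader.load indexes gpu_flag unconditionally. Re-running with non-zero budgets, e.g. the 0.142 GB adj / 0.858 GB nfeat plan reported by run_allocate, should avoid this path. Alternatively, a defensive loader could fall back to a plain host gather when nothing is cached; a minimal sketch of that guard, with a hypothetical constructor whose fields merely mirror the names in the traceback:

import torch

class SafeNfeatLoader:
    # Hypothetical loader illustrating the guard; not the repo's NfeatLoader.
    def __init__(self, cpu_nfeat, gpu_nfeat=None, gpu_flag=None, gpu_map=None):
        self.cpu_nfeat = cpu_nfeat   # full feature matrix in host memory
        self.gpu_nfeat = gpu_nfeat   # rows cached on the GPU (None if no cache)
        self.gpu_flag = gpu_flag     # bool mask: is this node id cached?
        self.gpu_map = gpu_map       # node id -> row index in gpu_nfeat

    def load(self, idx, out_buf):
        n = idx.shape[0]
        if self.gpu_flag is None:
            # nfeat_budget=0: nothing is cached, gather every row from the host
            out_buf[:n] = self.cpu_nfeat[idx.cpu()].to(out_buf.device, non_blocking=True)
            return out_buf[:n]
        gpu_mask = self.gpu_flag[idx]
        out_buf[:n][gpu_mask] = self.gpu_nfeat[self.gpu_map[idx[gpu_mask]]]
        cpu_idx = idx[~gpu_mask]
        out_buf[:n][~gpu_mask] = self.cpu_nfeat[cpu_idx.cpu()].to(out_buf.device, non_blocking=True)
        return out_buf[:n]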
File: /data/DUCATI_SIGMOD/run_allocate.py
I'm encountering an IndexError when running the code with the UK-Union dataset at batch sizes of 8192 and 4096, with the total budget set to 15-20 GB and above. Despite having 49.14 GB of available GPU memory, the run fails intermittently.
P.S. I ran into a similar issue with the Twitter dataset, but there it happened during adjacency cache allocation.
We observed that the above issue only appears when the allocated adj cache size is larger than the total adj size. By increasing the fake_dim parameter we effectively reduced the adj budget, so the whole adjacency no longer fits in the adj cache; even so, the issue still shows up for the adj cache. For UK-Union, the failure is in the nfeat cache allocation, even though the total nfeat size is far larger than either the total budget or the allocated nfeat cache.
I would really appreciate it if someone could shed some light on this. Thanks.
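One hedged guess at the failure mode, based only on the symptom described above: if the allocator converts a byte budget into a number of cache entries and then slices its sorted candidate arrays without bounding that number, a budget larger than the whole structure will index past the end. A clamp of the following kind (a hypothetical helper, not DUCATI's actual allocator code) would cap it:

def plan_entries(budget_bytes, bytes_per_entry, total_entries):
    # Hypothetical helper: never request more cache entries than actually exist.
    want = int(budget_bytes // bytes_per_entry)
    return min(want, total_entries)

# e.g. a 20 GB adj budget on a graph whose whole adjacency needs only ~12 GB
# should simply cache everything instead of indexing past the candidate array
assert plan_entries(20 * 2**30, 512, 3_000_000) == 3_000_000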
I would like to ask why randomly generated feature vectors (i.e., fake input) are needed. If I have misunderstood, could you explain the purpose of the function DUCATI.CacheConstructor.separate_features_idx? The following is your original code:
def separate_features_idx(args, graph):
    separate_tic = time.time()
    train_idx = torch.nonzero(graph.ndata.pop("train_mask")).reshape(-1)
    adj_counts = graph.ndata.pop('adj_counts')
    nfeat_counts = graph.ndata.pop('nfeat_counts')

    # cleanup
    graph.ndata.clear()
    graph.edata.clear()

    # we prepare fake input for all datasets
    fake_nfeat = dgl.contrib.UnifiedTensor(torch.rand((graph.num_nodes(), args.fake_dim), dtype=torch.float), device='cuda')
    fake_label = dgl.contrib.UnifiedTensor(torch.randint(args.n_classes, (graph.num_nodes(), ), dtype=torch.long), device='cuda')
    mlog(f'finish generating random features with dim={args.fake_dim}, time elapsed: {time.time()-separate_tic:.2f}s')

    return graph, [fake_nfeat, fake_label], train_idx, [adj_counts, nfeat_counts]
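My understanding, hedged since the README/paper would be authoritative: the fake features and labels let the benchmarks control the feature dimension via --fake-dim without materializing each dataset's very large real feature matrix, and wrapping them in dgl.contrib.UnifiedTensor lets the GPU read them from host memory directly. If you want the real features instead, a sketch along these lines (the root path is hypothetical; this needs on the order of 60 GB of host RAM, and papers100M labels contain NaN for unlabeled nodes) could replace the fake tensors:

import torch
import dgl
from ogb.nodeproppred import DglNodePropPredDataset

# Hedged sketch: load the real ogbn-papers100M features/labels in place of the
# random ones generated by separate_features_idx.
dataset = DglNodePropPredDataset(name='ogbn-papers100M', root='/data/ogb')
graph, labels = dataset[0]
real_nfeat = dgl.contrib.UnifiedTensor(graph.ndata.pop('feat').float(), device='cuda')
labels = torch.nan_to_num(labels.view(-1), nan=-1)   # unlabeled nodes -> -1
real_label = dgl.contrib.UnifiedTensor(labels.long(), device='cuda')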
When running the PA dataset and trying to verify the model's correctness, I ran into some issues. I save the model at the end of the entry function in run_ducati.py and then evaluate it, but the accuracy I obtain does not look right. I would like to know how to set the parameters to reproduce results similar to the paper; if you could share updated testing code I would greatly appreciate it. My settings are fanouts [15, 15, 15] and 20 epochs.
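One thing worth checking first: as the code quoted above shows, run_ducati.py trains on randomly generated fake_nfeat and fake_label, so a model saved from that run cannot yield meaningful accuracy regardless of fanouts or epochs; reproducing paper-like numbers would need the real features and labels. For the evaluation loop itself, here is a minimal sketch of my own, assuming the DGL 0.8/0.9-era APIs that the repo's UnifiedTensor usage suggests and a standard model with a (blocks, features) forward signature; it is not the repo's own test code:

import torch
import dgl

@torch.no_grad()
def evaluate(model, graph, nfeat, labels, test_idx, fanouts=(15, 15, 15), device='cuda'):
    # Hedged evaluation sketch: neighbour-sample the test nodes and measure accuracy.
    model.eval()
    sampler = dgl.dataloading.MultiLayerNeighborSampler(list(fanouts))
    loader = dgl.dataloading.NodeDataLoader(
        graph, test_idx, sampler, batch_size=1000, shuffle=False, drop_last=False)
    correct = total = 0
    for input_nodes, output_nodes, blocks in loader:
        blocks = [b.to(device) for b in blocks]
        x = nfeat[input_nodes].to(device)        # move the gathered input features to the GPU
        y = labels[output_nodes].to(device)
        pred = model(blocks, x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total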