My GPU is a 1080 Ti. Is it impossible to run (RotatE) with this GPU? I wonder how


GPU Out Of Memory about knowledgegraphembedding HOT 2 CLOSED

deepgraphlearning commented on August 27, 2024

GPU Out Of Memory


Comments (2)

Edward-Sun commented on August 27, 2024

1080 Ti should work well. The training uses no more than 10GB GPU memory.


p6jain commented on August 27, 2024

I use the command:

bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de

to train RotatE on an 11 GB GPU. I made sure the GPU was completely free before starting.
I still get the following error:

2022-03-31 19:32:37,370 INFO     negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO     learning_rate = 0
2022-03-31 19:32:39,079 INFO     Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO     Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO     Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO     Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO     Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO     Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO     Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO     Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO     Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO     Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO     Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO     Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
  File "codes/run.py", line 371, in <module>
    main(parse_args())
  File "codes/run.py", line 315, in main
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
  File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
    loss.backward()
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
run.sh: line 79: 
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
    --cuda \
    --do_valid \
    --do_test \
    --data_path $FULL_DATA_PATH \
    --model $MODEL \
    -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
    -g $GAMMA -a $ALPHA -adv \
    -lr $LEARNING_RATE --max_steps $MAX_STEPS \
    -save $SAVE --test_batch_size $TEST_BATCH_SIZE \
    ${14} ${15} ${16} ${17} ${18} ${19} ${20}

: No such file or directory

I get similar errors when trying to train on FB15k using the command in the best_config.sh file.
I reduced the batch size to 500 and it worked, but the performance is much lower than the numbers reported in the paper.

I am not sure what the issue is.
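For what it's worth, the 1.95 GiB in the error message can be reproduced with some rough arithmetic. This is only a sketch under assumptions that the log does not confirm: that the positional arguments 1024, 256 and 1000 in the command map to batch size (-b), negative sample size (-n) and hidden dimension (-d), that -de doubles the entity embedding dimension, and that tensors are 32-bit floats. The helper name `tensor_gib` is made up for illustration.

```python
def tensor_gib(batch_size, negative_sample_size, entity_dim, bytes_per_float=4):
    """Size in GiB of one float tensor of shape (batch, negatives, entity_dim)."""
    n_bytes = batch_size * negative_sample_size * entity_dim * bytes_per_float
    return n_bytes / 2**30


hidden_dim = 1000            # assumed to be the value passed as -d
entity_dim = 2 * hidden_dim  # assumption: -de doubles the entity embedding size

full = tensor_gib(1024, 256, entity_dim)    # batch size from the failing command
reduced = tensor_gib(500, 256, entity_dim)  # batch size that reportedly worked

print(f"batch 1024: {full:.2f} GiB per tensor")
print(f"batch  500: {reduced:.2f} GiB per tensor")
```

Under these assumptions a single tensor of negative-sample embeddings comes out at about 1.95 GiB for batch size 1024 and about 0.95 GiB for batch size 500. The first figure matches the failed allocation in the traceback, which suggests (but does not prove) that a few intermediate tensors of this shape kept alive for the backward pass are what fills the 11 GB card.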

