Comments (2)

Edward-Sun commented on August 27, 2024

A 1080 Ti should work well. Training uses no more than 10 GB of GPU memory.

p6jain commented on August 27, 2024

I use the command:

bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de

to train RotatE on an 11 GB GPU. I made sure the GPU was completely free beforehand.
I still get the following error:

2022-03-31 19:32:37,370 INFO     negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO     learning_rate = 0
2022-03-31 19:32:39,079 INFO     Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO     Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO     Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO     Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO     Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO     Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO     Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO     Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO     Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO     Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO     Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO     Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
  File "codes/run.py", line 371, in <module>
    main(parse_args())
  File "codes/run.py", line 315, in main
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
  File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
    loss.backward()
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
run.sh: line 79: 
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
    --cuda \
    --do_valid \
    --do_test \
    --data_path $FULL_DATA_PATH \
    --model $MODEL \
    -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
    -g $GAMMA -a $ALPHA -adv \
    -lr $LEARNING_RATE --max_steps $MAX_STEPS \
    -save $SAVE --test_batch_size $TEST_BATCH_SIZE \
    ${14} ${15} ${16} ${17} ${18} ${19} ${20}

: No such file or directory

I get similar errors when I try to train on FB15k using the command in the best_config.sh file.
I reduced the batch size to 500 and training ran, but the performance is much lower than the numbers reported in the paper.

I am not sure what the issue is.
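
For what it's worth, here is a rough back-of-the-envelope estimate of how large a single negative-sample tensor would be for the command above. It is only a sketch under assumptions I have not verified against the code: that run.sh maps 1024, 256 and 1000 to the batch size, negative sample size and hidden dimension; that -de doubles the entity embedding width; and that embeddings are float32. The helper name negative_tensor_gib is made up for this illustration.

```python
# Rough size of one (batch, negatives, entity_dim) float32 tensor.
# Assumptions (not confirmed in this thread): 1024 = batch size,
# 256 = negative sample size, 1000 = hidden dim, and -de doubles the
# entity embedding width. negative_tensor_gib is a hypothetical helper.

BYTES_PER_FLOAT32 = 4
GIB = 2 ** 30


def negative_tensor_gib(batch_size, negative_sample_size, hidden_dim,
                        double_entity=True):
    """Return the size in GiB of one negative-sample embedding tensor."""
    entity_dim = hidden_dim * 2 if double_entity else hidden_dim
    n_bytes = (batch_size * negative_sample_size * entity_dim
               * BYTES_PER_FLOAT32)
    return n_bytes / GIB


full = negative_tensor_gib(1024, 256, 1000)    # settings from the command
reduced = negative_tensor_gib(500, 256, 1000)  # after lowering the batch size

print(f"batch 1024: {full:.2f} GiB per tensor")    # 1.95 GiB
print(f"batch  500: {reduced:.2f} GiB per tensor")  # 0.95 GiB
```

Under these assumptions, the 1024-batch figure matches the "Tried to allocate 1.95 GiB" in the error message. A handful of tensors of that size, plus their gradients during loss.backward(), would be consistent with an 11 GB card running out of memory.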
