
svip-lab / ppgnet


Source code for our CVPR 2019 paper - PPGNet: Learning Point-Pair Graph for Line Segment Detection

Home Page: https://www.aiyoggle.me/publication/ppgnet-cvpr19/

License: MIT License

Python 99.08% Shell 0.92%

ppgnet's People

Contributors

allankevinrichie


ppgnet's Issues

Algorithm function

Can this algorithm be used to detect only specific linear targets on the road, rather than all straight lines?

The line_pred visualization shows nothing

Hey, thanks for your nice work.
I want to try your work to see whether it works on my pictures. First I need to train it, so I started with train.sh.
But when I use TensorBoard to visualize the junctions and lines, line_pred seems to output nothing (two screenshots were attached here). Is that normal?

Thanks!

ValueError: num_samples should be a positive integer value, but got num_samples=0

Hello, I ran "git lfs pull" in the root directory, and the files in "ckpt/backbone/" are:
decoder_epoch_20.pth has 162.5Mb
encoder_epoch_20.pth has 95Mb

When I run ./train.sh I have:

Loading weights for net_encoder @ ckpt/backbone/encoder_epoch_20.pth
Loading weights for net_decoder @ ckpt/backbone/decoder_epoch_20.pth
Traceback (most recent call last):
  File "main.py", line 521, in <module>
    fire.Fire(LSDTrainer)
  File "/home/user/anaconda3/envs/ppgnet/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/user/anaconda3/envs/ppgnet/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/user/anaconda3/envs/ppgnet/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 121, in __init__
    pin_memory=True
  File "/home/user/anaconda3/envs/ppgnet/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/home/user/anaconda3/envs/ppgnet/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
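num_samples=0 means the training dataset came up empty, so the DataLoader's RandomSampler had nothing to draw from; a wrong --data-root is a common reason. The traceback elsewhere on this page shows data/sist_line.py building each label path as `self.img[item][:-4] + ".lg"`, so every image needs a same-named .lg file. A minimal sketch for checking a directory before training, assuming (not confirmed here) that images are .jpg/.png files stored next to their .lg files; the function name is hypothetical, and it should be pointed at whichever directory the loader actually scans:

```python
import os

# Assumption: images are .jpg or .png files stored next to their .lg files.
IMAGE_EXTS = (".jpg", ".png")


def find_training_pairs(data_root):
    """Return the image names in data_root that have a matching .lg file."""
    if not os.path.isdir(data_root):
        raise FileNotFoundError("data root does not exist: {}".format(data_root))
    pairs = []
    for name in sorted(os.listdir(data_root)):
        if not name.lower().endswith(IMAGE_EXTS):
            continue
        # Same naming rule as data/sist_line.py: drop the 4-character
        # extension (".jpg"/".png") and append ".lg".
        if os.path.isfile(os.path.join(data_root, name[:-4] + ".lg")):
            pairs.append(name)
    return pairs
```

If `len(find_training_pairs("./data/indoorDist"))` is 0 for the path passed to --data-root, the DataLoader will raise the same ValueError.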

can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Hello! I have a GTX 1050 Mobile with CUDA on my laptop. My ./train.sh contains:

python main.py \
--exp-name line_weighted_wo_focal_junc --backbone resnet50 \
--backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}' \
--dim-embedding 256 --junction-pooling-threshold 0.2 \
--junc-pooling-size 64 --attention-sigma 1.5 --block-inference-size 2 \
--data-root ./data/indoorDist/ --junc-sigma 3 \
--batch-size 2 --gpus 0,1,2,3 --num-workers 0 --resume-epoch latest \
--is-train-junc True --is-train-adj True \
--vis-junc-th 0.1 --vis-line-th 0.1 \
- train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 5. \
- train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- end

After running it I get:

Loading weights for net_encoder @ ckpt/backbone/encoder_epoch_20.pth
Loading weights for net_decoder @ ckpt/backbone/decoder_epoch_20.pth
find 481 jucntions.
Traceback (most recent call last):
  File "main.py", line 548, in <module>
    fire.Fire(LSDTrainer)
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 377, in train
    self._train_epoch()
  File "main.py", line 238, in _train_epoch
    junctions_gt
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Загрузки/PPGNet/PPGNet/models/lsd.py", line 202, in forward
    junc_hm, junc_coords = self.junc_infer(feat)
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Загрузки/PPGNet/PPGNet/models/graph.py", line 42, in forward
    coord = coord[ind[:self.max_juncs]]
  File "/home/user/.virtualenvs/PPGNet/lib/python3.6/site-packages/torch/tensor.py", line 458, in __array__
    return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
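The failing frame is `Tensor.__array__`, called while evaluating `coord[ind[:self.max_juncs]]` in graph.py. That pattern suggests `coord` is a NumPy array and `ind` is a CUDA tensor, so NumPy tries to convert the GPU tensor implicitly and torch refuses. A minimal sketch of the host-side version of that line, assuming those types (the function name is hypothetical; `Tensor.detach()`, `.cpu()` and `.numpy()` are standard torch methods):

```python
import numpy as np


def index_on_host(coord, ind, max_juncs):
    """Hypothetical host-side version of ``coord[ind[:max_juncs]]``.

    coord: NumPy array of junction coordinates, shape (N, 2).
    ind:   ordering of the junctions, either a NumPy array or a torch.Tensor
           (possibly on the GPU).
    """
    if hasattr(ind, "cpu"):
        # torch.Tensor: NumPy cannot read GPU memory, so copy to the host
        # explicitly instead of letting NumPy call Tensor.__array__.
        ind = ind.detach().cpu().numpy()
    ind = np.asarray(ind, dtype=np.int64)
    return coord[ind[:max_juncs]]
```

The opposite choice, turning `coord` into a tensor on `ind.device` and indexing there, would also avoid the implicit conversion; which one fits depends on how the rest of graph.py uses `coord`.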

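Poor results

I only use the first half of the model to detect keypoints in images, but the results are very poor and I can't see any room for improvement. I don't know what to do.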

ModuleNotFoundError: No module named 'sklearn.neighbors._kd_tree'

Hello,
I know this might get overlooked since the repository is relatively old, but maybe someone has an idea what the solution to my problem might be.
I normally train PPGNet in the pytorch/pytorch:0.4.1-cuda9-cudnn7-devel Docker environment, which worked perfectly fine until this week. Since then I get the following error message:

Traceback (most recent call last):
  File "main.py", line 523, in <module>
    fire.Fire(LSDTrainer)
  File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/opt/conda/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "main.py", line 360, in train
    self._train_epoch()
  File "main.py", line 211, in _train_epoch
    for i, batch in enumerate(data_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 314, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/simon/PPGNet/data/sist_line.py", line 24, in __getitem__
    lg = LineGraph().load(os.path.join(self.data_root, self.img[item][:-4] + ".lg"))
  File "/home/simon/PPGNet/data/line_graph.py", line 31, in load
    data = pickle.load(f)
ModuleNotFoundError: No module named 'sklearn.neighbors._kd_tree'

I use the recommended packages, as can be seen in the following pip list:

Package          Version
---------------- -----------
asn1crypto       0.24.0
backcall         0.1.0
beautifulsoup4   4.6.0
certifi          2018.4.16
cffi             1.11.5
chardet          3.0.4
conda            4.5.8
conda-build      3.12.0
cryptography     2.2.2
Cython           0.29.32
decorator        4.3.0
filelock         3.0.4
fire             0.4.0
glob2            0.6
idna             2.6
ipython          6.4.0
ipython_genutils 0.2.0
jedi             0.12.1
Jinja2           2.10
llvmlite         0.36.0
lxml             4.9.1
MarkupSafe       1.0
mkl-fft          1.0.4
mkl-random       1.0.1
numba            0.53.1
numpy            1.19.5
olefile          0.45.1
opencv-python    4.6.0.66
parso            0.3.1
pexpect          4.6.0
pickleshare      0.7.4
Pillow           5.2.0
pip              21.3.1
pkginfo          1.4.2
prompt-toolkit   1.0.15
protobuf         3.19.6
psutil           5.4.6
ptyprocess       0.6.0
pycosat          0.6.3
pycparser        2.18
Pygments         2.2.0
pyOpenSSL        18.0.0
PySocks          1.6.8
PyYAML           3.13
requests         2.18.4
ruamel_yaml      0.15.37
scikit-learn     0.19.2
scipy            1.1.0
setuptools       39.2.0
Shapely          1.8.5.post1
simplegeneric    0.8.1
six              1.11.0
tensorboardX     2.5.1
termcolor        1.1.0
torch            0.4.1
torchvision      0.2.1
traitlets        4.3.2
triangle         20220202
urllib3          1.22
wcwidth          0.1.7
wheel            0.31.1

Currently I don't know a solution. I tried conda environments instead of Docker: no success. A fresh installation on a different system: no success. Newer versions of torch, torchvision and scikit-learn: no success.

It seems I am not the only one with this problem, since the same issue is also discussed, without a solution, here: https://githubhelp.com/qingyonghu/sensaturban/issues/18
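The traceback shows data/line_graph.py reading each .lg file with `pickle.load`, and the pickled data refers to `sklearn.neighbors._kd_tree`. That private module name is what scikit-learn 0.22 and later use; 0.19.2, the version in the pip list, only provides `sklearn.neighbors.kd_tree`. So the files being loaded were apparently written with a newer scikit-learn than the one installed. One possible workaround, sketched here as an assumption rather than a known fix, is to redirect the module name at load time with the standard `pickle.Unpickler.find_class` hook (the class and function names are hypothetical):

```python
import io
import pickle

# Hypothetical alias table: module path stored in the pickle -> module path
# available in the installed scikit-learn (assumes scikit-learn < 0.22, which
# still exposes the public ``sklearn.neighbors.kd_tree``).
SKLEARN_ALIASES = {
    "sklearn.neighbors._kd_tree": "sklearn.neighbors.kd_tree",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites module paths before resolving classes."""

    def __init__(self, file, aliases=None):
        super().__init__(file)
        self.aliases = dict(SKLEARN_ALIASES if aliases is None else aliases)

    def find_class(self, module, name):
        # Redirect renamed modules; everything else resolves normally.
        module = self.aliases.get(module, module)
        return super().find_class(module, name)


def load_with_aliases(path, aliases=None):
    """Load a pickle file (such as a .lg file) with module renaming."""
    with open(path, "rb") as f:
        return RenamingUnpickler(f, aliases).load()


def loads_with_aliases(data, aliases=None):
    """Same as load_with_aliases, for an in-memory bytes object."""
    return RenamingUnpickler(io.BytesIO(data), aliases).load()
```

Renaming only fixes the name lookup. The internal state a KDTree pickles can also differ between scikit-learn versions, so `BinaryTree.__setstate__` may still fail (as in the segmentation-fault issue on this page); loading with the scikit-learn version that wrote the files avoids both problems.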

RuntimeError: DataLoader worker (pid 11499) is killed by signal: Segmentation fault. when running train.sh

I have installed the exact package versions listed in the README.md file. By the way, NumPy version 0.15.0 does not exist; I suspect this is a typo and the README should read NumPy 1.15.0. I have also downloaded the data from Google Drive, unzipped it with 7z, and then untarred the resulting file.

I have modified the train.sh file to point to the downloaded data. When running the script on an AWS EC2 instance (p2.xlarge), I receive the following error:

(pytorch_p36) [ec2-user@ip-xxx-xx-xx-xxx PPGNet]$ ./train.sh
Loading weights for net_encoder @ ckpt/backbone/encoder_epoch_20.pth
Loading weights for net_decoder @ ckpt/backbone/decoder_epoch_20.pth
start training epoch: 0
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "main.py", line 521, in <module>
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
    fire.Fire(LSDTrainer)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 358, in train
    self._train_epoch()
  File "main.py", line 209, in _train_epoch
    for i, batch in enumerate(data_loader):
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ec2-user/ray/PPGNet/data/sist_line.py", line 24, in __getitem__
    lg = LineGraph().load(os.path.join(self.data_root, self.img[item][:-4] + ".lg"))
  File "/home/ec2-user/ray/PPGNet/data/line_graph.py", line 29, in load
    data = pickle.load(f)
  File "sklearn/neighbors/binary_tree.pxi", line 1166, in sklearn.neighbors.kd_tree.BinaryTree.__setstate__
  File "stringsource", line 653, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 348, in View.MemoryView.memoryview.__cinit__
TypeError: a bytes-like object is required, not 'code'

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 63, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 11499) is killed by signal: Segmentation fault.

Forum posts suggest setting num_workers=0. When I tried that, I simply got a segmentation fault with no further information:

(pytorch_p36) [ec2-user@ip-xxx-xx-xx-xxx PPGNet]$ ./train.sh
Loading weights for net_encoder @ ckpt/backbone/encoder_epoch_20.pth
Loading weights for net_decoder @ ckpt/backbone/decoder_epoch_20.pth
start training epoch: 0
./train.sh: line 13: 11528 Segmentation fault      python main.py --exp-name line_weighted_wo_focal_junc --backbone resnet50 --backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}' --dim-embedding 256 --junction-pooling-threshold 0.2 --junc-pooling-size 64 --attention-sigma 1.5 --block-inference-size 128 --data-root ./indoorDist --junc-sigma 3 --batch-size 16 --gpus 0,1,2,3 --num-workers 0 --resume-epoch latest --is-train-junc True --is-train-adj True --vis-junc-th 0.1 --vis-line-th 0.1 - train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 5. - train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. - train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. - end

I also suspected the batch size, even though a p2.xlarge has K80 GPUs and should therefore, in theory, handle the training according to the README.md instructions (which require a GPU with at least 24 GB of memory). I tried reducing the batch size to 1 and still got the same error.


The errors above, produced by the unmodified training script, suggest that something is wrong with the line graph (.lg) files, and I'm not familiar enough with how they are structured to proceed any further. Any help sorting this out would be appreciated.
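The .lg files are Python pickles (data/line_graph.py calls `pickle.load` in both tracebacks), and the crash happens inside scikit-learn's `BinaryTree.__setstate__`, which points to a mismatch between the scikit-learn that wrote the files and the one reading them. A hedged way to see which modules a file references, without unpickling it and therefore without triggering the crash, is to walk its opcodes with the standard library's `pickletools.genops`. The function name below is hypothetical, and the memo tracking is a simplification that covers the opcodes CPython's pickler normally emits:

```python
import pickletools

STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def referenced_globals(data):
    """List the (module, name) pairs a pickle would import, without loading it."""
    found = []
    strings = []  # every string pushed so far (needed for STACK_GLOBAL)
    memo = {}     # memo index -> memoized string, or None for other objects
    top = None    # the string on top of the stack, if the last push was one
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in ("GLOBAL", "INST"):
            # Protocols 0-3: the argument is "module name".
            module, attr = arg.split(" ", 1)
            found.append((module, attr))
            top = None
        elif op == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recent strings.
            found.append((strings[-2], strings[-1]))
            top = None
        elif op in STRING_OPS:
            strings.append(arg)
            top = arg
        elif op == "MEMOIZE":
            memo[len(memo)] = top
        elif op in PUT_OPS:
            memo[arg] = top
        elif op in GET_OPS:
            top = memo.get(arg)
            if top is not None:
                strings.append(top)
        elif op != "FRAME":
            top = None
    return found
```

For example, `referenced_globals(open("some_file.lg", "rb").read())` listing `("sklearn.neighbors._kd_tree", ...)` would indicate the file was written by scikit-learn 0.22 or later, while `sklearn.neighbors.kd_tree` would point to an older release.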

specify the dataset path in the train.sh script

Step 3: specify the dataset path in the train.sh script.

For example, I extracted the archive to the root of the project (/indoorDist).

In train.sh:

python main.py \
--exp-name line_weighted_wo_focal_junc --backbone resnet50 \
--backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}' \
--dim-embedding 256 --junction-pooling-threshold 0.2 \
--junc-pooling-size 64 --attention-sigma 1.5 --block-inference-size 16 \
--data-root /data/path --junc-sigma 3 \
--batch-size 16 --gpus 0,1,2,3 --num-workers 10 --resume-epoch latest \
--is-train-junc True --is-train-adj True \
--vis-junc-th 0.1 --vis-line-th 0.1 \
- train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 5. \
- train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- end

Could you please tell me where in this script I have to set the path?

parameter setting

Thanks for your work. I ran into a problem during the training stage.
At line 118 of graph.py there is an error, "Calculated padded input size per channel: (7). Kernel size: (8). Kernel size can't be greater than actual input size". Could you please help me with that?
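The message is PyTorch reporting that a convolution received an input whose size along some spatial dimension, after padding, is 7, while the kernel spans 8. Which dimension that is at graph.py line 118 is not visible from the issue, so the sketch below is generic: it applies the per-dimension output-size formula from the torch.nn.Conv1d/Conv2d documentation and fails in the same situation. The function name is hypothetical:

```python
def conv_output_length(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one dimension (PyTorch formula).

    L_out = floor((L_in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    """
    padded = size + 2 * padding
    effective_kernel = dilation * (kernel - 1) + 1
    if effective_kernel > padded:
        # The situation behind "Kernel size can't be greater than actual input size".
        raise ValueError(
            "padded input {} is smaller than kernel {}".format(padded, effective_kernel)
        )
    return (padded - effective_kernel) // stride + 1
```

With the numbers from the error, `conv_output_length(7, 8)` raises, while an input of 8 gives exactly one output position. Tracing which parameter in train.sh determines that size of 7 would show what needs to change.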

Trained Model

Thanks for your work.

Unfortunately, I don't have sufficient resources to train the model myself.

Do you plan to release a trained checkpoint anytime soon? It would also help to provide some instructions for running the trained model on test images.

Dataset

Could the dataset you built yourselves, which you describe in the paper, be released?

too many "find 0 jucntions" in the log message

find 0 jucntions.
find 0 jucntions.
epoch: [1][1382/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4219, loss_heatmap: 0.1099, loss_adj_mtx: 0.0624
find 0 jucntions.
find 0 jucntions.
epoch: [1][1383/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4001, loss_heatmap: 0.1532, loss_adj_mtx: 0.0494
find 0 jucntions.
find 0 jucntions.
epoch: [1][1384/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4026, loss_heatmap: 0.1498, loss_adj_mtx: 0.0506
find 0 jucntions.
find 0 jucntions.
epoch: [1][1385/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4419, loss_heatmap: 0.1236, loss_adj_mtx: 0.0637
find 0 jucntions.
find 0 jucntions.
epoch: [1][1386/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.5633, loss_heatmap: 0.1879, loss_adj_mtx: 0.0751
find 0 jucntions.
find 0 jucntions.
epoch: [1][1387/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.3670, loss_heatmap: 0.1545, loss_adj_mtx: 0.0425
find 0 jucntions.
find 0 jucntions.
epoch: [1][1388/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4504, loss_heatmap: 0.1039, loss_adj_mtx: 0.0693
find 0 jucntions.
find 0 jucntions.
epoch: [1][1389/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.4568, loss_heatmap: 0.0956, loss_adj_mtx: 0.0722
find 0 jucntions.
find 0 jucntions.
epoch: [1][1390/2500], lr: 0.2, time_total: 1.87, time_data: 0.01, time_net: 1.70, time_vis: 0.16, loss: 0.3710, loss_heatmap: 0.1334, loss_adj_mtx: 0.0475

Is this training properly? If not, what should I do?
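The thread does not say whether this pattern is healthy, but it can at least be quantified. A minimal sketch, assuming the log format shown above (the regex and function name are hypothetical), that counts the "find 0 jucntions." messages (spelled as the script prints them) and averages the reported losses, so different stretches of training can be compared:

```python
import re

# Matches the per-iteration summary printed during training, e.g.
# "epoch: [1][1382/2500], lr: 0.2, ..., loss: 0.4219, loss_heatmap: 0.1099, loss_adj_mtx: 0.0624"
ITER_RE = re.compile(
    r"epoch: \[(\d+)\]\[(\d+)/(\d+)\].*?loss: ([\d.]+), "
    r"loss_heatmap: ([\d.]+), loss_adj_mtx: ([\d.]+)"
)
ZERO_JUNC = "find 0 jucntions."  # spelled exactly as the training script prints it


def summarize(log_text):
    """Count iterations and zero-junction messages; average the three losses."""
    rows = [[float(v) for v in m.groups()[3:]] for m in ITER_RE.finditer(log_text)]
    summary = {"iterations": len(rows), "zero_junction_msgs": log_text.count(ZERO_JUNC)}
    if rows:
        for i, key in enumerate(("loss", "loss_heatmap", "loss_adj_mtx")):
            summary["mean_" + key] = sum(r[i] for r in rows) / len(rows)
    return summary
```

If the ratio of zero-junction messages to iterations stays flat while loss_heatmap stops decreasing over later epochs, that would be a more concrete symptom to report than the raw log.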

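Result visualization

The visualizations of the test results are very poor, with large contaminated areas. The contamination mainly shows up as values in the R, G and B channels reaching 255. What could be causing this?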

RuntimeError: CUDA error: out of memory

I am trying to reproduce the results with four GTX 1080 Ti GPUs, and I reduced the batch size to 4.
The detailed configuration:
python main.py \
--exp-name line_weighted_wo_focal_junc --backbone resnet50 \
--backbone-kwargs '{"encoder_weights": "ckpt/backbone/encoder_epoch_20.pth", "decoder_weights": "ckpt/backbone/decoder_epoch_20.pth"}' \
--dim-embedding 256 --junction-pooling-threshold 0.2 \
--junc-pooling-size 64 --attention-sigma 1.5 --block-inference-size 4 \
--data-root "data/indoorDist" --junc-sigma 3 \
--batch-size 4 --gpus 0,1,2,3 --num-workers 10 --resume-epoch latest \
--is-train-junc True --is-train-adj True \
--vis-junc-th 0.1 --vis-line-th 0.1 \
- train --end-epoch 9 --solver SGD --lr 0.2 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 5. \
- train --end-epoch 15 --solver SGD --lr 0.02 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- train --end-epoch 30 --solver SGD --lr 0.002 --weight-decay 5e-4 --lambda-heatmap 1. --lambda-adj 10. \
- end

Traceback (most recent call last):
  File "main.py", line 522, in <module>
    fire.Fire(LSDTrainer)
  File ".local/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File ".local/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File ".local/lib/python3.6/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "main.py", line 359, in train
    self._train_epoch()
  File "main.py", line 227, in _train_epoch
    img, heatmap_gt, adj_mtx_gt, self.lambda_heatmap, self.lambda_adj, junctions_gt
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "LSD/PPGNet/models/lsd.py", line 291, in forward
    adj_matrix_pred, loss_adj = block_adj_infer(feat_adj)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "LSD/PPGNet/models/common.py", line 281, in forward
    output = submodule(output)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "LSD/PPGNet/models/lsd.py", line 70, in forward
    line_feat = self.line_pool(feat, junc_st, junc_ed)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "LSD/PPGNet/models/graph.py", line 86, in forward
    output = F.grid_sample(feat[int(bs)].view(1, ch, h, w).expand(num_st, ch, h, w), sample_grid)
  File "envs/r0.3.0/lib/python3.6/site-packages/torch/nn/functional.py", line 2717, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum)
RuntimeError: CUDA out of memory. Tried to allocate 44.00 MiB (GPU 0; 10.91 GiB total capacity; 1.53 GiB already allocated; 15.38 MiB free; 94.03 MiB cached)
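The allocation fails in `F.grid_sample` at graph.py line 86. Per the PyTorch documentation, for a 4-D input of shape (N, C, H_in, W_in) and a grid of shape (N, H_out, W_out, 2), grid_sample returns a tensor of shape (N, C, H_out, W_out); here N is `num_st`, because the feature map is expanded once per segment start. A back-of-the-envelope sketch of that output's size (hypothetical function name, float32 assumed) shows how it scales with the number of segments. Separately, the error reports only 1.53 GiB allocated and 94.03 MiB cached by PyTorch on a 10.91 GiB card with 15.38 MiB free, which suggests most of GPU 0's memory may be held by something else, such as another process; `nvidia-smi` would show whether that is the case.

```python
def grid_sample_output_mib(n, channels, h_out, w_out, bytes_per_elem=4):
    """Size in MiB of a 4-D grid_sample output of shape (N, C, H_out, W_out).

    Counts only the output tensor (float32 by default); the input, the
    sampling grid and anything kept for the backward pass come on top.
    """
    return n * channels * h_out * w_out * bytes_per_elem / 2**20
```

For illustrative numbers only, `grid_sample_output_mib(2, 256, 64, 64)` is 8.0 MiB, and the size grows linearly with N, so doubling the number of segment starts processed at once doubles it.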

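Why does the code I downloaded raise an error when I run it? Asking the author and everyone who has it working for help

Hello, I followed the steps to run train.sh, but it immediately raised an error: File "main.py", line 98
if resume_epoch and os.path.isfile(os.path.join("ckpt", exp_name, f"train_states_{resume_epoch}.pth")):
^
SyntaxError: invalid syntax
Is there a wrong symbol or space somewhere near this line? I haven't changed the code since downloading it, and I couldn't find anything wrong. How can I fix this?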
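The line in the error contains an f-string (`f"train_states_{resume_epoch}.pth"`), and f-strings (PEP 498) were added in Python 3.6; an older interpreter, for example a `python` that resolves to Python 2 or 3.5, rejects the whole file with a SyntaxError before any code runs. The other tracebacks on this page come from python3.6 environments. Because main.py cannot even be compiled on an older interpreter, a check inside it would never run; a minimal sketch of a separate pre-flight script (hypothetical, not part of the repository) that avoids f-strings itself:

```python
import sys

MIN_VERSION = (3, 6)  # f-strings (PEP 498) first appeared in Python 3.6


def supports_fstrings(version=None):
    """True if the given (major, minor) version can parse f-string literals."""
    version = tuple(version or sys.version_info[:2])
    return version >= MIN_VERSION


if __name__ == "__main__":
    if not supports_fstrings():
        sys.exit("main.py needs Python >= 3.6; found {}.{}".format(*sys.version_info[:2]))
    print("Python version OK")
```

Running it with the same interpreter that train.sh invokes (the `python` in `python main.py`) shows whether that interpreter is the problem; `python --version` gives the same answer more directly.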

WARNING:root:NaN or Inf found in input tensor.

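Hello, and thank you for your work. During training I got the warning "WARNING:root:NaN or Inf found in input tensor.", and after training the results are very poor. Because my GPU is not powerful enough, I followed your suggestion and made only the following changes:
--batch-size 1 --block-inference-size 32. What could be causing the poor results? I look forward to your reply.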
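That warning appears to be emitted by tensorboardX (2.5.1 in a pip list on this page) when a value passed to it for logging contains NaN or Inf, so it likely means a logged loss diverged at some point. A minimal sketch for finding the first iteration where that happens, assuming the loss names printed in the training log; the function is hypothetical and would be called with plain floats, e.g. from `loss.item()`, before the values are written to TensorBoard:

```python
import math


def assert_finite_losses(losses, step):
    """Raise as soon as any loss value is NaN or +/-Inf.

    ``losses`` maps names such as "loss", "loss_heatmap" and "loss_adj_mtx"
    (as printed in the training log) to plain Python floats.
    """
    bad = {name: value for name, value in losses.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError("non-finite loss at step {}: {}".format(step, bad))
```

Stopping at the first non-finite value makes it possible to inspect that batch and the learning rate in effect. Note that the training schedule quoted on this page starts at --lr 0.2 with --batch-size 16; whether that rate should be changed for --batch-size 1 is not addressed in the thread.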
