A toy project for PyTorch beginners with the simplest possible code.
Python 3.7, PyTorch 1.0.0+
PyTorch tutorials for beginners
File: 05-Recurrent Neural Network/recurrent_network.py
For example, line 86:
if i % 300 == 0:
    print('[{}/{}] Loss: {:.6f}, Acc: {:.6f}'.format(
        epoch + 1, num_epoches, running_loss / (batch_size * i),
        running_acc / (batch_size * i)))
Wrong value: running_loss / (batch_size * i) evaluates to zero
Cause: running_loss is of int type
Fix: running_loss.double()
When I run the code of project 04, it raises an error:
AttributeError: 'Cnn' object has no attribute 'named_parameters'
Is there no example data for project 07? I found that the files under ./data/ do not exist, so it cannot run.
Also, many of the problems in this project come from code written for an old PyTorch version (not PyTorch 1.0 as the README claims). I hope the author can find time to fix these :)
Logistic_Regression.py and neural_network.py share the same small floating-point bug:
eval_loss = 0
eval_acc = 0
This needs one of two fixes:
1) Option one:
eval_loss = 0.0
eval_acc = 0.0
2) Option two:
Add `from __future__ import division` as the first line of the source file.
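The zero results come from Python 2 integer division: when the accumulator starts as an int, `/` truncates toward zero (Python 3's `/`, and Python 2 with the `__future__` import, is true division). A minimal pure-Python sketch of the difference:

```python
# Integer accumulator, as in the original code.
int_acc = 0
int_acc += 7
print(int_acc // 2)   # floor division truncates: 3 (Python 2's "/" behaved like this for ints)
print(int_acc / 2)    # true division keeps the fraction: 3.5

# Float accumulator: safe under both Python 2 and Python 3.
float_acc = 0.0
float_acc += 7
print(float_acc / 2)  # 3.5
```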
Chapter 02, logistic regression code:
Line 65 reports a warning:
UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
This causes an error at line 74: the value of running_acc / (batch_size * i) will always be 0, because it divides a tensor by a number, which is no longer supported in the latest torch version.
One solution is to modify line 65 to use tensor.item():
running_acc += num_correct.item()
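A minimal sketch of the `.item()` fix, assuming a recent PyTorch where `num_correct` is a 0-dim tensor:

```python
import torch

pred = torch.tensor([1, 0, 2, 2])
label = torch.tensor([1, 0, 1, 2])
num_correct = (pred == label).sum()  # a 0-dim LongTensor
running_acc = 0.0
running_acc += num_correct.item()    # .item() converts it to a plain Python number
print(running_acc / len(label))      # 0.75
```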
img_transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
There is a problem here; following the transform used in other people's code, I changed it to the following and it runs:
img_transform= transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
A question: when training the RCNN, how should variable-length labels be handled?
@L1aoXingyu Hi, when training the discriminator of a GAN, shouldn't the generator be frozen? But the code never freezes the generator's parameters anywhere. Doesn't that mean the generator's parameters also get gradient updates while the discriminator is being trained?
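For reference, a common pattern: during the discriminator step only the discriminator's optimizer steps, so the generator's weights do not change even if gradients flow into them; detaching the fake batch additionally avoids computing generator gradients at all. A minimal sketch with toy stand-in modules (all names here are hypothetical, not the repo's code):

```python
import torch
import torch.nn as nn

G = nn.Linear(4, 4)                                # stand-in generator
D = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())   # stand-in discriminator
d_optimizer = torch.optim.SGD(D.parameters(), lr=0.1)
criterion = nn.BCELoss()

g_before = [p.clone() for p in G.parameters()]

z = torch.randn(8, 4)
fake = G(z).detach()                               # detach: no grads flow back into G
d_loss = criterion(D(fake), torch.zeros(8, 1))

d_optimizer.zero_grad()
d_loss.backward()
d_optimizer.step()                                 # only D's parameters are updated

# The generator is untouched by the discriminator step.
assert all(torch.equal(a, b) for a, b in zip(g_before, G.parameters()))
```

Even without `.detach()`, G would not move here, because only `d_optimizer.step()` runs; detaching just skips the wasted gradient computation through the generator.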
It starts with a bit of randomness, and then after about 100 batches, it just spits out white images.
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py", line 95, in __getitem__
    img = self.transform(img)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py", line 175, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py", line 217, in normalize
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: output with shape [1, 28, 28] doesn't match the broadcast shape [3, 28, 28]
This line here is assigned to a variable called BCE, but it uses MSE instead.
1. `from logger import Logger` raises a warning.
2. D:\Anaconda3\python.exe G:/pytorch/pytorch-beginner-master/04-Convolutional-Neural-Network/convolution_network.py
epoch 1
Traceback (most recent call last):
  File "G:/pytorch/pytorch-beginner-master/04-Convolutional-Neural-Network/convolution_network.py", line 78, in <module>
    running_loss += loss.data[0] * label.size(0)
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
Process finished with exit code 1
There is something wrong in the normalization part of the dataset. Also, 'save_image' does not accept the 'illegal' input. These bugs should be fixed.
Need to change .data[0] to .item().
Add model.train() at the beginning of the training loop.
Only the training loop needs to be modified; below is the fixed code that worked for me :)
for epoch in range(num_epoches):
    model.train()
    print('epoch {}'.format(epoch + 1))
    print('*' * 10)
    running_loss = 0.0
    running_acc = 0.0
    for i, data in enumerate(train_loader, 1):
        img, label = data
        b, c, h, w = img.size()
        assert c == 1, 'channel must be 1'
        img = img.squeeze(1)
        # img = img.view(b*h, w)
        # img = torch.transpose(img, 1, 0)
        # img = img.contiguous().view(w, b, -1)
        if use_gpu:
            img = Variable(img).cuda()
            label = Variable(label).cuda()
        else:
            img = Variable(img)
            label = Variable(label)
        # forward pass
        out = model(img)
        loss = criterion(out, label)
        running_loss += loss.item() * label.size(0)
        _, pred = torch.max(out, 1)
        num_correct = (pred == label).sum()
        running_acc += num_correct.item()
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 300 == 0:
            print('[{}/{}] Loss: {:.6f}, Acc: {:.6f}'.format(
                epoch + 1, num_epoches, running_loss / (batch_size * i),
                running_acc / (batch_size * i)))
    print('Finish {} epoch, Loss: {:.6f}, Acc: {:.6f}'.format(
        epoch + 1, running_loss / (len(train_dataset)),
        running_acc / (len(train_dataset))))
    model.eval()
    eval_loss = 0.
    eval_acc = 0.
    for data in test_loader:
        img, label = data
        b, c, h, w = img.size()
        assert c == 1, 'channel must be 1'
        img = img.squeeze(1)
        # img = img.view(b*h, w)
        # img = torch.transpose(img, 1, 0)
        # img = img.contiguous().view(w, b, h)
        if use_gpu:
            img = Variable(img, volatile=True).cuda()
            label = Variable(label, volatile=True).cuda()
        else:
            img = Variable(img, volatile=True)
            label = Variable(label, volatile=True)
        out = model(img)
        loss = criterion(out, label)
        eval_loss += loss.item() * label.size(0)
        _, pred = torch.max(out, 1)
        num_correct = (pred == label).sum()
        eval_acc += num_correct.item()
    print('Test Loss: {:.6f}, Acc: {:.6f}'.format(
        eval_loss / (len(test_dataset)), eval_acc / (len(test_dataset))))
    print()
In 02 Logistic Regression, all accuracies are zero. Need to convert torch tensor to float.
line 68: running_acc += num_correct.data[0] --> running_acc += num_correct.data[0].float()
line 98: eval_acc += num_correct.data[0] --> eval_acc += num_correct.data[0].float()
Recurrent network can't work. Information is as follows:
"D:\Program Files (x86)\Anaconda3\python.exe" F:/py/pytorch_LSTM.py
Traceback (most recent call last):
  File "F:/py/pytorch_LSTM.py", line 47, in <module>
    model = model.cuda()
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 118, in _apply
    module._apply(fn)
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py", line 116, in _apply
    self.flatten_parameters()
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py", line 95, in flatten_parameters
    fn.rnn_desc = rnn.init_rnn_descriptor(fn, handle)
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\backends\cudnn\rnn.py", line 54, in init_rnn_descriptor
    fn.datatype
  File "D:\Program Files (x86)\Anaconda3\lib\site-packages\torch\backends\cudnn\__init__.py", line 229, in __init__
    if version() >= 6000:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
02 is not logistic regression; it is still linear regression. Logistic regression requires an activation function. Please take a closer look.
When I run recurrent_network.py in pytorch-beginner-master\05-Recurrent Neural Network, an error occurs:
Traceback (most recent call last):
  File "recurrent_network.py", line 83, in <module>
    loss.backward()
  File "C:\Users\yuanz\Miniconda3\envs\py36\lib\site-packages\torch\tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\yuanz\Miniconda3\envs\py36\lib\site-packages\torch\autograd\__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cudnn RNN backward can only be called in training mode
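That error means `loss.backward()` was called while the model was in eval mode; cuDNN's RNN backward requires training mode, so calling `model.train()` before the training step fixes it. A minimal CPU-safe sketch of the mode flag (no cuDNN needed to illustrate it):

```python
import torch.nn as nn

# A toy LSTM standing in for the tutorial's model.
model = nn.LSTM(input_size=28, hidden_size=64, num_layers=1)

model.eval()
print(model.training)  # False: backward through a cuDNN RNN would fail here on GPU

model.train()          # switch back before computing gradients
print(model.training)  # True: backward is allowed
```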
Hello, line 34 of 02-Logistic Regression/Logistic_Regression.py outputs the linear result directly. Does that mean this is merely a linear model rather than logistic regression? Shouldn't logistic regression have an activation function, something like relu or sigmoid?
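One point worth noting: if the script trains with `nn.CrossEntropyLoss`, the softmax is applied inside the loss (it combines `LogSoftmax` and `NLLLoss`), so a model that outputs raw logits from the linear layer is still multinomial logistic (softmax) regression. A sketch of the equivalence:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(5, 10)            # raw linear outputs, no activation applied
target = torch.randint(0, 10, (5,))

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))         # True: CrossEntropyLoss = LogSoftmax + NLLLoss
```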
model = Cnn(1, 10)
Shouldn't this be Cnn(3, 10) for the three RGB channels?
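MNIST images are grayscale, so `in_channels=1` is correct here; `Cnn(3, 10)` would only apply to RGB input. A quick shape check, assuming the first layer is an `nn.Conv2d`:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
mnist_batch = torch.zeros(4, 1, 28, 28)  # MNIST batches are N x 1 x 28 x 28
print(conv(mnist_batch).shape)           # torch.Size([4, 16, 28, 28])
```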
In the autoencoder example, the values printed, i.e.
print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.data[0]))
Is this the error associated with the final batch of the data only?
In the readme of chapter 08, how was the encoder figure generated? And what do the values -1.5 to 1.5 in the figure mean? Thanks!
Why does the GPU not speed up the computation even though I used .cuda() to move the model and Variables to the GPU?
The model in chapter 2 should be a multilayer perceptron instead of logistic regression, since logistic regression can only be used for binary classification problems. How about changing the model to softmax regression, which works for multi-class problems? Thanks.
If I set batch_size=32 it returns "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)",
but when I set batch_size=10 (<12) the model works normally.
Does anyone else have the same problem?
Using PyTorch installed via Anaconda:
cudatoolkit: 8.0-3
cudnn: 7.0.5-cuda8.0_0
pytorch: 0.3.0-py35cuda8.0cudnn7.0_0
epoch 1
Traceback (most recent call last):
  File "convolution_network.py", line 76, in <module>
    out = model(img)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "convolution_network.py", line 48, in forward
    out = self.conv(x)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/home/yjx/.conda/envs/pytorch/lib/python3.5/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
CUDA TEST
import torch
x = torch.Tensor([1.0])
xx = x.cuda()
print(xx)
CUDNN TEST
from torch.backends import cudnn
print(cudnn.is_acceptable(xx))
Both tests report that CUDA and cuDNN are available and working.
Your file produces an error:
RuntimeError: output with shape [1, 28, 28] doesn't match the broadcast shape [3, 28, 28]
It is probably due to the grayscale images that are downloaded automatically.
Hi, I think there are a few mistakes in the simple and convolutional autoencoders: