649453932 / Chinese-Text-Classification-Pytorch
Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN and Transformer, based on PyTorch, ready to use out of the box.
License: MIT License
Hi guys, I really appreciate the algorithms you have provided.
Could you please use utf-8 encoding everywhere?
E.g., lines 16-17 of FastText.py should be the following:
self.class_list = [x.strip() for x in open(
    dataset + '/data/class.txt', encoding='utf-8').readlines()]
Otherwise I get the following error:
dataset + '/data/class.txt').readlines()]  # list of class names
File "C:\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 51: character maps to <undefined>
Could you please add it to each algorithm?
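A minimal sketch of the same fix in context-manager form, which could be applied wherever the repo opens a text file (assuming the identical two-line pattern in each model's Config):

import os  # only needed if you also build paths here

with open(dataset + '/data/class.txt', encoding='utf-8') as f:
    self.class_list = [x.strip() for x in f.readlines()]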
In the multi-head attention, the view() applied after computing the attention output seems to get the ordering wrong.
I tried it myself, assuming h=3, batch=1, sentence length 4, and word-vector dimension 5.
a = torch.randn(3,4,5)
a
tensor([[[-0.1241, 0.0364, 1.2337, -0.5907, 0.8305],
[-0.0610, -0.9682, 0.7830, 1.5998, -0.6637],
[ 0.1863, -1.2179, 0.0710, 0.6962, -0.0442],
[ 0.0584, -0.5964, 0.8453, -1.3244, -0.0499]],
[[ 2.7228, 0.6973, -1.2440, 1.8854, 2.3017],
[-0.1034, -1.7281, -1.1495, -0.2478, -0.8541],
[-0.2823, -0.3416, -1.3749, 0.2995, -0.1860],
[-1.1601, 0.9876, 0.2881, -1.8866, -1.3901]],
[[-1.1265, 1.2683, -0.7065, 0.0946, 0.3501],
[-0.1266, 1.2834, -1.2694, 1.1730, -0.3443],
[ 1.4679, 2.1238, 0.2405, -0.4388, 0.8566],
[ 1.8933, 0.4461, 2.2419, 0.6118, -1.5001]]])
a.view(1, -1, 15)
tensor([[[-0.1241, 0.0364, 1.2337, -0.5907, 0.8305, -0.0610, -0.9682,
0.7830, 1.5998, -0.6637, 0.1863, -1.2179, 0.0710, 0.6962,
-0.0442],
[ 0.0584, -0.5964, 0.8453, -1.3244, -0.0499, 2.7228, 0.6973,
-1.2440, 1.8854, 2.3017, -0.1034, -1.7281, -1.1495, -0.2478,
-0.8541],
[-0.2823, -0.3416, -1.3749, 0.2995, -0.1860, -1.1601, 0.9876,
0.2881, -1.8866, -1.3901, -1.1265, 1.2683, -0.7065, 0.0946,
0.3501],
[-0.1266, 1.2834, -1.2694, 1.1730, -0.3443, 1.4679, 2.1238,
0.2405, -0.4388, 0.8566, 1.8933, 0.4461, 2.2419, 0.6118,
-1.5001]]])
As you can see, view() just stitches the values together in memory order; it does not actually concatenate the heads.
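A minimal sketch of the concatenation the comment is after, assuming the h heads are stacked along dim 0 as in the example above: expose the head axis, move it next to the feature axis, and only then flatten.

import torch

h, batch, seq_len, d = 3, 1, 4, 5
a = torch.randn(h * batch, seq_len, d)        # heads stacked along dim 0, as above

# a.view(batch, -1, h * d) only reinterprets memory and mixes positions across heads;
# to concatenate the heads per position, reorder the axes before flattening:
concat = (a.view(h, batch, seq_len, d)        # [h, batch, seq_len, d]
            .permute(1, 2, 0, 3)              # [batch, seq_len, h, d]
            .contiguous()
            .view(batch, seq_len, h * d))     # [batch, seq_len, h*d]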
When DatasetIterater is constructed in utils.py, the check if len(batches) % self.n_batches != 0 raises an error because self.n_batches is 0. Is this related to the amount of data? My own task has fairly little training data.
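If I read this right, self.n_batches = len(batches) // batch_size comes out as 0 when the dataset holds fewer samples than one batch, so taking len(batches) % self.n_batches divides by zero; lowering batch_size below the dataset size avoids it. A sketch of the constructor with the remainder taken against batch_size instead (my rewrite, reusing the field names that appear in the iterator code quoted further below):

self.batch_size = batch_size
self.batches = batches
self.index = 0
self.n_batches = len(batches) // batch_size
# take the remainder against batch_size (not n_batches); this also avoids the
# division by zero when the dataset is smaller than a single batch, leaving
# n_batches = 0 plus one residue batch for the iterator to return
self.residue = len(batches) % batch_size != 0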
Hello, is the Sogou word-vector npz file the raw file, or does it need further processing? I would like to know the file format.
I am now working with English data and using Stanford's GloVe vectors, which come as a txt file; how should I adapt things?
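The .npz files used by the repo are trimmed embedding matrices rather than the raw vector files; utils.py builds them by looking each vocabulary entry up in a plain-text vector file (one token per line, the token followed by its values), which is also the format of a GloVe .txt file. A rough sketch of building such a matrix from GloVe, assuming the vocab pickle and the 'embeddings' key used elsewhere in the repo (the GloVe path is a placeholder):

import pickle as pkl
import numpy as np

emb_dim = 300                                             # must match the GloVe file you downloaded
vocab = pkl.load(open('THUCNews/data/vocab.pkl', 'rb'))
embeddings = np.random.rand(len(vocab), emb_dim)          # random init for words GloVe doesn't cover

with open('glove.6B.300d.txt', encoding='utf-8') as f:    # hypothetical GloVe file name
    for line in f:
        parts = line.rstrip().split(' ')
        word, values = parts[0], parts[1:]
        if word in vocab:
            embeddings[vocab[word]] = np.asarray(values, dtype='float32')

np.savez_compressed('THUCNews/data/embedding_glove', embeddings=embeddings)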
I hit this error when running utils.py; what is the sgns.sogou.char file supposed to be?
I noticed you use the Sogou and Tencent embedding sets; is there an equivalent for English?
Thanks a lot for sharing!
When I ran the short texts that come with the code, my results matched yours. I then used the model on long texts, still the THUCNews dataset, using the data from https://github.com/gaussic/text-classification-cnn-rnn, download link: https://pan.baidu.com/s/1hugrfRu password: qfud
Setting pad_size to 600 or 1000 doesn't help either; the log is as follows:
Loading data...
50000it [00:27, 1849.42it/s]
5000it [00:02, 1879.22it/s]
10000it [00:05, 1667.15it/s]
Time usage: 0:00:36
<bound method Module.parameters of Model(
(embedding): Embedding(6206, 300)
(convs): ModuleList(
(0): Conv2d(1, 256, kernel_size=(2, 300), stride=(1, 1))
(1): Conv2d(1, 256, kernel_size=(3, 300), stride=(1, 1))
(2): Conv2d(1, 256, kernel_size=(4, 300), stride=(1, 1))
)
(dropout): Dropout(p=0.5)
(fc): Linear(in_features=768, out_features=10, bias=True)
)>
Epoch [1/20]
Iter: 0, Train Loss: 2.2, Train Acc: 17.97%, Val Loss: 3.5, Val Acc: 10.00%, Time: 0:00:03 *
Iter: 100, Train Loss: 1.4e-06, Train Acc: 100.00%, Val Loss: 2.3e+01, Val Acc: 10.00%, Time: 0:00:41
Iter: 200, Train Loss: 1.2e+01, Train Acc: 0.00%, Val Loss: 2.1e+01, Val Acc: 10.00%, Time: 0:01:19
Iter: 300, Train Loss: 1.3e-06, Train Acc: 100.00%, Val Loss: 3.3e+01, Val Acc: 10.00%, Time: 0:01:58
Epoch [2/20]
Iter: 400, Train Loss: 1.8, Train Acc: 45.31%, Val Loss: 1.2e+01, Val Acc: 14.48%, Time: 0:02:35
Iter: 500, Train Loss: 2.4e-05, Train Acc: 100.00%, Val Loss: 2.2e+01, Val Acc: 10.00%, Time: 0:03:13
Iter: 600, Train Loss: 1.0, Train Acc: 68.75%, Val Loss: 3.2, Val Acc: 15.80%, Time: 0:03:51 *
Iter: 700, Train Loss: 0.06, Train Acc: 98.44%, Val Loss: 8.1, Val Acc: 10.00%, Time: 0:04:29
Epoch [3/20]
Iter: 800, Train Loss: 0.0094, Train Acc: 100.00%, Val Loss: 1.6e+01, Val Acc: 10.00%, Time: 0:05:06
Iter: 900, Train Loss: 2.1e+01, Train Acc: 0.00%, Val Loss: 2.3e+01, Val Acc: 10.00%, Time: 0:05:45
Iter: 1000, Train Loss: 0.25, Train Acc: 99.22%, Val Loss: 3.5, Val Acc: 11.32%, Time: 0:06:23
Iter: 1100, Train Loss: 3.9, Train Acc: 0.00%, Val Loss: 3.9, Val Acc: 13.44%, Time: 0:07:01
Epoch [4/20]
Iter: 1200, Train Loss: 0.0073, Train Acc: 100.00%, Val Loss: 1.1e+01, Val Acc: 11.70%, Time: 0:07:38
Iter: 1300, Train Loss: 0.47, Train Acc: 85.94%, Val Loss: 7.9, Val Acc: 18.34%, Time: 0:08:17
Iter: 1400, Train Loss: 0.063, Train Acc: 100.00%, Val Loss: 4.9, Val Acc: 12.82%, Time: 0:08:55
Iter: 1500, Train Loss: 0.47, Train Acc: 89.84%, Val Loss: 3.1, Val Acc: 23.12%, Time: 0:09:33 *
Epoch [5/20]
Iter: 1600, Train Loss: 0.0037, Train Acc: 100.00%, Val Loss: 7.4, Val Acc: 15.36%, Time: 0:10:10
Iter: 1700, Train Loss: 0.027, Train Acc: 100.00%, Val Loss: 1.3e+01, Val Acc: 10.34%, Time: 0:10:49
Iter: 1800, Train Loss: 3.7, Train Acc: 3.91%, Val Loss: 5.2, Val Acc: 11.98%, Time: 0:11:27
Iter: 1900, Train Loss: 0.1, Train Acc: 98.44%, Val Loss: 4.0, Val Acc: 24.80%, Time: 0:12:05
Epoch [6/20]
Iter: 2000, Train Loss: 0.72, Train Acc: 76.56%, Val Loss: 4.2, Val Acc: 28.34%, Time: 0:12:42
Iter: 2100, Train Loss: 0.034, Train Acc: 99.22%, Val Loss: 8.1, Val Acc: 13.46%, Time: 0:13:20
Iter: 2200, Train Loss: 0.4, Train Acc: 92.97%, Val Loss: 4.2, Val Acc: 33.14%, Time: 0:13:59
Iter: 2300, Train Loss: 0.2, Train Acc: 96.09%, Val Loss: 4.6, Val Acc: 22.86%, Time: 0:14:36
Epoch [7/20]
Iter: 2400, Train Loss: 0.16, Train Acc: 96.88%, Val Loss: 3.7, Val Acc: 35.28%, Time: 0:15:13
Iter: 2500, Train Loss: 0.014, Train Acc: 99.22%, Val Loss: 8.2, Val Acc: 11.10%, Time: 0:15:52
No optimization for a long time, auto-stopping...
/usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
Test Loss: 3.3, Test Acc: 23.61%
Precision, Recall and F1-Score...
precision recall f1-score support
a 0.8361 0.0510 0.0961 1000
b 1.0000 0.0140 0.0276 1000
c 1.0000 0.0020 0.0040 1000
d 0.0000 0.0000 0.0000 1000
e 0.0000 0.0000 0.0000 1000
f 0.1711 0.7450 0.2783 1000
g 0.8728 0.5490 0.6740 1000
h 0.0791 0.0510 0.0620 1000
i 0.2523 0.9490 0.3986 1000
j 0.0000 0.0000 0.0000 1000
avg / total 0.4211 0.2361 0.1541 10000
Confusion Matrix...
[[ 51 0 0 0 0 325 0 0 91 533]
[ 3 14 0 0 0 627 1 159 196 0]
[ 1 0 2 0 0 182 3 415 397 0]
[ 2 0 0 0 0 458 56 16 468 0]
[ 3 0 0 0 0 866 4 4 123 0]
[ 0 0 0 0 0 745 12 0 243 0]
[ 1 0 0 0 0 55 549 0 395 0]
[ 0 0 0 0 0 891 0 51 58 0]
[ 0 0 0 0 0 50 1 0 949 0]
[ 0 0 0 0 0 155 3 0 842 0]]
Time usage: 0:00:06
When the model runs on long-text data the loss fluctuates too much and it never really trains. Why is that, and how should I change things? Thanks!
I am currently using Flask, but I don't know how to load the model in Flask and expose it as a service. Please advise, thanks.
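For what it's worth, a bare-bones sketch of wrapping a trained model in a Flask endpoint; load_model and predict_one are hypothetical helpers you would write around the repo's existing preprocessing and Model classes, nothing like this ships with the repo:

from flask import Flask, jsonify, request

app = Flask(__name__)
model = load_model()          # hypothetical: build Model(config) and load_state_dict(...)

@app.route('/predict', methods=['POST'])
def predict():
    text = request.json.get('text', '')
    label = predict_one(model, text)    # hypothetical: tokenize -> pad -> tensor -> argmax
    return jsonify({'label': label})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)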
if pad_size:
    if len(token) < pad_size:
        token.extend([vocab.get(PAD)] * (pad_size - len(token)))
    else:
        token = token[:pad_size]
        seq_len = pad_size
# word to id
for word in token:
    words_line.append(vocab.get(word, vocab.get(UNK)))
contents.append((words_line, int(label), seq_len))
Here, vocab.get(word, vocab.get(UNK)) is applied to the PAD ids inserted by the padding above; those ids are not keys in the vocabulary, so they all end up as the UNK id.
use_word is misspelled as ues_word.
In the load_dataset function in utils.py, line 56:
token.extend([vocab.get(PAD)] * (pad_size - len(token)))
This pads token with the id of PAD, but the code below then maps words to ids:
for word in token:
    words_line.append(vocab.get(word, vocab.get(UNK)))
That way, because PAD's id is not in the vocabulary, every PAD becomes the UNK id, right? So I think line 56 should instead be:
token.extend([PAD] * (pad_size - len(token)))
TypeError: Parameter to MergeFrom() must be instance of same class: expected Summary got Summary. for field Event.summary
First, one detail: if I first run with --word False (the default, i.e. char level), the corresponding char vocab is generated. If I then switch to --word True to run at the word level, a word-level vocab is not generated again, because this code checks whether vocab.pkl already exists and simply loads it if so. You therefore have to delete vocab.pkl manually first.
if os.path.exists(config.vocab_path):
    vocab = pkl.load(open(config.vocab_path, 'rb'))
else:
    vocab = build_vocab(config.train_path, tokenizer=tokenizer, max_size=MAX_VOCAB_SIZE, min_freq=1)
    pkl.dump(vocab, open(config.vocab_path, 'wb'))
Also, when running with --word True, I found that the Chinese word segmentation on the test set is not very good; many entries are whole phrases. I may need to run a segmentation library over the data myself (see the sketch after the vocabulary excerpt below).
{'': 0, 'ThinkPad': 1, 'LG': 2, '2011': 3, 'CJ': 4, '明日股市三大猜想及应对策略': 5, 'HTC': 6, '不派息': 7, '图文-火箭常规训练': 8, '2010': 9, '每日晚间实力机构点评热门个股精选': 10, 'E3': 11, 'IdeaPad': 12, '十大机构看后市': 13, '股海导航': 14, '盘面解读:八大机构预测今日市场走向': 15, 'iPhone': 16 ...
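If word-level vocab entries like these are the issue, one option the comment hints at is to plug a real segmenter into the tokenizer instead of splitting on spaces; a sketch using jieba (jieba is my assumption, it is not part of the repo):

import jieba

# replace the space-splitting word-level tokenizer with a real segmenter;
# the char-level tokenizer can stay as it is in the repo
tokenizer = lambda x: list(jieba.cut(x))        # word level
# tokenizer = lambda x: [y for y in x]          # char level, as in utils.py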
parser = argparse.ArgumentParser(description='Chinese Text Classification')
parser.add_argument('--model', type=str, required=True, help='TextRNN_Att')
parser.add_argument('--embedding', default='pre_trained', type=str, help='random ')
parser.add_argument('--word', default=False, type=bool, help='True for word, False for char')
args = parser.parse_args()
usage: run.py [-h] --model MODEL [--embedding EMBEDDING] [--word WORD]
run.py: error: the following arguments are required: --model
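For the record, this usage error just means run.py was started without the mandatory flag; naming one of the models makes it go away, e.g.:

python run.py --model TextCNN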
Hi,
Thank you for your great work.
I have trained FastText with your code. Now I want to fine-tune it for a binary text classification task. Does the pre-trained model support fine-tuning, and how can I do that?
Thanks!
Dropout should not be applied at test time, right?
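Right; in PyTorch this is handled by switching the module to evaluation mode before testing, which disables dropout (the evaluate step in train_eval.py should already do this, but for reference):

model.eval()               # dropout layers act as identity in eval mode
with torch.no_grad():      # optionally also skip gradient tracking for the test pass
    outputs = model(test_batch)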
Hello, when defining FastText's network structure, why do you concatenate the word, bigram, and trigram embeddings along the embedding dimension (expanding the 300-dimensional vectors to 900) instead of along the sentence dimension (expanding the length-32 sentence to 96)? I have just started learning this, so I don't quite understand; I hope you can explain.
def forward(self, x):
    out_word = self.embedding(x[0])
    out_bigram = self.embedding_ngram2(x[2])
    out_trigram = self.embedding_ngram3(x[3])
    out = torch.cat((out_word, out_bigram, out_trigram), -1)
    out = out.mean(dim=1)
The embedding file name and size at the link don't match the ones referenced here...
if pad_size:
    if len(token) < pad_size:
        token.extend([PAD] * (pad_size - len(token)))
    else:
        token = token[:pad_size]
        seq_len = pad_size
If the text is longer than pad_size, token = token[:pad_size] here; doesn't that mean everything after pad_size is simply not used? I don't quite understand this part.
Should it be >= 2?
What should I do if my own texts are not split into nine classes and the labels only cover three or four classes?
def biGramHash(sequence, t, buckets):
    t1 = sequence[t - 1] if t - 1 >= 0 else 0
    return (t1 * 14918087) % buckets

def triGramHash(sequence, t, buckets):
    t1 = sequence[t - 1] if t - 1 >= 0 else 0
    t2 = sequence[t - 2] if t - 2 >= 0 else 0
    return (t2 * 14918087 * 18408749 + t1 * 14918087) % buckets
Hello, a few questions:
1. buckets = self.n_gram_vocab = 250499; where does the value 250499 come from?
2. Where do the values 4918087, 14918087 and 18408749 come from?
3. I don't quite understand this way of computing the return value; could you give me a reference link to study from?
Thank you very much!
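For context (my reading, not the author's reply): this looks like the n-gram hashing trick from the fastText paper. 250499 is simply the configured number of n-gram buckets, and the large constants act as mixing multipliers so that different (t1, t2) pairs tend to land in different buckets before the modulo. A tiny worked example:

buckets = 250499                     # n_gram_vocab in the config; any sufficiently large value works
sequence = [101, 7, 52, 52]          # token ids of one padded sentence (made-up values)

t = 2
t1 = sequence[t - 1]                 # 7
t2 = sequence[t - 2]                 # 101
bigram_bucket = (t1 * 14918087) % buckets                               # bucket for the bigram ending at t
trigram_bucket = (t2 * 14918087 * 18408749 + t1 * 14918087) % buckets   # bucket for the trigram ending at t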
Loading data...
Vocab size: 4762
180000it [02:11, 1369.46it/s]
10000it [00:07, 1290.27it/s]
10000it [00:11, 862.36it/s]
Time usage: 0:02:31
<bound method Module.parameters of Model(
(embedding): Embedding(4762, 300)
(lstm): LSTM(300, 128, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
(tanh1): Tanh()
(tanh2): Tanh()
(fc1): Linear(in_features=256, out_features=64, bias=True)
(fc): Linear(in_features=64, out_features=10, bias=True)
)>
Traceback (most recent call last):
File "run.py", line 53, in
train(config, model, train_iter, dev_iter, test_iter)
File "/home/zgy/wll/Chinese-Text-Classification-Pytorch-master/train_eval.py", line 40, in train
writer = SummaryWriter(log_dir=config.log_path + '/' + time.strftime('%m-%d_%H.%M', time.localtime()))
AttributeError: 'Config' object has no attribute 'log_path'
Hello, your code has been a great help. About the Transformer I'd like to ask: a [batch_size, seq_len, embed_size] tensor is still [batch_size, seq_len, embed_size] after the Transformer encoder; can it then be summed over the second dimension to get [batch_size, embed_size] and used for the classification task?
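That pooling step is a common pattern; a minimal sketch of what the question describes (sum or mean over the sequence dimension followed by a linear head — my illustration, not necessarily how this repo's Transformer model finishes):

import torch
import torch.nn as nn

batch_size, seq_len, embed_size, num_classes = 128, 32, 300, 10
encoded = torch.randn(batch_size, seq_len, embed_size)   # stand-in for the encoder output

pooled = encoded.sum(dim=1)                # or encoded.mean(dim=1) -> [batch_size, embed_size]
classifier = nn.Linear(embed_size, num_classes)
logits = classifier(pooled)                # [batch_size, num_classes]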
def forward(self, x):
    x, _ = x
    out = self.embedding(x)  # [batch_size, seq_len, embedding] = [128, 32, 300]
    out, _ = self.lstm(out)
    out = self.fc(out[:, -1, :])  # hidden state at the sentence's last time step
    return out
Shouldn't out[:, -1, :] here be the output corresponding to the last position (word) of the sentence? The comment calls it the hidden state.
Would you consider adding Chinese implementations of RoBERTa and ALBERT?
I used a new dataset: vocabulary 5300, training set 750000, validation set 8000, test set 83599. The results are not good.
Epoch [1/20]
Iter: 0, Train Loss: 3.7, Train Acc: 0.00%, Val Loss: 3.1, Val Acc: 4.48%, Time: 0:01:14 *
Iter: 100, Train Loss: 2.7e-05, Train Acc: 100.00%, Val Loss: 2.3e+01, Val Acc: 4.46%, Time: 0:05:59
Iter: 200, Train Loss: 9.7e-08, Train Acc: 100.00%, Val Loss: 2.4e+01, Val Acc: 4.47%, Time: 0:10:55
Iter: 300, Train Loss: 4.6e-06, Train Acc: 100.00%, Val Loss: 3.6e+01, Val Acc: 0.96%, Time: 0:15:56
Iter: 400, Train Loss: 0.0001, Train Acc: 100.00%, Val Loss: 2.3e+01, Val Acc: 2.36%, Time: 0:20:57
Iter: 500, Train Loss: 0.0016, Train Acc: 100.00%, Val Loss: 9.8, Val Acc: 18.74%, Time: 0:26:16
Iter: 600, Train Loss: 0.0025, Train Acc: 100.00%, Val Loss: 1.1e+01, Val Acc: 18.76%, Time: 0:32:22
Iter: 700, Train Loss: 0.00082, Train Acc: 100.00%, Val Loss: 1.2e+01, Val Acc: 18.76%, Time: 0:38:37
Iter: 800, Train Loss: 0.00033, Train Acc: 100.00%, Val Loss: 1.4e+01, Val Acc: 18.75%, Time: 0:44:52
Iter: 900, Train Loss: 5e-06, Train Acc: 100.00%, Val Loss: 1.5e+01, Val Acc: 18.75%, Time: 0:51:07
Iter: 1000, Train Loss: 1e-06, Train Acc: 100.00%, Val Loss: 1.5e+01, Val Acc: 18.75%, Time: 0:57:40
Hi, I'd like to ask: after substituting my own data, running FastText gives the following error:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at c:\n\pytorch_1559129895673\work\aten\src\thnn\generic/ClassNLLCriterion.c:93
This error is usually caused by num_classes not matching the number of labels, or by labels not starting from 0. I noticed this and confirmed my data's labels match class.txt, but the error persists. How can I fix it?
Hi, a question:
In utils.py, main() takes the characters/words that appear in the training set and regenerates vocab.pkl and the corresponding embedding npz file. Characters/words that never appear in the training set therefore become UNK: they are not in vocab.pkl and have no pretrained embedding.
Why is this step necessary?
Without it, using all characters/words from the Sogou-news pretrained embeddings, the model can also be trained and used for prediction without problems.
1. Is restricting to the characters/words seen in training meant to improve accuracy, or is there another reason?
2. Also, vocab.pkl adds the extra tokens UNK and PAD, but the embedding npz file does not seem to contain embedding vectors at the ids of UNK and PAD?
Thank you.
@649453932
Something along these lines:
if __name__ == '__main__':
    cnn_model = CnnModel()
    test_demo = ['三星ST550以全新的拍摄方式超越了以往任何一款数码相机',
                 '热火vs骑士前瞻:皇帝回乡二番战 东部次席唾手可得新浪体育讯北京时间3月30日7:00']
    for i in test_demo:
        print(cnn_model.predict(i))
[zgy@localhost Chinese-Text-Classification-Pytorch-master]$ python run.py --model TextCNN
Traceback (most recent call last):
File "run.py", line 30, in
x = import_module('models.' + model_name)
File "/home/zgy/miniconda3/lib/python3.6/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 994, in _gcd_import
File "", line 971, in _find_and_load
File "", line 941, in _find_and_load_unlocked
File "", line 219, in _call_with_frames_removed
File "", line 994, in _gcd_import
File "", line 971, in _find_and_load
File "", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'models'
def __next__(self):
    if self.residue and self.index == self.n_batches:
        batches = self.batches[self.index * self.batch_size: len(self.batches)]
        self.index += 1
        batches = self._to_tensor(batches)
        return batches
    elif self.index > self.n_batches:
        self.index = 0
        raise StopIteration
    else:
        batches = self.batches[self.index * self.batch_size: (self.index + 1) * self.batch_size]
        self.index += 1
        batches = self._to_tensor(batches)
        return batches
Shouldn't this be elif self.index >= self.n_batches?
As in the title: following the author's workflow I can now train and save a model from the command line, but if I then want to predict on new data, how do I proceed?
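The repo doesn't ship a predict script, but a rough single-text predictor built on the saved weights and vocab could look like the sketch below. The '<PAD>'/'<UNK>' tokens, the config attributes and the (ids, seq_len) input format mirror utils.py and the models, but treat all of it as my assumptions; FastText would additionally need its bigram/trigram features.

import pickle as pkl
import torch

def predict(text, model, config, vocab, pad_size=32):
    # char-level tokenization, padding/truncation and id lookup, mirroring utils.load_dataset
    tokens = [ch for ch in text][:pad_size]
    seq_len = len(tokens)
    tokens += ['<PAD>'] * (pad_size - len(tokens))
    ids = [vocab.get(t, vocab.get('<UNK>')) for t in tokens]
    x = torch.LongTensor([ids]).to(config.device)
    lengths = torch.LongTensor([seq_len]).to(config.device)
    model.eval()
    with torch.no_grad():
        out = model((x, lengths))
    return config.class_list[int(out.argmax(dim=1))]

# usage sketch:
# vocab = pkl.load(open(config.vocab_path, 'rb'))
# model.load_state_dict(torch.load(config.save_path, map_location=config.device))
# print(predict('三星ST550以全新的拍摄方式超越了以往任何一款数码相机', model, config, vocab))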
Traceback (most recent call last):
File "/home/mli/.pyenv/versions/miniconda-latest/envs/li/lib/python3.7/site-packages/numpy/lib/npyio.py", line 460, in load
return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x0a'.
File "/home/mli/.pyenv/versions/miniconda-latest/envs/li/lib/python3.7/site-packages/numpy/lib/npyio.py", line 463, in load
"Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file 'THUCNews/data/embedding_SougouNews.npz' as a pickle
Could someone help explain?
I can't find the BiLSTM model; has it been removed?
Hello, is there an inference (forward) module available for use?
I ran it on my own dataset and the accuracy isn't great. I'd like to look at the misclassified examples in detail; does anyone have ideas on how to go about this?
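One way to do that is to collect predictions over the test iterator and line them up with the raw test file (a sketch under the assumptions that the test iterator is built without shuffling and that config exposes test_path and class_list):

import torch

model.eval()
preds, trues = [], []
with torch.no_grad():
    for texts, labels in test_iter:          # texts is the (ids, seq_len) pair the models expect
        out = model(texts)
        preds.extend(out.argmax(dim=1).tolist())
        trues.extend(labels.tolist())

lines = open(config.test_path, encoding='utf-8').read().splitlines()
for line, p, t in zip(lines, preds, trues):
    if p != t:
        print(config.class_list[t], '->', config.class_list[p], '|', line.split('\t')[0])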