lancopku / superae

Code for "Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization"

Python 51.91% Perl 48.09%

superae's Introduction

Citation

If you use this code for your research, please cite the paper this code is based on: Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization:

@inproceedings{Ma2016superAE,
  title   = {Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization},
  author  = {Shuming Ma and Xu Sun and Junyang Lin and Houfeng Wang},
  booktitle = {{ACL} 2018},
  year      = {2018}
}

superae's Issues

KeyError: 'unexpected key "decoder_s2s.embedding.weight" in state_dict'

Traceback (most recent call last):
File "predict.py", line 177, in <module>
model.load_state_dict(checkpoints['model'])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 522, in load_state_dict
.format(name))
KeyError: 'unexpected key "decoder_s2s.embedding.weight" in state_dict'
This error occurs when loading the model while running predict. What is the cause?
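
One possible cause, offered as an assumption rather than a confirmed diagnosis: the checkpoint was saved from a model whose configuration differs from the one `predict.py` builds, so the checkpoint carries parameters (here `decoder_s2s.embedding.weight`) that the freshly built model does not define. A minimal sketch for diagnosing this compares the two sets of keys. `split_state_dict` is a hypothetical helper, and plain dicts stand in for the mappings returned by `model.state_dict()` and stored in `checkpoints['model']`.

```python
def split_state_dict(checkpoint_state, model_state):
    """Partition checkpoint entries by whether the model defines the key.

    Hypothetical helper: both arguments are mappings from parameter name
    to value, like those returned by `model.state_dict()`.
    """
    matched = {k: v for k, v in checkpoint_state.items() if k in model_state}
    unexpected = sorted(k for k in checkpoint_state if k not in model_state)
    missing = sorted(k for k in model_state if k not in checkpoint_state)
    return matched, unexpected, missing


# Toy stand-ins; parameter names other than the one in the error are made up.
checkpoint_state = {
    "encoder.embedding.weight": [0.1, 0.2],
    "decoder_s2s.embedding.weight": [0.3, 0.4],
}
model_state = {
    "encoder.embedding.weight": [0.0, 0.0],
    "decoder.embedding.weight": [0.0, 0.0],
}

matched, unexpected, missing = split_state_dict(checkpoint_state, model_state)
print("unexpected:", unexpected)
print("missing:", missing)
```

If both `unexpected` and `missing` are non-empty, the model built at prediction time probably does not match the training configuration. Rebuilding it with the training config is then likely a better fix than dropping keys.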

I can't find the "data" module in the source code.

I notice that the "data" module is imported in many places, but I can't find it anywhere. This confuses me a lot. Could you please help me resolve this?

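Hello, could you provide the complete code? Many thanks

I found this code through your paper and wanted to run it, but the code appears to be incomplete. Could you provide the complete code? Sending it by email to [email protected] would also work. Many thanks!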

RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1

/home/TengWei/SAE/models/attention.py:32: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
weights = self.softmax(weights) # batch * time
Traceback (most recent call last):
File "train.py", line 345, in <module>
main()
File "train.py", line 334, in main
train(i)
File "train.py", line 194, in train
loss, num_total, num_correct = model.train_model(src, src_len, tgt, tgt_len, opt.loss, updates, optim, num_oovs=num_oovs)
File "/home/TengWei/SAE/models/seq2seq.py", line 85, in train_model
loss, num_total, num_correct = self.compute_loss(outputs, targets, loss_fn, updates)
File "/home/TengWei/SAE/models/seq2seq.py", line 61, in compute_loss
return models.cross_entropy_loss(hidden_outputs, self.decoder, targets, self.criterion, self.config)
File "/home/TengWei/SAE/models/loss.py", line 190, in cross_entropy_loss
num_correct = pred.data.eq(targets.data).masked_select(targets.ne(dict.PAD).data).sum()
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1
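
The failing line compares `pred` with `targets` element by element, so both must share the same layout. A minimal sketch of the intended count, assuming (the source does not confirm this) that the two hold the same tokens in different shapes, flattens both before comparing. Plain nested lists stand in for tensors. `count_correct`, `flatten`, and `PAD = 0` are hypothetical.

```python
PAD = 0  # assumed padding index; the repository uses dict.PAD


def flatten(nested):
    """Flatten an arbitrarily nested list of ints into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def count_correct(pred, targets, pad=PAD):
    """Count positions where pred equals targets, ignoring padding targets."""
    flat_pred, flat_tgt = flatten(pred), flatten(targets)
    if len(flat_pred) != len(flat_tgt):
        raise ValueError(
            "pred has %d elements but targets has %d"
            % (len(flat_pred), len(flat_tgt))
        )
    return sum(1 for p, t in zip(flat_pred, flat_tgt) if t != pad and p == t)


# pred is already flat, targets is still time x batch (2 x 3).
pred = [5, 7, 9, 5, 0, 2]
targets = [[5, 7, 8], [5, 0, 0]]
print(count_correct(pred, targets))
```

In PyTorch the analogous step would be reshaping both tensors to one dimension (for example with `contiguous().view(-1)`) before calling `eq`. If the element counts differ, the mismatch lies upstream, for example in batching, and flattening will not help.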

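Cannot reproduce the SAE results

seq2seq results

SAE results

Question

Although I did not follow the settings in the paper exactly, the two comparison experiments at least use the same training set, test set, and configuration file. The seq2seq results are roughly in line with the paper, so why are the SAE results so much worse?

Model files

Hello, could you provide your trained model files?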

hello

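I would like to know the final format of the data.

The generated summaries contain only <unk>

Hello, I would like to ask why many of the summaries generated by your code contain only a single <unk>.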

Hello, how should ROUGE be used for Chinese text? What processing is needed?
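
A commonly used workaround, not confirmed as the authors' procedure, is to split Chinese text into characters and map each character to an integer ID before writing the reference and candidate files, since the tokenization in the ROUGE-1.5.5 Perl script is designed for English text. The sketch below shows that mapping. `build_char_ids` and `encode` are hypothetical names, and the two sentences are made-up examples.

```python
def build_char_ids(texts):
    """Assign a stable integer ID to every distinct character in texts."""
    char_to_id = {}
    for text in texts:
        for ch in text:
            if not ch.isspace() and ch not in char_to_id:
                char_to_id[ch] = len(char_to_id) + 1
    return char_to_id


def encode(text, char_to_id):
    """Render text as space-separated character IDs."""
    return " ".join(str(char_to_id[ch]) for ch in text if not ch.isspace())


reference = "今天天气很好"
candidate = "今天天气不错"

# Build one shared mapping so equal characters get equal IDs in both files.
char_to_id = build_char_ids([reference, candidate])
print(encode(reference, char_to_id))
print(encode(candidate, char_to_id))
```

The important design point is that reference and candidate share one mapping. Separate mappings would assign different IDs to the same character and make every overlap count zero.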

What's the version of pytorch that you use?

I have trouble running this code. At first I could not train; after modifying the code, I am able to train the network. But now I have trouble in eval_rouge that I can't figure out. I think the PyTorch version makes a difference. Could you clarify the requirements in the README?
I'm using Python 3.6 with PyTorch 0.4.0.

hello, I would like to know the final format of the data

The final data format after preprocessing

ValueError: max() arg is an empty sequence

Traceback (most recent call last):
File "train.py", line 343, in <module>
main()
File "train.py", line 337, in main
logging("Best %s score: %.2f\n" % (metric, max(scores[metric])))
ValueError: max() arg is an empty sequence

When I use Part II as the training set and Part III as the test set, this error appears. How can I solve it?
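
The traceback shows `max()` being called on an empty list, which suggests (as an assumption) that no evaluation ran before training finished, for example because the run ended before the first evaluation interval. A minimal guard is sketched below. `best_score` and `format_best` are hypothetical helpers, and the `scores` dict mirrors the name in the traceback.

```python
def best_score(scores, metric):
    """Return the best recorded score for metric, or None if none exist."""
    values = scores.get(metric, [])
    return max(values) if values else None


def format_best(scores, metric):
    """Build the log line without raising when no evaluation has run."""
    best = best_score(scores, metric)
    if best is None:
        return "No %s scores recorded; evaluation never ran.\n" % metric
    return "Best %s score: %.2f\n" % (metric, best)


print(format_best({"rouge": []}, "rouge"), end="")
print(format_best({"rouge": [31.2, 33.85]}, "rouge"), end="")
```

The guard only hides the crash. If the message about missing scores appears, it is worth checking that the evaluation interval is smaller than the total number of updates for the dataset in use.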

[email protected]> Hello, could you share this dataset with me?

Could you give me an email address?

Originally posted by @3401mm in #4 (comment)

datainput

How do I feed data into the code? I can't find the path.

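Hello, a question about the vocab used for the word embeddings

Which vocab is used to convert Chinese text into numeric IDs? I have finished training on your group's training set and want to build my own test set, but I don't know which vocab file corresponds to it.

About the output of preprocess.py

Hello, is the current version of preprocess.py written for the LCSTS2.0 dataset?
(The LCSTS2.0 data files contain many <> tags, but I did not see any step that removes them.)

I would like to understand how you get from LCSTS2.0 to lcsts.low.share.train.pt. Thank you!
(I ask because preprocessing other datasets produces output that raises errors at run time.)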

Illegal division by zero at data/script/ROUGE-1.5.5.pl line 2450

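Hello, I am using the lcsts2.0 dataset. Following an earlier issue, I put the source texts and summaries into the src and tgt files respectively, but ROUGE fails when it runs. Other sources suggest this is caused by Chinese characters, yet this paper trains a Chinese summarization model.
Also, every file in the candidate folder contains only <unk>. Is that normal, or did I get one of the steps wrong?

The log is as follows: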
/mnt/extend_sdb/workspace/superAE/superAE/models/attention.py:32: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
weights = self.softmax(weights) # batch * time
/mnt/extend_sdb/workspace/superAE/superAE/models/loss.py:190: UserWarning: self and other not broadcastable, but have the same number of elements. Falling back to deprecated pointwise behavior.
num_correct = pred.data.eq(targets.data).masked_select(targets.ne(dict.PAD).data).sum()
/mnt/extend_sdb/workspace/superAE/superAE/models/loss.py:190: UserWarning: self and mask not broadcastable, but have the same number of elements. Falling back to deprecated pointwise behavior.
num_correct = pred.data.eq(targets.data).masked_select(targets.ne(dict.PAD).data).sum()
[======================================== 10000/10000 ================================>] Step: 3s712ms | Tot: 6h2m
epoch: 1, ppl: 1.006, time: 21772.570, updates: 10000, accuracy: 95.07
evaluating after 10000 updates...
/mnt/extend_sdb/workspace/superAE/superAE/models/seq2seq.py:169: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
output = unbottle(self.log_softmax(output))
Illegal division by zero at data/script/ROUGE-1.5.5.pl line 2450........................] Step: 410ms | Tot: 0ms
F_measure: [0.0, 0.0, 0.0] Recall: [0.0, 0.0, 0.0] Precision: [0.0, 0.0, 0.0]

Environment:
Python 3.5, PyTorch 0.3.1, CUDA 9.0

Raw data processing

Hello, the raw LCSTS data contains many tags such as <>. How do I process it into data files like train.src/valid.src?
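
A minimal sketch of one way to do this, assuming each record in the raw file wraps the summary in `<summary>...</summary>` and the source text in `<short_text>...</short_text>` (check this against your copy of LCSTS). The function names, the character-level tokenization, and the sample record are illustrative and are not taken from the repository's `preprocess.py`.

```python
import re

# Assumed tag layout of a raw LCSTS record; verify against the actual files.
RECORD_RE = re.compile(
    r"<summary>(.*?)</summary>\s*<short_text>(.*?)</short_text>", re.S
)


def extract_pairs(raw):
    """Return (source, summary) pairs with tags and surrounding space removed."""
    return [(src.strip(), tgt.strip()) for tgt, src in RECORD_RE.findall(raw)]


def to_chars(text):
    """Split text into space-separated characters, dropping whitespace."""
    return " ".join(ch for ch in text if not ch.isspace())


raw = """<doc id=0>
    <summary>
        示例摘要
    </summary>
    <short_text>
        这是一段示例正文。
    </short_text>
</doc>"""

for src, tgt in extract_pairs(raw):
    print(to_chars(src))  # one line per example for a .src file
    print(to_chars(tgt))  # matching line for a .tgt file
```

Writing the first printed line of each pair to the source file and the second to the target file keeps the two files aligned line by line.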
