pochih / rl-chatbot

417 stars · 23 watchers · 140 forks · 52.68 MB

🤖 Deep Reinforcement Learning Chatbot

License: MIT License

Shell 1.05% Python 98.95%
deep-learning reinforcement-learning seq2seq-model chatbot tensorflow nlp

rl-chatbot's People

Contributors

pochih

rl-chatbot's Issues

Chatbot is learning poorly

I tried training the seq2seq model from scratch using Python 3, but when I tested it after running the first 10 epochs, all the responses were just ".". Will this improve if I train it more?

=== Use model ./model/Seq2Seq/model-1 ===

question => Have you heard about 'machine learning and having it deep and structured'?
generated_sentence => .
question => Machine learning.
generated_sentence => .
question => I don't know. Maybe we should watch the tape to be sure.
generated_sentence => .
question => Listen man, I don't need this shit.
generated_sentence => .
question => Will you stand up for me?
generated_sentence => .
question => How do you trun this on?
generated_sentence => .
question => Thank God it's Friday!
generated_sentence => .
question => I'm sure a lot of people down in L.A. are worried sick about you.
generated_sentence => .
question => I forgot to get the Coca-Cola.
generated_sentence => .
question => How about you graduation thesis?
generated_sentence => .

Unable to load pretrained model

Use default model

Traceback (most recent call last):
  File "python/test.py", line 138, in <module>
    test()
  File "python/test.py", line 72, in test
    saver.restore(sess, default_model_path)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [7634,1000] rhs shape= [6851,1000]
  [[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@wemb"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Wemb, save/RestoreV2_4)]]
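
For what it's worth, a shape mismatch on Wemb ([7634, 1000] in the graph vs [6851, 1000] in the checkpoint) usually means the locally built vocabulary has a different size from the one the pretrained checkpoint was saved with. A minimal diagnostic sketch, assuming TensorFlow 1.x and that the model graph has already been built (the checkpoint path is just an example):

    import tensorflow as tf

    # Hypothetical sketch: compare graph variable shapes with the checkpoint
    # before calling saver.restore(); a mismatch on Wemb points at a different
    # vocabulary size than the one used to train the checkpoint.
    ckpt_path = "./model/Seq2Seq/model-1"  # example path
    reader = tf.train.NewCheckpointReader(ckpt_path)
    ckpt_shapes = reader.get_variable_to_shape_map()

    for var in tf.global_variables():
        name = var.name.split(":")[0]
        if name in ckpt_shapes and ckpt_shapes[name] != var.get_shape().as_list():
            print("mismatch:", name, var.get_shape().as_list(), "vs", ckpt_shapes[name])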

wrong output

Hey, thanks for the code. I used your pre-trained model, but I'm getting absurd responses. Should I train it for more epochs?
A: Have you heard about 'machine learning and having it deep and structured'?
B: Woods path woods bee sir takes nowhere thompson.
A: Reminds 'a name's working rebels secrets guts procedure fairy missus pain warned procedure ignorant troops wrap famous pain dna wheel troops.
B: Tongue hooked ignorant tongue tongue manner ignorant positively pain warned break real travel sir.
A: Pain las tricks putting ears shack warm ignorant positively arrives brand woods knew jersey domino wrap.
B: Reminds whores pain warned river rio brand.
A: Mob longer. Procedure woody luther wasted tricks specialty assumed window pain scientist.
B: Wigand much shift mud traitor woody stab brand submit submit touches tanks department woody feed pain middle wrap ignorant unlikely.
A: House real policeman term.
B: Mob slaves knee. Meters bat assumed martin woods tongue pit.
A: Jesse electrical reports domino real.
B: Redi whores real. Redi whores real. Desmond real.

pip install -r requirements.txt is failing - Could not find a version that satisfies the requirement tensorflow==1.0.1

I am getting an error when I try to run
pip install -r requirements.txt
It says:
Could not find a version that satisfies the requirement tensorflow==1.0.1 (from -r requirements.txt (line 1)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1)
No matching distribution found for tensorflow==1.0.1 (from -r requirements.txt (line 1))
I have cloned the recent version of tensorflow. Since I have Windows 10, here is also my Python version:
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Please let me know how to run pip install -r requirements.txt.
(Attachment: Issue.txt)

Retrained the seq2seq model, but test.py gives the following results

Hello, I retrained the seq2seq model and then ran test.py, but got the following results:
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Concentration concentration concentration concentration planning priest.
Concentration concentration concentration concentration planning concentration planning priest.
Do you know what might be causing this?

Where is the RL environment?

Hi,

Thanks for this code repo.
I have one question: which environment are you using for RL?

Thanks
Mahesh

Procedure

Can you upload a file that explains the step-by-step procedure to run this?

A question about ease of answering

Thanks for your sharing.

For the ease-of-answering reward in RL, is the reward calculated by the RL model itself rather than by another model?

Why not feed the action into another pretrained model to obtain the response, and measure the likelihood it assigns to a dull response?
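
For context, in the "Deep Reinforcement Learning for Dialogue Generation" paper the ease-of-answering reward is the negative, length-normalized log-likelihood that a pretrained seq2seq model assigns to a fixed list of dull responses given the generated action. A rough sketch of that formula, where seq2seq_log_prob is a hypothetical function returning log P(target | source) under a pretrained model:

    # Hypothetical sketch of the ease-of-answering reward from Li et al. (2016).
    # seq2seq_log_prob(source, target) is assumed to return log P(target | source)
    # under a pretrained seq2seq model, not under the RL policy being trained.
    DULL_RESPONSES = ["I don't know what you're talking about.", "I don't know."]

    def ease_of_answering_reward(action, seq2seq_log_prob):
        total = 0.0
        for s in DULL_RESPONSES:
            total += seq2seq_log_prob(action, s) / len(s.split())  # length-normalized
        return -total / len(DULL_RESPONSES)  # higher when dull replies are unlikely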

How to use the saved_model API to store the model?

Hi, I wonder whether you have tried saving the chatbot with the saved_model API? Do you have any ideas?
Hello, I'd like to know whether you have tried using saved_model to save the model. I want to load the chatbot model in TensorFlow Serving, but it can only load model files saved in the saved_model format.
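
For reference, a TF 1.x checkpoint can usually be re-exported in the SavedModel format that TensorFlow Serving expects. A minimal sketch, assuming a TF 1.x version that provides tf.saved_model.simple_save; the tensor names below are placeholders, not the chatbot graph's actual ones:

    import tensorflow as tf

    # Hypothetical sketch: restore a TF 1.x checkpoint and export it as a
    # SavedModel for TensorFlow Serving. Input/output tensor names are assumed.
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph("./model/Seq2Seq/model-1.meta")  # example path
        saver.restore(sess, "./model/Seq2Seq/model-1")
        graph = tf.get_default_graph()
        word_ids = graph.get_tensor_by_name("word_ids:0")    # assumed input tensor
        generated = graph.get_tensor_by_name("generated:0")  # assumed output tensor
        tf.saved_model.simple_save(sess, "./export/1",
                                   inputs={"word_ids": word_ids},
                                   outputs={"generated": generated})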

Have you encountered exploding rewards?

I only used ease of answering as the reward, but as training progresses this term keeps decreasing from about -2.x toward negative infinity. I didn't apply a sigmoid, but it's still strange, because the original author didn't add a sigmoid either.
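
Not this repo's approach, but a common stabilization trick when a reward term drifts like this is to normalize the rewards within each batch before the policy-gradient update, so the gradient scale stays bounded:

    import numpy as np

    # Hypothetical sketch of per-batch reward normalization (a generic trick,
    # not something count_rewards() in this repo does).
    def normalize_rewards(rewards, eps=1e-8):
        rewards = np.asarray(rewards, dtype=np.float32)
        return (rewards - rewards.mean()) / (rewards.std() + eps)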

How to train the reversed model for the RL model

You mentioned: "When training with policy gradient (pg), you may need a reversed model; the reversed model is also trained by cornell movie-dialogs dataset, but with source and target reversed."
Apart from downloading the pre-trained reversed model, could you please explain how to train it?

Thank you very much.
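
Going by the quoted description, training the reversed model should amount to running the same seq2seq training on the same corpus with every (source, target) pair swapped. A rough data-preparation sketch (the pair format is illustrative, not the repo's actual preprocessing):

    # Hypothetical sketch: build training pairs for the reversed model by
    # swapping source and target in each Cornell movie-dialogs exchange, so the
    # reversed model learns P(source | target).
    def reversed_pairs(pairs):
        """pairs: iterable of (source_utterance, target_utterance) tuples."""
        for source, target in pairs:
            yield target, source

    forward = [("How are you?", "Fine, thanks."), ("Fine, thanks.", "Good to hear.")]
    backward = list(reversed_pairs(forward))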

Why the sigmoid in count_rewards()

Hi,

In python/RL/train.py, after the ease-of-answering and semantic-coherence rewards are added up, the sigmoid of the total reward is scaled by 1.1:

    total_loss = sigmoid(total_loss) * 1.1

What is the purpose of the sigmoid and of the 1.1 scaling on line 261?

Also, I noticed you don't weight each reward by a lambda as in the "Deep Reinforcement Learning for Dialogue Generation" paper. Was this intentional?

Thanks!
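
For comparison, the paper combines its three rewards as a fixed weighted sum rather than a sigmoid; if I recall the reported weights correctly, they are 0.25, 0.25 and 0.5. A small sketch of that weighting:

    # Sketch of the lambda-weighted reward from Li et al. (2016): ease of
    # answering (r1), information flow (r2) and semantic coherence (r3).
    # The weights are the ones reported in the paper, to the best of my reading;
    # this repo instead computes sigmoid(total_loss) * 1.1.
    LAMBDA1, LAMBDA2, LAMBDA3 = 0.25, 0.25, 0.5

    def combined_reward(r1, r2, r3):
        return LAMBDA1 * r1 + LAMBDA2 * r2 + LAMBDA3 * r3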

RL training

Hi,
Thanks for having shared your implementation of the RL chatbot.
I may ask stupid questions since I am an expert neither in RL nor in NLP, so apologies in advance!
1- In python/RL/train.py
At line 307, saver.restore(sess, os.path.join(model_path, model_name)) seems to initialize the model weights with pretrained parameters, correct? Are these the weights of the seq2seq model trained in the usual supervised way? I can't find the 'model-55' you are using for this anywhere... Am I missing something?

2- In python/RL/rl_model.py
Why are there both build_model and build_generator? They seem to have the same setup but not the same output. Is this RL-specific?

3- In the paper
Also, the paper specifies that the reward is computed with a seq2seq model and not with the RL model. Is this taken into account in your code?

Thanks a lot for your answers!

Did you share the parameters of the LSTM between the encoder and decoder?

In the python/model.py script, in the decoding stage:

    with tf.variable_scope("LSTM1"):
        output1, state1 = self.lstm1(padding, state1)

    with tf.variable_scope("LSTM2"):
        output2, state2 = self.lstm2(tf.concat([current_embed, output1], 1), state2)

which is the same as in the encoding stage. Do you share the same parameters between the encoder and the decoder?
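
As background, in TF 1.x graph code like this, parameter sharing is determined by the cell objects and the variable-scope names: calling the same cell object under the same scope name (with reuse enabled on later calls) reuses one set of weights, while different scope names create independent variables. A tiny sketch of the mechanism, assuming a TF 1.x version where tf.nn.rnn_cell is available (not the repo's code):

    import tensorflow as tf

    # Hypothetical sketch: the same BasicLSTMCell called twice inside the same
    # variable scope ends up with a single set of kernel/bias variables.
    cell = tf.nn.rnn_cell.BasicLSTMCell(1000)
    x = tf.zeros([1, 1000])
    state = cell.zero_state(1, tf.float32)

    with tf.variable_scope("LSTM1"):
        out1, state = cell(x, state)              # creates LSTM1/... variables
    with tf.variable_scope("LSTM1", reuse=True):
        out2, state = cell(x, state)              # reuses the same variables

    print([v.name for v in tf.trainable_variables()])  # one LSTM1 kernel and bias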

TensorFlow 2.x

I would like to ask whether you have tried porting this to TensorFlow 2.x.
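
Not a real port, but TF 1.x graph code can often be run on a TensorFlow 2.x install through the v1 compatibility layer, provided nothing from tf.contrib is used:

    # Minimal sketch: run TF 1.x-style code under TensorFlow 2.x via the
    # compatibility shim (a stopgap, not an actual TF2 rewrite).
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

    # From here on, tf.placeholder, tf.Session, tf.train.Saver, etc. behave as in TF 1.x.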

Dialogue History

How is the dialogue history encoded here? In the paper they say "The previous two dialogue turns are transformed to a vector representation by feeding the concatenation of them into an LSTM encoder model".

I'm not sure how to interpret this and I'm interested in how it's realized here.

Thanks
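
One common reading of that sentence, sketched below, is that the two previous utterances are simply concatenated into a single token sequence, run through the LSTM encoder, and the encoder's final state is used as the dialogue-state vector. This is an interpretation of the paper, not necessarily what this repo does:

    # Hypothetical sketch: concatenate the previous two turns into one token
    # sequence; the resulting ids would then be fed to the LSTM encoder, whose
    # final state conditions the decoder.
    def build_encoder_input(prev_turn, last_turn, word2idx, unk="<unk>"):
        tokens = (prev_turn + " " + last_turn).split()
        return [word2idx.get(t, word2idx[unk]) for t in tokens]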

unable to run ./scripts/train_RL.sh

Traceback (most recent call last):
  File "python/RL/train.py", line 470, in <module>
    train()
  File "python/RL/train.py", line 297, in train
    train_op, loss, input_tensors, inter_value = model.build_model()
  File "/home/ubuntu/AI_studio/Lakshmi/RL-Chatbot/python/RL/rl_model.py", line 92, in build_model
    train_op = tf.train.AdamOptimizer(self.lr).minimize(pg_loss)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 413, in minimize
    name=name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 597, in apply_gradients
    self._create_slots(var_list)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 131, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 1155, in _zeros_slot
    new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 190, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 164, in create_slot_with_initializer
    dtype)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 74, in _create_slot_var
    validate_shape=validate_shape)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1496, in get_variable
    aggregation=aggregation)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1239, in get_variable
    aggregation=aggregation)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 562, in get_variable
    aggregation=aggregation)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 514, in _true_getter
    aggregation=aggregation)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 882, in _get_single_variable
    "reuse=tf.AUTO_REUSE in VarScope?" % name)
ValueError: Variable Wemb/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

Please provide info on how to solve this error.
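
For what it's worth, the error message itself points at the usual cause: the variable scope is already in reuse mode when Adam tries to create its slot variables (Wemb/Adam). A hedged workaround sketch for python/RL/rl_model.py, using the tf.AUTO_REUSE flag the message mentions so get_variable may create variables that do not exist yet:

    # Hypothetical patch sketch around line 92 of rl_model.py: build the train
    # op in a scope that allows creating the optimizer's slot variables.
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        train_op = tf.train.AdamOptimizer(self.lr).minimize(pg_loss)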
