
huggingface / transfer-learning-conv-ai


🦄 State-of-the-Art Conversational AI with Transfer Learning

License: MIT License

Python 97.68% Dockerfile 2.32%
nlp neural-networks chatbots deep-learning pytorch transfer-learning gpt gpt-2 dialog

transfer-learning-conv-ai's Introduction

🦄 Building a State-of-the-Art Conversational AI with Transfer Learning

The present repo contains the code accompanying the blog post 🦄 How to build a State-of-the-Art Conversational AI with Transfer Learning.

This is a clean and commented code base with training and testing scripts that can be used to train a dialog agent leveraging transfer learning from an OpenAI GPT or GPT-2 Transformer language model.

This codebase can be used to reproduce the results of HuggingFace's participation in the NeurIPS 2018 dialog competition ConvAI2, which was state-of-the-art on the automatic metrics. The 3k+ lines of competition code were distilled into about 250 lines of training code with distributed & FP16 options to form the present repository.

This model can be trained in about one hour on a cloud instance with 8 V100 GPUs (currently costs about $25), and a pre-trained model is also made available.

Installation

To install and use the training and inference scripts please clone the repo and install the requirements:

git clone https://github.com/huggingface/transfer-learning-conv-ai
cd transfer-learning-conv-ai
pip install -r requirements.txt
python -m spacy download en

Installation with Docker

To install using docker please build the self-contained image:

docker build -t convai .

Note: Make sure your Docker setup allocates enough memory for building the container. Building with the default of 1.75GB will fail due to the large PyTorch wheel.

You can then enter the image

ip-192-168-22-157:transfer-learning-conv-ai loretoparisi$ docker run --rm -it convai bash
root@91e241bb823e:/# ls
Dockerfile  README.md  boot                  dev  home         lib    media  models  proc              root  sbin  sys  train.py  utils.py
LICENCE     bin        convai_evaluation.py  etc  interact.py  lib64  mnt    opt     requirements.txt  run   srv   tmp  usr       var

You can then run the interact.py script on the pretrained model:

python3 interact.py --model models/

Pretrained model

We make a pretrained and fine-tuned model available on our S3 here. The easiest way to download and use this model is just to run the interact.py script to talk with the model. Without any argument, this script will automatically download and cache our model.

Using the training script

The training script can be used in single GPU or multi GPU settings:

python ./train.py  # Single GPU training
python -m torch.distributed.launch --nproc_per_node=8 ./train.py  # Training on 8 GPUs

The training script accepts several arguments to tweak the training:

Argument | Type | Default value | Description
dataset_path | str | "" | Path or url of the dataset. If empty download from S3.
dataset_cache | str | './dataset_cache.bin' | Path or url of the dataset cache
model | str | "openai-gpt" | Path, url or short name of the model
num_candidates | int | 2 | Number of candidates for training
max_history | int | 2 | Number of previous exchanges to keep in history
train_batch_size | int | 4 | Batch size for training
valid_batch_size | int | 4 | Batch size for validation
gradient_accumulation_steps | int | 8 | Accumulate gradients on several steps
lr | float | 6.25e-5 | Learning rate
lm_coef | float | 1.0 | LM loss coefficient
mc_coef | float | 1.0 | Multiple-choice loss coefficient
max_norm | float | 1.0 | Clipping gradient norm
n_epochs | int | 3 | Number of training epochs
personality_permutations | int | 1 | Number of permutations of personality sentences
device | str | "cuda" if torch.cuda.is_available() else "cpu" | Device (cuda or cpu)
fp16 | str | "" | Set to O0, O1, O2 or O3 for fp16 training (see apex documentation)
local_rank | int | -1 | Local rank for distributed training (-1: not distributed)
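
As a rough guide to how the batch-related arguments combine, here is a minimal sketch. It assumes the usual data-parallel convention that each process handles its own batch of train_batch_size examples and that the optimizer steps once every gradient_accumulation_steps batches; the helper name effective_batch_size is hypothetical and not part of the repository.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, n_gpus=1):
    """Number of training examples that contribute to one optimizer step.

    Assumes one process per GPU, each seeing its own batch (hypothetical helper).
    """
    return train_batch_size * gradient_accumulation_steps * n_gpus


# Table defaults on a single GPU: 4 examples x 8 accumulation steps
print(effective_batch_size(4, 8))            # 32

# --train_batch_size=2 --gradient_accumulation_steps=4 on 8 GPUs
print(effective_batch_size(2, 4, n_gpus=8))  # 64
```

When moving to fewer GPUs, raising gradient_accumulation_steps by the same factor keeps this product unchanged.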

Here is how to reproduce our results on a server with 8 V100 GPUs (adapt number of nodes and batch sizes to your configuration):

python -m torch.distributed.launch --nproc_per_node=8 ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=2 --valid_batch_size=2

This model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below).
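
For readers unfamiliar with these three metrics, the sketch below shows one common way to compute them. It is an illustration only: the function names are hypothetical, and the official ConvAI2 script may tokenize and normalize text differently (the README also reports Hits@1 and F1 as percentages, while these helpers return fractions).

```python
import math
from collections import Counter


def perplexity(token_nlls):
    """Exponential of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def hits_at_1(ranked_candidates, gold_replies):
    """Fraction of examples whose top-ranked candidate is the gold reply."""
    hits = sum(1 for ranked, gold in zip(ranked_candidates, gold_replies)
               if ranked[0] == gold)
    return hits / len(gold_replies)


def unigram_f1(prediction, reference):
    """Word-overlap F1 between a generated reply and the reference reply."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# 3 shared words, precision 3/4, recall 3/3
print(unigram_f1("i love iced tea", "i love tea"))  # 6/7, about 0.857
```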

These numbers are slightly lower than the numbers we obtained in the ConvAI2 competition. Here is what you can tweak to reach the same results:

  • In the ConvAI2 competition we also used tweaked position embeddings so that the history of the dialog always starts with the same embeddings. This is easy to add with pytorch-transformers and should improve the hits@1 metric.
  • In the ConvAI2 competition we used a beam search decoder. While the results are better in terms of the F1 metric, our feeling is that the human experience is less compelling with beam search than with the nucleus sampling decoder provided in the present repository.

Using the interaction script

The training script saves all the experiments and checkpoints in a sub-folder named with the timestamp of the experiment in the ./runs folder of the repository base folder.

You can then use the interactive script to interact with the model simply by pointing to this folder.

Here is an example command line to run the interactive script:

python ./interact.py --model_checkpoint ./data/Apr17_13-31-38_thunder/  # run the interactive script with a training checkpoint
python ./interact.py  # run the interactive script with the finetuned model on our S3

The fine-tuned model gives FINAL Hits@1: 0.715

The interactive script accepts a few arguments to tweak the decoding algorithm:

Argument | Type | Default value | Description
dataset_path | str | "" | Path or url of the dataset. If empty download from S3.
dataset_cache | str | './dataset_cache.bin' | Path or url of the dataset cache
model | str | "openai-gpt" | Path, url or short name of the model
max_history | int | 2 | Number of previous utterances to keep in history
device | str | cuda if torch.cuda.is_available() else cpu | Device (cuda or cpu)
no_sample | action store_true | | Set to use greedy decoding instead of sampling
max_length | int | 20 | Maximum length of the output utterances
min_length | int | 1 | Minimum length of the output utterances
seed | int | 42 | Seed
temperature | float | 0.7 | Sampling softmax temperature
top_k | int | 0 | Filter top-k tokens before sampling (<=0: no filtering)
top_p | float | 0.9 | Nucleus filtering (top-p) before sampling (<=0.0: no filtering)
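
To make the temperature, top_k and top_p arguments concrete, here is a small pure-Python sketch of temperature scaling followed by top-k and nucleus (top-p) filtering. It is not the repository's implementation (which operates on tensors); the function names are hypothetical and the logic follows the standard definition of nucleus sampling, keeping the smallest set of most likely tokens whose cumulative probability reaches top_p.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=0.0):
    """Return {token_index: renormalized probability} for the tokens kept."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely tokens
    if top_p > 0.0:
        kept, cumulative = [], 0.0
        for index in order:
            kept.append(index)
            cumulative += probs[index]
            if cumulative >= top_p:  # the nucleus is complete
                break
        order = kept
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}


def sample_token(logits, temperature=0.7, top_k=0, top_p=0.9, rng=random):
    """Sample one token index after temperature scaling and filtering."""
    kept = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    tokens = list(kept)
    return rng.choices(tokens, weights=[kept[t] for t in tokens], k=1)[0]


# With top_p=0.9 the least likely token (index 3) is filtered out.
print(sorted(filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_p=0.9)))  # [0, 1, 2]
```

Lower temperatures sharpen the distribution before filtering, so fewer tokens are usually needed to reach the top_p mass.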

Running ConvAI2 evaluation scripts

To run the evaluation scripts of the ConvAI2 challenge, you first need to install ParlAI in the repo base folder like this:

git clone https://github.com/facebookresearch/ParlAI.git
cd ParlAI
python setup.py develop

You can then run the evaluation script from ParlAI base folder:

cd ParlAI
python ../convai_evaluation.py --eval_type hits@1  # to download and evaluate our fine-tuned model on hits@1 metric
python ../convai_evaluation.py --eval_type hits@1  --model_checkpoint ./data/Apr17_13-31-38_thunder/  # to evaluate a training checkpoint on hits@1 metric

The evaluation script accepts a few arguments to select the evaluation metric and tweak the decoding algorithm:

Argument | Type | Default value | Description
eval_type | str | "hits@1" | Evaluate the model on hits@1, ppl or f1 metric on the ConvAI2 validation dataset
model | str | "openai-gpt" | Path, url or short name of the model
max_history | int | 2 | Number of previous utterances to keep in history
device | str | cuda if torch.cuda.is_available() else cpu | Device (cuda or cpu)
no_sample | action store_true | | Set to use greedy decoding instead of sampling
max_length | int | 20 | Maximum length of the output utterances
min_length | int | 1 | Minimum length of the output utterances
seed | int | 42 | Seed
temperature | float | 0.7 | Sampling softmax temperature
top_k | int | 0 | Filter top-k tokens before sampling (<=0: no filtering)
top_p | float | 0.9 | Nucleus filtering (top-p) before sampling (<=0.0: no filtering)

Data Format

See example_entry.py and the comment at the top of that file.
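
As a quick orientation, the sketch below shows the layout we understand the PersonaChat JSON file to have: a "train" and a "valid" list of dialogs, each with a "personality" and a list of "utterances" holding a "history" and a list of "candidates" whose last item is the true reply. Treat the key names as an assumption to check against example_entry.py; the sentences are made up for illustration and the helper gold_reply is hypothetical.

```python
import json

# Hypothetical miniature dataset; check key names against example_entry.py.
dataset = {
    "train": [
        {
            "personality": ["i love iced tea.", "i have four sisters."],
            "utterances": [
                {
                    "history": ["hello, how are you?"],
                    "candidates": [
                        "i fix airplanes for a living.",   # distractor
                        "i am never still.",               # distractor
                        "great, just drinking iced tea.",  # true reply, always last
                    ],
                }
            ],
        }
    ],
    "valid": [],
}


def gold_reply(utterance):
    """Return the true reply, assumed to be the last candidate."""
    return utterance["candidates"][-1]


first_turn = dataset["train"][0]["utterances"][0]
print(gold_reply(first_turn))
print(len(json.dumps(dataset)) > 0)  # the structure is plain JSON
```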

Citation

If you use this code in your research, you can cite our NeurIPS CAI workshop paper:

@article{DBLP:journals/corr/abs-1901-08149,
  author    = {Thomas Wolf and
               Victor Sanh and
               Julien Chaumond and
               Clement Delangue},
  title     = {TransferTransfo: {A} Transfer Learning Approach for Neural Network
               Based Conversational Agents},
  journal   = {CoRR},
  volume    = {abs/1901.08149},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.08149},
  archivePrefix = {arXiv},
  eprint    = {1901.08149},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1901-08149},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

transfer-learning-conv-ai's People

Contributors

andr-ec, clmnt, dreamingjudith, fbosler, ganeshkrishnan1, julien-c, kasparpeterson, loretoparisi, nbertagnolli, nihil0, pomonam, sshleifer, thomwolf


transfer-learning-conv-ai's Issues

Error trying to compute unigram precision/recall/F1

"accuracy": Accuracy(output_transform=lambda x: (x[0][1], x[1][1]))}

So I tried extending the above metrics being computed during training to additionally compute validation set unigram precision, recall and F1 since these are supported in pytorch-ignite. I did something like the following:

    eval_metrics = {"nll": Loss(torch.nn.CrossEntropyLoss(ignore_index=-1), output_transform=lambda x: (x[0][0], x[1][0])),
                    "accuracy": Accuracy(output_transform=lambda x: (x[0][1], x[1][1])),
                    "precision": Precision(output_transform=lambda x: (x[0][0], x[1][0])),
                    "recall": Recall(output_transform=lambda x: (x[0][0], x[1][0]))}
    eval_metrics["ppl"] = MetricsLambda(math.exp, eval_metrics["nll"])

    eval_metrics["f1"] = eval_metrics["precision"] * eval_metrics["recall"] * 2 / (eval_metrics["precision"] +
                                                                                   eval_metrics["recall"] + 1e-20)
    eval_metrics["f1"] = MetricsLambda(lambda t: torch.mean(t).item(), eval_metrics["f1"])

    eval_metrics.update({"average_nll": MetricsLambda(average_distributed_scalar, eval_metrics["nll"], args),
                         "average_accuracy": MetricsLambda(average_distributed_scalar, eval_metrics["accuracy"], args),
                         "average_f1": MetricsLambda(average_distributed_scalar, eval_metrics["f1"], args)})
    eval_metrics["average_ppl"] = MetricsLambda(math.exp, eval_metrics["average_nll"])
    for name, metric in eval_metrics.items():
        metric.attach(evaluator, name)

Does this seem right? @thomwolf

I'm getting a device-side assert failure with the following stack trace when I launch the distributed train job with CUDA_LAUNCH_BLOCKING=1:

Traceback (most recent call last):
  File "train.py", line 402, in <module>
    train()
  File "train.py", line 391, in train
    trainer.run(train_loader, max_epochs=args.n_epochs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 446, in run
    self._handle_exception(e)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 429, in run
    self._fire_event(Events.STARTED)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 345, in _fire_event
    func(self, *(event_args + args), **kwargs)
  File "train.py", line 318, in <lambda>
    trainer.add_event_handler(Events.STARTED, lambda _: evaluator.run(val_loader))
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 446, in run
    self._handle_exception(e)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 433, in run
    hours, mins, secs = self._run_once_on_dataset()
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 399, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 392, in _run_once_on_dataset
    self._fire_event(Events.ITERATION_COMPLETED)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 345, in _fire_event
    func(self, *(event_args + args), **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/metrics/metric.py", line 65, in iteration_completed
    self.update(output)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/metrics/precision.py", line 101, in update
    y = to_onehot(y.view(-1), num_classes=num_classes)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/utils.py", line 52, in to_onehot
    return onehot.scatter_(1, indices.unsqueeze(1), 1)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THC/generic/THCTensorScatterGather.cu:380
terminate called without an active exception

Any ideas on why this is happening?
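
One plausible cause, offered as a guess rather than a confirmed diagnosis: the language-modeling labels use -1 for positions to ignore (hence ignore_index=-1 in the loss above), and a one-hot conversion cannot scatter into index -1, which would surface on GPU as a device-side assert. Under that assumption, the ignored positions would need to be dropped before the predictions and labels reach Precision and Recall. The pure-Python sketch below only illustrates the filtering idea; drop_ignored is a hypothetical helper, and a real output_transform would apply the same mask to tensors.

```python
IGNORE_INDEX = -1  # label value used for positions that should not be scored


def drop_ignored(predictions, labels, ignore_index=IGNORE_INDEX):
    """Keep only the (prediction, label) pairs whose label is a real class."""
    kept = [(pred, label) for pred, label in zip(predictions, labels)
            if label != ignore_index]
    kept_predictions = [pred for pred, _ in kept]
    kept_labels = [label for _, label in kept]
    return kept_predictions, kept_labels


# Positions 0 and 3 are padding or context and are removed before scoring.
preds, labels = drop_ignored([7, 2, 5, 9], [-1, 2, 4, -1])
print(preds, labels)  # [2, 5] [2, 4]
```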

Sample personality too slow and repeated question

  1. when you call
    get_dataset_personalities(tokenizer, args.dataset_path, args.dataset_cache)
    it parses personachat_self_original.json, which contains the whole training set, and takes a long time. I think it would be better to sample from a smaller file.

  2. Although the interact input contains the chat history, the model didn't do well. Here is an example chat:

Selected personality: my mom is my best friend. i have four sisters. i believe that mermaids are real. i love iced tea.
me: it's so good to have sisters
it is. my mom is my best friend.
me: how about your sisters
they are my sisters.
me: how old are you
i'm 34 and you?
me: i'm also 34 what a coincidence
you should get married
me: yes i'm married, and have a daughter
i believe mermaids are real
me: are you married?
no, but i do believe mermaids are real
me: i don't believe that
how old are you?
me: you already know that

It repeated the same question

Is this because the model doesn't have the long range attention mechanism?

using question generation in isolation README.md clarification

I was curious to try using the question generation component. Based on the overall README, I might expect to be able to run interact.py without any arguments, but this doesn't work:

ERROR:pytorch_pretrained_bert.modeling_openai:Model name '' was not found in model name list (openai-gpt). We assumed '' was a path or url but couldn't find files pytorch_model.bin and config.json at this path or url.
Traceback (most recent call last):
  File "question-generation/interact.py", line 238, in <module>
    run()
  File "question-generation/interact.py", line 149, in run
    model.to(args.device)
AttributeError: 'NoneType' object has no attribute 'to'

That is ok, I downloaded a pretrained model myself from Google drive and tried interact.py again (python3 question-generation/interact.py --model_checkpoint ~/Downloads/gpt2_corefs_question_generation/). When I did so, I first hit the below issue:

  File "question-generation/interact.py", line 238, in <module>
    run()
  File "question-generation/interact.py", line 152, in run
    data = get_positional_dataset_from_file(tokenizer, args.filename)
  File "/Users/amooren/Code/squash-generation/question-generation/dataloader.py", line 81, in get_positional_dataset_from_file
    with open(file, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/instances_dev.pkl'

Reading the overall README, I inferred that I needed to download instances_dev.pkl, and I found something closely named instances_dev.pickle and instances_coref_dev.pickle here. I assumed I'd need the coref labeled one, as the pretrained folder is named "gpt2_corefs_question_generation". But when I tried to use this, I hit the next issue:

Traceback (most recent call last):
  File "question-generation/interact.py", line 238, in <module>
    run()
  File "question-generation/interact.py", line 168, in run
    para_index = inst["para_index"]
KeyError: 'para_index'

Would you mind clarifying the steps to test out just this component? It would greatly speed up my efforts.

How to achieve the 18 ppl?

Hello!
I tried different settings of your model; for example, I changed the token-level loss to a sentence-level loss
and used beam search as you mentioned.

But the ppl was around 23.
Could you tell me how to lower the ppl?

Running out of memory (125gb) when building my dataset

First of all thank you for sharing this code!

I tried running the train script on my own dataset, and it successfully generates a tokenized cached version which is about 6gb on disk.

What I don't understand is that it runs out of memory (125gb of RAM) during the 'Building inputs and labels' phase.

I have not modified the code, I don't use personas, my candidate number is 2 and my history size is 4.

Let me know if you have any ideas what could be the problem, or maybe this is perfectly normal?

RuntimeError: cublas runtime error : resource allocation failed at THCGeneral.cpp:250

Any ideas on resolving this issue would be greatly appreciated!

GPU details: Tesla K80 (8 GPUs), NVIDIA-SMI 410.79, Driver Version: 410.79, CUDA Version: 10.0

I was trying to run it on a single GPU alone first (local_rank = -1), and faced the below error.

ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCGeneral.cpp:250.
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCGeneral.cpp:250.
Traceback (most recent call last):
  File "train.py", line 358, in <module>
    train()
  File "train.py", line 349, in train
    trainer.run(train_loader, max_epochs=args.n_epochs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 388, in run
    self._handle_exception(e)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 352, in _handle_exception
    raise e
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 375, in run
    hours, mins, secs = self._run_once_on_dataset()
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 341, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 352, in _handle_exception
    raise e
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/ignite/engine/engine.py", line 333, in _run_once_on_dataset
    self.state.output = self._process_function(self, batch)
  File "train.py", line 275, in update
    lm_loss, mc_loss = model(*batch)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_openai.py", line 808, in forward
    hidden_states = self.transformer(input_ids, position_ids, token_type_ids)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_openai.py", line 643, in forward
    hidden_states = block(hidden_states)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_openai.py", line 334, in forward
    a = self.attn(x)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_openai.py", line 297, in forward
    x = self.c_attn(x)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_openai.py", line 248, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCGeneral.cpp:250

I also tried multi-GPU by doing python -m torch.distributed.launch --nproc_per_node=8 train.py <my cmdline options> and it threw the same error.

Loading same personality every time

I am running interact.py and it loads the same personality every time:

"INFO:interact.py:Selected personality: i prefer vinyl records to any other music recording format. i fix airplanes for a living. i drive junk cars that no one else wants. i think if i work hard enough i can fix the world. i am never still."

How can I select a random personality each time, or change it during the conversation?

Is the pre-trained model trained on only one personality?
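
A likely explanation, stated as an assumption about the script rather than a confirmed reading of it: the interactive script has a seed argument that defaults to 42, and if the random number generator is seeded with it before a personality is drawn, every run picks the same one. The sketch below reproduces that behavior with Python's standard random module; the personality list and the helper pick_personality are hypothetical.

```python
import random

# Hypothetical stand-ins for the personalities loaded from the dataset.
personalities = ["persona A", "persona B", "persona C", "persona D"]


def pick_personality(seed=None):
    """Draw one personality; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None falls back to a time/OS-based seed
    return rng.choice(personalities)


# Same seed, same choice on every run.
print(pick_personality(42) == pick_personality(42))  # True

# No seed: the choice can differ from run to run.
print(pick_personality() in personalities)  # True
```

If that assumption holds, passing a different value for the seed argument on each run should select a different personality.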

invalid device ordinal

Hello,

I followed the steps in your article and installed PyTorch with CUDA like this:

   pip3 install torch torchvision

I have Python 3.7, torch 1.1.0 and Ubuntu 18.04. When I try to run this command

  python -m torch.distributed.launch --nproc_per_node=8 ./train.py

I get this error

  WARNING:./train.py:Running process 2
  THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1573049306803/work/torch/csrc/cuda/Module.cpp line=37 error=101 : invalid device ordinal
  Traceback (most recent call last):
    File "./train.py", line 267, in <module>
      train()
    File "./train.py", line 147, in train
      torch.cuda.set_device(args.local_rank)
    File "/home/hatzimin/.conda/envs/maria_env/lib/python3.7/site-packages/torch/cuda/__init__.py", line 300, in set_device
      torch._C._cuda_setDevice(device)

I searched the error but I haven't managed to find a solution.
If I try to run python ./train.py I get no error.

Thank you
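
A note on what this error usually means: "invalid device ordinal" is the CUDA error for a device index that does not exist. With --nproc_per_node=8 the launcher starts eight processes with local ranks 0 to 7, and each calls torch.cuda.set_device(args.local_rank), so the call fails on any rank at or above the number of visible GPUs. The sketch below shows the check in plain Python; validate_local_rank is a hypothetical helper, and in practice visible_gpus would come from torch.cuda.device_count().

```python
def validate_local_rank(local_rank, visible_gpus):
    """Raise a readable error if a process is assigned a GPU that does not exist.

    local_rank == -1 means "not distributed", as in the training arguments.
    """
    if local_rank == -1:
        return
    if not 0 <= local_rank < visible_gpus:
        raise ValueError(
            f"local_rank {local_rank} needs GPU index {local_rank}, "
            f"but only {visible_gpus} GPU(s) are visible; "
            f"lower --nproc_per_node to at most {visible_gpus}"
        )


validate_local_rank(-1, 1)  # single-process run: nothing to check
validate_local_rank(1, 2)   # rank 1 on a 2-GPU machine: fine

try:
    validate_local_rank(2, 2)  # third process on a 2-GPU machine
except ValueError as error:
    print(error)
```

Setting --nproc_per_node to the number of GPUs actually present is the usual fix.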

Training on my own data/dialogues: Understanding the dataset format used by the code here

Hello,

This is with respect to the dataset file being used by the code here at https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json.

Can anyone tell what the "candidate" utterances are? I could not find a description...are they simply negative samples, random utterances from other dialogues? Also, the last utterance in the "candidates" list is always the response after the chat "history" utterances - is this correct?

Thanks!

tensorboardX issue

tb_logger = TensorboardLogger(log_dir=None) in train.py

This line is causing this problem:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/tnguyen/Desktop/recourse-nlp/transfer-learning-conv-ai/train.py", line 273, in <module>
    train()
  File "/Users/tnguyen/Desktop/recourse-nlp/transfer-learning-conv-ai/train.py", line 243, in train
    tb_logger = TensorboardLogger(log_dir=None)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ignite/contrib/handlers/tensorboard_logger.py", line 408, in __init__
    self.writer = SummaryWriter(logdir=log_dir)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorboardX/writer.py", line 279, in __init__
    self.file_writer = FileWriter(logdir=log_dir, **kwargs)
TypeError: type object got multiple values for keyword argument 'logdir'

I initially had some tensorboardX issues and had to revert to 1.6 instead of the latest 1.7 version. I also had to uninstall and reinstall pytorch-ignite. I'm not sure what this means, or whether it's an ignite/tensorboardX problem or something I should be raising here. Please help :D

Weight not initialized from the pretrained model when using interact.py

I'm trying to run interact.py with the pretrained model, and after making it finally run by fixing the changes from pytorch-pretrained-bert to pytorch-transformers, I am getting this message while loading the weights:

INFO:pytorch_transformers.modeling_utils:Weights of OpenAIGPTLMHeadModel not initialized from pretrained model: ['lm_head.weight']

There are other weights that are also not loaded, such as these:

INFO:pytorch_transformers.modeling_utils:Weights from pretrained model not used in OpenAIGPTLMHeadModel: ['multiple_choice_head.linear.weight', 'multiple_choice_head.linear.bias', 'lm_head.decoder.weight']

I'm guessing this is the reason why when I run interact.py and get to the point to insert a prompt after being given a personality, all my responses are as such:

>>> hi
<unk>? <unk><unk><unk><unk>? <unk><unk><unk>
>>> say something that makes sense
<unk><unk><unk>? <unk><unk>

In order to use a fully functional version of this script, I believe the weights for the lm_head are needed to generate meaningful responses. Am I missing something here or is this an issue that needs to be resolved? I can imagine training a model from scratch using the training script would work since these weights will be trained, but I would like to be able to use the pretrained model for cost/time reasons.

interact.py killed on second run

I did:
git clone https://github.com/huggingface/transfer-learning-conv-ai
cd transfer-learning-conv-ai
pip install -r requirements.txt
python -m spacy download en

Then it ran once as follows:
python3 interact.py

It downloaded the model and I could chat with the bot. I stopped the execution with Ctrl+C and then
ran it again:
python3 interact.py

to get:
INFO:pytorch_transformers.modeling_utils:loading weights file /tmp/tmpzwe_u3oz/pytorch_model.bin
INFO:pytorch_transformers.modeling_utils:Weights from pretrained model not used in OpenAIGPTLMHeadModel: ['multiple_choice_head.summary.weight', 'multiple_choice_head.summary.bias']
INFO:pytorch_transformers.tokenization_utils:Assigning to the bos_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Assigning to the eos_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Assigning to the pad_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Assigning ('', '') to the additional_special_tokens key of the tokenizer
INFO:interact.py:Sample a personality
INFO:/home/jovyan/work/alela/dan_chatbots/transfer-learning-conv-ai/utils.py:Load tokenized dataset from cache at ./dataset_cache_OpenAIGPTTokenizer
Killed

And every time I run
python3 interact.py, the same thing happens: always Killed.

Can somebody explain?

Also, I have no way to exit the program during the dialogue other than Ctrl+C.

Thank you,
Alejandro

Recommended hardware for gpt2-medium

Hello!

By adapting the code in this repo, I've been able to fine-tune GPT and GPT-2 small using Topical-Chat with an EC2 instance with 8 Tesla V100 GPUs (32 GB memory each). However, I am unable to fine-tune GPT-2 medium on the same instance with the exact same hyper-parameters - I'm getting out of memory issues, presumably because GPT-2 medium is much larger than GPT-2 small. I haven't tried fine-tuning GPT-2 medium using Persona-Chat yet though.

Have you tried fine-tuning GPT-2 medium (from the attention branch in pytorch-pretrained-BERT) on large dialog datasets with long turns and if so, could you share the details of the underlying hardware used?

Thanks!

Request update to pytorch_transformers

I'm running into multiple errors when trying to use this with torch==1.2.0 and pytorch_transformers.

The most difficult to solve is a RuntimeError which seems to pertain to the difference in implementation of the nn.Embedding layer

RuntimeError: index out of range: Tried to access index 50257 out of table with 50256 rows. at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensorEvenMoreMath.cpp:237

Implementing Negative Sampling

I am trying to add random sampling of negatives for the multi-choice candidates, instead of using the same negatives for each example in every epoch.
One way I started implementing it was to generate a new train_loader every epoch. This idea does not play nicely with trainer.run(train_loader, max_epochs=args.num_epochs), so I was wondering if others had an idea of a nicer way.

Thanks!
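
One way to avoid rebuilding the loader, sketched here under stated assumptions: draw the negatives lazily, each time an example is fetched, so that the same train_loader yields different distractors in every epoch. The helper resample_candidates is hypothetical and not part of the repository; it keeps the true reply in the last position, as the dataset format does, and it could be called from a Dataset's __getitem__ or from a collate function.

```python
import random


def resample_candidates(gold_reply, response_pool, num_candidates, rng=random):
    """Return num_candidates replies: fresh random distractors, then the gold reply.

    Assumes response_pool holds at least num_candidates - 1 replies other than
    gold_reply (hypothetical helper for illustration).
    """
    distractor_pool = [reply for reply in response_pool if reply != gold_reply]
    distractors = rng.sample(distractor_pool, num_candidates - 1)
    return distractors + [gold_reply]


pool = ["i love iced tea.", "i fix airplanes.", "i am never still.", "yes i am."]
rng = random.Random(0)
for epoch in range(2):
    # A new draw happens on every call, so each epoch can see new negatives.
    print(epoch, resample_candidates("yes i am.", pool, num_candidates=3, rng=rng))
```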

Training with gpt2-large and got ValueError: max() arg is an empty sequence

Hi, I have prepared my dataset with 2 personalities, my.json (in the same format as the original 200 MB dataset), and tried to start training with the parameter --model="gpt2-large". Here is the output:

joo@joo-tf:~/Документы/LocalRepository/transfer-learning-conv-ai$ python  ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=1 --valid_batch_size=1 --dataset_path="my.json" --model="gpt2-large"
WARNING:./train.py:Running process -1
INFO:./train.py:Arguments: Namespace(dataset_cache='./dataset_cache', dataset_path='ze_personality_dataset_forgpt2.json', device='cuda', eval_before_start=False, fp16='', gradient_accumulation_steps=4, lm_coef=2.0, local_rank=-1, lr=6.25e-05, max_history=2, max_norm=1.0, mc_coef=1.0, model_checkpoint='gpt2-large', n_epochs=1, num_candidates=4, personality_permutations=2, train_batch_size=1, valid_batch_size=1)
INFO:./train.py:Prepare tokenizer, pretrained model and optimizer.
INFO:pytorch_transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-vocab.json from cache at /home/joo/.cache/torch/pytorch_transformers/69f8d734111f39eaa51a85907bfdc81a7ef42242d638ffab6f77df305402b2b2.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:pytorch_transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-merges.txt from cache at /home/joo/.cache/torch/pytorch_transformers/38d28acc17953e356348dca948e152c653c0ccf5058a552eea30168e27f02046.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:pytorch_transformers.modeling_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-config.json from cache at /home/joo/.cache/torch/pytorch_transformers/c8f887cdfff4327916f4b7ed06a379c0add42bd9c66e1fe3b4a5a8525a4b2678.bc44facd742477605da5434f20a32607ead98e78fff95c5ca9523e47b453e1ad
INFO:pytorch_transformers.modeling_utils:Model config {
  "attn_pdrop": 0.1,
  "embd_pdrop": 0.1,
  "finetuning_task": null,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "n_ctx": 1024,
  "n_embd": 1280,
  "n_head": 20,
  "n_layer": 36,
  "n_positions": 1024,
  "num_labels": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "pruned_heads": {},
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torchscript": false,
  "vocab_size": 50257
}

INFO:pytorch_transformers.modeling_utils:loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-pytorch_model.bin from cache at /home/joo/.cache/torch/pytorch_transformers/bcc61dff8b1b03d0fd33a1eb1dc4db00875cae33296848155c6882d4bab03db4.999a50942f8e31ea6fa89ec2580cb38fa40e3db5aa46102d0406bcfa77d9142d
INFO:pytorch_transformers.tokenization_utils:Adding <bos> to the vocabulary
INFO:pytorch_transformers.tokenization_utils:Assigning <bos> to the bos_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Adding <eos> to the vocabulary
INFO:pytorch_transformers.tokenization_utils:Assigning <eos> to the eos_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Adding <pad> to the vocabulary
INFO:pytorch_transformers.tokenization_utils:Assigning <pad> to the pad_token key of the tokenizer
INFO:pytorch_transformers.tokenization_utils:Adding <speaker1> to the vocabulary
INFO:pytorch_transformers.tokenization_utils:Adding <speaker2> to the vocabulary
INFO:pytorch_transformers.tokenization_utils:Assigning ('<speaker1>', '<speaker2>') to the additional_special_tokens key of the tokenizer
INFO:./train.py:Prepare datasets
INFO:/home/joo/Документы/LocalRepository/transfer-learning-conv-ai/utils.py:Load tokenized dataset from cache at ./dataset_cache_GPT2Tokenizer
INFO:./train.py:Build inputs and labels
INFO:./train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
  File "./train.py", line 271, in <module>
    train()
  File "./train.py", line 175, in train
    train_loader, val_loader, train_sampler, valid_sampler = get_data_loaders(args, tokenizer)
  File "./train.py", line 102, in get_data_loaders
    dataset = pad_dataset(dataset, padding=tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS[-1]))
  File "./train.py", line 43, in pad_dataset
    max_l = max(len(x) for x in dataset["input_ids"])
ValueError: max() arg is an empty sequence

I checked the other issues but found nothing similar.

num of candidates and labels

What is the number of candidates? Is it the number of people in the chat?
And what is the purpose of setting the labels to -1 for everything except the last reply?
In the cost function the ignore index is -1, so the loss will not be calculated for these labels.

 instance["lm_labels"] = [-1] * len(instance["input_ids"])

And where is the distractor in this code?
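
To make the question concrete, here is a simplified sketch of the masking being asked about. It assumes that each candidate reply is appended to the same context, that only the gold candidate carries language-modeling targets, and that every other candidate acts as a distractor. The function build_lm_labels and its arguments are hypothetical and much simpler than the real input-building code.

```python
def build_lm_labels(context_ids, reply_ids, is_gold, ignore_index=-1):
    """Return (input_ids, lm_labels) for one candidate sequence."""
    input_ids = context_ids + reply_ids
    if not is_gold:
        # Distractor: no position contributes to the language-modeling loss.
        return input_ids, [ignore_index] * len(input_ids)
    # Gold reply: mask the context, keep the reply tokens as targets.
    return input_ids, [ignore_index] * len(context_ids) + reply_ids


# Toy token ids (made up): one gold reply and one distractor for the same context.
gold_inputs, gold_labels = build_lm_labels([1, 2, 3], [4, 5], is_gold=True)
distractor_inputs, distractor_labels = build_lm_labels([1, 2, 3], [9], is_gold=False)
```

Under these assumptions the number of candidates is the number of alternative replies scored for one context, not the number of speakers.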

Running interact.py using GPT2 model has several problems

I tried to run interact.py with the GPT2 model as is and ran into several problems. My pytorch-pretrained-bert version is already the latest, 0.6.2. The problems that occurred:

  1. There was a unicode error when loading vocab.json in GPT2Tokenizer. I added utf-8 encoding to the call that opens the json file and the problem went away... hopefully
  2. There was a dimension mismatch when loading the convai pretrained model's weights. I compared the source of the installed pytorch-pretrained-bert with the github repo and realized that in the installed version, modeling_gpt2.py doesn't have the set_num_special_tokens function for adding the persona chat special tokens. I copied modeling_gpt2.py from github to my installed folder and the problem went away... hopefully as well
  3. The decoded tokens give weird output:
INFO:interact.py:Selected personality: i<unk>prefer<unk>vinyl<unk>records<unk>to<unk>any<unk>other<unk>music<unk>recording<unk>format<unk>.i<unk>fix<unk>airplanes<unk>for<unk>a<unk>living<unk>.i<unk>drive<unk>junk<unk>cars<unk>that<unk>no<unk>one<unk>else<unk>wants<unk>.i<unk>think<unk>if<unk>i<unk>work<unk>hard<unk>enough<unk>i<unk>can<unk>fix<unk>the<unk>world<unk>.i<unk>am<unk>never<unk>still<unk>.
>>> k
Traceback (most recent call last):
  File "interact.py", line 146, in <module>
    run()
  File "interact.py", line 141, in run
    out_text = tokenizer.decode(out_ids, skip_special_tokens=True)
TypeError: decode() got an unexpected keyword argument 'skip_special_tokens'

Then I copied tokenization_gpt2 from the pytorch-pretrained-bert master branch to my installed folder, and the result looks like this:

INFO:interact.py:Selected personality: iprefervinylrecordstoanyothermusicrecordingformat.ifixairplanesforaliving.idrivejunkcarsthatnooneelsewants.ithinkifiworkhardenoughicanfixtheworld.iamneverstill.
>>> test
amaranth</w>chowhungered</w>libibickerstaff</w>kingsley</w>syd</w>implacsoapy</w>arabella</w>dina</w>keaton</w>unstoppimplackingsley</w>unladylike</w>mirroring</w>implacimplacwondr

Did I set something up wrong for the GPT2 model?
With the GPT model, the code works well.

README vs Defaults: Which training parameters lead to Hits@1 over 79

Hi team, thank you very much for the great work and the clean code! I ran into some problems while running the code and was wondering if you could help :-)

  1. I got an error while trying to evaluate the F1 score with convai_evaluation.py:

Traceback (most recent call last):
File "../convai_evaluation.py", line 239, in <module>
eval_fct(opt)
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/projects/convai2/eval_f1.py", line 27, in eval_f1
report = eval_model(opt, print_parser)
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/parlai/scripts/eval_model.py", line 84, in eval_model
world.parley()
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/parlai/core/worlds.py", line 275, in parley
acts[1] = agents[1].act()
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/convai_evaluation.py", line 156, in act
out_ids, _ = sample_sequence(self.persona, self.history, self.tokenizer, self.model_checkpoint, self.args)
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/interact.py", line 74, in sample_sequence
logits = top_filtering(logits, top_k=args.top_k, top_p=args.top_p)
AttributeError: 'AttrDict' object has no attribute 'top_p'

Then I tried adding a top_p argument, but got:

Traceback (most recent call last):
File "../convai_evaluation.py", line 239, in <module>
eval_fct(opt)
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/projects/convai2/eval_f1.py", line 27, in eval_f1
report = eval_model(opt, print_parser)
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/parlai/scripts/eval_model.py", line 84, in eval_model
world.parley()
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/ParlAI/parlai/core/worlds.py", line 275, in parley
acts[1] = agents[1].act()
File "/home/wang/PycharmProjects/transfer-learning-conv-ai/convai_evaluation.py", line 156, in act
out_ids, _ = sample_sequence(self.persona, self.history, self.tokenizer, self.model_checkpoint, self.args)
ValueError: too many values to unpack (expected 2)

I changed this line (line 151 in the original script convai_evaluation.py) from
out_ids, _ = sample_sequence(self.persona, self.history, self.tokenizer, self.model_checkpoint, self.args)
into
out_ids = sample_sequence(self.persona, self.history, self.tokenizer, self.model_checkpoint, self.args)
and it works.

Could you please tell me whether this is a bug, or whether I ran or changed the code incorrectly? In the README table under "Running ConvAI2 evaluation scripts", the default parameters are the same as those in the "Using the interaction script" section, but they look different from the parameters defined in the evaluation script.

  2. The "Using the training script" section says "This model should give a Hits@1 over 79, perplexity of 20.5 and F1 of 16.5 using the convai2 evaluation script (see below)." Is this result based on a model trained with the default parameters, or with the parameters given in your example (see below)?

python -m torch.distributed.launch --nproc_per_node=8 ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=2 --valid_batch_size=2

I trained two models, the first with the default parameters and the second with the parameters in this example. The first one gave the better result, which corresponds to "Hits@1 over 79, perplexity of 20.5 and F1 of 16.5". So I'm curious about that.

Thank you very much in advance!

Train own persona: mismatch with dataset paths

I have a mismatch while training with my own data.
How to reproduce:

  1. I created my own dataset with 1 persona, my.json, with the following structure:

{
   "train":[
      {
    "personality": ["first sentence .", "second sentence .",
                    "third sentence .", "four  sentence ."],
    "utterances": [
        {"candidates": [
...................................
...................................
         }
    ]
}
]}
  2. I passed this file to train.py:

mypc@mypc-tf:~/docs/LocalRepository/transfer-learning-conv-ai$ python ./train.py --gradient_accumulation_steps=4 --lm_coef=2.0 --max_history=2 --n_epochs=1 --num_candidates=4 --personality_permutations=2 --train_batch_size=1 --valid_batch_size=1 --dataset_path="my.json"

  3. After training finished, I got a runs folder containing the following folders:
    Oct05_00-22-19_mypc-tf_openai-gpt
    Oct05_01-32-39_mypc-tf_openai-gpt
    That is strange because there are 2 folders, not one. The actual model is located in Oct05_01-32-39_mypc-tf_openai-gpt

  4. But when I call interact.py, it behaves the wrong way:
    python ./interact.py --model_checkpoint ./runs/Oct05_01-32-39_mypc-tf_openai-gpt/

The result is

INFO:/home/mypc/docs/LocalRepository/transfer-learning-conv-ai/utils.py:Gathered 18878 personalities

So my personality from my.json is missing; only the original personalities are loaded.

My guess is that I made a mistake when running the train script and it somehow used the cached original dataset. How can I fix this?
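
If a stale tokenized cache is the cause (a guess, not confirmed here), one defensive option is to make the cache file name depend on the dataset path, so that switching datasets can never pick up an old cache. The helper below is a hypothetical sketch; cache_path_for and its naming scheme are assumptions and not part of the repo's utils.py.

```python
import hashlib
import os


def cache_path_for(dataset_path, cache_prefix, tokenizer_name):
    """Derive a cache file name that changes whenever the dataset path changes."""
    absolute = os.path.abspath(dataset_path)
    digest = hashlib.sha1(absolute.encode("utf-8")).hexdigest()[:8]
    return "{}_{}_{}".format(cache_prefix, tokenizer_name, digest)


# Two different dataset files map to two different cache files.
own_cache = cache_path_for("my.json", "./dataset_cache", "OpenAIGPTTokenizer")
other_cache = cache_path_for("other.json", "./dataset_cache", "OpenAIGPTTokenizer")
```

A simpler manual check is to delete the existing cache file before training and to pass the same --dataset_path to every script that loads the dataset.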

Parameter of forward pass (missing?)

I'm studying the model and was wondering about a few things:

  1. The dataloader appends padding tokens, but no attention_mask is set during training. Is there any reason for that?
  2. In your blog post you describe that the model should care about the positions of the tokens. Is there any reason why the position_ids parameter is not set?
  3. lm_labels parameter: according to the documentation, labels are ignored if they are set to -100. Is that a typo in the documentation or did I miss something?
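
For the third point, what matters is that the value used to mask labels matches the value the loss ignores; -1 and -100 are simply two conventions for the same mechanism. The sketch below shows that mechanism in plain Python. masked_nll is a hypothetical helper, and which ignore value a given library release expects should be checked against the installed version.

```python
import math


def masked_nll(log_probs, labels, ignore_index=-1):
    """Mean negative log-likelihood over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(log_probs, labels):
        if label == ignore_index:
            continue  # masked or padded position: contributes nothing
        total -= row[label]
        count += 1
    return total / count if count else 0.0


# Toy log-probabilities (made up) for two positions over a two-token vocabulary.
toy_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
]
loss_with_minus_one = masked_nll(toy_log_probs, [0, -1], ignore_index=-1)
loss_with_minus_hundred = masked_nll(toy_log_probs, [-100, 1], ignore_index=-100)
```

If the labels are masked with one value while the loss ignores another, the masked positions are no longer skipped, which is why the two have to agree.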

6 GB VRAM?

Hello,

is it possible to run the 117M GPT-2 model with 6 GB VRAM using FP16?
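
A rough back-of-the-envelope estimate can help frame the question. The sketch below counts only weights, gradients and optimizer state. It assumes FP16 weights and gradients, an FP32 master copy, and two FP32 Adam moment buffers; all of these are assumptions about the training setup. Activations, which depend on batch size and sequence length, are not counted and can dominate.

```python
def training_state_bytes(n_params, weight_bytes=2, master_bytes=4, adam_moments=2):
    """Bytes for weights, gradients and optimizer state (activations excluded)."""
    weights = n_params * weight_bytes        # FP16 weights
    grads = n_params * weight_bytes          # FP16 gradients
    master = n_params * master_bytes         # FP32 master copy of the weights
    moments = n_params * 4 * adam_moments    # FP32 Adam moment buffers
    return weights + grads + master + moments


state_bytes = training_state_bytes(117_000_000)
state_gib = state_bytes / 2**30
```

Under these assumptions the fixed state is a little under 2 GiB, so whether training fits in 6 GB depends mostly on activation memory.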

TypeError: can only concatenate list (not "tuple") to list

Hi,
I am using Python 3.6 and ran:
python train.py --model_checkpoint pretrained_transformers/gpt --dataset_path datasets/personachat_self_original.json
Thanks

INFO:/dev/ccn/generation/transfer-learning-conv-ai/utils.py:Tokenize and encode the dataset
Traceback (most recent call last):
File "train.py", line 271, in <module>
train()
File "train.py", line 175, in train
train_loader, val_loader, train_sampler, valid_sampler = get_data_loaders(args, tokenizer)
File "train.py", line 77, in get_data_loaders
personachat = get_dataset(tokenizer, args.dataset_path, args.dataset_cache)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 52, in get_dataset
dataset = tokenize(dataset)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 50, in tokenize
return dict((n, tokenize(o)) for n, o in obj.items())
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 50, in <genexpr>
return dict((n, tokenize(o)) for n, o in obj.items())
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 51, in tokenize
return list(tokenize(o) for o in obj)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 51, in <genexpr>
return list(tokenize(o) for o in obj)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 50, in tokenize
return dict((n, tokenize(o)) for n, o in obj.items())
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 50, in <genexpr>
return dict((n, tokenize(o)) for n, o in obj.items())
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 51, in tokenize
return list(tokenize(o) for o in obj)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 51, in <genexpr>
return list(tokenize(o) for o in obj)
File "/dev/ccn/generation/transfer-learning-conv-ai/utils.py", line 48, in tokenize
return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
File "/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/pytorch_transformers/tokenization_utils.py", line 490, in tokenize
added_tokens = list(self.added_tokens_encoder.keys()) + self.all_special_tokens
File "/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/pytorch_transformers/tokenization_utils.py", line 635, in all_special_tokens
all_toks = all_toks + (attr_value if isinstance(attr_value, (list, tuple)) else [attr_value])
TypeError: can only concatenate list (not "tuple") to list
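
The last frame of the traceback concatenates a list with attr_value, which fails as soon as attr_value is a tuple, because Python does not allow list + tuple. The sketch below reproduces the pattern outside the library and shows one way to make it safe. The function name flatten_special_tokens is made up for illustration and this is not the library's actual fix.

```python
def flatten_special_tokens(attr_values):
    """Flatten special-token attributes that may be str, list, or tuple."""
    all_toks = []
    for value in attr_values:
        if isinstance(value, (list, tuple)):
            # list(...) makes a tuple safe to concatenate to a list.
            all_toks = all_toks + list(value)
        else:
            all_toks = all_toks + [value]
    return all_toks


# Reproduce the original failure mode: list + tuple raises TypeError.
try:
    [] + ("<speaker1>", "<speaker2>")
    raised = False
except TypeError:
    raised = True

tokens = flatten_special_tokens(["<bos>", "<eos>", ("<speaker1>", "<speaker2>")])
```

On the caller's side, passing the additional special tokens as a list instead of a tuple would avoid hitting that line with a tuple.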

Assign personal traits like name

Is it possible to assign personal traits like a name, age and interests?

I currently use a pretrained model; will I need to retrain?

No APEX Issue

If I don't have CUDA support, this code won't work, right, since it uses NVIDIA's apex, which requires CUDA? Is there an alternative?

Error when setting num_candidates=1 and in dataset there's only one candidate during validation

When setting args.num_candidates to 1, and the actual length of the candidates list of each entry is 1, I get this error during validation:

ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: For binary cases, y_pred must be comprised of 0's and 1's..
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: For binary cases, y_pred must be comprised of 0's and 1's..
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: For binary cases, y_pred must be comprised of 0's and 1's..
Traceback (most recent call last):
  File "./train-regular.py", line 277, in <module>
    train()
  File "./train-regular.py", line 269, in train
    trainer.run(train_loader, max_epochs=args.n_epochs)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 446, in run
    self._handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 437, in run
    self._fire_event(Events.EPOCH_COMPLETED)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 345, in _fire_event
    func(self, *(event_args + args), **kwargs)
  File "./train-regular.py", line 223, in <lambda>
    trainer.add_event_handler(Events.EPOCH_COMPLETED, lambda _: evaluator.run(val_loader))
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 446, in run
    self._handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 433, in run
    hours, mins, secs = self._run_once_on_dataset()
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 399, in _run_once_on_dataset
    self._handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 410, in _handle_exception
    raise e
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 392, in _run_once_on_dataset
    self._fire_event(Events.ITERATION_COMPLETED)
  File "/usr/local/lib/python3.5/dist-packages/ignite/engine/engine.py", line 345, in _fire_event
    func(self, *(event_args + args), **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/ignite/metrics/metric.py", line 65, in iteration_completed
    self.update(output)
  File "/usr/local/lib/python3.5/dist-packages/ignite/metrics/accuracy.py", line 126, in update
    self._check_type((y_pred, y))
  File "/usr/local/lib/python3.5/dist-packages/ignite/metrics/accuracy.py", line 57, in _check_type
    self._check_binary_multilabel_cases((y_pred, y))
  File "/usr/local/lib/python3.5/dist-packages/ignite/metrics/accuracy.py", line 48, in _check_binary_multilabel_cases
    raise ValueError("For binary cases, y_pred must be comprised of 0's and 1's.")
ValueError: For binary cases, y_pred must be comprised of 0's and 1's.
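
The error message suggests that, with a single candidate, the accuracy metric treats the one-column predictions as a binary case. That reading is an assumption based only on the message text. A candidate-ranking accuracy that is agnostic to the number of candidates can be sketched in plain Python as follows; hits_at_1 is a hypothetical helper, not an ignite API.

```python
def hits_at_1(mc_logits, mc_labels):
    """Fraction of examples whose highest-scoring candidate is the labeled one."""
    correct = 0
    for logits, label in zip(mc_logits, mc_labels):
        # Index of the largest score among this example's candidates.
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    return correct / len(mc_labels)


# Made-up scores: two examples with two candidates, then two with one candidate.
two_candidate_score = hits_at_1([[0.1, 0.9], [0.8, 0.2]], [1, 1])
one_candidate_score = hits_at_1([[0.3], [0.7]], [0, 0])
```

With one candidate the score is always 1.0, so the metric carries no information in that setting and could simply be left out of the evaluator.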

How To Create Custom Data For This Model

If my data is structured this way, how can I create data that fits your model with no candidates and personas?

1 Hello, how are you?
2 I'm good, what's new?
3 nothing much, watching tv.
4 cool, talk to u later.
...

Also, it seems to me that your data structure only suits conversations with an even number of turns. What if I have an odd number of turns?
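
One possible conversion is sketched below. It assumes the target layout is a list of dialogs, each with a "personality" list and an "utterances" list whose entries hold a "history" and a "candidates" list with the true reply last. The function turns_to_dialog, its parameters, and the random choice of distractors are illustrative assumptions. Whether train.py accepts an empty personality list has not been verified here.

```python
import random


def turns_to_dialog(turns, distractor_pool, num_distractors=1, reply_step=2, seed=0):
    """Convert a flat list of turns into one dialog entry with history and candidates."""
    rng = random.Random(seed)
    utterances = []
    # With reply_step=2, every second turn (the 2nd, 4th, ...) is used as a gold reply.
    for i in range(1, len(turns), reply_step):
        gold = turns[i]
        pool = [t for t in distractor_pool if t != gold]
        distractors = rng.sample(pool, num_distractors)
        utterances.append({"history": turns[:i], "candidates": distractors + [gold]})
    return {"personality": [], "utterances": utterances}


turns = [
    "Hello, how are you?",
    "I'm good, what's new?",
    "nothing much, watching tv.",
    "cool, talk to u later.",
    "bye",
]
dialog = turns_to_dialog(turns, ["see you", "no idea", "maybe"], num_distractors=2)
```

An odd number of turns is not a problem in this scheme: the trailing turn has no reply after it, so it produces no example of its own. Setting reply_step=1 would instead use every turn after the first as a gold reply.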

Confusion in pretrained model.

Hello.

I am trying to run your model and I am confused about the pre-trained model.

It seems that train.py trains the model as a double-head model, but interact.py loads an LM-head model.

Why are they using different models?

So in the pre-trained model, next sentence classification is actually not implemented?

Multilingual model

Hello,
sorry if this is a silly question. Can I somehow use the multilingual model with this code? I changed the tokenizer and model to

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelWithLMHead.from_pretrained("bert-base-multilingual-cased")

but then I don't understand how to change the functions update() and inference(), which use the loss function.

If someone could explain or give me some basic steps it will be very helpful.
Thank you

Can not train with gpt2?

Hi, I really like your project, but when I tried to train with gpt2, it did not work. Will you release some instructions for gpt2? Thank you for your work

Training without personas

Thank you for open-sourcing such well-written code!

Most conversational AI datasets come without personas, so how can I train without personas?

Will setting the persona to an empty string do the job?

RuntimeError: shape '[-1, 2, 34]' is invalid for input of size 61710

I'm playing around with this wonderful code but I'm running into a curious issue when I try to train the model with my own data.

I replicated the personachat_self_original.json file structure and added my own data. I deleted dataset_cache_OpenAIGPTTokenizer file but when I try to train, I get this error:

INFO:train.py:Pad inputs and convert to Tensor
Traceback (most recent call last):
  File "train.py", line 252, in <module>
    train()
  File "train.py", line 164, in train
    train_loader, val_loader, train_sampler, valid_sampler = get_data_loaders(args, tokenizer)
  File "train.py", line 97, in get_data_loaders
    tensor = tensor.view((-1, datasets[dataset_name]["n_candidates"]) + tensor.shape[1:])
RuntimeError: shape '[-1, 2, 34]' is invalid for input of size 61710

I have triple checked that my dataset follows the same structure but I can't figure out why the training script doesn't like it.

Any ideas why this is happening?
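
The numbers in the error message allow a quick sanity check: 61710 elements at sequence length 34 give 1815 rows, and an odd number of rows cannot be grouped into pairs of candidates. One possible explanation, not confirmed here, is that at least one utterance has a different number of candidates than the rest. The helper below is a hypothetical sketch that counts candidate-list lengths, assuming each dialog has an "utterances" list whose entries hold a "candidates" list.

```python
from collections import Counter


def candidate_count_histogram(dataset_split):
    """Map each candidate-list length to the number of utterances that have it."""
    return Counter(
        len(utterance["candidates"])
        for dialog in dataset_split
        for utterance in dialog["utterances"]
    )


# Arithmetic taken from the error message: 61710 elements, sequence length 34.
rows, remainder = divmod(61710, 34)
groups_of_two_fit = rows % 2 == 0

# Toy split (made-up data) in which one utterance is missing a candidate.
toy_split = [{"utterances": [{"candidates": ["a", "b"]}, {"candidates": ["b"]}]}]
histogram = candidate_count_histogram(toy_split)
```

If the histogram for the real dataset has more than one key, the utterances with the unusual count are the first place to look.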

train from checkpoint

Can anyone help with how to continue training from a checkpoint? After training for 2 epochs, I tried to load the checkpoint and resume training, but was unable to do so.

Running interact.py using BertModel

Hi,
I would like to modify interact.py so that it uses BertModel and BertTokenizer. I have adapted the file. However, when I try to run the code, I get the following error:

>>> hello
Traceback (most recent call last):
  File "/Users/timospring/Desktop/empathic-chatbot/code/bertBot.py", line 169, in <module>
    run()
  File "/Users/timospring/Desktop/empathic-chatbot/code/bertBot.py", line 161, in run
    out_ids = sample_sequence(history, tokenizer, model, args)
  File "/Users/timospring/Desktop/empathic-chatbot/code/bertBot.py", line 70, in sample_sequence
    logits = model(input_ids, token_type_ids=token_type_ids)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 624, in forward
    embedding_output = self.embeddings(input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/transformers/modeling_bert.py", line 169, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /Users/soumith/b101_2/2019_02_08/wheel_build_dirs/wheel_3.7/pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191

I did some digging, and this error is supposed to appear when there is a mismatch in the vocab_size. However, when I tried to change the size in the code, I got other errors complaining that the vocab_size does not match the original 30522. Any help in getting it to work with BERT would be highly appreciated!
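
The traceback stops in token_type_embeddings rather than in the word embeddings, so another thing worth checking is the token_type_ids. The standard BERT configurations use a segment table with only two rows, so values other than 0 and 1 would be out of range. If the adapted script passes speaker token ids as token_type_ids (an assumption about the adapted code), they would need to be remapped, for example as in this sketch; to_segment_ids is a hypothetical helper.

```python
def to_segment_ids(token_type_ids, first_speaker_id):
    """Map arbitrary speaker ids onto the two segment ids 0 and 1."""
    return [0 if type_id == first_speaker_id else 1 for type_id in token_type_ids]


# Made-up speaker ids, both larger than the size of a two-row segment table.
segment_ids = to_segment_ids([30522, 30522, 30523, 30523, 30522], first_speaker_id=30522)
```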

Train without Persona

Hi,
Has anyone managed to train without a persona, taking only the context information into account?

Changing from gpt to gpt2

I get an error when I change from gpt to gpt2 in the interact.py file.
However, when I change from gpt to gpt2 in the train.py file, everything is fine.

The error I get:
File "/transfer-learning-conv-ai/interact.py", line 130, in run
logger.info("Selected personality: %s", tokenizer.decode(chain(*personality)))

File "/pytorch_transformers/tokenization_utils.py", line 767, in decode
sub_texts.append(self.convert_tokens_to_string(current_sub_text))

File "/pytorch_transformers/tokenization_gpt2.py", line 198, in convert_tokens_to_string
text = ''.join(tokens)

TypeError: sequence item 0: expected str instance, NoneType found

I also tried switching from pytorch-transformers to the new transformers library, to no avail.
