ADEM (noseworm/adem)

This repository has the code and parameters used for the ADEM model in:

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau

Due to the ethics policy for this project, we cannot release the collected human data at this time. However, we do provide the weights/parameters for a trained model and the code to train ADEM with new data.

ADEM uses the VHRED model. A modified version of the code is included in this repo. The original repo and paper can be found at:
https://github.com/julianser/hed-dlg-truncated
https://arxiv.org/abs/1605.06069

You will need to download the weights for the pretrained VHRED model before running the code. Once downloaded from the following link, place all the files in the ./vhred folder.
https://drive.google.com/file/d/0B-nb1w_dNuMLY0Fad3N1YU9ZOU0/view?usp=sharing
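Before launching, it can help to confirm that the download landed where the code expects it. This is a minimal sketch, assuming only that the files belong in the ./vhred folder as described above. The helper name `folder_has_files` is illustrative and not part of this repo, and the individual file names in the archive are not checked.

```python
import os


def folder_has_files(path):
    """Return True if `path` is an existing directory holding at least one regular file."""
    if not os.path.isdir(path):
        return False
    return any(os.path.isfile(os.path.join(path, name)) for name in os.listdir(path))


if __name__ == '__main__':
    # './vhred' is the folder named in the instructions above.
    if folder_has_files('./vhred'):
        print('Found files in ./vhred')
    else:
        print('./vhred is missing or empty; download the VHRED weights first')
```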

An example of running ADEM can be found in interactive.py:
THEANO_FLAGS='device=gpu0,floatX=float32' python interactive.py

Questions about input format

Hi, I have tried to use this code to evaluate dialogue models. However, when I apply ADEM to the examples from the paper "Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses", the scores I get differ from those reported in the paper. Here is an example from my experiments:
context: [ <first_speaker> photo to see my television debut go to - some. some on- hehe! <second_speaker> it really was you? i thought ppl were recognizing someone who looked like you! were the oysters worth the wait? ]
true: [ <first_speaker> yeah it was me . haha i’d kinda forgotten about it it was filmed a while ago ]
model: [ <first_speaker> i’m not sure. i just don’t know what to do with it. ]
score in my experiment: 3.26289914095
score in the paper: 1.602

The code I use is as follows:

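from models import ADEM
from preprocess import Preprocessor
import sys

saved_model = './weights/adem_model.pkl'

if __name__ == '__main__':
    pp = Preprocessor()
    adem = ADEM(pp, None, saved_model)
    print('Model Loaded!')

    context = []
    true = []
    model = []
    # Each input line holds four tab-separated fields; the first three are
    # the context, the reference response, and the model response.
    with open(sys.argv[1], 'r') as f:
        for line in f:
            fields = line.strip().split('\t')
            # Skip malformed lines and very short model responses.
            if len(fields) != 4 or len(fields[2]) <= 5:
                continue
            context.append(fields[0])
            true.append(fields[1])
            model.append(fields[2])

    final_score = adem.get_scores(context, true, model)

    # Write context, model response, and ADEM score, one example per line.
    # The with-block ensures the output file is flushed and closed.
    with open(sys.argv[1] + '.eval', 'w') as fw:
        for i in range(len(final_score)):
            fw.write(context[i] + '\t' + model[i] + '\t' + str(final_score[i]) + '\n')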

The input file is as follows:

Is there something wrong with my input format? Could you please help me figure out why there is such a big difference between scores in my experiment and scores in the paper? Thanks!
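
One way to rule out a formatting problem is to inspect the input file separately from ADEM. The following is a minimal sketch, assuming the layout implied by the script above (four tab-separated fields per line, of which the first three are the context, the reference response, and the model response) and the speaker tokens seen in the examples. The names `parse_line`, `check_tokens`, and `summarize` are illustrative helpers, not part of the ADEM code, and whether ADEM actually requires these tokens is an assumption.

```python
# Speaker tokens as they appear in the examples above.
SPEAKER_TOKENS = ('<first_speaker>', '<second_speaker>')


def parse_line(line):
    """Split one line into (context, true, model), or return None if it would be skipped."""
    fields = line.strip().split('\t')
    # Same filter as the script above: four fields, model response longer than 5 characters.
    if len(fields) != 4 or len(fields[2]) <= 5:
        return None
    return fields[0], fields[1], fields[2]


def check_tokens(text):
    """Return True if the text starts with one of the speaker tokens."""
    return text.lstrip().startswith(SPEAKER_TOKENS)


def summarize(lines):
    """Count kept lines, skipped lines, and kept lines with a field lacking a speaker token."""
    kept = skipped = untagged = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            skipped += 1
            continue
        kept += 1
        if not all(check_tokens(field) for field in parsed):
            untagged += 1
    return kept, skipped, untagged
```

Lines dropped by the filter never reach `get_scores`, so a nonzero skipped count is worth checking first. Calling `summarize(open(path))` on the input file returns the three counts.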
