convai's People

Contributors

koustuvsinha, noseworm

convai's Issues

[New Model] Question Generator

Let's take the SQuAD dataset and build the following:

Given an article, generate one or more questions.
This could be done as a one-time preprocessing step per conversation, provided we store all the generated questions for the length of the conversation.
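
The "generate once, store for the conversation" idea could look roughly like the sketch below. It only illustrates the caching; the `generate` callable stands in for whatever question generation model ends up being trained on SQuAD, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class QuestionCache:
    """Generate questions for an article once per conversation and reuse them."""

    def __init__(self, generate: Callable[[str], List[str]]) -> None:
        # `generate` is a placeholder for the (not yet written) question model.
        self._generate = generate
        self._store: Dict[str, List[str]] = {}

    def questions_for(self, conversation_id: str, article: str) -> List[str]:
        # Run the expensive generator only on the first request.
        if conversation_id not in self._store:
            self._store[conversation_id] = self._generate(article)
        return self._store[conversation_id]

    def end_conversation(self, conversation_id: str) -> None:
        # Drop the stored questions once the conversation is over.
        self._store.pop(conversation_id, None)
```

Keying the cache on a conversation id assumes one article per conversation, which matches the setup described here.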

[New Model] Wikipedia fact extractor

Extract facts from Wikipedia using the Elasticsearch API.

Implementation ideas:

  • Download the latest Wikipedia dump
  • Run Elasticsearch server
  • Query by entity
  • Return first line of the top article
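
The last two steps above can be sketched without a running server. This is a minimal sketch, not a tested integration: the field names `title` and `text` are assumptions about how the dump would be indexed, and the query body would still have to be sent through the Elasticsearch client's search call against whichever index holds the dump.

```python
from typing import Any, Dict, Optional


def build_entity_query(entity: str) -> Dict[str, Any]:
    # Match the entity against the article title and keep only the top hit.
    # "title" is an assumed field name, not something fixed by Elasticsearch.
    return {"size": 1, "query": {"match": {"title": entity}}}


def first_line_of_top_hit(response: Dict[str, Any]) -> Optional[str]:
    # Elasticsearch search responses nest the documents under hits.hits.
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    # "text" is the assumed field holding the article body.
    text = hits[0].get("_source", {}).get("text", "")
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None
```

Returning `None` when nothing matches lets the caller fall back to another model instead of replying with an empty fact.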

The Wikipedia dump can be found here. However, given the huge size of this dump, I don't know whether it would be feasible for us to include it in our Docker container. An easier way to tackle this problem would be to use the Python wikipedia API, but as per the ConvAI rules we are not allowed to make external API calls.

  • Verify with the organizers.

Check Docker push

The Docker image is now at 51 GB; check whether the correct version is getting pushed.

Resolving anaphora

I think we should probably have anaphora resolution as a preprocessing step. I don't know how hard this would be, but I wanted to see what people think.

[New Model] VHRED Retrieval model

We already have a Dual Encoder retrieval model, but the responses given by Rosemary in the Ethics paper seem pretty good. Can we quickly add the existing VHRED model? (It should be straightforward.)

[New Feature] Topic model to extract latent topics for each article

It could be useful to implement a topic classification model using fastText to extract the topics being talked about in the article.

Implementation plan:

  • Get a set of general topics from the WiBi taxonomy
  • Get Wikipedia articles for each of the above topics, including their children (prune to children having at least 10 child nodes)
  • Train fastText
  • Evaluate
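
For the training step, fastText's supervised mode expects one example per line, with the label prefixed by `__label__`. The sketch below only prepares such a file from (topic, article text) pairs; the helper names are hypothetical, and how the pairs are collected from the taxonomy is left open.

```python
import re
from typing import Iterable, Tuple


def to_fasttext_line(topic: str, text: str) -> str:
    # Labels must be a single token, so spaces in the topic become underscores.
    label = "__label__" + re.sub(r"\s+", "_", topic.strip().lower())
    # Collapse newlines and repeated spaces: one example per line.
    body = re.sub(r"\s+", " ", text).strip()
    return f"{label} {body}"


def write_training_file(examples: Iterable[Tuple[str, str]], path: str) -> int:
    # Write all (topic, text) pairs and return how many lines were written.
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for topic, text in examples:
            handle.write(to_fasttext_line(topic, text) + "\n")
            count += 1
    return count
```

The resulting file could then be passed to `fasttext.train_supervised(input=path)` from the official Python bindings, assuming that package is available inside the container.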

[Epic] Features List to implement for the RankerNN

List of features implemented or to be implemented for the RankerNN. If you have worked on, or want to work on, a particular feature, claim it by adding your name beside the bullet point. When a bullet point is done, mark it with an x between the brackets, like this: [x]. The list is extracted from this Google Doc.

  • Averaged word embeddings (dimension : 300)
    • candidate response (Nicolas)
    • last user turn (Nicolas)
    • last k turns (Nicolas)
    • last q user turns (Nicolas)
    • article (Nicolas)
  • Embedding similarity metrics w.r.t the candidate response
    • last user turn (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • last k turns (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • last k turns without stop words (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • last q user turns (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • last q user turns without stop words (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • article (Nicolas)
      • greedy score
      • average score
      • Extrema score
    • article without stop words (Nicolas)
      • greedy score
      • average score
      • Extrema score
  • word/entity overlap:
    • non-stop word overlap (Nicolas)
    • bigram overlap (Koustuv)
    • trigram overlap (Koustuv)
    • entity overlap (Koustuv)
  • generic turn (Nicolas)
  • word presence:
    • โ€œWhโ€-word (Koustuv)
    • intensifier word (Koustuv)
    • confusion word (Koustuv)
    • Profanity list (Koustuv)
    • negation word (Nicolas)
  • lengths:
    • dialogue length (Koustuv)
    • last user turn length
    • response length
    • article length (Koustuv)
  • dialogue act
  • Sentiment score (https://github.com/cjhutto/vaderSentiment - https://pypi.python.org/pypi/vaderSentiment) (Koustuv)
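
The three embedding similarity scores in the list above can be sketched in plain Python as follows. This assumes the usual definitions (cosine between mean vectors, cosine between per-dimension extrema vectors, and symmetric greedy word matching), which may differ in detail from what the RankerNN code ends up using. The function names are hypothetical, and `embeddings` is assumed to be a token-to-vector dictionary loaded elsewhere.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> List[Vector]:
    # Out-of-vocabulary tokens are simply skipped.
    return [embeddings[t] for t in tokens if t in embeddings]


def average_vector(vectors: List[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def extrema_vector(vectors: List[Vector]) -> Vector:
    # Per dimension, keep the value with the largest absolute magnitude.
    return [max(column, key=abs) for column in zip(*vectors)]


def average_score(a: List[Vector], b: List[Vector]) -> float:
    if not a or not b:
        return 0.0
    return cosine(average_vector(a), average_vector(b))


def extrema_score(a: List[Vector], b: List[Vector]) -> float:
    if not a or not b:
        return 0.0
    return cosine(extrema_vector(a), extrema_vector(b))


def greedy_score(a: List[Vector], b: List[Vector]) -> float:
    if not a or not b:
        return 0.0

    def one_way(xs: List[Vector], ys: List[Vector]) -> float:
        # Match each word in xs to its most similar word in ys.
        return sum(max(cosine(x, y) for y in ys) for x in xs) / len(xs)

    # Average both directions so the score is symmetric.
    return 0.5 * (one_way(a, b) + one_way(b, a))
```

Each score takes the embedded candidate response as one argument and the embedded context (last user turn, last k turns, or article, with or without stop words) as the other, so the same three functions cover every sub-bullet.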

[New Model] ALICE Bot

Incorporate the ALICE bot.

Implementation plan:

  • Get the Alicebot AIML files. Consult Julian about the files they used
  • Load Alicebot as a new model (a simple Python import should work)

The list of AIML files we can use can be found in the MILABOT repo, although it would be great to find an open resource for this.

To implement this, first install the Python package aiml, then, following this tutorial, load the AIML files. Save the loaded state in a "brain" file for faster processing.
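
A minimal sketch of the load-or-rebuild logic described above. The helper takes the kernel as an argument and relies on the `learn`, `saveBrain` and `loadBrain` methods of the `aiml` package's `Kernel`. The function name and the file names in the usage comment are placeholders.

```python
import os
from typing import Any, Iterable


def load_or_build_brain(kernel: Any, aiml_files: Iterable[str], brain_path: str) -> bool:
    """Return True if an existing brain file was loaded, False if it was rebuilt."""
    if os.path.isfile(brain_path):
        # Fast path: restore the previously saved state.
        kernel.loadBrain(brain_path)
        return True
    # Slow path: parse every AIML file, then save the state for next time.
    for path in aiml_files:
        kernel.learn(path)
    kernel.saveBrain(brain_path)
    return False


# Typical use, assuming the `aiml` package is installed (file names are placeholders):
#   import aiml
#   kernel = aiml.Kernel()
#   load_or_build_brain(kernel, ["alice/ai.aiml"], "alice.brn")
#   reply = kernel.respond("hello")
```

Passing the kernel in keeps the helper importable even where `aiml` is not installed.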

While this is a good addition, I have some apprehensions about using it, as it is mostly rule-based, which means less innovation.

[New Model] Better QA - Neural QA

We have a follow-up questions model, but it only generates very basic one-line questions ("what", "why", "huh"). Since we have documents and articles, I propose we use something like this: Neural QA (accompanying paper). The original code, again, is in Torch7. Would it be better to use it as is or to port it quickly to PyTorch?

[AMT] Create evaluation set for AMT

To get feedback as quickly as possible, we could create an evaluation dataset: given a dialogue context and an article, we show all the candidate responses and ask the worker to rate or rank them.

[New Model] Question topic extraction

Detect when the user asks something related to the article.
Example of a training set: SQuAD with (article, question, flag) triples, where the question is either related (flag=1) or not related (flag=0) to the article.
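
One way such triples could be built from a SQuAD-format file is sketched below. It treats each SQuAD paragraph (`context`) as the "article" and creates one negative per question by pairing it with a randomly chosen different paragraph. Both choices are assumptions, and the function name is hypothetical.

```python
import random
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, int]


def build_triples(squad: Dict[str, Any], seed: int = 0) -> List[Triple]:
    rng = random.Random(seed)
    # Collect every (context, question) pair from the SQuAD JSON structure.
    pairs = [
        (paragraph["context"], qa["question"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
        for qa in paragraph["qas"]
    ]
    contexts = sorted({context for context, _ in pairs})
    triples: List[Triple] = []
    for context, question in pairs:
        # Positive example: the question was written for this context.
        triples.append((context, question, 1))
        # Negative example: the same question with a different context.
        others = [c for c in contexts if c != context]
        if others:
            triples.append((rng.choice(others), question, 0))
    return triples
```

A randomly drawn paragraph can still come from the same Wikipedia page and be loosely related, so sampling negatives only from other pages may give cleaner flag=0 examples.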
