
edm2016's Introduction

IRT and DKT implementation

This library contains implementations of IRT models and Deep Knowledge Tracing (DKT) that reproduce the results reported in "Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation" (Wilson, Karklin, Han, Ekanadham, EDM2016).

Implemented models

IRT

Bayesian versions of one and two parameter Item Response Theory models. The likelihood is given by the ogive item response function, and priors on student and item parameters are standard normal distributions.
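
For reference, a minimal sketch of the ogive (probit) link; the names here are illustrative, not the library's API:

    # Ogive (probit) item response function: P(correct) = Phi(a * (theta - b)).
    # Set discrimination a = 1.0 for the one-parameter (1PO) model.
    from scipy.stats import norm

    def prob_correct(theta, difficulty, discrimination=1.0):
        return norm.cdf(discrimination * (theta - difficulty))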

Hierarchical IRT

Implementation of an IRT model that extends the model above with a Gaussian hyper-prior on item difficulties.
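
Schematically, a sketch under an assumed parameterization (mirroring the --item-precision and --template-precision options in the commands below), not the library's code:

    # Hierarchical prior sketch: template means get a Gaussian hyper-prior,
    # and item difficulties are drawn around their template's mean.
    import numpy as np

    template_precision = 2.0   # cf. --template-precision
    item_precision = 4.0       # cf. --item-precision
    mu_template = np.random.normal(0.0, 1.0 / np.sqrt(template_precision))
    difficulty = np.random.normal(mu_template, 1.0 / np.sqrt(item_precision))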

DKT

Recurrent neural network implemented using Theano.
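
As in the standard DKT formulation (Piech et al.), each observed interaction is encoded as a one-hot vector over (item, correctness) pairs; a minimal sketch of that encoding, with illustrative names rather than the library's API:

    # Standard DKT input encoding: one-hot over (item, correctness) pairs,
    # giving an input dimension of 2 * num_items before any compression
    # (cf. --compress-dim in the commands below).
    import numpy as np

    def encode_interaction(item_idx, correct, num_items):
        x = np.zeros(2 * num_items)
        x[item_idx + (num_items if correct else 0)] = 1.0
        return x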

Requirements (see requirements.in)

  • python
  • theano
  • numpy
  • scipy
  • ipython
  • pandas
  • igraph

Data

The ASSISTments data set may be found here. Note that the authors of the data set have since removed several duplicates from the original data set, which is the version we used. However, as we explain in the paper, our preprocessing steps also removed these duplicates, so both the original and the corrected versions should reproduce our results.

The KDD Cup data set may be found here. We used the Bridge to Algebra 2006-2007 data set, and specifically the training data set.

Usage

    Usage: rnn_prof [OPTIONS] COMMAND [ARGS]...

      Collection of scripts for evaluating RNN proficiency models

    Options:
      -h, --help  Show this message and exit.

    Commands:
      irt    Run IRT to get item parameters and compute...
      naive  Just report the percent correct across all...
      rnn    RNN based proficiency estimation :param str...

To reproduce results in the EDM2016 paper:

  1. Construct the 20/80 split data sets (20% for model parameter selection, e.g., prior parameters and RNN layer sizes; 80% for train/test) using data/split_data.py:

         python split_data.py bridge_to_algebra_2006_2007_train.txt "Anon Student Id" "\t"
         python split_data.py skill_builder_data.csv user_id ","

  2. Execute the following commands:

IRT

rnn_prof irt assistments skill_builder_data_big.txt --onepo \
--drop-duplicates --no-remove-skill-nans --num-folds 5 \
--item-id-col problem_id --concept-id-col single 

rnn_prof irt kddcup bridge_to_algebra_2006_2007_train_big.txt \
--onepo --drop-duplicates --no-remove-skill-nans --num-folds 5 \
--item-id-col 'Step Name' --concept-id-col single

HIRT

rnn_prof irt assistments skill_builder_data_big.txt --onepo \
--drop-duplicates --no-remove-skill-nans --num-folds 5 \
--item-precision 4.0 --template-precision 2.0 \
--template-id-col template_id --item-id-col problem_id \
--concept-id-col single

rnn_prof irt kddcup bridge_to_algebra_2006_2007_train_big.txt  --onepo \
--drop-duplicates --no-remove-skill-nans --num-folds 5 \
--item-precision 2.0 --template-precision 4.0 -m 5000 \
--template-id-col template_id --item-id-col problem_id \
--concept-id-col single

DKT

rnn_prof rnn assistments skill_builder_data_big.txt  \
--no-remove-skill-nans --drop-duplicates --num-folds 5 \
--item-id-col problem_id --num-iters 50 --dropout-prob 0.25 \
--first-learning-rate 5.0  --compress-dim 50 --hidden-dim 100 

rnn_prof rnn kddcup bridge_to_algebra_2006_2007_train_big.txt  \
--no-remove-skill-nans --drop-duplicates --num-folds 5 --item-id-col KC \
--num-iters 50 --dropout-prob 0.25 --first-learning-rate 5.0 \
--compress-dim 50 --hidden-dim 100 

edm2016's People

Contributors

chaitue, karklin, khwilson


edm2016's Issues

Running without Tox

Hi there,
I'm very interested in implementing IRT; this repo seems to be well suited to what I aim to achieve. However, I find it very difficult to play around with the code when I have to continuously run the command tox after any change. What would be the best approach to achieve the same results WITHOUT using tox? Which files should I run individually?

I also hope to implement this in a web application in the future, so knowing which files to run (without tox) would help as well.

Thanks.

Questions on training the model

  1. What are your thoughts on training a model with pre-existing data? For instance, DKT/logistic models do not require any student- or item-specific parameters; hence, the model could be trained with data collected elsewhere, and then implemented in an application to make immediate, accurate predictions.
    For a model that does not require item or student parameters, would this be appropriate? What are the benefits of using data from the same students/items to train general weights? A model trained on dataset A could then be tested on datasets B and C, just like a real flashcard scenario; what are your thoughts on this?

  2. When trying to evaluate an IRT model through online prediction accuracy, after determining all item parameters, is the ability parameter updated through retraining the model with ALL the data collected thus far (all the students + training data), or just the students’ INDIVIDUAL data? In other words, what data is used to train the student-level parameters?
    Thanks again, looking forward to your response.

Maximum interactions?

I must be reading the code wrong in some way, but I'm running into an issue where the max_inter variable drops all values in the data frame. Looking at the common options, I see the default is set to 0, which seems to be the origin of the issue.

    common_options.add('--max-inter', '-m', type=int, default=0,
                       help="Maximum interactions per user",
                       extra_callback=lambda ctx, param, value: value or None)
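
For context, the extra_callback uses Python's `value or None` idiom, which converts the falsy default of 0 to None:

    # Illustration of the `value or None` idiom in the callback above:
    # falsy values such as 0 become None; truthy values pass through.
    assert (0 or None) is None
    assert (5000 or None) == 5000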

Online Prediction Accuracy

Regarding the paper titled “Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation”, I am interested in the online prediction accuracy metric of evaluation.

A couple of questions (in relation to the 1PL IRT model):

  1. In this metric, students are split into training and testing populations. In a real life scenario, the initial training population used to determine item-level parameters would not always be available, especially in a flashcard application, where predictions are required immediately without any prior item-level parameter estimation.

In such a situation, is an IRT model unsuitable? Must the IRT model have initial data to work with before making predictions, or can the model be trained continuously from the start? If so, what would be the default parameters to start with?

  2. When you say the students are split into training and testing populations, what is the ratio between the populations? 70/30? 60/40?

Thanks so much for your time, looking forward to your response.

problem_id key error

Hi!

I'm trying to reproduce the results of the EDM2016 paper to see whether HIRT is fast enough for real-time computation, and it seems I have some issues with the code (I didn't modify it). When I try to run HIRT on the Bridge to Algebra dataset, I get the following error:

    ...
      File "/path/edm2016/.tox/py27/lib/python2.7/site-packages/rnn_prof/cli.py", line 163, in irt
        data, _, _, _, _ = load_data(data_file, source, data_opts)
      File "/path/edm2016/.tox/py27/lib/python2.7/site-packages/rnn_prof/data/wrapper.py", line 77, in load_data
        min_interactions_per_user=data_opts.min_interactions_per_user)
      File "/path/edm2016/.tox/py27/lib/python2.7/site-packages/rnn_prof/data/kddcup.py", line 99, in load_data
        data = data.sort(sort_keys)
    ...
    KeyError: u'problem_id'

I don't know why this happens, any help would be appreciated!
Thanks

About reproducing the results in this paper

Hi, I read this paper a few weeks ago and want to implement IRT. I read several papers but still didn't know how to implement it, so I downloaded this project to see how the code is organized. First I wanted to reproduce the results, but when I ran

    rnn_prof irt assistments skill_builder_data_big.txt --onepo \
    --drop-duplicates --no-remove-skill-nans --num-folds 5 \
    --item-id-col problem_id --concept-id-col single

the system warned that "rnn_prof is not an internal or external command". I would like to know what mistake I made and how I can reproduce the results.

Code for TIRT?

Hi, I read your paper "Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation" and really want to try TIRT on my dataset, but I can't find the code in this repo. Is it possible to add it to this repo? Or is it confidential? Thanks!

Obtaining Prediction Result

We're working to build a learner model in our research project, and we'd love to be able to use your code base as a starting point. I am looking to obtain prediction results for each individual instead of an aggregate AUC. Can you point me to where to look in the code?

Thanks in advance!

How to get the parameters of each item?

Hi, I'm trying to get the parameters of each item in the HIRT model. I found these two functions in OnePOLearner, but they differ only by a sign, and I'm wondering what the offset coefficient means.

    def get_difficulty(self, item_ids):
        """ Get the difficulties (in standard 1PO units) of an item or a set of items.
        :param item_ids: ids of the requested items
        :return: the difficulties (-offset_coeff)
        :rtype: np.ndarray
        """
        return -self.nodes[OFFSET_COEFFS_KEY].get_data_by_id(item_ids)

    def get_offset_coeff(self, item_ids):
        """ Get the offset coefficient of an item or a set of items.
        :param item_ids: ids of the requested items
        :return: the offset coefficient(s)
        :rtype: np.ndarray
        """
        return self.nodes[OFFSET_COEFFS_KEY].get_data_by_id(item_ids)
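
For context, a plausible reading of the two accessors, assuming the ogive IRF described above (illustrative, not library code):

    # Assumed relationship implied by the docstrings above:
    # P(correct) = Phi(theta + offset_coeff), so difficulty b = -offset_coeff.
    from scipy.stats import norm

    def prob_correct(theta, offset_coeff):
        return norm.cdf(theta + offset_coeff)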

Key error: Hashing function?

I am just wondering what the format requirements are for the Problem Id and Step Name columns. I have tried to reuse the code for research I am doing, but there is always a key error when I run the code on my data, unless I use the exact same problem ids as in the KDD Cup dataset.

Any help will be appreciated.
