akt's Issues

"key_padding_mask" in attention mechanism not be implemented?

Hi, arghosh! The idea of the paper is amazing, and the code is well written. I want to confirm some details about your code. I found that the sequence length per student is 200 in your setting, with 0 used as the padding value. In your implementation of the attention mechanism, I only found an upper-triangular matrix used as a mask to ignore the influence of time steps after the current one, but the padding positions of the sequence should arguably also be ignored, meaning those positions should not be used to compute attention scores. Thanks in advance if you could clarify this! :)
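
For reference, here is a minimal sketch of how a key padding mask could be combined with the causal (upper-triangular) mask before the softmax. The names masked_attention, q_data, and pad_token are placeholders for illustration, not identifiers from the repository.

    import torch
    import torch.nn.functional as F

    def masked_attention(q, k, v, q_data, pad_token=0):
        # q, k, v: [batch, heads, seq_len, d_k]; q_data: [batch, seq_len] item ids, pad_token marks padding
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)

        seq_len = q_data.size(1)
        # causal mask: position t may only attend to positions <= t
        causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
        # key padding mask: padded key positions must not be attended to
        key_valid = (q_data != pad_token)[:, None, None, :]  # [batch, 1, 1, seq_len]

        scores = scores.masked_fill(~causal, float('-inf'))
        scores = scores.masked_fill(~key_valid, float('-inf'))
        attn = F.softmax(scores, dim=-1)
        # note: outputs at padded query positions are meaningless and should be ignored downstream
        return torch.matmul(attn, v)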

Padding Problem

(screenshot of the code in question omitted)

Why scores[:, :, 1:, :]? Should it be scores[:, :, 0:, :] here?
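
Just to illustrate the slicing semantics being asked about (generic PyTorch, not the repository's code): scores[:, :, 1:, :] drops the attention rows for the first query position, whereas scores[:, :, 0:, :] keeps every row and is identical to scores.

    import torch

    scores = torch.randn(2, 4, 5, 5)   # [batch, heads, query_len, key_len]
    print(scores[:, :, 1:, :].shape)   # torch.Size([2, 4, 4, 5]) -- first query row dropped
    print(scores[:, :, 0:, :].shape)   # torch.Size([2, 4, 5, 5]) -- same as scores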

Why don't the models handle repeated response sequences with different skill tags?

In both assistment2009 and assistment2017, some problems contain more than one skill.
For example, there is a sequence in lines 854-856 of assist2009_pid_test1.csv:

problemId 7374 7374 7362 7362 7421 7421 8287 8287 7372 7372 7425 7425
skillId 37 54 37 54 37 54 45 54 37 54 37 54
correct 0 0 1 1 0 0 1 1 1 1 1 1

Each of problems 7374, 7362, 7421, 8287, 7372, and 7425 is tagged with 2 skills. The student acts only 6 times, but 12 actions are recorded. We should not predict the performance at the 2nd, 4th, 6th, 8th, 10th, and 12th steps on the basis of information from the 1st, 3rd, 5th, 7th, 9th, and 11th steps, respectively, because that information is unavailable in reality. In addition, the performance at the 2nd, 4th, 6th, 8th, 10th, and 12th steps is identical to that at the 1st, 3rd, 5th, 7th, 9th, and 11th steps, respectively, because each pair comes from the same action.

In fact, this problem was pointed out by Xiong et al. (Going Deeper with Deep Knowledge Tracing, 2016). Why don't the models handle repeated response sequences with different skill tags?

In assistment2017, some problems contain more than one skill as well, but in your processed data each action carries only one skill. For example, in lines 14-16 of assist2017_pid_test1.csv, problem 877 is tagged with skill 6, while in lines 10-12 it is tagged with skill 65. I am not sure whether this disturbs the Rasch model-based embeddings in AKT.
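
One possible preprocessing fix along the lines of Xiong et al. is to collapse consecutive records of the same problem (one per skill tag) into a single multi-skill interaction. The sketch below only illustrates that idea using the example sequence above; collapse_repeated is a hypothetical helper, not part of the repository's preprocessing code.

    from itertools import groupby

    def collapse_repeated(problem_ids, skill_ids, corrects):
        # merge consecutive rows sharing the same problem id into one interaction
        # that carries all of the problem's skill tags
        merged = []
        rows = list(zip(problem_ids, skill_ids, corrects))
        for pid, group in groupby(rows, key=lambda r: r[0]):
            group = list(group)
            skills = [s for _, s, _ in group]
            correct = group[0][2]  # duplicated rows share the same response
            merged.append((pid, skills, correct))
        return merged

    # the sequence from lines 854-856 of assist2009_pid_test1.csv
    problems = [7374, 7374, 7362, 7362, 7421, 7421, 8287, 8287, 7372, 7372, 7425, 7425]
    skills = [37, 54, 37, 54, 37, 54, 45, 54, 37, 54, 37, 54]
    correct = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
    print(collapse_repeated(problems, skills, correct))
    # [(7374, [37, 54], 0), (7362, [37, 54], 1), (7421, [37, 54], 0),
    #  (8287, [45, 54], 1), (7372, [37, 54], 1), (7425, [37, 54], 1)]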

Comparison to other models (SAKT, DKT, DKVMN)

Hi,

Do you have the implementations of the other models listed in your paper (SAKT, DKT, DKVMN)?

I noticed there are lines of code meant for them, but it appears the implementations were left out.

If you have the code, it would be greatly appreciated. I am working on a school project that implements a different dataset using your paper and code.

Thank you,
Marshall

A bug in the Rasch model-based embeddings, please check it!

In line 83 of akt.py, the shape of qa_data is [BS, seqlen, 2], but in line 42,
self.qa_embed_diff = nn.Embedding(2 * self.n_question + 1, embed_l), so the embedding's input dimension seems inconsistent with that shape. Is this right?

Please check it! Thanks!
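
For context, an embedding table of size 2 * n_question + 1 usually implies a combined question-response index of the form qa = q + r * n_question, with index 0 reserved for padding. The sketch below shows that encoding; whether akt.py builds qa_data exactly this way is an assumption, not something confirmed from the code.

    import torch
    import torch.nn as nn

    n_question = 110   # placeholder skill count
    embed_l = 256

    # index 0 = padding; 1..n_question = (question, wrong); n_question+1..2*n_question = (question, correct)
    qa_embed = nn.Embedding(2 * n_question + 1, embed_l, padding_idx=0)

    q = torch.tensor([[3, 17, 42, 0]])  # question ids, 0 = padding
    r = torch.tensor([[1, 0, 1, 0]])    # responses
    qa = q + r * n_question             # combined index in [0, 2 * n_question]
    qa = qa * (q != 0).long()           # keep padded positions at index 0
    emb = qa_embed(qa)                  # shape [1, 4, embed_l]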

kq_same?

Hi @arghosh,

I am curious about the meaning of kq_same in your model. Could you explain it to me? Thanks.
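
Not an authoritative answer, but one common reading of such a flag is that keys and queries share a single linear projection when kq_same is true, instead of using two separate projections. The sketch below only illustrates that interpretation; it is an assumption, not confirmed from the repository.

    import torch.nn as nn

    class AttentionProjections(nn.Module):
        def __init__(self, d_model, kq_same=True):
            super().__init__()
            self.q_linear = nn.Linear(d_model, d_model)
            # reuse the query projection for keys when kq_same is set
            self.k_linear = self.q_linear if kq_same else nn.Linear(d_model, d_model)
            self.v_linear = nn.Linear(d_model, d_model)

        def forward(self, query, key, value):
            return self.q_linear(query), self.k_linear(key), self.v_linear(value)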

dropout in attention function

I saw scores = dropout(scores) in line 331 of akt.py.
This is the first time I have seen dropout applied to attention weights.
Is there any reference or reason for this?
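
For reference, applying dropout to the attention weights (after the softmax) is common in Transformer implementations; for example, the dropout argument of torch.nn.MultiheadAttention does exactly this. A minimal sketch of that pattern (not the repository's exact function):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, dropout_p=0.1, training=True):
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
        attn = F.softmax(scores, dim=-1)
        # dropout on the attention probabilities, as in nn.MultiheadAttention
        attn = F.dropout(attn, p=dropout_p, training=training)
        return torch.matmul(attn, v)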

hyper-parameters of AKT for achieving the best mean test AUC

Thanks for sharing the code! It is a great help to people working on this topic.
I am testing AKT on the provided data. On ASSISTment2015, I tried many combinations of hyper-parameters as suggested in the paper, but the best mean test AUC I can achieve is around 0.731. I am not sure which hyper-parameter I configured wrongly. Would it be possible to share the hyper-parameters used for AKT to achieve a mean test AUC of 0.7828 on ASSISTment2015?

Issue while running AKT with individual SNP data

Hi,
I converted a vcf file containing the information for the individual sample HG00096, taken from the 1000 Genomes Project, to bcf format using bcftools. I used data/wgs.hg38.vcf.gz but have been receiving the following error:

akt: Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<float, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = float; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.

Can you suggest any solution for this?
Thank you.

target response issue in AKT model

Hello, I want to ask your opinion on the AKT model architecture.

(figure of the AKT model architecture from the paper)

The image above is the figure of the AKT model presented in your paper,

    if self.n_pid > 0:
        q_embed_diff_data = self.q_embed_diff(q_data)  # d_ct
        pid_embed_data = self.difficult_param(pid_data)  # uq
        q_embed_data = q_embed_data + pid_embed_data * \
            q_embed_diff_data  # uq *d_ct + c_ct
        qa_embed_diff_data = self.qa_embed_diff(
            qa_data)  # f_(ct,rt) or #h_rt
        if self.separate_qa:
            qa_embed_data = qa_embed_data + pid_embed_data * \
                qa_embed_diff_data  # uq* f_(ct,rt) + e_(ct,rt)
        else:
            qa_embed_data = qa_embed_data + pid_embed_data * \
                (qa_embed_diff_data+q_embed_diff_data)  # + uq *(h_rt+d_ct)
        c_reg_loss = (pid_embed_data ** 2.).sum() * self.l2

and the code above is what you implemented in akt.py.

The point is that I think the AKT model has a chance to see the target answers through the "f(c_t, r_t) variation vector" (in the paper), which is "qa_embed_diff_data" (in your code). In my opinion, this is related to the already-known-target (label leakage) issue.

To resolve the issue, I would carefully suggest modifying the Architecture forward function as in the following code:

        else:  # don't peek at the current response
            # shift keys and values right by one step so that position t only
            # attends to interactions up to t-1 (a zero vector fills step 0)
            pad_zero = torch.zeros(batch_size, 1, x.size(-1)).to(self.device)
            q = x
            k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)
            v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)
            x = block(mask=0, query=q, key=k, values=v, apply_pos=True)
            flag_first = True

thank you for your attention :)
