akt's Issues

"key_padding_mask" in attention mechanism not be implemented?

Hi, arghosh! The idea of the paper is amazing, and the code is well written. I want to confirm some details about your code. I found that the sequence length per student is 200 in your setting, with 0 used as the padding value. In your implementation of the attention mechanism, I only found an upper-triangular matrix used as a mask to ignore the influence of time steps after the current one, but the padding positions of the sequence should arguably also be ignored, meaning those positions should not be used to compute attention scores. Thanks in advance if you could clarify this! :)
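
For reference, here is a minimal sketch of how a key padding mask could be combined with the causal (upper-triangular) mask before the softmax. The names masked_attention, q_data, and pad_token are placeholders for illustration, not identifiers from the repository.

    import torch
    import torch.nn.functional as F

    def masked_attention(q, k, v, q_data, pad_token=0):
        # q, k, v: [batch, heads, seq_len, d_k]; q_data: [batch, seq_len] item ids, pad_token marks padding
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)

        seq_len = q_data.size(1)
        # causal mask: position t may only attend to positions <= t
        causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
        # key padding mask: padded key positions must not be attended to
        key_valid = (q_data != pad_token)[:, None, None, :]  # [batch, 1, 1, seq_len]

        scores = scores.masked_fill(~causal, float('-inf'))
        scores = scores.masked_fill(~key_valid, float('-inf'))
        attn = F.softmax(scores, dim=-1)
        # note: outputs at padded query positions are meaningless and should be ignored downstream
        return torch.matmul(attn, v)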

Padding Problem

(screenshot of the code in question omitted)

Why scores[:, :, 1:, :]? Should it be scores[:, :, 0:, :] here?
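
Just to illustrate the slicing semantics being asked about (generic PyTorch, not the repository's code): scores[:, :, 1:, :] drops the attention rows for the first query position, whereas scores[:, :, 0:, :] keeps every row and is identical to scores.

    import torch

    scores = torch.randn(2, 4, 5, 5)   # [batch, heads, query_len, key_len]
    print(scores[:, :, 1:, :].shape)   # torch.Size([2, 4, 4, 5]) -- first query row dropped
    print(scores[:, :, 0:, :].shape)   # torch.Size([2, 4, 5, 5]) -- same as scores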

Why don't the models handle repeated response sequences with different skill tags?

In both assistment2009 and assistment2017, some problems contain more than one skill.
For example, there is a sequence in lines 854-856 of assist2009_pid_test1.csv:

problemId 7374 7374 7362 7362 7421 7421 8287 8287 7372 7372 7425 7425
skillId 37 54 37 54 37 54 45 54 37 54 37 54
correct 0 0 1 1 0 0 1 1 1 1 1 1

Each of problems 7374, 7362, 7421, 8287, 7372, and 7425 is tagged with 2 skills. The student acts only 6 times, but 12 actions are recorded. We should not predict the performance at the 2nd, 4th, 6th, 8th, 10th, and 12th steps on the basis of information from the 1st, 3rd, 5th, 7th, 9th, and 11th steps, respectively, because that information is unavailable in reality. In addition, the performance at the 2nd, 4th, 6th, 8th, 10th, and 12th steps is identical to that at the 1st, 3rd, 5th, 7th, 9th, and 11th steps, respectively, because each pair comes from the same action.

In fact, this problem was pointed out by Xiong et al. (Going Deeper with Deep Knowledge Tracing, 2016). Why don't the models handle repeated response sequences with different skill tags?

In assistment2017, some problems contain more than one skill as well, but in your processed data each action carries only one skill. For example, in lines 14-16 of assist2017_pid_test1.csv, problem 877 is tagged with skill 6, while in lines 10-12 it is tagged with skill 65. I am not sure whether this disturbs the Rasch model-based embeddings in AKT.
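
One possible preprocessing fix along the lines of Xiong et al. is to collapse consecutive records of the same problem (one per skill tag) into a single multi-skill interaction. The sketch below only illustrates that idea using the example sequence above; collapse_repeated is a hypothetical helper, not part of the repository's preprocessing code.

    from itertools import groupby

    def collapse_repeated(problem_ids, skill_ids, corrects):
        # merge consecutive rows sharing the same problem id into one interaction
        # that carries all of the problem's skill tags
        merged = []
        rows = list(zip(problem_ids, skill_ids, corrects))
        for pid, group in groupby(rows, key=lambda r: r[0]):
            group = list(group)
            skills = [s for _, s, _ in group]
            correct = group[0][2]  # duplicated rows share the same response
            merged.append((pid, skills, correct))
        return merged

    # the sequence from lines 854-856 of assist2009_pid_test1.csv
    problems = [7374, 7374, 7362, 7362, 7421, 7421, 8287, 8287, 7372, 7372, 7425, 7425]
    skills = [37, 54, 37, 54, 37, 54, 45, 54, 37, 54, 37, 54]
    correct = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
    print(collapse_repeated(problems, skills, correct))
    # [(7374, [37, 54], 0), (7362, [37, 54], 1), (7421, [37, 54], 0),
    #  (8287, [45, 54], 1), (7372, [37, 54], 1), (7425, [37, 54], 1)]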

Comparison to other models (SAKT, DKT, DKVMN)

Hi,

Do you have the implementations of the other models listed in your paper (SAKT, DKT, DKVMN)?

I noticed there are lines of code meant for them, but it appears the implementations were left out.

If you have the code, it would be greatly appreciated. I am working on a school project that implements a different dataset using your paper and code.

Thank you,
Marshall

A bug in the Rasch model-based embeddings, please check it!

In line 83 of akt.py, the shape of qa_data is [BS, seqlen, 2], but in line 42,
self.qa_embed_diff = nn.Embedding(2 * self.n_question + 1, embed_l), so the embedding's input dimension seems inconsistent with that shape. Is this right?

Please check it! Thanks!
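
For context, an embedding table of size 2 * n_question + 1 usually implies a combined question-response index of the form qa = q + r * n_question, with index 0 reserved for padding. The sketch below shows that encoding; whether akt.py builds qa_data exactly this way is an assumption, not something confirmed from the code.

    import torch
    import torch.nn as nn

    n_question = 110   # placeholder skill count
    embed_l = 256

    # index 0 = padding; 1..n_question = (question, wrong); n_question+1..2*n_question = (question, correct)
    qa_embed = nn.Embedding(2 * n_question + 1, embed_l, padding_idx=0)

    q = torch.tensor([[3, 17, 42, 0]])  # question ids, 0 = padding
    r = torch.tensor([[1, 0, 1, 0]])    # responses
    qa = q + r * n_question             # combined index in [0, 2 * n_question]
    qa = qa * (q != 0).long()           # keep padded positions at index 0
    emb = qa_embed(qa)                  # shape [1, 4, embed_l]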

kq_same?

Hi @arghosh,

I am curious about the meaning of kq_same in your model. Could you explain it to me? Thanks.
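
Not an authoritative answer, but one common reading of such a flag is that keys and queries share a single linear projection when kq_same is true, instead of using two separate projections. The sketch below only illustrates that interpretation; it is an assumption, not confirmed from the repository.

    import torch.nn as nn

    class AttentionProjections(nn.Module):
        def __init__(self, d_model, kq_same=True):
            super().__init__()
            self.q_linear = nn.Linear(d_model, d_model)
            # reuse the query projection for keys when kq_same is set
            self.k_linear = self.q_linear if kq_same else nn.Linear(d_model, d_model)
            self.v_linear = nn.Linear(d_model, d_model)

        def forward(self, query, key, value):
            return self.q_linear(query), self.k_linear(key), self.v_linear(value)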

dropout in attention function

I saw scores = dropout(scores) in line 331 of akt.py.
This is the first time I have seen dropout applied to attention weights.
Is there any reference or reason for this?
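
For reference, applying dropout to the attention weights (after the softmax) is common in Transformer implementations; for example, the dropout argument of torch.nn.MultiheadAttention does exactly this. A minimal sketch of that pattern (not the repository's exact function):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, dropout_p=0.1, training=True):
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
        attn = F.softmax(scores, dim=-1)
        # dropout on the attention probabilities, as in nn.MultiheadAttention
        attn = F.dropout(attn, p=dropout_p, training=training)
        return torch.matmul(attn, v)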

hyper-parameters of AKT for achieving the best mean test AUC

Thanks for sharing the code! It is a great help to people working on this topic.
I am testing AKT on the provided data. On ASSISTment2015, I tried many combinations of hyper-parameters as suggested in the paper, but the best mean test AUC I can achieve is around 0.731. I am not sure which hyper-parameter I configured wrongly. Would it be possible to share the hyper-parameters used for AKT to achieve a mean test AUC of 0.7828 on ASSISTment2015?

Issue while running AKT with individual SNP data

Hi,
I converted a vcf file containing the information for the individual sample HG00096, taken from the 1000 Genomes Project, to bcf format using bcftools. I used data/wgs.hg38.vcf.gz but have been receiving the following error:

akt: Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<float, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = float; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.

Can you suggest any solution for this?
Thank you.

target response issue in AKT model

Hello, I want to ask your opinion on the AKT model architecture.

(figure of the AKT model architecture from the paper)

The image above is the figure of the AKT model presented in your paper,

    if self.n_pid > 0:
        q_embed_diff_data = self.q_embed_diff(q_data)  # d_ct
        pid_embed_data = self.difficult_param(pid_data)  # uq
        q_embed_data = q_embed_data + pid_embed_data * \
            q_embed_diff_data  # uq *d_ct + c_ct
        qa_embed_diff_data = self.qa_embed_diff(
            qa_data)  # f_(ct,rt) or #h_rt
        if self.separate_qa:
            qa_embed_data = qa_embed_data + pid_embed_data * \
                qa_embed_diff_data  # uq* f_(ct,rt) + e_(ct,rt)
        else:
            qa_embed_data = qa_embed_data + pid_embed_data * \
                (qa_embed_diff_data+q_embed_diff_data)  # + uq *(h_rt+d_ct)
        c_reg_loss = (pid_embed_data ** 2.).sum() * self.l2

and the code above is what you implemented in akt.py.

The point is that I think the AKT model has a chance to see the target answers through the "f(c_t, r_t) variation vector" (in the paper), which is "qa_embed_diff_data" (in your code). In my opinion, this is related to the already-known-target (label leakage) issue.

To resolve the issue, I would carefully suggest modifying the Architecture forward function as in the following code:

        else:  # don't peek at the current response
            # shift keys and values right by one step so that position t only
            # attends to interactions up to t-1 (a zero vector fills step 0)
            pad_zero = torch.zeros(batch_size, 1, x.size(-1)).to(self.device)
            q = x
            k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)
            v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)
            x = block(mask=0, query=q, key=k, values=v, apply_pos=True)
            flag_first = True

thank you for your attention :)
