arghosh / akt
License: MIT License
Hi,
I converted a VCF file containing the information for the individual sample HG00096, taken from the 1000 Genomes Project, to BCF format using bcftools. I used data/wgs.hg38.vcf.gz but have been receiving the following error:
akt: Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<float, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = float; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Can you suggest any solution for this?
Thank you.
The paper mentioned that the train/valid/test split is based on users. If so, then the above should not happen, right?
Could you please explain the purpose of with torch.no_grad() at line 304 in akt.py? I removed this line and ran python main.py --dataset assist2009_pid --model akt_pid, but the model's performance degraded: the test AUC decreased from 0.826 to 0.824.
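For context on what the construct itself does (this illustrates torch.no_grad in general, not why it is used at that particular line), here is a minimal sketch:

```python
import torch

# Operations run inside `with torch.no_grad()` are excluded from autograd:
# no gradient graph is built, and backprop will not flow through them.
x = torch.ones(3, requires_grad=True)

y = (x * 2).sum()          # tracked: y can backprop to x
with torch.no_grad():
    z = (x * 2).sum()      # not tracked: z has no gradient history

print(y.requires_grad)  # True
print(z.requires_grad)  # False
```

Removing the block therefore changes which tensors participate in the gradient computation, which can affect training behavior.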
Hi,
Do you have the implementations for the other models listed in your paper? (SAKT, DKT, DKVMN)
I noticed there are lines of code meant for them, but it appears they were left out.
If you have the code, it would be greatly appreciated. I am working on a school project that applies your paper and code to a different dataset.
Thank you,
Marshall
In both assistment2009 and assistment2017, some problems contain more than one skill.
For example, lines 854-856 of assist2009_pid_test1.csv contain this sequence:
problemId | 7374 | 7374 | 7362 | 7362 | 7421 | 7421 | 8287 | 8287 | 7372 | 7372 | 7425 | 7425 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
skillId | 37 | 54 | 37 | 54 | 37 | 54 | 45 | 54 | 37 | 54 | 37 | 54 |
correct | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
Each of problems 7374, 7362, 7421, 8287, 7372, and 7425 contains 2 skills. The student acts only 6 times, but 12 actions are recorded. We should not predict the performance of the 2nd, 4th, 6th, 8th, 10th, and 12th steps on the basis of information from the 1st, 3rd, 5th, 7th, 9th, and 11th steps, respectively, because that information is unavailable in reality. In addition, the performance of the even steps is identical to that of the preceding odd steps, because each pair actually comes from the same action.
In fact, this problem was pointed out by Xiong et al. (Going Deeper with Deep Knowledge Tracing) in 2016. Why do none of the models handle repeated response sequences with different skill tags?
In assistment2017, some problems contain more than one skill as well, but in your processed data each action contains only one skill. For example, in lines 14-16 of assist2017_pid_test1.csv, problem 877 contains skill 6, but in lines 10-12 problem 877 contains skill 65. I don't know whether this disturbs the Rasch-model-based embeddings in AKT.
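One possible preprocessing workaround, sketched below (this is a suggestion, not what the repo does): collapse consecutive records that come from the same underlying action, i.e. the same problem id with the same correctness, merging their skill tags into one record.

```python
# Sketch: merge adjacent records with identical (problem, correct) pairs,
# since they are duplicated views of a single student action.
def collapse_multiskill(problems, skills, corrects):
    merged = []
    for p, s, c in zip(problems, skills, corrects):
        if merged and merged[-1][0] == p and merged[-1][2] == c:
            merged[-1][1].append(s)   # same action, extra skill tag
        else:
            merged.append([p, [s], c])
    return merged

# The example sequence from assist2009_pid_test1.csv above:
problems = [7374, 7374, 7362, 7362, 7421, 7421, 8287, 8287, 7372, 7372, 7425, 7425]
skills   = [37, 54, 37, 54, 37, 54, 45, 54, 37, 54, 37, 54]
corrects = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]

result = collapse_multiskill(problems, skills, corrects)
print(len(result))  # 6: the 12 records collapse to 6 actions, each with two skills
```

How the merged skill sets should then be embedded (e.g. averaging skill embeddings) is a separate modeling decision.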
Thanks for sharing the code! It is a great help to people working on this topic.
I am testing AKT on the provided data. On ASSISTment2015 I tried many combinations of hyper-parameters as suggested in the paper, but the best mean test AUC I can achieve is around 0.731. I'm not sure which hyper-parameter I configured wrongly. Would it be possible to share the hyper-parameters used for AKT to achieve the mean test AUC of 0.7828 on ASSISTment2015?
Hi @arghosh,
I am curious about the meaning of kq_same in your model. Could you explain it for me? Thanks.
Could you help me understand this parameter a bit? Thanks.
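As far as I can tell (this is a reading of the code, not an authoritative answer), a kq_same-style flag typically controls whether the query and key projections of an attention layer share one set of weights. A hypothetical minimal sketch of that difference:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of what a `kq_same`-style flag usually controls:
# whether queries and keys reuse a single linear projection.
class QKProj(nn.Module):
    def __init__(self, d_model, kq_same):
        super().__init__()
        self.q_linear = nn.Linear(d_model, d_model)
        # kq_same=True: keys reuse the query projection (fewer parameters;
        # queries and keys live in the same learned subspace).
        self.k_linear = self.q_linear if kq_same else nn.Linear(d_model, d_model)

    def forward(self, q, k):
        return self.q_linear(q), self.k_linear(k)

shared = QKProj(16, kq_same=True)
separate = QKProj(16, kq_same=False)
print(shared.q_linear is shared.k_linear)      # True
print(separate.q_linear is separate.k_linear)  # False
```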
Hi, arghosh! The idea of the paper is amazing, and the code is beautiful. I want to confirm some details about your code. I found that the sequence length for a student is 200 in your setting, with 0 used as the padding value. In your implementation of the attention mechanism, I only found the upper-triangular mask used to ignore the influence of time steps after the current one, but the padding positions of the sequence should probably also be ignored, meaning those values should not be used to compute attention scores. Thanks in advance if you can answer my question. :)
I saw scores = dropout(scores) at line 331 of akt.py. This is the first time I have seen dropout applied to attention weights. Is there any reference or reason for this?
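For what it's worth, dropout on attention weights is a fairly common regularizer rather than something specific to AKT; PyTorch's nn.MultiheadAttention, for example, exposes a dropout argument that is applied to the attention weights. A minimal sketch of the pattern:

```python
import torch
import torch.nn.functional as F

# Scaled dot-product attention with dropout applied to the attention weights.
# During training, some weights are randomly zeroed, so a query cannot always
# rely on the same keys; at eval time dropout is a no-op.
def attention(q, k, v, p_drop=0.1, training=True):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = torch.softmax(scores, dim=-1)
    weights = F.dropout(weights, p=p_drop, training=training)
    return weights @ v

q = k = v = torch.randn(2, 5, 8)
out = attention(q, k, v, training=False)
print(out.shape)  # torch.Size([2, 5, 8])
```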
What tool did you use to draw Fig. 2?
In line 83 of akt.py, the shape of qa_data is [BS, seqlen, 2], but in line 42, self.qa_embed_diff = nn.Embedding(2 * self.n_question + 1, embed_l), the embedding's input dimensions look inconsistent with that. Is this right? Please check it! Thanks!
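To make the concern concrete (a generic illustration, using a toy n_question rather than the repo's config): nn.Embedding(N, d) only accepts integer indices in [0, N-1], so any qa_data value outside that range fails at lookup time.

```python
import torch
import torch.nn as nn

# nn.Embedding(2 * n_question + 1, embed_l) accepts indices 0 .. 2*n_question.
n_question, embed_l = 4, 8
emb = nn.Embedding(2 * n_question + 1, embed_l)   # valid indices: 0..8

ok = emb(torch.tensor([0, 8]))    # in range: works
print(ok.shape)  # torch.Size([2, 8])

try:
    emb(torch.tensor([9]))        # out of range: fails
except IndexError as e:
    print("IndexError:", e)
```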
Hello, I want to ask your opinion on the AKT model architecture.
The image above is the figure of the AKT model presented in your paper,
if self.n_pid > 0:
    q_embed_diff_data = self.q_embed_diff(q_data)  # d_ct
    pid_embed_data = self.difficult_param(pid_data)  # uq
    q_embed_data = q_embed_data + pid_embed_data * q_embed_diff_data  # uq * d_ct + c_ct
    qa_embed_diff_data = self.qa_embed_diff(qa_data)  # f_(ct,rt) or h_rt
    if self.separate_qa:
        qa_embed_data = qa_embed_data + pid_embed_data * qa_embed_diff_data  # uq * f_(ct,rt) + e_(ct,rt)
    else:
        qa_embed_data = qa_embed_data + pid_embed_data * (qa_embed_diff_data + q_embed_diff_data)  # + uq * (h_rt + d_ct)
    c_reg_loss = (pid_embed_data ** 2.).sum() * self.l2
and the code above is what you implemented in akt.py.
The point is that I think the AKT model has a chance to see the target answers through the "f(c_t, r_t) variation vector" (in the paper), which is qa_embed_diff_data (in your code). In my opinion, this is related to the well-known issue of the target response leaking into the input.
To resolve the issue, I carefully suggest modifying the Architecture forward function as in the following code:
else:  # don't peek at the current response
    pad_zero = torch.zeros(batch_size, 1, x.size(-1)).to(self.device)
    q = x
    k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)
    v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)
    x = block(mask=0, query=q, key=k, values=v, apply_pos=True)
    flag_first = True
Thank you for your attention. :)