Thanks for your wonderful work. Can it be used for other stance detection settings, such as SemEval-2016 Task 6 Sub-task A, instead of the US presidential election? Many thanks for your response.
Hi, I am facing an issue when trying to load the model in Colab. I used `git clone` to download all the files.
Also, downloading by model name is not working for your model in my testing. Here is a screenshot from trying to use the model name instead of the file path.
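For reference, this is roughly how I am trying to load it (a sketch assuming the standard `transformers` auto-class API; the local path and Hub id below are placeholders, not your actual names):

```python
from transformers import AutoTokenizer, AutoModel

# Loading from the locally cloned files (placeholder path to the clone):
local_path = "./path-to-cloned-repo"
tokenizer = AutoTokenizer.from_pretrained(local_path)
model = AutoModel.from_pretrained(local_path)

# Loading by model name is the call that fails for me
# (placeholder id, not the real Hub name):
# model = AutoModel.from_pretrained("your-username/your-model-name")
```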
Thank you for the NAACL '21 paper. It presents an interesting way to incorporate knowledge when fine-tuning BERT, and the experimental results are comprehensive and persuasive.
But I have a couple of questions about the fine-tuning process.
In Eqn 5, it seems to me that both $y_i$ and $\hat{y}_i$ have dimension equal to the vocabulary size. This makes the output of the model a vector rather than a scalar. How do you compute the gradient in this setting?
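To make the question concrete, here is a minimal sketch of how I currently read Eqn 5, assuming it is standard cross-entropy over the vocabulary (please correct me if the paper's loss differs; the token index is arbitrary):

```python
import torch

vocab_size = 30522  # BERT-base vocabulary size
logits = torch.randn(1, vocab_size, requires_grad=True)  # model output at one masked position
y_hat = torch.softmax(logits, dim=-1)                    # \hat{y}_i: distribution over the vocabulary
y = torch.zeros(1, vocab_size)
y[0, 12345] = 1.0                                        # y_i: one-hot vector for the gold token

# Both y and y_hat are vocab-sized vectors, but the cross-entropy
# sum collapses them into a single scalar loss:
loss = -(y * torch.log(y_hat)).sum()
loss.backward()  # the gradient is taken of this scalar, not of the vector output
```

Is this the right reading, i.e. the vector outputs only enter the loss through a scalar sum?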
If multiple words are masked in the same sentence (see the example below, followed by a probing sketch), how do you decide the output of the second masked token when the first masked token has already been predicted (for example, [MASK] -> hilarious)? Do you use the new token (hilarious), or do you still stick to the original token (happy)?
ORIGINAL: I'm so happy Biden beat Trump in the debate.
MASKED: I'm so [MASK] Biden [MASK] Trump in the debate.
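For concreteness, this is how I would probe the two masked positions with a standard BERT MLM (a sketch using `bert-base-uncased` as a stand-in for your model). Here both positions are predicted in a single forward pass, each conditioned on the other still being [MASK]; is that what happens during your fine-tuning, or is there an iterative scheme where predicted tokens are fed back in?

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "I'm so [MASK] Biden [MASK] Trump in the debate."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one forward pass covers both masks

# Find every [MASK] position and print its top-1 prediction:
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    top_id = logits[0, pos].argmax().item()
    print(pos.item(), tokenizer.decode([top_id]))
```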