Comments (10)

luofuli commented on April 26, 2024

It seems that the real error occurred at line 1056 of run_classifier_multi_task.py. Could you please comment out the try...except block and keep only line 1056, to see what the real error is?
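
For reference, the code around line 1056 presumably follows the pattern sketched below (a reconstruction under assumptions; the actual function and variable names in run_classifier_multi_task.py may differ). The point is that a broad except swallows the loading failure, so keeping only the load_state_dict call, or re-raising, exposes the real error:

import torch

def load_checkpoint(model, init_checkpoint):
    # Hypothetical reconstruction: load a checkpoint and surface,
    # rather than swallow, any loading error.
    state_dict = torch.load(init_checkpoint, map_location="cpu")
    try:
        model.load_state_dict(state_dict)  # roughly the "line 1056" call
    except RuntimeError as exc:
        # A bare `except: pass` here would hide the Missing/Unexpected
        # key report; logging and re-raising makes the failure visible.
        print("load_state_dict failed:", exc)
        raise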

LemXiong commented on April 26, 2024

If I use --init_checkpoint to point to the model path and keep only line 1056, the error is:

RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.position_embeddings.weight", "embeddings.token_type_embeddings.weight", "embeddings.LayerNorm.gamma", "embeddings.LayerNorm.beta", "encoder.layer.0.attention.self.query.weight", "encoder.layer.0.attention.self.query.bias", "encoder.layer.0.attention.self.key.weight", "encoder.layer.0.attention.self.key.bias", "encoder.layer.0.attention.self.value.weight", "encoder.layer.0.attention.self.value.bias", "encoder.layer.0.attention.output.dense.weight", "encoder.layer.0.attention.output.dense.bias", "encoder.layer.0.attention.output.LayerNorm.gamma", "encoder.layer.0.attention.output.LayerNorm.beta", "encoder.layer.0.intermediate.dense.weight", "encoder.layer.0.intermediate.dense.bias", "encoder.layer.0.output.dense.weight", "encoder.layer.0.output.dense.bias", "encoder.layer.0.output.LayerNorm.gamma", "encoder.layer.0.output.LayerNorm.beta", [the same 16 keys repeated for encoder.layer.1 through encoder.layer.23], "pooler.dense.weight", "pooler.dense.bias".
Unexpected key(s) in state_dict: "lm_bias", "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.gamma", "bert.embeddings.LayerNorm.beta", [all of the missing keys listed above, each prefixed with "bert.", for encoder.layer.0 through encoder.layer.23 and the pooler], "classifier.weight", "classifier.bias", "linear.weight", "linear.bias", "LayerNorm.gamma", "LayerNorm.beta".

luofuli commented on April 26, 2024

I guess you may have used --pretrain_model to load the pre-trained model. Please use the --init_checkpoint argument instead (the same as in the README.md).
Alternatively, you can show us your running script.

LemXiong commented on April 26, 2024

I used the following command:

nohup python run_classifier_multi_task.py \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir GLUE \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint pretrained_model/en_model \
  --max_seq_length 128 \
  --train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --save_model \
  --gradient_accumulation_steps 1 \
  --output_dir output > output.log 2>&1 &

I have tried both --pretrain_model and --init_checkpoint; the errors were the first and the second shown above, respectively.

luofuli commented on April 26, 2024

We have tested the case that uses --init_checkpoint with all code unchanged, and it works. Please don't keep only line 1056.

LemXiong commented on April 26, 2024

> We have tested the case that uses --init_checkpoint with all code unchanged, and it works. Please don't keep only line 1056.

The unchanged code does run successfully every time, but the results are only about half of those reported in the paper. And if you add logging, you will find that it executes the except branch, not the try branch.

wangwei7175878 commented on April 26, 2024

Could you please post the full log? The picture below shows the result of our reproduction under the default setting.
[Screenshot, 2021-12-15 12:30 PM: reproduction results under the default setting]

LemXiong commented on April 26, 2024

nohup: ignoring input
12/14/2021 05:02:17 - INFO - __main__ - COMMAND: run_classifier_multi_task.py --task_name CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI --do_train --do_eval --amp_type O1 --lr_decay_factor 1 --dropout 0.1 --do_lower_case --detach_index -1 --core_encoder bert --data_dir GLUE --vocab_file config/vocab.txt --bert_config_file config/large_bert_config.json --init_checkpoint pretrained_model/en_model --max_seq_length 128 --train_batch_size 16 --learning_rate 2e-5 --num_train_epochs 3 --fast_train --save_model --gradient_accumulation_steps 1 --output_dir output
12/14/2021 05:02:17 - INFO - __main__ - device cuda n_gpu 1 distributed training False
12/14/2021 05:02:30 - INFO - __main__ - LOOKING AT GLUE/MRPC/train.tsv
12/14/2021 05:02:47 - INFO - __main__ - Install apex first if you want to use mix precition.
12/14/2021 05:02:47 - INFO - __main__ - ***** Process training data *****
12/14/2021 05:02:51 - INFO - __main__ - start tokenize
12/14/2021 05:02:58 - INFO - __main__ - start tokenize
12/14/2021 05:03:22 - INFO - __main__ - start tokenize
12/14/2021 05:03:29 - INFO - __main__ - start tokenize
12/14/2021 05:03:48 - INFO - __main__ - start tokenize
12/14/2021 05:04:14 - INFO - __main__ - start tokenize
12/14/2021 05:04:24 - INFO - __main__ - start tokenize
12/14/2021 05:04:36 - INFO - __main__ - start tokenize
12/14/2021 05:04:46 - INFO - __main__ - start tokenize
12/14/2021 05:04:49 - INFO - __main__ - ***** Running training *****
12/14/2021 05:04:49 - INFO - __main__ - Num examples = 949733
12/14/2021 05:04:49 - INFO - __main__ - Num tasks = 9
12/14/2021 05:04:49 - INFO - __main__ - Batch size = 16
12/14/2021 05:04:49 - INFO - __main__ - Num steps = 178074
12/14/2021 05:04:49 - INFO - __main__ - ***** Process dev data *****
12/14/2021 05:04:59 - INFO - __main__ - start tokenize
12/14/2021 05:05:08 - INFO - __main__ - start tokenize
12/14/2021 05:05:18 - INFO - __main__ - start tokenize
12/14/2021 05:05:26 - INFO - __main__ - start tokenize
12/14/2021 05:05:35 - INFO - __main__ - start tokenize
12/14/2021 05:05:46 - INFO - __main__ - start tokenize
12/14/2021 05:05:55 - INFO - __main__ - start tokenize
12/14/2021 05:06:04 - INFO - __main__ - start tokenize
12/14/2021 05:06:13 - INFO - __main__ - start tokenize
12/14/2021 05:06:14 - INFO - __main__ - ***** Running evaluation *****
12/14/2021 05:06:14 - INFO - __main__ - Num examples = 69711
12/14/2021 05:06:14 - INFO - __main__ - Num tasks = 9
12/14/2021 05:06:14 - INFO - __main__ - Batch size = 8

Epoch: 0%| | 0/3 [00:00<?, ?it/s]

Iteration: 100%|██████████| 59359/59359 [53:25<00:00, 14.81it/s]
12/14/2021 06:06:04 - INFO - __main__ - ***** Eval results *****
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 47487
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9634366636350805
12/14/2021 06:06:07 - INFO - __main__ - Save model of Epoch 0

Epoch: 33%|███▎ | 1/3 [59:51<1:59:43, 3591.88s/it]

Iteration: 100%|██████████| 59359/59359 [53:50<00:00, 14.70it/s]
12/14/2021 07:06:37 - INFO - __main__ - ***** Eval results *****
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 94974
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9633064780335283
12/14/2021 07:06:41 - INFO - __main__ - Save model of Epoch 1

Epoch: 67%|██████▋ | 2/3 [2:00:25<1:00:04, 3604.43s/it]

Iteration: 100%|██████████| 59359/59359 [54:27<00:00, 14.53it/s]
12/14/2021 08:07:35 - INFO - __main__ - ***** Eval results *****
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 142461
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9636250535845807
12/14/2021 08:07:38 - INFO - __main__ - Save model of Epoch 2

Epoch: 100%|██████████| 3/3 [3:01:22<00:00, 3620.24s/it]

wangwei7175878 commented on April 26, 2024

The performance in multi-task mode is related to the similarity between tasks. For example, MNLI and STS-B are similar: the STS-B results when MNLI and STS-B are run together are better than when STS-B is run alone. CoLA, however, is not similar to the other tasks. You can get normal accuracy by running the CoLA task alone (--task_name CoLA).

LemXiong commented on April 26, 2024

Thank you very much for your patient reply. It seems that installing apex is necessary to reproduce the correct results.

Without apex:

12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_accuracy = [0.5735294117647058]
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_corrcoef = [0]
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_loss = 0.6800402063949436
12/16/2021 04:50:44 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_matthew = [0]

With apex:

12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_accuracy = [0.875]
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_corrcoef = [0]
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_loss = 0.46738045994539323
12/16/2021 05:16:49 - INFO - __main__ -   pretrained_model/en_model:  MRPC:  eval_matthew = [0]
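
This is consistent with the earlier log line "Install apex first if you want to use mix precition.": without apex installed, the --amp_type O1 setting is silently skipped. For reference, the standard NVIDIA apex pattern that such a flag typically maps to is sketched below (an assumption about this script; the log only shows that it warns and falls back when apex is missing):

from apex import amp  # requires NVIDIA apex (github.com/NVIDIA/apex)

def setup_amp(model, optimizer, opt_level="O1"):
    # Patch model and optimizer for mixed precision; "O1" runs
    # whitelisted ops (e.g. GEMMs) in fp16 and keeps weights in fp32.
    return amp.initialize(model, optimizer, opt_level=opt_level)

def backward_with_amp(loss, optimizer):
    # Scale the loss before backward so fp16 gradients do not underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()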
