It seems that the real error occurred at line 1056 of run_classifier_multi_task.py. Could you comment out the try...except block and keep only line 1056, so we can see what the underlying error is?
— alicemind
If I set init_checkpoint to the checkpoint path and keep only line 1056, the error is:
RuntimeError: Error(s) in loading state_dict for BertModel:
Missing key(s) in state_dict: "embeddings.word_embeddings.weight", "embeddings.position_embeddings.weight", "embeddings.token_type_embeddings.weight", "embeddings.LayerNorm.gamma", "embeddings.LayerNorm.beta", "encoder.layer.0.attention.self.query.weight", "encoder.layer.0.attention.self.query.bias", "encoder.layer.0.attention.self.key.weight", "encoder.layer.0.attention.self.key.bias", "encoder.layer.0.attention.self.value.weight", "encoder.layer.0.attention.self.value.bias", "encoder.layer.0.attention.output.dense.weight", "encoder.layer.0.attention.output.dense.bias", "encoder.layer.0.attention.output.LayerNorm.gamma", "encoder.layer.0.attention.output.LayerNorm.beta", "encoder.layer.0.intermediate.dense.weight", "encoder.layer.0.intermediate.dense.bias", "encoder.layer.0.output.dense.weight", "encoder.layer.0.output.dense.bias", "encoder.layer.0.output.LayerNorm.gamma", "encoder.layer.0.output.LayerNorm.beta", "encoder.layer.1.attention.self.query.weight", "encoder.layer.1.attention.self.query.bias", "encoder.layer.1.attention.self.key.weight", "encoder.layer.1.attention.self.key.bias", "encoder.layer.1.attention.self.value.weight", "encoder.layer.1.attention.self.value.bias", "encoder.layer.1.attention.output.dense.weight", "encoder.layer.1.attention.output.dense.bias", "encoder.layer.1.attention.output.LayerNorm.gamma", "encoder.layer.1.attention.output.LayerNorm.beta", "encoder.layer.1.intermediate.dense.weight", "encoder.layer.1.intermediate.dense.bias", "encoder.layer.1.output.dense.weight", "encoder.layer.1.output.dense.bias", "encoder.layer.1.output.LayerNorm.gamma", "encoder.layer.1.output.LayerNorm.beta", "encoder.layer.2.attention.self.query.weight", "encoder.layer.2.attention.self.query.bias", "encoder.layer.2.attention.self.key.weight", "encoder.layer.2.attention.self.key.bias", "encoder.layer.2.attention.self.value.weight", "encoder.layer.2.attention.self.value.bias", "encoder.layer.2.attention.output.dense.weight", 
"encoder.layer.2.attention.output.dense.bias", "encoder.layer.2.attention.output.LayerNorm.gamma", "encoder.layer.2.attention.output.LayerNorm.beta", "encoder.layer.2.intermediate.dense.weight", "encoder.layer.2.intermediate.dense.bias", "encoder.layer.2.output.dense.weight", "encoder.layer.2.output.dense.bias", "encoder.layer.2.output.LayerNorm.gamma", "encoder.layer.2.output.LayerNorm.beta", "encoder.layer.3.attention.self.query.weight", "encoder.layer.3.attention.self.query.bias", "encoder.layer.3.attention.self.key.weight", "encoder.layer.3.attention.self.key.bias", "encoder.layer.3.attention.self.value.weight", "encoder.layer.3.attention.self.value.bias", "encoder.layer.3.attention.output.dense.weight", "encoder.layer.3.attention.output.dense.bias", "encoder.layer.3.attention.output.LayerNorm.gamma", "encoder.layer.3.attention.output.LayerNorm.beta", "encoder.layer.3.intermediate.dense.weight", "encoder.layer.3.intermediate.dense.bias", "encoder.layer.3.output.dense.weight", "encoder.layer.3.output.dense.bias", "encoder.layer.3.output.LayerNorm.gamma", "encoder.layer.3.output.LayerNorm.beta", "encoder.layer.4.attention.self.query.weight", "encoder.layer.4.attention.self.query.bias", "encoder.layer.4.attention.self.key.weight", "encoder.layer.4.attention.self.key.bias", "encoder.layer.4.attention.self.value.weight", "encoder.layer.4.attention.self.value.bias", "encoder.layer.4.attention.output.dense.weight", "encoder.layer.4.attention.output.dense.bias", "encoder.layer.4.attention.output.LayerNorm.gamma", "encoder.layer.4.attention.output.LayerNorm.beta", "encoder.layer.4.intermediate.dense.weight", "encoder.layer.4.intermediate.dense.bias", "encoder.layer.4.output.dense.weight", "encoder.layer.4.output.dense.bias", "encoder.layer.4.output.LayerNorm.gamma", "encoder.layer.4.output.LayerNorm.beta", "encoder.layer.5.attention.self.query.weight", "encoder.layer.5.attention.self.query.bias", "encoder.layer.5.attention.self.key.weight", 
"encoder.layer.5.attention.self.key.bias", "encoder.layer.5.attention.self.value.weight", "encoder.layer.5.attention.self.value.bias", "encoder.layer.5.attention.output.dense.weight", "encoder.layer.5.attention.output.dense.bias", "encoder.layer.5.attention.output.LayerNorm.gamma", "encoder.layer.5.attention.output.LayerNorm.beta", "encoder.layer.5.intermediate.dense.weight", "encoder.layer.5.intermediate.dense.bias", "encoder.layer.5.output.dense.weight", "encoder.layer.5.output.dense.bias", "encoder.layer.5.output.LayerNorm.gamma", "encoder.layer.5.output.LayerNorm.beta", "encoder.layer.6.attention.self.query.weight", "encoder.layer.6.attention.self.query.bias", "encoder.layer.6.attention.self.key.weight", "encoder.layer.6.attention.self.key.bias", "encoder.layer.6.attention.self.value.weight", "encoder.layer.6.attention.self.value.bias", "encoder.layer.6.attention.output.dense.weight", "encoder.layer.6.attention.output.dense.bias", "encoder.layer.6.attention.output.LayerNorm.gamma", "encoder.layer.6.attention.output.LayerNorm.beta", "encoder.layer.6.intermediate.dense.weight", "encoder.layer.6.intermediate.dense.bias", "encoder.layer.6.output.dense.weight", "encoder.layer.6.output.dense.bias", "encoder.layer.6.output.LayerNorm.gamma", "encoder.layer.6.output.LayerNorm.beta", "encoder.layer.7.attention.self.query.weight", "encoder.layer.7.attention.self.query.bias", "encoder.layer.7.attention.self.key.weight", "encoder.layer.7.attention.self.key.bias", "encoder.layer.7.attention.self.value.weight", "encoder.layer.7.attention.self.value.bias", "encoder.layer.7.attention.output.dense.weight", "encoder.layer.7.attention.output.dense.bias", "encoder.layer.7.attention.output.LayerNorm.gamma", "encoder.layer.7.attention.output.LayerNorm.beta", "encoder.layer.7.intermediate.dense.weight", "encoder.layer.7.intermediate.dense.bias", "encoder.layer.7.output.dense.weight", "encoder.layer.7.output.dense.bias", "encoder.layer.7.output.LayerNorm.gamma", 
"encoder.layer.7.output.LayerNorm.beta", "encoder.layer.8.attention.self.query.weight", "encoder.layer.8.attention.self.query.bias", "encoder.layer.8.attention.self.key.weight", "encoder.layer.8.attention.self.key.bias", "encoder.layer.8.attention.self.value.weight", "encoder.layer.8.attention.self.value.bias", "encoder.layer.8.attention.output.dense.weight", "encoder.layer.8.attention.output.dense.bias", "encoder.layer.8.attention.output.LayerNorm.gamma", "encoder.layer.8.attention.output.LayerNorm.beta", "encoder.layer.8.intermediate.dense.weight", "encoder.layer.8.intermediate.dense.bias", "encoder.layer.8.output.dense.weight", "encoder.layer.8.output.dense.bias", "encoder.layer.8.output.LayerNorm.gamma", "encoder.layer.8.output.LayerNorm.beta", "encoder.layer.9.attention.self.query.weight", "encoder.layer.9.attention.self.query.bias", "encoder.layer.9.attention.self.key.weight", "encoder.layer.9.attention.self.key.bias", "encoder.layer.9.attention.self.value.weight", "encoder.layer.9.attention.self.value.bias", "encoder.layer.9.attention.output.dense.weight", "encoder.layer.9.attention.output.dense.bias", "encoder.layer.9.attention.output.LayerNorm.gamma", "encoder.layer.9.attention.output.LayerNorm.beta", "encoder.layer.9.intermediate.dense.weight", "encoder.layer.9.intermediate.dense.bias", "encoder.layer.9.output.dense.weight", "encoder.layer.9.output.dense.bias", "encoder.layer.9.output.LayerNorm.gamma", "encoder.layer.9.output.LayerNorm.beta", "encoder.layer.10.attention.self.query.weight", "encoder.layer.10.attention.self.query.bias", "encoder.layer.10.attention.self.key.weight", "encoder.layer.10.attention.self.key.bias", "encoder.layer.10.attention.self.value.weight", "encoder.layer.10.attention.self.value.bias", "encoder.layer.10.attention.output.dense.weight", "encoder.layer.10.attention.output.dense.bias", "encoder.layer.10.attention.output.LayerNorm.gamma", "encoder.layer.10.attention.output.LayerNorm.beta", 
"encoder.layer.10.intermediate.dense.weight", "encoder.layer.10.intermediate.dense.bias", "encoder.layer.10.output.dense.weight", "encoder.layer.10.output.dense.bias", "encoder.layer.10.output.LayerNorm.gamma", "encoder.layer.10.output.LayerNorm.beta", "encoder.layer.11.attention.self.query.weight", "encoder.layer.11.attention.self.query.bias", "encoder.layer.11.attention.self.key.weight", "encoder.layer.11.attention.self.key.bias", "encoder.layer.11.attention.self.value.weight", "encoder.layer.11.attention.self.value.bias", "encoder.layer.11.attention.output.dense.weight", "encoder.layer.11.attention.output.dense.bias", "encoder.layer.11.attention.output.LayerNorm.gamma", "encoder.layer.11.attention.output.LayerNorm.beta", "encoder.layer.11.intermediate.dense.weight", "encoder.layer.11.intermediate.dense.bias", "encoder.layer.11.output.dense.weight", "encoder.layer.11.output.dense.bias", "encoder.layer.11.output.LayerNorm.gamma", "encoder.layer.11.output.LayerNorm.beta", "encoder.layer.12.attention.self.query.weight", "encoder.layer.12.attention.self.query.bias", "encoder.layer.12.attention.self.key.weight", "encoder.layer.12.attention.self.key.bias", "encoder.layer.12.attention.self.value.weight", "encoder.layer.12.attention.self.value.bias", "encoder.layer.12.attention.output.dense.weight", "encoder.layer.12.attention.output.dense.bias", "encoder.layer.12.attention.output.LayerNorm.gamma", "encoder.layer.12.attention.output.LayerNorm.beta", "encoder.layer.12.intermediate.dense.weight", "encoder.layer.12.intermediate.dense.bias", "encoder.layer.12.output.dense.weight", "encoder.layer.12.output.dense.bias", "encoder.layer.12.output.LayerNorm.gamma", "encoder.layer.12.output.LayerNorm.beta", "encoder.layer.13.attention.self.query.weight", "encoder.layer.13.attention.self.query.bias", "encoder.layer.13.attention.self.key.weight", "encoder.layer.13.attention.self.key.bias", "encoder.layer.13.attention.self.value.weight", "encoder.layer.13.attention.self.value.bias", 
"encoder.layer.13.attention.output.dense.weight", "encoder.layer.13.attention.output.dense.bias", "encoder.layer.13.attention.output.LayerNorm.gamma", "encoder.layer.13.attention.output.LayerNorm.beta", "encoder.layer.13.intermediate.dense.weight", "encoder.layer.13.intermediate.dense.bias", "encoder.layer.13.output.dense.weight", "encoder.layer.13.output.dense.bias", "encoder.layer.13.output.LayerNorm.gamma", "encoder.layer.13.output.LayerNorm.beta", "encoder.layer.14.attention.self.query.weight", "encoder.layer.14.attention.self.query.bias", "encoder.layer.14.attention.self.key.weight", "encoder.layer.14.attention.self.key.bias", "encoder.layer.14.attention.self.value.weight", "encoder.layer.14.attention.self.value.bias", "encoder.layer.14.attention.output.dense.weight", "encoder.layer.14.attention.output.dense.bias", "encoder.layer.14.attention.output.LayerNorm.gamma", "encoder.layer.14.attention.output.LayerNorm.beta", "encoder.layer.14.intermediate.dense.weight", "encoder.layer.14.intermediate.dense.bias", "encoder.layer.14.output.dense.weight", "encoder.layer.14.output.dense.bias", "encoder.layer.14.output.LayerNorm.gamma", "encoder.layer.14.output.LayerNorm.beta", "encoder.layer.15.attention.self.query.weight", "encoder.layer.15.attention.self.query.bias", "encoder.layer.15.attention.self.key.weight", "encoder.layer.15.attention.self.key.bias", "encoder.layer.15.attention.self.value.weight", "encoder.layer.15.attention.self.value.bias", "encoder.layer.15.attention.output.dense.weight", "encoder.layer.15.attention.output.dense.bias", "encoder.layer.15.attention.output.LayerNorm.gamma", "encoder.layer.15.attention.output.LayerNorm.beta", "encoder.layer.15.intermediate.dense.weight", "encoder.layer.15.intermediate.dense.bias", "encoder.layer.15.output.dense.weight", "encoder.layer.15.output.dense.bias", "encoder.layer.15.output.LayerNorm.gamma", "encoder.layer.15.output.LayerNorm.beta", "encoder.layer.16.attention.self.query.weight", 
"encoder.layer.16.attention.self.query.bias", "encoder.layer.16.attention.self.key.weight", "encoder.layer.16.attention.self.key.bias", "encoder.layer.16.attention.self.value.weight", "encoder.layer.16.attention.self.value.bias", "encoder.layer.16.attention.output.dense.weight", "encoder.layer.16.attention.output.dense.bias", "encoder.layer.16.attention.output.LayerNorm.gamma", "encoder.layer.16.attention.output.LayerNorm.beta", "encoder.layer.16.intermediate.dense.weight", "encoder.layer.16.intermediate.dense.bias", "encoder.layer.16.output.dense.weight", "encoder.layer.16.output.dense.bias", "encoder.layer.16.output.LayerNorm.gamma", "encoder.layer.16.output.LayerNorm.beta", "encoder.layer.17.attention.self.query.weight", "encoder.layer.17.attention.self.query.bias", "encoder.layer.17.attention.self.key.weight", "encoder.layer.17.attention.self.key.bias", "encoder.layer.17.attention.self.value.weight", "encoder.layer.17.attention.self.value.bias", "encoder.layer.17.attention.output.dense.weight", "encoder.layer.17.attention.output.dense.bias", "encoder.layer.17.attention.output.LayerNorm.gamma", "encoder.layer.17.attention.output.LayerNorm.beta", "encoder.layer.17.intermediate.dense.weight", "encoder.layer.17.intermediate.dense.bias", "encoder.layer.17.output.dense.weight", "encoder.layer.17.output.dense.bias", "encoder.layer.17.output.LayerNorm.gamma", "encoder.layer.17.output.LayerNorm.beta", "encoder.layer.18.attention.self.query.weight", "encoder.layer.18.attention.self.query.bias", "encoder.layer.18.attention.self.key.weight", "encoder.layer.18.attention.self.key.bias", "encoder.layer.18.attention.self.value.weight", "encoder.layer.18.attention.self.value.bias", "encoder.layer.18.attention.output.dense.weight", "encoder.layer.18.attention.output.dense.bias", "encoder.layer.18.attention.output.LayerNorm.gamma", "encoder.layer.18.attention.output.LayerNorm.beta", "encoder.layer.18.intermediate.dense.weight", "encoder.layer.18.intermediate.dense.bias", 
"encoder.layer.18.output.dense.weight", "encoder.layer.18.output.dense.bias", "encoder.layer.18.output.LayerNorm.gamma", "encoder.layer.18.output.LayerNorm.beta", "encoder.layer.19.attention.self.query.weight", "encoder.layer.19.attention.self.query.bias", "encoder.layer.19.attention.self.key.weight", "encoder.layer.19.attention.self.key.bias", "encoder.layer.19.attention.self.value.weight", "encoder.layer.19.attention.self.value.bias", "encoder.layer.19.attention.output.dense.weight", "encoder.layer.19.attention.output.dense.bias", "encoder.layer.19.attention.output.LayerNorm.gamma", "encoder.layer.19.attention.output.LayerNorm.beta", "encoder.layer.19.intermediate.dense.weight", "encoder.layer.19.intermediate.dense.bias", "encoder.layer.19.output.dense.weight", "encoder.layer.19.output.dense.bias", "encoder.layer.19.output.LayerNorm.gamma", "encoder.layer.19.output.LayerNorm.beta", "encoder.layer.20.attention.self.query.weight", "encoder.layer.20.attention.self.query.bias", "encoder.layer.20.attention.self.key.weight", "encoder.layer.20.attention.self.key.bias", "encoder.layer.20.attention.self.value.weight", "encoder.layer.20.attention.self.value.bias", "encoder.layer.20.attention.output.dense.weight", "encoder.layer.20.attention.output.dense.bias", "encoder.layer.20.attention.output.LayerNorm.gamma", "encoder.layer.20.attention.output.LayerNorm.beta", "encoder.layer.20.intermediate.dense.weight", "encoder.layer.20.intermediate.dense.bias", "encoder.layer.20.output.dense.weight", "encoder.layer.20.output.dense.bias", "encoder.layer.20.output.LayerNorm.gamma", "encoder.layer.20.output.LayerNorm.beta", "encoder.layer.21.attention.self.query.weight", "encoder.layer.21.attention.self.query.bias", "encoder.layer.21.attention.self.key.weight", "encoder.layer.21.attention.self.key.bias", "encoder.layer.21.attention.self.value.weight", "encoder.layer.21.attention.self.value.bias", "encoder.layer.21.attention.output.dense.weight", 
"encoder.layer.21.attention.output.dense.bias", "encoder.layer.21.attention.output.LayerNorm.gamma", "encoder.layer.21.attention.output.LayerNorm.beta", "encoder.layer.21.intermediate.dense.weight", "encoder.layer.21.intermediate.dense.bias", "encoder.layer.21.output.dense.weight", "encoder.layer.21.output.dense.bias", "encoder.layer.21.output.LayerNorm.gamma", "encoder.layer.21.output.LayerNorm.beta", "encoder.layer.22.attention.self.query.weight", "encoder.layer.22.attention.self.query.bias", "encoder.layer.22.attention.self.key.weight", "encoder.layer.22.attention.self.key.bias", "encoder.layer.22.attention.self.value.weight", "encoder.layer.22.attention.self.value.bias", "encoder.layer.22.attention.output.dense.weight", "encoder.layer.22.attention.output.dense.bias", "encoder.layer.22.attention.output.LayerNorm.gamma", "encoder.layer.22.attention.output.LayerNorm.beta", "encoder.layer.22.intermediate.dense.weight", "encoder.layer.22.intermediate.dense.bias", "encoder.layer.22.output.dense.weight", "encoder.layer.22.output.dense.bias", "encoder.layer.22.output.LayerNorm.gamma", "encoder.layer.22.output.LayerNorm.beta", "encoder.layer.23.attention.self.query.weight", "encoder.layer.23.attention.self.query.bias", "encoder.layer.23.attention.self.key.weight", "encoder.layer.23.attention.self.key.bias", "encoder.layer.23.attention.self.value.weight", "encoder.layer.23.attention.self.value.bias", "encoder.layer.23.attention.output.dense.weight", "encoder.layer.23.attention.output.dense.bias", "encoder.layer.23.attention.output.LayerNorm.gamma", "encoder.layer.23.attention.output.LayerNorm.beta", "encoder.layer.23.intermediate.dense.weight", "encoder.layer.23.intermediate.dense.bias", "encoder.layer.23.output.dense.weight", "encoder.layer.23.output.dense.bias", "encoder.layer.23.output.LayerNorm.gamma", "encoder.layer.23.output.LayerNorm.beta", "pooler.dense.weight", "pooler.dense.bias".
Unexpected key(s) in state_dict: "lm_bias", "bert.embeddings.word_embeddings.weight", "bert.embeddings.position_embeddings.weight", "bert.embeddings.token_type_embeddings.weight", "bert.embeddings.LayerNorm.gamma", "bert.embeddings.LayerNorm.beta", "bert.encoder.layer.0.attention.self.query.weight", "bert.encoder.layer.0.attention.self.query.bias", "bert.encoder.layer.0.attention.self.key.weight", "bert.encoder.layer.0.attention.self.key.bias", "bert.encoder.layer.0.attention.self.value.weight", "bert.encoder.layer.0.attention.self.value.bias", "bert.encoder.layer.0.attention.output.dense.weight", "bert.encoder.layer.0.attention.output.dense.bias", "bert.encoder.layer.0.attention.output.LayerNorm.gamma", "bert.encoder.layer.0.attention.output.LayerNorm.beta", "bert.encoder.layer.0.intermediate.dense.weight", "bert.encoder.layer.0.intermediate.dense.bias", "bert.encoder.layer.0.output.dense.weight", "bert.encoder.layer.0.output.dense.bias", "bert.encoder.layer.0.output.LayerNorm.gamma", "bert.encoder.layer.0.output.LayerNorm.beta", "bert.encoder.layer.1.attention.self.query.weight", "bert.encoder.layer.1.attention.self.query.bias", "bert.encoder.layer.1.attention.self.key.weight", "bert.encoder.layer.1.attention.self.key.bias", "bert.encoder.layer.1.attention.self.value.weight", "bert.encoder.layer.1.attention.self.value.bias", "bert.encoder.layer.1.attention.output.dense.weight", "bert.encoder.layer.1.attention.output.dense.bias", "bert.encoder.layer.1.attention.output.LayerNorm.gamma", "bert.encoder.layer.1.attention.output.LayerNorm.beta", "bert.encoder.layer.1.intermediate.dense.weight", "bert.encoder.layer.1.intermediate.dense.bias", "bert.encoder.layer.1.output.dense.weight", "bert.encoder.layer.1.output.dense.bias", "bert.encoder.layer.1.output.LayerNorm.gamma", "bert.encoder.layer.1.output.LayerNorm.beta", "bert.encoder.layer.2.attention.self.query.weight", "bert.encoder.layer.2.attention.self.query.bias", "bert.encoder.layer.2.attention.self.key.weight", 
"bert.encoder.layer.2.attention.self.key.bias", "bert.encoder.layer.2.attention.self.value.weight", "bert.encoder.layer.2.attention.self.value.bias", "bert.encoder.layer.2.attention.output.dense.weight", "bert.encoder.layer.2.attention.output.dense.bias", "bert.encoder.layer.2.attention.output.LayerNorm.gamma", "bert.encoder.layer.2.attention.output.LayerNorm.beta", "bert.encoder.layer.2.intermediate.dense.weight", "bert.encoder.layer.2.intermediate.dense.bias", "bert.encoder.layer.2.output.dense.weight", "bert.encoder.layer.2.output.dense.bias", "bert.encoder.layer.2.output.LayerNorm.gamma", "bert.encoder.layer.2.output.LayerNorm.beta", "bert.encoder.layer.3.attention.self.query.weight", "bert.encoder.layer.3.attention.self.query.bias", "bert.encoder.layer.3.attention.self.key.weight", "bert.encoder.layer.3.attention.self.key.bias", "bert.encoder.layer.3.attention.self.value.weight", "bert.encoder.layer.3.attention.self.value.bias", "bert.encoder.layer.3.attention.output.dense.weight", "bert.encoder.layer.3.attention.output.dense.bias", "bert.encoder.layer.3.attention.output.LayerNorm.gamma", "bert.encoder.layer.3.attention.output.LayerNorm.beta", "bert.encoder.layer.3.intermediate.dense.weight", "bert.encoder.layer.3.intermediate.dense.bias", "bert.encoder.layer.3.output.dense.weight", "bert.encoder.layer.3.output.dense.bias", "bert.encoder.layer.3.output.LayerNorm.gamma", "bert.encoder.layer.3.output.LayerNorm.beta", "bert.encoder.layer.4.attention.self.query.weight", "bert.encoder.layer.4.attention.self.query.bias", "bert.encoder.layer.4.attention.self.key.weight", "bert.encoder.layer.4.attention.self.key.bias", "bert.encoder.layer.4.attention.self.value.weight", "bert.encoder.layer.4.attention.self.value.bias", "bert.encoder.layer.4.attention.output.dense.weight", "bert.encoder.layer.4.attention.output.dense.bias", "bert.encoder.layer.4.attention.output.LayerNorm.gamma", "bert.encoder.layer.4.attention.output.LayerNorm.beta", 
"bert.encoder.layer.4.intermediate.dense.weight", "bert.encoder.layer.4.intermediate.dense.bias", "bert.encoder.layer.4.output.dense.weight", "bert.encoder.layer.4.output.dense.bias", "bert.encoder.layer.4.output.LayerNorm.gamma", "bert.encoder.layer.4.output.LayerNorm.beta", "bert.encoder.layer.5.attention.self.query.weight", "bert.encoder.layer.5.attention.self.query.bias", "bert.encoder.layer.5.attention.self.key.weight", "bert.encoder.layer.5.attention.self.key.bias", "bert.encoder.layer.5.attention.self.value.weight", "bert.encoder.layer.5.attention.self.value.bias", "bert.encoder.layer.5.attention.output.dense.weight", "bert.encoder.layer.5.attention.output.dense.bias", "bert.encoder.layer.5.attention.output.LayerNorm.gamma", "bert.encoder.layer.5.attention.output.LayerNorm.beta", "bert.encoder.layer.5.intermediate.dense.weight", "bert.encoder.layer.5.intermediate.dense.bias", "bert.encoder.layer.5.output.dense.weight", "bert.encoder.layer.5.output.dense.bias", "bert.encoder.layer.5.output.LayerNorm.gamma", "bert.encoder.layer.5.output.LayerNorm.beta", "bert.encoder.layer.6.attention.self.query.weight", "bert.encoder.layer.6.attention.self.query.bias", "bert.encoder.layer.6.attention.self.key.weight", "bert.encoder.layer.6.attention.self.key.bias", "bert.encoder.layer.6.attention.self.value.weight", "bert.encoder.layer.6.attention.self.value.bias", "bert.encoder.layer.6.attention.output.dense.weight", "bert.encoder.layer.6.attention.output.dense.bias", "bert.encoder.layer.6.attention.output.LayerNorm.gamma", "bert.encoder.layer.6.attention.output.LayerNorm.beta", "bert.encoder.layer.6.intermediate.dense.weight", "bert.encoder.layer.6.intermediate.dense.bias", "bert.encoder.layer.6.output.dense.weight", "bert.encoder.layer.6.output.dense.bias", "bert.encoder.layer.6.output.LayerNorm.gamma", "bert.encoder.layer.6.output.LayerNorm.beta", "bert.encoder.layer.7.attention.self.query.weight", "bert.encoder.layer.7.attention.self.query.bias", 
"bert.encoder.layer.7.attention.self.key.weight", "bert.encoder.layer.7.attention.self.key.bias", "bert.encoder.layer.7.attention.self.value.weight", "bert.encoder.layer.7.attention.self.value.bias", "bert.encoder.layer.7.attention.output.dense.weight", "bert.encoder.layer.7.attention.output.dense.bias", "bert.encoder.layer.7.attention.output.LayerNorm.gamma", "bert.encoder.layer.7.attention.output.LayerNorm.beta", "bert.encoder.layer.7.intermediate.dense.weight", "bert.encoder.layer.7.intermediate.dense.bias", "bert.encoder.layer.7.output.dense.weight", "bert.encoder.layer.7.output.dense.bias", "bert.encoder.layer.7.output.LayerNorm.gamma", "bert.encoder.layer.7.output.LayerNorm.beta", "bert.encoder.layer.8.attention.self.query.weight", "bert.encoder.layer.8.attention.self.query.bias", "bert.encoder.layer.8.attention.self.key.weight", "bert.encoder.layer.8.attention.self.key.bias", "bert.encoder.layer.8.attention.self.value.weight", "bert.encoder.layer.8.attention.self.value.bias", "bert.encoder.layer.8.attention.output.dense.weight", "bert.encoder.layer.8.attention.output.dense.bias", "bert.encoder.layer.8.attention.output.LayerNorm.gamma", "bert.encoder.layer.8.attention.output.LayerNorm.beta", "bert.encoder.layer.8.intermediate.dense.weight", "bert.encoder.layer.8.intermediate.dense.bias", "bert.encoder.layer.8.output.dense.weight", "bert.encoder.layer.8.output.dense.bias", "bert.encoder.layer.8.output.LayerNorm.gamma", "bert.encoder.layer.8.output.LayerNorm.beta", "bert.encoder.layer.9.attention.self.query.weight", "bert.encoder.layer.9.attention.self.query.bias", "bert.encoder.layer.9.attention.self.key.weight", "bert.encoder.layer.9.attention.self.key.bias", "bert.encoder.layer.9.attention.self.value.weight", "bert.encoder.layer.9.attention.self.value.bias", "bert.encoder.layer.9.attention.output.dense.weight", "bert.encoder.layer.9.attention.output.dense.bias", "bert.encoder.layer.9.attention.output.LayerNorm.gamma", 
"bert.encoder.layer.9.attention.output.LayerNorm.beta", "bert.encoder.layer.9.intermediate.dense.weight", "bert.encoder.layer.9.intermediate.dense.bias", "bert.encoder.layer.9.output.dense.weight", "bert.encoder.layer.9.output.dense.bias", "bert.encoder.layer.9.output.LayerNorm.gamma", "bert.encoder.layer.9.output.LayerNorm.beta", "bert.encoder.layer.10.attention.self.query.weight", "bert.encoder.layer.10.attention.self.query.bias", "bert.encoder.layer.10.attention.self.key.weight", "bert.encoder.layer.10.attention.self.key.bias", "bert.encoder.layer.10.attention.self.value.weight", "bert.encoder.layer.10.attention.self.value.bias", "bert.encoder.layer.10.attention.output.dense.weight", "bert.encoder.layer.10.attention.output.dense.bias", "bert.encoder.layer.10.attention.output.LayerNorm.gamma", "bert.encoder.layer.10.attention.output.LayerNorm.beta", "bert.encoder.layer.10.intermediate.dense.weight", "bert.encoder.layer.10.intermediate.dense.bias", "bert.encoder.layer.10.output.dense.weight", "bert.encoder.layer.10.output.dense.bias", "bert.encoder.layer.10.output.LayerNorm.gamma", "bert.encoder.layer.10.output.LayerNorm.beta", "bert.encoder.layer.11.attention.self.query.weight", "bert.encoder.layer.11.attention.self.query.bias", "bert.encoder.layer.11.attention.self.key.weight", "bert.encoder.layer.11.attention.self.key.bias", "bert.encoder.layer.11.attention.self.value.weight", "bert.encoder.layer.11.attention.self.value.bias", "bert.encoder.layer.11.attention.output.dense.weight", "bert.encoder.layer.11.attention.output.dense.bias", "bert.encoder.layer.11.attention.output.LayerNorm.gamma", "bert.encoder.layer.11.attention.output.LayerNorm.beta", "bert.encoder.layer.11.intermediate.dense.weight", "bert.encoder.layer.11.intermediate.dense.bias", "bert.encoder.layer.11.output.dense.weight", "bert.encoder.layer.11.output.dense.bias", "bert.encoder.layer.11.output.LayerNorm.gamma", "bert.encoder.layer.11.output.LayerNorm.beta", 
"bert.encoder.layer.12.attention.self.query.weight", "bert.encoder.layer.12.attention.self.query.bias", "bert.encoder.layer.12.attention.self.key.weight", "bert.encoder.layer.12.attention.self.key.bias", "bert.encoder.layer.12.attention.self.value.weight", "bert.encoder.layer.12.attention.self.value.bias", "bert.encoder.layer.12.attention.output.dense.weight", "bert.encoder.layer.12.attention.output.dense.bias", "bert.encoder.layer.12.attention.output.LayerNorm.gamma", "bert.encoder.layer.12.attention.output.LayerNorm.beta", "bert.encoder.layer.12.intermediate.dense.weight", "bert.encoder.layer.12.intermediate.dense.bias", "bert.encoder.layer.12.output.dense.weight", "bert.encoder.layer.12.output.dense.bias", "bert.encoder.layer.12.output.LayerNorm.gamma", "bert.encoder.layer.12.output.LayerNorm.beta", "bert.encoder.layer.13.attention.self.query.weight", "bert.encoder.layer.13.attention.self.query.bias", "bert.encoder.layer.13.attention.self.key.weight", "bert.encoder.layer.13.attention.self.key.bias", "bert.encoder.layer.13.attention.self.value.weight", "bert.encoder.layer.13.attention.self.value.bias", "bert.encoder.layer.13.attention.output.dense.weight", "bert.encoder.layer.13.attention.output.dense.bias", "bert.encoder.layer.13.attention.output.LayerNorm.gamma", "bert.encoder.layer.13.attention.output.LayerNorm.beta", "bert.encoder.layer.13.intermediate.dense.weight", "bert.encoder.layer.13.intermediate.dense.bias", "bert.encoder.layer.13.output.dense.weight", "bert.encoder.layer.13.output.dense.bias", "bert.encoder.layer.13.output.LayerNorm.gamma", "bert.encoder.layer.13.output.LayerNorm.beta", "bert.encoder.layer.14.attention.self.query.weight", "bert.encoder.layer.14.attention.self.query.bias", "bert.encoder.layer.14.attention.self.key.weight", "bert.encoder.layer.14.attention.self.key.bias", "bert.encoder.layer.14.attention.self.value.weight", "bert.encoder.layer.14.attention.self.value.bias", "bert.encoder.layer.14.attention.output.dense.weight", 
"bert.encoder.layer.14.attention.output.dense.bias", "bert.encoder.layer.14.attention.output.LayerNorm.gamma", "bert.encoder.layer.14.attention.output.LayerNorm.beta", "bert.encoder.layer.14.intermediate.dense.weight", "bert.encoder.layer.14.intermediate.dense.bias", "bert.encoder.layer.14.output.dense.weight", "bert.encoder.layer.14.output.dense.bias", "bert.encoder.layer.14.output.LayerNorm.gamma", "bert.encoder.layer.14.output.LayerNorm.beta", "bert.encoder.layer.15.attention.self.query.weight", "bert.encoder.layer.15.attention.self.query.bias", "bert.encoder.layer.15.attention.self.key.weight", "bert.encoder.layer.15.attention.self.key.bias", "bert.encoder.layer.15.attention.self.value.weight", "bert.encoder.layer.15.attention.self.value.bias", "bert.encoder.layer.15.attention.output.dense.weight", "bert.encoder.layer.15.attention.output.dense.bias", "bert.encoder.layer.15.attention.output.LayerNorm.gamma", "bert.encoder.layer.15.attention.output.LayerNorm.beta", "bert.encoder.layer.15.intermediate.dense.weight", "bert.encoder.layer.15.intermediate.dense.bias", "bert.encoder.layer.15.output.dense.weight", "bert.encoder.layer.15.output.dense.bias", "bert.encoder.layer.15.output.LayerNorm.gamma", "bert.encoder.layer.15.output.LayerNorm.beta", "bert.encoder.layer.16.attention.self.query.weight", "bert.encoder.layer.16.attention.self.query.bias", "bert.encoder.layer.16.attention.self.key.weight", "bert.encoder.layer.16.attention.self.key.bias", "bert.encoder.layer.16.attention.self.value.weight", "bert.encoder.layer.16.attention.self.value.bias", "bert.encoder.layer.16.attention.output.dense.weight", "bert.encoder.layer.16.attention.output.dense.bias", "bert.encoder.layer.16.attention.output.LayerNorm.gamma", "bert.encoder.layer.16.attention.output.LayerNorm.beta", "bert.encoder.layer.16.intermediate.dense.weight", "bert.encoder.layer.16.intermediate.dense.bias", "bert.encoder.layer.16.output.dense.weight", "bert.encoder.layer.16.output.dense.bias", 
"bert.encoder.layer.16.output.LayerNorm.gamma", "bert.encoder.layer.16.output.LayerNorm.beta", "bert.encoder.layer.17.attention.self.query.weight", "bert.encoder.layer.17.attention.self.query.bias", "bert.encoder.layer.17.attention.self.key.weight", "bert.encoder.layer.17.attention.self.key.bias", "bert.encoder.layer.17.attention.self.value.weight", "bert.encoder.layer.17.attention.self.value.bias", "bert.encoder.layer.17.attention.output.dense.weight", "bert.encoder.layer.17.attention.output.dense.bias", "bert.encoder.layer.17.attention.output.LayerNorm.gamma", "bert.encoder.layer.17.attention.output.LayerNorm.beta", "bert.encoder.layer.17.intermediate.dense.weight", "bert.encoder.layer.17.intermediate.dense.bias", "bert.encoder.layer.17.output.dense.weight", "bert.encoder.layer.17.output.dense.bias", "bert.encoder.layer.17.output.LayerNorm.gamma", "bert.encoder.layer.17.output.LayerNorm.beta", "bert.encoder.layer.18.attention.self.query.weight", "bert.encoder.layer.18.attention.self.query.bias", "bert.encoder.layer.18.attention.self.key.weight", "bert.encoder.layer.18.attention.self.key.bias", "bert.encoder.layer.18.attention.self.value.weight", "bert.encoder.layer.18.attention.self.value.bias", "bert.encoder.layer.18.attention.output.dense.weight", "bert.encoder.layer.18.attention.output.dense.bias", "bert.encoder.layer.18.attention.output.LayerNorm.gamma", "bert.encoder.layer.18.attention.output.LayerNorm.beta", "bert.encoder.layer.18.intermediate.dense.weight", "bert.encoder.layer.18.intermediate.dense.bias", "bert.encoder.layer.18.output.dense.weight", "bert.encoder.layer.18.output.dense.bias", "bert.encoder.layer.18.output.LayerNorm.gamma", "bert.encoder.layer.18.output.LayerNorm.beta", "bert.encoder.layer.19.attention.self.query.weight", "bert.encoder.layer.19.attention.self.query.bias", "bert.encoder.layer.19.attention.self.key.weight", "bert.encoder.layer.19.attention.self.key.bias", "bert.encoder.layer.19.attention.self.value.weight", 
"bert.encoder.layer.19.attention.self.value.bias", "bert.encoder.layer.19.attention.output.dense.weight", "bert.encoder.layer.19.attention.output.dense.bias", "bert.encoder.layer.19.attention.output.LayerNorm.gamma", "bert.encoder.layer.19.attention.output.LayerNorm.beta", "bert.encoder.layer.19.intermediate.dense.weight", "bert.encoder.layer.19.intermediate.dense.bias", "bert.encoder.layer.19.output.dense.weight", "bert.encoder.layer.19.output.dense.bias", "bert.encoder.layer.19.output.LayerNorm.gamma", "bert.encoder.layer.19.output.LayerNorm.beta", "bert.encoder.layer.20.attention.self.query.weight", "bert.encoder.layer.20.attention.self.query.bias", "bert.encoder.layer.20.attention.self.key.weight", "bert.encoder.layer.20.attention.self.key.bias", "bert.encoder.layer.20.attention.self.value.weight", "bert.encoder.layer.20.attention.self.value.bias", "bert.encoder.layer.20.attention.output.dense.weight", "bert.encoder.layer.20.attention.output.dense.bias", "bert.encoder.layer.20.attention.output.LayerNorm.gamma", "bert.encoder.layer.20.attention.output.LayerNorm.beta", "bert.encoder.layer.20.intermediate.dense.weight", "bert.encoder.layer.20.intermediate.dense.bias", "bert.encoder.layer.20.output.dense.weight", "bert.encoder.layer.20.output.dense.bias", "bert.encoder.layer.20.output.LayerNorm.gamma", "bert.encoder.layer.20.output.LayerNorm.beta", "bert.encoder.layer.21.attention.self.query.weight", "bert.encoder.layer.21.attention.self.query.bias", "bert.encoder.layer.21.attention.self.key.weight", "bert.encoder.layer.21.attention.self.key.bias", "bert.encoder.layer.21.attention.self.value.weight", "bert.encoder.layer.21.attention.self.value.bias", "bert.encoder.layer.21.attention.output.dense.weight", "bert.encoder.layer.21.attention.output.dense.bias", "bert.encoder.layer.21.attention.output.LayerNorm.gamma", "bert.encoder.layer.21.attention.output.LayerNorm.beta", "bert.encoder.layer.21.intermediate.dense.weight", 
"bert.encoder.layer.21.intermediate.dense.bias", "bert.encoder.layer.21.output.dense.weight", "bert.encoder.layer.21.output.dense.bias", "bert.encoder.layer.21.output.LayerNorm.gamma", "bert.encoder.layer.21.output.LayerNorm.beta", "bert.encoder.layer.22.attention.self.query.weight", "bert.encoder.layer.22.attention.self.query.bias", "bert.encoder.layer.22.attention.self.key.weight", "bert.encoder.layer.22.attention.self.key.bias", "bert.encoder.layer.22.attention.self.value.weight", "bert.encoder.layer.22.attention.self.value.bias", "bert.encoder.layer.22.attention.output.dense.weight", "bert.encoder.layer.22.attention.output.dense.bias", "bert.encoder.layer.22.attention.output.LayerNorm.gamma", "bert.encoder.layer.22.attention.output.LayerNorm.beta", "bert.encoder.layer.22.intermediate.dense.weight", "bert.encoder.layer.22.intermediate.dense.bias", "bert.encoder.layer.22.output.dense.weight", "bert.encoder.layer.22.output.dense.bias", "bert.encoder.layer.22.output.LayerNorm.gamma", "bert.encoder.layer.22.output.LayerNorm.beta", "bert.encoder.layer.23.attention.self.query.weight", "bert.encoder.layer.23.attention.self.query.bias", "bert.encoder.layer.23.attention.self.key.weight", "bert.encoder.layer.23.attention.self.key.bias", "bert.encoder.layer.23.attention.self.value.weight", "bert.encoder.layer.23.attention.self.value.bias", "bert.encoder.layer.23.attention.output.dense.weight", "bert.encoder.layer.23.attention.output.dense.bias", "bert.encoder.layer.23.attention.output.LayerNorm.gamma", "bert.encoder.layer.23.attention.output.LayerNorm.beta", "bert.encoder.layer.23.intermediate.dense.weight", "bert.encoder.layer.23.intermediate.dense.bias", "bert.encoder.layer.23.output.dense.weight", "bert.encoder.layer.23.output.dense.bias", "bert.encoder.layer.23.output.LayerNorm.gamma", "bert.encoder.layer.23.output.LayerNorm.beta", "bert.pooler.dense.weight", "bert.pooler.dense.bias", "classifier.weight", "classifier.bias", "linear.weight", "linear.bias", 
"LayerNorm.gamma", "LayerNorm.beta".
I guess you may have used the --pretrain_model flag to load the pre-trained model. Please use the --init_checkpoint argument instead (the same as in the README.md), or share your running script.
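For what it's worth, the two error messages differ only by a "bert." prefix on every key: the checkpoint appears to hold the full classifier's state_dict while BertModel expects unprefixed keys. A minimal sketch (hypothetical, not the repo's code) of remapping the keys before load_state_dict():

```python
import torch

# Hypothetical illustration: strip the "bert." prefix so that keys like
# "bert.encoder.layer.0..." match BertModel's "encoder.layer.0..." names.
# Head weights ("classifier.*", "linear.*") are simply dropped here.
ckpt = {
    "bert.embeddings.word_embeddings.weight": torch.zeros(2, 2),
    "bert.pooler.dense.weight": torch.zeros(2, 2),
    "classifier.weight": torch.zeros(2, 2),
}
remapped = {k[len("bert."):]: v for k, v in ckpt.items() if k.startswith("bert.")}
print(sorted(remapped))  # ['embeddings.word_embeddings.weight', 'pooler.dense.weight']
```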
I use the following command:

nohup python run_classifier_multi_task.py \
  --task_name CoLA \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir GLUE \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint pretrained_model/en_model \
  --max_seq_length 128 \
  --train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --save_model \
  --gradient_accumulation_steps 1 \
  --output_dir output > output.log 2>&1 &
I have tried both --pretrain_model and --init_checkpoint; they produced the first and the second error above, respectively.
We have tested the case that uses --init_checkpoint with all code unchanged, and it works. Please don't keep only line 1056.
Unchanged code always runs successfully, but the result is only half of what the paper reports. And if you print the log, you will find that it executes the except branch, not the try branch.
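This is exactly why a broad try/except around checkpoint loading is risky. A minimal sketch (hypothetical, not the repo's exact code) of the failure mode: the RuntimeError is swallowed, the model keeps its random initialization, and training proceeds "successfully" with near-chance scores.

```python
import torch
import torch.nn as nn

# Simulate loading a checkpoint whose keys don't match the model.
model = nn.Linear(4, 2)
bad_ckpt = {"wrong.key": torch.zeros(1)}  # stands in for a mismatched state_dict
try:
    model.load_state_dict(bad_ckpt)  # strict=True raises on missing/unexpected keys
    loaded = True
except RuntimeError:
    loaded = False  # error swallowed: training would continue from random weights
print(loaded)  # False
```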
Could you please post the full log? The picture below shows the result of our reproduction under the default setting.
nohup: ignoring input
12/14/2021 05:02:17 - INFO - __main__ - COMMAND: run_classifier_multi_task.py --task_name CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI --do_train --do_eval --amp_type O1 --lr_decay_factor 1 --dropout 0.1 --do_lower_case --detach_index -1 --core_encoder bert --data_dir GLUE --vocab_file config/vocab.txt --bert_config_file config/large_bert_config.json --init_checkpoint pretrained_model/en_model --max_seq_length 128 --train_batch_size 16 --learning_rate 2e-5 --num_train_epochs 3 --fast_train --save_model --gradient_accumulation_steps 1 --output_dir output
12/14/2021 05:02:17 - INFO - __main__ - device cuda n_gpu 1 distributed training False
12/14/2021 05:02:30 - INFO - __main__ - LOOKING AT GLUE/MRPC/train.tsv
12/14/2021 05:02:47 - INFO - __main__ - Install apex first if you want to use mix precition.
12/14/2021 05:02:47 - INFO - __main__ - ***** Process training data *****
12/14/2021 05:02:51 - INFO - __main__ - start tokenize
12/14/2021 05:02:58 - INFO - __main__ - start tokenize
12/14/2021 05:03:22 - INFO - __main__ - start tokenize
12/14/2021 05:03:29 - INFO - __main__ - start tokenize
12/14/2021 05:03:48 - INFO - __main__ - start tokenize
12/14/2021 05:04:14 - INFO - __main__ - start tokenize
12/14/2021 05:04:24 - INFO - __main__ - start tokenize
12/14/2021 05:04:36 - INFO - __main__ - start tokenize
12/14/2021 05:04:46 - INFO - __main__ - start tokenize
12/14/2021 05:04:49 - INFO - __main__ - ***** Running training *****
12/14/2021 05:04:49 - INFO - __main__ - Num examples = 949733
12/14/2021 05:04:49 - INFO - __main__ - Num tasks = 9
12/14/2021 05:04:49 - INFO - __main__ - Batch size = 16
12/14/2021 05:04:49 - INFO - __main__ - Num steps = 178074
12/14/2021 05:04:49 - INFO - __main__ - ***** Process dev data *****
12/14/2021 05:04:59 - INFO - __main__ - start tokenize
12/14/2021 05:05:08 - INFO - __main__ - start tokenize
12/14/2021 05:05:18 - INFO - __main__ - start tokenize
12/14/2021 05:05:26 - INFO - __main__ - start tokenize
12/14/2021 05:05:35 - INFO - __main__ - start tokenize
12/14/2021 05:05:46 - INFO - __main__ - start tokenize
12/14/2021 05:05:55 - INFO - __main__ - start tokenize
12/14/2021 05:06:04 - INFO - __main__ - start tokenize
12/14/2021 05:06:13 - INFO - __main__ - start tokenize
12/14/2021 05:06:14 - INFO - __main__ - ***** Running evaluation *****
12/14/2021 05:06:14 - INFO - __main__ - Num examples = 69711
12/14/2021 05:06:14 - INFO - __main__ - Num tasks = 9
12/14/2021 05:06:14 - INFO - __main__ - Batch size = 8
Epoch: 0%| | 0/3 [00:00<?, ?it/s]
Iteration: 100%|██████████| 59359/59359 [53:25<00:00, 14.81it/s]
12/14/2021 06:06:04 - INFO - __main__ - ***** Eval results *****
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 47487
12/14/2021 06:06:04 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9634366636350805
12/14/2021 06:06:07 - INFO - __main__ - Save model of Epoch 0
Epoch: 33%|███▎ | 1/3 [59:51<1:59:43, 3591.88s/it]
Iteration: 100%|██████████| 59359/59359 [53:50<00:00, 14.70it/s]
12/14/2021 07:06:37 - INFO - __main__ - ***** Eval results *****
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 94974
12/14/2021 07:06:37 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9633064780335283
12/14/2021 07:06:41 - INFO - __main__ - Save model of Epoch 1
Epoch: 67%|██████▋ | 2/3 [2:00:25<1:00:04, 3604.43s/it]
Iteration: 100%|██████████| 59359/59359 [54:27<00:00, 14.53it/s]
12/14/2021 08:07:35 - INFO - __main__ - ***** Eval results *****
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_accuracy = [0.3087248322147651, 0.2982643660609762, 0.6078431372549019, 0.537250594911221, 0.5476378926539698, 0.44765342960288806, 0.48394495412844035, 0.0, 0.5211267605633803]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_corrcoef = [0, 0, 0, 0, 0, 0, 0, -0.18605788966825199, 0]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_loss = 0.9975722805879683
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: eval_matthew = [-0.018148342420931135, 0, 0, 0, 0, 0, 0, 0, 0]
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: global_step = 142461
12/14/2021 08:07:35 - INFO - __main__ - pretrained_model/en_model: CoLA,MNLI,MRPC,QNLI,QQP,RTE,SST-2,STS-B,WNLI: loss = 0.9636250535845807
12/14/2021 08:07:38 - INFO - __main__ - Save model of Epoch 2
Epoch: 100%|██████████| 3/3 [3:01:22<00:00, 3620.24s/it]
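As a sanity check on the log above, the reported counts are mutually consistent (my own arithmetic, not from the log; the floor-vs-ceil convention is inferred from the numbers themselves):

```python
# 949733 examples at batch size 16 over 3 epochs, as in the log.
num_examples, batch_size, epochs = 949733, 16, 3
iters_per_epoch = -(-num_examples // batch_size)   # ceil -> 59359, the tqdm total
steps = (num_examples // batch_size) * epochs      # floor -> 178074, "Num steps"
print(iters_per_epoch, steps)  # 59359 178074
```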
The performance of multi-task mode depends on the similarity between tasks. For example, MNLI and STS-B are similar, so STS-B results are better when MNLI and STS-B are run together than when STS-B is run alone; CoLA, however, is not similar to the other tasks. You can get normal accuracy by running the CoLA task alone (--task_name CoLA).
Thank you very much for your patient reply. It seems that installing apex is necessary to reproduce the correct results.
without apex
12/16/2021 04:50:44 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_accuracy = [0.5735294117647058]
12/16/2021 04:50:44 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_corrcoef = [0]
12/16/2021 04:50:44 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_loss = 0.6800402063949436
12/16/2021 04:50:44 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_matthew = [0]
with apex
12/16/2021 05:16:49 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_accuracy = [0.875]
12/16/2021 05:16:49 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_corrcoef = [0]
12/16/2021 05:16:49 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_loss = 0.46738045994539323
12/16/2021 05:16:49 - INFO - __main__ - pretrained_model/en_model: MRPC: eval_matthew = [0]
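The earlier log line ("Install apex first if you want to use mix precition.") points at a guarded-import pattern like the sketch below (hypothetical, not the repo's exact code; `use_amp` is my own name). When apex is missing, such scripts typically fall back silently to full precision, often with a different optimizer path, which may explain a gap like the MRPC numbers above.

```python
# Guarded optional import: absence of apex downgrades the run
# instead of failing it, so the only visible symptom is one log line.
try:
    from apex import amp  # optional NVIDIA apex dependency
    use_amp = True
except ImportError:
    use_amp = False  # silent fallback to full-precision training
print(use_amp)
```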