Thanks for sharing the code.
I trained the model with the default settings and found that training is unstable.
The model often collapses in the final generation, and the best model usually comes from the 2nd or 3rd generation.
An example log is shown below (each epoch is labeled generation-epoch):
Epoch: 0-4, recalls(1/5/10): 88.7%, 96.1%, 97.8%
Epoch: 1-0, recalls(1/5/10): 88.4%, 96.2%, 97.8%
Epoch: 1-1, recalls(1/5/10): 89.5%, 96.8%, 98.1%
Epoch: 1-2, recalls(1/5/10): 89.9%, 96.8%, 98.2%
Epoch: 1-3, recalls(1/5/10): 89.8%, 96.8%, 98.2%
Epoch: 1-4, recalls(1/5/10): 90.1%, 96.8%, 98.2%
Epoch: 2-0, recalls(1/5/10): 88.7%, 96.2%, 97.6%
Epoch: 2-1, recalls(1/5/10): 89.5%, 96.8%, 97.8%
Epoch: 2-2, recalls(1/5/10): 89.3%, 96.8%, 98.0%
Epoch: 2-3, recalls(1/5/10): 89.8%, 97.0%, 98.1% (* the best model)
Epoch: 2-4, recalls(1/5/10): 89.7%, 96.8%, 98.0%
Epoch: 3-0, recalls(1/5/10): 2.8%, 10.3%, 18.1%
Epoch: 3-1, recalls(1/5/10): 2.8%, 10.5%, 18.3%
Epoch: 3-2, recalls(1/5/10): 2.8%, 10.5%, 18.1%
Epoch: 3-3, recalls(1/5/10): 2.9%, 10.9%, 18.3%
Epoch: 3-4, recalls(1/5/10): 3.0%, 10.4%, 18.4%
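
In case it helps with triage, here is a minimal sketch (not part of the repository) that parses log lines in the format above and reports where recall@1 first collapses. The names `parse_log` and `find_collapse`, and the 20-point drop threshold, are my own assumptions for illustration.

```python
import re

# Matches lines such as "Epoch: 3-0, recalls(1/5/10): 2.8%, 10.3%, 18.1%".
# Trailing notes like "(* the best model)" are ignored.
LINE_RE = re.compile(
    r"Epoch:\s*(\d+)-(\d+),\s*recalls\(1/5/10\):\s*"
    r"([\d.]+)%,\s*([\d.]+)%,\s*([\d.]+)%"
)


def parse_log(text):
    """Return a list of (generation, epoch, r1, r5, r10) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            gen, ep = int(m.group(1)), int(m.group(2))
            r1, r5, r10 = (float(m.group(i)) for i in (3, 4, 5))
            rows.append((gen, ep, r1, r5, r10))
    return rows


def find_collapse(rows, max_drop=20.0):
    """Return the first (generation, epoch) whose recall@1 is more than
    max_drop points below the previous entry, or None if there is none.

    max_drop is a hypothetical threshold chosen for illustration.
    """
    for prev, cur in zip(rows, rows[1:]):
        if prev[2] - cur[2] > max_drop:
            return cur[0], cur[1]
    return None
```

Running `find_collapse(parse_log(log_text))` on the log above should point at the first epoch of the final generation, where recall@1 falls from 89.7% to 2.8%.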