First of all, really nice code base for the paper. With the current setup, though, does training converge on the ViSA dataset? I even tried changing the code to use a larger batch size, but it still doesn't converge to a good policy.
Please let me know if you have any modifications in mind or any insights from the experiments you ran. Also, is a message length greater than 1 essential for successful communication?