The code is spread across multiple files, but it is easy to understand and well organised.
Creating multiple strategies and evolving them with an ES seems to be the most natural solution, but adding more strategies may increase both the computation time and the fitness of the result.
Some comments are in Italian and others in English, but that is not really a problem because the code is well written.
There is a real effort to transform and take ownership of the exercise, since only a small part of the teacher's code remains.
Hello there, Marco!
I liked your idea of using Prioritized Experience Replay to improve the learning process. The concept is quite new to me, so I found your work very interesting. There are some things I'd like to point out:
Inside the Memory class you define the capacity attribute but, from what I can see, it's never used;
The way you defined the number of states overestimates the number of possible board states by a big margin, because it counts a significant number of impossible states (for example those where there are more Os than Xs). From what I see this isn't a big issue for correctness, because your algorithm never visits these states, but it's quite expensive in terms of memory usage;
Some comments refer to unused pieces of code, which hurts the overall readability, so I suggest getting rid of them.
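To make the point about the state count more concrete, here is a minimal sketch that enumerates the boards actually reachable in play. It is not taken from your code: I'm assuming a 3x3 tic-tac-toe board where each cell holds one of three values (so a naive count of 3^9 = 19683), that X always moves first, and that play stops as soon as someone wins. The names `winner` and `reachable_states` are my own, purely for illustration.

```python
# Hypothetical sketch: count tic-tac-toe boards reachable in legal play.
# Assumptions (not from the reviewed code): X moves first, play stops on a win.

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reachable_states():
    """Explore every board reachable from the empty board by legal moves."""
    start = " " * 9
    seen = {start}
    stack = [start]
    while stack:
        board = stack.pop()
        # Terminal boards (win or full board) have no successors.
        if winner(board) is not None or " " not in board:
            continue
        # X moves when both players have placed the same number of marks.
        player = "X" if board.count("X") == board.count("O") else "O"
        for i, cell in enumerate(board):
            if cell == " ":
                nxt = board[:i] + player + board[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


if __name__ == "__main__":
    states = reachable_states()
    print(f"naive count: {3 ** 9}, reachable: {len(states)}")
```

Under these assumptions the reachable set has 5478 boards, well under a third of the naive count, so indexing a table by a dictionary of visited states (or by an index built from this set) could save a fair amount of memory.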
I hope this review was useful to you and good luck on the final project.