Proprietary document clustering algorithm, accompanied by a master's thesis that serves as an example of how to follow reproducible research requirements.
The thesis sources and the compiled PDF are in a dedicated directory.
The data directory contains the datasets on which the solution was evaluated. They should also be available at:
http://lcl.uniroma1.it/moresque
http://credo.fub.it/ambient/
http://credo.fub.it/odp239/odp239.tar.gz
Unfortunately, the word2vec model is too large to host on GitHub. Please visit https://code.google.com/archive/p/word2vec/ to download the word2vec model (trained on Google News) and place it in the code/lib directory.
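To confirm that the model landed in the right place, a small check along these lines may help. This is only a sketch and not part of the repository: the helper name `find_word2vec_model` is hypothetical, and it assumes the script is run from the repository root and that the downloaded file keeps a name starting with `GoogleNews-vectors`.

```python
from pathlib import Path

# Assumption: run from the repository root, so the model lives in code/lib.
MODEL_DIR = Path("code") / "lib"

# Assumption: the downloaded file name starts with this prefix;
# adjust it if the file was renamed.
MODEL_PREFIX = "GoogleNews-vectors"


def find_word2vec_model(model_dir=MODEL_DIR, prefix=MODEL_PREFIX):
    """Return files in model_dir whose names start with prefix, sorted by name."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return []
    return sorted(
        p for p in model_dir.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


if __name__ == "__main__":
    found = find_word2vec_model()
    if found:
        print("word2vec model found:", found[0])
    else:
        print("no word2vec model in", MODEL_DIR, "- download it first")
```

The check only looks at the file name and does not verify that the file is a valid or complete model.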
If you want to experiment with the code, read the dedicated README in the code directory.