The training material and test material are included as samples in the file folder. The original purpose of the code is to classify texts through LogisticRegression. We use it to apply the trained model to detect whether or not a text is similar to a certain type of text. The assignment of the 'certain type' is done artificially.
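The idea above can be illustrated with a minimal sketch. This is not the code in this project or Linan Qiu's framework: it is a small pure-Python logistic regression over word counts, written only to show how a trained model can score whether a text resembles an artificially assigned type. The toy texts, the labels, and the function names (`train`, `similarity`, and the helpers) are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a deliberately simple tokenizer.
    return text.lower().split()


def featurize(text, vocab):
    # Map a text to sparse word counts, ignoring words not seen in training.
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, weights, bias):
    return bias + sum(weights[i] * v for i, v in x.items())


def train(texts, labels, epochs=200, lr=0.1):
    # Build the vocabulary from the training texts.
    vocab = {}
    for t in texts:
        for w in tokenize(t):
            vocab.setdefault(w, len(vocab))
    rows = [featurize(t, vocab) for t in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    # Plain stochastic gradient descent on the logistic loss.
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = sigmoid(score(x, weights, bias)) - y
            bias -= lr * err
            for i, v in x.items():
                weights[i] -= lr * err * v
    return vocab, weights, bias


def similarity(text, model):
    # Probability that the text belongs to the 'certain type' (label 1).
    vocab, weights, bias = model
    return sigmoid(score(featurize(text, vocab), weights, bias))


# Hypothetical toy data: label 1 marks the artificially chosen 'certain type'.
texts = [
    "the match ended with a late goal",
    "the team won the football match",
    "the striker scored a goal",
    "the recipe needs flour and sugar",
    "bake the cake with butter and sugar",
    "stir the soup and add salt",
]
labels = [1, 1, 1, 0, 0, 0]

model = train(texts, labels)
print(similarity("the team scored a goal", model))
print(similarity("add sugar and butter", model))
```

A score above 0.5 means the model considers the text similar to the chosen type, and a score below 0.5 means it does not. In practice the threshold and the features would depend on the actual training material.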
We thank Linan Qiu for the basic framework of the code.