For the Applied Data Mining project
I strongly recommend reading this chapter before this week's practical to give you further background in addition to the lecture material: http://www.nltk.org/book/ch06.html
The code is not designed in a 'click and run' format. To use it, open the file in Python and change the location of the dataset (line 132: s = 'C:\Users\Hasee\Desktop\final project of adm\gb-celebs\gb-celebs'). After that, you can run the file.
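A minimal sketch of how the dataset location might be set, assuming the variable is named s as on line 132. In Python 3, a normal string containing \U (as in C:\Users) is a syntax error, so a raw string (or forward slashes) is needed. The helper check_dataset_dir is hypothetical and is not part of the project code.

```python
from pathlib import Path

# Raw string (r'...') so the backslashes are not read as escape sequences.
# Replace this with the location of the dataset on your own machine.
s = r'C:\Users\Hasee\Desktop\final project of adm\gb-celebs\gb-celebs'


def check_dataset_dir(path):
    """Hypothetical helper: return True if the folder exists, else print a hint."""
    if Path(path).is_dir():
        return True
    print(f"Dataset folder not found: {path}")
    print("Edit the path assigned to 's' before running the file.")
    return False


check_dataset_dir(s)
```

Running this before the main script gives an early, readable message if the path has not been changed yet.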
The tokenizers are listed on lines 60~64. To select the one you want, call the function 'define_tokenizer(tokenizer = tokenise)' (line _77, where several tokenizers are provided).
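The selection step could look roughly like the sketch below. Only the name define_tokenizer and its tokenizer keyword come from the text above. The function body, the two example tokenizers and the helper tokenise_text are assumptions for illustration and may differ from the project code.

```python
import re


def whitespace_tokenise(text):
    """Example tokenizer (assumed): split on whitespace only."""
    return text.split()


def regex_tokenise(text):
    """Example tokenizer (assumed): lowercase, keep letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Holds whichever tokenizer was chosen last.
_current_tokenizer = whitespace_tokenise


def define_tokenizer(tokenizer=whitespace_tokenise):
    """Stand-in for the project's function: remember which tokenizer to use."""
    global _current_tokenizer
    _current_tokenizer = tokenizer
    return _current_tokenizer


def tokenise_text(text):
    """Hypothetical helper: apply the currently selected tokenizer."""
    return _current_tokenizer(text)


# Choose a tokenizer once, then use it for every document.
define_tokenizer(tokenizer=regex_tokenise)
tokens = tokenise_text("Don't miss the show!")
```

Passing the tokenizer as a function argument means a new one can be tried by changing a single call.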
To use a corpus from text, uncomment line 258.
The code is designed to use 10 cross-validation rounds with an 8-2 (80%/20%) split. Both settings can be changed by the user on line 237.
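One possible reading of the setup above is ten repeated random 80/20 train/test splits. The sketch below shows that reading using only the standard library. The function repeated_splits and its parameters are hypothetical and are not taken from line 237.

```python
import random


def repeated_splits(items, n_rounds=10, train_fraction=0.8, seed=0):
    """Yield (train, test) lists for n_rounds independent random splits."""
    rng = random.Random(seed)  # fixed seed so the splits are repeatable
    cut = int(len(items) * train_fraction)
    for _ in range(n_rounds):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


# Placeholder documents; in the project these would be the labelled examples.
documents = [f"doc{i}" for i in range(50)]
splits = list(repeated_splits(documents))

for round_no, (train, test) in enumerate(splits, start=1):
    print(round_no, len(train), len(test))
```

Changing n_rounds or train_fraction corresponds to the two settings the user can adjust. If sklearn is already in use, ShuffleSplit(n_splits=10, test_size=0.2) from sklearn.model_selection produces splits of the same kind.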
If you want to use other classifiers, check the sklearn website for more. Tokenizers not shown in the report, such as the Stanford tokenizer or regtokenizer, are also provided. You can explore them if you wish!