nodesman / training-data-maker
Application for building machine learning training data from website content. The app takes a file containing a list of URLs as a command-line argument. Each URL is downloaded, its HTML is stripped, and the remaining text is written to a file in a specified directory. This is a necessary step when preparing training data for Apache Mahout.
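The workflow described above (read a URL list, download each page, strip the HTML, write one text file per page) can be sketched roughly as follows. This is an illustrative Python sketch using only the standard library, not the project's actual code. The names `strip_html`, `fetch`, `make_training_files`, and `main`, the `doc-NNNNN.txt` file naming, and the idea of passing the output directory as a second argument are all assumptions made for the example.

```python
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the visible text of an HTML document as one line of plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch(url):
    """Download a URL and decode it, falling back to UTF-8 (assumed default)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def make_training_files(url_list_path, output_dir, fetch=fetch):
    """Write the stripped text of each listed URL to its own file.

    The list file is assumed to hold one URL per line; blank lines are skipped.
    The file naming scheme (doc-00000.txt, ...) is hypothetical.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(url_list_path).read_text(encoding="utf-8").splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    written = []
    for index, url in enumerate(urls):
        target = out / f"doc-{index:05d}.txt"
        target.write_text(strip_html(fetch(url)), encoding="utf-8")
        written.append(target)
    return written


def main(argv):
    """Hypothetical entry point: argv = [program, url_list_file, output_dir]."""
    if len(argv) != 3:
        print("usage: make_training_data.py URL_LIST OUTPUT_DIR")
        return 2
    make_training_files(argv[1], argv[2])
    return 0
```

To use the sketch as a script, one could call `sys.exit(main(sys.argv))` at the bottom of the file. The `fetch` parameter is injectable so that the pipeline can be exercised without network access, for example by passing a function that returns canned HTML.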