Put all the data files and code files on GitHub so that we can share them, which should make our work more efficient.
GitHub URL: https://github.com/CnBDM-Su/bigdata
Just click the upload button to upload whatever you want.
(1) Discovery: background and aims of the project
(2) Data preparation:
- Data learning and mining (done)
- Data conditioning (cleaning and processing): word count, word cleaning, ...
- Data visualization (not yet done)
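The data conditioning step (word cleaning and word counting) could be sketched roughly as below. This is only a minimal sketch under assumptions: the `(year, text)` record shape, the tiny `STOP_WORDS` set, and the function names `clean_words` and `count_words_by_year` are made up for illustration and are not taken from the files in the repository.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project may use a different one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def clean_words(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def count_words_by_year(docs):
    """docs: iterable of (year, text) pairs -> {year: Counter of cleaned words}."""
    counts = {}
    for year, text in docs:
        counts.setdefault(year, Counter()).update(clean_words(text))
    return counts

# Made-up sample records, only to show the shape of the input and output.
sample = [
    (2015, "Deep learning for the analysis of big data"),
    (2016, "Big data and deep learning in the cloud"),
]
counts = count_words_by_year(sample)
```

Keeping one `Counter` per year makes it easy to build the per-word yearly series that the later modelling steps need.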
(3) Model planning and building
- Association rules: use this model to find groups of words that occur together (words only).
- Clustering: use a k-means model to find relations between words, e.g. x is a word's average number of occurrences per year and y is the standard deviation of its yearly occurrences, ...
- Regression: predict the trend of a technology, i.e. whether it will rise or drop.
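The clustering and regression ideas above could look roughly like the following. This is a sketch under assumptions, not the project's actual code: the yearly counts are made up, k-means is a plain hand-written version seeded with evenly spaced points (a library such as scikit-learn could be swapped in), and "rise or drop" is read off the sign of a least-squares slope over the year index. All function names are hypothetical.

```python
from statistics import mean, pstdev

def word_features(yearly_counts):
    """Map each word to (mean, standard deviation) of its yearly counts."""
    return {w: (mean(c), pstdev(c)) for w, c in yearly_counts.items()}

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on 2-D points, seeded with k evenly spaced points."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (mean(m[0] for m in members),
                                mean(m[1] for m in members))
    return labels, centroids

def trend_slope(counts):
    """Least-squares slope of counts against the year index 0, 1, 2, ..."""
    xs = range(len(counts))
    x_bar, y_bar = mean(xs), mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def trend_label(counts):
    """Classify a yearly series as 'rise', 'drop', or 'flat' by slope sign."""
    slope = trend_slope(counts)
    return "rise" if slope > 0 else "drop" if slope < 0 else "flat"

# Made-up yearly counts for four words, for illustration only.
yearly_counts = {
    "cloud":  [10, 20, 30, 40],
    "hadoop": [40, 30, 20, 10],
    "java":   [5, 5, 5, 5],
    "python": [6, 6, 6, 6],
}
features = word_features(yearly_counts)
words = list(features)
labels, centroids = kmeans([features[w] for w in words], k=2)
clusters = dict(zip(words, labels))
trends = {w: trend_label(c) for w, c in yearly_counts.items()}
```

Note that with only (average, standard deviation) as features, a steadily rising word and a steadily dropping word can land in the same cluster, so the slope from the regression step is what tells them apart.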
(4) Analyse the results and draw some conclusions
(5) Write the report