PART 1
- plot more curves to visualize convergence for different batch sizes
- complete 1.4
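The convergence curves for different batch sizes could be produced along the lines below. This is a minimal sketch that assumes Part 1 trains a binary logistic regression with minibatch gradient descent. The helper names (`minibatch_gd`, `log_loss`), the learning rate, the epoch count and the synthetic two-blob data are all hypothetical placeholders, not the assignment's code or dataset.

```python
import numpy as np


def sigmoid(z):
    """Logistic function; z is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def log_loss(X, y, w):
    """Mean binary cross-entropy of weights w on (X, y)."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def minibatch_gd(X, y, batch_size, lr=0.1, epochs=20, seed=0):
    """Hypothetical trainer: minibatch gradient descent for logistic regression.

    Returns the full-training-set loss at initialization and after every
    epoch, i.e. one convergence curve per batch size.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    history = [log_loss(X, y, w)]
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # gradient of the mean cross-entropy over this minibatch
            grad = Xb.T @ (sigmoid(Xb @ w) - yb) / len(idx)
            w -= lr * grad
        history.append(log_loss(X, y, w))
    return np.array(history)


# Synthetic toy data (NOT the assignment's dataset): two Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
X = np.hstack([X, np.ones((400, 1))])  # constant column acts as the bias term
y = np.concatenate([np.zeros(200), np.ones(200)])

# batch_size=400 is full-batch GD; batch_size=1 is plain SGD
curves = {bs: minibatch_gd(X, y, bs) for bs in (1, 16, 64, 400)}
for bs, h in curves.items():
    print(f"batch={bs:>3}  loss: start={h[0]:.3f}  end={h[-1]:.3f}")
```

Each entry of `curves` can then be drawn on one set of axes (e.g. with matplotlib's `plt.plot(h, label=f"batch={bs}")`) to compare how quickly and how smoothly each batch size converges per epoch.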
PART 2
- n-grams, tfidf__use_idf (maybe fake news has more filler words, etc.), penalty (not so much), C, tol (vs. epsilon) (possibly)
- need to read up on these and others; maybe there are other params we haven't included?
- e.g. max_features (better to focus on high- vs. low-frequency terms? OOD? etc.), penalty (allowed values depend on the solver: liblinear, etc.)
- MORE GRAPHS -> maybe try RandomizedSearchCV over all of the params (or most of them)
- maybe more preprocessing (?) -> remove garbage weights; scale features (from the sklearn docs)
- Pipeline (try a different model, with justification)
- Things to look more into: C parameter -> must be a positive float (> 0) for log. reg.; it is the inverse of regularization strength, so smaller C = stronger regularization
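A random search over the Part 2 parameters might be set up roughly as follows. This is a minimal sketch assuming a binary real-vs-fake text classifier built as a `TfidfVectorizer` + `LogisticRegression` pipeline. The step names `tfidf`/`clf`, every value range, and the tiny generated corpus are hypothetical placeholders, not the assignment's data or chosen settings. `liblinear` is used because it accepts both `"l1"` and `"l2"` penalties, and `C` is sampled on a log scale because it must be a positive float.

```python
from scipy.stats import loguniform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (NOT the real dataset): label 0 = real, 1 = fake.
real = [f"officials confirmed the report on day {i}" for i in range(15)]
fake = [f"shocking you will not believe this secret {i}" for i in range(15)]
texts = real + fake
labels = [0] * 15 + [1] * 15

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # liblinear supports both "l1" and "l2" penalties for binary problems
    ("clf", LogisticRegression(solver="liblinear")),
])

# "<step>__<param>" names route each value to the right pipeline step.
param_distributions = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "tfidf__max_features": [None, 1000, 5000],
    "clf__penalty": ["l1", "l2"],
    "clf__C": loguniform(1e-2, 1e2),  # positive float, sampled log-uniformly
    "clf__tol": [1e-4, 1e-3],
}

search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=10, cv=3, random_state=0
)
search.fit(texts, labels)
print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

To try a different model, the `clf` step can be replaced (e.g. `pipe.set_params(clf=...)`) and the search re-run with that model's own `clf__*` parameters; `search.cv_results_` holds the per-candidate scores for graphing.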
%Last updated Mon. Oct 18, 2021