freesoft / detox_bot2

University of Illinois at Urbana-Champaign MCS-DS CS498 Cloud Computing Applications project

License: GNU General Public License v3.0

Makefile 0.43% TeX 91.26% CSS 1.77% Dockerfile 0.48% Python 5.16% HTML 0.89%
uiuc cca toxic-comment-classification cloud mcs final-project kubernetes gcp docker

detox_bot2's Introduction

Hi there 👋

  • 🔭 I work for Blizzard Entertainment, B&OP (formerly Battle.net), as a Senior Software Engineer.
  • 🌱 Graduated with a B.S. in Computer Engineering from Dong-A University, South Korea.
  • 🌱 Graduated with a Master's in Computer Science - Data Science (MCS-DS) from the University of Illinois at Urbana-Champaign.
  • 🌱 Currently a student in Georgia Tech's OMSCS graduate program as of August 2023, focusing on Interactive Intelligence.
  • 💬 Ask me about volunteer opportunities for for-profit or non-profit projects, especially in EdTech.
  • 📫 How to reach me: https://www.linkedin.com/in/wonheejung/

detox_bot2's People

Contributors

freesoft, harley3, kevinmackie, noya

detox_bot2's Issues

[High] The current detox_bot uses MultinomialNB, which supports out-of-core training through partial_fit() (meaning you can keep training it without starting over). However, the entire feature set is fixed at the first training run. I see a few ways around this, including using word2vec with Keras, and I am wondering whether we can use one of them.

OR

If we can significantly boost the training speed, this matters much less. scikit-learn with MultinomialNB takes too much memory while it's running and ends up using all of my MacBook Pro's memory, so I had to run the training in several partial passes. Maybe we can improve this by cherry-picking the words used for training and reducing the number of features.
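One way to keep partial_fit() while avoiding a vocabulary that is frozen at the first training run is feature hashing. The sketch below is only an illustration of that idea, not the project's implementation: it assumes scikit-learn's HashingVectorizer is an acceptable replacement for the current vectorizer, and the feature count, the label encoding, and the `train_on_batch` helper are all hypothetical.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hashing fixes the number of columns up front (2**18 is an assumed size),
# so no vocabulary has to be learned or kept in memory, and words first
# seen in later batches still map to a column.
# alternate_sign=False keeps values non-negative, which MultinomialNB requires.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
model = MultinomialNB()
CLASSES = [0, 1]  # hypothetical encoding: 0 = clean, 1 = toxic


def train_on_batch(texts, labels):
    """Incrementally train on one batch (hypothetical helper)."""
    X = vectorizer.transform(texts)  # stateless: no fit() step needed
    model.partial_fit(X, labels, classes=CLASSES)


# Toy batches standing in for chunks of the real training data.
train_on_batch(["you are an idiot", "have a nice day"], [1, 0])
train_on_batch(["what a moron", "thanks for the help"], [1, 0])

prediction = model.predict(vectorizer.transform(["you idiot"]))
```

Because each batch is transformed and discarded, peak memory depends on the batch size rather than on the whole corpus. Lowering `n_features` reduces memory further at the cost of more hash collisions.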

Write the project proposal

Clean up some of the extra TODOs and other leftovers from the current markdown file, apply some formatting, generate a PDF from it, and check it in to the repo.

Ask teammates for a review before submission.

Summarize what progress we made

I need your opinions and a list of the things you've done so we can add those items to the progress report.
So far, what I can remember is:

  • Implemented a working prototype (by copying the existing detox_bot ...)
  • Investigated IaaS/PaaS options for running the service on k8s. (Need more detail from @harley3 if you have documentation for the investigation.)
  • Made progress: the prototype app now runs on a Google Cloud Platform k8s cluster (which means it's dockerized as well).
  • Investigated whether Apache Spark MLlib and TensorFlow are good enough to replace the current scikit-learn implementation and whether they scale. (Need more detail from @kevinmackie and @noya for this.)

[Low] Adding more stop words. detox_bot uses a stop word list made up of a pre-defined set of words plus a few custom words that I added. However, there are many more words that recur too often or are pretty much useless for detecting toxic chat. I would like to add a decent number of stop words (not just a, the, this, that, etc.) so that we can expect better accuracy.

Depending on which ML library we are going to use (Spark + MLlib or TensorFlow?), the stop words need to be regenerated and/or migrated, in addition to adding more of them.
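One library-independent way to find additional stop words is to look for tokens that show up in most comments of every class, since those carry little signal for telling toxic chat from clean chat. The following is a minimal sketch under that assumption; the `candidate_stop_words` function, its tokenizer regex, and the 0.8 threshold are hypothetical and not part of detox_bot. Because it uses only the standard library, the resulting list could be fed to whichever ML library is chosen.

```python
import re
from collections import Counter


def candidate_stop_words(comments, labels, min_doc_ratio=0.8):
    """Return tokens present in at least min_doc_ratio of the comments of EVERY class."""
    docs_per_class = Counter(labels)
    doc_freq = {label: Counter() for label in docs_per_class}

    for text, label in zip(comments, labels):
        # Count each token once per comment (document frequency).
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        doc_freq[label].update(tokens)

    candidates = None
    for label, total in docs_per_class.items():
        frequent = {tok for tok, n in doc_freq[label].items()
                    if n / total >= min_doc_ratio}
        # Keep only tokens that are frequent in all classes seen so far.
        candidates = frequent if candidates is None else candidates & frequent
    return sorted(candidates or [])


# Toy data: label 1 = toxic, 0 = clean (hypothetical encoding).
comments = ["lol you are dumb", "lol nice game", "lol you noob", "lol thanks you"]
labels = [1, 0, 1, 0]
extra_stop_words = candidate_stop_words(comments, labels)
```

In this toy data only "lol" appears in every comment of both classes, so it is the only candidate. Words such as "you" that are common in one class but less so in the other are kept, since they may still help the classifier. Any generated list should be reviewed by hand before it is merged into the existing stop words.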
