huseyinorkun / word2vec

This project forked from ibmdatascience/word2vec

Hands-on lab instructions to build Spark-based machine learning models for capturing word meanings.

Spark-based machine learning for capturing word meanings

Data source: tweets

Algorithms: Word2Vec, K-means, PCA

Tools: IBM Data Science Experience, Spark, Python

The same techniques can be applied to other kinds of text, such as documents and product reviews.
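K-means, one of the algorithms listed above, groups vectors (such as word vectors) into k clusters. The tutorial runs its models on Spark; the following is only a plain-Python illustration of the underlying idea (Lloyd's algorithm: assign each vector to its nearest centroid, then move each centroid to the mean of its members), not the notebook's code. The function `kmeans` and its parameters are hypothetical names chosen for this sketch. Unlike this sketch, which starts from k randomly chosen points, Spark ML's `KMeans` defaults to k-means|| initialization and runs distributed.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's-algorithm K-means on a list of equal-length vectors.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    Illustrative sketch only; not the tutorial's Spark implementation.
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` puts the first two points in one cluster and the last two in the other.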

This is a tutorial for building Spark-based machine learning models that capture word meanings. It shows how to build a Word2Vec model from Twitter data with Apache Spark on IBM's Data Science Experience (DSX).
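A Word2Vec model maps each word to a vector, and words that appear in similar contexts end up with vectors pointing in similar directions. These vectors are commonly compared with cosine similarity, which is also the score reported by Spark ML's `Word2VecModel.findSynonyms`. The sketch below shows that comparison in plain Python; the helper names and the 3-dimensional vectors are hypothetical and made up for illustration, not output from the tutorial's model (Spark's `Word2Vec` defaults to 100-dimensional vectors).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Rank the other words in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# HYPOTHETICAL toy vectors, invented for this example only.
toy_vectors = {
    "christmas": [0.9, 0.1, 0.0],
    "xmas": [0.85, 0.15, 0.05],
    "football": [0.0, 0.2, 0.95],
    "game": [0.05, 0.3, 0.9],
}
```

With these toy vectors, `most_similar("christmas", toy_vectors, top_n=1)` ranks "xmas" first, because its vector points in nearly the same direction.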

Blog: http://www.ibmbigdatahub.com/blog/spark-based-machine-learning-capturing-word-meanings

Instructions:

## Step 1. If you already have an account on IBM's Data Science Experience, go to Step 2. If not, follow this tutorial to create an account.

## Step 2. Create a project on DSX. For details on how to create a project, click here.

## Step 3. Get the data into DSX

  1. Download the tweets from here to your laptop, without uncompressing them. The tweets.gz file contains a 10% sample (collected with the Twitter decahose API) of a 15-minute batch of public tweets from December 23rd. The compressed file is 116 MB; at a compression ratio of about 10 to 1, that is roughly 1.2 GB of uncompressed tweets.

  2. Go to the project you just created on DSX and click the add data assets icon (+).

  3. Click Add file, select the tweets.gz file on your laptop, and click Open.
  4. Wait until the file has loaded.
  5. Once it has loaded, click Apply to add the file to your project.
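To sanity-check the downloaded tweets.gz on your laptop without uncompressing it to disk, you can stream it with Python's standard `gzip` module. This sketch assumes, without confirmation from this README, that the file holds one JSON-encoded tweet per line with a `text` field; `peek_tweet_texts` is a hypothetical helper, not part of the tutorial.

```python
import gzip
import json


def peek_tweet_texts(path, limit=5):
    """Return the "text" field of up to `limit` tweets from a gzipped file.

    ASSUMPTION: one JSON object per line with a "text" key. Lines that are
    blank, not valid JSON, or missing "text" are skipped.
    """
    texts = []
    # gzip.open in text mode decompresses on the fly; nothing is written to disk.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not JSON
            if isinstance(record, dict) and "text" in record:
                texts.append(record["text"])
                if len(texts) >= limit:
                    break
    return texts
```

For example, `peek_tweet_texts("tweets.gz", limit=3)` returns the text of the first three tweets it can parse.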

You should now see your tweets in your project's data assets list. They are stored in your object storage, in the container associated with your project. If your project name is "Word2Vec for Text Data", the default container name is Word2VecforTextData (unless you chose a different name in Step 2, part 3).
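In the example above, the default container name is the project name with its spaces removed. Assuming that rule holds in general (it is inferred from this single example, and DSX may alter other characters too), a hypothetical helper for predicting the name you will need inside the notebook could look like this:

```python
def default_container_name(project_name):
    """Guess the default DSX object-storage container name for a project.

    HYPOTHETICAL: the rule (drop all whitespace) is inferred from one example
    in this README and may not match DSX's actual naming rules.
    """
    return "".join(project_name.split())
```

If you renamed the container in Step 2, use that name instead.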

## Step 4. Get the notebook, open it, and follow the instructions inside the notebook

  1. Go back to your project and click the create new notebook icon.
  1. Click From URL (the 3rd tab), choose a name for your notebook (e.g., "Spark-based ML to capture word meaning"), paste this URL into the Notebook URL field: https://github.com/IBMDataScience/word2vec/blob/master/Spark-based%20machine%20learning%20for%20word%20meanings.ipynb and finally click Create Notebook.

You are now in your new notebook; the rest of the instructions are inside it.

NOTE: To execute a notebook cell, select it and press Shift+Enter.
