spark-mooc / mooc-setup

Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course

Jupyter Notebook 14.88% Python 74.44% Java 10.67%

mooc-setup's People

Contributors

adjucb, avonmoll, bawcos, bmc, felixcheung, joncbates, nealmcb

mooc-setup's Issues

Need to update to Spark 1.6.0

Hi

The Spark version is 1.3.1 in this VM:

/usr/local/bin/spark-1.3.1-bin-hadoop2.6/

I need to update to 1.6.0. How is Spark installed inside the VM, and are there instructions for updating it? Or do you plan to push an update soon?

ssh timeout error

I am trying to use your VM, which is similar to the one created here, but I am getting an ssh timeout error. Do you know why this might be happening?

Lab incompatibilities in certain circumstances

I realize the course VM is a closed environment that is not meant to be changed, but a search of Piazza shows that other students have hit the same obstacles. If I'm wrong, please close this issue.

  1. Relative file import paths in the labs produce an error when the IPython working directory is changed from the user's home directory to another location. To make the shared folder convenient to use, I changed the notebook profile to c.NotebookApp.notebook_dir = '/vagrant', and the labs then fail with:

: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/vagrant/Scalable-Machine-Learning/labs-progress/data/cs190/neuro.txt

     So maybe the notebooks should use an explicit path based on the current user's home directory? Something like:

from os.path import expanduser
home = expanduser("~")

  2. The labs are incompatible with numpy 1.9.2. Is it worth making them forward compatible?
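
A minimal sketch of the home-directory idea from item 1, assuming the lab data sits under the user's home directory in data/cs190/ (that layout is only inferred from the tail of the failing path above, and lab_data_path is a hypothetical helper, not part of the labs):

```python
import os.path


def lab_data_path(*parts):
    """Build an absolute path under the current user's home directory.

    Hypothetical helper: the result does not depend on the notebook's
    working directory, so changing notebook_dir would not affect it.
    """
    return os.path.join(os.path.expanduser("~"), *parts)


# Assumed layout: <home>/data/cs190/neuro.txt
neuro_path = lab_data_path("data", "cs190", "neuro.txt")
print(neuro_path)
```

The resulting string could then be passed to the lab's file-loading call in place of the relative path.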

cs120_lab2_linear_regression_df.py - changed randomSplit results in Databricks may cause inconsistent test cases

A change in Databricks has caused randomSplit to return different results since yesterday (02/08/2016).

The same test cases passed yesterday, but when I ran them again today they failed because the result of randomSplit on line 352 had changed.

https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L384
should be
Test.assertEquals(round(float(n_train) / float(n_train + n_val + n_test), 1), .8, 'unexpected value for nTrain')

https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L385
should be
Test.assertEquals(round(float(n_val) / float(n_train + n_val + n_test), 1), .1, 'unexpected value for nVal')

https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L386
should be
Test.assertEquals(round(float(n_test) / float(n_train + n_val + n_test), 1), .1, 'unexpected value for nTest')
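
To illustrate why the proportion-based checks proposed above are less brittle than exact counts, here is a small sketch in plain Python. It uses the counts reported in the "Seed problem" issue in this listing as sample input; split_fraction is a hypothetical helper, and plain assert stands in for the course's Test.assertEquals.

```python
def split_fraction(part, total, ndigits=1):
    """Return the share of `part` in `total`, rounded to `ndigits` places."""
    return round(float(part) / float(total), ndigits)


# Sample counts taken from the "Seed problem" report (weights .8, .1, .1).
n_train, n_val, n_test = 80115, 9955, 9930
total = n_train + n_val + n_test

fractions = [split_fraction(n, total) for n in (n_train, n_val, n_test)]

# The rounded shares match the weights even though the raw counts
# are not exactly 80000 / 10000 / 10000.
assert fractions == [0.8, 0.1, 0.1]
```

Rounding to one decimal place tolerates the small run-to-run variation in split sizes, at the cost of no longer pinning the exact counts.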

Seed problem

Hello

I'm trying to go through the week 3 lab, but there seems to be a problem with the proportions used to partition the data into training, validation and test sets. I'm using the supplied seed and the defined weights, yet I get a different number of examples in each set. As a result, the tests that follow are bound to fail.

snippet:

weights = [.8, .1, .1]
seed = 42
raw_train_df, raw_validation_df, raw_test_df = raw_df.randomSplit(weights, seed)

n_train = raw_train_df.cache().count()
n_val = raw_validation_df.cache().count()
n_test = raw_test_df.cache().count()
print n_train, n_val, n_test, n_train + n_val + n_test
raw_df.show(1)

output:

80115 9955 9930 100000
+--------------------+
|                text|
+--------------------+
|0,1,1,5,0,1382,4,...|
+--------------------+
only showing top 1 row

The same thing happens in lab 2 (linear regression).
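
One possible reason the counts are not exactly 80000 / 10000 / 10000: as far as I understand it, randomSplit treats the weights as sampling probabilities and decides each row's split by a random draw, so the split sizes are only approximately proportional to the weights. The sketch below is a pure-Python analogy of that idea, not Spark's implementation; random_split_counts is a hypothetical helper.

```python
import random


def random_split_counts(n_rows, weights, seed):
    """Count how many rows land in each split when every row is
    assigned by an independent uniform draw against cumulative weights."""
    total_weight = float(sum(weights))
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total_weight  # normalize, then accumulate
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point rounding

    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(n_rows):
        draw = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if draw < upper:
                counts[i] += 1
                break
    return counts


counts = random_split_counts(100000, [.8, .1, .1], seed=42)
print(counts)
```

The counts always sum to the number of rows, but each individual count only comes close to its weight's share, which matches the kind of output shown in the report above.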

Module 4 lab CTR data URL no longer exists!

Hi Felix!

I love your pyspark course thus far!

I am going through your "Scalable Machine Learning" course and noticed that the link to the dataset in the Module 4 lab, "Click through Rate Prediction", no longer works. Do you have any advice on how to import the dataset for the Module 4 lab so that I can finish the module?

Thank you so much for your help,

Austen

Minor typos/grammar errors in ML_lab3_linear_reg_student.ipynb

Three changes for your consideration, where underscores mark insertions and dashes mark strikethroughs:

  • line 361: 'task involves split_ting_ it into training, validation and test sets'
  • line 490: 'Calculates the -the- squared error'
  • line 815: 'gradient` s_h_ould be a '
