spark-mooc / mooc-setup
Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course
Following the discussion at https://piazza.com/class/iqfbu516yuj5t3?cid=653,
it seems that the value "expected_test_baseline = 0.530363901139" used in the assertion comes from hash_test_df instead of hash_train_df.
I think in this line, parsed_points_df should be parsed_data_df instead, because we should plot the data with shifted labels. Using parsed_points_df, we can't see the labels on the x-axis because they are out of range.
Thank you very much.
Hi Felix!
I love your pyspark course thus far!
I am going through your "Scalable Machine Learning" course and noticed that the link to the dataset in the Module 4 lab, "Click through Rate Prediction", is no longer working. Do you have any advice on how to import the dataset for the Module 4 lab so that I can finish the module?
Thank you so much for your help,
Austen
Three changes for your consideration where underscores are insertions and dashes are strike throughs:
Hi, I just downloaded sparkvm via vagrant, but I can't log in. What are the username and password?
Some change in Databricks has caused randomSplit to return different results since yesterday (02/08/2016).
The same test cases passed yesterday, but when I ran them again today they failed because the result of randomSplit in line 352 changed.
https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L384
should be
Test.assertEquals(round(float(n_train) / float(n_train + n_val + n_test), 1), .8, 'unexpected value for nTrain')
https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L385
should be
Test.assertEquals(round(float(n_val) / float(n_train + n_val + n_test), 1), .1, 'unexpected value for nVal')
https://github.com/spark-mooc/mooc-setup/blob/master/cs120_lab2_linear_regression_df.py#L386
should be
Test.assertEquals(round(float(n_test) / float(n_train + n_val + n_test), 1), .1, 'unexpected value for nTest')
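The three corrected assertions above compare each split's share of the total, rounded to one decimal place, instead of an exact row count. A minimal pure-Python sketch of that check follows. The helper name `split_fractions` and the counts are hypothetical, chosen only for illustration, and Spark is not needed to run it.

```python
def split_fractions(n_train, n_val, n_test):
    """Return each split's share of the total row count."""
    total = float(n_train + n_val + n_test)
    return n_train / total, n_val / total, n_test / total

# Hypothetical counts for illustration; not taken from any real run.
train_frac, val_frac, test_frac = split_fractions(8010, 1005, 985)

# Round each share to one decimal place, as the proposed assertions do.
rounded = [round(f, 1) for f in (train_frac, val_frac, test_frac)]
print(rounded)
```

A check of this form tolerates small shifts in the exact counts as long as the proportions stay near the requested weights.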
Hi
The Spark version is 1.3.1 in this VM:
/usr/local/bin/spark-1.3.1-bin-hadoop2.6/
I need to update to 1.6.0. How is Spark installed inside the VM, and are there instructions for updating it? Or do you plan to push an update soon?
I do realize that the course VM is a closed environment that is not friendly to change, but from searching Piazza, some students have hit the same obstacle. If I'm incorrect, please close the issue.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/vagrant/Scalable-Machine-Learning/labs-progress/data/cs190/neuro.txt
c.NotebookApp.notebook_dir = '/vagrant'
So maybe the notebook has to use an explicit path under the current user's home directory? Something like:
from os.path import expanduser
home = expanduser("~")
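As a rough sketch of that suggestion, the data file path could be built from the home directory instead of a hard-coded /vagrant prefix. The relative layout used here (`data/cs190/neuro.txt` directly under the home directory) is an assumption for illustration, not the VM's confirmed layout.

```python
import os
from os.path import expanduser

home = expanduser("~")

# Hypothetical location of the lab data under the home directory.
data_path = os.path.join(home, "data", "cs190", "neuro.txt")

print(data_path)
# Checking for the file first avoids a late "Input path does not exist" failure.
print(os.path.exists(data_path))
```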
"which is recommended when they key doesn't change"
I am trying to use your VM, which is similar to the one created here, but I am getting an ssh timeout error. Do you know why this is the case?
Hello
I'm trying to go through the 3rd week lab, but there seems to be a problem with the proportions by which the data is partitioned into train, validation and test sets. I'm using the supplied seed along with the defined weights, and I get a different number of examples within each set. As a result, the tests that follow are bound to fail.
snippet:
weights = [.8, .1, .1]
seed = 42
raw_train_df, raw_validation_df, raw_test_df = raw_df.randomSplit(weights, seed)
n_train = raw_train_df.cache().count()
n_val = raw_validation_df.cache().count()
n_test = raw_test_df.cache().count()
print n_train, n_val, n_test, n_train + n_val + n_test
raw_df.show(1)
output:
80115 9955 9930 100000
+--------------------+
| text|
+--------------------+
|0,1,1,5,0,1382,4,...|
+--------------------+
only showing top 1 row
The same thing happens in lab 2 (linear regression).
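The counts reported above (80115, 9955, 9930) are close to, but not exactly, an 80/10/10 split of 100000 rows. That is what one would expect if each row is assigned to a split by an independent random draw and not by exact slicing. The sketch below is a pure-Python analogy under that assumption, not Spark's implementation, and `random_split_counts` is a hypothetical helper.

```python
import random

def random_split_counts(n_rows, weights, seed):
    """Assign each row to a split by an independent random draw; return counts."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds of each split on the [0, 1) interval.
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    counts = [0] * len(weights)
    for _ in range(n_rows):
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split also catches any floating-point leftover.
            if draw < bound or i == len(bounds) - 1:
                counts[i] += 1
                break
    return counts

weights = [.8, .1, .1]
counts_a = random_split_counts(100000, weights, seed=42)
counts_b = random_split_counts(100000, weights, seed=43)
print(counts_a, sum(counts_a))
print(counts_b, sum(counts_b))
```

Under this model the totals always add up to the input size, while the individual counts only approximate the weights, so a test on exact counts is fragile and a test on rounded proportions is more robust.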