
intel-analytics / bigdl-tutorials


Step-by-step Deep Learning Tutorials on Apache Spark using BigDL

Jupyter Notebook 98.78% Shell 1.02% Python 0.20%

bigdl-tutorials's People

Contributors

ashizhao, chengxuhawkwood, cmusjtuliuyuan, dding3, glorysdj, jason-dai, jenniew, le-zheng, megaspoon, neozhangjianyu, qiuxin2012, sfblackl-intel, wzhongyuan, yangw1234, yiheng, zhichao-li


bigdl-tutorials's Issues

Why there is no bias term?

There is no bias term in linear_regression.ipynb, deep_feed_forward_neural_network.ipynb, cnn.ipynb, and others.
I don't understand why.

For example, in the BigDL tutorials:
model.add(Linear(n_input, n_hidden_1).set_name('mlp_fc1'))
model.add(ReLU())
When these two lines are implemented in TensorFlow or PyTorch, they look like:
W = weight_variable([input_dim, output_dim])
b = bias_variable([output_dim])
logits=tf.nn.relu(tf.matmul(input_placeholder, W) + b)
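As far as I can tell, BigDL's Linear layer carries its own weight and bias parameters internally, so the bias simply isn't spelled out in the Python snippet the way it is in low-level TensorFlow. A minimal pure-Python sketch (not BigDL code) of what Linear followed by ReLU computes for one input vector:

```python
def linear_relu(x, W, b):
    # What a Linear(n_in, n_out) layer followed by ReLU() computes:
    # an affine transform (weights AND bias), then max(0, .).
    z = [sum(xi * wij for xi, wij in zip(x, col)) + bj
         for col, bj in zip(W, b)]
    return [max(0.0, zi) for zi in z]

# Toy 2-in, 2-out example; W[j] is the weight column for output unit j.
W = [[1.0, 0.0], [0.0, -1.0]]
b = [0.5, 0.5]
out = linear_relu([1.0, 2.0], W, b)  # -> [1.5, 0.0]
```

The bias here is part of the layer's own parameters, which is why it never appears explicitly when the model is assembled with model.add(...).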

BigDL version

BigDL 0.1.0 in the intel-analytics/BigDL repo is different from the BigDL bundled with the intel-analytics/BigDL-Tutorials repo.

So when I use the BigDL from the intel-analytics/BigDL repo, start_notebook.sh fails at lines 17-19,
because there is no "bigdl-SPARK_2.1-0.1.0-jar-with-dependencies.jar" in the intel-analytics/BigDL repo.
The reference should be changed to "bigdl-0.1.0-jar-with-dependencies.jar".

Using Word Vectors via the Embedding Layer

I want to use an LSTM network on text data.

I have the data in the following form. A snapshot is:

label            sequence
1.0              0 0 this is an example

How do I prepare the data to be fed to the LSTM Sequential() model so that the input has shape batch_size x seq_length x EMBEDDING_DIM? Is there a way, or an example, of doing this the way Keras does it with an Embedding() layer loaded with embedding_weights (GloVe weights)?
[as done here: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/]
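Independent of the framework, the shaping step the question asks about can be sketched in plain Python: look each (already padded) token up in a pretrained-embedding table and stack the results into a batch_size x seq_length x EMBEDDING_DIM nested list. The dict below is a toy stand-in for GloVe weights, not real vectors.

```python
EMBEDDING_DIM = 3
glove = {                       # toy stand-in for pretrained GloVe vectors
    "this": [0.1, 0.2, 0.3],
    "is": [0.4, 0.5, 0.6],
    "an": [0.7, 0.8, 0.9],
    "example": [1.0, 1.1, 1.2],
}
PAD = [0.0] * EMBEDDING_DIM     # the "0" padding token maps to zeros

def embed_batch(sequences):
    # sequences: list of equal-length token lists (already padded with "0")
    # returns a batch_size x seq_length x EMBEDDING_DIM nested list
    return [[glove.get(tok, PAD) for tok in seq] for seq in sequences]

batch = embed_batch([["0", "0", "this", "is", "an", "example"]])
# batch has shape (1, 6, 3)
```

The resulting nested list is exactly the tensor layout the LSTM expects; the remaining work is converting each embedded sequence into the framework's sample type.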

One line bug in RNN tutorial

In the build_model function, 'n_hidden' on the 4th and 8th lines should be 'hidden_size',
and 'n_input' on the 5th line should be 'input_size'.

Linear initial

(screenshot of training output)
When initializing the Linear layer, the bias term is always initialized to 0. However, judging from the attached output, this does not work well.

Refactor data preprocessing

Sample should be constructed just before optimization, and should not be used as the container during preprocessing.
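The refactor described above could look like this minimal pure-Python sketch (the namedtuple is an illustrative stand-in for BigDL's Sample, which lives on the JVM side): preprocessing operates on plain (features, label) tuples, and Sample construction happens only in the last step before the data reaches the optimizer.

```python
from collections import namedtuple

# Illustrative stand-in for BigDL's Sample type.
Sample = namedtuple("Sample", ["features", "label"])

def preprocess(records):
    # All preprocessing works on plain (features, label) tuples,
    # e.g. scaling pixel values into [0, 1] ...
    return [([x / 255.0 for x in feats], label) for feats, label in records]

def to_samples(records):
    # ... and Sample is constructed only at the very end, just before
    # the data is handed to the optimizer.
    return [Sample(feats, label) for feats, label in records]

raw = [([0.0, 255.0], 1), ([127.5, 0.0], 0)]
samples = to_samples(preprocess(raw))
```

Keeping Sample out of the preprocessing stages makes each stage testable on plain data and confines the framework-specific type to a single conversion point.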

No module named utils when running LSTM example

I've set up the environment and successfully started Jupyter Notebook. However, when I open lstm.ipynb and try to execute the cells, the very first one fails with "No module named utils". I tried installing python-utils, since I thought that was what I lacked, but it didn't help.
Did I miss something in the tutorials, or does it have something to do with the code?
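One possible cause (an assumption, since the exact notebook layout isn't shown here): the notebooks import a local helper module named utils.py from the repository, not the PyPI package python-utils, so that file's directory must be on sys.path before the import runs. A sketch of the workaround:

```python
import os
import sys

# Hypothetical location of the repository's utils.py helper module;
# adjust to wherever the file actually lives in your checkout.
repo_notebooks = os.path.abspath("notebooks")

# Put it at the front of the module search path so "import utils"
# resolves to the repo's helper rather than failing (or resolving to
# an unrelated installed package).
if repo_notebooks not in sys.path:
    sys.path.insert(0, repo_notebooks)
```

Starting Jupyter from the directory that contains utils.py would have the same effect, since the notebook's working directory is on the search path by default.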
(screenshot of the error)

optimizer.optimize() causes java.lang.NullPointerException

I can successfully run everything up to the line below in the https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/bigdl_features/visualization.ipynb notebook.

# Boot training process
trained_model = optimizer.optimize()

This returns the following error. What am I doing wrong?

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-11-6b4b11e68280> in <module>()
     15 
     16 # Boot training process
---> 17 trained_model = optimizer.optimize()
     18 print "Optimization Done."

/miniconda/envs/spark2/lib/python2.7/site-packages/bigdl/optim/optimizer.pyc in optimize(self)
    762         Do an optimization.
    763         """
--> 764         jmodel = callJavaFunc(self.value.optimize)
    765         from bigdl.nn.layer import Layer
    766         return Layer.of(jmodel)

/miniconda/envs/spark2/lib/python2.7/site-packages/bigdl/util/common.pyc in callJavaFunc(func, *args)
    632     gateway = _get_gateway()
    633     args = [_py2java(gateway, a) for a in args]
--> 634     result = func(*args)
    635     return _java2py(gateway, result)
    636 

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    317                 raise Py4JJavaError(
    318                     "An error occurred while calling {0}{1}{2}.\n".
--> 319                     format(target_id, ".", name), value)
    320             else:
    321                 raise Py4JError(

Py4JJavaError: An error occurred while calling o452.optimize.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 9.0 failed 4 times, most recent failure: Lost task 1.3 in stage 9.0 (TID 148, ip-100.100.100.100, executor 1): java.lang.NullPointerException
	at com.intel.analytics.bigdl.models.utils.ModelBroadcastImp.value(ModelBroadcast.scala:150)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:586)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:585)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.Range.foreach(Range.scala:160)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:585)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:568)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$.com$intel$analytics$bigdl$optim$DistriOptimizer$$initThreadModels(DistriOptimizer.scala:625)
	at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:843)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at com.intel.analytics.bigdl.models.utils.ModelBroadcastImp.value(ModelBroadcast.scala:150)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:586)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:585)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.Range.foreach(Range.scala:160)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:585)
	at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:568)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
