intel-analytics / bigdl-tutorials
Step-by-step Deep Learning Tutorials on Apache Spark using BigDL
There is no bias term in linear_regression.ipynb, deep_feed_forward_neural_network.ipynb, cnn.ipynb, and others.
I don't understand why.
For example, in the BigDL tutorials:
model.add(Linear(n_input, n_hidden_1).set_name('mlp_fc1'))
model.add(ReLU())
When these two lines are implemented in TensorFlow or PyTorch, they look like:
W = weight_variable([input_dim, output_dim])
b = bias_variable([output_dim])
logits = tf.nn.relu(tf.matmul(input_placeholder, W) + b)
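The reason no separate bias appears in the tutorials is likely that BigDL's Linear layer creates its bias internally (check the with_bias argument in your BigDL version) rather than exposing it as a variable. The affine-plus-ReLU computation both snippets describe can be sketched in plain NumPy; n_input and n_hidden_1 are placeholder sizes matching the BigDL snippet, not values from the tutorial:

```python
import numpy as np

n_input, n_hidden_1 = 4, 3  # placeholder sizes, not the tutorial's real dims
rng = np.random.default_rng(0)

W = rng.standard_normal((n_input, n_hidden_1))  # weight matrix
b = np.zeros(n_hidden_1)                        # bias vector, owned by the layer

x = rng.standard_normal((2, n_input))           # a batch of 2 inputs
logits = np.maximum(x @ W + b, 0.0)             # Linear + ReLU
print(logits.shape)  # (2, 3)
```

Whether the bias is a separate variable (TensorFlow v1 style) or hidden inside the layer (BigDL, Keras, PyTorch style) is an API choice; the math is the same.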
BigDL 0.1.0 in the intel-analytics/BigDL repo is different from the BigDL provided in the intel-analytics/BigDL-Tutorials repo.
So when I use the BigDL from the intel-analytics/BigDL repo, there are problems in start_notebook.sh lines 17-19,
because there is no "bigdl-SPARK_2.1-0.1.0-jar-with-dependencies.jar" in the intel-analytics/BigDL repo.
It should be changed to "bigdl-0.1.0-jar-with-dependencies.jar".
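The corrected script lines would look roughly like the following; the variable name BIGDL_JAR is illustrative, since the exact wording of lines 17-19 in start_notebook.sh may differ between checkouts:

```shell
# Illustrative only: point the script at the jar name the BigDL repo's
# build actually produces (no SPARK_2.1 infix in that repo's artifact).
BIGDL_JAR=bigdl-0.1.0-jar-with-dependencies.jar
echo "$BIGDL_JAR"
```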
I want to use an LSTM network for text data.
I have the data in the following form. A snapshot is:
label sequence
1.0 0 0 this is an example
How do I prepare the data to be fed to the LSTM Sequential() model so that the input is of shape batch_size x seq_length x EMBEDDING_DIM? Is there a way or an example to do this as it is done in Keras using the Embedding() layer with embedding_weights (GloVe weights) inside it?
[as done here: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/]
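The preprocessing being asked about can be sketched independently of BigDL: map tokens to ids, pad or truncate to a fixed seq_length, then look each id up in the embedding table. Everything below (the vocab, the random table standing in for GloVe weights, the encode helper) is hypothetical, not BigDL or tutorial API:

```python
import numpy as np

EMBEDDING_DIM = 5
SEQ_LENGTH = 4

# Toy stand-ins for a real GloVe vocabulary and weight matrix.
vocab = {"<pad>": 0, "this": 1, "is": 2, "an": 3, "example": 4}
embedding_weights = np.random.default_rng(0).standard_normal(
    (len(vocab), EMBEDDING_DIM))

def encode(sentence, seq_length=SEQ_LENGTH):
    """Token ids, padded/truncated to seq_length, then embedded."""
    ids = [vocab.get(tok, 0) for tok in sentence.split()][:seq_length]
    ids += [0] * (seq_length - len(ids))   # pad short sequences with <pad>
    return embedding_weights[ids]          # (seq_length, EMBEDDING_DIM)

batch = np.stack([encode("this is an example"), encode("this is")])
print(batch.shape)  # (2, 4, 5) == batch_size x seq_length x EMBEDDING_DIM
```

From here each (seq_length, EMBEDDING_DIM) array can be wrapped as a training sample with its label; Keras's Embedding() layer does the same lookup inside the model instead of in preprocessing.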
ImportError: No module named 'matplotlib'
We need to add all dependencies to the requirements.
https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/rnn.py#L39
Change
model.add(Select(2, 28))
to
model.add(Select(2, -1))
This way, we make explicit that we are selecting the last output of the RNN layer.
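In NumPy terms, this Select on a batch x time x feature tensor corresponds to indexing the time axis, which is a sketch of the idea rather than BigDL code:

```python
import numpy as np

# rnn_output: batch x time x feature, as a recurrent layer would produce.
rnn_output = np.arange(2 * 28 * 3).reshape(2, 28, 3)

# BigDL's Select uses 1-based dims, so dim 2 is NumPy axis 1 (time).
# Index -1 picks the last timestep regardless of sequence length,
# whereas a hard-coded 28 silently breaks if seq_length ever changes.
last_step = rnn_output[:, -1, :]
print(last_step.shape)  # (2, 3)
```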
Sample should be constructed just before the optimization and should not be used as the container for preprocessing.
I've set up the environment and successfully started Jupyter Notebook. However, when I opened lstm.ipynb and tried to execute the commands, the very first cell failed with "No module named utils". I tried installing python-utils because I thought that was what I was missing, but it didn't work.
Did I miss something in the tutorials, or does it have something to do with the code?
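If the missing module is the utils.py that ships alongside the tutorial notebooks rather than a PyPI package (an assumption; check your checkout), then starting Jupyter from the notebook's own directory, or putting that directory on sys.path, usually resolves it. The path below is illustrative:

```python
import os
import sys

# Assumption: utils.py sits next to the notebook, e.g. under
# notebooks/neural_networks in the BigDL-Tutorials checkout.
utils_dir = os.path.join("BigDL-Tutorials", "notebooks", "neural_networks")
if utils_dir not in sys.path:
    sys.path.insert(0, utils_dir)

# "import utils" should now resolve if the path above is correct.
```

Installing python-utils from PyPI would not help here, since that is an unrelated package with a different import name.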
I want to achieve parallel SGD in a distributed environment (on a YARN cluster).
I can successfully run everything up to the line below in the https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/bigdl_features/visualization.ipynb notebook.
# Boot training process
trained_model = optimizer.optimize()
This returns the following error. What am I doing wrong?
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-11-6b4b11e68280> in <module>()
15
16 # Boot training process
---> 17 trained_model = optimizer.optimize()
18 print "Optimization Done."
/miniconda/envs/spark2/lib/python2.7/site-packages/bigdl/optim/optimizer.pyc in optimize(self)
762 Do an optimization.
763 """
--> 764 jmodel = callJavaFunc(self.value.optimize)
765 from bigdl.nn.layer import Layer
766 return Layer.of(jmodel)
/miniconda/envs/spark2/lib/python2.7/site-packages/bigdl/util/common.pyc in callJavaFunc(func, *args)
632 gateway = _get_gateway()
633 args = [_py2java(gateway, a) for a in args]
--> 634 result = func(*args)
635 return _java2py(gateway, result)
636
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1134
1135 for temp_arg in temp_args:
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p2832.251268/lib/spark2/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
Py4JJavaError: An error occurred while calling o452.optimize.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 9.0 failed 4 times, most recent failure: Lost task 1.3 in stage 9.0 (TID 148, ip-100.100.100.100, executor 1): java.lang.NullPointerException
at com.intel.analytics.bigdl.models.utils.ModelBroadcastImp.value(ModelBroadcast.scala:150)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:586)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:585)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:585)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:568)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
at com.intel.analytics.bigdl.optim.DistriOptimizer$.com$intel$analytics$bigdl$optim$DistriOptimizer$$initThreadModels(DistriOptimizer.scala:625)
at com.intel.analytics.bigdl.optim.DistriOptimizer.optimize(DistriOptimizer.scala:843)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at com.intel.analytics.bigdl.models.utils.ModelBroadcastImp.value(ModelBroadcast.scala:150)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:586)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12$$anonfun$13.apply(DistriOptimizer.scala:585)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:585)
at com.intel.analytics.bigdl.optim.DistriOptimizer$$anonfun$12.apply(DistriOptimizer.scala:568)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more