intel-analytics / zoo-tutorials
Tutorials for Analytics Zoo
Tried using "sbt assembly" to build SimpleMlp; the jar is placed under the target/scala-2.11/ directory and is named simplemlp-assembly-0.1.0-SNAPSHOT.jar.
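For reference, a minimal sketch of the sbt-assembly setup this implies (the plugin version here is an assumption, not taken from the repo):

```scala
// project/plugins.sbt — enables the `sbt assembly` task that produces the fat jar
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

With this plugin in place, running `sbt assembly` emits the jar under target/scala-2.11/ as described above.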
%env PYSPARK_PYTHON=/usr/bin/python3.5
%env PYSPARK_DRIVER_PYTHON=/usr/bin/python3.5
These environment variables should be removed, as users may not be using Python 3.5, and even if they are, the interpreter may not live at /usr/bin/python3.5 (e.g. in a conda env).
If there are specific cases where these settings are required, it would be better to state them in the README.
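A portable alternative, assuming the notebook is already running under the interpreter the user wants Spark to use, is to derive the path from sys.executable instead of hard-coding it:

```python
import os
import sys

# Point PySpark at whatever interpreter is running this notebook,
# instead of hard-coding /usr/bin/python3.5 (also works inside conda envs).
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

print(os.environ["PYSPARK_PYTHON"])
```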
In the Orca PyTorch notebook 6.2 (#42), we need to collect the training accuracy/loss and validation accuracy/loss in each epoch. However, Orca does not save training/validation accuracy automatically, so I use a loop to train the model (1 epoch per iteration) in order to obtain and save the training accuracy. I did the same with Orca Keras; however, it does not seem to work here with PyTorch.
In the notebook https://github.com/intel-analytics/zoo-tutorials/blob/8b705c9134337c78e93c72a82c5fd0a92ef3c879/orca/pytorch/6.2-problem.ipynb, I do
for i in range(1, num_epochs + 1):
    est.set_tensorboard("./log/", "epoch_" + str(i))
    print("\nfit ", i, "start\n")
    est.fit(data=train_loader, epochs=1, validation_data=val_loader, batch_size=batch_size,
            checkpoint_trigger=EveryEpoch())
    print("\nfit ", i, "end\n")
    print("Get training accuracy: ")
    train_acc_tmp = est.evaluate(data=train_loader, batch_size=batch_size)
    train_acc.append(train_acc_tmp["Top1Accuracy"])
    train_loss_tmp = [_[1] for _ in est.get_train_summary("Loss")]
    train_loss.append(sum(train_loss_tmp) / len(train_loss_tmp))
The result shows that after the first loop iteration, training does not continue. The training process does not even start in loop 2:
fit 2 start
creating: createEveryEpoch
creating: createMaxEpoch
2021-03-08 16:44:48 INFO DistriOptimizer$:818 - caching training rdd ...
2021-03-08 16:45:13 INFO DistriOptimizer$:161 - Count dataset
Warn: jep.JepException: <class 'StopIteration'>
at jep.Jep.exec(Native Method)
at jep.Jep.exec(Jep.java:478)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply$mcV$sp(PythonInterpreter.scala:108)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:107)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:107)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-08 16:45:13 INFO DistriOptimizer$:165 - Count dataset complete. Time elapsed: 0.186302138s
Warn: jep.JepException: <class 'StopIteration'>
at jep.Jep.exec(Native Method)
at jep.Jep.exec(Jep.java:478)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply$mcV$sp(PythonInterpreter.scala:108)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:107)
at com.intel.analytics.zoo.common.PythonInterpreter$$anonfun$1.apply(PythonInterpreter.scala:107)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-08 16:45:13 WARN DistriOptimizer$:167 - If the dataset is built directly from RDD[Minibatch], the data in each minibatch is fixed, and a single minibatch is randomly selected in each partition. If the dataset is transformed from RDD[Sample], each minibatch will be constructed on the fly from random samples, which is better for convergence.
2021-03-08 16:45:13 INFO DistriOptimizer$:173 - config {
computeThresholdbatchSize: 100
maxDropPercentage: 0.0
warmupIterationNum: 200
isLayerwiseScaled: false
dropPercentage: 0.0
}
2021-03-08 16:45:13 INFO DistriOptimizer$:177 - Shuffle data
2021-03-08 16:45:13 INFO DistriOptimizer$:180 - Shuffle data complete. Takes 1.56775E-4s
fit 2 end
I also tried creating both the data_loader instances and the estimator inside the loop, which does make training run in each iteration, but the result looks like the whole training process is "restarting" each time (the training accuracy is the same in every iteration).
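One way to sidestep the loop entirely would be to call est.fit once for all epochs and then split the recorded loss into per-epoch averages afterwards. A minimal sketch of that grouping step, assuming the records returned by est.get_train_summary("Loss") are (iteration, value, ...) tuples (index 1 is the value, as in the loop above) and that the number of iterations per epoch is known:

```python
def per_epoch_loss(records, iters_per_epoch):
    """Group per-iteration loss records into per-epoch averages.

    records: sequence of (iteration, value, ...) tuples, iterations starting at 1.
    iters_per_epoch: number of training iterations in one epoch (an assumption
    the caller must supply, e.g. ceil(num_samples / batch_size)).
    """
    epochs = {}
    for rec in records:
        iteration, value = rec[0], rec[1]
        epoch = (iteration - 1) // iters_per_epoch
        epochs.setdefault(epoch, []).append(value)
    return [sum(vals) / len(vals) for _, vals in sorted(epochs.items())]

# Fake records for illustration: 4 iterations, 2 per epoch.
fake = [(1, 1.0, 0), (2, 0.5, 0), (3, 0.5, 0), (4, 0.25, 0)]
print(per_epoch_loss(fake, 2))  # → [0.75, 0.375]
```

This only reconstructs per-epoch training loss; per-epoch validation accuracy would still need support from the estimator itself.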
For reference, here is a notebook that does not train in a loop and works fine: https://github.com/intel-analytics/zoo-tutorials/blob/8b705c9134337c78e93c72a82c5fd0a92ef3c879/orca/pytorch/6.2-understanding-recurrent-neural-networks.ipynb.