We launched a cluster with your image and ran the "lenet_memory" example successfully.

MemoryData & JAR error about caffeonspark HOT 6 CLOSED

yahoo commented on July 2, 2024

MemoryData & JAR error

from caffeonspark.

Comments (6)

anfeng commented on July 2, 2024

Yes. We only support MemoryData right now.

What's the content of your /data/mnist_memory_autoencoder_solver.prototxt? More specifically, I'd like to know its value for source_class.

mauricio-onoda commented on July 2, 2024

name: "MNISTAutoencoder"
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.0039215684
  }
  source_class: "com.yahoo.ml.caffe.LMDB"
  memory_data_param {
    source: "mnist_train_lmdb/"
    batch_size: 64
    channels: 1
    height: 28
    width: 28
    share_in_parallel: false
  }
}

The same source_class is used in the TEST-phase layers for stage test-on-train and stage test-on-test.

anfeng commented on July 2, 2024

I suspect you are missing some spaces in your command line. Please add a space character before every \ at the end of a line.
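Why the space matters, as a minimal sketch: in a POSIX shell a backslash immediately followed by a newline is deleted, so if the next line is not indented, the last word of one line and the first word of the next fuse into a single argument. The helper show_args and the placeholder arguments a.prototxt and --conf are illustrative only; show_args just prints each argument it receives in brackets.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints every argument it receives in brackets.
show_args() { printf '[%s]' "$@"; echo; }

# No space before the backslash and no indentation on the next line:
# the backslash-newline pair disappears and the two words fuse.
joined=$(show_args --files a.prototxt\
--conf)

# A space before the backslash keeps the words separate.
spaced=$(show_args --files a.prototxt \
--conf)

echo "$joined"   # [--files][a.prototxt--conf]
echo "$spaced"   # [--files][a.prototxt][--conf]
```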

mauricio-onoda commented on July 2, 2024

I found the problem! On the second line of my command there was a space between the file names:

--files mnist_memory_autoencoder.prototxt, mnist_memory_autoencoder_solver.prototxt\

With a space after the comma, the error occurs.
After I removed the space, my command worked.
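The reason, sketched under the assumption of a standard POSIX shell: spark-submit expects --files to receive a single comma-separated value, but a space after the comma makes the shell split the list, so the second file arrives as a separate argument. spark-submit most likely then treats that stray argument as the application resource instead of the CaffeOnSpark JAR, which would fit the JAR error in this issue's title. The example uses the two file names from this thread; show_args is a hypothetical helper that prints each argument in brackets.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints every argument it receives in brackets.
show_args() { printf '[%s]' "$@"; echo; }

# A space after the comma: the shell splits the list into two words,
# so --files only gets the first file (with a trailing comma).
bad=$(show_args --files mnist_memory_autoencoder.prototxt, mnist_memory_autoencoder_solver.prototxt)

# No space: --files receives one comma-separated value.
good=$(show_args --files mnist_memory_autoencoder.prototxt,mnist_memory_autoencoder_solver.prototxt)

echo "$bad"
echo "$good"
```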

However, we got another error:

16/03/30 12:49:12 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-2-118.eu-west-1.compute.internal, partition 0,PROCESS_LOCAL, 2216 bytes)
16/03/30 12:49:12 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-172-31-2-117.eu-west-1.compute.internal, partition 1,PROCESS_LOCAL, 2216 bytes)
16/03/30 12:49:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-2-118.eu-west-1.compute.internal:39514 (size: 2.0 KB, free: 8.9 GB)
16/03/30 12:49:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-2-117.eu-west-1.compute.internal:58422 (size: 2.0 KB, free: 8.9 GB)
16/03/30 12:49:12 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, ip-172-31-2-117.eu-west-1.compute.internal): java.io.FileNotFoundException: /root/CaffeOnSpark/data/mnist_memory_autoencoder.prototxt (No such file or directory)

My question is: must the prototxt files exist on all workers (nodes)? If so, how can I copy these files to the workers?

Thanks again!

mauricio-onoda commented on July 2, 2024

My mistake! I was using a version of mnist_memory_autoencoder_solver.prototxt whose "net" parameter included a full path. After I removed the path, it worked.
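Why removing the path helps: spark-submit's --files places each listed file in the working directory of every executor, under its base name. A solver whose net: entry points at an absolute path such as /root/CaffeOnSpark/data/... therefore fails on workers where that path does not exist, as in the FileNotFoundException above. The function below is a hypothetical pre-submit check, not part of CaffeOnSpark; it reads the first net: line with sed and warns if the value contains a directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of CaffeOnSpark): check that a solver's net:
# entry is a bare file name, since files shipped with --files are available
# to executors under their base name.
check_solver_net() {
  # Extract the quoted value of the first top-level "net:" line.
  net=$(sed -n 's/^[[:space:]]*net:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
  case "$net" in
    "")  echo "no net: entry in $1"; return 2 ;;
    */*) echo "net has a directory: $net -> use \"$(basename "$net")\""; return 1 ;;
    *)   echo "net is a bare file name: $net"; return 0 ;;
  esac
}
```

Usage: check_solver_net mnist_memory_autoencoder_solver.prototxt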

githubier commented on July 2, 2024

I hit the same error (NullPointerException) when training a different network (CaffeNet); see issue #217 for more detail.
I changed the spark-submit command to use the files under ${CAFFE_ON_SPARK}/data/, so it is now:
spark-submit --master ${MASTER_URL} \
    --files ${CAFFE_ON_SPARK}/data/solver.prototxt,${CAFFE_ON_SPARK}/data/train_val.prototxt \
    --conf spark.cores.max=${TOTAL_CORES} \
    --conf spark.task.cpus=${CORES_PER_WORKER} \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -train \
    -features accuracy,loss -label label \
    -conf solver.prototxt \
    -clusterSize ${SPARK_WORKER_INSTANCES} \
    -devices 1 \
    -connection ethernet \
    -model file:${CAFFE_ON_SPARK}/myself_caffenet.model \
    -output file:${CAFFE_ON_SPARK}/myself_result

Both solver.prototxt and train_val.prototxt are in ${CAFFE_ON_SPARK}/data/, and the error is:
17/01/11 20:49:34 ERROR caffe.DataSource$: source_class must be defined for input data layer:Data
Exception in thread "main" java.lang.NullPointerException
at com.yahoo.ml.caffe.CaffeOnSpark.train(CaffeOnSpark.scala:103)
at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:40)
at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/11 20:49:34 INFO spark.SparkContext: Invoking stop() from shutdown hook

Where is my mistake?
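The first log line points at the cause: the input layer reported as Data has no source_class. In the MNIST prototxt earlier in this thread, the MemoryData layer carries source_class: "com.yahoo.ml.caffe.LMDB", whereas a CaffeNet train_val.prototxt taken from stock Caffe typically has no such line. The function below is a rough, hypothetical check (not a CaffeOnSpark tool) that counts layers of type Data or MemoryData and source_class lines in a net prototxt; it matches lines textually and does not parse the prototxt, so treat a mismatch only as a hint.

```shell
#!/bin/sh
# Hypothetical, rough check (not part of CaffeOnSpark): compare the number of
# Data/MemoryData layers with the number of source_class lines in a net prototxt.
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
check_source_class() {
  layers=$(grep -cE '^[[:space:]]*type:[[:space:]]*"(Data|MemoryData)"' "$1" || true)
  sources=$(grep -cE '^[[:space:]]*source_class:' "$1" || true)
  if [ "$sources" -lt "$layers" ]; then
    echo "$1: $layers data layer(s) but only $sources source_class line(s)"
    return 1
  fi
  echo "$1: every data layer appears to have a source_class"
}
```

Usage: check_source_class ${CAFFE_ON_SPARK}/data/train_val.prototxt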
