Comments (2)
It's a YARN problem. YARN scheduled all the workers on the same physical node, so you need to tell YARN to stop doing that.
One trick we used was executor memory. Say each node has 16GB of memory; you can then set the following in your spark-submit.
--conf spark.executor.memory=9g
Now no two workers can land on the same node, because two 9GB executors would need 18GB. The trade-off is that only about 7GB is left on each node (maybe less, due to overhead).
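To sanity-check the arithmetic for your own cluster, here is a rough sketch. The function name is made up for illustration, and it assumes YARN adds a per-executor memory overhead of max(384MB, 10% of executor memory), which I believe is Spark's default on YARN; your cluster may be configured differently. It also assumes the whole 16GB is available to YARN, which is usually not quite true.

```python
def executors_per_node(node_mb, executor_mb,
                       overhead_fraction=0.10, min_overhead_mb=384):
    """Estimate how many executors of a given size fit on one node.

    Assumption: each YARN container is executor memory plus an overhead
    of max(min_overhead_mb, overhead_fraction * executor memory).
    """
    overhead_mb = max(min_overhead_mb, int(executor_mb * overhead_fraction))
    container_mb = executor_mb + overhead_mb
    return node_mb // container_mb


node_mb = 16 * 1024  # 16GB node, as in the example above

# 9g executors: only one fits per node, so workers are spread out.
print(executors_per_node(node_mb, 9 * 1024))

# 4g executors: several fit on one node, so YARN may pack them together.
print(executors_per_node(node_mb, 4 * 1024))
```

Any executor size for which this returns 1 should force one worker per node; pick the smallest such value if you want to leave more memory free.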
from caffeonspark.
It worked for me. Thanks a lot.
R.