Comments (5)
Hi, I think I have the same problem:
I1102 13:22:46.576802 22488 net.cpp:106] Creating Layer w_traindata
I1102 13:22:46.576828 22488 net.cpp:411] w_traindata -> traindata
*** Aborted at 1446466976 (unix time) try "date -d @1446466976" if you are using GNU date ***
PC: @ 0x7f3a3f0fa414 __pthread_cond_wait
*** SIGTERM (@0x2ddc6e00005812) received by PID 22488 (TID 0x7f3a48104a40) from PID 22546; stack trace: ***
@ 0x7f3a46709d40 (unknown)
@ 0x7f3a3f0fa414 __pthread_cond_wait
@ 0x7f3a478e14b3 boost::condition_variable::wait()
@ 0x7f3a478e1770 caffe::BlockingQueue<>::peek()
@ 0x7f3a478118cd caffe::DataLayer<>::DataLayerSetUp()
@ 0x7f3a477f6bc3 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7f3a478b0c55 caffe::Net<>::Init()
@ 0x7f3a478b1d05 caffe::Net<>::Net()
@ 0x7f3a478bff7a caffe::Solver<>::InitTestNets()
@ 0x7f3a478c0abd caffe::Solver<>::Init()
@ 0x7f3a478c0dc9 caffe::Solver<>::Solver()
@ 0x7f3a478da263 caffe::Creator_SGDSolver<>()
@ 0x40f13e caffe::SolverRegistry<>::CreateSolver()
@ 0x407860 train()
@ 0x4056e1 main
@ 0x7f3a466f4ec5 (unknown)
@ 0x405dcd (unknown)
@ 0x0 (unknown)
Terminated
I can't terminate the program from the command line with ^C; the output above appears when I kill the process.
from deepdetect.
Hey, in my experience this only happens on the second call to /train, after the first one has failed. So you can usually still run fine once you've spotted the problem (most of the time it is something in the prototxt file).
Although it looks like the same error, if you are not in this exact case it may well come from elsewhere (e.g. the prototxt file). The thing to keep in mind is that Caffe does not return cleanly on errors, it simply crashes; see BVLC/caffe#2976.
Part of DD's job is to catch and recover from these fatal errors, but that may not always be possible. I still need a bit more time to sort this one out.
@meyerd you can try the above fix, but I cannot guarantee that it works for all Caffe error cases (some may not be recoverable).
@beniz Thanks, I found out what was causing the hang: if you open the same LMDB file twice, the peek on the BlockingQueue seems to hang. This happens, for example, when you don't specify the phase in which a data layer should be included, so the same LMDB is implicitly opened once for TRAIN and once for TEST. In my case I was defining a simple net and, out of laziness, did not want to specify different datasets for train and test. If I copy the file and specify two different LMDBs, everything works.
I don't know whether that is a bug. I suspect it is, because as far as I know LMDB should support parallel reads.
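For anyone who wants to check their own net for this, here is a rough Python sketch that scans a prototxt for data sources that would be opened more than once. It is an assumption-laden helper, not part of Caffe or DeepDetect: it uses regular expressions rather than a real protobuf text parser, it treats a layer without any `phase:` rule as being instantiated for both TRAIN and TEST, and it would misread an `exclude { phase: ... }` rule as an include. The function name and the LMDB path are made up for illustration.

```python
import re
from collections import defaultdict


def find_multiply_opened_sources(prototxt_text):
    """Return {source: [(layer_name, phase), ...]} for sources opened more than once.

    Rough heuristic only: regex-based, not a real prototxt parser.
    """
    openings = defaultdict(list)
    # Split the text at each top-level "layer {" opening.
    for block in re.split(r'(?m)^\s*layer\s*\{', prototxt_text)[1:]:
        source = re.search(r'source:\s*"([^"]+)"', block)
        if not source:
            continue  # not a data layer with a source path
        name = re.search(r'name:\s*"([^"]+)"', block)
        layer_name = name.group(1) if name else "<unnamed>"
        # Assumption: no phase rule means the layer is used in both nets.
        phases = re.findall(r'phase:\s*(TRAIN|TEST)', block)
        for phase in phases or ["TRAIN", "TEST"]:
            openings[source.group(1)].append((layer_name, phase))
    return {src: uses for src, uses in openings.items() if len(uses) > 1}


# Hypothetical net: one data layer, no phase given (the case described above).
SINGLE_LAYER = '''
layer {
  name: "w_traindata"
  type: "Data"
  top: "traindata"
  data_param { source: "/path/to/data_lmdb" backend: LMDB batch_size: 32 }
}
'''

# Hypothetical workaround: separate LMDB copies, one per phase.
TWO_LMDBS = '''
layer {
  name: "w_traindata"
  type: "Data"
  top: "traindata"
  include { phase: TRAIN }
  data_param { source: "/path/to/train_lmdb" backend: LMDB batch_size: 32 }
}
layer {
  name: "w_testdata"
  type: "Data"
  top: "traindata"
  include { phase: TEST }
  data_param { source: "/path/to/test_lmdb" backend: LMDB batch_size: 32 }
}
'''

if __name__ == "__main__":
    print(find_multiply_opened_sources(SINGLE_LAYER))
    print(find_multiply_opened_sources(TWO_LMDBS))
```

The first net is flagged because its single source would be opened for both phases; the second, which mirrors the copy-the-LMDB workaround, is not.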
OK, thanks for the report. I've seen the same thing reported somewhere in the Caffe issues. It could definitely be fixed, though it is cumbersome to train and test from the same samples :)