Comments (5)

meyerd avatar meyerd commented on September 26, 2024

Hi, I think I have the same problem:

I1102 13:22:46.576802 22488 net.cpp:106] Creating Layer w_traindata
I1102 13:22:46.576828 22488 net.cpp:411] w_traindata -> traindata
*** Aborted at 1446466976 (unix time) try "date -d @1446466976" if you are using GNU date ***
PC: @ 0x7f3a3f0fa414 __pthread_cond_wait
*** SIGTERM (@0x2ddc6e00005812) received by PID 22488 (TID 0x7f3a48104a40) from PID 22546; stack trace: ***
@ 0x7f3a46709d40 (unknown)
@ 0x7f3a3f0fa414 __pthread_cond_wait
@ 0x7f3a478e14b3 boost::condition_variable::wait()
@ 0x7f3a478e1770 caffe::BlockingQueue<>::peek()
@ 0x7f3a478118cd caffe::DataLayer<>::DataLayerSetUp()
@ 0x7f3a477f6bc3 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7f3a478b0c55 caffe::Net<>::Init()
@ 0x7f3a478b1d05 caffe::Net<>::Net()
@ 0x7f3a478bff7a caffe::Solver<>::InitTestNets()
@ 0x7f3a478c0abd caffe::Solver<>::Init()
@ 0x7f3a478c0dc9 caffe::Solver<>::Solver()
@ 0x7f3a478da263 caffe::Creator_SGDSolver<>()
@ 0x40f13e caffe::SolverRegistry<>::CreateSolver()
@ 0x407860 train()
@ 0x4056e1 main
@ 0x7f3a466f4ec5 (unknown)
@ 0x405dcd (unknown)
@ 0x0 (unknown)
Terminated

I can't terminate the program from the command line with ^C; the output above is what appears when I kill the process.

from deepdetect.

beniz avatar beniz commented on September 26, 2024

Hey, in my experience this can only happen on the second call to /train, if the first one has failed. So usually you can still run fine once you've spotted the problem (most of the time something in the prototxt file).

Though it looks like the same error, if you are not in this exact case, it may well come from elsewhere (e.g. the prototxt file). The thing to keep in mind is that Caffe does not return nicely on errors; it simply crashes, see BVLC/caffe#2976.

DD's job is in part to catch and recover from these fatal errors, but that may not always be possible. I still need a bit more time to sort this one out.

beniz avatar beniz commented on September 26, 2024

@meyerd you can try the above fix, but I cannot guarantee that it works for all Caffe error cases (some may not be recoverable).

meyerd avatar meyerd commented on September 26, 2024

@beniz Thanks, I found out what was causing the hang: if you open the same LMDB file twice (say for TEST and TRAIN, when you don't specify the phase in which each layer should be included, so it gets opened twice implicitly), then the peek on the BlockingQueue seems to hang. In my case I was just specifying a simple net and, out of laziness, did not specify different datasets for train and test. If I just copy the same file and then specify two different LMDBs, all works well.

I don't know if that is a bug (I suspect so, because as far as I know LMDB should support parallel readers).
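
The copy workaround described above can be sketched roughly as follows. This is only a minimal illustration, assuming the LMDB is stored as a directory (containing `data.mdb` and `lock.mdb`), which is the usual layout but is not confirmed in this thread. The helper name `duplicate_lmdb` and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def duplicate_lmdb(src, dst):
    """Copy an LMDB directory so that TRAIN and TEST each open their own copy.

    Hypothetical helper: assumes `src` is a directory-style LMDB.
    Refuses to overwrite an existing destination.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"LMDB directory not found: {src}")
    if dst.exists():
        raise FileExistsError(f"destination already exists: {dst}")
    # Copy the whole directory tree, including data.mdb and lock.mdb.
    shutil.copytree(src, dst)
    return dst
```

For example, `duplicate_lmdb("train_lmdb", "test_lmdb")` (placeholder paths) would produce a second database that the TEST data layer can use as its `source`, while TRAIN keeps the original.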

beniz avatar beniz commented on September 26, 2024

OK, thanks for the report. I've seen the same report somewhere in the Caffe issues. It definitely could be fixed though it is cumbersome to train & test from the same samples :)
