Comments (5)
This issue refers to the cell at 2.2.3 (c) Apply triple-barrier method.
I found a fix, and the reason that running this cell hangs on my Windows set-up, which is similar to the specs listed above.
- First, the simple fix: on the line cpus = cpu_count() - 1, change it to cpus = 1.
- This invokes the single-thread execution path intended for debugging (snippet 20.8, def processJobs_(jobs)).
- On my laptop this cell runs in a second or two in single-thread execution.
- No hang. Problem solved.
- Reasons for the hang and broader fix are at a couple of levels (based on my setup)
a) cpus = cpu_count() - 1 may not return a value. It didn't when I ran the cell earlier. (A Google search suggests that the behaviour of multiprocessing.cpu_count() is not reliable across all hardware and OS combinations.)
b) getEvents calls mpPandasObj for multiprocessing, and in mpPandasObj the default value of numThreads (if it gets used) is 24, which exceeds the specs above (CPU cores: 8).
- In this Notebook, at def mpPandasObj, the numThreads default could be changed to 1.
- You will see in def mpPandasObj the line "if numThreads==1:out=processJobs_(jobs)", which calls processJobs_ (snippet 20.8, in the following cell) for single-thread processing.
c) I returned in another session after a computer restart and found that cpus = cpu_count() - 1 now returns the correct value of 7 cpus. However, instead of hanging, the Jupyter session console now displays a series of errors, each of which mentions a module in the environment's multiprocessing folder (Anaconda3\lib\multiprocessing\ in my current case).
- This would indicate to me that the multiprocessing code from ch 20 that is included in this Notebook cannot be presumed to be matched to computer specs such as those in the original post above.
- This seems to be confirmed at 20.5, page 309 of the book, where it states "In this section, we will study one such engine, and once you understand the logic, you will be ready to develop your own, including all sorts of customized properties."
- In the meantime, for me, rather than stumble around with the code in ch 20, I will just set the code to run in single-thread mode.
- Debugging
a) I was able to detect this issue without too much trouble by debugging getEvents(): I ran the body of the def one line at a time, with the variables in the state they have reached at 2.2.3 (c).
b) I used the set-up that I have described in the Suggestions to my other issue post at "Bars notebook - possible corrections", particularly the QtConsole and the Variable Inspector, and saving data variables to CSV files for inspection as to what is going on.
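The single-thread fallback described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's mpPandasObj or processJobs_: the names safe_num_threads and run_jobs are hypothetical, and the sketch assumes the jobs are plain arguments to one function. It uses os.cpu_count(), which returns None when the count cannot be determined, and caps the requested worker count at the detected CPU count minus one.

```python
import multiprocessing as mp
import os


def safe_num_threads(requested=None):
    """Hypothetical helper: clamp a requested worker count to 1..(CPUs - 1)."""
    detected = os.cpu_count() or 1      # os.cpu_count() may return None
    limit = max(detected - 1, 1)        # leave one core free, never go below 1
    if requested is None:
        return limit
    return max(1, min(requested, limit))


def run_jobs(func, jobs, num_threads=1):
    """Hypothetical dispatcher: single-process loop when num_threads == 1."""
    if num_threads == 1:
        # Debug path: no worker processes are created at all.
        return [func(job) for job in jobs]
    # Parallel path: func must be importable by the worker processes.
    with mp.Pool(processes=num_threads) as pool:
        return pool.map(func, jobs)


if __name__ == "__main__":
    print(safe_num_threads(24))          # capped by the machine's CPU count
    print(run_jobs(abs, [-1, 2, -3]))    # single-process path
```

With num_threads=1 nothing is pickled and no processes are spawned, which is why this path is the easier one to step through in a debugger.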
I trust that this may assist BlackArbsCEO and others who want to run the code in the Labelling Notebook.
And to BlackArbsCEO, I again thank you greatly for sharing your implementations in the 2 notebooks. They have helped greatly in understanding the book and contribute towards the possibility of applying its work.
from adv_fin_ml_exercises.
Sorry, I don't use Windows because of issues like the ones you're experiencing, in addition to other headaches I have had in the past. I recommend learning how to use Ubuntu. It is much easier to diagnose and fix issues in my opinion.
from adv_fin_ml_exercises.
It definitely also seems like there are problems with the multiprocessing functions in chapter 20. I have actually seen errors like this on both Windows and Linux, using the library one can patch together from the code snippets in the book. Some tasks run fine and give far better utilization of the machine than a single process. Others go into some kind of infinite loop, spawning more and more processes. Maybe check how many processes you have running, @Chetanbuye12.
I crashed a 72-core Linux server using the exact code, giving the mpPandasObj function numThreads=20; just before crashing, more than 500 python processes were running.
If someone has an idea what goes on and how the code in chapter 20 should be changed to fit different architectures, I would be very interested in hearing possible causes.
from adv_fin_ml_exercises.
I looked into this further after continuing to get errors with cpus set to a value other than 1.
ERROR Traceback:
Process SpawnPoolWorker-3:
. . .
AttributeError: Can't get attribute 'expandCall' on <module '__main__' (built-in)>
The posts below indicate that it arises because Jupyter on Windows runs in interactive mode, and Python multiprocessing does not work from an interactive session on Windows: the worker processes are started as fresh interpreters and cannot look up functions such as expandCall that were defined only in the notebook's __main__.
One solution posed is to put the multiprocessing code into a script.py and call the script from the Notebook cell. I haven't implemented this "solution" / workaround. As I wrote above, I have simply set "cpus = 1", which invokes single-thread processing and executes quickly enough with the dataset used in this exercise.
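For illustration only, the script workaround could look something like the sketch below. It has not been tested against the notebook's code; the file name mp_script_demo.py, the function square and the helper run_pool_script are all made up for this example. The idea is that the worker function lives in a real .py file with an if __name__ == "__main__": guard, and the notebook cell only launches that file as a separate Python process.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-alone script: the worker function is defined at module
# level in a real file, so spawned worker processes can import it.
SCRIPT_SOURCE = '''
import multiprocessing as mp


def square(x):
    return x * x


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))
'''


def run_pool_script():
    """Write the script to a temp folder, run it, and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "mp_script_demo.py"
        script.write_text(SCRIPT_SOURCE)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,   # raise if the script exits with an error
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_pool_script())
```

In a notebook, calling run_pool_script() from a cell keeps the Pool out of the interactive session entirely; results come back as text here, so real data would more likely be exchanged through files.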
References:
ipython/ipython#10894
https://stackoverflow.com/questions/48593694/python-multiprocessing-returning-attributeerror-when-following-documentation-cod
https://stackoverflow.com/questions/45719956/python-multiprocessing-attributeerror-cant-get-attribute-abc
Testing environment for multiprocessing
There are scripts at the post below which can be run in your environment to test that multiprocessing is working.
I simply saved the script from the answer post as a .py file, opened it in VS Code, and pressed F5 to run it in Debug mode, though I changed range(300) to 30 to shorten the exercise.
https://stackoverflow.com/questions/48660656/multiprocessing-python-3-6-on-windows-10-not-working
from adv_fin_ml_exercises.
@gahobbsau, the issues I have reported above actually appear in scripts (.py files) with all code after the imports put inside the '__main__' block and the multiprocessing code put in a separate .py file. Putting the executing code in the '__main__' block is needed to make multiprocessing run in a script on a Windows box. I assume that is related to the workaround you mention above for notebooks.
Hence, for me, there is still something more subtle to understand when it comes to the multi processing library that can be put together by the snippets from chapter 20.
from adv_fin_ml_exercises.