Comments (5)
It took 7 hours on an r6i.24xlarge (96 vCPUs and 768 GiB of memory).
Output: the extracted.csv file is 20 GB, with 7,895,800 rows × 28 columns.
Hope that info helps someone.
Thanks.
from tsfresh.
Just tried without the n_jobs parameter, which seems to use 50% of the available CPUs by default. I'm using an r6i.24xlarge at the moment; it comes with 96 vCPUs and 768 GiB of memory.
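For anyone sizing n_jobs by hand, here is a minimal sketch of the "half of the CPUs" default described above. It assumes the default worker count is the number of logical CPUs divided by two, which is what the observation suggests; the helper name `default_worker_count` is hypothetical and is not part of tsfresh.

```python
import os
from typing import Optional


def default_worker_count(cpu_count: Optional[int] = None) -> int:
    """Return half of the logical CPUs, but never less than one.

    Assumption: this mirrors the observed 50% default; it is not tsfresh code.
    """
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, n // 2)


# On the 96-vCPU machine from this comment the assumed default would be:
workers_on_r6i = default_worker_count(96)
print(workers_on_r6i)
```

Passing n_jobs explicitly overrides whatever the default works out to, so it is worth trying the full vCPU count as well.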
I can confirm that tsfresh is not utilising the CPU well.
Most of the time, CPU utilisation stays below 12.5%.
More than 87.5% of the CPU is always idle. Also, as you can see below, I have sufficient memory.
top - 10:13:46 up 31 min, 2 users, load average: 11.62, 13.63, 16.19
Tasks: 813 total, 11 running, 381 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.4 us, 0.0 sy, 0.0 ni, 88.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 78017574+total, 52574860+free, 25019536+used, 4231816 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 52583718+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18604 root 20 0 8117352 5.1g 21312 R 100.0 0.7 14:01.11 python3
18606 root 20 0 7997288 4.9g 21312 R 100.0 0.7 13:18.86 python3
18489 root 20 0 94.6g 91.1g 99204 S 100.0 12.2 24:27.62 python3
18562 root 20 0 7333480 4.3g 21312 R 100.0 0.6 17:16.38 python3
18563 root 20 0 7694440 4.7g 21312 R 100.0 0.6 18:44.78 python3
18565 root 20 0 7058792 4.1g 21312 R 100.0 0.5 15:54.73 python3
18567 root 20 0 7213672 4.2g 21312 R 100.0 0.6 16:42.11 python3
18568 root 20 0 7526248 4.5g 21312 R 100.0 0.6 17:57.02 python3
18569 root 20 0 6890088 3.9g 21312 R 100.0 0.5 15:17.46 python3
18573 root 20 0 6727272 3.7g 21312 R 100.0 0.5 14:29.92 python3
18608 root 20 0 7791208 4.7g 21312 R 100.0 0.6 12:28.75 python3
14 root 20 0 0 0 0 I 0.4 0.0 0:00.32 rcu_sched
18837 ec2-user 20 0 171848 5064 3704 R 0.4 0.0 0:00.81 top
1 root 20 0 191096 5472 3900 S 0.0 0.0 0:01.72 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kb
7 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/0:1-rcu
8 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/u192:0-
10 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_rude_
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_trace
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
15 root rt 0 0 0 0 S 0.0 0.0 0:00.01 migration/0
16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
18 root rt 0 0 0 0 S 0.0 0.0 0:00.24 migration/1
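The figures in the top output above are consistent with each other: 11 python3 processes at 100% each, spread over 96 vCPUs, account for the 11.4% user CPU on the summary line. A small sketch of that arithmetic (the function name is hypothetical):

```python
def machine_cpu_percent(busy_processes: int, vcpus: int,
                        per_process_percent: float = 100.0) -> float:
    """Machine-wide CPU usage when each busy process saturates one vCPU."""
    return busy_processes * per_process_percent / vcpus


busy = 11    # python3 rows showing 100.0 %CPU in the top output
vcpus = 96   # r6i.24xlarge

overall = machine_cpu_percent(busy, vcpus)
print(f"{overall:.1f}% of the machine is busy")

# The 12.5% ceiling mentioned earlier corresponds to this many fully busy vCPUs:
cores_at_ceiling = 12.5 / 100 * vcpus
print(cores_at_ceiling)
```

In other words, roughly a dozen worker processes are doing all the work while the remaining vCPUs sit idle.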
from tsfresh.
This line seems to be the issue:
return_df = data.pivot(result)
https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_extraction/extraction.py#L304
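For readers unfamiliar with this step, a pivot reshapes "long" results (one row per id and feature) into a "wide" table (one row per id, one column per feature). The following is a minimal plain-Python sketch of that idea, assuming the results arrive as (id, feature name, value) tuples; it is not the tsfresh implementation, and `pivot_long_to_wide` and the sample feature names are made up for illustration.

```python
from collections import defaultdict


def pivot_long_to_wide(results):
    """Turn (id, feature_name, value) tuples into {id: {feature_name: value}}."""
    wide = defaultdict(dict)
    # A single loop in a single process: every result passes through here.
    for series_id, feature_name, value in results:
        wide[series_id][feature_name] = value
    return dict(wide)


# Hypothetical long-format results for two time series ids.
results = [
    ("a", "value__mean", 1.5),
    ("a", "value__maximum", 3.0),
    ("b", "value__mean", 2.0),
    ("b", "value__maximum", 4.0),
]

wide = pivot_long_to_wide(results)
for series_id, features in wide.items():
    print(series_id, features)
```

A step like this runs in one process and holds the whole result in memory, so it could plausibly become a bottleneck on very large outputs. Whether that is what happens here has not been confirmed in this thread.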
from tsfresh.
How many features were extracted? I'm facing the same problem: with long time series data (only 3 ids), memory overflows on a 16 GB laptop.
from tsfresh.
Thanks @dsstex for the analysis and the posted numbers (and really sorry for the long delay).
How did you know that pivoting is the issue? Have you tried running without it?
from tsfresh.