Comments (5)

dsstex commented on July 22, 2024

It took 7 hours on an r6i.24xlarge (96 vCPUs, 768 GiB memory).

Output: the extracted.csv file is 20 GB for 7,895,800 rows × 28 columns.
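Those figures give a rough sense of scale. The arithmetic below is a back-of-the-envelope sketch; it assumes "20 GB" means 20 * 10**9 bytes and that the 7 hours is total wall-clock time for the whole extraction, which is a reading of the post rather than a measurement.

```python
# Figures reported in the comment above.
WALL_SECONDS = 7 * 3600      # 7 hours of wall-clock time (assumed total run time)
ROWS = 7_895_800             # output rows in extracted.csv
COLS = 28                    # output columns
CSV_BYTES = 20 * 10**9       # "20 GB", assumed decimal gigabytes


def rows_per_second(rows: int, seconds: float) -> float:
    """Average number of output rows produced per second of wall time."""
    return rows / seconds


def bytes_per_cell(total_bytes: int, rows: int, cols: int) -> float:
    """Average CSV bytes per value, separators and newlines included."""
    return total_bytes / (rows * cols)


print(f"{rows_per_second(ROWS, WALL_SECONDS):.0f} rows/s")
print(f"{bytes_per_cell(CSV_BYTES, ROWS, COLS):.1f} bytes per cell")
```

This prints about 313 rows/s and about 90.5 bytes per cell, so most of the 20 GB is the text encoding of the numbers rather than the number of values.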

Hope this info helps someone.

Thanks.

from tsfresh.

dsstex commented on July 22, 2024

I just tried running without the n_jobs parameter. By default it seems to use only 50% of the available CPUs. I'm using an r6i.24xlarge at the moment, which comes with 96 vCPUs and 768 GiB of memory.
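The 50% observation is consistent with a default worker count of half the logical CPUs. The sketch below computes that number; the assumption that tsfresh's default is exactly `cpu_count() // 2` (tsfresh keeps its defaults in the `tsfresh.defaults` module) should be checked against the installed version before relying on it.

```python
import os
from typing import Optional


def half_cpu_default(cpu_count: Optional[int] = None) -> int:
    """Half the logical CPUs, never fewer than one worker.

    Assumed to mirror tsfresh's default for n_jobs; verify in your version.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpu_count // 2)


print(half_cpu_default(96))  # 48 on a 96-vCPU r6i.24xlarge
```

To request every vCPU instead, the count can be passed explicitly, e.g. `extract_features(df, column_id="id", column_sort="time", n_jobs=os.cpu_count())`, where `df` and the column names are placeholders for your own data.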

I can confirm that tsfresh is not utilising the CPU well.

Most of the time, CPU utilisation stays below 12.5%.

More than 87.5% of the CPU is idle at all times. As the top output below shows, there is also plenty of free memory.

```
top - 10:13:46 up 31 min,  2 users,  load average: 11.62, 13.63, 16.19
Tasks: 813 total,  11 running, 381 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.4 us,  0.0 sy,  0.0 ni, 88.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 78017574+total, 52574860+free, 25019536+used,  4231816 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 52583718+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18604 root      20   0 8117352   5.1g  21312 R 100.0  0.7  14:01.11 python3
18606 root      20   0 7997288   4.9g  21312 R 100.0  0.7  13:18.86 python3
18489 root      20   0   94.6g  91.1g  99204 S 100.0 12.2  24:27.62 python3
18562 root      20   0 7333480   4.3g  21312 R 100.0  0.6  17:16.38 python3
18563 root      20   0 7694440   4.7g  21312 R 100.0  0.6  18:44.78 python3
18565 root      20   0 7058792   4.1g  21312 R 100.0  0.5  15:54.73 python3
18567 root      20   0 7213672   4.2g  21312 R 100.0  0.6  16:42.11 python3
18568 root      20   0 7526248   4.5g  21312 R 100.0  0.6  17:57.02 python3
18569 root      20   0 6890088   3.9g  21312 R 100.0  0.5  15:17.46 python3
18573 root      20   0 6727272   3.7g  21312 R 100.0  0.5  14:29.92 python3
18608 root      20   0 7791208   4.7g  21312 R 100.0  0.6  12:28.75 python3
   14 root      20   0       0      0      0 I   0.4  0.0   0:00.32 rcu_sched
18837 ec2-user  20   0  171848   5064   3704 R   0.4  0.0   0:00.81 top
    1 root      20   0  191096   5472   3900 S   0.0  0.0   0:01.72 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H-kb
    7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/0:1-rcu
    8 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u192:0-
   10 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq
   11 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_tasks_rude_
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_tasks_trace
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0
   15 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration/0
   16 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
   17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
   18 root      rt   0       0      0      0 S   0.0  0.0   0:00.24 migration/1
```
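The idle figure in the `%Cpu(s)` line can be turned into an equivalent number of fully busy vCPUs. The helper below is a hypothetical sketch that pulls the idle (`id`) percentage out of that line with a regular expression and treats everything else as busy. With 88.5% idle on 96 vCPUs it gives about 11 busy cores, which lines up with the eleven python3 processes listed at 100% CPU and a load average of roughly 12, and is well below the 48 workers that half of 96 vCPUs would suggest.

```python
import re


def busy_cores(cpu_line: str, n_cpus: int) -> float:
    """Return how many vCPUs' worth of work a top '%Cpu(s)' line represents.

    Only the idle ('id') share is used: everything not idle counts as busy.
    """
    match = re.search(r"([\d.]+)\s+id\b", cpu_line)
    if match is None:
        raise ValueError("no idle ('id') field in line: " + cpu_line)
    idle_pct = float(match.group(1))
    return n_cpus * (100.0 - idle_pct) / 100.0


cpu_line = "%Cpu(s): 11.4 us,  0.0 sy,  0.0 ni, 88.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st"
print(f"{busy_cores(cpu_line, 96):.1f} of 96 vCPUs busy")  # 11.0 of 96 vCPUs busy
```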

dsstex commented on July 22, 2024

This line seems to be the issue:

```python
return_df = data.pivot(result)
```

https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_extraction/extraction.py#L304
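A pivot of this kind turns the long list of per-series results into a wide table with one row per id and one column per feature. The sketch below is a simplified, hypothetical stand-in written with the standard library only; it assumes `result` is an iterable of `(id, feature_name, value)` triples and is not tsfresh's actual implementation. It illustrates why such a step can dominate: it runs in a single process once the results are collected and keeps every value in memory. The top listing above fits that picture, since one python3 process (PID 18489) holds 91.1g resident while the others hold roughly 4 to 5g each.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# (id, feature name, value): the assumed shape of one extracted result.
Triple = Tuple[Hashable, str, float]


def pivot_triples(
    results: Iterable[Triple],
) -> Tuple[List[str], Dict[Hashable, Dict[str, float]]]:
    """Reshape long (id, feature, value) triples into wide rows.

    Returns the feature names in first-seen order and a mapping
    id -> {feature: value}. Everything is held in memory in one process.
    """
    rows: Dict[Hashable, Dict[str, float]] = defaultdict(dict)
    columns: Dict[str, None] = {}  # dict used as an insertion-ordered set
    for row_id, feature, value in results:
        rows[row_id][feature] = value
        columns.setdefault(feature, None)
    return list(columns), dict(rows)


columns, rows = pivot_triples(
    [("a", "mean", 1.0), ("a", "max", 3.0), ("b", "mean", 2.0)]
)
print(columns)    # ['mean', 'max']
print(rows["b"])  # {'mean': 2.0}
```

Features missing for an id are simply absent here; a DataFrame-based pivot would fill those cells with NaN instead.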

b-y-f commented on July 22, 2024

How many features were extracted? I'm facing the same problem: with long time series data (only 3 ids), memory overflows on my 16 GB laptop.

nils-braun commented on July 22, 2024

Thanks @dsstex for the analysis and the posted numbers (and really sorry for the long delay).
How did you know that pivoting is the issue? Have you tried running without it?
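One way to check which stage is slow, without modifying tsfresh, is to time each stage separately. The sketch below is a generic standard-library stage timer; the `compute` and `pivot` stages in the usage are toy stand-ins, not tsfresh calls. tsfresh's `extract_features` also appears to offer a `profile` option for cProfile-based profiling (check the documentation for your version), which may be easier on the real data.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def stage(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


timings: Dict[str, float] = {}

with stage("compute", timings):
    # Toy stand-in for feature calculation: one (id, feature, value) triple per id.
    results = [(i, "mean", float(i)) for i in range(100_000)]

with stage("pivot", timings):
    # Toy stand-in for the long-to-wide reshape.
    wide: Dict[int, Dict[str, float]] = {}
    for row_id, feature, value in results:
        wide.setdefault(row_id, {})[feature] = value

# Report the slowest stage first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.3f}s")
```

Because the timer measures wall-clock time in the calling process, it also captures time spent waiting on worker processes, which is what matters when comparing a parallel stage with a single-process one.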
