
Too slow. GPU support please (tsfresh, open, 24 comments)

jarlva commented on August 24, 2024
Too slow. GPU support please

from tsfresh.

Comments (24)

dom-white commented on August 24, 2024 (+5)

It may be possible to get a decent speed up without GPU support.

tsfresh uses parallelization by default, and this can cause performance issues: underlying Python modules such as SciPy and scikit-learn also (by default) attempt to distribute load across all processor cores when they drop down into C libraries.

This can lead to severe over-provisioning, where processors spend most of their time context switching rather than doing useful work.

I was recently looking into performance issues with our own Python notebook, which we had implemented multiprocessing on, and noticed that forcing the underlying libraries to remain single-threaded gave a massive speed increase when using the multiprocessing module (https://docs.python.org/3/library/multiprocessing.html).

I then made the same changes for my tsfresh notebook, and feature extraction for each of my time series data files (using efficient parameters) went from around 7 minutes to just 16 seconds!
Admittedly this may be an extreme example, as I was running this on a server with ~100 cores, so your mileage may vary.

To keep the underlying libraries single-threaded you need to do the following exports BEFORE starting the Python environment you are using, otherwise there will be no difference:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
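For reference, the same variables can also be set from inside Python, as long as it happens before any scientific library is imported; a minimal sketch (assuming NumPy/SciPy/tsfresh have not been loaded yet in the process):

```python
import os

# These must be set before NumPy/SciPy/scikit-learn are imported,
# because the underlying BLAS/OpenMP runtimes read them at load time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # imported only after the limits are in place
```

If the libraries have already been imported in the session, setting these variables has no effect; the shell exports above are the more reliable route.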

Here are a couple of links where I found some of this useful info:
https://thomasjpfan.github.io/parallelism-python-libraries-design/
https://docs.dask.org/en/stable/array-best-practices.html?highlight=OMP_NUM_THREADS#avoid-oversubscribing-threads

If this does help people, it may be worth updating part of the documentation to reflect this configuration adjustment.


adhoc-researcher commented on August 24, 2024 (+3)

+1. Have a 3 million row dataframe, a 16 hour wait time, and 4 V100s sitting idle :)


SoCool1345 commented on August 24, 2024 (+2)

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

Adding the above lines (before any other imports) also works on Windows 10.


kempa-liehr commented on August 24, 2024 (+1)

Thanks for the suggestion of using cudf. I will have a look into this package.


nils-braun commented on August 24, 2024 (+1)

@beckernick - from our experience, basically everything with the marker high_comp_cost in https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_extraction/feature_calculators.py performs badly as the size of the time series grows.
If users want to run only the feature calculators with a faster runtime, we recommend using the EfficientFCParameters, which removes those.
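In code, that switch is a one-argument change. A minimal sketch (the column names `id`/`time`/`value` and the helper name are placeholders, not tsfresh API):

```python
import pandas as pd

def extract_efficient(df: pd.DataFrame) -> pd.DataFrame:
    """Run tsfresh with the cheaper feature set (skips high_comp_cost calculators)."""
    # Imported lazily so the sketch stays runnable without tsfresh installed.
    from tsfresh import extract_features
    from tsfresh.feature_extraction import EfficientFCParameters

    return extract_features(
        df,
        column_id="id",
        column_sort="time",
        default_fc_parameters=EfficientFCParameters(),
    )

# tsfresh expects long ("stacked") format: one row per observation.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "time":  [0, 1, 2, 0, 1, 2],
    "value": [1.0, 2.0, 1.5, 3.0, 2.5, 4.0],
})
```

Passing `EfficientFCParameters()` as `default_fc_parameters` replaces the default (comprehensive) feature set for all value columns.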


dom-white commented on August 24, 2024 (+1)

I think it should just be a case of running those three lines in your shell before invoking whatever Python environment you are running tsfresh in.
In my case I am using JupyterLab inside a Docker environment, so I have added the extra environment variables to the docker-compose YAML file that controls it.
You could also use a package like python-dotenv to set environment variables for your Python environment.


dom-white commented on August 24, 2024 (+1)

Hi @nils-braun, I managed to get this going on my second attempt directly within JupyterLab, so I have had a go at updating the documentation and have created a pull request for it.


dom-white commented on August 24, 2024 (+1)

If we can make this a default feature it will be a major upgrade for the package!! Thanks man

Unfortunately, I think the only way of enabling this is for the user to add the environment variables as shown in the documentation pull request I opened. If the environment variables were set within tsfresh, tsfresh itself would have to set them before any other module was imported, so the documentation would still have to tell users to import tsfresh before anything else. I think the way it is documented now may be the best solution.
It may be worth linking to the new section from one of the pages most likely to be read, such as the FAQ page, to make sure it gets seen, e.g. FAQ: "Is there anything I can do to speed up processing?"


arturdaraujo commented on August 24, 2024

@EQU1 GPU would be great, but I recently tested tsfresh on Linux and it was 35x faster. I can't say why this happened, because most of my code runs only 1.2 to 1.3x faster on Linux on average. I used Ubuntu on WSL.


arturdaraujo commented on August 24, 2024

Please take a look at #972


nils-braun commented on August 24, 2024

Thank you all for your input @jarlva, @rushatrai and @arturdaraujo!
Sorry for the delayed (or even missing) responses to your recent requests.

I personally do not have the bandwidth to implement this feature myself, but we welcome any kind of contribution (this is how open source works; we do not need to be the only ones doing the implementation ;-)). If any of you has some experience with cudf (or any other package in this context) and would like to contribute parts of, or a full, implementation, we would be very happy to hear about it and collaborate! A GPU implementation would definitely be very nice to have.


arturdaraujo commented on August 24, 2024

Ideally, I think the first step would be to implement numba or Cython for a speed-up.


beckernick commented on August 24, 2024

Would someone be able to share a reproducible example of the code they're running that they'd like to be able to run faster (with GPUs or otherwise)? My recollection is that a few of the operations took up most of the time when I've used tsfresh in the past, but I don't know if my experience was representative.

It would be great to document examples that illustrate the bottlenecks.


aurora5161 commented on August 24, 2024

@beckernick - from our experience, basically everything with the marker high_comp_cost in https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_extraction/feature_calculators.py has a bad performance behavior if the size of the timeseries starts to grow. If users want to perform only those feature calculators with a faster runtime, we recommend using the EfficientFCParameters, which removes those.

Hi, sorry to disturb. I recently used tsfresh on time series data. Because my data is large (more than 10 million rows), it cannot run on my computer. I tried to modify the code to run on GPU, but it failed with errors such as "cudf.core.series can not be used in numpy of fft function". So I have a question: can we use Spark RAPIDS, which leverages GPUs, to accelerate tsfresh without modifying the code, given that tsfresh can run on Spark?


beckernick commented on August 24, 2024

The challenge with using GPUs here is that much of the work is happening in the user-defined functions (UDFs) mentioned above that are applied on the DataFrame Groupby objects. And these specific UDFs happen to be ones that can't be translated to run on the GPU "as is". Using Spark RAPIDS or cuDF would allow you to accelerate the dataframe operations, but even if you could smoothly pass the GPU dataframes around inside tsfresh you'd still be bottlenecked on the UDFs running on your CPU(s).

It may be possible to write the computationally expensive UDFs to use the GPU and get a speedup, but it would likely require a rewrite of the functions from first principles.
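Schematically, the pattern described here looks like the following (an illustrative sketch, not tsfresh's actual code): each feature calculator is a plain Python/NumPy function applied per group, so it runs on the CPU regardless of where the dataframe itself lives.

```python
import numpy as np
import pandas as pd

def mean_abs_change(x: np.ndarray) -> float:
    # A typical feature-calculator-style UDF: pure NumPy, executed on the CPU.
    return float(np.mean(np.abs(np.diff(x))))

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "value": [1.0, 3.0, 2.0, 0.0, 4.0, 4.0],
})

# The grouping itself could be GPU-accelerated (e.g. by cuDF), but this
# .apply() hands each group back to a Python function - the bottleneck.
features = df.groupby("id")["value"].apply(lambda s: mean_abs_change(s.to_numpy()))
```

Swapping pandas for a GPU dataframe library only accelerates the groupby machinery, not the body of `mean_abs_change`, which is why the UDFs would need a ground-up rewrite to benefit.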


arturdaraujo commented on August 24, 2024

GPU would be complicated to implement, guys. The next step here would be to implement numba; a 10x to 20x speed-up is a significant change...

I have already implemented my own version of tsfresh using numba for a minimal set of functions. Numba loves loops, so I imagine it could even be above a 20x speed-up.
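As a concrete illustration of the numba approach, here is a sketch of one loop-heavy calculator in the style of tsfresh's absolute_sum_of_changes (written from scratch for this example, not taken from tsfresh; the fallback keeps it runnable when numba is not installed):

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the loop to machine code
except ImportError:
    def njit(func):  # no-op fallback so the sketch runs without numba
        return func

@njit
def absolute_sum_of_changes(x):
    # Explicit loop: slow in pure Python, fast once numba compiles it.
    total = 0.0
    for i in range(len(x) - 1):
        total += abs(x[i + 1] - x[i])
    return total

result = absolute_sum_of_changes(np.array([1.0, 3.0, 2.0, 5.0]))
```

The first call pays a one-off compilation cost; subsequent calls run at roughly C speed, which is where the claimed 10-20x on loop-dominated calculators would come from.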


arturdaraujo commented on August 24, 2024

Can you show more code of how that works? Like a full script on applying this


dom-white commented on August 24, 2024

Just to clarify, you don't need an uber server to take advantage of this. Even on my laptop, which runs tsfresh inside a Linux virtual machine with access to only half my CPU cores (it is a Core i7 machine), stopping the over-provisioning still gave a 6.5x improvement:

original:

Feature Extraction: 100%|██████████| 40/40 [08:40<00:00, 13.02s/it]

with the libraries forced to single core:

Feature Extraction: 100%|██████████| 40/40 [01:20<00:00,  2.02s/it]


nils-braun commented on August 24, 2024

Hi @dom-white - very good! What you describe makes a lot of sense 👍. It might be even worse in tsfresh compared to other use-cases because we call so many different C functions (because there are many feature extractors) and therefore have the context switching even more often (?).
Would you like to add this to the documentation? I definitely think this is worth mentioning. Or do you think it might even make sense to set this by default (only for multiprocessing, because I assume this makes it slower in single processing)?


dom-white commented on August 24, 2024

Hi @nils-braun, yes I think it would be helpful to add some information on this to the documentation.
It is a bit tricky, as people run tsfresh under different environments and OSs, and I have only got this to work by setting the environment variables before launching the Python environment. So before adding it, I will look into the simplest, most universal way of setting these environment variables.


arturdaraujo commented on August 24, 2024

If we can make this a default feature it will be a major upgrade for the package!! Thanks man


YamaByte commented on August 24, 2024

I tried @dom-white's method and it did indeed speed tsfresh up quite drastically (21M rows, 3 features)!

However, do take note for those who wish to use this workaround: remember to revert the environment variables (those 3 that were listed) once you're done extracting if you intend to do some machine learning.

Immediately after my tsfresh feature extraction step (within the same kernel session), I was grid searching through XGBoost classifier hyper-parameters on my GPU with the extracted features. However, the training run time was significantly slower than I expected. Upon inspecting my GPU utilization, I noticed that it was oscillating between 0 to 100% utilization at ~30s intervals when it should be constantly at 100% utilization until the grid search algorithm ends.

Turns out, it was the CPU causing the bottleneck, seen from an extremely low CPU utilization (<10%). I do know that many GPU-enabled machine learning algorithms fall back on CPU for certain intermediary computations (e.g. loss calculation for Keras neural networks), and likely XGBoost is doing the same somewhere along its pipeline. As such, the thread limitations proposed significantly affected these operations.

After saving my tsfresh features to disk, restarting my Python environment without those limitations, and rerunning the machine learning pipeline, my training times were as fast as before: from ~36s per model to ~4s (significant if you are brute-force building and grid searching through 1350 models).

Hope this helps anyone out there who wants to speed up both feature extraction and model training!
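For what it's worth, the variables can be removed in-process like this, though, as noted above, libraries that already read them at import time will not pick up the change, so a fresh interpreter/kernel remains the reliable route:

```python
import os

# Undo the single-thread limits set earlier in the session.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ.pop(var, None)  # safe even if the variable was never set
```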


dom-white commented on August 24, 2024

That's a good point. I had a notebook purely for tsfresh feature extraction, so I did not encounter this issue. Thanks for highlighting this possible negative effect.


nils-braun commented on August 24, 2024

Thanks for sharing with the community @YamaByte! Would you think it makes sense to add one sentence about this issue into the respective docs that @dom-white added? Happy to review your PR :)

