
error when training with multiple GPUs: AttributeError: Can't pickle local object 'ExtractiveSummarizer.prepare_data.<locals>.longformer_modifier' (transformersum, 7 comments, closed)

hhousen commented on May 26, 2024
error when training with multiple GPUs: AttributeError: Can't pickle local object 'ExtractiveSummarizer.prepare_data.<locals>.longformer_modifier'

from transformersum.
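
For context, this failure is reproducible outside the library: multi-GPU training spawns worker processes, which pickle the objects they need, and Python's pickle cannot serialize a function defined inside another function. A minimal sketch (not the library's actual code):

```python
import pickle

def prepare_data():
    # A nested ("local") function, like the longformer_modifier closure in
    # the traceback. pickle serializes functions by reference to a
    # module-level name, which a local object does not have.
    def longformer_modifier(batch):
        return batch

    return longformer_modifier

fn = prepare_data()
try:
    pickle.dumps(fn)
except (pickle.PicklingError, AttributeError) as exc:
    # Message is roughly: Can't pickle local object
    # 'prepare_data.<locals>.longformer_modifier' (exact wording varies
    # slightly by Python version).
    print(exc)
```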

Comments (7)

HHousen commented on May 26, 2024

It's good to hear that you've started training! I've seen this type of error before during abstractive summarization and I should be able to fix it relatively quickly. I'll have a fix in the next few days.
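
The usual fix for this class of error (whether the library's eventual fix takes exactly this shape is an assumption) is to hoist the nested function to module level and bind any extra state with functools.partial instead of capturing it in a closure, since module-level functions, and partials of them, pickle cleanly:

```python
import pickle
from functools import partial

def longformer_modifier(batch, attention_window=512):
    # Module-level, so pickle can find it by name. attention_window stands
    # in for whatever state the closure used to capture (hypothetical here).
    batch["attention_window"] = attention_window
    return batch

# partial(...) pickles as long as the wrapped function and bound args do.
modifier = partial(longformer_modifier, attention_window=1024)
restored = pickle.loads(pickle.dumps(modifier))  # round-trips without error
print(restored({"ids": [1, 2]}))  # {'ids': [1, 2], 'attention_window': 1024}
```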


moyid commented on May 26, 2024

@HHousen - it's training!


moyid commented on May 26, 2024

sounds good, I'll try that.


HHousen commented on May 26, 2024

@moyid I believe that commit 597bc9d should fix the issue. Let me know if it works now.


moyid commented on May 26, 2024

@HHousen - sorry, one more issue: I think my performance is slow because of num_workers. I tried setting it via --dataloader_num_workers based on the documentation, but got the error main.py: error: unrecognized arguments: --dataloader_num_workers 1


HHousen commented on May 26, 2024

@moyid The --dataloader_num_workers argument works only for abstractive summarization. The reason you cannot change this option for extractive summarization is that the DataLoaders are created from torch.utils.data.IterableDatasets, and IterableDatasets replicate the same dataset object on each worker process. Thus, the replicas must be configured differently to avoid duplicated data. See the PyTorch documentation's description of iterable-style datasets and the IterableDataset docstring for more information.
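
The splitting pattern from the IterableDataset docstring can be sketched as follows (the `worker_slice` helper name is mine, but the arithmetic matches the docstring's example): each replica asks `get_worker_info()` which worker it is and yields only a disjoint slice of the data.

```python
import math

import torch
from torch.utils.data import IterableDataset, get_worker_info

def worker_slice(start, end, worker_id, num_workers):
    # Carve [start, end) into contiguous, disjoint per-worker ranges.
    per_worker = int(math.ceil((end - start) / num_workers))
    lo = start + worker_id * per_worker
    return range(lo, min(lo + per_worker, end))

class RangeDataset(IterableDataset):
    def __init__(self, start, end):
        super().__init__()
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process data loading: yield everything
            return iter(range(self.start, self.end))
        # In a worker process: yield only this worker's slice, so the
        # replicated dataset objects do not emit duplicated data.
        return iter(worker_slice(self.start, self.end, info.id, info.num_workers))

# The two workers' slices tile the full range with no overlap:
parts = [list(worker_slice(0, 10, w, 2)) for w in range(2)]
print(parts)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Without the `get_worker_info()` branch, each worker would iterate the full range and a DataLoader with num_workers=2 would yield every element twice.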

The docstring gives two examples of how to split an IterableDataset workload across all workers. However, I have not implemented this in the library. Ideally, I would simply use a normal Dataset, but I am not certain how to do that properly, since the entire dataset cannot be loaded into memory at once. I could potentially use Apache Arrow.

I was looking at how the huggingface/transformers seq2seq example deals with this problem. It uses the Dataset class instead of IterableDataset by means of the built-in Python linecache module, which I had not heard of before. Implementing this will require a significant refactoring of the library's extractive data-loading code, so I have opened a new issue for it (#27). The linecache module looks promising.
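
That approach can be sketched roughly like this (`LineByLineDataset` is a hypothetical name, and in practice it would subclass `torch.utils.data.Dataset`): `linecache` fetches single lines on demand and caches them, so random access by index works without loading the whole file up front.

```python
import linecache
import os
import tempfile

class LineByLineDataset:
    """One example per line of a text file; reads lazily via linecache."""

    def __init__(self, path):
        self.path = path
        # One streaming pass to learn the length; no lines are kept here.
        with open(path) as f:
            self.num_lines = sum(1 for _ in f)

    def __len__(self):
        return self.num_lines

    def __getitem__(self, idx):
        # linecache is 1-indexed and caches the file's lines on first access.
        return linecache.getline(self.path, idx + 1).rstrip("\n")

# Demo on a small temporary file:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("sent one\nsent two\nsent three\n")
ds = LineByLineDataset(f.name)
print(len(ds), ds[1])  # 3 sent two
os.remove(f.name)
```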


HHousen commented on May 26, 2024

Also @moyid, to determine whether the number of workers is actually the problem, you can train with the --profiler argument, which outputs how long certain functions took to run once training is complete. To speed up training, you can also train on only a fraction of the dataset using the --overfit_pct argument (for example, --overfit_pct 0.001 for 0.1% of the data). You can find more info in the pytorch-lightning profiler documentation.

