
contiguous_pytorch_params's People

Contributors

philjd


contiguous_pytorch_params's Issues

Made a PyPI release of your repo

Hey there @PhilJd,

Terrific work with this neat trick 👌
I've been using it for a while and it's really helpful. As I have to make a release of another project that depends on your package, a non-direct-URL dependency is required, so I made a PyPI release of your work here: https://pypi.org/project/contiguous-params/1.0.0/

I only updated the classifiers and the requirements, but the rest is identical to your current master branch!

I figured I'd let you know :)

How to use it with Apex

When I try using contiguous params with Apex O2 mode, the loss becomes NaN.
Here is my code:

import torch
from apex import amp
from contiguous_params import ContiguousParams

parameters = ContiguousParams(network.parameters())
optimizer = torch.optim.SGD(parameters.contiguous(), lr=1e-3)  # lr omitted in the original snippet; SGD requires it
network, optimizer = amp.initialize(network, optimizer, opt_level='O2')
network = torch.nn.parallel.DistributedDataParallel(network, device_ids=device_ids)
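
For reference, a small diagnostic sketch (not part of the original issue): amp.initialize with O2 casts the model's parameters to fp16, which may leave the contiguous buffer created above pointing at stale tensors. The buffer check used in benchmark.py can flag this:

# Diagnostic sketch: run after amp.initialize to see whether the contiguous
# buffer still backs the (now fp16) model parameters. If O2 replaced the
# parameter tensors, this check should flag the buffer as invalid.
parameters.assert_buffer_is_valid()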

The time reduction is not obvious

Hi, thank you for this useful work. I changed the model in benchmark.py and set the batch size to 64.

device = "cuda"
# model = nn.Sequential(*[nn.Linear(128, 128) for i in range(100)]).to(device)
model = LResNet18E().to(device)
print("Number of parameters: ", sum(p.numel() for p in model.parameters()))
x = torch.randn(64, 3, 224, 224).to(device)
y = torch.ones(64).to(device)
y = y.long()
model_copies = [deepcopy(model) for _ in range(2)]
# Benchmark original.
parameters = list(model_copies[0].parameters())
optimizer = torch.optim.SGD(parameters, lr=1e-3)
benchmark_model(model_copies[0], optimizer, parameters, "original_params")
# Benchmark contiguous.
parameters = ContiguousParams(model_copies[1].parameters())
optimizer = torch.optim.SGD(parameters.contiguous(), lr=1e-3)
benchmark_model(model_copies[1], optimizer, parameters.contiguous(),
                "contiguous_params")
# Ensure the parameter buffers are still valid.
parameters.assert_buffer_is_valid()

The printed results are disappointing; in each block below, the first two lines are original_params and the last two are contiguous_params:
Number of parameters: 11055816
Mean step time: 2.763813018798828 seconds. (Autograd profiler enabled: False)
Mean step time: 2.8434643745422363 seconds. (Autograd profiler enabled: True)
Mean step time: 2.057171106338501 seconds. (Autograd profiler enabled: False)
Mean step time: 2.271756172180176 seconds. (Autograd profiler enabled: True)

With batch size 128:
Number of parameters: 11055816
Mean step time: 4.793098592758179 seconds. (Autograd profiler enabled: False)
Mean step time: 4.904996871948242 seconds. (Autograd profiler enabled: True)
Mean step time: 4.080202102661133 seconds. (Autograd profiler enabled: False)
Mean step time: 4.198964834213257 seconds. (Autograd profiler enabled: True)

What's wrong with my code? Thanks for your answer.

How to handle the parameter groups when defining the optimizer?

Sometimes we define the optimizer using dicts of parameter groups instead of passing parameters() directly. How should this case be handled? For example:
train_params = [{'params': self.net.get_train_params(), 'lr': cfg.lr}]
self.optimizer = torch.optim.Adam(train_params, lr=cfg.lr, betas=(0.5, 0.999))

Very much appreciate it.
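
One possible approach (a sketch, not something documented in this repo): give each parameter group its own ContiguousParams instance and build the optimizer dicts from the contiguous views. self.net, get_train_params(), and cfg.lr are taken from the snippet above.

import torch
from contiguous_params import ContiguousParams

# One ContiguousParams instance per parameter group, so every group keeps
# its own contiguous buffer (and its own hyperparameters in the dict).
group = ContiguousParams(self.net.get_train_params())
train_params = [{'params': group.contiguous(), 'lr': cfg.lr}]
self.optimizer = torch.optim.Adam(train_params, lr=cfg.lr, betas=(0.5, 0.999))

# Keep each ContiguousParams instance around so the buffers can be checked
# during training, e.g. group.assert_buffer_is_valid().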

Training with DDP is slower

When I just add params.py to my code, it runs slower on a Titan XP. I found that if I define the optimizer after DDP, the step time stays almost the same, but when I follow the README and define the optimizer before DDP, it runs slower. What's wrong with my code? Thanks for your answer.
Attachment: pp_imagenet.txt
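
For reference, a minimal sketch of the two orderings described above (SGD, the lr value, and the function names are placeholders; the real training setup is in the attached script):

import torch
from contiguous_params import ContiguousParams


def optimizer_before_ddp(model, device_ids, lr=1e-3):
    """Ordering the poster reports from the README (observed to run slower)."""
    parameters = ContiguousParams(model.parameters())
    optimizer = torch.optim.SGD(parameters.contiguous(), lr=lr)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=device_ids)
    return model, optimizer, parameters


def optimizer_after_ddp(model, device_ids, lr=1e-3):
    """Ordering the poster reports keeps step time roughly unchanged."""
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=device_ids)
    parameters = ContiguousParams(model.parameters())
    optimizer = torch.optim.SGD(parameters.contiguous(), lr=lr)
    return model, optimizer, parameters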
