philjd / contiguous_pytorch_params

Accelerate training by storing parameters in one contiguous chunk of memory.
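The core idea can be sketched outside PyTorch with a few lines of NumPy (an illustrative analogy, not the package's implementation): copy every parameter into one flat buffer and keep only views into it, so a single vectorized operation can update all parameters at once instead of launching one small kernel per tensor.

```python
import numpy as np

# Illustrative sketch (not the package's code): store all "parameters"
# in one contiguous buffer and keep reshaped views into it.
params = [np.random.randn(3, 4), np.random.randn(5)]
buffer = np.empty(sum(p.size for p in params))

views, offset = [], 0
for p in params:
    flat = buffer[offset:offset + p.size]
    flat[:] = p.ravel()                   # copy data into the shared buffer
    views.append(flat.reshape(p.shape))   # view aliasing the buffer
    offset += p.size

# One vectorized in-place update now reaches every parameter view,
# instead of one small update per tensor.
buffer -= 0.1
```

Because every view aliases the same buffer, gradient and optimizer work can be expressed as a handful of large operations on the flat arrays.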
Hey there @PhilJd,
Terrific work with this neat trick!
I've been using it for a while and it's really helpful. I have to make a release on another project that depends on your package, and that requires a non-direct-URL dependency, so I made a PyPI release of your work here: https://pypi.org/project/contiguous-params/1.0.0/
I only updated the classifiers and the requirements; the rest is identical to your current master branch.
I figured I'd let you know :)
When I try using contiguous params with apex O2 mode, the loss becomes NaN.
Here is my code:
parameters = ContiguousParams(network.parameters())
optimizer = torch.optim.SGD(parameters.contiguous(), lr=1e-3)  # lr is required by SGD; value assumed here
network, optimizer = amp.initialize(network, optimizer, opt_level='O2')
network = torch.nn.parallel.DistributedDataParallel(network, device_ids=device_ids)
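One hedged guess about the NaN, sketched below with NumPy since apex itself isn't needed to show the mechanism: O2 mode replaces the model's parameters with fp16 casts and keeps fp32 master copies, and any parameter that stops aliasing the shared contiguous buffer silently stops receiving the buffer's updates.

```python
import numpy as np

# Sketch of why rewrapping parameters (as apex O2 does when creating
# fp16/master copies) can break the contiguous-buffer scheme: once a
# parameter no longer aliases the shared buffer, updates applied to
# the buffer never reach it.
buffer = np.zeros(4)
param = buffer[0:4].reshape(2, 2)     # view into the shared buffer
assert np.shares_memory(param, buffer)

param = param.astype(np.float16)      # a copy, like amp's casted params
assert not np.shares_memory(param, buffer)

buffer += 1.0                         # "optimizer step" on the buffer
# param is now stale: it still holds zeros.
```

If that is the cause, validating with parameters.assert_buffer_is_valid() right after amp.initialize should surface the mismatch.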
Hi, thank you for your great work. I changed the model in benchmark.py and set the batch size to 64.
device = "cuda"
# model = nn.Sequential(*[nn.Linear(128, 128) for i in range(100)]).to(device)
model = LResNet18E().to(device)
print("Number of parameters: ", sum(p.numel() for p in model.parameters()))
x = torch.randn(64, 3, 224, 224).to(device)
y = torch.ones(64).to(device)
y = y.long()
model_copies = [deepcopy(model) for _ in range(2)]
# Benchmark original.
parameters = list(model_copies[0].parameters())
optimizer = torch.optim.SGD(parameters, lr=1e-3)
benchmark_model(model_copies[0], optimizer, parameters, "original_params")
# Benchmark contiguous.
parameters = ContiguousParams(model_copies[1].parameters())
optimizer = torch.optim.SGD(parameters.contiguous(), lr=1e-3)
benchmark_model(model_copies[1], optimizer, parameters.contiguous(),
                "contiguous_params")
# Ensure the parameter buffers are still valid.
parameters.assert_buffer_is_valid()
The printed results are disappointing:
Number of parameters: 11055816
original_params:
Mean step time: 2.763813018798828 seconds. (Autograd profiler enabled: False)
Mean step time: 2.8434643745422363 seconds. (Autograd profiler enabled: True)
contiguous_params:
Mean step time: 2.057171106338501 seconds. (Autograd profiler enabled: False)
Mean step time: 2.271756172180176 seconds. (Autograd profiler enabled: True)
When the batch size is 128:
Number of parameters: 11055816
original_params:
Mean step time: 4.793098592758179 seconds. (Autograd profiler enabled: False)
Mean step time: 4.904996871948242 seconds. (Autograd profiler enabled: True)
contiguous_params:
Mean step time: 4.080202102661133 seconds. (Autograd profiler enabled: False)
Mean step time: 4.198964834213257 seconds. (Autograd profiler enabled: True)
What's wrong with my code? Thanks for your answer.
TypeError: 'ContiguousParams' object is not iterable
Sometimes we define the optimizer using dicts for parameter groups instead of directly passing parameters(). I wonder how to handle this case? For example,
train_params = [{'params': self.net.get_train_params(), 'lr': cfg.lr}]
self.optimizer = torch.optim.Adam(train_params, lr=cfg.lr, betas=(0.5, 0.999))
Very much appreciate it.
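One plausible pattern for parameter groups (an assumption, not documented package usage): give each group its own contiguous buffer and build the optimizer dicts from the flattened views. The NumPy sketch below illustrates the per-group flattening; with the package one would presumably create one ContiguousParams instance per group.

```python
import numpy as np

# Hypothetical per-group flattening (names are illustrative):
# each parameter group gets its own contiguous buffer plus views.
def flatten_group(params):
    buf = np.empty(sum(p.size for p in params))
    views, offset = [], 0
    for p in params:
        buf[offset:offset + p.size] = p.ravel()
        views.append(buf[offset:offset + p.size].reshape(p.shape))
        offset += p.size
    return buf, views

groups = {"backbone": [np.ones((2, 2)), np.ones(3)],
          "head": [np.zeros(4)]}
flat = {name: flatten_group(ps) for name, ps in groups.items()}

# Analogous optimizer construction (assumed, untested):
# train_params = [{'params': ContiguousParams(g).contiguous(), 'lr': lr}
#                 for g, lr in zip(param_group_lists, learning_rates)]
```

This keeps the per-group learning rates intact while still reducing each group to a single flat buffer.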
When I just add params.py to my code, it runs slower on a Titan XP. I found that when I define the optimizer after DDP, the time stays almost the same, but when I follow the README and define the optimizer before DDP, it runs slower. What's wrong with my code? Thanks for your answer.
pp_imagenet.txt