mlfoundations / open-diffusion
Simple large-scale training of stable diffusion with multi-node support.
Would be great to have (optional) model evaluation.
Possibilities:
It would be nice to be able to run without an internet connection, as it is usually blocked on supercomputers.
Two things I observed that need an internet connection:
wandb.init, so that could be an option in the yaml config file (by default would be False)
Loading the Hugging Face models, which works offline with export TRANSFORMERS_CACHE=<cache_folder> and export TRANSFORMERS_OFFLINE=1, so this just needs to be documented.
In my case, even after using those, I still got the following:
[1] 2023-04-23 06:19:21 WARNING 'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /CompVis/stable-diffusion-v1-4/resolve/main/scheduler/scheduler_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x14806d3bfbe0>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/scheduler/scheduler_config.json
This happens at each validation/generation step. It had no consequence (the job did not hang or anything), but I did not find a way to get rid of the message.
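For illustration, the offline setup described above could be collected in one small helper. This is only a sketch under assumptions: offline_env and apply_offline_env are hypothetical names, not part of the repo. WANDB_MODE and HF_HUB_OFFLINE are environment variables read by wandb and huggingface_hub respectively, but whether HF_HUB_OFFLINE actually silences the HEAD-request warning shown above has not been verified here.

```python
import os


def offline_env(cache_dir, use_wandb=False):
    """Build environment variables for a run without internet access.

    Hypothetical helper, not part of open-diffusion.
    """
    env = {
        # The two variables mentioned in the issue.
        "TRANSFORMERS_CACHE": str(cache_dir),
        "TRANSFORMERS_OFFLINE": "1",
        # Assumption: the huggingface_hub offline switch may also stop the
        # HEAD request to huggingface.co; untested against this repo.
        "HF_HUB_OFFLINE": "1",
    }
    # "offline" keeps logging to local files for a later sync,
    # "disabled" turns wandb calls into no-ops.
    env["WANDB_MODE"] = "offline" if use_wandb else "disabled"
    return env


def apply_offline_env(cache_dir, use_wandb=False):
    """Export the variables into the current process."""
    os.environ.update(offline_env(cache_dir, use_wandb))
```

The variables would need to be set before wandb, transformers or diffusers are imported, for example at the very top of the training script or in the job submission script.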
I might have missed it in the code, but I can't see whether we randomly drop the captions for classifier-free guidance (which is already used at inference).
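To make the question concrete, caption dropping for classifier-free guidance is usually done by replacing a caption with the empty string with some small probability before tokenization. The sketch below is an assumption about how it could look, not the repo's code: drop_captions is a hypothetical name and the default of 0.1 is just a commonly used value.

```python
import random


def drop_captions(captions, drop_prob=0.1, rng=None):
    """Replace each caption by "" with probability drop_prob.

    Hypothetical helper: training on empty captions part of the time is
    what lets the model produce the unconditional prediction that
    classifier-free guidance needs at inference.
    """
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else caption for caption in captions]
```

It would be applied to each batch of raw captions right before they are passed to the tokenizer.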
Thanks @vramanuj for this really nice repo!
I started to experiment with it and had an issue with symlinks that are used to indicate the current pipeline folder.
In my case, the symlinks were dead because they pointed to a relative path, so resuming did not work.
Simply adding .absolute() to save_path in https://github.com/mlfoundations/open-diffusion/blob/main/train.py#L311
and https://github.com/mlfoundations/open-diffusion/blob/main/train.py#L509
makes it work fine.
Or did you do anything else that made it work anyway?
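For reference, the fix can be illustrated in isolation like this. It is a minimal sketch, not the code from train.py: point_symlink_at and the link name used in the usage note are hypothetical.

```python
from pathlib import Path


def point_symlink_at(link_path, save_path):
    """Create or refresh a symlink that points at an absolute target.

    Hypothetical helper. A relative target is resolved relative to the
    directory containing the link, not the working directory, so it can
    end up dangling; calling .absolute() first avoids that.
    """
    link = Path(link_path)
    target = Path(save_path).absolute()
    # Only an existing symlink is replaced; a real file or directory at
    # link_path is left alone and symlink_to will raise FileExistsError.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target, target_is_directory=target.is_dir())
    return link
```

For example, point_symlink_at("logs/current_pipeline", "logs/step_1000") would leave a link whose stored target is an absolute path, so it resolves correctly regardless of where the link itself lives.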