asagi4 / comfyui-latent-modifiers

This project forked from clybius/comfyui-latent-modifiers

License: GNU General Public License v3.0

ComfyUI-Latent-Modifiers

A repository of ComfyUI nodes which modify the latent during the diffusion process.

Latent Diffusion Mega Modifier (sampler_mega_modifier.py)

Adds multiple parameters to control the diffusion process towards a quality the user expects.

  • Sharpness: Utilizes code from Fooocus's sampling process to sharpen the noise in the middle of the diffusion process. This can lead to more perceptual detail, especially at higher strengths.

  • Tonemap: Clamps conditioning noise (CFG) using a user-chosen method, which can allow for the use of higher CFG values.

  • Rescale: Scales the CFG result by comparing its standard deviation to that of the existing latent, dynamically lowering the effective CFG.

  • Extra Noise: Adds extra noise to the conditioning in the middle of the diffusion process and, if chosen, applies the inverse operation to the unconditioning.

  • Contrast: Adjusts the contrast of the conditioning, which can lead to more pop-style results. Essentially functions as a secondary CFG slider for stylization, without changing subject pose and location much, if at all.

  • Combat CFG Drift: As we increase CFG, the mean of the latent drifts slightly away from 0. This option subtracts the mean or median of the latent to counter that. It can lead to sharper, higher-frequency results, but may cause discoloration.

  • Divisive Norm: Normalizes the latent using avg_pool2d, which can reduce noisy artifacts caused in part by features such as sharpness.

  • Spectral Modulation: Converts the latent to frequencies, clamps the higher frequencies while boosting the lower ones, then converts the result back to an image latent. This can effectively be treated as a solution to oversaturation or burning caused by higher CFG values, while leaving values around the median untouched.
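As a rough illustration of the Rescale and Combat CFG Drift ideas above, here is a minimal sketch in plain Python. It is not the node's code: flat lists of floats stand in for latent tensors, and the function names `combat_drift` and `rescale`, the `reference` argument, and the blending formula are assumptions made for this example.

```python
import statistics

def combat_drift(latent, method="mean"):
    """Re-center a flat list of latent values by subtracting its mean or median."""
    if method == "mean":
        center = statistics.fmean(latent)
    else:
        center = statistics.median(latent)
    return [v - center for v in latent]

def rescale(cfg_result, reference, multiplier=1.0):
    """Blend cfg_result with a copy rescaled to the spread of `reference`.

    multiplier=0 returns cfg_result unchanged; multiplier=1 fully matches
    the standard deviation of the reference (an assumed formula).
    """
    spread = statistics.pstdev(cfg_result)
    if spread == 0:
        return list(cfg_result)  # nothing to rescale
    factor = statistics.pstdev(reference) / spread
    return [multiplier * (v * factor) + (1.0 - multiplier) * v for v in cfg_result]

drifted = [0.5, 1.5, -0.5, 2.5]   # mean is 1.0
centered = combat_drift(drifted)  # [-0.5, 0.5, -1.5, 1.5]
```

Subtracting the median instead of the mean is less sensitive to a few extreme values, which is one plausible reason to offer both options.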

Tonemapping Methods Explanation:

  • Reinhard:

    Uses the Reinhard method of tonemapping (from comfyanonymous' ComfyUI Experiments) to clamp the CFG if the difference is too strong.

    A lower tonemap_multiplier clamps more noise, and a lower tonemap_percentile increases the standard deviation calculated from the original noise. Play with it!

  • Arctan:

    Clamps the values dynamically using a simple arctan curve. Link to interactive Desmos visualization.

    Recommended values for testing: tonemap_multiplier of 5, tonemap_percentile of 90.

  • Quantile:

    Clamps the values using torch.quantile for obtaining the highest magnitudes, and clamping based on the result.

    Closer to 100 percentile == stronger clamping. Recommended values for testing: tonemap_multiplier of 1, tonemap_percentile of 99.

  • Gated:

    Clamps the noise prediction latent using torch.quantile, based on the percentile, but only for values above a specific floor, which is set by tonemap_multiplier.

    Closer to 100 percentile == stronger clamping, lower tonemap_multiplier == stronger clamping. Recommended values for testing: tonemap_multiplier of 0.8-1, tonemap_percentile of 99.995.

  • CFG-Mimic:

    Attempts to mimic a lower or higher CFG based on tonemap_multiplier, and clamps it using tonemap_percentile with torch.quantile.

    Closer to 100 percentile == stronger clamping, lower tonemap_multiplier == stronger clamping. Recommended values for testing: tonemap_multiplier of 0.33-1.0, tonemap_percentile of 100.
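To give a feel for how curve-based clamping behaves, the sketch below shows two soft-clamp curves on plain Python lists: the Reinhard curve x / (1 + x) and an arctan curve. It is an illustration only. The ceiling of mean plus three standard deviations, the `limit` parameter, and both function names are assumptions for this example and are not mapped to tonemap_multiplier or tonemap_percentile as the node uses them.

```python
import math
import statistics

def reinhard_clamp(magnitudes, multiplier=1.0):
    """Compress non-negative magnitudes with the Reinhard curve x / (1 + x).

    The ceiling `top` is derived from the data (mean + 3 standard deviations)
    and scaled by `multiplier`, so a lower multiplier clamps harder.
    Assumes at least one magnitude is positive, so that top > 0.
    """
    mean = statistics.fmean(magnitudes)
    std = statistics.pstdev(magnitudes)
    top = (mean + 3.0 * std) * multiplier
    result = []
    for m in magnitudes:
        scaled = m / top
        result.append(top * scaled / (scaled + 1.0))  # always below top
    return result

def arctan_clamp(values, limit):
    """Soft-clamp signed values into (-limit, limit) with an arctan curve.

    The curve has slope 1 at zero, so small values pass almost unchanged.
    """
    scale = 2.0 * limit / math.pi
    return [scale * math.atan(v / scale) for v in values]
```

Both curves are smooth and monotonic, so the ordering of values is preserved and only the extremes are compressed, unlike a hard clip.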

Contrast Explanation:

Scales the pixel values by the standard deviation, achieving a more contrasty look. In practice, this can effectively act as a secondary CFG slider for stylization. It doesn't modify subject poses much, if at all, which can be great for those looking to get more oomph out of their low-CFG setups.

Using a negative value will not de-contrast, but instead will use a differing method to do the contrast operation. -33 ought to be near-equivalent to 33 in this case, for example. Feel free to play around and share which you prefer!

Spectral Modification Explanation:

We boost the low frequencies (low rate of change in the noise), and we lower the high frequencies (high rates of change in the noise).

Change the low/high frequency range using spectral_mod_percentile (default of 5.0, which is the upper and lower 5th percentiles.)

Increase or decrease the strength of the adjustment by raising or lowering spectral_mod_multiplier.

Beware of percentile values higher than 15 and multiplier values higher than 5, especially for hard clamping. Here be dragons: large values may cause the result to "noise-out", becoming full of nonsensical noise, especially earlier in the diffusion process.
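The frequency-domain idea can be sketched in one dimension with a naive discrete Fourier transform in plain Python. This is a simplified illustration, not the node's implementation: it assumes the percentile selects a band of the frequency axis, uses a gain of 1 + multiplier for the boost and its reciprocal for the cut, and leaves the zero-frequency bin alone so the mean is preserved. The names `dft`, `idft`, and `spectral_mod` are hypothetical.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_mod(signal, percentile=5.0, multiplier=1.0):
    """Boost the lowest band of non-zero frequencies and attenuate the highest band."""
    n = len(signal)
    max_freq = n // 2
    band = max(1, round(max_freq * percentile / 100.0))  # band width in bins
    gain = 1.0 + multiplier
    modified = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if 1 <= freq <= band:
            coeff *= gain     # boost slow changes
        elif freq > max_freq - band:
            coeff /= gain     # soften fast changes
        modified.append(coeff)
    return idft(modified)
```

Because the gain depends only on min(k, n - k), mirrored bins are scaled identically and the output stays real-valued.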

Current Pipeline:

Add extra noise to conditioning -> Sharpen conditioning -> Convert to Noise Prediction -> Tonemap Noise Prediction -> Spectral Modification -> Modify contrast of noise prediction -> Rescale CFG -> Divisive Normalization -> Combat CFG Drift

Why use this over x node?

Since the set_model_sampler_cfg_function hijack in ComfyUI can only utilize a single function, we bundle many latent modification methods into one large function for processing. This is simpler than taking an existing hijack and modifying it, which may be possible, but my (Clybius') lack of Python/PyTorch knowledge leads to this being the optimal method for simplicity. If you know how to do this, feel free to reach out through any means!
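The bundling approach described here can be pictured as folding an ordered list of stages into one callable, which is then the only function that needs to be registered. The sketch below is a generic illustration and does not use the ComfyUI API: `bundle` and the three stand-in stages are hypothetical, and plain lists stand in for latents.

```python
from functools import reduce

def bundle(stages):
    """Fold an ordered list of (name, function) stages into a single callable."""
    def run(values):
        # Feed the output of each stage into the next, in list order.
        return reduce(lambda acc, stage: stage[1](acc), stages, values)
    return run

# Hypothetical stand-in stages; the real node has its own, more involved steps.
stages = [
    ("scale", lambda xs: [2.0 * x for x in xs]),
    ("clamp", lambda xs: [max(-1.0, min(1.0, x)) for x in xs]),
    ("recenter", lambda xs: [x - sum(xs) / len(xs) for x in xs]),
]

modifier = bundle(stages)
result = modifier([0.25, 1.0, -2.0])
```

Order matters: running the same stages in reverse gives a different result, which is why the pipeline order is worth documenting.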

Can you implement x function?

Depends. Is there existing code for such a function, with an open license for possible use in this repository? I could likely attempt adding it! Feel free to start an issue or to reach out for ideas you'd want implemented.

Contributors: asagi4, clybius
