{"id":1772,"date":"2025-02-11T07:03:35","date_gmt":"2025-02-11T07:03:35","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/11\/six-ways-to-control-style-and-content-in-diffusion-models\/"},"modified":"2025-02-11T07:03:35","modified_gmt":"2025-02-11T07:03:35","slug":"six-ways-to-control-style-and-content-in-diffusion-models","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/11\/six-ways-to-control-style-and-content-in-diffusion-models\/","title":{"rendered":"Six Ways to Control Style and Content in Diffusion Models"},"content":{"rendered":"<p>    Six Ways to Control Style and Content in Diffusion Models<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<p class=\"wp-block-paragraph\" id=\"76e9\">Stable Diffusion 1.5\/2.0\/2.1\/XL 1.0, DALL-E, Imagen\u2026 In recent years, <a href=\"https:\/\/towardsdatascience.com\/tag\/diffusion-models\/\" title=\"Diffusion Models\">Diffusion Models<\/a> have showcased stunning quality in image generation. However, while they produce great quality on generic concepts, they struggle to generate high quality for more specialised queries, for example generating images in a specific style that was not frequently seen in the training dataset.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3583\">We could retrain the whole model on a vast number of images, teaching it the concepts needed to address the issue from scratch. However, this doesn\u2019t sound practical. 
First, we need a large set of images for the idea, and second, it is simply too expensive and time-consuming.<\/p>\n<p class=\"wp-block-paragraph\" id=\"1e0d\">There are solutions, however, that, given a handful of images and an hour of fine-tuning at worst, enable diffusion models to produce reasonable quality on the new concepts.<\/p>\n<p class=\"wp-block-paragraph\" id=\"65fd\">Below, I cover approaches like Dreambooth, <a href=\"https:\/\/towardsdatascience.com\/tag\/lora\/\" title=\"Lora\">LoRA<\/a>, Hyper-networks, Textual Inversion, IP-Adapters and ControlNets, which are widely used to customize and condition diffusion models. The idea behind all these methods is to memorise a new concept we are trying to learn; however, each technique approaches it differently.<\/p>\n<h2 class=\"wp-block-heading\" id=\"64b0\">Diffusion architecture<\/h2>\n<p class=\"wp-block-paragraph\" id=\"9b43\">Before diving into the various methods that help to condition diffusion models, let\u2019s first recap what diffusion models are.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"bcbcbc\" data-has-transparency=\"false\" style=\"--dominant-color: #bcbcbc;\" fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"685\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UYFDyJn9pjw1tcrJx0mWoA-1024x685.webp?resize=1024%2C685&#038;ssl=1\" alt=\"\" class=\"wp-image-597626 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UYFDyJn9pjw1tcrJx0mWoA-1024x685.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UYFDyJn9pjw1tcrJx0mWoA-300x201.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UYFDyJn9pjw1tcrJx0mWoA-768x514.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_UYFDyJn9pjw1tcrJx0mWoA.webp 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption 
class=\"wp-element-caption\">Diffusion process visualisation. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"0a1e\">The original idea of diffusion models is to train a model to reconstruct a coherent image from noise. In the training stage, we gradually add small amounts of Gaussian noise to an image (forward process) and then reconstruct the image iteratively by optimizing the model to predict the noise; subtracting the predicted noise brings us closer to the target image (reverse process).<\/p>\n<p class=\"wp-block-paragraph\" id=\"dc77\">The original idea of image corruption has<a href=\"https:\/\/arxiv.org\/abs\/2112.10752\" rel=\"noreferrer noopener\" target=\"_blank\">\u00a0evolved into a more practical<\/a>\u00a0and lightweight architecture in which images are first compressed to a latent space, and all manipulation with added noise is performed in this low-dimensional space.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d651\">To add textual information to the diffusion model, we first pass the text through a text encoder (typically\u00a0<a href=\"https:\/\/github.com\/openai\/CLIP\" rel=\"noreferrer noopener\" target=\"_blank\">CLIP<\/a>) to produce a latent embedding, which is then injected into the model through cross-attention layers.<\/p>\n<h2 class=\"wp-block-heading\" id=\"2acd\">Dreambooth,\u00a0<a href=\"https:\/\/dreambooth.github.io\/\" rel=\"noreferrer noopener\" target=\"_blank\">paper<\/a>,\u00a0<a href=\"https:\/\/huggingface.co\/docs\/diffusers\/training\/dreambooth\" rel=\"noreferrer noopener\" target=\"_blank\">code<\/a><br \/>\n<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"c8bcbc\" data-has-transparency=\"false\" style=\"--dominant-color: #c8bcbc;\" decoding=\"async\" width=\"1024\" height=\"514\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_FTnN5WIWZNqn-L0OFrzaCQ-1024x514.webp?resize=1024%2C514&#038;ssl=1\" alt=\"\" 
class=\"wp-image-597627 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_FTnN5WIWZNqn-L0OFrzaCQ-1024x514.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_FTnN5WIWZNqn-L0OFrzaCQ-300x151.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_FTnN5WIWZNqn-L0OFrzaCQ-768x386.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_FTnN5WIWZNqn-L0OFrzaCQ.webp 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Dreambooth visualisation. Trainable blocks are marked in red. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"c720\">The idea is to take a rare word (typically, {SKS} is used) and then teach the model to map the word {SKS} to a feature we would like to learn. That might, for example, be a style that the model has never seen, like van Gogh. We would show a dozen of his paintings and fine-tune to the phrase \u201cA painting of boots in the {SKS} style\u201d. We could similarly personalise the generation, for instance by learning how to generate images of a particular person, with prompts like \u201c{SKS} in the mountains\u201d, from a set of one\u2019s selfies.<\/p>\n<p class=\"wp-block-paragraph\" id=\"2286\">To maintain the information learned in the pre-training stage, Dreambooth encourages the model not to deviate too much from the original, pre-trained version by adding text-image pairs generated by the original model to the fine-tuning set.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ffce\"><strong>When to use and when not<br \/><\/strong>Dreambooth produces the best quality across all methods; however, the technique could impact already learnt concepts since the whole model is updated. The training schedule also limits the number of concepts the model can understand. Training is time-consuming, taking 1\u20132 hours. 
If we decide to introduce several new concepts at a time, we would need to store two model checkpoints, which wastes a lot of space.<\/p>\n<h2 class=\"wp-block-heading\" id=\"4840\">Textual Inversion,\u00a0<a href=\"https:\/\/textual-inversion.github.io\/\" rel=\"noreferrer noopener\" target=\"_blank\">paper<\/a>,\u00a0<a href=\"https:\/\/github.com\/rinongal\/textual_inversion\" rel=\"noreferrer noopener\" target=\"_blank\">code<\/a><br \/>\n<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" data-dominant-color=\"cacaca\" data-has-transparency=\"false\" style=\"--dominant-color: #cacaca;\" decoding=\"async\" width=\"1024\" height=\"518\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3q5750zxee_jEKwG4MFQTw-1024x518.webp?resize=1024%2C518&#038;ssl=1\" alt=\"\" class=\"wp-image-597628 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3q5750zxee_jEKwG4MFQTw-1024x518.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3q5750zxee_jEKwG4MFQTw-300x152.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3q5750zxee_jEKwG4MFQTw-768x388.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_3q5750zxee_jEKwG4MFQTw.webp 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Textual inversion visualisation. Trainable blocks are marked in red. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"4a7e\">The assumption behind the textual inversion is that the knowledge stored in the latent space of the diffusion models is vast. Hence, the style or the condition we want to reproduce with the Diffusion model is already known to it, but we just don\u2019t have the token to access it. 
Thus, instead of fine-tuning the model to reproduce the desired output when fed with rare words \u201cin the {SKS} style\u201d, we are optimizing for a textual embedding that would result in the desired output.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4949\"><strong>When to use and when not<br \/><\/strong>It takes very little space, as only the token will be stored. It is also relatively quick to train, with an average training time of 20\u201330 minutes. However, it comes with its shortcomings \u2014 as we are fine-tuning a specific vector that guides the model to produce a particular style, it won\u2019t generalise beyond this style.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f36c\">LoRA,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2106.09685\" target=\"_blank\" rel=\"noreferrer noopener\">paper<\/a>,\u00a0<a href=\"https:\/\/github.com\/microsoft\/LoRA\" target=\"_blank\" rel=\"noreferrer noopener\">code<\/a><br \/>\n<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"cecccc\" data-has-transparency=\"false\" style=\"--dominant-color: #cecccc;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"681\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d6-FSe5pFXu41r7FI4hjKA-1024x681.webp?resize=1024%2C681&#038;ssl=1\" alt=\"\" class=\"wp-image-597629 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d6-FSe5pFXu41r7FI4hjKA-1024x681.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d6-FSe5pFXu41r7FI4hjKA-300x200.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d6-FSe5pFXu41r7FI4hjKA-768x511.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_d6-FSe5pFXu41r7FI4hjKA.webp 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">LoRA visualisation. Trainable blocks are marked in red. 
Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"e301\">Low-Rank Adaptation (LoRA) was proposed for Large Language Models and was\u00a0<a href=\"https:\/\/github.com\/cloneofsimo\/lora\" rel=\"noreferrer noopener\" target=\"_blank\">first adapted to diffusion models by Simo Ryu<\/a>. The idea behind LoRA is that instead of fine-tuning the whole model, which can be rather costly, we freeze the original weights and add a small set of new low-rank weights to them; only these new weights are fine-tuned for the task, using a rare-token approach similar to Dreambooth.<\/p>\n<p class=\"wp-block-paragraph\" id=\"954a\">In diffusion models, rank decomposition is applied to the cross-attention layers, which are responsible for merging prompt and image information. Specifically, LoRA is applied to the weight matrices WO, WQ, WK, and WV in these layers.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3bf0\"><strong>When to use and when not<br \/><\/strong>LoRAs take very little time to train (5\u201315 minutes), since we update only a handful of parameters compared to the whole model, and they take much less space than Dreambooth checkpoints. 
However, being small, models fine-tuned with LoRAs tend to produce lower quality than DreamBooth.<\/p>\n<h2 class=\"wp-block-heading\" id=\"d72e\">Hyper-networks, paper, code<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"cccbcb\" data-has-transparency=\"false\" style=\"--dominant-color: #cccbcb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"679\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iFbRUz_2uTLA1BVJ5JN6ng-1024x679.webp?resize=1024%2C679&#038;ssl=1\" alt=\"\" class=\"wp-image-597630 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iFbRUz_2uTLA1BVJ5JN6ng-1024x679.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iFbRUz_2uTLA1BVJ5JN6ng-300x199.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iFbRUz_2uTLA1BVJ5JN6ng-768x510.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_iFbRUz_2uTLA1BVJ5JN6ng.webp 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Hyper-networks visualisation. Trainable blocks are marked in red. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"3c05\">Hyper-networks are, in some sense, extensions to LoRAs. 
Instead of learning the relatively small embeddings that would alter the model\u2019s output directly, we train a separate network capable of predicting the weights for these newly injected embeddings.<\/p>\n<p class=\"wp-block-paragraph\" id=\"a045\">Having the model predict the embeddings for a specific concept, we can teach the hypernetwork several concepts, reusing the same model for multiple tasks.<\/p>\n<p class=\"wp-block-paragraph\" id=\"cb5f\"><strong>When to use and when not<br \/><\/strong>Because hypernetworks do not specialise in a single style but are instead able to produce many, they generally do not reach the quality of the other methods and can take significant time to train. On the plus side, they can store many more concepts than the single-concept fine-tuning methods.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f077\">IP Adapters,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/2308.06721\" rel=\"noreferrer noopener\" target=\"_blank\">paper<\/a>,\u00a0<a href=\"https:\/\/github.com\/tencent-ailab\/IP-Adapter\" rel=\"noreferrer noopener\" target=\"_blank\">code<\/a><br \/>\n<\/h2>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"c6c2c2\" data-has-transparency=\"false\" style=\"--dominant-color: #c6c2c2;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"679\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_4ucPM5un1pWFkdmy0u0Z7w-1024x679.webp?resize=1024%2C679&#038;ssl=1\" alt=\"\" class=\"wp-image-597631 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_4ucPM5un1pWFkdmy0u0Z7w-1024x679.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_4ucPM5un1pWFkdmy0u0Z7w-300x199.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_4ucPM5un1pWFkdmy0u0Z7w-768x510.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_4ucPM5un1pWFkdmy0u0Z7w.webp 1400w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">IP-adapter visualisation. Trainable blocks are marked in red. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"fdf0\">Instead of controlling image generation with text prompts, IP adapters control the generation with an image, without any changes to the underlying model.<\/p>\n<p class=\"wp-block-paragraph\" id=\"3744\">The core idea behind the IP adapter is a decoupled cross-attention mechanism that combines source-image features with text and generated-image features. This is achieved by adding a separate cross-attention layer for the image features, allowing the model to learn image-specific features.<\/p>\n<p class=\"wp-block-paragraph\" id=\"f586\"><strong>When to use and when not<br \/><\/strong>IP adapters are lightweight, adaptable and fast. However, their performance is highly dependent on the quality and diversity of the training data. IP adapters generally work better at supplying stylistic attributes (e.g. 
with an image of Marc Chagall\u2019s paintings) that we would like to see in the generated image, and can struggle to provide control over exact details, such as pose.<\/p>\n<h2 class=\"wp-block-heading\" id=\"2231\">ControlNets,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2302.05543\" rel=\"noreferrer noopener\" target=\"_blank\">paper<\/a>,\u00a0<a href=\"https:\/\/github.com\/lllyasviel\/ControlNet\" rel=\"noreferrer noopener\" target=\"_blank\">code<\/a><br \/>\n<\/h2>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" data-dominant-color=\"c5c4c4\" data-has-transparency=\"false\" style=\"--dominant-color: #c5c4c4;\" loading=\"lazy\" decoding=\"async\" width=\"966\" height=\"768\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zHqP1p5m_a_71W156TyYdw.webp?resize=966%2C768&#038;ssl=1\" alt=\"\" class=\"wp-image-597632 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zHqP1p5m_a_71W156TyYdw.webp 966w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zHqP1p5m_a_71W156TyYdw-300x239.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_zHqP1p5m_a_71W156TyYdw-768x611.webp 768w\" sizes=\"auto, (max-width: 966px) 100vw, 966px\"><figcaption class=\"wp-element-caption\">ControlNet visualisation. Trainable blocks are marked in red. Image by the Author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"9ff5\">The ControlNet paper proposes a way to extend the input of the text-to-image model to any modality, allowing for fine-grained control of the generated image.<\/p>\n<p class=\"wp-block-paragraph\" id=\"beb5\">In the original formulation, ControlNet is a trainable copy of the encoder of the pre-trained diffusion model that takes as input the prompt, noise and control data (e.g. depth map, landmarks, etc.). 
To guide the generation, the outputs of the intermediate levels of the ControlNet are then added to the activations of the frozen diffusion model.<\/p>\n<p class=\"wp-block-paragraph\" id=\"cf2d\">The injection is achieved through zero-convolutions, where the weights and biases of 1\u00d71 convolutions are initialized as zeros and gradually learn meaningful transformations during training. This is similar to how LoRAs are trained: initialised with zeros, the added branch contributes nothing at first, so training starts from the behaviour of the original model.<\/p>\n<p class=\"wp-block-paragraph\" id=\"0c35\"><strong>When to use and when not<br \/><\/strong>ControlNets are preferable when we want to control the output structure, for example, through landmarks, depth maps, or edge maps. Because a large part of the network (a copy of the encoder) has to be trained, training can be time-consuming; however, ControlNets also allow for the best fine-grained control through rigid control signals.<\/p>\n<h2 class=\"wp-block-heading\" id=\"a1a6\">Summary<\/h2>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>DreamBooth:<\/strong>\u00a0Full fine-tuning of models for custom subjects or styles, high control level; however, it takes a long time to train and is fit for one purpose only.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Textual Inversion:\u00a0<\/strong>Embedding-based learning for new concepts, low level of control; however, fast to train.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>LoRA:\u00a0<\/strong>Lightweight fine-tuning of models for new styles\/characters, medium level of control, and quick to train.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Hypernetworks:\u00a0<\/strong>Separate model to predict LoRA weights for a given control request. Lower control level for more styles. 
Takes time to train.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>IP-Adapter:\u00a0<\/strong>Soft style\/content guidance via reference images, medium level of stylistic control, lightweight and efficient.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>ControlNet:\u00a0<\/strong>Control via pose, depth, and edges is very precise; however, it takes longer to train.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"08eb\"><strong>Best practice:<\/strong>\u00a0Combining IP-adapter, with its softer stylistic guidance, and ControlNet, for pose and object arrangement, would produce the best results.<\/p>\n<p class=\"wp-block-paragraph\" id=\"079c\">If you want to go into more detail on diffusion, check out\u00a0<a href=\"https:\/\/erdem.pl\/2023\/11\/step-by-step-visual-introduction-to-diffusion-models\" target=\"_blank\" rel=\"noreferrer noopener\">this article<\/a>, which I have found very well written and accessible at any level of machine learning and math. 
If you want an intuitive explanation of the math with cool commentary, check out\u00a0<a href=\"https:\/\/www.youtube.com\/watch?app=desktop&amp;v=HoKDTa5jHvg&amp;t=1284s\" target=\"_blank\" rel=\"noreferrer noopener\">this video<\/a>\u00a0or\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=fbLgFrlTnGU\" target=\"_blank\" rel=\"noreferrer noopener\">this video<\/a>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"cf18\">For looking up information on ControlNets, I found\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=fhIGt7QGg4w\" rel=\"noreferrer noopener\" target=\"_blank\">this explanation<\/a>\u00a0very helpful;\u00a0<a href=\"https:\/\/medium.com\/@isa.dario.isa\/conditioning-image-generation-%EF%B8%8F-implementation-with-stable-diffusion-controlnet-and-ipadapter-b502bfe9315d\">this article<\/a>\u00a0and\u00a0<a href=\"https:\/\/medium.com\/@steinsfu\/stable-diffusion-controlnet-clearly-explained-f86092b62c89\">this article<\/a>\u00a0could be a good intro as well.<\/p>\n<h2 class=\"wp-block-heading\" id=\"e2b2\">Liked the author? Stay connected!<\/h2>\n<p class=\"wp-block-paragraph\" id=\"687b\">Have I missed anything? 
Do not hesitate to leave a note, comment or message me directly on\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/aliakseimikhailiuk\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a>\u00a0or\u00a0<a href=\"https:\/\/twitter.com\/mikhailiuka\" target=\"_blank\" rel=\"noreferrer noopener\">Twitter<\/a>!<\/p>\n<p class=\"wp-block-paragraph\" id=\"687b\"><a href=\"https:\/\/towardsdatascience.com\/three-challenges-in-deploying-generative-models-in-production-8e4c0fcf63c3?source=post_page-----805169566d8e--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\"><\/a><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/three-challenges-in-deploying-generative-models-in-production-8e4c0fcf63c3?source=post_page-----805169566d8e--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\">Three challenges in deploying generative models in production<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/llm-routing-the-heart-of-any-practical-ai-chatbot-application-892e88d4a80d?source=post_page-----805169566d8e--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\">LLM Routing \u2014 the Heart of Any Practical AI Chatbot Application<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/pub.towardsai.net\/face-off-practical-face-swapping-with-machine-learning-a05b911ea0f?source=post_page-----805169566d8e--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\">Face Off: Practical Face-Swapping with Machine Learning<\/a><\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"b17a\"><strong>The opinions in this blog are my own and not attributable to or on behalf of Snap.<\/strong><a href=\"https:\/\/medium.com\/tag\/ai?source=post_page-----805169566d8e--------------------------------\"><\/a><\/p>\n<p>The post <a 
href=\"https:\/\/towardsdatascience.com\/six-ways-to-control-style-and-content-in-diffusion-models\/\">Six Ways to Control Style and Content in Diffusion Models<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Aliaksei Mikhailiuk<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/six-ways-to-control-style-and-content-in-diffusion-models\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Six Ways to Control Style and Content in Diffusion Models Stable Diffusion 1.5\/2.0\/2.1\/XL 1.0, DALL-E, Imagen\u2026 In the past years, Diffusion Models have showcased stunning quality in image generation. However, while producing great quality on generic concepts, these struggle to generate high quality for more specialised queries, for example generating images in a specific style, [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1699,62,1663,250,781,70,158],"tags":[454,845,73],"class_list":["post-1772","post","type-post","status-publish","format-standard","hentry","category-ai-image-generation","category-aimldsaimlds","category-diffusion-models","category-generative-ai-tools","category-lora","category-machine-learning","category-tips-and-tricks","tag-diffusion","tag-image","tag-models"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1772"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\
/wp-json\/wp\/v2\/comments?post=1772"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1772\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1772"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1772"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1772"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}