Do It Yourself

Custom Diffusion

Finetune Stable Diffusion on multiple concepts with a small number of images

Adobe Research's Custom Diffusion offers an approach to fine-tuning text-to-image diffusion models such as Stable Diffusion using a relatively small number of images (about 4-20) per new concept. Fine-tuning is fast, taking approximately six minutes on two A100 GPUs. The key to this efficiency is that only a subset of model parameters is updated, specifically the key and value projection matrices in the cross-attention layers.
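To make the idea of updating only the cross-attention key and value projections concrete, here is a minimal sketch in plain Python. It is not the Custom Diffusion training code. The parameter names and shapes in TOY_PARAMS are hypothetical, loosely modeled on the convention where "attn1" denotes self-attention and "attn2" denotes cross-attention, and the helper functions are made up for illustration.

```python
import re
from math import prod

# Hypothetical parameter table (name -> weight shape) for one toy block.
# Names and shapes are assumptions for illustration, not a real checkpoint.
TOY_PARAMS = {
    "block0.attn1.to_q.weight": (320, 320),    # self-attention query
    "block0.attn1.to_k.weight": (320, 320),    # self-attention key
    "block0.attn1.to_v.weight": (320, 320),    # self-attention value
    "block0.attn2.to_q.weight": (320, 320),    # cross-attention query
    "block0.attn2.to_k.weight": (320, 768),    # cross-attention key (reads text features)
    "block0.attn2.to_v.weight": (320, 768),    # cross-attention value (reads text features)
    "block0.attn2.to_out.weight": (320, 320),  # cross-attention output projection
    "block0.ff.weight": (1280, 320),           # feed-forward layer
}

# Match only the key and value projections of cross-attention layers.
KV_PATTERN = re.compile(r"\.attn2\.to_[kv]\.weight$")


def trainable_names(params):
    """Return the names of parameters that would be updated during fine-tuning."""
    return sorted(name for name in params if KV_PATTERN.search(name))


def trainable_fraction(params):
    """Return the share of all weights that falls in the trainable subset."""
    total = sum(prod(shape) for shape in params.values())
    chosen = sum(prod(params[name]) for name in trainable_names(params))
    return chosen / total


if __name__ == "__main__":
    for name in trainable_names(TOY_PARAMS):
        print("train:", name)
    print("fraction of toy weights trained:", round(trainable_fraction(TOY_PARAMS), 3))
```

The fraction printed for this toy table says nothing about the real model, where the selected matrices make up a much smaller share of the network. In a PyTorch training script, the same name filter could be applied while iterating over the model's named_parameters(), setting requires_grad to False on everything that does not match.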

Key Features of Custom Diffusion

  1. Multi-Concept Training and Combination: Capable of training for multiple concepts simultaneously or merging multiple fine-tuned models into one through closed-form constrained optimization, offering versatility in concept generation.

  2. Seamless Concept Composition: Generates and combines variations of multiple new concepts with existing ones in novel settings, showcasing the model's flexibility.

  3. Outstanding Performance: Outperforms or performs on par with several baselines and concurrent methods in both qualitative and quantitative evaluations.

  4. Memory and Computation Efficiency: Maintains high performance while being resource-efficient, a critical factor in sustainable AI development.

  5. Limitations and Challenges: While Custom Diffusion is highly effective, it faces challenges in composing complex combinations, such as multiple pets or more than three concepts, highlighting areas for future improvement.
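The closed-form merge mentioned in the first feature above can be written as a constrained least-squares problem. The block below is a sketch of that formulation as we understand it from the Custom Diffusion paper, so treat the notation as an assumption rather than a definitive statement. Here W_0 is a pretrained key or value matrix, the rows of C are text features for the target words of all new concepts, V collects the outputs each fine-tuned model produces for its own concept's text features, and the rows of C_reg are text features from regularization captions. We assume a column convention in which W multiplies a text feature from the left, and that both matrices being inverted are invertible.

```latex
% Merge objective: stay close to the pretrained matrix on regularization
% captions while reproducing each fine-tuned model on its own concept words.
\hat{W} = \arg\min_{W} \left\lVert W C_{\mathrm{reg}}^{\top} - W_0 C_{\mathrm{reg}}^{\top} \right\rVert_F
\quad \text{s.t.} \quad W C^{\top} = V

% Closed-form solution via Lagrange multipliers.
\hat{W} = W_0 + v^{\top} d,
\qquad
d = C \left( C_{\mathrm{reg}}^{\top} C_{\mathrm{reg}} \right)^{-1},
\qquad
v^{\top} = \left( V - W_0 C^{\top} \right) \left( d\, C^{\top} \right)^{-1}
```

Substituting the solution back into the constraint gives W_0 C^T + (V - W_0 C^T)(d C^T)^{-1} d C^T = V, so the merged matrix matches every fine-tuned model on its concept words. No gradient-based training is needed for the merge, only a few matrix products and inverses per layer.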

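Like DreamBooth, Custom Diffusion personalizes a pretrained text-to-image model from a handful of images, but the two differ in approach and focus. DreamBooth fine-tunes the whole model and specializes in subject-driven generation with high personalization, while Custom Diffusion updates only the cross-attention key and value projections and emphasizes multi-concept image generation.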

By fine-tuning only a select group of parameters, Custom Diffusion avoids extensive retraining, which is both time-consuming and resource-intensive. This makes the workflow not only faster but also more accessible to users who may not have the resources for full-scale model training.

The ability to train for multiple concepts and merge fine-tuned models opens up new avenues for creative workflows. Imagine being able to combine different elements seamlessly, such as a concept from one model and a style from another. This capability is not just a technical achievement; it's an artistic one, allowing creators to push the boundaries of their imagination.

However, the tool's current limitations also offer insight into the future direction of research in this field. The challenges in composing complex combinations point to the need for continued innovation in understanding and replicating the nuances of visual perception and concept integration in AI models.

In conclusion, Adobe Research's Custom Diffusion represents a significant leap forward in the realm of AI-driven image creation. Its blend of efficiency, flexibility, and performance makes it an invaluable tool for anyone looking to explore the frontiers of visual creativity.