Muse
Create images from text prompts with a transformer-based model
Muse enables efficient, high-quality image generation and editing. It leverages a pre-trained large language model (LLM) for fine-grained language understanding, which translates into high-fidelity image generation. Because it is trained on a masked modeling task in discrete token space, Muse is more efficient than pixel-space diffusion models and autoregressive models: it needs fewer sampling iterations and uses parallel decoding, so an image is produced in a small number of passes rather than a long sequential process (a sketch of this decoding loop appears after the list below). The benefits of Muse extend beyond image generation:
- Muse allows mask-free editing, letting designers control multiple objects in an image using only a text prompt.
- Muse also offers zero-shot inpainting and outpainting, filling in or extending image content based on text instructions alone. Designers can create, manipulate, and refine artwork without fine-tuning or model inversion (a sketch of the token-masking setup for this follows the list).
- Muse's speed and tokenization approach make interactive editing practical: because decoding takes only a handful of parallel steps, designers can apply and preview changes with very little latency.
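
To make the parallel decoding idea concrete, here is a minimal sketch of a MaskGIT-style decoding loop of the kind Muse builds on. It is illustrative only: `TinyMaskedTransformer`, `MASK_ID`, the 16x16 token grid, the 8192-entry codebook, and the random `text_emb` placeholder (standing in for frozen-LLM text embeddings) are all assumptions for this sketch, not Muse's actual code or checkpoints.

```python
import math
import torch
import torch.nn as nn

MASK_ID = 8192          # assumption: 8192-entry VQ codebook, id 8192 reserved for [MASK]
NUM_TOKENS = 16 * 16    # assumption: a 16x16 grid of discrete image tokens


class TinyMaskedTransformer(nn.Module):
    """Stand-in for Muse's base transformer: predicts a distribution over codebook
    entries for every (possibly masked) image-token position, conditioned on text
    embeddings from a frozen language model."""

    def __init__(self, codebook_size=8192, dim=256, text_dim=512):
        super().__init__()
        self.tok_emb = nn.Embedding(codebook_size + 1, dim)   # +1 for the [MASK] token
        self.pos_emb = nn.Parameter(torch.zeros(NUM_TOKENS, dim))
        self.text_proj = nn.Linear(text_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(dim, codebook_size)

    def forward(self, image_tokens, text_emb):
        x = self.tok_emb(image_tokens) + self.pos_emb
        x = x + self.text_proj(text_emb).mean(dim=1, keepdim=True)  # crude text conditioning
        return self.to_logits(self.encoder(x))                      # (B, NUM_TOKENS, codebook)


@torch.no_grad()
def parallel_decode(model, text_emb, steps=12, temperature=1.0):
    """Iteratively fill in a fully masked token grid. At each step the model predicts
    every position in parallel; only the most confident predictions are kept, and the
    rest are re-masked for the next step following a cosine schedule."""
    B = text_emb.shape[0]
    tokens = torch.full((B, NUM_TOKENS), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens, text_emb) / temperature
        probs = logits.softmax(dim=-1)
        sampled = torch.multinomial(probs.view(-1, probs.shape[-1]), 1).view(B, NUM_TOKENS)
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

        still_masked = tokens == MASK_ID
        conf = conf.masked_fill(~still_masked, float("inf"))  # already-fixed tokens stay fixed

        # Cosine schedule: how many tokens must remain masked after this step.
        mask_ratio = math.cos(math.pi / 2 * (step + 1) / steps)
        num_to_mask = int(mask_ratio * NUM_TOKENS)

        tokens = torch.where(still_masked, sampled, tokens)
        if num_to_mask > 0:
            # Re-mask the lowest-confidence positions and try them again next step.
            remask_idx = conf.topk(num_to_mask, dim=-1, largest=False).indices
            tokens.scatter_(1, remask_idx, MASK_ID)
    return tokens  # discrete codes, mapped back to pixels by the VQ decoder


# Usage: text_emb stands in for frozen-LLM embeddings of the prompt.
model = TinyMaskedTransformer()
text_emb = torch.randn(1, 8, 512)          # placeholder text embeddings
codes = parallel_decode(model, text_emb)   # (1, 256) image token ids after ~12 steps
```

The point of the sketch is that every position is predicted in parallel at each step, and the schedule decides how many low-confidence positions get re-masked, so a full token grid is resolved in a dozen or so passes rather than hundreds of sequential steps.

Zero-shot inpainting and outpainting fall out of the same token-masking mechanism: tokens you want to keep stay fixed, and only masked positions are regenerated from the prompt. The sketch below shows how the partially masked token grids might be set up; the grid size, codebook size, and `MASK_ID` are the same assumptions as above, and the random `image_tokens` stand in for the output of a VQ encoder.

```python
import torch

MASK_ID = 8192                      # assumption: [MASK] id, matching the sketch above
GRID = 16                           # assumption: 16x16 grid of discrete image tokens

# Stand-in for the VQ tokens of an existing image (normally produced by the VQ encoder).
image_tokens = torch.randint(0, 8192, (1, GRID, GRID))

# Inpainting: mask only the tokens inside the region to be repainted; everything else
# is kept fixed, so the decoder regenerates just that region from the text prompt.
inpaint = image_tokens.clone()
inpaint[:, 4:12, 4:12] = MASK_ID

# Outpainting: embed the known tokens in a larger grid of [MASK] tokens, so the
# decoder extends the image outward, again guided only by the text prompt.
outpaint = torch.full((1, GRID + 8, GRID + 8), MASK_ID)
outpaint[:, 4:4 + GRID, 4:4 + GRID] = image_tokens
```

Both grids would then be flattened and run through the same kind of iterative parallel decoding, with the re-masking schedule applied only to the positions that start out masked, so the known tokens are never resampled.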