StyleGAN-T

Create a new picture from a text input

Text-to-image synthesis has recently seen significant progress thanks to large pre-trained language models, large-scale training data, and the introduction of scalable model families such as diffusion↗︎ and autoregressive models. Generative adversarial networks (GANs), such as StyleGAN↗︎, have become increasingly uncommon in this setting. Although they are much faster, they remain far behind the state of the art in large-scale text-to-image synthesis.

StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis: large capacity, stable training on diverse datasets, strong text alignment, and a controllable tradeoff between fidelity and text alignment. It significantly improves over previous GANs and outperforms distilled diffusion models, the previous state of the art in fast text-to-image synthesis, in both sample quality and speed.
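In StyleGAN-family generators, the fidelity-versus-diversity knob mentioned above is commonly exposed through the truncation trick: sampled latent vectors are interpolated toward the average latent, trading variety (and, in the text-conditional case, some text alignment) for higher per-sample fidelity. A minimal sketch, using a toy 512-dimensional latent and a placeholder mean rather than a real trained model:

```python
import numpy as np

def truncate(w, w_mean, psi):
    """Truncation trick: interpolate a latent w toward the mean latent.

    psi = 1.0 leaves w unchanged (maximum diversity);
    psi -> 0.0 pulls samples toward the average latent,
    favoring fidelity over variety.
    """
    return w_mean + psi * (w - w_mean)

# Illustrative values only; a real model would supply its own
# learned average latent and sampled latents.
w_mean = np.zeros(512)
w = np.random.randn(512)
w_truncated = truncate(w, w_mean, psi=0.7)
```

The `truncate` helper and the zero-vector mean are illustrative assumptions; the actual StyleGAN-T implementation applies truncation inside its mapping network with its own learned statistics.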