DALL-E is a large-scale neural network created by OpenAI. It is a deep learning model trained to generate images from text descriptions. It can generate a wide variety of images, ranging from photorealistic to highly stylized, and can even create images that do not correspond to any existing picture. For example, given the description "an armchair in the shape of an avocado," DALL-E generates a corresponding image.
The original DALL-E was introduced in 2021. In April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that can combine concepts, attributes, and styles.
Technology
DALL-E is based on the Transformer architecture and is trained on a large dataset of text-image pairs. It uses a multimodal implementation of the GPT-3 model with 12 billion parameters. DALL-E was developed in conjunction with CLIP, a separate model based on zero-shot learning that was trained on 400 million pairs of images and text captions scraped from the internet. CLIP's role is to understand and rank DALL-E's output: it predicts which caption from a list of 32,768 captions is most appropriate for a given image. DALL-E 2 uses a diffusion model conditioned on CLIP image embeddings.
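The ranking idea can be illustrated with a minimal sketch. It assumes that a caption and each candidate image have already been mapped to vectors in a shared embedding space, and that candidates are scored by cosine similarity. The three-dimensional vectors, the image names, and the helper functions are hypothetical and chosen only for illustration. This is not OpenAI's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(caption_embedding, image_embeddings):
    """Return image names ordered from best to worst match for the caption."""
    scored = [
        (cosine_similarity(caption_embedding, embedding), name)
        for name, embedding in image_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored]


# Hypothetical embeddings; a real model would produce vectors with
# hundreds of dimensions from a text encoder and an image encoder.
caption = [0.9, 0.1, 0.2]
candidates = {
    "image_a": [0.8, 0.2, 0.1],
    "image_b": [0.1, 0.9, 0.3],
    "image_c": [0.5, 0.5, 0.5],
}

ranking = rank_images(caption, candidates)
print(ranking)
```

In this toy setup, a generator would produce many candidate images for one prompt and keep only the top-ranked ones.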
Capabilities
DALL-E 2 can generate a wide range of images, including photorealistic, stylized, and even fictional images. It is able to manipulate and rearrange objects in its images and can correctly place design elements in novel compositions without explicit instruction. It has shown the ability to "fill in the blanks" and add appropriate details to images without specific prompts.
Ethical concerns
DALL-E 2's reliance on public datasets can lead to algorithmic bias in some cases. For example, it may generate higher numbers of men than women for requests that do not mention gender. Its training data was filtered to remove violent and sexual imagery, but this was found to increase bias in some cases, such as reducing the frequency of women being generated. There are concerns that DALL-E 2 and similar image generation models could be used to propagate deepfakes and other forms of misinformation. To mitigate this, the software rejects prompts involving public figures and uploads containing human faces, and blocks prompts containing potentially objectionable content. However, it is easy to bypass this filtering using alternative phrases. There are also concerns that DALL-E 2 and similar models could cause technological unemployment for artists, photographers, and graphic designers due to their accuracy and popularity.
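Why rephrasing can defeat prompt filtering is easiest to see with a toy example. The sketch below assumes a naive keyword blocklist, which is a hypothetical mechanism used only for illustration. The actual filtering used by DALL-E 2 is not described here, and the blocked terms and function name are invented.

```python
# Hypothetical blocklist; real moderation systems are more elaborate.
BLOCKED_TERMS = {"blood", "gore"}


def is_allowed(prompt):
    """Return True if no word in the prompt matches a blocked term."""
    words = (word.strip(".,!?") for word in prompt.lower().split())
    return not any(word in BLOCKED_TERMS for word in words)


direct = is_allowed("a horse lying in a pool of blood")
rephrased = is_allowed("a horse lying in a pool of red liquid")
print(direct, rephrased)
```

The first prompt is rejected because it contains a listed word, while the rephrased prompt passes even though it describes a similar scene. A filter that matches only surface wording has no notion of what the prompt means.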
Technical limitations
DALL-E 2's language understanding is limited. It may fail to distinguish between similar phrases, for example reportedly producing an astronaut riding a horse when given the prompt "a horse riding an astronaut". Prompts that involve complex sentences, negation, numbers, or more than three objects may also produce incorrect images. It also has difficulty generating images related to scientific subjects such as astronomy or medical imagery.
Open-source implementations
There have been several attempts to create open-source implementations of DALL-E. Craiyon, formerly known as DALL-E Mini, is an AI model based on the original DALL-E that was trained on unfiltered data from the internet. It gained attention in mid-2022 for its ability to generate humorous imagery.