DALLE2-pytorch | Framework | 45/100
via “two-stage diffusion-based text-to-image generation with CLIP embeddings”
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
Unique: Implements OpenAI's published two-stage DALL-E 2 architecture, keeping semantic embedding prediction (DiffusionPrior) explicitly separate from image synthesis (Decoder) so that each component can be trained and swapped independently. Instead of generating in a single stage, the Decoder uses cascading U-Nets that refine the image at progressively higher resolutions, which keeps memory manageable for outputs of 1024x1024 and above.
vs others: More modular and research-friendly than Stable Diffusion (which uses single-stage latent diffusion), and more faithful to OpenAI's published architecture than other community reimplementations. This makes it suited to reproducible research and component-level customization.
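The data flow described under "Unique" can be sketched in a few lines. This is a toy illustration only, not the DALLE2-pytorch API: no diffusion or CLIP model is involved, and every name here (`toy_prior`, `make_toy_stage`, `upsample_nearest`, `generate`) is hypothetical. It assumes a prior is any function from a text embedding to an image embedding, and a cascade stage is any function from an image embedding plus an optional lower-resolution image to a larger image.

```python
from typing import Callable, List, Optional, Sequence

Embedding = List[float]
Image = List[List[float]]  # toy square grayscale grid standing in for an image tensor

# Hypothetical type aliases for the two swappable components.
Prior = Callable[[Embedding], Embedding]
Stage = Callable[[Embedding, Optional[Image]], Image]


def toy_prior(text_embed: Embedding) -> Embedding:
    """Stage 1 stand-in: predict an image embedding from a text embedding."""
    return [0.5 * x for x in text_embed]


def upsample_nearest(image: Image, size: int) -> Image:
    """Nearest-neighbour resize of a square grid to size x size."""
    old = len(image)
    return [[image[r * old // size][c * old // size] for c in range(size)]
            for r in range(size)]


def make_toy_stage(size: int) -> Stage:
    """Build one cascade stage that emits a size x size image."""
    def stage(image_embed: Embedding, low_res: Optional[Image]) -> Image:
        bias = sum(image_embed) / len(image_embed)
        if low_res is None:
            # Base stage: conditioned on the image embedding only.
            return [[bias] * size for _ in range(size)]
        # Later stages: also conditioned on the upsampled previous output.
        cond = upsample_nearest(low_res, size)
        return [[pixel + bias for pixel in row] for row in cond]
    return stage


def generate(text_embed: Embedding, prior: Prior, stages: Sequence[Stage]) -> Image:
    """Run the prior, then each cascade stage from lowest to highest resolution."""
    image_embed = prior(text_embed)
    image: Optional[Image] = None
    for stage in stages:
        image = stage(image_embed, image)
    if image is None:
        raise ValueError("at least one stage is required")
    return image


# A 4x4 base stage followed by an 8x8 refinement stage.
image = generate([1.0, 3.0], toy_prior, [make_toy_stage(4), make_toy_stage(8)])
```

Because `generate` depends only on the two callable signatures, a different prior or an extra cascade stage can be passed in without touching the rest of the pipeline, which is the kind of component-level swapping the entry describes.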