AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques (e.g., LoRA and DreamBooth), everyone can now turn their imagination into high-quality images at an affordable cost. Consequently, there is great demand for image animation techniques that combine generated static images with motion dynamics. In this project, we propose an effective framework to animate most existing personalized text-to-image models once and for all, saving the effort of model-specific tuning.

The core of the proposed framework is to append a newly initialized motion modeling module to the frozen base text-to-image model and then train it on video clips to distill a reasonable motion prior. Once trained, this motion modeling module can simply be injected into any personalized version derived from the same base model, which then readily becomes a text-driven model that produces diverse and personalized animated images.
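
The train-once, inject-anywhere idea can be illustrated with a deliberately tiny toy in plain Python. This is only a sketch under loud assumptions, not the framework's implementation: frames are single scalars rather than images, `ImageLayer`, `MotionModule`, and `train_motion` are hypothetical names invented for this illustration, the "motion" is just a blend of each frame toward the clip mean, and starting the module as an identity mapping is likewise an assumption. What it does show is the training pattern described above: the base weights are only read, the motion parameter alone is updated, and the trained module is then reused with a different ("personalized") base without further tuning.

```python
from dataclasses import dataclass
from typing import List

Frames = List[float]  # toy stand-in: one scalar per video frame


@dataclass
class ImageLayer:
    """Stand-in for a text-to-image layer: handles each frame independently."""
    weight: float

    def __call__(self, frames: Frames) -> Frames:
        return [self.weight * x for x in frames]


@dataclass
class MotionModule:
    """Hypothetical temporal module: pulls each frame toward the clip mean."""
    alpha: float = 0.0  # assumption: the module starts as the identity mapping

    def __call__(self, frames: Frames) -> Frames:
        mean = sum(frames) / len(frames)
        return [x + self.alpha * (mean - x) for x in frames]


def train_motion(image_layer: ImageLayer, motion: MotionModule,
                 clips: List[Frames], targets: List[Frames],
                 lr: float = 0.1, steps: int = 200) -> MotionModule:
    """Fit only motion.alpha by gradient descent on a squared-error loss."""
    for _ in range(steps):
        for frames, target in zip(clips, targets):
            hidden = image_layer(frames)  # frozen: the weight is read, never updated
            mean = sum(hidden) / len(hidden)
            out = motion(hidden)
            # d(loss)/d(alpha), averaged over the frames of the clip
            grad = sum(2 * (o - t) * (mean - h)
                       for o, t, h in zip(out, target, hidden)) / len(hidden)
            motion.alpha -= lr * grad
    return motion


# Train the motion module once, against the frozen base model.
base = ImageLayer(weight=1.0)
clips = [[0.0, 1.0, 2.0], [1.0, 3.0, 2.0]]
reference = MotionModule(alpha=0.5)  # synthetic "ground-truth" motion for the toy
targets = [reference(base(clip)) for clip in clips]
motion = train_motion(base, MotionModule(), clips, targets)

# Inject the same trained module into a personalized variant of the base.
personalized = ImageLayer(weight=1.5)
animated = motion(personalized([0.0, 1.0, 2.0]))
```

In this toy the personalized layer differs from the base only in its weight, which mirrors the requirement that personalized models be derived from the same base so that the injected module sees compatible intermediate features.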


Here we demonstrate the best-quality animations generated by models into which the motion modeling module from our framework has been injected.
