Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

[Paper Review] Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

2023.05.28

https://arxiv.org/pdf/2205.11487v1.pdf

Abstract

Imagen is a text-to-image diffusion model with a high degree of language understanding and photorealism.

It uses large transformer language models to understand the text, and a diffusion model generates high-fidelity images conditioned on that understanding.

The paper finds that encoding text with generic pre-trained large language models such as T5 is effective for image synthesis -> scaling up the language model improves performance more than scaling up the diffusion model.

Benchmark (the performance of something..
