# Generate vivid images for any (Chinese) text
CogView is a pretrained (4B-parameter) transformer for general-domain text-to-image generation.
- Read our paper [CogView: Mastering Text-to-Image Generation via Transformers](https://arxiv.org/abs/2105.13290) on arXiv for a formal introduction. The PB-relax and Sandwich-LN techniques it describes can also help you train large and deep transformers stably (e.g., by eliminating NaN losses); see the sketch after the citation below.
- Visit our demo at https://lab.aminer.cn/cogview/index.html! (It runs without post-selection or super-resolution and currently only supports simplified Chinese input, but you can translate text from other languages into Chinese before entering it.)
- Download our pretrained models from Project Wudao-Wenhui (悟道-文汇).
- Cite our paper if you find our work helpful:
```
@article{ding2021cogview,
  title={CogView: Mastering Text-to-Image Generation via Transformers},
  author={Ding, Ming and Yang, Zhuoyi and Hong, Wenyi and Zheng, Wendi and Zhou, Chang and Yin, Da and Lin, Junyang and Zou, Xu and Shao, Zhou and Yang, Hongxia and Tang, Jie},
  journal={arXiv preprint arXiv:2105.13290},
  year={2021}
}
```
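Below is a minimal PyTorch sketch of the two stabilization tricks mentioned above, reconstructed from the paper's description rather than taken from this repository's code; the names `pb_relax_attention` and `SandwichBlock`, the tensor shapes, and the `alpha=32` default are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

def pb_relax_attention(q, k, v, alpha=32.0):
    # PB-relax (sketch): scale q down by alpha before the QK^T product so the
    # scores stay within fp16 range, subtract the per-row max, then scale back
    # up before softmax; softmax is shift-invariant, so the result is unchanged.
    d = q.size(-1)
    scores = (q / alpha) @ k.transpose(-1, -2) / (d ** 0.5)
    scores = (scores - scores.amax(dim=-1, keepdim=True).detach()) * alpha
    return F.softmax(scores, dim=-1) @ v

class SandwichBlock(nn.Module):
    # Sandwich-LN (sketch): wrap each residual branch with LayerNorms on both
    # sides, i.e. x_out = x + LN(sublayer(LN(x))), instead of pre-LN only.
    def __init__(self, dim, sublayer):
        super().__init__()
        self.ln_in = nn.LayerNorm(dim)
        self.ln_out = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.ln_out(self.sublayer(self.ln_in(x)))
```

Per the paper, PB-relax addresses fp16 overflow in the attention scores of large models, while Sandwich-LN bounds the scale of the values entering each residual connection.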
## Getting Started
### Setup
- Hardware: Linux servers with NVIDIA V100s or A100s are recommended, but you can also run the pretrained models with a smaller `--max-inference-batch-size`, or train smaller models, on less powerful GPUs.
- Environment (Option 1): First install PyTorch (>=1.7.0) and apex, then install the other dependencies via `pip install -r requirements.txt`.
- Environment (Option 2): We provide a docker image in case you have trouble setting up the environment. Pull the image, create a (background) container, and get into it via:

```
docker pull cogview/cuda111_torch181_deepspeed040
./env/start_docker.sh && docker exec -it bg-cogview bash

cd /root/cogview # in the container
```