前言

最近Text-To-Image是一个很火的话题，甚至更进一步的Text-To-Video话题度也在不断上升。最近看到一个开源项目FlagAI，是目前我觉着效果比较好的项目之一。安装操作简单，支持中英文，这就很nice。

项目开源地址：github地址

下面是我用项目demo跑出的效果，大家可以看一下。

输入的文本：

Anime portrait of natalie portman as an anime girl by stanley artgerm lau, wlop, rossdraws, james jean, andrei riabovitchev, marc simonetti, and sakimichan, trending on artstation

翻译一下，乱七八糟的人名，动漫肖像女孩，大概是这样。

项目结构

在ReadMe中，作者不但提供了快速上手的可以使用的中英文预训练模型。

还有分词器、预测器的操作说明。

我先打开作者给我们的样例代码。

使用的方式很简单，生成图片的方法为：predictor.predict_generate_images，代码看上去十分简单移动。作者给出的安装说明如下：

OK，我把样例代码改一下，改成可页面交互方式，方便使用。

页面交互调整

代码如下：

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
from PIL import Image
import gradio as gr
import os
import shutil

# Initialize
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

loader = AutoLoader(task_name="text2img",  # contrastive learning
                    model_name="AltDiffusion",
                    model_dir="./checkpoints")

model = loader.get_model()
model.eval()
model.to(device)
predictor = Predictor(model)


def handle(text: str):
    if not os.path.isdir("./AltDiffusionOutputs"):
        os.mkdir("./AltDiffusionOutputs")
    else:
        shutil.rmtree("./AltDiffusionOutputs")
    predictor.predict_generate_images(text)
    imgs = []
    for s in os.listdir("./AltDiffusionOutputs/samples"):
        imgs.append(Image.open(os.path.join("./AltDiffusionOutputs/samples", s)))
    return imgs


if __name__ == '__main__':
    demo = gr.Interface(fn=handle, inputs=gr.Text(),
                        outputs=[gr.Image(type="pil"), gr.Image(type="pil"),
                                 gr.Image(type="pil"), gr.Image(type="pil")])
    demo.launch(server_name="0.0.0.0", server_port=12003)

除了项目要求的安装内容，需要额外安装gradio。

安装命令如下：

pip install gradio -i https://pypi.douban.com/simple

第一次执行会在当前目录下创建checkpoints文件夹，下载预训练模型，耗时比较久。

第二次执行后结果如下

******************** text2img altdiffusion
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
******************** txt_img_matching altclip-xlmr-l
model files:['config.json', 'pytorch_model.bin', 'tokenizer.json', 'tokenizer_config.json', 'preprocessor_config.json', 'special_tokens_map.json']
./checkpoints/AltCLIP-XLMR-L
Global Step: 143310
Running on local URL: http://0.0.0.0:12003

To create a public link, set `share=True` in `launch()`.

通过浏览器打开页面：http://localhost:12003

输入需要生成图片的文字，然后点击submit。