Hi everyone, I'm 微学AI. Today I'd like to present Lesson 5 of the Artificial Intelligence Algorithm Engineer (Advanced) course: an image generation project with generative adversarial models, explained alongside the code. This article covers the mathematical principles of the generative adversarial network (GAN) and its variants CGAN and DCGAN, and builds complete, runnable code with the PyTorch framework to help readers master the principles and techniques of image generation.
Table of Contents
- I. The GAN Model
- 1. Mathematical Principles of the GAN Model
- 2. Code Implementation of the GAN Model
- II. The CGAN Model
- 1. Mathematical Principles of the CGAN Model
- 2. Code Implementation of the CGAN Model
- III. The DCGAN Model
- 1. Mathematical Principles of the DCGAN Model
- 2. Code Implementation of the DCGAN Model
- IV. Summary
I. The GAN Model
1. Mathematical Principles of the GAN Model
The generative adversarial network (GAN), proposed by Goodfellow et al. in 2014, consists of two parts: a generator and a discriminator. The generator's task is to produce samples as close to real data as possible, while the discriminator's task is to tell the generator's samples apart from real ones.
The GAN objective function is the following value function, which the discriminator maximizes and the generator minimizes ($\min_G \max_D V(D, G)$):
$$V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$$
where $D(x)$ is the probability the discriminator assigns to $x$ being a real sample, $G(z)$ is the sample produced by the generator, and $z$ is random noise.
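As a quick sanity check on the objective, the two expectation terms are exactly binary cross-entropy with labels 1 for real and 0 for fake samples. The sketch below estimates $V(D, G)$ on a tiny batch of made-up discriminator outputs (the numbers are purely illustrative) and confirms that $-V$ matches the BCE formulation:

```python
import torch
import torch.nn.functional as F

# Illustrative discriminator outputs for 4 real and 4 fake samples
d_real = torch.tensor([0.9, 0.8, 0.7, 0.95])   # D(x) on real samples
d_fake = torch.tensor([0.1, 0.2, 0.3, 0.05])   # D(G(z)) on generated samples

# Monte-Carlo estimate of the value function V(D, G) on this batch
v = torch.mean(torch.log(d_real)) + torch.mean(torch.log(1 - d_fake))

# The discriminator's loss -V equals binary cross-entropy with
# targets 1 (real) and 0 (fake)
bce = F.binary_cross_entropy(d_real, torch.ones(4)) + \
      F.binary_cross_entropy(d_fake, torch.zeros(4))

print(torch.isclose(-v, bce).item())  # True: the two formulations agree
```

This equivalence is why many implementations train GANs with `nn.BCELoss` instead of writing out the logs by hand.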
2. Code Implementation of the GAN Model
First, import the required libraries:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```
Define the generator:
```python
class Generator(nn.Module):
    def __init__(self, z_dim, img_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)
```
Define the discriminator:
```python
class Discriminator(nn.Module):
    def __init__(self, img_dim):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(img_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
```
Train the GAN model:
```python
# Hyperparameters
z_dim = 100
img_dim = 28 * 28
batch_size = 64
lr = 0.0002
epochs = 50

# Load the dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

# Initialize the generator and discriminator
G = Generator(z_dim, img_dim)
D = Discriminator(img_dim)

# Define the optimizers
optimizer_G = optim.Adam(G.parameters(), lr=lr)
optimizer_D = optim.Adam(D.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, _) in enumerate(dataloader):
        n = imgs.size(0)  # actual batch size (the last batch may be smaller)
        # Train the discriminator
        real_imgs = imgs.view(-1, img_dim)
        z = torch.randn(n, z_dim)
        fake_imgs = G(z)
        D_real = D(real_imgs)
        D_fake = D(fake_imgs.detach())  # detach so D's update does not backprop into G
        D_loss = -torch.mean(torch.log(D_real) + torch.log(1 - D_fake))
        optimizer_D.zero_grad()
        D_loss.backward()
        optimizer_D.step()
        # Train the generator
        z = torch.randn(n, z_dim)
        fake_imgs = G(z)
        D_fake = D(fake_imgs)
        G_loss = -torch.mean(torch.log(D_fake))
        optimizer_G.zero_grad()
        G_loss.backward()
        optimizer_G.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'D_loss: {D_loss.item():.4f}, G_loss: {G_loss.item():.4f}')
```
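After training, the generator can be used on its own to synthesize new images. The sketch below uses an untrained stand-in with the same layer structure as the `Generator` class above (so its samples are noise patterns; in practice you would use the trained `G`), and shows the inference steps: sample noise, run the generator, reshape to image tensors, and map from the Tanh range [-1, 1] back to pixel range [0, 1]:

```python
import torch
import torch.nn as nn

z_dim, img_dim = 100, 28 * 28

# Untrained stand-in mirroring the Generator architecture above
G = nn.Sequential(
    nn.Linear(z_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, img_dim), nn.Tanh(),
)

G.eval()
with torch.no_grad():                    # no gradients needed at inference time
    z = torch.randn(16, z_dim)           # 16 noise vectors
    samples = G(z).view(16, 1, 28, 28)   # flat vectors -> image tensors

# Map from the Tanh range [-1, 1] to the pixel range [0, 1] for display/saving
samples = (samples + 1) / 2
print(samples.shape)  # torch.Size([16, 1, 28, 28])
```

The rescaling step mirrors the `Normalize((0.5,), (0.5,))` preprocessing, which maps training pixels into [-1, 1] to match the generator's Tanh output.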
II. The CGAN Model
1. Mathematical Principles of the CGAN Model
The conditional generative adversarial network (CGAN) builds on GAN by feeding additional conditioning information $y$ (for example, a class label) to both the generator and the discriminator. The CGAN objective function is as follows:
$$V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z|y)))]$$
where $D(x|y)$ is the probability that, given condition $y$, the discriminator judges $x$ to be a real sample, and $G(z|y)$ is the sample the generator produces given condition $y$.
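In practice, the simplest way to condition on $y$ is to one-hot encode the class label and concatenate it with the noise vector $z$ before the first layer. A minimal sketch of that step (dimensions assume $z$ is 100-dimensional and there are ten MNIST classes, matching the code in the next section):

```python
import torch
import torch.nn.functional as F

z = torch.randn(4, 100)              # batch of 4 noise vectors
labels = torch.tensor([0, 3, 7, 9])  # desired digit classes
y = F.one_hot(labels, num_classes=10).float()

z_y = torch.cat([z, y], dim=1)       # generator input [z | y]
print(z_y.shape)  # torch.Size([4, 110])
```

The discriminator receives the same kind of concatenated input, `[x | y]`, so both players see the condition.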
2. Code Implementation of the CGAN Model
The CGAN generator and discriminator extend the previous networks by taking the condition $y$ as an additional input. Here is the CGAN implementation:
```python
class CGenerator(nn.Module):
    def __init__(self, z_dim, condition_dim, img_dim):
        super(CGenerator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(z_dim + condition_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, img_dim),
            nn.Tanh()
        )

    def forward(self, z, y):
        z_y = torch.cat([z, y], 1)
        return self.model(z_y)


class CDiscriminator(nn.Module):
    def __init__(self, img_dim, condition_dim):
        super(CDiscriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(img_dim + condition_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x, y):
        x_y = torch.cat([x, y], 1)
        return self.model(x_y)
```
When training the CGAN model, the condition $y$ must be passed to both the generator and the discriminator:
```python
# Assume the condition y is 10-dimensional (one-hot MNIST digit labels)
condition_dim = 10

# Initialize the conditional generator and discriminator
CG = CGenerator(z_dim, condition_dim, img_dim)
CD = CDiscriminator(img_dim, condition_dim)

# Define the optimizers
optimizer_CG = optim.Adam(CG.parameters(), lr=lr)
optimizer_CD = optim.Adam(CD.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, labels) in enumerate(dataloader):
        n = imgs.size(0)  # actual batch size (the last batch may be smaller)
        # One-hot encode the labels as the condition
        y = torch.nn.functional.one_hot(labels, num_classes=condition_dim).float()
        # Train the discriminator
        real_imgs = imgs.view(-1, img_dim)
        z = torch.randn(n, z_dim)
        fake_imgs = CG(z, y)
        CD_real = CD(real_imgs, y)
        CD_fake = CD(fake_imgs.detach(), y)  # detach so CD's update does not backprop into CG
        CD_loss = -torch.mean(torch.log(CD_real) + torch.log(1 - CD_fake))
        optimizer_CD.zero_grad()
        CD_loss.backward()
        optimizer_CD.step()
        # Train the generator
        z = torch.randn(n, z_dim)
        fake_imgs = CG(z, y)
        CD_fake = CD(fake_imgs, y)
        CG_loss = -torch.mean(torch.log(CD_fake))
        optimizer_CG.zero_grad()
        CG_loss.backward()
        optimizer_CG.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'CD_loss: {CD_loss.item():.4f}, CG_loss: {CG_loss.item():.4f}')
```
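The payoff of conditioning is controllable sampling: at inference time you choose which class to generate. The sketch below requests eight images of the digit 5, using an untrained two-layer stand-in for CGenerator (so the outputs are noise here; with trained weights they would be fives):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, condition_dim, img_dim = 100, 10, 28 * 28

# Untrained stand-in for the conditional generator (same input convention:
# noise concatenated with a one-hot condition)
CG = nn.Sequential(
    nn.Linear(z_dim + condition_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, img_dim), nn.Tanh(),
)

# Request eight images of the digit 5 by fixing the condition vector
labels = torch.full((8,), 5)
y = F.one_hot(labels, num_classes=condition_dim).float()
z = torch.randn(8, z_dim)

with torch.no_grad():
    imgs = CG(torch.cat([z, y], dim=1)).view(8, 1, 28, 28)
print(imgs.shape)  # torch.Size([8, 1, 28, 28])
```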
III. The DCGAN Model
1. Mathematical Principles of the DCGAN Model
The deep convolutional generative adversarial network (DCGAN) is a GAN variant that applies convolutional neural networks (CNNs) to the generator and the discriminator. The DCGAN objective function is the same as GAN's, but its network architecture differs, allowing the generator and discriminator to handle image data more effectively.
2. Code Implementation of the DCGAN Model
Here is the implementation of the DCGAN generator and discriminator:
```python
class DCGenerator(nn.Module):
    def __init__(self, z_dim, img_channels):
        super(DCGenerator, self).__init__()
        self.model = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, z):
        z = z.view(z.size(0), z.size(1), 1, 1)
        return self.model(z)


class DCDiscriminator(nn.Module):
    def __init__(self, img_channels):
        super(DCDiscriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(img_channels, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img).view(img.size(0), -1)
```
```python
# Hyperparameters
z_dim = 100
img_channels = 1
img_size = 32  # the generator's four transposed convolutions upsample 1x1 to 32x32
batch_size = 64
lr = 0.0002
epochs = 50

# Load the dataset, resized to match the generator's output resolution
transform = transforms.Compose([
    transforms.Resize(img_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

# Initialize the generator and discriminator
DG = DCGenerator(z_dim, img_channels)
DD = DCDiscriminator(img_channels)

# Define the optimizers
optimizer_DG = optim.Adam(DG.parameters(), lr=lr)
optimizer_DD = optim.Adam(DD.parameters(), lr=lr)

# Training loop
for epoch in range(epochs):
    for i, (imgs, _) in enumerate(dataloader):
        n = imgs.size(0)  # actual batch size (the last batch may be smaller)
        # Train the discriminator
        real_imgs = imgs.view(-1, img_channels, img_size, img_size)
        z = torch.randn(n, z_dim, 1, 1)
        fake_imgs = DG(z)
        DD_real = DD(real_imgs)
        DD_fake = DD(fake_imgs.detach())  # detach so DD's update does not backprop into DG
        DD_loss = -torch.mean(torch.log(DD_real) + torch.log(1 - DD_fake))
        optimizer_DD.zero_grad()
        DD_loss.backward()
        optimizer_DD.step()
        # Train the generator
        z = torch.randn(n, z_dim, 1, 1)
        fake_imgs = DG(z)
        DD_fake = DD(fake_imgs)
        DG_loss = -torch.mean(torch.log(DD_fake))
        optimizer_DG.zero_grad()
        DG_loss.backward()
        optimizer_DG.step()
        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(dataloader)}], '
                  f'DD_loss: {DD_loss.item():.4f}, DG_loss: {DG_loss.item():.4f}')
```
In the code above, DCGenerator uses a stack of ConvTranspose2d layers to progressively increase the spatial size of the image, while DCDiscriminator uses a stack of Conv2d layers to progressively reduce it. Most convolutional layers are followed by batch normalization (BatchNorm) and an activation function (ReLU or LeakyReLU); these are key ingredients of DCGAN and help stabilize training.
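To make the resolutions concrete, the following sketch traces a noise tensor through transposed convolutions with the same hyperparameters as DCGenerator: the first layer (stride 1, no padding) maps 1x1 to 4x4, and each subsequent ConvTranspose2d(..., 4, 2, 1) doubles the spatial size, ending at 32x32 (which is why the training data should be resized to the generator's output resolution):

```python
import torch
import torch.nn as nn

# Output size of ConvTranspose2d: (in - 1) * stride - 2 * padding + kernel
layers = [
    nn.ConvTranspose2d(100, 256, 4, 1, 0, bias=False),  # 1x1   -> 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # 4x4   -> 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # 8x8   -> 16x16
    nn.ConvTranspose2d(64, 1, 4, 2, 1, bias=False),     # 16x16 -> 32x32
]

x = torch.randn(1, 100, 1, 1)  # one noise vector, reshaped to a 1x1 "image"
for layer in layers:
    x = layer(x)
    print(tuple(x.shape))
# (1, 256, 4, 4)
# (1, 128, 8, 8)
# (1, 64, 16, 16)
# (1, 1, 32, 32)
```

The discriminator's Conv2d(..., 4, 2, 1) layers run the same progression in reverse, halving 32x32 down to 4x4 before the final 4x4 convolution collapses it to a single score.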
IV. Summary
This article introduced the mathematical principles of GAN, CGAN, and DCGAN and provided complete code implementations in the PyTorch framework. Working through this code gives readers an inside view of how adversarial generative models are trained and of the principles and techniques behind image generation. In practice, these models can be applied to image synthesis, style transfer, data augmentation, and many other areas. Note that GAN training can be unstable, so in real use you may need to tune the hyperparameters and the model architecture to get the best results.