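In today's digital world, CAPTCHAs are an important tool for protecting websites against automated attacks. For users, however, they can be a nuisance. One way to address this is to recognize CAPTCHAs automatically with deep learning, with the aim of improving the user experience. This article shows how to use a ResNet18 model to recognize CAPTCHAs generated by ImageCaptcha.
1. Environment Setup and Data Preparation
First, we check whether CUDA is available so that training can be accelerated on a GPU.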
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')
Next, we define a data generator, CaptchaDataset, which uses the ImageCaptcha class from the captcha library to generate CAPTCHA images.
import random
import string

import torch
from captcha.image import ImageCaptcha
from torch.utils.data import Dataset


class CaptchaDataset(Dataset):
    """Generates a random CAPTCHA image and its label on every access."""

    def __init__(self, length=1000, charset=None, captcha_length=5, transform=None):
        self.length = length  # nominal dataset size reported by __len__
        self.transform = transform
        self.charset = charset if charset is not None else string.ascii_letters + string.digits
        self.captcha_length = captcha_length
        self.num_classes = len(self.charset)
        self.image_generator = ImageCaptcha(width=160, height=60)

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # idx is ignored: a new random text is drawn on each call
        text = ''.join(random.choices(self.charset, k=self.captcha_length))
        image = self.image_generator.generate_image(text)  # PIL image
        if self.transform:
            image = self.transform(image)
        # Encode each character as its index in the charset
        label = [self.charset.index(c) for c in text]
        return image, torch.tensor(label, dtype=torch.long)
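To make the label encoding concrete, here is a minimal sketch of the character-to-index mapping and its inverse. The names CHARSET, CHAR_TO_INDEX, encode and decode are hypothetical helpers introduced for illustration; the dataset class itself uses charset.index, which gives the same indices.

```python
import string

# Same default character set as CaptchaDataset: a-z, A-Z, 0-9 (62 classes)
CHARSET = string.ascii_letters + string.digits
# Dictionary lookup avoids the linear scan that str.index performs
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARSET)}


def encode(text):
    """Turn a CAPTCHA string into a list of class indices."""
    return [CHAR_TO_INDEX[c] for c in text]


def decode(indices):
    """Turn a list of class indices back into a string."""
    return ''.join(CHARSET[i] for i in indices)


print(encode("aZ9"))          # [0, 51, 61]
print(decode([0, 51, 61]))    # aZ9
```

The same decode step is what turns the model's predicted indices back into readable text later on.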
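2. Data Augmentation and Preprocessing
Before the images reach the network, we apply a series of preprocessing steps: conversion to grayscale, resizing to 40×100, conversion to a tensor, and normalization. These steps are deterministic; the variety that helps the model generalize comes from the randomly generated CAPTCHAs themselves.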
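from torchvision import transforms

transform = transforms.Compose([
    transforms.Grayscale(),               # convert the image to grayscale (1 channel)
    transforms.Resize((40, 100)),         # target size as (height, width)
    transforms.ToTensor(),                # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,))  # map [0, 1] to [-1, 1]
])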
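As a worked example of the last step, Normalize((0.5,), (0.5,)) computes (pixel - mean) / std per channel, so values in [0, 1] end up in [-1, 1]. The sketch below reproduces that arithmetic in plain Python; normalize and denormalize are hypothetical helper names, and the inverse is only useful if you want to display images with their original brightness.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for a single value."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, back to the [0, 1] range."""
    return value * std + mean


print(normalize(0.0))     # -1.0 (black)
print(normalize(1.0))     # 1.0  (white)
print(denormalize(-1.0))  # 0.0
```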
3. Splitting and Loading the Dataset
We split the dataset into a training set and a validation set and use DataLoader to load them in batches.
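import string
from torch.utils.data import DataLoader, random_split

# Assumed values (not shown in the original); they match the CaptchaDataset defaults
charset = string.ascii_letters + string.digits
captcha_length = 5
dataset = CaptchaDataset(length=2000, charset=charset, captcha_length=captcha_length, transform=transform)
train_size = int(0.8 * len(dataset))  # 80% for training
val_size = len(dataset) - train_size  # remaining 20% for validation
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)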
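One thing to be aware of: CaptchaDataset draws a new random string on every access and ignores the index, so the two subsets produced by random_split are not fixed sets of images. If a repeatable validation set is wanted, one option is to derive the text from the index. The sketch below is an assumption, not part of the original code; text_for_index and its seed parameter are hypothetical, and the distortion applied when the image is rendered may still vary between calls.

```python
import random
import string


def text_for_index(idx, charset=string.ascii_letters + string.digits,
                   captcha_length=5, seed=0):
    """Return the same CAPTCHA text every time for a given (seed, idx) pair."""
    # A private generator seeded from the index leaves the global RNG untouched
    rng = random.Random(f"{seed}:{idx}")
    return ''.join(rng.choices(charset, k=captcha_length))


# The same index always yields the same text
print(text_for_index(3) == text_for_index(3))  # True
```

Inside __getitem__, this would replace the random.choices call, for example with a different seed for the validation split.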
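4. Model Definition and Transfer Learning
We start from a pretrained ResNet18 and fine-tune it for CAPTCHA recognition. The first convolution is replaced so that it accepts single-channel input, and the final fully connected layer is replaced so that it outputs one score per character class for each position in the CAPTCHA.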
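import torch.nn as nn
from torchvision import models


class CaptchaModel(nn.Module):
    def __init__(self, num_classes, captcha_length):
        super().__init__()
        self.num_classes = num_classes
        self.captcha_length = captcha_length
        self.resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Grayscale input: replace the 3-channel stem with a 1-channel one
        # (this layer is re-initialized, so its pretrained weights are not reused)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # One output per (position, character) pair
        num_ftrs = self.resnet.fc.in_features
        self.resnet.fc = nn.Linear(num_ftrs, num_classes * self.captcha_length)

    def forward(self, x):
        x = self.resnet(x)
        # (batch, captcha_length * num_classes) -> (batch, captcha_length, num_classes)
        return x.view(-1, self.captcha_length, self.num_classes)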
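Replacing conv1 with a freshly initialized layer, as above, means the pretrained first-layer filters are discarded. A common alternative, which is not what the model above does, is to sum the pretrained RGB filters into a single channel: for a grayscale image replicated across three channels, the result is then the same as the original convolution. The sketch below demonstrates this with a randomly initialized stand-in for conv1 so that nothing has to be downloaded; to_single_channel is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def to_single_channel(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a 1-channel copy of a 3-channel conv with filters summed over RGB."""
    new_conv = nn.Conv2d(1, conv.out_channels, kernel_size=conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    with torch.no_grad():
        # (out, 3, k, k) -> (out, 1, k, k)
        new_conv.weight.copy_(conv.weight.sum(dim=1, keepdim=True))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in for resnet.conv1 (random weights, same shape as in ResNet18)
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray_conv = to_single_channel(rgb_conv)

gray = torch.randn(2, 1, 40, 100)
# Feeding the gray image to gray_conv matches feeding its RGB replica to rgb_conv
same = torch.allclose(gray_conv(gray), rgb_conv(gray.repeat(1, 3, 1, 1)), atol=1e-4)
print(same)
```

In CaptchaModel, this would amount to passing the pretrained self.resnet.conv1 through such a helper instead of constructing a new nn.Conv2d. Whether it improves accuracy on this task is not something the article measures.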
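5. Training and Evaluation
We define a training function, train_model, which saves a model checkpoint at the end of every epoch. The loss is the sum of the per-position cross-entropy losses, and training uses automatic mixed precision when a GPU is available.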
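import os

# Assumed setup (not shown in the original): model, loss and optimizer
model = CaptchaModel(num_classes=len(charset), captcha_length=captcha_length).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# save_checkpoint, load_checkpoint and evaluate_accuracy are helper functions
# that must be defined before train_model is called.


def train_model(epochs, resume=False):
    start_epoch = 0
    # Optionally resume from the last saved checkpoint
    if resume and os.path.isfile("captcha_model_checkpoint.pth.tar"):
        checkpoint = load_checkpoint()
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        start_epoch = checkpoint['epoch']
    use_amp = device.type == "cuda"  # mixed precision only makes sense on a GPU
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    for epoch in range(start_epoch, epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(enabled=use_amp):
                outputs = model(images)
                # Sum the cross-entropy loss over all character positions
                loss = sum(criterion(outputs[:, i, :], labels[:, i]) for i in range(captcha_length))
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            running_loss += loss.item()
        val_accuracy = evaluate_accuracy(val_loader)
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {running_loss / len(train_loader):.4f}, Val Accuracy: {val_accuracy:.4f}')
        # Save a checkpoint at the end of every epoch
        save_checkpoint({
            'epoch': epoch + 1,
            'state_dict': model.state_dict(),
            'optimizer': optimizer.state_dict(),
        })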
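The training function calls save_checkpoint, load_checkpoint and evaluate_accuracy, whose definitions are not shown. The following is one possible sketch of them, not the author's implementation. Two assumptions are worth flagging: accuracy is counted per whole CAPTCHA (every character must match), and evaluate_accuracy takes the model and device as explicit arguments, whereas the call in train_model passes only the loader and would rely on the global model and device.

```python
import torch

CHECKPOINT_PATH = "captcha_model_checkpoint.pth.tar"  # file name checked by train_model


def save_checkpoint(state, filename=CHECKPOINT_PATH):
    """Write a dict holding the epoch, model state and optimizer state to disk."""
    torch.save(state, filename)


def load_checkpoint(filename=CHECKPOINT_PATH, map_location="cpu"):
    """Read a checkpoint dict back; tensors are placed on the CPU first."""
    return torch.load(filename, map_location=map_location)


def evaluate_accuracy(loader, model, device):
    """Fraction of CAPTCHAs for which every character is predicted correctly."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=2)  # (batch, captcha_length)
            # A sample counts only if all positions match
            correct += (predicted == labels).all(dim=1).sum().item()
            total += labels.size(0)
    return correct / total if total else 0.0
```

Whole-CAPTCHA accuracy is stricter than per-character accuracy: with five characters, a model that gets each character right 90% of the time independently would solve only about 59% of CAPTCHAs.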
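6. Visualizing Predictions
Finally, we define a function, visualize_predictions, that displays a batch of validation images together with their true and predicted text.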
import matplotlib.pyplot as plt


def visualize_predictions(num_samples=16):
    model.eval()
    samples, labels = next(iter(DataLoader(val_dataset, batch_size=num_samples, shuffle=True)))
    samples, labels = samples.to(device), labels.to(device)
    with torch.no_grad():
        outputs = model(samples)
        predicted = torch.argmax(outputs, dim=2)  # most likely character per position
    samples = samples.cpu()
    predicted = predicted.cpu()
    labels = labels.cpu()
    fig, axes = plt.subplots(4, 4, figsize=(10, 10))
    # The grid holds at most 16 images; unused cells are left blank
    for i, ax in enumerate(axes.flat):
        if i < len(samples):
            ax.imshow(samples[i].squeeze(), cmap='gray')
            true_text = ''.join(dataset.charset[l] for l in labels[i].tolist())
            pred_text = ''.join(dataset.charset[p] for p in predicted[i].tolist())
            ax.set_title(f'True: {true_text}\nPred: {pred_text}')
        ax.axis('off')
    plt.show()
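The grid in the function above is fixed at 4×4. If other sample counts are needed, the grid shape can be derived from the number of samples. This is a small sketch under that assumption; grid_shape is a hypothetical helper, not part of the original code.

```python
import math


def grid_shape(num_samples):
    """Return (rows, cols) of a near-square grid that fits num_samples images."""
    cols = math.ceil(math.sqrt(num_samples))
    rows = math.ceil(num_samples / cols)
    return rows, cols


print(grid_shape(16))  # (4, 4)
print(grid_shape(10))  # (3, 4)
```

When passing the result to plt.subplots, squeeze=False keeps axes two-dimensional even for a single row or column.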
7. Training and Visualization
To finish, we call train_model to train the model and then visualize_predictions to inspect its predictions.
train_model(epochs=180, resume=True)
visualize_predictions()
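With these steps, we have put together a pipeline that uses ResNet18 to recognize CAPTCHAs generated by ImageCaptcha: on-the-fly data generation, preprocessing, a fine-tuned pretrained network with one output per character position, and mixed-precision training with checkpoints. We hope this article is a useful reference for your own research and applications in CAPTCHA recognition.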