[Dataset Processing] How FFHQ does face alignment: aligned and cropped images at 1024×1024

news2025/4/20 9:35:45

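What is face alignment?

Face alignment is an image processing technique that moves the face in an image to a standard position and shape. In practice this usually means mapping key points such as the eyes, nose, and mouth to fixed positions. After alignment, all face images share a consistent orientation and size, which makes later processing and analysis easier.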
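As a minimal sketch of the idea (not the FFHQ procedure itself), the snippet below builds a rotation that makes the line between the two eyes horizontal. The function name `eye_level_rotation` and the sample coordinates are illustrative assumptions; a full alignment also normalizes scale and position.

```python
import numpy as np


def eye_level_rotation(eye_left, eye_right):
    """Return a 2x2 rotation matrix that makes the eye line horizontal.

    Hypothetical helper for illustration only.
    """
    dx, dy = np.asarray(eye_right, dtype=float) - np.asarray(eye_left, dtype=float)
    angle = np.arctan2(dy, dx)  # tilt of the eye line, in radians
    c, s = np.cos(-angle), np.sin(-angle)  # rotate by the opposite angle
    return np.array([[c, -s], [s, c]])


# Sample eye positions (x, y) in pixels; the right eye sits slightly lower.
eye_left = np.array([267.35, 310.13])
eye_right = np.array([381.74, 320.15])

R = eye_level_rotation(eye_left, eye_right)
center = (eye_left + eye_right) / 2

# Rotate both eyes about their midpoint.
left_aligned = R @ (eye_left - center) + center
right_aligned = R @ (eye_right - center) + center
```

After the rotation both eyes share the same y coordinate while the distance between them is unchanged.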

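What is face alignment used for?

Standardization: alignment gives every face image the same orientation, size, and proportions, which helps downstream tasks such as face recognition and expression recognition.
Stronger features: alignment makes facial features clearer and more stable, which improves the accuracy of feature extraction, matching, and classification.
Less noise and distortion: for face images from different sources or angles, alignment reduces the distortion and noise introduced by viewpoint, lighting, and occlusion.
More accurate face recognition: aligned face images provide more consistent and reliable features, which improves recognition accuracy and robustness.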

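Is face alignment always required?

Not every application needs face alignment. Whether you need it depends on the application and its requirements:

Task requirements: in tasks such as face recognition, expression analysis, and age estimation, alignment can significantly improve performance and accuracy.
Application scenario: some applications, such as social media apps or anything that does not analyze the face in detail, may not need alignment.
Performance requirements: if accuracy and consistency are key metrics, alignment is probably necessary. If you only display or visualize images, you can usually skip it.

In short, whether to align faces depends on the application and the goal. In some cases alignment gives a clear advantage; in others it is not needed.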

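The FFHQ dataset contains 70,000 images, of which about 13,000 show Asian faces. To further improve GAN results you may want to add your own data. How, then, do you align and crop the raw images so that they are usable?

The official description says: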
The dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity and image background. It also has good coverage of accessories such as eyeglasses, sunglasses, hats, etc. The images were crawled from Flickr, thus inheriting all the biases of that website, and automatically aligned and cropped using dlib. Only images under permissive licenses were collected. Various automatic filters were used to prune the set, and finally Amazon Mechanical Turk was used to remove the occasional statues, paintings, or photos of photos.

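So the approach is to detect facial landmarks (FFHQ used dlib for this) and then align and crop the images to 1024×1024. The code below needs only 5 keypoints: the two eyes, the nose, and the two mouth corners.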
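If your detector returns dlib's 68-point landmarks rather than 5 keypoints, they can be reduced to the flat `(x, y, score)` layout used by the code later in this post. The sketch below assumes the standard 68-point index convention (eyes at 36-41 and 42-47, nose tip at 30, mouth corners at 48 and 54); the function name `landmarks68_to_flat5` and the constant score are hypothetical.

```python
import numpy as np


def landmarks68_to_flat5(lm68, score=1.0):
    """Reduce 68 (x, y) landmarks to a flat array of 5 * (x, y, score).

    Hypothetical helper; dlib does not report per-point scores, so a
    constant placeholder score is written into every third slot.
    """
    lm68 = np.asarray(lm68, dtype=np.float32)
    if lm68.shape != (68, 2):
        raise ValueError(f"expected shape (68, 2), got {lm68.shape}")

    eye_left = lm68[36:42].mean(axis=0)   # eye on the left side of the image
    eye_right = lm68[42:48].mean(axis=0)  # eye on the right side of the image
    nose = lm68[30]                       # nose tip
    mouth_left = lm68[48]                 # left mouth corner
    mouth_right = lm68[54]                # right mouth corner

    points = np.stack([eye_left, eye_right, nose, mouth_left, mouth_right])
    scores = np.full((5, 1), score, dtype=np.float32)
    return np.hstack([points, scores]).ravel()


# Synthetic landmarks, just to show the output layout.
lm = np.zeros((68, 2), dtype=np.float32)
lm[36:42] = [10, 20]
lm[42:48] = [30, 20]
lm[30] = [20, 30]
lm[48] = [12, 40]
lm[54] = [28, 40]
flat = landmarks68_to_flat5(lm)
```

Averaging the six contour points of each eye gives an approximate eye center, which is all the alignment step needs.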

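Code

This repository does exactly that: https://github.com/chi0tzp/FFHQFaceAlignment/tree/master

For face detection you can use something lightweight and easy to install, such as InsightFace:

https://qq742971636.blog.csdn.net/article/details/134556830

I won't paste the face detection code here; the results are shown below.

The stylegan2 repository: https://github.com/NVlabs/stylegan2

stylegan2 uses 1024×1024 faces, which is very high resolution; 512×512 is usually already large enough.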

Original image:
[image]

After alignment:
[image]

Code:

import numpy as np
import cv2
import PIL.Image
from PIL import Image
import scipy.ndimage


# kpts: left eye, right eye, nose, left mouth corner, right mouth corner,
# each stored as (x, y, score), for example:
# array([[267.35327   , 310.13452   ,   0.90008646, 381.736     ,
#         320.14508   ,   0.89044243, 312.23892   , 394.6481    ,
#           0.9141436 , 263.4799    , 438.52295   ,   0.90634793,
#         362.49573   , 446.24716   ,   0.89808387]], dtype=float32)
# landmarks = kpts[0]

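def align_crop_image(image, landmarks, transform_size=256):
    """Align and crop a face in the FFHQ style.

    image: H x W x 3 uint8 array.
    landmarks: flat array of 15 values, (x, y, score) for each of 5 keypoints.
    Returns a transform_size x transform_size x 3 array.
    """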
    eye_left = landmarks[0:2]
    eye_right = landmarks[3:5]
    eye_avg = (eye_left + eye_right) * 0.5
    eye_to_eye = eye_right - eye_left
    mouth_left = landmarks[9:11]
    mouth_right = landmarks[12:14]
    mouth_avg = (mouth_left + mouth_right) * 0.5
    eye_to_mouth = mouth_avg - eye_avg

    # Choose oriented crop rectangle
    x = eye_to_eye - np.flipud(eye_to_mouth) * [-1, 1]
    x /= np.hypot(*x)
    x *= max(np.hypot(*eye_to_eye) * 2.0, np.hypot(*eye_to_mouth) * 1.8)
    y = np.flipud(x) * [-1, 1]
    c = eye_avg + eye_to_mouth * 0.1
    quad = np.stack([c - x - y, c - x + y, c + x + y, c + x - y])
    qsize = np.hypot(*x) * 2

    img = Image.fromarray(image)
    shrink = int(np.floor(qsize / transform_size * 0.5))
    if shrink > 1:
        rsize = (int(np.rint(float(img.size[0]) / shrink)), int(np.rint(float(img.size[1]) / shrink)))
        img = img.resize(rsize, Image.Resampling.LANCZOS)
        quad /= shrink
        qsize /= shrink

    # Crop
    border = max(int(np.rint(qsize * 0.1)), 3)
    crop = (int(np.floor(min(quad[:, 0]))), int(np.floor(min(quad[:, 1]))), int(np.ceil(max(quad[:, 0]))),
            int(np.ceil(max(quad[:, 1]))))
    crop = (max(crop[0] - border, 0), max(crop[1] - border, 0), min(crop[2] + border, img.size[0]),
            min(crop[3] + border, img.size[1]))
    if crop[2] - crop[0] < img.size[0] or crop[3] - crop[1] < img.size[1]:
        img = img.crop(crop)
        quad -= crop[0:2]

    # Pad
    pad = (int(np.floor(min(quad[:, 0]))), int(np.floor(min(quad[:, 1]))), int(np.ceil(max(quad[:, 0]))),
           int(np.ceil(max(quad[:, 1]))))
    pad = (max(-pad[0] + border, 0), max(-pad[1] + border, 0), max(pad[2] - img.size[0] + border, 0),
           max(pad[3] - img.size[1] + border, 0))
    enable_padding = True
    if enable_padding and max(pad) > border - 4:
        pad = np.maximum(pad, int(np.rint(qsize * 0.3)))
        img = np.pad(np.float32(img), ((pad[1], pad[3]), (pad[0], pad[2]), (0, 0)), 'reflect')
        h, w, _ = img.shape
        y, x, _ = np.ogrid[:h, :w, :1]
        # Feathering mask: 1 at the outer edge, falling towards 0 inside the image.
        # The small epsilon guards against division by zero.
        mask = np.maximum(1.0 - np.minimum(np.float32(x) / (pad[0] + 1e-12), np.float32(w - 1 - x) / (pad[2] + 1e-12)),
                          1.0 - np.minimum(np.float32(y) / (pad[1] + 1e-12), np.float32(h - 1 - y) / (pad[3] + 1e-12)))

        blur = qsize * 0.01
        img += (scipy.ndimage.gaussian_filter(img, [blur, blur, 0]) - img) * np.clip(mask * 3.0 + 1.0, 0.0, 1.0)
        img += (np.median(img, axis=(0, 1)) - img) * np.clip(mask, 0.0, 1.0)
        img = PIL.Image.fromarray(np.uint8(np.clip(np.rint(img), 0, 255)), 'RGB')

        quad += pad[:2]

    # Transform
    img = img.transform((transform_size, transform_size), Image.Transform.QUAD, (quad + 0.5).flatten(),
                        Image.Resampling.BILINEAR)

    return np.array(img)


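if __name__ == "__main__":
    src = "1.jpg"
    # cv2 loads images as BGR. The channel order passes through the function
    # unchanged, so writing the result with cv2.imwrite keeps the colors correct.
    img_src = cv2.imread(src, cv2.IMREAD_COLOR)
    if img_src is None:
        raise FileNotFoundError(f"Could not read image: {src}")
    # (x, y, score) for left eye, right eye, nose, left mouth corner, right mouth corner
    landmarks = np.asarray([[267.35327, 310.13452, 0.90008646, 381.736,
                             320.14508, 0.89044243, 312.23892, 394.6481,
                             0.9141436, 263.4799, 438.52295, 0.90634793,
                             362.49573, 446.24716, 0.89808387]])[0]
    img = align_crop_image(img_src, landmarks, transform_size=512)
    cv2.imwrite("1_out.jpg", img)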
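To see what the "oriented crop rectangle" step computes, the sketch below isolates that part of `align_crop_image` and checks the geometry on the sample landmarks from this post. The function name `oriented_quad` is a hypothetical wrapper introduced for illustration; the arithmetic inside it is copied from the code above.

```python
import numpy as np


def oriented_quad(eye_left, eye_right, mouth_left, mouth_right):
    """Return the 4 corners of the crop square and its side length."""
    eye_left = np.asarray(eye_left, dtype=float)
    eye_right = np.asarray(eye_right, dtype=float)
    mouth_left = np.asarray(mouth_left, dtype=float)
    mouth_right = np.asarray(mouth_right, dtype=float)

    eye_avg = (eye_left + eye_right) * 0.5
    eye_to_eye = eye_right - eye_left
    mouth_avg = (mouth_left + mouth_right) * 0.5
    eye_to_mouth = mouth_avg - eye_avg

    # x: half-side vector along the face's horizontal axis, combining the eye
    # line with the perpendicular of the eye-to-mouth line.
    x = eye_to_eye - np.flipud(eye_to_mouth) * [-1, 1]
    x /= np.hypot(*x)
    x *= max(np.hypot(*eye_to_eye) * 2.0, np.hypot(*eye_to_mouth) * 1.8)
    y = np.flipud(x) * [-1, 1]          # x rotated by 90 degrees
    c = eye_avg + eye_to_mouth * 0.1    # center, shifted slightly towards the mouth
    quad = np.stack([c - x - y, c - x + y, c + x + y, c + x - y])
    return quad, np.hypot(*x) * 2


quad, qsize = oriented_quad([267.35, 310.13], [381.74, 320.15],
                            [263.48, 438.52], [362.50, 446.25])

# Side lengths of the quadrilateral, corner to next corner.
sides = np.linalg.norm(np.roll(quad, -1, axis=0) - quad, axis=1)
# Dot product of two adjacent edges; zero means a right angle.
corner_dot = np.dot(quad[1] - quad[0], quad[3] - quad[0])
```

All four sides equal `qsize` and adjacent edges are perpendicular, so the crop region is a square rotated to follow the tilt of the face. The final `Image.Transform.QUAD` call then maps that square onto the upright output image.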