【Deep Learning in Practice 9】: Sitting Posture Detection with MediaPipe

✨Blog homepage: 王乐予🎈
✨A motto for the young: Living for the moment! 💪

😺1. MediaPipe Overview

MediaPipe is an open-source framework, developed by Google Research, for building applications on top of multimedia machine learning models.

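The solutions MediaPipe currently supports, and the platforms each one runs on, are shown in the figure below:
[Figure: MediaPipe solutions and supported platforms]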

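😺2. MediaPipe Pose Landmark Detection

🐶2.1 Overview

MediaPipe Pose Landmarker detects landmarks of the human body in an image or video. You can use this task to identify key body locations, analyze posture, and classify movements. The task outputs body pose landmarks in image coordinates and in 3D world coordinates.

Pose Landmarker uses a series of models to predict pose landmarks. The first model detects whether a human body is present in the image frame, and the second model locates the landmarks on that body.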

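The pose landmark model tracks 33 body landmark locations, which represent the approximate position of the following body parts:
[Figure: the 33 body landmarks]
The landmark indices are: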

0 - nose
1 - left eye (inner)
2 - left eye
3 - left eye (outer)
4 - right eye (inner)
5 - right eye
6 - right eye (outer)
7 - left ear
8 - right ear
9 - mouth (left)
10 - mouth (right)
11 - left shoulder
12 - right shoulder
13 - left elbow
14 - right elbow
15 - left wrist
16 - right wrist
17 - left pinky
18 - right pinky
19 - left index
20 - right index
21 - left thumb
22 - right thumb
23 - left hip
24 - right hip
25 - left knee
26 - right knee
27 - left ankle
28 - right ankle
29 - left heel
30 - right heel
31 - left foot index
32 - right foot index
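
Only a handful of these indices are needed for the upper-body checks later in this post. As a rough sketch, the subset can be written down as an `IntEnum` so that code refers to landmarks by name rather than by number. The class name `UpperBodyLandmark` and the stand-in list are hypothetical; the real code below uses MediaPipe's own `mp_pose.PoseLandmark` instead.

```python
from enum import IntEnum


class UpperBodyLandmark(IntEnum):
    """Subset of the 33 indices listed above (hypothetical name, for illustration)."""
    NOSE = 0
    LEFT_EYE_INNER = 1
    RIGHT_EYE_INNER = 4
    LEFT_EAR = 7
    RIGHT_EAR = 8
    MOUTH_LEFT = 9
    MOUTH_RIGHT = 10
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12


# A landmark list is indexed by position, so an IntEnum member works as an index.
landmarks = [f"point {i}" for i in range(33)]  # stand-in for real detections
print(landmarks[UpperBodyLandmark.LEFT_SHOULDER])  # point 11
```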

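🐶2.2 Metric Function

Sitting posture detection is based on the angle between vectors defined by different keypoints. The interior angle between two vectors is illustrated below:
[Figure: interior angle between two vectors]
The interior angle depends on the direction of the vectors. In the figure above, suppose kpt1 and kpt2 are two body keypoints, and kpt3 is any point lying vertically in line with kpt1, the starting point of the vector. The angle is then:
$\theta = \arccos\left(\frac{\overrightarrow{P_{12}} \cdot \overrightarrow{P_{13}}}{\left|\overrightarrow{P_{12}}\right| \left|\overrightarrow{P_{13}}\right|}\right)$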

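Without loss of generality, set the y coordinate of kpt3 to $y_{3} = 0$, so that kpt3 is $(x_{1}, 0)$. Substituting the coordinates gives:
$\theta = \arccos\left(\frac{y_{1}^{2} - y_{1} y_{2}}{y_{1}\sqrt{(x_{2}-x_{1})^{2}+(y_{2}-y_{1})^{2}}}\right)$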
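
Spelling out the substitution may help. This is a sketch under two assumptions: kpt3 is $(x_{1}, 0)$, and $y_{1} > 0$ (kpt1 is not on the top edge of the image).

$\overrightarrow{P_{12}} = (x_{2}-x_{1},\ y_{2}-y_{1}), \quad \overrightarrow{P_{13}} = (0,\ -y_{1})$

$\overrightarrow{P_{12}} \cdot \overrightarrow{P_{13}} = -y_{1}(y_{2}-y_{1}) = y_{1}^{2} - y_{1} y_{2}, \quad \left|\overrightarrow{P_{13}}\right| = y_{1}$

$\cos\theta = \frac{y_{1}-y_{2}}{\sqrt{(x_{2}-x_{1})^{2}+(y_{2}-y_{1})^{2}}}$

Under these assumptions the factor $y_{1}$ cancels, so $\theta$ depends only on the direction from kpt1 to kpt2. Because the image y axis grows downward, $\theta$ is acute exactly when kpt2 lies above kpt1 ($y_{2} < y_{1}$).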

As the figure above shows, $\theta$ is acute. If the vector instead points from kpt2 to kpt1, kpt3 must be marked vertically in line with kpt2, and $\theta$ is then obtuse.
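
A small numeric check can make the acute/obtuse distinction concrete. The helper name `vertical_angle_deg` and the sample coordinates are hypothetical, and the sketch assumes image coordinates with $y_{1} > 0$ and two distinct points.

```python
import math


def vertical_angle_deg(x1, y1, x2, y2):
    """Angle in degrees between kpt1->kpt2 and the vertical kpt1->(x1, 0).

    Hypothetical helper for illustration; assumes image coordinates with
    y1 > 0 and kpt1 != kpt2.
    """
    dot = (y2 - y1) * (-y1)                    # P12 . P13
    norms = math.hypot(x2 - x1, y2 - y1) * y1  # |P12| * |P13|
    return math.degrees(math.acos(dot / norms))


# kpt2 lies up and to the right of kpt1 (image y grows downward)
forward = vertical_angle_deg(100, 200, 200, 100)
# Same two points, vector reversed: kpt2 -> kpt1
reverse = vertical_angle_deg(200, 100, 100, 200)

print(round(forward, 1), round(reverse, 1))  # 45.0 135.0
```

Reversing the vector turns the 45° angle into its supplement, 135°, which is why the order of the two keypoints matters when choosing thresholds.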

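😺3. Code Implementation

utils.py: defines the metric function and the posture detection function
main.py: entry point; extracts the required keypoint data and draws the result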

🐶3.1 utils.py

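import math as m


# Metric function: angle in degrees between the vector kpt1 -> kpt2 and the
# vertical vector from kpt1 to (x1, 0). Image coordinates, y grows downward.
# Raises ZeroDivisionError if y1 == 0 or if the two points coincide.
def findAngle(x1, y1, x2, y2):
    theta = m.acos((y2 - y1) * (-y1) / (m.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2) * y1))
    degree = m.degrees(theta)  # radians -> degrees
    return degree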

"""
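Head tilt: angle between the left ear (point 7) and the right ear (point 8)
Head down: angle between the left mouth corner (point 9) and the left shoulder (point 11)
Face turned sideways: compare the x coordinate of the right inner eye (point 4) with that of the left ear (point 7), and the x coordinate of the left inner eye (point 1) with that of the right ear (point 8)
Uneven shoulders: angle between the left shoulder (point 11) and the right shoulder (point 12)            *****some people naturally carry one shoulder higher than the other*****
Propped on desk: the y coordinate of the left mouth corner (point 9) or the right mouth corner (point 10) is greater than the y coordinate of the left shoulder (point 11) or the right shoulder (point 12)
Head tilted back: angle between the nose (point 0) and the left ear (point 7)
Lying on desk: the normalized y coordinates of the left shoulder (point 11) and the right shoulder (point 12) sum to more than 1.5, that is, their mean exceeds 0.75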
"""
def all_detection(nose_x, nose_y,                               # nose (point 0): x and y coordinates
                  left_eye_inner_x, left_eye_inner_y,           # left inner eye (point 1): x and y coordinates
                  right_eye_inner_x, right_eye_inner_y,         # right inner eye (point 4): x and y coordinates
                  left_ear_x, left_ear_y,                       # left ear (point 7): x and y coordinates
                  right_ear_x, right_ear_y,                     # right ear (point 8): x and y coordinates
                  left_mouth_x, left_mouth_y,                   # left mouth corner (point 9): x and y coordinates
                  right_mouth_x, right_mouth_y,                 # right mouth corner (point 10): x and y coordinates
                  left_shoulder_x, left_shoulder_y,             # left shoulder (point 11): x and y coordinates
                  right_shoulder_x, right_shoulder_y,           # right shoulder (point 12): x and y coordinates
                  left_shoulder_x_norm, left_shoulder_y_norm,   # left shoulder (point 11): normalized x and y coordinates
                  right_shoulder_x_norm, right_shoulder_y_norm  # right shoulder (point 12): normalized x and y coordinates
                  ):
    waitou_inclination = findAngle(left_ear_x, left_ear_y, right_ear_x, right_ear_y)                         # head tilt
    ditou_inclination = findAngle(left_mouth_x, left_mouth_y, left_shoulder_x, left_shoulder_y)              # head down
    gaodijian_inclination = findAngle(left_shoulder_x, left_shoulder_y, right_shoulder_x, right_shoulder_y)  # uneven shoulders
    yangtou_inclination = findAngle(nose_x, nose_y, left_ear_x, left_ear_y)                                  # head tilted back
    # The checks run in priority order; the first one that matches wins.
    if waitou_inclination < 80:
        tmp = 'head tilted left'
    elif waitou_inclination > 100:
        tmp = 'head tilted right'
    elif (left_shoulder_y_norm + right_shoulder_y_norm) > 1.5:
        tmp = 'lying on desk'
    elif ditou_inclination < 115:
        tmp = 'head down'
    elif left_ear_x < right_eye_inner_x:
        tmp = 'face turned left'
    elif right_ear_x > left_eye_inner_x:
        tmp = 'face turned right'
    elif gaodijian_inclination > 100 or gaodijian_inclination < 80:
        tmp = 'uneven shoulders'
    # Either mouth corner below either shoulder (image y grows downward)
    elif max(left_mouth_y, right_mouth_y) > min(left_shoulder_y, right_shoulder_y):
        tmp = 'propped on desk'
    elif yangtou_inclination > 90:
        tmp = 'head tilted back'
    else:
        tmp = 'facing forward'
    return tmp
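
For the desk-propping rule, it is worth noting how Python evaluates `or` between numbers: `a or b` returns `a` whenever `a` is nonzero, so it does not mean "either value". The sketch below contrasts a comparison written with `or` against one written with `max`/`min`. The function names and sample coordinates are hypothetical and only illustrate the difference.

```python
def or_comparison(left_mouth_y, right_mouth_y, left_shoulder_y, right_shoulder_y):
    # `a or b` evaluates to a when a is truthy (any nonzero number), so this
    # effectively compares only the two left-side values.
    return (left_mouth_y or right_mouth_y) > (left_shoulder_y or right_shoulder_y)


def any_mouth_below_any_shoulder(left_mouth_y, right_mouth_y,
                                 left_shoulder_y, right_shoulder_y):
    # Image y grows downward: the lowest mouth corner has the largest y,
    # the highest shoulder has the smallest y.
    return max(left_mouth_y, right_mouth_y) > min(left_shoulder_y, right_shoulder_y)


# Right mouth corner (y=310) is below the right shoulder (y=300),
# but the left-side values alone do not show it.
sample = dict(left_mouth_y=250, right_mouth_y=310,
              left_shoulder_y=320, right_shoulder_y=300)
print(or_comparison(**sample))                 # False
print(any_mouth_below_any_shoulder(**sample))  # True
```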

🐶3.2 main.py

import cv2
import mediapipe as mp
from utils import all_detection


mp_pose = mp.solutions.pose
pose = mp_pose.Pose(model_complexity=1, min_detection_confidence=0.5, min_tracking_confidence=0.5)
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        # Camera unavailable or stream ended
        break
    h, w = frame.shape[:2]
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    keypoints = pose.process(image)

    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    lm = keypoints.pose_landmarks
    if lm is None:
        # No person detected: show the raw frame and skip the posture checks
        cv2.imshow("Image", image)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
        continue
    lmPose = mp_pose.PoseLandmark

    # Head tilt check
    left_ear_x = int(lm.landmark[lmPose.LEFT_EAR].x * w)    # left ear (point 7) x coordinate
    left_ear_y = int(lm.landmark[lmPose.LEFT_EAR].y * h)    # left ear (point 7) y coordinate
    right_ear_x = int(lm.landmark[lmPose.RIGHT_EAR].x * w)  # right ear (point 8) x coordinate
    right_ear_y = int(lm.landmark[lmPose.RIGHT_EAR].y * h)  # right ear (point 8) y coordinate

    # Head down check
    left_mouth_x = int(lm.landmark[lmPose.MOUTH_LEFT].x * w)    # left mouth corner (point 9) x coordinate
    left_mouth_y = int(lm.landmark[lmPose.MOUTH_LEFT].y * h)    # left mouth corner (point 9) y coordinate
    left_shoulder_x = int(lm.landmark[lmPose.LEFT_SHOULDER].x * w)    # left shoulder (point 11) x coordinate
    left_shoulder_y = int(lm.landmark[lmPose.LEFT_SHOULDER].y * h)    # left shoulder (point 11) y coordinate

    # Face turned sideways check
    left_eye_inner_x = int(lm.landmark[lmPose.LEFT_EYE_INNER].x * w)    # left inner eye (point 1) x coordinate
    left_eye_inner_y = int(lm.landmark[lmPose.LEFT_EYE_INNER].y * h)    # left inner eye (point 1) y coordinate
    right_eye_inner_x = int(lm.landmark[lmPose.RIGHT_EYE_INNER].x * w)  # right inner eye (point 4) x coordinate
    right_eye_inner_y = int(lm.landmark[lmPose.RIGHT_EYE_INNER].y * h)  # right inner eye (point 4) y coordinate

    # Uneven shoulders check
    right_shoulder_x = int(lm.landmark[lmPose.RIGHT_SHOULDER].x * w)  # right shoulder (point 12) x coordinate
    right_shoulder_y = int(lm.landmark[lmPose.RIGHT_SHOULDER].y * h)  # right shoulder (point 12) y coordinate

    # Propped on desk check
    right_mouth_x = int(lm.landmark[lmPose.MOUTH_RIGHT].x * w)  # right mouth corner (point 10) x coordinate
    right_mouth_y = int(lm.landmark[lmPose.MOUTH_RIGHT].y * h)  # right mouth corner (point 10) y coordinate

    # Head tilted back check
    nose_x = int(lm.landmark[lmPose.NOSE].x * w)    # nose (point 0) x coordinate
    nose_y = int(lm.landmark[lmPose.NOSE].y * h)    # nose (point 0) y coordinate

    # Lying on desk check
    left_shoulder_x_norm = lm.landmark[lmPose.LEFT_SHOULDER].x  # left shoulder (point 11) x coordinate, normalized
    left_shoulder_y_norm = lm.landmark[lmPose.LEFT_SHOULDER].y  # left shoulder (point 11) y coordinate, normalized
    right_shoulder_x_norm = lm.landmark[lmPose.RIGHT_SHOULDER].x  # right shoulder (point 12) x coordinate, normalized
    right_shoulder_y_norm = lm.landmark[lmPose.RIGHT_SHOULDER].y  # right shoulder (point 12) y coordinate, normalized

    results = all_detection(nose_x, nose_y,
                  left_eye_inner_x, left_eye_inner_y,
                  right_eye_inner_x, right_eye_inner_y,
                  left_ear_x, left_ear_y,
                  right_ear_x, right_ear_y,
                  left_mouth_x, left_mouth_y,
                  right_mouth_x, right_mouth_y,
                  left_shoulder_x, left_shoulder_y,
                  right_shoulder_x, right_shoulder_y,
                  left_shoulder_x_norm, left_shoulder_y_norm,
                  right_shoulder_x_norm, right_shoulder_y_norm)
    print(results)


    mp_drawing.draw_landmarks(image, keypoints.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    cv2.imshow("Image", image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
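
The loop above repeats the same normalized-to-pixel conversion for every keypoint. One possible way to tidy this up is a small helper; `to_pixel` is a hypothetical name, and the `SimpleNamespace` object only stands in for a MediaPipe landmark, which exposes `.x` and `.y` in the same way.

```python
from types import SimpleNamespace


def to_pixel(landmark, width, height):
    """Convert a normalized landmark (x, y nominally in [0, 1]) to integer pixel coordinates."""
    return int(landmark.x * width), int(landmark.y * height)


# Stand-in for one entry of `lm.landmark`; real entries also expose .x and .y
fake_left_ear = SimpleNamespace(x=0.5, y=0.25)
left_ear_x, left_ear_y = to_pixel(fake_left_ear, 640, 480)
print(left_ear_x, left_ear_y)  # 320 120
```

In main.py this would be called as `to_pixel(lm.landmark[lmPose.LEFT_EAR], w, h)`, replacing two lines per keypoint with one.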