mediapipe 谷歌高效ML框架-图像识别、人脸检测、关键点检测

news2025/1/23 22:38:37

参考:
https://github.com/google/mediapipe
https://developers.google.com/mediapipe/solutions/guide

框架也支持cv、nlp、audio等项目,速度很快:
在这里插入图片描述

1、图形识别

参考:https://developers.google.com/mediapipe/solutions/vision/object_detector/python
https://github.com/google/mediapipe/blob/master/docs/solutions/face_mesh.md

模型下载:https://developers.google.com/mediapipe/solutions/vision/object_detector
在这里插入图片描述
代码:

import cv2
import numpy as np

IMAGE_FILE="cat_dog.png"



MARGIN = 10  # pixels
ROW_SIZE = 10  # pixels
FONT_SIZE = 1
FONT_THICKNESS = 1
TEXT_COLOR = (255, 0, 0)  # red


def visualize(
    image,
    detection_result
) -> np.ndarray:
  """Draws bounding boxes on the input image and return it.
  Args:
    image: The input RGB image.
    detection_result: The list of all "Detection" entities to be visualize.
  Returns:
    Image with bounding boxes.
  """
  for detection in detection_result.detections:
    # Draw bounding_box
    bbox = detection.bounding_box
    start_point = bbox.origin_x, bbox.origin_y
    end_point = bbox.origin_x + bbox.width, bbox.origin_y + bbox.height
    cv2.rectangle(image, start_point, end_point, TEXT_COLOR, 3)

    # Draw label and score
    category = detection.categories[0]
    category_name = category.category_name
    probability = round(category.score, 2)
    result_text = category_name + ' (' + str(probability) + ')'
    text_location = (MARGIN + bbox.origin_x,
                     MARGIN + ROW_SIZE + bbox.origin_y)
    cv2.putText(image, result_text, text_location, cv2.FONT_HERSHEY_PLAIN,
                FONT_SIZE, TEXT_COLOR, FONT_THICKNESS)

  return image

# STEP 1: Import the necessary modules.
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# STEP 2: Create an ObjectDetector object.
base_options = python.BaseOptions(model_asset_path='efficientdet_lite0.tflite')
options = vision.ObjectDetectorOptions(base_options=base_options,
                                       score_threshold=0.5)
detector = vision.ObjectDetector.create_from_options(options)

# STEP 3: Load the input image.
image = mp.Image.create_from_file(IMAGE_FILE)

# STEP 4: Detect objects in the input image.
detection_result = detector.detect(image)

# STEP 5: Process the detection result. In this case, visualize it.
image_copy = np.copy(image.numpy_view())
annotated_image = visualize(image_copy, detection_result)
rgb_annotated_image = cv2.cvtColor(annotated_image, cv2.COLOR_BGR2RGB)
# cv2_imshow(rgb_annotated_image)


cv2.imshow('my_window',rgb_annotated_image)
cv2.waitKey(0)

在这里插入图片描述

2、人脸检测

只输出检测坐标分类信息,没有向量等信息不可以用于后续人脸库检索,可能需要额外方法提取人脸向量特征

用高阶solutions接口,模型在安装mediapipe时就自动下载到如下modules目录了,solutions现在python支持的方法可以参考:

https://github.com/google/mediapipe/blob/master/docs/solutions/solutions.md
在这里插入图片描述
在这里插入图片描述

实时人脸 OpenCV摄像头:

import cv2
import time
import mediapipe as mp

class FaceDetector():
    def __init__(self, confidence=0.5, model=0) -> None:
        self.confidence = confidence
        self.model = model

        self.mp_draws = mp.solutions.drawing_utils
        self.mp_faces = mp.solutions.face_detection
        self.faces = self.mp_faces.FaceDetection(min_detection_confidence=confidence, model_selection=model)

    def face_detection(self, image, draw=True, position=False):
        img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = self.faces.process(image)
        lst_box = list()

        if results.detections:
            if draw:
                for id, detection in enumerate(results.detections):
                    h, w, c = image.shape

                    r_bbox = detection.location_data.relative_bounding_box
                    print("-"*20)
                    bbox = int(r_bbox.xmin * w), int(r_bbox.ymin * h), \
                            int(r_bbox.width * w), int(r_bbox.height * h)
                    score = detection.score

                    print(bbox)
                    lst_box.append([id, bbox, score])
                    self.draw_box_detection(image, bbox, score)
                    # self.mp_draws.draw_detection(image, detection)
        return lst_box

    def draw_box_detection(self, image, bbox, score):
        xmin, ymin = bbox[0], bbox[1]
        h, w, c = image.shape
        l = 30

        cv2.rectangle(image, bbox, color=(255, 0, 255),  thickness=1)
        cv2.line(image, (xmin, ymin), (xmin+l, ymin), (255, 0, 255), thickness=5)
        cv2.line(image, (xmin, ymin), (xmin, ymin+l), (255, 0, 255), thickness=5)
        cv2.putText(image, f"{str(int(score[0] * 100))}%", (xmin, ymin - 10), 
                    cv2.FONT_HERSHEY_PLAIN, fontScale=1.3, 
                    color=(0, 255,0), thickness=1)


def main():
    capture = cv2.VideoCapture(0)
    face_detector = FaceDetector()
    prev_time = 0
    while True:
        sucess, frame = capture.read()
        lst_position = face_detector.face_detection(frame)
        if len(lst_position) != 0:
            print(lst_position[0])

        # calculate fps
        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        # put fps of video in display
        cv2.putText(frame,  f"{str(int(fps))}", (19, 50),
                    cv2.FONT_HERSHEY_PLAIN, 1.5, 
                    (0, 255, 255), thickness=2)

        # display video window
        cv2.imshow("Video Display", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

实时人脸mesh(参数设置支持检测人脸数量max_num_faces
Maximum number of faces to detect. Default to 1. ):
with mp_face_mesh.FaceMesh(
max_num_faces=3,
refine_landmarks=True,
min_detection_confidence=0.5,
min_tracking_confidence=0.5) as face_mesh:

import cv2
import time
import mediapipe as mp

class FaceMesh():
    def __init__(self, mode=False, max_face=1, 
                 refine_landmarks=False, 
                 detect_confidence=0.5, track_confidence=0.5) -> None:
        self.mode = mode
        self.max_face = max_face
        self.refine_landmarks = refine_landmarks
        self.detect_confidence = detect_confidence
        self.track_confidence = track_confidence

        self.mp_draws = mp.solutions.drawing_utils
        self.mp_face_mesh = mp.solutions.face_mesh
        self.face_mesh = self.mp_face_mesh.FaceMesh(static_image_mode=self.mode,
                                                max_num_faces=self.max_face,
                                                refine_landmarks=self.refine_landmarks,
                                                min_detection_confidence=self.detect_confidence,
                                                min_tracking_confidence=self.track_confidence)

    def draw_mesh(self, image, thickness=1, circle_radius=1, color=(0,255, 0)):
        draw_spec = self.mp_draws.DrawingSpec(thickness=thickness, circle_radius=circle_radius, color=color)
        img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = self.face_mesh.process(img_rgb)
        lst_mark = list()

        if results.multi_face_landmarks:
            h, w, c = image.shape
            for face_id, landmarks in enumerate(results.multi_face_landmarks):
                self.mp_draws.draw_landmarks(image, landmarks, 
                                             self.mp_face_mesh.FACEMESH_FACE_OVAL, draw_spec)
                for id,mark in enumerate(landmarks.landmark):
                    cx, cy = mark.x, mark.y
                    lst_mark.append([face_id, id, cx, cy])

        return lst_mark


def main():
    capture = cv2.VideoCapture(0)
    face_mesh = FaceMesh()
    prev_time = 0
    while True:
        sucess, frame = capture.read()
        lst_position = face_mesh.draw_mesh(frame)
        if len(lst_position) != 0:
            print(lst_position[0])

        # calculate fps
        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        # put fps of video in display
        cv2.putText(frame,  f"{str(int(fps))}", (19, 50), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 255, 255), thickness=2)

        # display video window
        cv2.imshow("Video Display", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    capture.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()



import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_face_mesh = mp.solutions.face_mesh

# For static images:
IMAGE_FILES = []
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
with mp_face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5) as face_mesh:
  for idx, file in enumerate(IMAGE_FILES):
    image = cv2.imread(file)
    # Convert the BGR image to RGB before processing.
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Print and draw face mesh landmarks on the image.
    if not results.multi_face_landmarks:
      continue
    annotated_image = image.copy()
    for face_landmarks in results.multi_face_landmarks:
      print('face_landmarks:', face_landmarks)
      mp_drawing.draw_landmarks(
          image=annotated_image,
          landmark_list=face_landmarks,
          connections=mp_face_mesh.FACEMESH_TESSELATION,
          landmark_drawing_spec=None,
          connection_drawing_spec=mp_drawing_styles
          .get_default_face_mesh_tesselation_style())
      mp_drawing.draw_landmarks(
          image=annotated_image,
          landmark_list=face_landmarks,
          connections=mp_face_mesh.FACEMESH_CONTOURS,
          landmark_drawing_spec=None,
          connection_drawing_spec=mp_drawing_styles
          .get_default_face_mesh_contours_style())
      mp_drawing.draw_landmarks(
          image=annotated_image,
          landmark_list=face_landmarks,
          connections=mp_face_mesh.FACEMESH_IRISES,
          landmark_drawing_spec=None,
          connection_drawing_spec=mp_drawing_styles
          .get_default_face_mesh_iris_connections_style())
    cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)

# For webcam input:
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
cap = cv2.VideoCapture(0)
with mp_face_mesh.FaceMesh(
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as face_mesh:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = face_mesh.process(image)

    # Draw the face mesh annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_face_landmarks:
      for face_landmarks in results.multi_face_landmarks:
        mp_drawing.draw_landmarks(
            image=image,
            landmark_list=face_landmarks,
            connections=mp_face_mesh.FACEMESH_TESSELATION,
            landmark_drawing_spec=None,
            connection_drawing_spec=mp_drawing_styles
            .get_default_face_mesh_tesselation_style())
        mp_drawing.draw_landmarks(
            image=image,
            landmark_list=face_landmarks,
            connections=mp_face_mesh.FACEMESH_CONTOURS,
            landmark_drawing_spec=None,
            connection_drawing_spec=mp_drawing_styles
            .get_default_face_mesh_contours_style())
        mp_drawing.draw_landmarks(
            image=image,
            landmark_list=face_landmarks,
            connections=mp_face_mesh.FACEMESH_IRISES,
            landmark_drawing_spec=None,
            connection_drawing_spec=mp_drawing_styles
            .get_default_face_mesh_iris_connections_style())
    # Flip the image horizontally for a selfie-view display.
    cv2.imshow('MediaPipe Face Mesh', cv2.flip(image, 1))
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()

在这里插入图片描述

3、关键点检测

参考:https://www.hackersrealm.net/post/realtime-human-pose-estimation-using-python
https://github.com/realsanjeev/Object-Detection-using-OpenCV
https://github.com/google/mediapipe/blob/master/docs/solutions/pose.md

import cv2
import mediapipe as mp
import time

class PoseDetector():
    def __init__(self, mode=False, complexity=1, smooth_landmarks=True,  
                 enable_segmentation=False, smooth_segmentation=True, 
                 detection_confidence=0.5, tracking_confidence=0.5) -> None:
        self.mode = mode
        self.complexity = complexity
        self.smooth_landmarks = smooth_landmarks
        self.enable_segmentation = enable_segmentation
        self.smooth_segmentations = smooth_segmentation
        self.detection_confidence = detection_confidence
        self.tracking_confidence = tracking_confidence

        self.mp_pose = mp.solutions.pose
        self.mp_draw = mp.solutions.drawing_utils
        self.poses = self.mp_pose.Pose(static_image_mode=self.mode,
                                  model_complexity=self.complexity, 
                                  smooth_landmarks=self.smooth_landmarks, 
                                  enable_segmentation=self.enable_segmentation, 
                                  smooth_segmentation=self.smooth_segmentations, 
                                  min_detection_confidence=self.detection_confidence, 
                                  min_tracking_confidence=self.tracking_confidence
                                  )
        
        
    def findPose(self, image, draw=True, postion_mark=False):
        img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = self.poses.process(img_rgb)
        lst_mark_postion = list()
        if results.pose_landmarks:
            if draw:
                self.mp_draw.draw_landmarks(image, results.pose_landmarks, 
                                            self.mp_pose.POSE_CONNECTIONS)
        
        if postion_mark:
            for id, mark in enumerate(results.pose_landmarks.landmark):
                h, w, c = image.shape
                cx, cy = int(mark.x * w), int(mark.y * h)
                lst_mark_postion.append([id, cx, cy])
        return lst_mark_postion



pose_detector = PoseDetector()
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # read frame
    _, frame = cap.read()
    try:
         # resize the frame for portrait video
        #  frame = cv2.resize(frame, (350, 600))
         # convert to RGB
         frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
         
         # process the frame for pose detection
         pose_results = pose_detector.poses.process(frame_rgb)
         # print(pose_results.pose_landmarks)
         
         # draw skeleton on the frame
         pose_detector.mp_draw.draw_landmarks(frame, pose_results.pose_landmarks, pose_detector.mp_pose.POSE_CONNECTIONS)
         # display the frame
         cv2.imshow('Output', frame)
    except:
        break
    
    if cv2.waitKey(1) == ord('q'):
        break
          
cap.release()
cv2.destroyAllWindows()

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/673244.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

【从零开始学习JAVA | 第九篇】字符串综合练习

前言: 在前一篇我们学习了String类以及两个接口函数,今天我们将利用昨天的知识以及讲解新的方法进行几个实战操作,以此来巩固我们的所学内容。 1.实现用户登录,对用户输入的密码进行验证 需求:已知正确的用户名和密码…

31 linux 中 用户栈帧 -> 内核栈帧

前言 比如 我们之前调试的 glibc 相关的库函数 glibc 相关是属于用户程序, 调用 操作系统的系统调用的时候, 会是 怎么样的一个情况呢? 系统调用 会有对应的系统栈帧来处理 系统调用的相关函数调用的堆栈支持 测试用例 我们这里主要是以 printf 中会分配缓冲区调用 ma…

NVIDIA Thrust教程

NVIDIA Thrust教程 Thrust 的 API 参考指南,CUDA C 模板库。 1.简介 Thrust 是基于标准模板库 (STL) 的 CUDA 的 C 模板库。 Thrust 允许您通过与 CUDA C 完全互操作的高级接口,以最少的编程工作实现高性能并行应用程序。 Thrust 提供了丰富的数据并…

windows自带的linux系统,从C盘迁移到D盘

1. 查看当前wsl版本和 运行状态 wsl -l -v wsl --list, -l 用于列出分发 本人电脑装的是Ubuntu-18.04&#xff0c;正在运行&#xff0c;版本1 2. 在D盘建linux目录&#xff0c;打包Ubuntu-18.04&#xff0c;导入到D盘的linux目录 wsl --export <DistributionName> &l…

9个最实用的PS插件盘点!

因为个人原因&#xff0c;对PS的插件用了不下 100 款&#xff0c;其中有好有坏&#xff0c;有优有劣&#xff0c;大浪淘沙&#xff0c;优胜劣汰&#xff0c;现在整理了自己觉得不错的 PS 插件。 1、Alien Skin Blow Up 3 for mac Blow Up 3 mac 版是 Macos 上一款 PS 图像无损放…

Apache Zeppelin系列教程第十篇——SQL Debug In Zeppelin

SQL Debug介绍 首先介绍下什么是SQL Debug&#xff1f; 但是经常有这样一个需求&#xff0c;一大段sql 跑出来之后&#xff0c;发现不是自己想要的结果&#xff1f;比如&#xff1a; demo 1: select id,name from ( select id,name from table1 union all select id,name fr…

web漏洞之文件上传漏洞

文章目录 一、漏洞原因二、漏洞危害三、漏洞利用1.三个条件2.利用方式3.绕过方式a.绕过JS验证① BP绕过② F12绕过③ 菜刀上传实操 b.绕过MIME-Type验证c.绕过黑名单验证① 直接修改后缀名绕过② htaccess绕过(有拦截)③ 大小写绕过(有拦截)④ 空格绕过⑤ .号绕过⑥ 特…

技术改变生活,开发者必须掌握这些技能

技术改变生活&#xff0c;开发者必须掌握这些技能 一、前言二、背景三、开发者必须掌握这些技能1. 语言与编程2. 数据结构与算法3. 开发框架与工具4. 应用开发与测试5. 团队协作与沟通 一、前言 随着科技的不断进步和发展&#xff0c;我们的生活方式也在不断地变化。互联网、智…

Session覆盖测试-业务安全测试实操(19)

弱Token设计缺陷测试,Session覆盖测试 Session覆盖测试 测试原理和方法 找回密码逻辑漏洞测试中也会遇到参数不可控的情况,比如要修改的用户名或者绑定的手机号无法在提交参数时修改,服务端通过读取当前session会话来判断要修改密码的账号,这种情况下能否对Session中的内容做…

【架构】洋葱架构

文章目录 前言一、为什么要用洋葱架构&#xff1f;二、原则2.1、依赖性2.2、数据封装2.3、关注点的分离2.4、耦合性 三、洋葱架构层四、领域模型/实体五、领域服务六、应用服务七、基础设施服务八、可观察性服务九、测试策略十、微服务十一、模块化与打包十二、框架、客户端和驱…

基于Java个人博客网站设计实现(源码+lw+部署文档+讲解等)

博主介绍&#xff1a; ✌全网粉丝30W,csdn特邀作者、博客专家、CSDN新星计划导师、java领域优质创作者,博客之星、掘金/华为云/阿里云/InfoQ等平台优质作者、专注于Java技术领域和毕业项目实战 ✌ &#x1f345; 文末获取源码联系 &#x1f345; &#x1f447;&#x1f3fb; 精…

基于Java游戏攻略网站设计实现(源码+lw+部署文档+讲解等)

博主介绍&#xff1a; ✌全网粉丝30W,csdn特邀作者、博客专家、CSDN新星计划导师、java领域优质创作者,博客之星、掘金/华为云/阿里云/InfoQ等平台优质作者、专注于Java技术领域和毕业项目实战 ✌ &#x1f345; 文末获取源码联系 &#x1f345; &#x1f447;&#x1f3fb; 精…

【Unityc#专题篇】之c#进阶篇

&#x1f468;‍&#x1f4bb;个人主页&#xff1a;元宇宙-秩沅 &#x1f468;‍&#x1f4bb; hallo 欢迎 点赞&#x1f44d; 收藏⭐ 留言&#x1f4dd; 加关注✅! &#x1f468;‍&#x1f4bb; 本文由 秩沅 原创 &#x1f468;‍&#x1f4bb; 收录于专栏&#xff1a;uni…

【C#进阶】C# 索引器

序号系列文章13【C#进阶】C# 特性14【C#进阶】C# 反射15【C#进阶】C# 属性 文章目录 前言1、索引器的概念2、索引器的定义3、索引器的基本使用4、索引器的重载5、接口中的索引器6、属性和索引器之间的比较7、索引器的适用场景结语 前言 &#x1f342; Hello大家好啊&#xff0c…

基于Java会员管理系统设计实现(源码+lw+部署文档+讲解等)

博主介绍&#xff1a; ✌全网粉丝30W,csdn特邀作者、博客专家、CSDN新星计划导师、java领域优质创作者,博客之星、掘金/华为云/阿里云/InfoQ等平台优质作者、专注于Java技术领域和毕业项目实战 ✌ &#x1f345; 文末获取源码联系 &#x1f345; &#x1f447;&#x1f3fb; 精…

从零开始 Spring Boot 46:@Lookup

从零开始 Spring Boot 46&#xff1a;Lookup 图源&#xff1a;简书 (jianshu.com) 在前文中&#xff0c;我介绍了 Spring Bean 的作用域&#xff08;Scope&#xff09;&#xff0c;且讨论了将一个短生命周期的 bean &#xff08;比如request作用域的 bean&#xff09;注入到长…

事务小总结

事务定义 是一个数据库操作序列&#xff0c;这些操作要么全部执行,要么全部不执行&#xff0c;是一个不可分割的工作&#xff08;程序执行&#xff09;单元。事务由事务开始与事务结束之间执行的全部数据库操作组成。 事务特性 原子性(Atomicity)一致性(Consistency)隔离性(…

Linux下 文件删除但是空间未被释放 或者 磁盘已满但找不到对应的大文件 的解决方案

Linux下文件删除但是空间未被释放的解决方案 前言1. 查看当前磁盘占用情况2. 模拟进程占用3. 执行rm -rf 命令删除文件4. 查看被删除但是未释放空间的文件5. 执行清空文件操作 前言 linux磁盘空间已满&#xff0c;手动rm -rf 删除了大文件之后&#xff0c;df -h 查看一下发现空…

操作系统第四章练习题

第一部分 教材习题&#xff08;P152&#xff09; 1、为什么要配置层次式存储器&#xff1f; 设置多个存储器能够使存储器两头的硬件能并行工作;采用多级存储系统,专门是Cache 技术,是减轻存储器带宽对系统性能影响的最佳结构方案;在微处置机内部设置各类缓冲存储器,减轻对存储…

【Android -- 面试】Android 面试题集锦(Java 基础)

Java 基础 1、Java 的类加载过程 jvm 将 .class 类文件信息加载到内存并解析成对应的 class 对象的过程&#xff0c; 注意&#xff1a;jvm 并不是一开始就把所有的类加载进内存中&#xff0c;只是在第一次遇到某个需要运行的类才会加载&#xff0c;并且只加载一次 主要分为三…