Human Pose Estimation with YoloV7 in Java

1 Preparing the OpenCV Environment

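This project uses OpenCV to read and process images, so we first need to set up OpenCV for use from Java.

First, download opencv-4.6 from the official OpenCV website. The installer only asks for a destination path; once you choose one, installation is complete.

Next, copy opencv_java460.dll from opencv\build\java\x64 into C:\Windows\System32, then place opencv-460.jar from D:\Tools\opencv\opencv\build\java (the install path used here) into a lib folder under the resources folder of our Spring Boot project.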

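The ONNX model file used in this article is provided as a separate download.

For object detection with YOLOv7 in Java, see the companion article on YoloV7 object detection in Java with OnnxRuntime and OpenCV.
The project code is available on the project home page.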

2 Maven Configuration

Only two dependencies are needed: onnxruntime and opencv. Note that the systemPath of the opencv dependency must match the location of opencv-460.jar described above.

<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime</artifactId>
    <version>1.12.1</version>
</dependency>

<dependency>
    <groupId>org.opencv</groupId>
    <artifactId>opencv</artifactId>
    <version>4.6.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/resources/lib/opencv-460.jar</systemPath>
</dependency>

3 Config

3.1 PEPlotConfig.java

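The constants used for drawing are configured here: the color palette, the skeleton connections, and the colors of limbs and keypoints.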

package cn.halashuo.config;

import org.opencv.core.Scalar;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class PEPlotConfig {

    public static final List<Scalar> palette= new ArrayList<>(Arrays.asList(
            new Scalar( 255, 128, 0 ),
            new Scalar( 255, 153, 51 ),
            new Scalar( 255, 178, 102 ),
            new Scalar( 230, 230, 0 ),
            new Scalar( 255, 153, 255 ),
            new Scalar( 153, 204, 255 ),
            new Scalar( 255, 102, 255 ),
            new Scalar( 255, 51, 255 ),
            new Scalar( 102, 178, 255 ),
            new Scalar( 51, 153, 255 ),
            new Scalar( 255, 153, 153 ),
            new Scalar( 255, 102, 102 ),
            new Scalar( 255, 51, 51 ),
            new Scalar( 153, 255, 153 ),
            new Scalar( 102, 255, 102 ),
            new Scalar( 51, 255, 51 ),
            new Scalar( 0, 255, 0 ),
            new Scalar( 0, 0, 255 ),
            new Scalar( 255, 0, 0 ),
            new Scalar( 255, 255, 255 )
    ));

    public static final int[][] skeleton = {
            {16, 14}, {14, 12}, {17, 15}, {15, 13}, {12, 13}, {6, 12},
            {7, 13}, {6, 7}, {6, 8}, {7, 9}, {8, 10}, {9, 11}, {2, 3},
            {1, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 6}, {5, 7}
    };

    public static final List<Scalar> poseLimbColor = new ArrayList<>(Arrays.asList(
            palette.get(9), palette.get(9), palette.get(9), palette.get(9), palette.get(7),
            palette.get(7), palette.get(7), palette.get(0), palette.get(0), palette.get(0),
            palette.get(0), palette.get(0), palette.get(16), palette.get(16), palette.get(16),
            palette.get(16), palette.get(16), palette.get(16), palette.get(16)));

    public static final List<Scalar> poseKptColor = new ArrayList<>(Arrays.asList(
            palette.get(16), palette.get(16), palette.get(16), palette.get(16), palette.get(16),
            palette.get(0), palette.get(0), palette.get(0), palette.get(0), palette.get(0),
            palette.get(0), palette.get(9), palette.get(9), palette.get(9), palette.get(9),
            palette.get(9), palette.get(9)));

}
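
The skeleton table uses 1-based keypoint indices, which is why the drawing code later subtracts 1 before looking up a keypoint. The sketch below makes each pair readable by attaching names to the indices. The names are an assumption: they follow the usual COCO keypoint order, which the article does not state explicitly, and the class SkeletonNames is purely illustrative.

```java
class SkeletonNames {

    // ASSUMPTION: COCO keypoint order; not confirmed by the article itself.
    public static final String[] NAMES = {
            "nose", "left_eye", "right_eye", "left_ear", "right_ear",
            "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
            "left_wrist", "right_wrist", "left_hip", "right_hip",
            "left_knee", "right_knee", "left_ankle", "right_ankle"
    };

    // Same pairs as PEPlotConfig.skeleton (1-based indices).
    public static final int[][] SKELETON = {
            {16, 14}, {14, 12}, {17, 15}, {15, 13}, {12, 13}, {6, 12},
            {7, 13}, {6, 7}, {6, 8}, {7, 9}, {8, 10}, {9, 11}, {2, 3},
            {1, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 6}, {5, 7}
    };

    /** Describes one limb by converting its 1-based indices to 0-based name lookups. */
    public static String describe(int limb) {
        int[] pair = SKELETON[limb];
        return NAMES[pair[0] - 1] + " - " + NAMES[pair[1] - 1];
    }

    public static void main(String[] args) {
        for (int i = 0; i < SKELETON.length; i++) {
            System.out.println(i + ": " + describe(i));
        }
    }
}
```

The 19 pairs line up one-to-one with the 19 entries of poseLimbColor, so limb i is drawn with color i.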

4 Utils

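4.1 Letterbox.java

This class resizes and pads the image so that it satisfies the stride constraint, and records the parameters used (scale ratio and padding).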

package cn.halashuo.utils;

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class Letterbox {

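    private Size newShape = new Size(1280, 1280);
    private final double[] color = new double[]{114,114,114};
    private final Boolean auto = false;
    private final Boolean scaleUp = true;
    private Integer stride = 32;

    private double ratio;
    private double dw;
    private double dh;

    // Setters called from PoseEstimation to override the default target size and stride
    public void setNewShape(Size newShape) {
        this.newShape = newShape;
    }

    public void setStride(Integer stride) {
        this.stride = stride;
    }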

    public double getRatio() {
        return ratio;
    }

    public double getDw() {
        return dw;
    }

    public Integer getWidth() {
        return (int) this.newShape.width;
    }

    public Integer getHeight() {
        return (int) this.newShape.height;
    }

    public double getDh() {
        return dh;
    }

    public Mat letterbox(Mat im) { // Resize and pad the image to satisfy the stride constraint, and record the parameters

        int[] shape = {im.rows(), im.cols()}; // current shape [height, width]
        // Scale ratio (new / old)
        double r = Math.min(this.newShape.height / shape[0], this.newShape.width / shape[1]);
        if (!this.scaleUp) { // only scale down, never up (for better mAP)
            r = Math.min(r, 1.0);
        }
        // Compute padding
        Size newUnpad = new Size(Math.round(shape[1] * r), Math.round(shape[0] * r));
        double dw = this.newShape.width - newUnpad.width, dh = this.newShape.height - newUnpad.height; // width/height padding
        if (this.auto) { // minimum rectangle
            dw = dw % this.stride;
            dh = dh % this.stride;
        }
        dw /= 2; // split the padding evenly between both sides so the image stays centered
        dh /= 2;
        if (shape[1] != newUnpad.width || shape[0] != newUnpad.height) { // resize
            Imgproc.resize(im, im, newUnpad, 0, 0, Imgproc.INTER_LINEAR);
        }
        int top = (int) Math.round(dh - 0.1), bottom = (int) Math.round(dh + 0.1);
        int left = (int) Math.round(dw - 0.1), right = (int) Math.round(dw + 0.1);
        // Pad the image into a square
        Core.copyMakeBorder(im, im, top, bottom, left, right, Core.BORDER_CONSTANT, new org.opencv.core.Scalar(this.color));
        this.ratio = r;
        this.dh = dh;
        this.dw = dw;
        return im;
    }
}
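
The arithmetic inside letterbox can be traced without OpenCV. The following is a minimal sketch, not part of the project: the class LetterboxMath and the 810x1080 input size are hypothetical, chosen only to make the numbers easy to follow. It computes the scale ratio and the per-side padding the same way as the method above (with auto disabled), then inverts the mapping the way the drawing code does with (x - dw) / ratio.

```java
class LetterboxMath {

    /** Returns {ratio, dw, dh} for fitting a width x height image into a square of side target. */
    public static double[] params(int width, int height, int target) {
        // Scale ratio (new / old): the tighter of the two axes wins
        double r = Math.min((double) target / height, (double) target / width);
        long newW = Math.round(width * r);
        long newH = Math.round(height * r);
        // Padding is split evenly between the two sides of each axis
        double dw = (target - newW) / 2.0;
        double dh = (target - newH) / 2.0;
        return new double[]{r, dw, dh};
    }

    /** Maps a coordinate in the padded image back to the original image. */
    public static double toOriginal(double padded, double pad, double ratio) {
        return (padded - pad) / ratio;
    }

    public static void main(String[] args) {
        double[] p = params(810, 1080, 960); // hypothetical input size
        System.out.printf("ratio=%.4f dw=%.1f dh=%.1f%n", p[0], p[1], p[2]);
        // The horizontal center of the padded image maps to the center of the original
        System.out.printf("x=%.1f%n", toOriginal(480.0, p[1], p[0]));
    }
}
```

For this input the height is the limiting axis, so all padding lands on the left and right (dw = 120, dh = 0), and the padded x = 480 maps back to x = 405 in the original image.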

4.2 NMS.java

This class performs non-maximum suppression to filter the detected people.

package cn.halashuo.utils;

import cn.halashuo.domain.PEResult;

import java.util.ArrayList;
import java.util.List;

public class NMS {

    public static List<PEResult> nms(List<PEResult> boxes, float iouThreshold) {
        // Sort the list by score in descending order
        boxes.sort((b1, b2) -> Float.compare(b2.getScore(), b1.getScore()));
        List<PEResult> resultList = new ArrayList<>();
        for (PEResult box : boxes) {
            boolean keep = true;
            // Drop this box if its IoU with a higher-scoring box that was already kept exceeds the threshold
            for (PEResult keptBox : resultList) {
                if (getIntersectionOverUnion(box, keptBox) > iouThreshold) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                resultList.add(box);
            }
        }
        return resultList;
    }
    private static float getIntersectionOverUnion(PEResult box1, PEResult box2) {
        float x1 = Math.max(box1.getX0(), box2.getX0());
        float y1 = Math.max(box1.getY0(), box2.getY0());
        float x2 = Math.min(box1.getX1(), box2.getX1());
        float y2 = Math.min(box1.getY1(), box2.getY1());
        float intersectionArea = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
        float box1Area = (box1.getX1() - box1.getX0()) * (box1.getY1() - box1.getY0());
        float box2Area = (box2.getX1() - box2.getX0()) * (box2.getY1() - box2.getY0());
        float unionArea = box1Area + box2Area - intersectionArea;
        return intersectionArea / unionArea;
    }
}
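
To see greedy non-maximum suppression on concrete numbers, here is a small self-contained sketch. It is an illustration under assumptions rather than the project's code: boxes are plain float arrays laid out as {x0, y0, x1, y1, score}, and the class GreedyNmsDemo and its sample boxes are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GreedyNmsDemo {

    /** IoU of two boxes given as {x0, y0, x1, y1, score}. */
    public static float iou(float[] a, float[] b) {
        float w = Math.max(0f, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        float h = Math.max(0f, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        float inter = w * h;
        float union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
        return union <= 0f ? 0f : inter / union; // guard against degenerate boxes
    }

    /** Keeps a box only if it does not overlap an already kept, higher-scoring box. */
    public static List<float[]> nms(List<float[]> boxes, float threshold) {
        List<float[]> sorted = new ArrayList<>(boxes);
        sorted.sort((p, q) -> Float.compare(q[4], p[4])); // descending score
        List<float[]> kept = new ArrayList<>();
        for (float[] candidate : sorted) {
            boolean suppressed = false;
            for (float[] k : kept) {
                if (iou(candidate, k) > threshold) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<float[]> boxes = Arrays.asList(
                new float[]{0, 0, 10, 10, 0.9f},
                new float[]{1, 1, 11, 11, 0.8f},   // overlaps the first box heavily
                new float[]{50, 50, 60, 60, 0.7f}  // far away from both
        );
        for (float[] b : nms(boxes, 0.5f)) {
            System.out.println(Arrays.toString(b));
        }
    }
}
```

The first two boxes share an intersection of 81 and a union of 119, so their IoU is about 0.68. With a threshold of 0.5 the lower-scoring one is suppressed, and the distant third box survives.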

5 domain

5.1 KeyPoint.java

Entity class that stores the information for one keypoint.

package cn.halashuo.domain;

public class KeyPoint {
    private Integer id;
    private Float x;
    private Float y;
    private Float score;

    public KeyPoint(Integer id, Float x, Float y, Float score) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.score = score;
    }

    public Integer getId() {
        return id;
    }

    public Float getX() {
        return x;
    }

    public Float getY() {
        return y;
    }

    public Float getScore() {
        return score;
    }

    @Override
    public String toString() {
        return "    第 " + (id+1) + " 个关键点: " +
                " x=" + x +
                " y=" + y +
                " c=" + score +
                "\n";
    }
}

5.2 PEResult.java

Entity class that stores all detection information for one person: bounding box, score, class id, and keypoints.

package cn.halashuo.domain;

import java.util.ArrayList;
import java.util.List;

public class PEResult {

    private Float x0;
    private Float y0;
    private Float x1;
    private Float y1;
    private Float score;
    private Integer clsId;
    private List<KeyPoint> keyPointList;

    public PEResult(float[] peResult) {
        float x = peResult[0];
        float y = peResult[1];
        float w = peResult[2]/2.0f;
        float h = peResult[3]/2.0f;
        this.x0 = x-w;
        this.y0 = y-h;
        this.x1 = x+w;
        this.y1 = y+h;
        this.score = peResult[4];
        this.clsId = (int) peResult[5];
        this.keyPointList = new ArrayList<>();
        int keyPointNum = (peResult.length-6)/3;
        for (int i=0;i<keyPointNum;i++) {
            this.keyPointList.add(new KeyPoint(i, peResult[6+3*i], peResult[6+3*i+1], peResult[6+3*i+2]));
        }
    }

    public Float getX0() {
        return x0;
    }

    public Float getY0() {
        return y0;
    }

    public Float getX1() {
        return x1;
    }

    public Float getY1() {
        return y1;
    }

    public Float getScore() {
        return score;
    }

    public Integer getClsId() {
        return clsId;
    }

    public List<KeyPoint> getKeyPointList() {
        return keyPointList;
    }

    @Override
    public String toString() {
        String result = "PEResult:" +
                "  x0=" + x0 +
                ", y0=" + y0 +
                ", x1=" + x1 +
                ", y1=" + y1 +
                ", score=" + score +
                ", clsId=" + clsId +
                "\n";
        for (KeyPoint x : keyPointList) {
            result = result + x.toString();
        }
        return result;
    }
}

6 PoseEstimation.java

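Set the path to the ONNX file and the path of the image to analyze, and the program is ready to run. If needed, CUDA can also be configured as the execution provider, which can raise FPS considerably.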

package cn.halashuo;

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;
import cn.halashuo.domain.KeyPoint;
import cn.halashuo.domain.PEResult;
import cn.halashuo.utils.Letterbox;
import cn.halashuo.utils.NMS;
import cn.halashuo.config.PEPlotConfig;
import org.opencv.core.*;
import org.opencv.highgui.HighGui;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class PoseEstimation {
    static
    {
        // The OpenCV native library (Core.NATIVE_LIBRARY_NAME) must be loaded before any OpenCV call, otherwise an error is thrown
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    }

    public static void main(String[] args) throws OrtException {

        // Load the ONNX model
        OrtEnvironment environment = OrtEnvironment.getEnvironment();
        OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
        OrtSession session = environment.createSession("src\\main\\resources\\model\\yolov7-w6-pose.onnx", sessionOptions);
        // Print basic information about the model inputs
        session.getInputInfo().keySet().forEach(x -> {
            try {
                System.out.println("input name = " + x);
                System.out.println(session.getInputInfo().get(x).getInfo().toString());
            } catch (OrtException e) {
                throw new RuntimeException(e);
            }
        });

        // Read the image
        Mat img = Imgcodecs.imread("src\\main\\resources\\image\\bus.jpg");
        Imgproc.cvtColor(img, img, Imgproc.COLOR_BGR2RGB);
        Mat image = img.clone();

        // Define the line thickness and keypoint radius up front (scaling them with the image size looks better)
        int minDwDh = Math.min(img.width(), img.height());
        int thickness = minDwDh / 333;
        int radius = minDwDh / 168;

        // Resize the image
        Letterbox letterbox = new Letterbox();
        letterbox.setNewShape(new Size(960, 960));
        letterbox.setStride(64);
        image = letterbox.letterbox(image);
        double ratio = letterbox.getRatio();
        double dw = letterbox.getDw();
        double dh = letterbox.getDh();
        int rows = letterbox.getHeight();
        int cols = letterbox.getWidth();
        int channels = image.channels();

        // Copy the Mat's pixel values into a float[] in CHW order
        float[] pixels = new float[channels * rows * cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double[] pixel = image.get(i, j); // Mat.get(row, col)
                for (int k = 0; k < channels; k++) {
                    // Writing one channel plane at a time is equivalent to image.transpose((2, 0, 1))
                    pixels[rows * cols * k + i * cols + j] = (float) pixel[k] / 255.0f;
                }
            }
        }

        // Create the OnnxTensor
        long[] shape = {1L, (long) channels, (long) rows, (long) cols};
        OnnxTensor tensor = OnnxTensor.createTensor(environment, FloatBuffer.wrap(pixels), shape);
        HashMap<String, OnnxTensor> stringOnnxTensorHashMap = new HashMap<>();
        stringOnnxTensorHashMap.put(session.getInputInfo().keySet().iterator().next(), tensor);

        // Run the model
        OrtSession.Result output = session.run(stringOnnxTensorHashMap);

        // Get the results
        float[][] outputData = ((float[][][]) output.get(0).getValue())[0];

        List<PEResult> peResults = new ArrayList<>();
        for (int i=0;i<outputData.length;i++){
            PEResult result = new PEResult(outputData[i]);
            if (result.getScore()>0.25f) {
                peResults.add(result);
            }
        }

        // Apply non-maximum suppression to the results
        peResults = NMS.nms(peResults, 0.65f);

        for (PEResult peResult: peResults) {
            System.out.println(peResult);
            // Draw the bounding box
            Point topLeft = new Point((peResult.getX0()-dw)/ratio, (peResult.getY0()-dh)/ratio);
            Point bottomRight = new Point((peResult.getX1()-dw)/ratio, (peResult.getY1()-dh)/ratio);
            Imgproc.rectangle(img, topLeft, bottomRight, new Scalar(255,0,0), thickness);
            List<KeyPoint> keyPoints = peResult.getKeyPointList();
            // Draw the keypoints
            keyPoints.forEach(keyPoint->{
                if (keyPoint.getScore()>0.50f) {
                    Point center = new Point((keyPoint.getX()-dw)/ratio, (keyPoint.getY()-dh)/ratio);
                    Scalar color = PEPlotConfig.poseKptColor.get(keyPoint.getId());
                    Imgproc.circle(img, center, radius, color, -1); // -1 draws a filled circle
                }
            });
            // Draw the limbs
            for (int i=0;i<PEPlotConfig.skeleton.length;i++){
                int indexPoint1 = PEPlotConfig.skeleton[i][0]-1;
                int indexPoint2 = PEPlotConfig.skeleton[i][1]-1;
                if ( keyPoints.get(indexPoint1).getScore()>0.5f && keyPoints.get(indexPoint2).getScore()>0.5f ) {
                    Scalar limbColor = PEPlotConfig.poseLimbColor.get(i);
                    Point point1 = new Point(
                            (keyPoints.get(indexPoint1).getX()-dw)/ratio,
                            (keyPoints.get(indexPoint1).getY()-dh)/ratio
                    );
                    Point point2 = new Point(
                            (keyPoints.get(indexPoint2).getX()-dw)/ratio,
                            (keyPoints.get(indexPoint2).getY()-dh)/ratio
                    );
                    Imgproc.line(img, point1, point2, limbColor, thickness);
                }
            }
        }
        Imgproc.cvtColor(img, img, Imgproc.COLOR_RGB2BGR);
        // Save the image
        // Imgcodecs.imwrite("image.jpg", img);
        HighGui.imshow("Display Image", img);
        // Wait for any key press before continuing
        HighGui.waitKey();

    }
}
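
The pixel loop in main changes the memory layout from interleaved HWC (how OpenCV stores a Mat) to planar CHW (what the model expects) while scaling values to the range 0..1. The sketch below isolates that step on a plain int array so it can be checked by hand; the class ChwLayoutDemo and the 2x2 test image are hypothetical and not part of the project.

```java
import java.util.Arrays;

class ChwLayoutDemo {

    /** Converts interleaved HWC values (0..255) into normalized planar CHW floats. */
    public static float[] hwcToChw(int[] hwc, int rows, int cols, int channels) {
        float[] chw = new float[channels * rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                for (int k = 0; k < channels; k++) {
                    // Source: pixel (r, c), channel k. Destination: plane k, row r, column c.
                    chw[k * rows * cols + r * cols + c] = hwc[(r * cols + c) * channels + k] / 255.0f;
                }
            }
        }
        return chw;
    }

    public static void main(String[] args) {
        // A 2x2 RGB image: red, green, blue, white
        int[] hwc = {255, 0, 0,   0, 255, 0,   0, 0, 255,   255, 255, 255};
        System.out.println(Arrays.toString(hwcToChw(hwc, 2, 2, 3)));
    }
}
```

Each group of four consecutive values in the result is one full channel plane, so the red plane is 1, 0, 0, 1, followed by the green and blue planes.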

Output:

input name = images
TensorInfo(javaType=FLOAT,onnxType=ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,shape=[1, 3, 960, 960])
PEResult:  x0=164.1142, y0=357.69672, x1=341.0573, y1=800.42975, score=0.7534131, clsId=0
    第 1 个关键点:  x=246.95715 y=397.0772 c=0.9736327
    第 2 个关键点:  x=252.17633 y=389.2904 c=0.8335564
    第 3 个关键点:  x=238.11848 y=389.0014 c=0.96936846
    第 4 个关键点:  x=256.3179 y=395.70413 c=0.26490688
    第 5 个关键点:  x=217.58188 y=395.21817 c=0.9350542
    第 6 个关键点:  x=261.88034 y=443.02286 c=0.9898272
    第 7 个关键点:  x=197.16116 y=444.8831 c=0.9871879
    第 8 个关键点:  x=288.0304 y=506.67032 c=0.96026266
    第 9 个关键点:  x=232.06883 y=506.9636 c=0.97538215
    第 10 个关键点:  x=239.84201 y=521.9722 c=0.9516981
    第 11 个关键点:  x=277.70163 y=489.83765 c=0.970703
    第 12 个关键点:  x=257.0904 y=574.55255 c=0.9929459
    第 13 个关键点:  x=208.6877 y=576.3064 c=0.9918982
    第 14 个关键点:  x=271.411 y=667.6743 c=0.98325074
    第 15 个关键点:  x=203.27112 y=671.0604 c=0.98317075
    第 16 个关键点:  x=285.6367 y=760.895 c=0.91636044
    第 17 个关键点:  x=184.19543 y=757.7814 c=0.91179585

PEResult:  x0=316.40674, y0=360.82706, x1=426.57, y1=759.98706, score=0.2713838, clsId=0
    第 1 个关键点:  x=381.07776 y=401.219 c=0.97289705
    第 2 个关键点:  x=386.32465 y=393.87543 c=0.8813406
    第 3 个关键点:  x=372.7825 y=394.40253 c=0.9677005
    第 4 个关键点:  x=392.475 y=397.61212 c=0.38852227
    第 5 个关键点:  x=358.659 y=399.40833 c=0.90877795
    第 6 个关键点:  x=402.89664 y=442.8764 c=0.9850693
    第 7 个关键点:  x=344.9049 y=448.23697 c=0.98851293
    第 8 个关键点:  x=414.33658 y=491.29187 c=0.932391
    第 9 个关键点:  x=342.81982 y=514.2552 c=0.97218585
    第 10 个关键点:  x=372.42307 y=471.12778 c=0.9217508
    第 11 个关键点:  x=355.56168 y=568.59796 c=0.9616347
    第 12 个关键点:  x=395.88492 y=558.541 c=0.98994935
    第 13 个关键点:  x=356.53287 y=560.3552 c=0.99083
    第 14 个关键点:  x=402.41013 y=636.82916 c=0.97681665
    第 15 个关键点:  x=356.1795 y=645.22626 c=0.9832493
    第 16 个关键点:  x=363.65356 y=694.9054 c=0.92282534
    第 17 个关键点:  x=358.54623 y=727.66455 c=0.93670356

PEResult:  x0=120.30354, y0=488.97424, x1=189.55516, y1=765.24194, score=0.2625065, clsId=0
    第 1 个关键点:  x=128.37527 y=496.36432 c=0.039970636
    第 2 个关键点:  x=129.2826 y=488.29858 c=0.01759991
    第 3 个关键点:  x=129.88588 y=487.7059 c=0.023388624
    第 4 个关键点:  x=128.1956 y=486.87085 c=0.022539705
    第 5 个关键点:  x=129.61555 y=486.93362 c=0.026325405
    第 6 个关键点:  x=124.60656 y=506.1516 c=0.11605239
    第 7 个关键点:  x=124.11076 y=506.29758 c=0.1002911
    第 8 个关键点:  x=129.53989 y=577.0432 c=0.39045402
    第 9 个关键点:  x=129.05757 y=578.36163 c=0.4030531
    第 10 个关键点:  x=161.94182 y=651.2286 c=0.51389414
    第 11 个关键点:  x=162.66849 y=654.58966 c=0.54413426
    第 12 个关键点:  x=128.37022 y=633.2864 c=0.12599188
    第 13 个关键点:  x=128.395 y=635.9184 c=0.110325515
    第 14 个关键点:  x=128.9154 y=668.3744 c=0.098092705
    第 15 个关键点:  x=129.4807 y=669.07947 c=0.08956778
    第 16 个关键点:  x=128.86487 y=750.24927 c=0.09377599
    第 17 个关键点:  x=127.63382 y=751.3636 c=0.086484134

PEResult:  x0=710.87134, y0=352.32605, x1=839.29944, y1=781.6887, score=0.2580245, clsId=0
    第 1 个关键点:  x=815.21063 y=390.9094 c=0.37949353
    第 2 个关键点:  x=819.77454 y=382.87204 c=0.34996593
    第 3 个关键点:  x=816.6579 y=382.68045 c=0.0947094
    第 4 个关键点:  x=831.6544 y=386.69308 c=0.3775956
    第 5 个关键点:  x=830.4774 y=386.01678 c=0.044245332
    第 6 个关键点:  x=828.6047 y=435.97873 c=0.62260723
    第 7 个关键点:  x=838.1829 y=433.01996 c=0.32877648
    第 8 个关键点:  x=817.08154 y=511.3317 c=0.7232578
    第 9 个关键点:  x=824.0419 y=505.8941 c=0.21007198
    第 10 个关键点:  x=773.95953 y=496.15784 c=0.80840695
    第 11 个关键点:  x=790.11487 y=490.05597 c=0.33966026
    第 12 个关键点:  x=826.98004 y=571.4592 c=0.6694445
    第 13 个关键点:  x=830.14514 y=567.2725 c=0.508251
    第 14 个关键点:  x=796.26184 y=655.2373 c=0.81898046
    第 15 个关键点:  x=802.0529 y=650.5082 c=0.6584172
    第 16 个关键点:  x=762.1977 y=747.01917 c=0.6550461
    第 17 个关键点:  x=763.58057 y=741.5452 c=0.5072014

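After the official yolov7-w6-pose model is trained and converted to ONNX, its output has shape $n\times 57$. The first six elements of each row are x, y, w, h, score, and classId. Each keypoint consists of three elements (x, y, score), and there are 17 keypoints, so each human detection contains $3\times 17 + 6 = 57$ elements in total.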
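
The row layout can be expressed as a few index calculations. This is a minimal sketch assuming the layout described above and the center-format box (x, y, w, h) that PEResult converts to corners; the class PoseRowLayout and its method names are illustrative only.

```java
class PoseRowLayout {

    public static final int HEADER = 6;        // x, y, w, h, score, classId
    public static final int PER_KEYPOINT = 3;  // x, y, score

    /** Number of keypoints encoded in a row of the given length. */
    public static int keypointCount(int rowLength) {
        return (rowLength - HEADER) / PER_KEYPOINT;
    }

    /** Index of the x value of keypoint k (0-based); y and score follow at +1 and +2. */
    public static int keypointOffset(int k) {
        return HEADER + PER_KEYPOINT * k;
    }

    /** Converts the center-format box at the start of a row to {x0, y0, x1, y1}. */
    public static float[] corners(float[] row) {
        float halfW = row[2] / 2.0f;
        float halfH = row[3] / 2.0f;
        return new float[]{row[0] - halfW, row[1] - halfH, row[0] + halfW, row[1] + halfH};
    }

    public static void main(String[] args) {
        System.out.println("keypoints per row: " + keypointCount(57));
        System.out.println("last keypoint starts at index: " + keypointOffset(16));
    }
}
```

With 57 elements per row, the last keypoint occupies indices 54 to 56, which matches $6 + 3\times 16 = 54$.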