Aerial Infrared Image Defect Classification Dataset for Solar Photovoltaic Panels

news2025/4/12 21:54:14

I. Dataset Overview

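The performance of solar photovoltaic (PV) panels directly affects the efficiency and reliability of a PV power system. With advances in drones and infrared imaging, inspecting panels for defects from aerial infrared images has become an efficient and accurate approach. This dataset contains 20,000 images across 11 defect classes and is intended for deep-learning-based defect classification. All images are near-infrared grayscale images that have been processed for visualization so they are easier to inspect and analyze.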

II. Dataset Details

1. Defect Classes

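The dataset's 11 defect classes are:

Cracks: cracks on the panel surface.
Hot Spots: regions with an abnormally high local temperature.
Shadows: shaded regions caused by surrounding objects.
Pollution: dust or dirt on the surface.
Broken Cells: broken wiring inside a cell.
Delamination: separation between the cells and the backsheet.
Burnt Cells: cells burnt out by overheating.
Missing Cells: cells that are missing from the panel.
Scratches: scratches on the surface.
Water Stains: water stains on the surface.
Others: defects not covered by the classes above.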

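2. Image Characteristics

Image type: near-infrared grayscale images.
Image size: uniformly 256x256 pixels.
Image count: 20,000 in total, about 1,818 per defect class on average.
Image processing: all images were processed for visualization, with enhanced contrast and sharpness, so they are easier to inspect and analyze.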
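
The per-class figure is an average: 20,000 does not divide evenly by 11, so the classes cannot all be exactly the same size. A quick check of the arithmetic, using only the totals stated above (the variable names are illustrative):

```python
# Average number of images per class, from the stated totals.
total_images = 20000
num_classes = 11

per_class, remainder = divmod(total_images, num_classes)
print(per_class, remainder)                   # 1818 2
print(round(total_images / num_classes, 2))   # 1818.18
```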

III. Dataset Structure

The dataset is organized as follows:

solar_panel_defect_dataset/
│
├── train/
│   ├── cracks/
│   ├── hot_spots/
│   ├── shadows/
│   ├── pollution/
│   ├── broken_cells/
│   ├── delamination/
│   ├── burnt_cells/
│   ├── missing_cells/
│   ├── scratches/
│   ├── water_stains/
│   └── others/
│
├── val/
│   ├── cracks/
│   ├── hot_spots/
│   ├── shadows/
│   ├── pollution/
│   ├── broken_cells/
│   ├── delamination/
│   ├── burnt_cells/
│   ├── missing_cells/
│   ├── scratches/
│   ├── water_stains/
│   └── others/
│
└── test/
    ├── cracks/
    ├── hot_spots/
    ├── shadows/
    ├── pollution/
    ├── broken_cells/
    ├── delamination/
    ├── burnt_cells/
    ├── missing_cells/
    ├── scratches/
    ├── water_stains/
    └── others/

train/: training set, 18,000 images.
val/: validation set, 1,000 images.
test/: test set, 1,000 images.
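
Before training, it can help to confirm that a local copy matches this layout and these counts. The sketch below walks the three splits and counts image files per class. The function name `count_images` and the extension list are assumptions made for illustration; the source does not state which file format the images use.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}

def count_images(root):
    """Return {split: {class_name: image_count}} for a train/val/test layout."""
    counts = {}
    for split in ("train", "val", "test"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            continue  # tolerate a missing split instead of raising
        counts[split] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts

if __name__ == "__main__":
    counts = count_images("solar_panel_defect_dataset")
    for split, per_class in counts.items():
        print(split, sum(per_class.values()), per_class)
```

If the copy is complete, the three split totals should come to 18,000, 1,000, and 1,000, and each split should list the same 11 class folders.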

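IV. Data Processing Code

The following Python example shows how to load and preprocess the dataset and how to train a deep learning model for defect classification.

1. Import the Required Libraries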

import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

2. Load the Dataset

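def load_images_from_folder(folder):
    """Load every image under folder/<class_name>/ as a 256x256 grayscale array."""
    images = []
    labels = []
    # Sort the listings so the loading order is the same on every run.
    for category in sorted(os.listdir(folder)):
        path = os.path.join(folder, category)
        if not os.path.isdir(path):
            continue
        for img in sorted(os.listdir(path)):
            img_path = os.path.join(path, img)
            image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
            if image is None:
                # cv2.imread returns None for unreadable or non-image files; skip them.
                continue
            image = cv2.resize(image, (256, 256))
            images.append(image)
            labels.append(category)  # the folder name is the class label
    return np.array(images), np.array(labels)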

train_folder = 'solar_panel_defect_dataset/train'
val_folder = 'solar_panel_defect_dataset/val'
test_folder = 'solar_panel_defect_dataset/test'

X_train, y_train = load_images_from_folder(train_folder)
X_val, y_val = load_images_from_folder(val_folder)
X_test, y_test = load_images_from_folder(test_folder)

# Encode the string labels as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_val = le.transform(y_val)
y_test = le.transform(y_test)

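# Scale pixel values to [0, 1]; casting to float32 avoids the float64 default
X_train = X_train.astype('float32') / 255.0
X_val = X_val.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0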

# Add a trailing channel dimension: (N, 256, 256) -> (N, 256, 256, 1)
X_train = np.expand_dims(X_train, axis=-1)
X_val = np.expand_dims(X_val, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)
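
Loading the whole training set into one array has a real memory cost. A uint8 array divided directly by 255.0 becomes float64 in NumPy, eight times the size of the raw pixels, so the dtype matters at this scale. The rough estimate below uses the training-set size and image shape stated earlier; `array_bytes` is a hypothetical helper, not part of any library.

```python
def array_bytes(num_images, height, width, channels, bytes_per_value):
    """Size in bytes of a dense image array with the given shape."""
    return num_images * height * width * channels * bytes_per_value

GIB = 1024 ** 3
# 18,000 training images of 256x256 pixels with one channel.
for dtype_name, itemsize in (("uint8", 1), ("float32", 4), ("float64", 8)):
    size = array_bytes(18000, 256, 256, 1, itemsize)
    print(f"{dtype_name}: {size / GIB:.2f} GiB")
# uint8: 1.10 GiB
# float32: 4.39 GiB
# float64: 8.79 GiB
```

If these figures exceed the available RAM, loading images in batches from disk is the usual alternative to holding everything in memory.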

3. Build the Deep Learning Model

def build_model():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(11, activation='softmax')
    ])
    model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_model()
model.summary()
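
The shapes and parameter counts that `model.summary()` reports can be worked out by hand, which shows where the model's capacity sits. The sketch below assumes the Keras defaults used in the model above: 'valid' padding with stride 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`. The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

size, channels, total_params = 256, 1, 0
for filters in (32, 64, 128):
    size = conv_out(size)
    params = (3 * 3 * channels + 1) * filters  # kernel weights plus one bias per filter
    total_params += params
    print(f"Conv2D({filters}): {size}x{size}x{filters}, {params:,} params")
    size = pool_out(size)
    channels = filters

flat = size * size * channels      # features entering the first Dense layer
dense1 = (flat + 1) * 128          # weights plus biases of Dense(128)
dense2 = (128 + 1) * 11            # weights plus biases of Dense(11)
total_params += dense1 + dense2
print(f"Flatten: {flat:,} features")   # 115,200
print(f"Dense(128): {dense1:,} params")  # 14,745,728
print(f"Dense(11): {dense2:,} params")   # 1,419
print(f"Total: {total_params:,} params")  # 14,839,819
```

The first Dense layer holds about 99% of the parameters, because Flatten hands it 115,200 features. That is one reason global pooling is a common substitute for Flatten in similar architectures.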

4. Train the Model

early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model_checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)

history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[early_stopping, model_checkpoint]
)
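
The `patience=10` setting means training stops once the validation loss has gone 10 consecutive epochs without improving. The sketch below is a simplified model of that rule, not Keras's actual implementation. The function name and the loss values are made up for illustration, and a smaller patience is used so the trace stays short.

```python
def early_stopping_trace(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    stop_epoch is None if training would run through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up validation losses, for illustration only.
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
print(early_stopping_trace(losses, patience=3))   # (5, 2)
```

Here the best loss occurs at epoch 2 and training stops at epoch 5. Restoring the best weights, as `restore_best_weights=True` requests, would bring back the model from epoch 2 and discard the three later epochs.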

5. Evaluate the Model

# Load the best saved weights
model.load_weights('best_model.h5')

# Evaluate on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')

# Plot the training and validation loss and accuracy curves
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Loss')

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.title('Accuracy')

plt.show()
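
Overall accuracy can hide classes that the model handles poorly, so a per-class view is a useful complement with 11 defect types. The sketch below builds a confusion matrix and per-class recall in plain Python. The function names and the toy labels are illustrative, not part of the original pipeline.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (None if the class is absent)."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else None)
    return recalls

# Toy labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)                    # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_recall(cm))  # [0.5, 1.0, 0.5]
```

With the trained model, the same functions would take `y_test` and `model.predict(X_test).argmax(axis=1)` with `num_classes=11`.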

6. Predict and Visualize

def plot_predictions(images, true_labels, predicted_labels, n=5):
    fig, axes = plt.subplots(n, 2, figsize=(10, 5 * n))
    for i in range(n):
        ax = axes[i, 0]
        ax.imshow(images[i].squeeze(), cmap='gray')
        ax.set_title(f'True: {true_labels[i]}')
        ax.axis('off')

        ax = axes[i, 1]
        ax.imshow(images[i].squeeze(), cmap='gray')
        ax.set_title(f'Predicted: {predicted_labels[i]}')
        ax.axis('off')
    plt.tight_layout()
    plt.show()

# Randomly pick a few test images and predict their classes
indices = np.random.choice(len(X_test), 5, replace=False)
sample_images = X_test[indices]
sample_true_labels = y_test[indices]
sample_predicted_labels = model.predict(sample_images).argmax(axis=1)

plot_predictions(sample_images, sample_true_labels, sample_predicted_labels)
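
The plot titles show integer class indices rather than names. `LabelEncoder` assigns indices in sorted order of the label strings, so the mapping follows from the folder names in the dataset structure. A minimal sketch of that mapping, assuming the folder names are exactly those listed earlier:

```python
# Class folder names as shown in the dataset structure.
class_names = [
    "cracks", "hot_spots", "shadows", "pollution", "broken_cells",
    "delamination", "burnt_cells", "missing_cells", "scratches",
    "water_stains", "others",
]

# LabelEncoder assigns indices in sorted order of the label strings.
index_to_name = dict(enumerate(sorted(class_names)))
for index, name in index_to_name.items():
    print(index, name)
```

In the pipeline itself, `le.inverse_transform(sample_predicted_labels)` performs the same lookup and could be used to put class names in the plot titles.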