AI Engineering Application: Building Surface Inspection and Repair


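Contents

1 Project overview (required):
2 Technical approach and implementation steps
- 2.1 Model selection (required):
- 2.2 Data construction:
- 2.3 Feature integration (advanced):
3 Implementation steps:
- 3.1 Environment setup (required):
- 3.2 Code implementation (required):
  - 3.2.1 chat_agent
  - 3.2.2 Interface
4 Project results and demonstration:
- 4.1 Application scenarios (required):
- 4.2 Feature demonstration (required):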

NVIDIA AI-AGENT Summer Camp

Project name: AI-AGENT Summer Camp, RAG Intelligent Conversational Bot

Report date: August 18, 2024

Project lead: 赵志远

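1 Project overview (required):

This section introduces the project as a whole, including its application scenarios and highlights.
The project takes photos of concrete defects on building surfaces and uses AI to propose a matching repair solution. It can be used during building acceptance, inspection, and repair. A multimodal model identifies the defect shown in the photo, and RAG then uses the identified defect type to retrieve the corresponding concrete repair methods.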

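2 Technical approach and implementation steps

The approach has three steps:
Use microsoft/phi-3-vision-128k-instruct to analyze the image and identify the type of concrete surface defect.
Use RAG to look up a suitable repair method for the identified defect type in a custom text database.
Use a large language model to produce the final answer.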
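
The three steps above can be sketched as one function that chains them. This is a minimal illustration, not the project's actual code: `suggest_repair` and the three `fake_*` stand-ins are hypothetical names, and the repair note is placeholder text rather than engineering guidance. Each step is passed in as a callable so the flow can be tried without model access.

```python
from typing import Callable, List


def suggest_repair(
    image_path: str,
    question: str,
    identify_defect: Callable[[str], str],
    retrieve_notes: Callable[[str], List[str]],
    generate_answer: Callable[[str], str],
) -> str:
    """Run the three steps in order: identify, retrieve, generate."""
    defect = identify_defect(image_path)   # step 1: vision model
    notes = retrieve_notes(defect)         # step 2: RAG lookup
    prompt = (
        f"Defect identified in the image: {defect}\n"
        "Relevant repair notes:\n" + "\n".join(notes) + "\n"
        f"Question: {question}"
    )
    return generate_answer(prompt)         # step 3: language model


# Hypothetical stand-ins so the sketch runs without any model or API key
def fake_identify(image_path: str) -> str:
    return "crack"


def fake_retrieve(defect: str) -> List[str]:
    notes = {"crack": ["Seal narrow cracks with epoxy injection."]}  # placeholder
    return notes.get(defect, [])


def fake_generate(prompt: str) -> str:
    return "ANSWER BASED ON:\n" + prompt


answer = suggest_repair(
    "wall.png", "How should this be repaired?",
    fake_identify, fake_retrieve, fake_generate,
)
print(answer)
```

In the real pipeline the three callables would wrap the vision model, the vector store retriever, and the language model respectively.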

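2.1 Model selection (required):

This section describes the technical approach in detail, including why the large models were chosen and what advantages RAG offers.
The image recognition model is microsoft/phi-3-vision-128k-instruct. At the time of writing it was among the more capable vision models available; it has been extensively pretrained and handles image understanding well.
Retrieval-Augmented Generation (RAG) is a method that combines information retrieval with a generative model.
RAG has the following advantages:
- Broader knowledge access
- Reduced hallucination
- Accuracy and contextual relevance
- Handling of long documents and complex queries
- Efficient use of resources
- Applicability across domains
The large language model is meta/llama-3.1-405b-instruct, chosen because it is openly available and highly accurate.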
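
The retrieve-then-generate idea behind RAG can be illustrated with a toy example. The sketch below is an assumption-laden stand-in: it ranks passages by simple word overlap instead of the embedding search the project uses, the function names are hypothetical, and the three passages are placeholders rather than real repair guidance. It only shows how retrieved text is placed into the prompt so the model answers from it.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(query: str, passages: List[str]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the passages below. "
        "If they do not cover the question, say so.\n"
        f"Passages:\n{context}\n"
        f"Question: {query}"
    )


# Placeholder knowledge base (illustrative text only)
documents = [
    "Honeycombing: remove loose material and patch with repair mortar.",
    "Surface cracks: clean the crack and fill it with sealant.",
    "Exposed rebar: remove rust, then restore the concrete cover.",
]

query = "how to fix surface cracks"
top = retrieve(query, documents, k=1)
prompt = build_grounded_prompt(query, top)
print(prompt)
```

Because the answer is generated from the retrieved passage rather than from the model's memory alone, this is where the reduced hallucination and contextual relevance listed above come from.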

2.2 Data construction:

Repair methods for the various concrete surface defects were compiled into a .txt file and vectorized with FAISS from langchain.vectorstores.
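
A minimal sketch of this step follows. Several details are assumptions, since the post does not show them: that the .txt file separates entries with blank lines, that the embeddings come from `NVIDIAEmbeddings` with its default model, and the helper names `split_notes` and `build_store`. The sample text is a placeholder. The heavy imports sit inside `build_store` so the splitting step can be tried without the packages or an API key.

```python
from typing import List


def split_notes(text: str, separator: str = "\n\n") -> List[str]:
    """Split the repair notes into one chunk per blank-line-separated entry."""
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]


def build_store(chunks: List[str]):
    """Embed the chunks and index them in FAISS (requires an NVIDIA API key)."""
    # Imported lazily; both packages must be installed for this function to run
    from langchain_community.vectorstores import FAISS
    from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

    embedder = NVIDIAEmbeddings()  # assumption: default embedding model
    return FAISS.from_texts(chunks, embedder)


# Placeholder content standing in for the real .txt file
sample_text = (
    "Cracks: clean the crack and fill it with sealant.\n\n"
    "Honeycombing: remove loose material and patch with repair mortar.\n\n"
)
chunks = split_notes(sample_text)
print(len(chunks), "chunks")

# store = build_store(chunks)  # uncomment once the API key is configured
```

The resulting `store` is the object that `chart_agent` later turns into a retriever with `store.as_retriever()`.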

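2.3 Feature integration (advanced):

Image recognition and RAG are combined into a single agent that outputs the recommended treatment for a concrete surface defect.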

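3 Implementation steps:

3.1 Environment setup (required):

This section describes how the development environment was set up, including the required software and libraries.
The project runs on NVIDIA AI Foundation Endpoints.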

import os
import base64
from operator import itemgetter

import matplotlib.pyplot as plt
import numpy as np

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableLambda
from langchain.schema.runnable.passthrough import RunnableAssign
from langchain_core.runnables import RunnableBranch
from langchain_core.runnables import RunnablePassthrough
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
import faiss
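
Before any of these imports can call a model, the endpoint needs an API key. The sketch below shows one way to check for it, assuming the key is supplied through the `NVIDIA_API_KEY` environment variable and starts with `nvapi-`. The helper names are hypothetical, and the package list in the comment is inferred from the imports above rather than stated in the post.

```python
# Packages inferred from the imports above (an assumption, not from the post):
#   pip install langchain langchain-community langchain-nvidia-ai-endpoints \
#       faiss-cpu gradio matplotlib numpy
import getpass
import os


def nvidia_key_is_set(env=os.environ) -> bool:
    """Return True if the environment holds a key of the expected form."""
    return env.get("NVIDIA_API_KEY", "").startswith("nvapi-")


def ensure_nvidia_key() -> None:
    """Prompt for the key once if it is not already configured."""
    if not nvidia_key_is_set():
        key = getpass.getpass("NVIDIA API key (starts with nvapi-): ")
        os.environ["NVIDIA_API_KEY"] = key


# ensure_nvidia_key()  # call once before creating ChatNVIDIA objects
print("key configured:", nvidia_key_is_set())
```

Reading the key from the environment keeps it out of the notebook or script itself.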

3.2 Code implementation (required):

3.2.1 chat_agent

def image2b64(image_file):
    # The original post does not show this helper; this is an assumed minimal
    # version that reads an image file and returns it as a base64 string.
    with open(image_file, "rb") as f:
        return base64.b64encode(f.read()).decode()


def chart_agent(image_path, user_input):
    # Gradio passes the uploaded image as a file path (type="filepath")
    image_b64 = image2b64(image_path)

    # Step 1: ask the vision model to identify the type of concrete defect
    image_reading = ChatNVIDIA(model="microsoft/phi-3-vision-128k-instruct")
    result = image_reading.invoke(f'Identifying types of concrete defects: <img src="data:image/png;base64,{image_b64}" />')

    # Step 2: retrieve the repair notes that match the identified defect.
    # `store` is assumed to be the FAISS vector store built in section 2.2.
    retriever = store.as_retriever()
    docs = retriever.invoke(result.content)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Step 3: build the prompt from the image analysis and the retrieved text
    llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Based on the image analysis and the following context, provide repair suggestions.\n"
                "<Image Analysis>\n{image_result}\n</Image Analysis>\n"
                "<Documents>\n{context}\n</Documents>"
            ),
            ("user", "{question}"),
        ]
    )

    # Fill the template with the retrieved text (not the retriever object)
    messages = prompt.invoke({
        "context": context,
        "image_result": result.content,
        "question": user_input
    })

    # Run the LLM and return plain text so the Gradio text output shows only the answer
    final_result = llm.invoke(messages)
    return final_result.content
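
One detail worth noting: the image tag in `chart_agent` always declares `image/png`, even when the uploaded file is a JPEG. The sketch below shows one possible way to derive the MIME type from the file name instead. `image_tag` is a hypothetical helper, and whether the endpoint treats a mismatched type differently is not stated in the post, so treat this as an optional refinement.

```python
import base64
import mimetypes


def image_tag(path: str, data: bytes) -> str:
    """Build the inline <img> tag, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        mime = "image/png"  # fall back to the type used in chart_agent
    b64 = base64.b64encode(data).decode()
    return f'<img src="data:{mime};base64,{b64}" />'


# Tiny stand-in payload; a real call would pass the bytes of the image file
print(image_tag("photo.jpg", b"abc"))
```

The returned string can replace the hand-written tag inside the prompt sent to the vision model.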

3.2.2 Interface

import gradio as gr
multi_modal_chart_agent = gr.Interface(fn=chart_agent,
                    inputs=[gr.Image(label="Upload image", type="filepath"), 'text'],
                    outputs=['text'],
                    title="Multi Modal chat agent",
                    description="Multi Modal chat agent",
                    allow_flagging="never")

multi_modal_chart_agent.launch(debug=True, share=False, show_api=False, server_port=5001, server_name="0.0.0.0")