Study Notes on "LangChain for LLM Application Development" (Part 4)

2025/1/19 23:16:47

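Preface

These are my study notes for the video course "LangChain for LLM Application Development" by Harrison Chase (creator of LangChain) and Andrew Ng. The original course is in English and is slow to access from mainland China. I have therefore reorganized it and replaced some of its content so that it is easier to follow there. Reading these notes is a quick way to learn the course material.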

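Course Overview

The course introduces LangChain, a powerful and easily extensible open-source framework for developing large language model (LLM) applications. Its building blocks (prompts, memory, chains, agents, and more) simplify LLM application development.

LangChain is still evolving quickly and some of its APIs are not yet stable, so part of the course code is outdated. I use v0.2, the latest version at the time of writing, and all code here runs under v0.2. In addition, the OpenAI models used in the course are hard to access from mainland China. I replaced them with the Chinese Kimi model and a self-hosted open-source model served with Ollama, which does not affect the learning.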

Refer to this article to obtain a Kimi API token.
Refer to this article to deploy your own model with Ollama.

The course has five parts:

  • Part 1
  • Part 2
  • Part 3
  • Part 4
  • Part 5


Course link

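Part 4

Evaluation

Building a Q&A Application

When building a complex LLM application, an important but difficult question is how to evaluate how well it works. Likewise, when we switch to a different LLM, how do we tell which model is better? And when we change the vector database or its parameters, did the results get better or worse? Next, we look at how to evaluate whether an LLM application's answers are correct.

First, we create the Q&A chain we used before.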

from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain.indexes import VectorstoreIndexCreator
from langchain.evaluation.qa import QAGenerateChain

# Ollama server address
base_url = 'http://localhost:11434'
# Model name
llm_model = 'qwen2'
# Test data file
file_path = 'product.csv'
# Create the chat model
llm = ChatOllama(base_url=base_url, model=llm_model)
# Load the test data (one document per CSV row)
loader = CSVLoader(file_path=file_path)
data = loader.load()
# Create the embeddings
embeddings = OllamaEmbeddings(base_url=base_url, model=llm_model)
# Build the vector index
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings
).from_loaders([loader])
# Create the Q&A chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        # String placed between retrieved documents in the prompt
        "document_separator": "<<<<>>>>>"
    }
)
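With chain_type="stuff", RetrievalQA concatenates all retrieved documents into one prompt and places the document_separator string between them. The debug output later in this part shows exactly that.

The sketch below reproduces that assembly step without any dependencies. It is not LangChain's implementation. The helper build_stuff_messages is hypothetical, and the system-prompt wording is copied from that debug log.

```python
# Separator between documents, same value as document_separator above.
SEPARATOR = "<<<<>>>>>"

# System prompt wording as it appears in the debug log later in this part.
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)


def build_stuff_messages(docs: list[str], question: str,
                         separator: str = SEPARATOR) -> list[tuple[str, str]]:
    """Hypothetical helper: stuff every retrieved document into a single
    context string and return (role, content) pairs for one LLM call."""
    context = separator.join(docs)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


docs = [
    "no: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。",
    "no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。",
]
for role, content in build_stuff_messages(docs, "高清投影仪支持高清视频播放吗?"):
    print(f"{role}: {content}")
```

Because every document goes into a single call, "stuff" is the simplest chain type. It only works while the retrieved documents fit in the model's context window.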

Adding Test Data

We can add some test data by picking a few rows from product.csv. For example, rows 11 and 12 look like this:

11,高清投影仪,"高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。"
12,智能手环,"监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。"

Because the data in this file was generated by an LLM, your rows will probably differ.
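The debug output later in this part shows how CSVLoader represents each row, for example "no: 11", "name: 高清投影仪", "description: ..." on separate lines. It appears to turn each CSV row into one document whose text has one "column: value" line per column.

Below is a minimal standard-library sketch of that transformation. It assumes product.csv has the header row "no,name,description". That header is inferred from the field names in the log and is not shown in the original file.

```python
import csv
import io

# Assumption: the header row is "no,name,description"; the two data rows
# are the sample rows 11 and 12 shown above.
SAMPLE_CSV = """no,name,description
11,高清投影仪,"高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。"
12,智能手环,"监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。"
"""


def rows_to_texts(csv_text: str) -> list[str]:
    """Render each CSV row as 'column: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        for row in reader
    ]


for text in rows_to_texts(SAMPLE_CSV):
    print(text)
    print("---")
```

Each of these strings corresponds to the page_content of one document, which is what gets embedded and later stuffed into the prompt.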

We write the questions and their expected answers. The test set is a list of dicts, each containing a query and an answer.

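examples = [
    {
        "query": "高清投影仪支持高清视频播放吗?",  # "Does the HD projector support HD video playback?"
        "answer": "是"  # "Yes"
    },
    {
        "query": "哪一款产品能监测心率?",  # "Which product can monitor heart rate?"
        "answer": "智能手环"  # "Smart band"
    }
]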

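We have created two test examples, but that is not enough, and writing them by hand takes time. Is there a more automated way? We can let the LLM generate them itself. In LangChain, QAGenerateChain has the LLM generate a test question and answer for each document.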

# Create the test-set generation chain
example_gen_chain = QAGenerateChain.from_llm(llm)
# Generate and parse the results (each document costs one LLM call, so we only use the first 5)
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
print(new_examples[0])

The first generated example looks roughly like this. It is worth reviewing each generated example to check that it is correct and appropriate.

{'qa_pairs': {'query': 'What features does the high-definition smart television have?', 'answer': 'The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.'}}

We can also turn on debug mode to see how this works.

import langchain
langchain.debug = True

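Put the code above at the start of the script and rerun it. The output below is long, so focus on its first part.

It shows that QAGenerateChain makes a separate LLM call for each document. Each call uses a prompt asking the LLM to act as a teacher and write a quiz question and answer based on the document. The reply comes back in a fixed QUESTION/ANSWER format, which LangChain then parses into a dict.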

[chain/start] [chain:QAGenerateChain] Entering Chain run with input:
[inputs]
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 1\nname: 高清智能电视\ndescription: 这款高清智能电视拥有4K超高清分辨率,内置智能系统,支持语音控制,提供丰富的娱乐体验。' metadata={'source': 'product.csv', 'row': 0}\n<End Document>"
  ]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 2\nname: 多功能料理机\ndescription: 集搅拌、打蛋、榨汁等多种功能于一身,操作简便,是厨房里的得力助手。' metadata={'source': 'product.csv', 'row': 1}\n<End Document>"
  ]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 3\nname: 无线蓝牙耳机\ndescription: 轻巧舒适,音质清晰,支持长时间续航,适合运动和日常使用。' metadata={'source': 'product.csv', 'row': 2}\n<End Document>"
  ]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。' metadata={'source': 'product.csv', 'row': 3}\n<End Document>"
  ]
}
[llm/start] [chain:QAGenerateChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\npage_content='no: 5\nname: 便携式榨汁机\ndescription: 小巧便携,操作简便,快速榨汁,适合健康生活需求。' metadata={'source': 'product.csv', 'row': 4}\n<End Document>"
  ]
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.50s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:27:28.132404919Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 15075322258,
          "load_duration": 4068642657,
          "prompt_eval_count": 146,
          "prompt_eval_duration": 3419985000,
          "eval_count": 48,
          "eval_duration": 7545190000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:27:28.132404919Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 15075322258,
              "load_duration": 4068642657,
              "prompt_eval_count": 146,
              "prompt_eval_duration": 3419985000,
              "eval_count": 48,
              "eval_duration": 7545190000
            },
            "type": "ai",
            "id": "run-e2282df6-a2bb-4b75-bd94-c6ee8338b339-0",
            "usage_metadata": {
              "input_tokens": 146,
              "output_tokens": 48,
              "total_tokens": 194
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen.",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:27:40.655928024Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 12512086251,
          "load_duration": 62599702,
          "prompt_eval_count": 145,
          "prompt_eval_duration": 1594234000,
          "eval_count": 69,
          "eval_duration": 10853358000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen.",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:27:40.655928024Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 12512086251,
              "load_duration": 62599702,
              "prompt_eval_count": 145,
              "prompt_eval_duration": 1594234000,
              "eval_count": 69,
              "eval_duration": 10853358000
            },
            "type": "ai",
            "id": "run-db59bd5a-e8c5-4ce4-be93-477b1f7beeeb-0",
            "usage_metadata": {
              "input_tokens": 145,
              "output_tokens": 69,
              "total_tokens": 214
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "QUESTION: What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \"无线蓝牙耳机\" offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. **Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the product's suitability for users who value convenience, comfort, and audio quality in their listening devices.",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:28:06.427487738Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 25761127075,
          "load_duration": 63109381,
          "prompt_eval_count": 139,
          "prompt_eval_duration": 1397453000,
          "eval_count": 162,
          "eval_duration": 24259968000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "QUESTION: What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \"无线蓝牙耳机\" offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. **Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the product's suitability for users who value convenience, comfort, and audio quality in their listening devices.",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:28:06.427487738Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 25761127075,
              "load_duration": 63109381,
              "prompt_eval_count": 139,
              "prompt_eval_duration": 1397453000,
              "eval_count": 162,
              "eval_duration": 24259968000
            },
            "type": "ai",
            "id": "run-3dc06185-da4e-4b56-b615-dcf831157fb2-0",
            "usage_metadata": {
              "input_tokens": 139,
              "output_tokens": 162,
              "total_tokens": 301
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:28:17.028896159Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 10589442660,
          "load_duration": 26054599,
          "prompt_eval_count": 139,
          "prompt_eval_duration": 1401741000,
          "eval_count": 61,
          "eval_duration": 9159878000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:28:17.028896159Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 10589442660,
              "load_duration": 26054599,
              "prompt_eval_count": 139,
              "prompt_eval_duration": 1401741000,
              "eval_count": 61,
              "eval_duration": 9159878000
            },
            "type": "ai",
            "id": "run-a489b5fe-7798-41f0-8380-e9bde0e8a889-0",
            "usage_metadata": {
              "input_tokens": 139,
              "output_tokens": 61,
              "total_tokens": 200
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[llm/end] [chain:QAGenerateChain > llm:ChatOllama] [75.51s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:28:28.529352164Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 11484086566,
          "load_duration": 62195060,
          "prompt_eval_count": 140,
          "prompt_eval_duration": 1362653000,
          "eval_count": 68,
          "eval_duration": 10018610000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:28:28.529352164Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 11484086566,
              "load_duration": 62195060,
              "prompt_eval_count": 140,
              "prompt_eval_duration": 1362653000,
              "eval_count": 68,
              "eval_duration": 10018610000
            },
            "type": "ai",
            "id": "run-5709894f-ab18-4a1e-9e7b-0b8acd1eeb6a-0",
            "usage_metadata": {
              "input_tokens": 140,
              "output_tokens": 68,
              "total_tokens": 208
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[chain/end] [chain:QAGenerateChain] [75.51s] Exiting Chain run with output:
{
  "outputs": [
    {
      "qa_pairs": {
        "query": "What features does the high-definition smart television have?",
        "answer": "The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience."
      }
    },
    {
      "qa_pairs": {
        "query": "What is the multifunctional kitchen appliance mentioned in the document capable of doing?",
        "answer": "The multifunctional kitchen appliance, known as '多功能料理机', can perform various tasks such as blending, whisking eggs, and juicing. It's designed for ease of operation and serves as a helpful tool in the kitchen."
      }
    },
    {
      "qa_pairs": {
        "query": "What are the features of the product with the name \"无线蓝牙耳机\" (Wireless Bluetooth Earphones)?",
        "answer": "The product named \"无线蓝牙耳机\" offers several features including:"
      }
    },
    {
      "qa_pairs": {
        "query": "What is the product being described and what are its main features?",
        "answer": "The product being described is a \"智能扫地机器人\" (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness."
      }
    },
    {
      "qa_pairs": {
        "query": "What is the product described in this document?",
        "answer": "The product described in this document is a portable juicer named \"便携式榨汁机\" (portable juice extractor). It's characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living."
      }
    }
  ]
}
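The final parsing step can be reproduced with a regular expression. The sketch below assumes a pattern similar to the one used by QAGenerateChain's output parser (QUESTION, then ANSWER); treat the exact pattern as an assumption.

It also explains why the third pair above ends at "offers several features including:". By default "." does not match a newline, so only the first line of a multi-line answer is captured.

```python
import re

# Assumption: a pattern similar to the one used by QAGenerateChain's parser.
# Without re.DOTALL, "." stops at a newline, so (.*) keeps only the first
# line of the answer.
QA_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)")


def parse_qa(text: str) -> dict[str, str]:
    """Split a 'QUESTION: ...' / 'ANSWER: ...' reply into a query/answer dict."""
    match = QA_PATTERN.search(text)
    if match is None:
        raise ValueError(f"could not parse a QA pair from: {text!r}")
    return {"query": match.group(1), "answer": match.group(2)}


single = "QUESTION: What is the product?\nANSWER: A portable juicer."
multi = ("QUESTION: What are the features?\n\n"
         "ANSWER: It offers several features including:\n"
         "1. Lightweight\n2. Clear sound")

print(parse_qa(single))
print(parse_qa(multi))  # the answer is cut off after its first line
```

If you need the full multi-line answer, compiling the pattern with re.DOTALL would capture it. Alternatively, review and fix such pairs by hand before using them.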

Next, we merge the hand-written test examples with the generated ones.

all_examples = examples + [ex['qa_pairs'] for ex in new_examples]
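Generated pairs can be truncated or otherwise malformed, like the third pair above whose answer stops at a colon. It can therefore be worth screening them while merging.

The sketch below is minimal. looks_incomplete and merge_examples are hypothetical helpers, and the heuristic (an empty field, or an answer ending with a colon) is only an illustration.

```python
def looks_incomplete(pair: dict) -> bool:
    """Hypothetical heuristic: flag a pair whose query or answer is empty,
    or whose answer ends with a colon (a sign of a truncated list)."""
    query = pair.get("query", "").strip()
    answer = pair.get("answer", "").strip()
    return not query or not answer or answer.endswith((":", ":"))


def merge_examples(manual: list[dict], generated: list[dict]) -> list[dict]:
    """Append the 'qa_pairs' of generated examples to the hand-written ones,
    dropping pairs that look incomplete."""
    pairs = [item["qa_pairs"] for item in generated]
    return manual + [pair for pair in pairs if not looks_incomplete(pair)]


manual = [{"query": "哪一款产品能监测心率?", "answer": "智能手环"}]
generated = [
    {"qa_pairs": {"query": "What is the product?", "answer": "A portable juicer."}},
    {"qa_pairs": {"query": "What are the features?",
                  "answer": "It offers several features including:"}},
]
print(merge_examples(manual, generated))
```

A heuristic like this only catches obvious breakage. Reviewing the generated pairs by eye is still the more reliable check.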

Manual Evaluation

We let the LLM answer the questions in our test set, starting with the first hand-written question.

response = qa.run(examples[0]["query"])
print(response)

In debug mode, the output looks like this.

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "高清投影仪支持高清视频播放吗?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "高清投影仪支持高清视频播放吗?",
  "context": "no: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。<<<<>>>>>no: 22\nname: 智能跑步机\ndescription: 多种运动模式,智能记录运动数据,适合家庭健身。<<<<>>>>>no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。<<<<>>>>>no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。"
}
[llm/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOllama] Entering LLM run with input:
{
  "prompts": [
    "System: Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nno: 11\nname: 高清投影仪\ndescription: 高亮度,高对比度,支持高清视频播放,适合家庭影院和商务演示。<<<<>>>>>no: 22\nname: 智能跑步机\ndescription: 多种运动模式,智能记录运动数据,适合家庭健身。<<<<>>>>>no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠,智能提醒,是健康生活的好伴侣。<<<<>>>>>no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。\nHuman: 高清投影仪支持高清视频播放吗?"
  ]
}
[llm/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOllama] [6.70s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "是的,高清投影仪支持高清视频播放。",
        "generation_info": {
          "model": "qwen2",
          "created_at": "2024-09-12T02:45:31.841247748Z",
          "message": {
            "role": "assistant",
            "content": ""
          },
          "done_reason": "stop",
          "done": true,
          "total_duration": 6682410396,
          "load_duration": 25734266,
          "prompt_eval_count": 211,
          "prompt_eval_duration": 5067573000,
          "eval_count": 12,
          "eval_duration": 1532113000
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "是的,高清投影仪支持高清视频播放。",
            "response_metadata": {
              "model": "qwen2",
              "created_at": "2024-09-12T02:45:31.841247748Z",
              "message": {
                "role": "assistant",
                "content": ""
              },
              "done_reason": "stop",
              "done": true,
              "total_duration": 6682410396,
              "load_duration": 25734266,
              "prompt_eval_count": 211,
              "prompt_eval_duration": 5067573000,
              "eval_count": 12,
              "eval_duration": 1532113000
            },
            "type": "ai",
            "id": "run-6ef3e8d8-425e-4f61-9c1d-9925a2277e8f-0",
            "usage_metadata": {
              "input_tokens": 211,
              "output_tokens": 12,
              "total_tokens": 223
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": null,
  "run": null
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] [6.70s] Exiting Chain run with output:
{
  "text": "是的,高清投影仪支持高清视频播放。"
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain] [6.71s] Exiting Chain run with output:
{
  "output_text": "是的,高清投影仪支持高清视频播放。"
}
[chain/end] [chain:RetrievalQA] [7.29s] Exiting Chain run with output:
{
  "result": "是的,高清投影仪支持高清视频播放。"
}
是的,高清投影仪支持高清视频播放。

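Here the stuff chain builds a prompt that includes the retrieved data and sends it to the LLM. The LLM answers "是的,高清投影仪支持高清视频播放。" ("Yes, the HD projector supports HD video playback."). The wording differs from our reference answer "是" ("Yes"), but the meaning is the same.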
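This is why a plain string comparison makes a poor grader. The sketch below uses two hypothetical helpers, exact_match and contains_match.

An exact comparison rejects this correct answer. A looser substring check accepts it, but it would also accept the opposite answer "不是" ("No"). That brittleness is the reason to let an LLM judge whether two answers agree in meaning.

```python
import string

# ASCII punctuation/whitespace plus common full-width Chinese punctuation.
_DROP = set(string.punctuation + string.whitespace + ",。!?:;、")


def normalize(text: str) -> str:
    """Lower-case the text and remove punctuation and whitespace."""
    return "".join(ch for ch in text.lower() if ch not in _DROP)


def exact_match(reference: str, prediction: str) -> bool:
    return normalize(reference) == normalize(prediction)


def contains_match(reference: str, prediction: str) -> bool:
    return normalize(reference) in normalize(prediction)


reference = "是"
prediction = "是的,高清投影仪支持高清视频播放。"
print(exact_match(reference, prediction))     # False
print(contains_match(reference, prediction))  # True
print(contains_match(reference, "不是"))       # True, even though it means "No"
```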

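Letting the LLM Evaluate Itself

What if we want to test every example? Do we have to compare them one by one? We can let the LLM do this as well: LangChain provides QAEvalChain to grade the results automatically.

First, turn debug mode off with langchain.debug = False to avoid excessive output.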

from langchain.evaluation.qa import QAEvalChain

# Get predictions from the Q&A chain for every test example
predictions = qa.apply(all_examples)
# The grader can reuse the earlier model or use a different one
llm = ChatOllama(base_url=base_url, model=llm_model)
# Create the evaluation chain
eval_chain = QAEvalChain.from_llm(llm)
# Grade each prediction against its reference answer
graded_outputs = eval_chain.evaluate(all_examples, predictions)
# Print every result
for i, eg in enumerate(all_examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()
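Before asking the LLM for a grade, evaluate has to pair each test example with the prediction at the same position. Below is a simplified, LLM-free sketch of that pairing.

It assumes the default keys query, answer and result seen in the loop above. build_grading_inputs is a hypothetical helper, not the library's code.

```python
def build_grading_inputs(examples: list[dict], predictions: list[dict],
                         question_key: str = "query",
                         answer_key: str = "answer",
                         prediction_key: str = "result") -> list[dict]:
    """Hypothetical helper: pair each example with its prediction by position,
    producing one grading input per test case."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    return [
        {
            "query": example[question_key],
            "answer": example[answer_key],
            "result": prediction[prediction_key],
        }
        for example, prediction in zip(examples, predictions)
    ]


sample_examples = [{"query": "哪一款产品能监测心率?", "answer": "智能手环"}]
sample_predictions = [{"query": "哪一款产品能监测心率?", "answer": "智能手环",
                       "result": "智能手环能监测心率。"}]
print(build_grading_inputs(sample_examples, sample_predictions))
```

Because the pairing is positional, the predictions must come from the same list in the same order. qa.apply(all_examples) provides exactly that.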

The output looks something like this.

Example 0:
Question: 高清投影仪支持高清视频播放吗?
Real Answer: 是
Predicted Answer: 是的,高清投影仪支持高清视频播放。
Predicted Grade: CORRECT

Example 1:
Question: 哪一款产品能监测心率?
Real Answer: 智能手环
Predicted Answer: 智能手环能监测心率。
Predicted Grade: CORRECT

Example 2:
Question: What features does the high-definition smart TV have according to the document?
Real Answer: The high-definition smart TV mentioned in the document has several notable features. It boasts a 4K ultra-high definition resolution, indicating an exceptionally clear picture quality. Additionally, it is equipped with an internal smart system which allows for various interactive functionalities. One of these capabilities includes voice control, suggesting users can operate or navigate through its features using their voice commands. Lastly, the TV offers a rich entertainment experience, implying that it may include access to streaming services, internet connectivity, and other multimedia content options to ensure users enjoy a varied range of programming.
Predicted Answer: I'm sorry, but I don't know the answer because the provided context doesn't mention a "high-definition smart TV". The context includes information about a high-definition projector, an automatic coffee machine, and an intelligent treadmill.
Predicted Grade: INCORRECT

Example 3:
Question: What is the product described in this document?
Real Answer: The product described in this document is a "multifunctional kitchen appliance" which combines various functions such as mixing, beating eggs and juicing. It's noted for its ease of use, making it a helpful tool in the kitchen.
Predicted Answer: The document describes several different products:

1. 高清投影仪 - A high-definition projector with high brightness and contrast, suitable for home cinema and business presentations.
2. 无线蓝牙耳机 - Wireless Bluetooth headphones that are lightweight, comfortable to wear, have clear sound quality, and offer long battery life, suitable for sports and daily use.
3. 全自动咖啡机 - An automated coffee machine that allows one-button operation and offers multiple coffee flavor choices, providing a professional coffee experience.
4. 智能跑步机 - A smart treadmill with various exercise modes and the ability to record workout data automatically, suitable for home fitness routines.

Each product has been characterized by its unique features and application scenarios as detailed in their descriptions.
Predicted Grade: INCORRECT

Example 4:
Question: What are the features of the product described in the document?
Real Answer: The product, named "wireless bluetooth headphones," is characterized by being lightweight and comfortable to wear. It offers clear sound quality and supports long-lasting battery life, making it suitable for both sports activities and everyday use.
Predicted Answer: The product described is an "高清投影仪" (High Definition Projector), which features high brightness, high contrast ratio, and support for high-definition video playback. It's suitable for both家庭影院 (home cinema) and 商务演示 (business presentations).

Another product mentioned is an "全自动咖啡机" (Fully Automatic Coffee Machine). This machine allows for one-touch operation with a variety of coffee taste choices, providing a professional coffee experience.

A third item highlighted is the "智能跑步机" (Smart Treadmill), which offers various exercise modes and can intelligently record workout data. It's ideal for家庭健身 (home fitness).

Lastly, there's an "智能扫地机器人" (Smart Vacuum Cleaning Robot) that autonomously plans its cleaning routes, has intelligent obstacle avoidance, frees up hands, and helps keep the home clean.
Predicted Grade: INCORRECT

Example 5:
Question: What is the description of the product "智能扫地机器人"?
Real Answer: The description of the product "智能扫地机器人" is that it automatically plans cleaning routes, has intelligent obstacle avoidance, frees up your hands, and keeps the house clean.
Predicted Answer: The description of the product "智能扫地机器人" is: 自动规划清扫路线,智能避障,解放双手,保持家中清洁。
Predicted Grade: CORRECT

Example 6:
Question: What is the description of the product '便携式榨汁机'?
Real Answer: The '便携式榨汁机' is described as being small, portable, easy to operate, fast at juicing and suitable for health living needs.
Predicted Answer: I don't know the answer to that question because there is no specific context provided for a '便携式榨汁机' (portable juicer).
Predicted Grade: CORRECT

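The output above contains 7 test examples: 2 written by hand and 5 generated. Each is printed as four lines: Question, Real Answer, Predicted Answer, and Predicted Grade.

- Real Answer is the reference answer from the test set, written by hand for the first two examples and created by QAGenerateChain for the rest.
- Predicted Answer is what our RetrievalQA chain (qa) returned.
- Predicted Grade is QAEvalChain's verdict on whether the two answers agree.

The generated questions here come from a different run than the debug output shown earlier, which is why their wording differs. Some examples passed the test, but not all. The grader can also be wrong: Example 6 was graded CORRECT even though the chain replied that it did not know the answer.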
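A single number is handy when comparing two models or two vector-store settings, and you can get one by counting the CORRECT grades. Below is a minimal sketch with hypothetical helpers.

A naive check of whether "CORRECT" appears in the grade would be wrong, because "INCORRECT" also contains "CORRECT". The grades below are copied from the sample run above.

```python
def is_correct(grade: str) -> bool:
    """Treat a grade as passing only if it starts with CORRECT;
    a substring check would also match INCORRECT."""
    return grade.strip().upper().startswith("CORRECT")


def pass_rate(graded_outputs: list[dict], key: str = "results") -> float:
    """Fraction of graded outputs whose grade counts as correct."""
    if not graded_outputs:
        return 0.0
    passed = sum(is_correct(item[key]) for item in graded_outputs)
    return passed / len(graded_outputs)


# Grades of the seven examples printed above.
grades = ["CORRECT", "CORRECT", "INCORRECT", "INCORRECT",
          "INCORRECT", "CORRECT", "CORRECT"]
graded_outputs = [{"results": grade} for grade in grades]
print(f"pass rate: {pass_rate(graded_outputs):.0%}")  # pass rate: 57%
```

Keep in mind that this number inherits the grader's mistakes: Example 6 counts as a pass here. Spot-checking the graded examples is still worthwhile.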

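The reference answer and the predicted answer come from two independent chain calls, so neither influences the other. Our questions are also often open-ended with no single fixed answer. We therefore need an LLM to judge whether the two answers are consistent.

In this part we learned how to use an LLM to build an automated testing pipeline that generates test data and grades the answers automatically. This makes it easy to produce large test sets and evaluate results quickly.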

(To be continued)

Next: Part 5
