LangChain 72: Using a reference to change String Evaluation results

news2025/4/12 6:44:48

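LangChain series articles

LangChain 60 Understanding LangChain Expression Language in depth 23: passing parameters through multiple chains, LangChain Expression Language (LCEL)
LangChain 61 Understanding LangChain Expression Language in depth 24: passing parameters through multiple chains, LangChain Expression Language (LCEL)
LangChain 62 Understanding LangChain Expression Language in depth 25: agents, LangChain Expression Language (LCEL)
LangChain 63 Understanding LangChain Expression Language in depth 26: generating and executing code, LangChain Expression Language (LCEL)
LangChain 64 Understanding LangChain Expression Language in depth 27: adding Moderation, LangChain Expression Language (LCEL)
LangChain 65 Understanding LangChain Expression Language in depth 28: cosine-similarity Router, Moderation, LangChain Expression Language (LCEL)
LangChain 66 Understanding LangChain Expression Language in depth 29: managing prompt window size, LangChain Expression Language (LCEL)
LangChain 67 Understanding LangChain Expression Language in depth 30: calling search-engine tools, LangChain Expression Language (LCEL)
LangChain 68 LLM Deployment: deployment options for large language models
LangChain 69 Getting started with the Pinecone vector database
LangChain 70 Evaluation: assessing and measuring performance and integrity on diverse data
LangChain 71 String Evaluation: string evaluators for measuring performance and integrity on diverse data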


1. Using reference labels

Some criteria, such as correctness, need a reference label to work properly. To supply one, initialize the labeled_criteria evaluator and call it with a reference string.
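To make the role of the reference concrete, here is a minimal sketch of how a grading prompt could differ with and without a reference label. The helper `build_grading_prompt` and its prompt wording are hypothetical; they are not LangChain's actual template and only illustrate that a labeled evaluator hands the reference to the grading model as ground truth.

```python
from typing import Optional


def build_grading_prompt(
    question: str,
    prediction: str,
    criterion: str,
    reference: Optional[str] = None,
) -> str:
    """Hypothetical helper: assemble the text a grading LLM would see."""
    parts = [
        f"[Input]: {question}",
        f"[Submission]: {prediction}",
        f"[Criterion]: {criterion}",
    ]
    # Only a labeled evaluator has a reference to show to the grader.
    if reference is not None:
        parts.append(f"[Reference]: {reference}")
    parts.append("Does the submission meet the criterion? Answer Y or N.")
    return "\n".join(parts)


unlabeled = build_grading_prompt(
    "What is the capital of the US?", "Topeka, KS", "correctness"
)
labeled = build_grading_prompt(
    "What is the capital of the US?",
    "Topeka, KS",
    "correctness",
    reference="The capital of the US is Topeka, KS",
)
print(labeled)
```

Without the reference line, the grader can only fall back on what it already believes; with it, the submission is judged against the supplied label.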

1.1 Default criteria

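In most cases you will want to define your own custom criteria (see below), but we also provide some common criteria that you can load with a single string. The pre-implemented criteria are listed below. Note that without a label, the LLM merely predicts what it thinks the best answer is; its judgment is not grounded in actual law or context.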

from langchain.evaluation import Criteria

# For a list of other default supported criteria, try calling `supported_default_criteria`
list(Criteria)

Output

[<Criteria.CONCISENESS: 'conciseness'>,
 <Criteria.RELEVANCE: 'relevance'>,
 <Criteria.CORRECTNESS: 'correctness'>,
 <Criteria.COHERENCE: 'coherence'>,
 <Criteria.HARMFULNESS: 'harmfulness'>,
 <Criteria.MALICIOUSNESS: 'maliciousness'>,
 <Criteria.HELPFULNESS: 'helpfulness'>,
 <Criteria.CONTROVERSIALITY: 'controversiality'>,
 <Criteria.MISOGYNY: 'misogyny'>,
 <Criteria.CRIMINALITY: 'criminality'>,
 <Criteria.INSENSITIVITY: 'insensitivity'>]
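The listing above shows that each criterion is an enum member with a string value, which is why a plain string such as "correctness" is enough to select one. The sketch below reproduces that lookup with a small stand-in enum; `DemoCriteria` is hypothetical and covers only three of the members shown, so it runs without LangChain installed.

```python
from enum import Enum


class DemoCriteria(Enum):
    """Hypothetical stand-in for a few members of langchain.evaluation.Criteria."""

    CONCISENESS = "conciseness"
    RELEVANCE = "relevance"
    CORRECTNESS = "correctness"


# Look a member up by its string value, as a loader could do internally.
member = DemoCriteria("correctness")
print(member is DemoCriteria.CORRECTNESS)

# An unknown string raises ValueError, which surfaces typos early.
try:
    DemoCriteria("corectness")
except ValueError as err:
    lookup_error = str(err)
    print("lookup failed:", lookup_error)

# The supported strings can be listed from the enum itself.
supported = [c.value for c in DemoCriteria]
print(supported)
```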

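The example below uses a reference label to change the capital of the US from Washington to Topeka, KS, so the evaluator grades the answer against the reference rather than against the model's own knowledge.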

from langchain.evaluation import load_evaluator
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

from dotenv import load_dotenv  # function that loads environment variables from a .env file
load_dotenv()  # actually load the environment variables

# from langchain.globals import set_debug  # function that sets debug mode in langchain
# set_debug(True)  # enable langchain's debug mode

# from langchain.evaluation import load_evaluator
# evaluator = load_evaluator("criteria", criteria="conciseness")

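# The evaluator name can also be given as an enum member instead of a string:
# EvaluatorType.LABELED_CRITERIA is equivalent to "labeled_criteria" used below
from langchain.evaluation import EvaluatorType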

question = "What is the capital of the US?"
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# We can even override the model's learned knowledge using ground truth labels
eval_result = evaluator.evaluate_strings(
    input="What is the capital of the US?",
    prediction="Topeka, KS",
    reference="The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023",
)
print(f'With ground truth: {eval_result["score"]}')
print('eval_result >> ', eval_result)

from langchain.evaluation import Criteria
# For a list of other default supported criteria, try calling `supported_default_criteria`
list_criteria = list(Criteria)
print('list_criteria >> ', list_criteria)

prompt = ChatPromptTemplate.from_template(
    "{topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
    {"topic": RunnablePassthrough()} 
    | prompt
    | model
    | output_parser
)
response = chain.invoke(question)
print('response >> ', response)

Output

(.venv)  ~/Workspace/LLM/langchain-llm-app/ [develop*] python Evaluate/criteria_correct.py
With ground truth: 1
eval_result >>  {'reasoning': 'The criterion for this task is the correctness of the submitted answer. The submission states that the capital of the US is Topeka, KS. \n\nThe reference provided confirms that the capital of the US is indeed Topeka, KS, having moved there from Washington D.C. on May 16, 2023. \n\nTherefore, the submission is correct, accurate, and factual according to the reference provided. \n\nThe submission meets the criterion.\n\nY', 'value': 'Y', 'score': 1}
list_criteria >>  [<Criteria.CONCISENESS: 'conciseness'>, <Criteria.RELEVANCE: 'relevance'>, <Criteria.CORRECTNESS: 'correctness'>, <Criteria.COHERENCE: 'coherence'>, <Criteria.HARMFULNESS: 'harmfulness'>, <Criteria.MALICIOUSNESS: 'maliciousness'>, <Criteria.HELPFULNESS: 'helpfulness'>, <Criteria.CONTROVERSIALITY: 'controversiality'>, <Criteria.MISOGYNY: 'misogyny'>, <Criteria.CRIMINALITY: 'criminality'>, <Criteria.INSENSITIVITY: 'insensitivity'>, <Criteria.DEPTH: 'depth'>, <Criteria.CREATIVITY: 'creativity'>, <Criteria.DETAIL: 'detail'>]
response >>  The capital of the United States is Washington, D.C.
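The output shows the shape of the evaluation result: the grader's free-text reasoning ends with a Y or N verdict, which appears as 'value' and as a numeric 'score'. The sketch below shows one way such a mapping could be derived from the grader's text. `parse_verdict` is a hypothetical helper written for illustration, not LangChain's parser, and it assumes the verdict is the last non-empty line.

```python
def parse_verdict(llm_text: str) -> dict:
    """Hypothetical parser: treat the last non-empty line as the Y/N verdict."""
    lines = [line.strip() for line in llm_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("Empty grader output")
    verdict = lines[-1].upper()
    if verdict not in ("Y", "N"):
        raise ValueError(f"Expected a final Y or N line, got {verdict!r}")
    return {
        # Keep the full text as reasoning, mirroring the output shown above.
        "reasoning": llm_text.strip(),
        "value": verdict,
        "score": 1 if verdict == "Y" else 0,
    }


sample = (
    "Therefore, the submission is correct, accurate, and factual "
    "according to the reference provided. \n\n"
    "The submission meets the criterion.\n\nY"
)
result = parse_verdict(sample)
print(result["value"], result["score"])  # prints: Y 1
```

Note the contrast in the run above: the evaluator scores "Topeka, KS" as correct because the reference says so, while the unlabeled chain still answers Washington, D.C.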