LangChain Tutorial - 3. OutputParser / Output Parsing


Preface

Code for this tutorial series: https://github.com/shar-pen/Langchain-MiniTutorial

I mainly followed the official LangChain tutorials and selectively recorded what I learned.

Tutorial list:

  • 1. First steps with langchain
  • 2. prompt
  • 3. OutputParser / Output parsing
  • 4. model / Deploying a model with vllm and calling it from langchain
  • 5. DocumentLoader / Various document loaders
  • 6. TextSplitter / Document splitting
  • 7. Embedding / Text vectorization
  • 8. VectorStore / Vector database storage and retrieval
  • 9. Retriever / Retrievers
  • 10. Reranker / Document reranking
  • 11. RAG pipeline / Multi-turn conversational RAG
  • 12. Agent / Tool definition / Agents calling tools / Agentic RAG

PydanticOutputParser

PydanticOutputParser is a class for converting a language model's output into structured information. Rather than a plain text response, it delivers the information in a clear, well-organized form.

With this class, the model's output is mapped onto a specific data model, making it easier to process and use.


Key methods

PydanticOutputParser relies mainly on two core methods.

1. get_format_instructions()

  • Provides instructions that define the format the language model's output should follow.
  • For example, it can return a string describing the data fields and their formatting requirements.
  • These instructions are essential for getting the model to produce structured output that matches a specific data model.

2. parse()

  • Takes the language model's output (usually a string) and parses and converts it into a specific data structure.
  • Uses Pydantic for validation: the input string is matched against a predefined schema and converted into a data structure that conforms to it (see the minimal sketch below).
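
A minimal sketch of the two methods in isolation, using a hypothetical Person model (not part of the email example that follows). get_format_instructions() only produces a prompt snippet; parse() turns a model response string into a validated Pydantic object.

from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class Person(BaseModel):
    name: str = Field(description="the person's name")
    age: int = Field(description="the person's age")

sketch_parser = PydanticOutputParser(pydantic_object=Person)

# 1. Instructions to embed in the prompt (a JSON-schema style description of Person)
print(sketch_parser.get_format_instructions())

# 2. Parsing a well-formed model response string into a Person instance
person = sketch_parser.parse('{"name": "Alice", "age": 30}')
print(person)  # name='Alice' age=30

The examples below use a locally served model exposed through an OpenAI-compatible endpoint: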
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)

Here is an example of using a parser to streamline the workflow. As a baseline for comparison, the email is first summarized with a plain StrOutputParser:

email_conversation = """
From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal
Dear Mr. Kim,

I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.

We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.

Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?

Thank you.

Best regards,
John
Senior Executive Director
Bike Corporation
"""
from itertools import chain
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser

prompt = PromptTemplate.from_template(
    "Please extract the important parts of the following email.\n\n{email_conversation}"
)

chain = prompt | llm | StrOutputParser()

answer = chain.stream({"email_conversation": email_conversation})


#  A function for real-time output (streaming)
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.

    This function iterates over each item in the 'response' iterable. If an item is an instance of AIMessageChunk, it extracts and prints the content.
    If the item is a string, it prints the string directly.
    Optionally, the function can return the concatenated string of all response chunks.

    Args:
    - response (iterable): An iterable of response chunks, which can be AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the concatenated response string. The default is False.

    Returns:
    - str: If `return_output` is True, the concatenated response string. Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer


output = stream_response(answer, return_output=True)
### Important Parts of the Email:

- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal

- **Key Points:**
  - John is the Senior Executive Director at Bike Corporation.
  - He learned about the "ZENESIS" bicycle model through a press release.
  - Bike Corporation is a leading company in bicycle manufacturing and distribution.
  - They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
  - A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.

- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office

- **Purpose:**
  - To discuss the possibilities for collaboration and further refine the distribution strategy and marketing plan for the ZENESIS model.

Without an output parser (PydanticOutputParser), you have to handle the data types and how to access the content yourself.

answer = chain.invoke({"email_conversation": email_conversation})
print(answer)
### Important Parts of the Email:

- **From:** John (John@bikecorporation.me)
- **To:** Kim (Kim@teddyinternational.me)
- **Subject:** "ZENESIS" bike distribution cooperation and meeting schedule proposal

- **Key Points:**
  - John is the Senior Executive Director at Bike Corporation.
  - He learned about the "ZENESIS" bicycle model through a press release.
  - Bike Corporation is a leading company in bicycle manufacturing and distribution.
  - They are requesting a detailed brochure for the ZENESIS model, specifically needing information on technical specifications, battery performance, and design aspects.
  - A meeting is proposed for Tuesday, January 15th, at 10:00 AM at Kim's office to discuss collaboration possibilities in more detail.

- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office

- **Follow-Up:**
  - John requests a detailed brochure for the ZENESIS model.
  - He is interested in discussing potential distribution and marketing strategies.

Using PydanticOutputParser

Given email content like the one above, we use the following Pydantic-style class to parse the email information.

For reference, the description inside each Field guides the extraction of key information from the text response. The LLM relies on these descriptions to pull out the required fields, so keeping them accurate and clear is essential.

from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )


# Create PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"person": {"description": "The sender of the email", "title": "Person", "type": "string"}, "email": {"description": "The email address of the sender", "title": "Email", "type": "string"}, "subject": {"description": "The subject of the email", "title": "Subject", "type": "string"}, "summary": {"description": "A summary of the email content", "title": "Summary", "type": "string"}, "date": {"description": "The meeting date and time mentioned in the email content", "title": "Date", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}
```

Nothing has been parsed yet; this is only the instruction describing how the output should be formatted. Note the schema part:

{"properties": {"person": {"description": "The sender of the email", "title": "Person", "type": "string"}, "email": {"description": "The email address of the sender", "title": "Email", "type": "string"}, "subject": {"description": "The subject of the email", "title": "Subject", "type": "string"}, "summary": {"description": "A summary of the email content", "title": "Summary", "type": "string"}, "date": {"description": "The meeting date and time mentioned in the email content", "title": "Date", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}

It spells out the field names, data types, and descriptions of parser.pydantic_object, i.e. the EmailSummary model.

Next, these output requirements can be added to the prompt.

Define the prompt with:

  1. question: receives the user's question.
  2. email_conversation: the content of the email conversation.
  3. format: specifies the output format.
prompt = PromptTemplate.from_template(
    """
You are a helpful assistant. 

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)

# Add partial formatting of PydanticOutputParser to format
prompt = prompt.partial(format=parser.get_format_instructions())
chain = prompt | llm

response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The result is provided in JSON format.
output = stream_response(response, return_output=True)
```json
{
  "person": "John",
  "email": "John@bikecorporation.me",
  "subject": "ZENESIS bike distribution cooperation and meeting schedule proposal",
  "summary": "John, representing Bike Corporation, is interested in the ZENESIS bicycle model and requests a detailed brochure with technical specifications, battery performance, and design aspects. He proposes a meeting on January 15th, 2024, at 10:00 AM to discuss the possibilities for collaboration.",
  "date": "January 15th, 2024, at 10:00 AM"
}
```

As you can see, the model produced JSON in the requested format, which can then be parsed with parser.parse().

structured_output = parser.parse(output)
structured_output
EmailSummary(person='John', email='John@bikecorporation.me', subject='ZENESIS bike distribution cooperation and meeting schedule proposal', summary='John, representing Bike Corporation, is interested in the ZENESIS bicycle model and requests a detailed brochure with technical specifications, battery performance, and design aspects. He proposes a meeting on January 15th, 2024, at 10:00 AM to discuss the possibilities for collaboration.', date='January 15th, 2024, at 10:00 AM')

Normally, the parser is simply attached to the end of the chain.

# Reconstruct the entire chain by adding an output parser.
chain = prompt | llm | parser
# Execute the chain and print the results.
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The results are output in the form of an EmailSummary object.
print(response)
person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary="John, representing Bike Corporation, is interested in the ZENESIS bicycle model and requests a detailed brochure with technical specifications, battery performance, and design aspects. He proposes a meeting on January 15th, 2024, at 10:00 AM at Kim's office to discuss collaboration possibilities." date='January 15th, 2024, at 10:00 AM'

with_structured_output(Pydantic)

By using .with_structured_output(PydanticModel), you attach an output parser to the model and have its output converted into a Pydantic object. Note, however, that this method itself does not support streaming generation.

llm_with_structured = llm.with_structured_output(EmailSummary)
answer = llm_with_structured.invoke(email_conversation)
answer
EmailSummary(person='John', email='John@bikecorporation.me', subject='ZENESIS bike distribution cooperation and meeting schedule proposal', summary="John, from Bike Corporation, is interested in the ZENESIS bicycle model and would like a detailed brochure with technical specifications, battery performance, and design aspects. He proposes a meeting on January 15th at 10:00 AM at Kim's office to discuss the possibilities for collaboration.", date='January 15th, 10:00 AM')
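
A minimal sketch (reusing the EmailSummary model, the llm, and email_conversation from above) showing that the structured model can also be composed into a chain with a prompt; the prompt wording here is only illustrative.

from langchain_core.prompts import PromptTemplate

structured_chain = (
    PromptTemplate.from_template(
        "Extract the key information from the following email.\n\n{email_conversation}"
    )
    | llm.with_structured_output(EmailSummary)
)

# invoke() returns an EmailSummary instance directly
summary = structured_chain.invoke({"email_conversation": email_conversation})
print(summary.subject)
print(summary.date)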

CommaSeparatedListOutputParser

CommaSeparatedListOutputParser is a LangChain parser designed specifically for producing structured output in the form of a comma-separated list.

It simplifies extracting and presenting data as a clear, concise list, which is especially handy for organizing data points, names, items, or other structured values. Using this parser improves data clarity, keeps the format consistent, and streamlines workflows in applications that need structured output.

from langchain_core.output_parsers import CommaSeparatedListOutputParser

# Initialize the output parser
output_parser = CommaSeparatedListOutputParser()

# Retrieve format instructions for the output parser
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`
from langchain_core.prompts import PromptTemplate

# Define the prompt template
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],  # 'subject' will be dynamically replaced
    partial_variables={
        "format_instructions": format_instructions
    },  # Use parser's format instructions
)
print(prompt)
input_variables=['subject'] input_types={} partial_variables={'format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'} template='List five {subject}.\n{format_instructions}'
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)
chain = prompt | llm | output_parser

# Run the chain with a specific subject
result = chain.invoke({"subject": "famous landmarks in South Korea"})
print(result)

['Gyeongbokgung Palace', 'N Seoul Tower', 'Bukchon Hanok Village', 'Seoraksan National Park', 'Gwangjang Market']

Accessing the data with Python indexing

Because CommaSeparatedListOutputParser automatically formats the output as a Python list, individual elements can be accessed easily by index.


print("First Landmark:", result[0])
print("Second Landmark:", result[1])
print("Last Landmark:", result[-1])

StructuredOutputParser

StructuredOutputParser is a useful tool for formatting a large language model's (LLM) response into a dictionary, so that multiple fields are returned as key/value pairs.

Although the Pydantic and JSON parsers are more powerful, StructuredOutputParser is particularly effective with weaker models, such as local models with fewer parameters. It is especially useful when the model is less capable than advanced models like GPT or Claude.

With StructuredOutputParser, developers can maintain data integrity and consistency across LLM applications and still get stable output even when running on models with fewer parameters.

Using ResponseSchema with StructuredOutputParser

  • Define the response schema with the ResponseSchema class to cover the answer to the user's question and a description of the source (website) that was used.
  • Initialize StructuredOutputParser with response_schemas so the output conforms to the defined response schema.

[Note]
When working with local models, Pydantic parsers often fail to work properly. In such cases, StructuredOutputParser is a good alternative.

from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Response to the user's question
response_schemas = [
    ResponseSchema(name="answer", description="Answer to the user's question"),
    ResponseSchema(
        name="source",
        description="The `source` used to answer the user's question, which should be a `website URL`.",
    ),
]
# Initialize the structured output parser based on the response schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

from langchain_core.prompts import PromptTemplate
# Parse the format instructions.
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    # Set up the template to answer the user's question as best as possible.
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    # Use 'question' as the input variable.
    input_variables=["question"],
    # Use 'format_instructions' as a partial variable.
    partial_variables={"format_instructions": format_instructions},
)
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)

chain = prompt | llm | output_parser  # Connect the prompt, model, and output parser

# Ask the question, "What is the largest desert in the world?"
chain.invoke({"question": "What is the largest desert in the world?"})
{'answer': 'The largest desert in the world is the Sahara Desert, which covers parts of North Africa.',
 'source': 'https://www.nationalgeographic.org/encyclopedia/sahara-desert/'}

PydanticOutputParser vs. StructuredOutputParser

Both parsers are tools provided by LangChain for formatting and parsing LLM output, but they differ in target scenarios and implementation. The main differences:

| Feature | PydanticOutputParser | StructuredOutputParser |
| --- | --- | --- |
| Data format | Based on a Pydantic data model; returns a Pydantic object | Returns a dictionary (dict); suited to simpler parsing |
| Target scenario | Strong LLMs (e.g. GPT-4, Claude) where a consistent data structure can be guaranteed | Weaker LLMs (e.g. small local models); lower demands on the model |
| Parsing mechanism | Uses Pydantic's validation to parse the data automatically | Defines the key/value structure via custom ResponseSchema |
| Error tolerance | Strict about format errors; mismatches raise an error | Lenient; fine for loosely structured requirements |
| Complex validation | Supported; field types and validation rules can be defined | Not supported; only key/value output |
| Suitable for local models | Less suitable; unstable local-model output easily breaks parsing | Suitable; copes with the instability of small local models |

Note that PydanticOutputParser parses into a pydantic.BaseModel instance, while StructuredOutputParser yields a dict, which also changes how you access the fields (see the sketch below).
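
A small sketch of the practical difference in how the results are accessed (hypothetical variable names: pydantic_result stands for the EmailSummary object returned by the PydanticOutputParser chain, dict_result for the dict returned by the StructuredOutputParser chain):

# PydanticOutputParser returns a BaseModel instance: attribute access, with type validation
print(pydantic_result.person)
print(pydantic_result.date)

# StructuredOutputParser returns a plain dict: key access, no validation beyond the schema keys
print(dict_result["answer"])
print(dict_result["source"])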

Summary

  • PydanticOutputParser: suited to complex structures and data validation; best with strong LLMs (GPT-4, Claude).
  • StructuredOutputParser: suited to small local models; simpler structure but more flexible.

JsonOutputParser

JsonOutputParser is a tool that lets the user specify the desired JSON schema. It is designed so that an LLM can be queried for data and return the result as JSON conforming to that schema.

For the LLM to process the data accurately and efficiently and produce well-formed JSON, it needs sufficient capability. For example, a llama-70B model has more capacity than a llama-8B model and is therefore better suited to handling complex data.

JSON (JavaScript Object Notation) is a lightweight data-interchange format for storing and organizing data. It is essential in web development and is widely used for communication between servers and clients. JSON is text-based, easy for humans to read, and easy for machines to parse and generate.

JSON data consists of **key-value pairs**, where the key is a string and the value can be one of several data types. JSON has two basic structures:

  • Object: a collection of key-value pairs enclosed in curly braces { }. Each key is separated from its value by a colon (:), and pairs are separated by commas (,).
  • Array: an ordered list of values enclosed in square brackets [ ], with the values separated by commas (,).
{
  "name": "John Doe",
  "age": 30,
  "is_student": false,
  "skills": ["Java", "Python", "JavaScript"],
  "address": {
    "street": "123 Main St",
    "city": "Anytown"
  }
}

1. With Pydantic

Similar to how PydanticOutputParser is set up, you still define a BaseModel, but the result is parsed into a dict.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)

# Use Pydantic to define the data schema for the output format.
class Topic(BaseModel):
    description: str = Field(description="A concise description of the topic")
    hashtags: str = Field(description="Keywords in hashtag format (at least 2)")

# Write your question
question = "Please explain the severity of global warming."

# Set up the parser and inject the instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Topic)

# Set up the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a friendly AI assistant. Answer questions concisely."),
        ("user", "#Format: {format_instructions}\n\n#Question: {question}"),
    ]
)

prompt = prompt.partial(format_instructions=parser.get_format_instructions())

# Combine the prompt, model, and JsonOutputParser into a chain
chain = prompt | llm | parser

# Run the chain with your question
answer = chain.invoke({"question": question})

print(answer)
print(type(answer))
{'description': 'Global warming poses severe threats to ecosystems, human health, and economic stability. It leads to rising sea levels, more frequent extreme weather events, and disruptions to agricultural productivity. The consequences are far-reaching, affecting biodiversity, water resources, and global food security.', 'hashtags': '#globalwarming #climatechange #severethreats #ecosystemdisruption'}
<class 'dict'>

2. Without Pydantic

In this case the format instruction is just the bare statement 'Return a JSON object.', so you also need to describe the expected output fields elsewhere in the prompt.

# Write your question
question = "Please provide information about global warming. Include the explanation in description and the related keywords in `hashtags`."

# Initialize JsonOutputParser
parser = JsonOutputParser()

# Set up the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a friendly AI assistant. Answer questions concisely."),
        ("user", "#Format: {format_instructions}\n\n#Question: {question}"),
    ]
)

# Inject instruction to prompt
prompt = prompt.partial(format_instructions=parser.get_format_instructions())

# Combine the prompt, model, and JsonOutputParser into a chain
chain = prompt | llm | parser

# Run the chain with your question
response = chain.invoke({"question": question})
print(response)
{'description': "Global warming refers to the long-term increase in Earth's average surface temperature due to human activities, primarily the emission of greenhouse gases like carbon dioxide and methane. This leads to various environmental impacts such as rising sea levels, more frequent extreme weather events, and loss of biodiversity. Mitigation strategies include reducing carbon emissions, increasing use of renewable energy, and promoting sustainable practices.", 'hashtags': ['globalwarming', 'climatechange', 'greenhousegases', 'carbonfootprint', 'renewableenergy', 'sustainability', 'environmentalimpact']}
parser.get_format_instructions()
'Return a JSON object.'

DatetimeOutputParser

DatetimeOutputParser is an output parser that produces structured output in the form of a datetime object.

By converting the LLM's output into a datetime object, it enables more systematic and consistent handling of date and time data, which makes it very useful for data processing and analysis.

If you need the output as a date or time, LangChain's DatetimeOutputParser simplifies the process.

The format of DatetimeOutputParser can be specified using the format codes in the table below:

| Format code | Description | Example |
| --- | --- | --- |
| %Y | 4-digit year | 2024 |
| %y | 2-digit year | 24 |
| %m | 2-digit month | 07 |
| %d | 2-digit day | 04 |
| %H | Hour (24-hour clock) | 14 |
| %I | Hour (12-hour clock) | 02 |
| %p | AM or PM | PM |
| %M | 2-digit minute | 45 |
| %S | 2-digit second | 08 |
| %f | 6-digit microsecond | 000123 |
| %z | UTC offset | +0900 |
| %Z | Time zone name | KST |
| %a | Abbreviated weekday name | Thu |
| %A | Full weekday name | Thursday |
| %b | Abbreviated month name | Jul |
| %B | Full month name | July |
| %c | Full date and time | Thu Jul 4 14:45:08 2024 |
| %x | Full date | 07/04/24 |
| %X | Full time | 14:45:08 |

from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate

# Initialize the output parser
# The `format` argument is the pattern the answer must follow; the result is converted into a datetime instance
output_parser = DatetimeOutputParser(format="%Y-%m-%d")

# Get format instructions
format_instructions = output_parser.get_format_instructions()

# Create answer template for user questions
template = """Answer the users question:\n\n#Format Instructions: \n{format_instructions}\n\n#Question: \n{question}\n\n#Answer:"""

# Create a prompt from the template
prompt = PromptTemplate.from_template(
    template,
    partial_variables={
        "format_instructions": format_instructions,
    },  # Use parser's format instructions
)

print(format_instructions)
print("-----------------------------------------------\n")
print(prompt)
Write a datetime string that matches the following pattern: '%Y-%m-%d'.

Examples: 1947-12-08, 1650-03-11, 177-12-05

Return ONLY this string, no other words!
-----------------------------------------------

input_variables=['question'] input_types={} partial_variables={'format_instructions': "Write a datetime string that matches the following pattern: '%Y-%m-%d'.\n\nExamples: 1947-12-08, 1650-03-11, 177-12-05\n\nReturn ONLY this string, no other words!"} template='Answer the users question:\n\n#Format Instructions: \n{format_instructions}\n\n#Question: \n{question}\n\n#Answer:'
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)

chain = prompt | llm | output_parser

# Call the chain to get an answer to the question
output = chain.invoke({"question": "The year Google was founded"})

print(output)
print(type(output))
1998-09-04 00:00:00
<class 'datetime.datetime'>
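
Since the parsed result is a standard datetime.datetime object, the usual datetime methods apply; for example:

# Reformat or inspect the parsed datetime directly
print(output.strftime("%Y-%m-%d"))  # 1998-09-04
print(output.year)                  # 1998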

EnumOutputParser

EnumOutputParser is a tool that strictly parses a language model's output into a predefined enumeration (Enum) value, ensuring the output is always one of the enum members. Its main features:

  • Enum parsing: converts the string output into a predefined Enum value.
  • Type safety: guarantees the parsed result is always one of the defined Enum values.
  • Flexibility: automatically handles surrounding whitespace and newlines.

Use cases

  • When you want exactly one valid value chosen from a fixed set of options.
  • Using explicit Enum values avoids typos and inconsistent variants.

In the example below, we define a Colors enum and parse the output so the LLM can only return one of Red, Green, or Blue.

from langchain.output_parsers.enum import EnumOutputParser

from enum import Enum

class Colors(Enum):
    RED = "Red"
    GREEN = "Green"
    BLUE = "Blue"

# Instantiate EnumOutputParser
parser = EnumOutputParser(enum=Colors)

# You can view the format instructions that the parser expects.
print(parser.get_format_instructions())
Select one of the following options: Red, Green, Blue
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI


llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)

# Prompt template: the parser's format instructions are added at the end.
prompt = (
    PromptTemplate.from_template(
        """Which color is this object?

Object: {object}

Instructions: {instructions}"""
    ).partial(instructions=parser.get_format_instructions())
)

# Entire chain: (prompt) -> (LLM) -> (Enum Parser)
chain = prompt | llm | parser

response = chain.invoke({"object": "sky"})
print("Parsed Enum:", response)
print("Raw Enum Value:", response.value)
Parsed Enum: Colors.BLUE
Raw Enum Value: Blue

If the model's output is not one of the Enum values, parsing raises an error, as sketched below.
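
A minimal sketch (using the parser defined above) of what happens when the response does not match any enum member:

from langchain_core.exceptions import OutputParserException

try:
    parser.parse("Purple")  # not a member of Colors
except OutputParserException as e:
    print("Parsing failed:", e)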

OutputFixingParser

OutputFixingParser is an automated mechanism provided by LangChain for correcting errors that occur during parsing. It wraps another parser (such as PydanticOutputParser); when the underlying parser hits malformed output or output that does not match the expected format, it steps in and uses an additional LLM call to fix the error so the output conforms to the correct format.

OutputFixingParser is mainly used when the initial output does not match the predefined schema. If something goes wrong, it automatically detects the formatting error and submits a new request to the model containing concrete correction instructions. Those instructions point out exactly what is wrong and give clear guidance for fixing it, so the data ends up in the expected format.

This is very useful in applications that must strictly follow a data schema. For example, when PydanticOutputParser is used to produce output matching a specific schema, problems such as missing fields or wrong data types can occur.

How OutputFixingParser works

  1. Error detection: recognizes that the output does not match the predefined schema.
  2. Error correction: issues a new request to the LLM with explicit correction instructions.
  3. Reformatting the output: the instructions pinpoint the problem (e.g. a missing field or a wrong data type) and guide the LLM to reformat its output to match the expected schema.

Suppose you are using PydanticOutputParser to enforce a schema that requires the following fields:

  • name (string)
  • age (integer)
  • email (string)

If the LLM's output:

  • is missing the age field, or
  • has an email that is not a valid string format,

then OutputFixingParser automatically steps in and sends the LLM a new request, for example:

  • "The output is missing the age field; please add an appropriate integer value."
  • "The email field has an invalid format; please correct it to a valid email address format."

This iterative correction process ensures the final output matches the specified schema without manual intervention.

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List


# Define the Actor class using Pydantic
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


# A query to generate the filmography for a random actor
actor_query = "Generate the filmography for a random actor."

# Use PydanticOutputParser to parse the output into an Actor object
parser = PydanticOutputParser(pydantic_object=Actor)

Trying to parse misformatted input

  • The misformatted variable holds a badly formatted string that does not match the expected structure (it uses single quotes ' instead of double quotes ").
  • Calling parser.parse() on it raises an error because the format does not match.
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
parser.parse(misformatted)
---------------------------------------------------------------------------

JSONDecodeError                           Traceback (most recent call last)

File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/json.py:83, in JsonOutputParser.parse_result(self, result, partial)
     82 try:
---> 83     return parse_json_markdown(text)
     84 except JSONDecodeError as e:


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/utils/json.py:144, in parse_json_markdown(json_string, parser)
    143     json_str = json_string if match is None else match.group(2)
--> 144 return _parse_json(json_str, parser=parser)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/utils/json.py:160, in _parse_json(json_str, parser)
    159 # Parse the JSON string into a Python dictionary
--> 160 return parser(json_str)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/utils/json.py:118, in parse_partial_json(s, strict)
    115 # If we got here, we ran out of characters to remove
    116 # and still couldn't parse the string as JSON, so return the parse error
    117 # for the original string.
--> 118 return json.loads(s, strict=strict)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/json/__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:


JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)


The above exception was the direct cause of the following exception:


OutputParserException                     Traceback (most recent call last)

Cell In[34], line 2
      1 misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
----> 2 parser.parse(misformatted)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/pydantic.py:83, in PydanticOutputParser.parse(self, text)
     74 def parse(self, text: str) -> TBaseModel:
     75     """Parse the output of an LLM call to a pydantic object.
     76 
     77     Args:
   (...)
     81         The parsed pydantic object.
     82     """
---> 83     return super().parse(text)


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/json.py:97, in JsonOutputParser.parse(self, text)
     88 def parse(self, text: str) -> Any:
     89     """Parse the output of an LLM call to a JSON object.
     90 
     91     Args:
   (...)
     95         The parsed JSON object.
     96     """
---> 97     return self.parse_result([Generation(text=text)])


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/pydantic.py:72, in PydanticOutputParser.parse_result(self, result, partial)
     70 if partial:
     71     return None
---> 72 raise e


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/pydantic.py:67, in PydanticOutputParser.parse_result(self, result, partial)
     54 """Parse the result of an LLM call to a pydantic object.
     55 
     56 Args:
   (...)
     64     The parsed pydantic object.
     65 """
     66 try:
---> 67     json_object = super().parse_result(result)
     68     return self._parse_obj(json_object)
     69 except OutputParserException as e:


File /data02/hyzhang10/miniconda3/envs/xp-nlp/lib/python3.12/site-packages/langchain_core/output_parsers/json.py:86, in JsonOutputParser.parse_result(self, result, partial)
     84 except JSONDecodeError as e:
     85     msg = f"Invalid json output: {text}"
---> 86     raise OutputParserException(msg, llm_output=text) from e


OutputParserException: Invalid json output: {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE 

Using OutputFixingParser to fix formatting errors

Set up OutputFixingParser to correct errors automatically

  • OutputFixingParser wraps the existing PydanticOutputParser and fixes errors automatically by sending an additional request to the LLM.
  • The from_llm() method connects OutputFixingParser with ChatOpenAI to correct formatting problems in the output.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers import OutputFixingParser

llm = ChatOpenAI(
	base_url='http://localhost:5551/v1',
	api_key='EMPTY',
	model_name='Qwen2.5-7B-Instruct',
	temperature=0.2,
)


# Define a custom prompt to provide the fixing instructions
fixing_prompt = PromptTemplate(
    template=(
        "The following JSON is incorrectly formatted or incomplete: {completion}\n"
    ),
    input_variables=[
        "completion",
    ],
)

# Use OutputFixingParser to automatically fix the error
new_parser = OutputFixingParser.from_llm(
    parser=parser, llm=llm, prompt=fixing_prompt
)

Parsing the misformatted output with OutputFixingParser

  • Use the new_parser.parse() method to parse the misformatted data.
  • OutputFixingParser automatically corrects the errors in the data and produces a valid Actor object.
# Attempt to parse the misformatted JSON with Exception Handling
try:
    actor = new_parser.parse(misformatted)
    print("Parsed actor:", actor)
except Exception as e:
    print("Error while parsing:", e)
Parsed actor: name='Tom Hanks' film_names=['Forrest Gump', 'Cast Away', 'Saving Private Ryan']

What OutputFixingParser can do

  • Fix formatting errors (e.g. broken JSON structure, replacing single quotes with double quotes).
  • Correct minor data errors (e.g. small spelling mistakes in strings, number format adjustments).
  • Rely on the LLM to fill in data (if the LLM is capable enough, it may complete missing fields).

OutputFixingParser cannot automatically infer the correct values of missing variables.
If some variables are missing from the initial LLM output (e.g. the age field) and the fixing step does not consult the original prompt, OutputFixingParser may not fill in the right data on its own; instead it may raise an error or return None.

Why?

  1. OutputFixingParser only makes a second call to the LLM; its main job is format correction, not content inference.
  2. It does not "remember" the original prompt, so it cannot fill in missing fields from context (unless the LLM infers them on its own).
  3. The LLM will generally not fill in missing fields unless you explicitly tell it how to correct the data.

You can improve fixing_prompt so that during the fix the LLM not only corrects the format but also fills in missing variables, for example:

# An improved fixing prompt: it carries the original question and the format
# instructions so the LLM can fill in missing fields, not just fix the syntax.
# OutputFixingParser passes the broken output under the variable name "completion",
# so that name is kept; the question and format are baked in as partial variables.
fixing_prompt = PromptTemplate(
    template=(
        "Question:\n{question}\n"
        "Format:\n{format}\n"
        "Answer:\n{completion}\n"
        "The above answer is incorrectly formatted or incomplete.\n"
        "Correct any format mistake. "
        "If any required field is missing, generate a reasonable value according to the question.\n"
    ),
    input_variables=["completion"],
    partial_variables={
        "format": parser.get_format_instructions(),
        "question": actor_query,
    },
)
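
A sketch of wiring the improved prompt back into OutputFixingParser (same parser, llm, and Actor schema as above). Whether the missing field is actually filled in with a sensible value still depends on the model, so treat this as an illustration rather than a guarantee.

new_parser = OutputFixingParser.from_llm(parser=parser, llm=llm, prompt=fixing_prompt)

# A broken completion that is also missing the required `film_names` field
incomplete = "{'name': 'Tom Hanks'}"
fixed_actor = new_parser.parse(incomplete)
print(fixed_actor)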
