【吴恩达deeplearning.ai】基于LangChain开发大语言应用模型（上）

以下内容均整理来自deeplearning.ai的同名课程

Location 课程访问地址

DLAI - Learning Platform Beta (deeplearning.ai)

一、什么是LangChain

1、LangChain介绍

LangChain是一个框架，用于开发由大语言模型驱动的应用程序。开发者相信，最强大的、差异化的应用不仅会调用语言模型，而且还会具备以下原则：
数据感知：将语言模型与其他数据源连接起来。
代理性：允许语言模型与环境互动

LangChain支持python和javascript两种语言。专注于组合和模块化。
官方文档：https://python.langchain.com/en/latest/
中文文档：https://www.langchain.com.cn/

2、LangChain的模块化能力

包括大量的整合对话模型、聊天模型；提示词模板，输出分析器，示例选择器。

支持检索和调用其他数据源，包括不限于文本、数组，支持多个数据检索工具。

支持搭建对话链模板，按输入信息，自动生成标准化加工后的输出结果。

可调用多个预设或者自定义的算法和小工具。

二、模型、提示词和输出解析器Models, Prompts and Output Parsers

1、Prompt template提示词模板

通常来说，我们通过以下方式调用gpt

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, 
    )
    return response.choices[0].message["content"]
# 创建一个调用函数

prompt = f"""Translate the text \
that is delimited by triple backticks 
into a style that is {style}.
text: ```{customer_email}```
"""
# 编写提示语

response = get_completion(prompt)
#调用生成结果

现在看下langchain怎么基于模型进行调用

from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI(temperature=0.0)
# 加载langchain对话模型，并设置对话随机性为0

template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""
# 设计模板信息

from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(template_string)
# 加载提示语模板，载入模板信息

customer_style = """American English \
in a calm and respectful tone
"""
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""
# 定义模板中可变字段的变量信息

customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)
# 调用模板，对模板中的变量进行赋值，并生成最终提示语

customer_response = chat(customer_messages)
# 调用提示语，生成对话结果

通过“创建包含变量信息的提示词模板”，可以按照需求场景，灵活的通过改变变量信息，生成新的提示词。实现了模板的复用。

2、Output Parsers输出解析器

将大语言模型生成的结果，转换为特定结构的输出，如字典，数组等

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
# 加载输出解析器

gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema, 
                    delivery_days_schema,
                    price_value_schema]
# 创建一组解析规则

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
#编译解析规则

review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""
# 创建一个提示词模板，将编译好的解析规则添加到模板中

prompt = ChatPromptTemplate.from_template(template=review_template_2)
messages = prompt.format_messages(text=customer_review, 
                                format_instructions=format_instructions)
# 通过模板生成提示词信息

response = chat(messages)
# 生成结果
output_dict = output_parser.parse(response.content)
# 将生成结果存入字典中

三、Memory内存组件

大语言模型在通过接口调用过程中，并不会自动记忆历史问答/上下文（来进行回答）。而通过调用memory组件。langchain提供了多种记忆历史问答/上下文的方式。

Outline概要

ConversationBufferMemory
ConversationBufferWindowMemory
ConversationTokenBufferMemory
ConversationSummaryMemory

ConversationBufferMemory对话内存

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
# 加载所需包

llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
# 船创建一个对话，创建一个上下文储存区，创建一个链式沟通会话。

conversation.predict(input="Hi, my name is Andrew")
conversation.predict(input="What is 1+1?")
conversation.predict(input="What is my name?")
#在会话中添加会话内容，程序会自动将提问和回答一起保存到上下文储存区

print(memory.buffer)
memory.load_memory_variables({})
#显示上下文储存区内保存的会话内容

memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})
#直接对上下文储存区内的会话内容进行赋值（赋值内容为问答对）

ConversationBufferWindowMemory有限对话内存

from langchain.memory import ConversationBufferWindowMemory
# 加载组件

memory = ConversationBufferWindowMemory(k=1)
# 添加一个只有1空间的记忆内存

memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
# 此时，上下文储存区里面，只有第二个对话的记忆，即在1空间情况下，程序只会记忆最新的1空间的问答记忆。

ConversationTokenBufferMemory有限词汇内存

from langchain.memory import ConversationTokenBufferMemory
from langchain.llms import OpenAI
llm = ChatOpenAI(temperature=0.0)
# 加载组件

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=30)
# 创建一个只有30词汇大小的记忆空间（因为有限空间的判断也会用到大预言模型，所以需要加载llm）

memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})
# 在这种情况下，程序只会保存不大于30个词汇的最新的问答，此时并不会强行保证问答都存在，仅包含答案也行。

memory.load_memory_variables({})
# 显示结果：{'history': 'AI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

ConversationSummaryMemory总结式记忆内存

from langchain.memory import ConversationSummaryBufferMemory
# 加载包

schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."
# 一个长内容

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
# 创建一个最大词汇量为100的上下文总结式记忆空间（需要大预言模型进行总结，所以加载模型）

memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})
# 添加对话

memory.load_memory_variables({})
# 显示结果为总结后的内容，通过总结将记忆内容缩短到100个词汇以内：{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments."}

conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)
conversation.predict(input="What would be a good demo to show?")
# 特别的，在对话中调用总结式记忆空间。会自动保存最新一段AI答的原文（不总结归纳）
# 并把其他对话内容进行总结。这样做可能是为了更好的获取回答，最后一段AI答价值很大，不宜信息缩减。

三、Chains对话链

Outline

LLMChain
Sequential Chains
- SimpleSequentialChain
- SequentialChain
Router Chain

LLMChain基础链

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
llm = ChatOpenAI(temperature=0.9)
# 加载包

prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe \
    a company that makes {product}?"
)
# 创建一个待变量product的提示词

chain = LLMChain(llm=llm, prompt=prompt)
# 创建一个基础对话链

product = "Queen Size Sheet Set"
chain.run(product)
# 提示词变量赋值，并获得回答

SimpleSequentialChain一般序列链

一般序列链可以将前一个链的输出结果，作为后一个链的输入。一般序列链有唯一输入和输出变量。

from langchain.chains import SimpleSequentialChain
llm = ChatOpenAI(temperature=0.9)
# 加载包

first_prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe \
    a company that makes {product}?"
)
# 提示词模板1，变量为product

chain_one = LLMChain(llm=llm, prompt=first_prompt)
# 链1

second_prompt = ChatPromptTemplate.from_template(
    "Write a 20 words description for the following \
    company:{company_name}"
)
# 提示词模板2，变量为company_name

chain_two = LLMChain(llm=llm, prompt=second_prompt)
# 链2

overall_simple_chain = SimpleSequentialChain(chains=[chain_one, chain_two],
                                             verbose=True)
overall_simple_chain.run(product)
# 组合链1、链2，获取结果

SequentialChain序列链

序列链中包含多个链，其中一些链的结果可以作为另一个链的输入。序列链可以支持多个输入和输出变量。

from langchain.chains import SequentialChain
llm = ChatOpenAI(temperature=0.9)
# 加载

first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to english:"
    "\n\n{Review}"
chain_one = LLMChain(llm=llm, prompt=first_prompt, 
                     output_key="English_Review"
                    )
# 链1：输入Review，输出English_Review

second_prompt = ChatPromptTemplate.from_template(
    "Can you summarize the following review in 1 sentence:"
    "\n\n{English_Review}"
)
chain_two = LLMChain(llm=llm, prompt=second_prompt, 
                     output_key="summary"
                    )
# 链2：输入English_Review，输出summary

third_prompt = ChatPromptTemplate.from_template(
    "What language is the following review:\n\n{Review}"
)
chain_three = LLMChain(llm=llm, prompt=third_prompt,
                       output_key="language"
                      )
# 链3：输入Review，输出language

fourth_prompt = ChatPromptTemplate.from_template(
    "Write a follow up response to the following "
    "summary in the specified language:"
    "\n\nSummary: {summary}\n\nLanguage: {language}"
)
chain_four = LLMChain(llm=llm, prompt=fourth_prompt,
                      output_key="followup_message"
                     )
# 链4：输入summary、language，输出followup_message

overall_chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three, chain_four],
    input_variables=["Review"],
    output_variables=["English_Review", "summary","followup_message"],
    verbose=True
)
# 构建完整链，输入Review，输出"English_Review", "summary","followup_message"

overall_chain(review)

Router Chain路由链

路由链类似一个while else的函数，根据输入值，选择对应的路由（路径）进行后续的链路。整个路由链一般一个输入，一个输出。

physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise\
and easy to understand manner. \
When you don't know the answer to a question you admit\
that you don't know.

Here is a question:
{input}"""


math_template = """You are a very good mathematician. \
You are great at answering math questions. \
You are so good because you are able to break down \
hard problems into their component parts, 
answer the component parts, and then put them together\
to answer the broader question.

Here is a question:
{input}"""

history_template = """You are a very good historian. \
You have an excellent knowledge of and understanding of people,\
events and contexts from a range of historical periods. \
You have the ability to think, reflect, debate, discuss and \
evaluate the past. You have a respect for historical evidence\
and the ability to make use of it to support your explanations \
and judgements.

Here is a question:
{input}"""


computerscience_template = """ You are a successful computer scientist.\
You have a passion for creativity, collaboration,\
forward-thinking, confidence, strong problem-solving capabilities,\
understanding of theories and algorithms, and excellent communication \
skills. You are great at answering coding questions. \
You are so good because you know how to solve a problem by \
describing the solution in imperative steps \
that a machine can easily interpret and you know how to \
choose a solution that has a good balance between \
time complexity and space complexity. 

Here is a question:
{input}"""

# 创建4种提示词模板

prompt_infos = [
    {
        "name": "physics", 
        "description": "Good for answering questions about physics", 
        "prompt_template": physics_template
    },
    {
        "name": "math", 
        "description": "Good for answering math questions", 
        "prompt_template": math_template
    },
    {
        "name": "History", 
        "description": "Good for answering history questions", 
        "prompt_template": history_template
    },
    {
        "name": "computer science", 
        "description": "Good for answering computer science questions", 
        "prompt_template": computerscience_template
    }
]
# 提示词要点信息

from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain,RouterOutputParser
from langchain.prompts import PromptTemplate
llm = ChatOpenAI(temperature=0)
# 加载


destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain  
    
destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)
# 根据提示词要点信息，生成4个链，存入destination中

default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)
# 创建默认提示词和链

MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising\
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \ name of the prompt to use or "DEFAULT"
    "next_inputs": string \ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not\
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>"""
# 创建一个提示词模板，包含destination和input两个变量

router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
# 提示词模板赋值destination

router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
# 提示词模板赋值

router_chain = LLMRouterChain.from_llm(llm, router_prompt)
chain = MultiPromptChain(router_chain=router_chain, 
                         destination_chains=destination_chains, 
                         default_chain=default_chain, verbose=True)
# 生成路由链

chain.run("xxx")