[LangChain Hands-On 3] Building Prompts with Example Selectors
Example selector
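An example selector implements the logic for choosing which examples to include in a prompt, so that the examples most relevant to the input can be selected. langchain_core ships three built-in example selectors:
- LengthBasedExampleSelector: selects examples by length, to keep the prompt within a length budget.
- MaxMarginalRelevanceExampleSelector: selects examples that are similar to the input while also optimizing for diversity.
- SemanticSimilarityExampleSelector: selects the examples most similar to the input.
In addition, langchain_community provides a number of other example selectors, which are not covered here.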
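# To use an example selector, we first need a list of examples. These are usually example inputs and outputs.
# For this demo, assume we are selecting examples of how to produce antonyms.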
examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]
from langchain_core.prompts import PromptTemplate
# PromptTemplate is used to format each example.
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
Select by length
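This selector is useful when you are worried about building a prompt that exceeds the context window. For longer inputs it includes fewer examples; for shorter inputs it includes more.
Official API reference: https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.length_based.LengthBasedExampleSelector.html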
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate
"""
LengthBasedExampleSelector selects examples based on length. Its main fields:
- examples: List[dict] # The list of examples the prompt template expects.
- example_prompt: PromptTemplate # The prompt template used to format the examples.
- get_text_length: Callable[[str], int] = _get_length_based # Function that measures prompt length. Defaults to a word count: lambda x: len(re.split("\n| ", x))
- max_length: int = 2048 # Max length for the prompt, beyond which examples are cut.
"""
example_selector = LengthBasedExampleSelector(
examples=examples,
example_prompt=example_prompt,
max_length=25,
)
"""
Build a FewShotPromptTemplate from the LengthBasedExampleSelector
"""
dynamic_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))
Give the antonym of every input
Input: happy
Output: sad
Input: tall
Output: short
Input: energetic
Output: lethargic
Input: sunny
Output: gloomy
Input: windy
Output: calm
Input: big
Output:
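# With a long input (21 words), the selector limits the prompt length and includes only 1 example:
# max_length is 25 and each formatted example counts as 4 words, so 25 - 21 = 4 leaves room for exactly one.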
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))
import re
print("word count:" + str(len(re.split("\n| ", long_string))))
Give the antonym of every input
Input: happy
Output: sad
Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
word count:21
# With an even longer input (27 words), the input alone exceeds max_length (25), so no examples are included
long_string = "big and huge and massive and big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))
import re
print("word count:" + str(len(re.split("\n| ", long_string))))
Give the antonym of every input
Input: big and huge and massive and big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
word count:27
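The experiments above show that LengthBasedExampleSelector is mainly a way to control prompt length, so that the prompt does not exceed the limit the model allows.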
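The three results above can be reproduced with a few lines of plain Python. This is a minimal sketch of a greedy length budget, not the library's source: the function names `word_count` and `select_by_length` are made up for illustration, and it assumes that only the user input and the formatted examples count toward the budget (the prefix and suffix text are ignored), which is consistent with the outputs shown above.

```python
import re

def word_count(text: str) -> int:
    # Same default measure quoted earlier: split on newlines or single spaces.
    return len(re.split("\n| ", text))

def select_by_length(examples, user_input, max_length=25):
    """Greedy sketch: add examples in order while the word budget allows."""
    remaining = max_length - word_count(user_input)
    selected = []
    for ex in examples:
        if remaining <= 0:
            break
        # Cost of one example once formatted with the example template.
        cost = word_count(f"Input: {ex['input']}\nOutput: {ex['output']}")
        if remaining - cost < 0:
            break
        selected.append(ex)
        remaining -= cost
    return selected

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
long_21 = ("big and huge and massive and large and gigantic and tall and "
           "much much much much much bigger than everything else")
long_27 = "big and huge and massive and " + long_21

for text in ("big", long_21, long_27):
    print(word_count(text), "words ->", len(select_by_length(examples, text)), "examples")
```

Each formatted example costs 4 words, so a 1-word input leaves room for all five examples, a 21-word input leaves room for exactly one, and a 27-word input leaves no budget at all.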
Select by maximal marginal relevance (MMR)
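MaxMarginalRelevanceExampleSelector selects examples that are most similar to the input while also optimizing for diversity. It does this by finding the examples whose embeddings have the greatest cosine similarity with the input, then adding them iteratively while penalizing each candidate for its closeness to the examples already selected.
Official API reference: https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector.html
Official demo: https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/
Paper: https://arxiv.org/pdf/2211.13892.pdf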
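To make the "similar but diverse" idea concrete, here is a minimal pure-Python sketch of maximal marginal relevance on toy data. It is an illustration under stated assumptions, not LangChain's implementation: the 2-D vectors are invented stand-ins for real embeddings, and `mmr_select` and its `lambda_mult` trade-off parameter are hypothetical names chosen for this sketch.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Penalty: closeness to the most similar already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy "embeddings" (made up): candidates 0 and 1 are near-duplicates,
# candidate 2 is less similar to the query but points in a different direction.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.88, 0.12], [0.8, -0.6]]

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # balances relevance and diversity
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=1.0` the penalty vanishes and the two near-duplicates are picked. With `lambda_mult=0.5` the second pick switches to the candidate that differs from the first. This is the same behavior that explains why an MMR selector may return an example that is less similar to the input than a plain similarity search would.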
from langchain_core.example_selectors import MaxMarginalRelevanceExampleSelector
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_community.vectorstores.faiss import FAISS
import os
import getpass
"""
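MaxMarginalRelevanceExampleSelector.from_examples creates a k-shot example selector from a list of examples and an embedding model. Its main parameters:
- examples: List[dict] # The list of examples the prompt template expects.
- embeddings: Embeddings # An initialized embedding API interface, e.g. OpenAIEmbeddings().
- vectorstore_cls (Type[VectorStore]) # A vector store DB interface class, e.g. FAISS.
- k: int=4 # Number of examples to select.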
"""
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
# The list of examples available to select from.
examples,
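# OpenAI is hard to reach from here, and the iFlytek Spark (星火) Embedding API differs between its old and new versions, so DashScope (Alibaba) embeddings are used instead
# Sign-up page: https://dashscope.console.aliyun.com/overview
# "*" is a placeholder; replace it with your own DashScope API key
embeddings=DashScopeEmbeddings(dashscope_api_key="*", model="text-embedding-v1"),
# The vector store class used to hold the embeddings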
vectorstore_cls=FAISS,
# The number of examples to produce.
k=2,
)
# Choosing an embedding model and a vector store will be covered in a later chapter; it is a large topic of its own
mmr_prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# The input is a feeling, so the happy/sad example should be selected first
print(mmr_prompt.format(adjective="worried"))
Give the antonym of every input
Input: happy
Output: sad
Input: windy
Output: calm
Input: worried
Output:
Select by similarity
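This selector picks examples by their similarity to the input. It does this by finding the examples whose embeddings have the greatest cosine similarity with the input.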
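The ranking step can be sketched without any embedding service. The snippet below is a toy illustration, not the library's code: the 2-D vectors are invented (first axis roughly "emotion", second axis roughly "size"), and `top_k_similar` is a hypothetical helper name. A real selector would obtain the vectors from an embedding model and query a vector store instead.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, embedded_examples, k=1):
    """Return the k examples whose vectors are most cosine-similar to the query."""
    ranked = sorted(
        embedded_examples,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [example for example, _ in ranked[:k]]

# (example, made-up embedding) pairs
embedded_examples = [
    ({"input": "happy", "output": "sad"}, [0.9, 0.1]),
    ({"input": "tall", "output": "short"}, [0.1, 0.9]),
    ({"input": "windy", "output": "calm"}, [0.5, 0.5]),
]

worried_vec = [0.8, 0.2]  # hypothetical embedding of "worried"
large_vec = [0.2, 0.8]    # hypothetical embedding of "large"

print(top_k_similar(worried_vec, embedded_examples, k=1))
print(top_k_similar(large_vec, embedded_examples, k=1))
```

With these toy vectors the emotion-like query ranks the happy/sad example first and the size-like query ranks the tall/short example first.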
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
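# OpenAI is hard to reach from here, and the iFlytek Spark (星火) Embedding API differs between its old and new versions, so DashScope (Alibaba) embeddings are used instead
# Sign-up page: https://dashscope.console.aliyun.com/overview
# "*" is a placeholder; replace it with your own DashScope API key
embeddings=DashScopeEmbeddings(dashscope_api_key="*", model="text-embedding-v1"),
# The vector store class used to hold the embeddings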
vectorstore_cls=FAISS,
# The number of examples to produce.
k=1,
)
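# Choosing an embedding model and a vector store will be covered in a later chapter; it is a large topic of its own
# Note: the variable name mmr_prompt is carried over from the previous section; this prompt uses the similarity selector, not MMR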
mmr_prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# The input is a feeling, so the happy/sad example should be selected
print(mmr_prompt.format(adjective="worried"))
Give the antonym of every input
Input: happy
Output: sad
Input: worried
Output:
# The input is a measurement, so the tall/short example should be selected
print(mmr_prompt.format(adjective="large"))
Give the antonym of every input
Input: tall
Output: short
Input: large
Output: