Exploring Ways to Constrain LLM Output to JSON

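0. Introduction

JSON (JavaScript Object Notation) is concise, readable, and easy to parse, which has made it one of the most widely used data-interchange formats in the world. It covers a wide range of data-exchange needs, and it matters in particular for AI-powered applications, where engineers routinely have to integrate the output of large language models (LLMs) into their codebases.

By giving the LLM a specific grammar or schema and steering it to produce output that conforms to it, applications become more predictable and stable. With standardized output, the application can process and use LLM-generated data far more efficiently.

In short, JSON's interoperability, flexibility, and broad support make it the preferred format for exchanging data between systems and applications.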

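1. Why is it so hard to get an LLM to output JSON?

Language models are good at predicting the next token and generating text, but getting output in an exact format can be challenging because they do not always follow instructions precisely.

For example, with OpenAI, suppose you want GPT-3.5-turbo to always respond in the following form: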

(message_type) {message_content}

However, it may respond in slightly different ways:

message_type:message_content
message_type:"message_content"
(message_type): "message_content"
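
To see why this drift matters, here is a minimal sketch of a strict parser for the intended `(message_type) {message_content}` form. The regular expression and the `parse_message` helper are illustrative assumptions, not part of any OpenAI tooling; they only show that code written for the expected shape rejects each of the variants above.

```python
import re
from typing import Optional, Tuple

# Strict pattern for the intended form: (message_type) {message_content}
STRICT_FORM = re.compile(r"^\((?P<type>[^)]+)\) \{(?P<content>.*)\}$")

def parse_message(reply: str) -> Optional[Tuple[str, str]]:
    """Return (type, content) if the reply has the exact expected form, else None."""
    match = STRICT_FORM.match(reply.strip())
    if match is None:
        return None
    return match.group("type"), match.group("content")

expected = "(greeting) {hello there}"
variants = [
    "greeting:hello there",
    'greeting:"hello there"',
    '(greeting): "hello there"',
]

print(parse_message(expected))            # parsed into a (type, content) pair
for variant in variants:
    print(repr(variant), "->", parse_message(variant))  # each variant yields None
```

Loosening the pattern to accept every variant is possible, but each new variant the model invents needs another special case, which is the maintenance burden the rest of this article tries to avoid.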

2. Using prompt engineering

Please provide the response in the form of a Python list. It should begin with "[" and end with "]".
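
A reply that follows this instruction can be turned into a real list with the standard library. The sketch below is one possible way to do it; `parse_list_reply` is a hypothetical helper name, and it assumes the reply contains exactly one bracketed list, possibly surrounded by extra text.

```python
import ast
from typing import List

def parse_list_reply(reply: str) -> List[object]:
    """Take the span from the first '[' to the last ']' and parse it as a literal."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("reply does not contain a bracketed list")
    # literal_eval only evaluates literals, so it is safe on untrusted model output.
    value = ast.literal_eval(reply[start : end + 1])
    if not isinstance(value, list):
        raise ValueError("bracketed span is not a list")
    return value

print(parse_list_reply('Sure! ["Mercury", "Venus"]'))
```

If the model ignores the instruction or emits a malformed list, the helper raises (`ValueError` here, or `SyntaxError` from `ast.literal_eval`), so the caller still has to decide how to recover.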

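ChatGPT (GPT-4) supports system/user prompts (via the GPT-4 API) for formatting data as CSV, and it usually works perfectly. GPT-4 is great for building demo prototypes, but it is fairly expensive, so a local solution would be ideal.

Many prompt-engineering frameworks can restrict output to JSON; see here for one such strict JSON framework for LLM output.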

## simple example provided by the author
res = strict_output(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful day',
                    output_format = {"Sentiment": "Type of Sentiment",
                                    "Tense": "Type of Tense"})     
print(res)
## output
{'Sentiment': 'Positive', 'Tense': 'Present'}

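While prompt engineering can be effective for some use cases, it has a limitation: any internal change to the LLM can lead to unexpected output. This is known to cause problems in production, as seen in stories online of AI applications that depend on the ChatGPT API failing because of continual behind-the-scenes updates.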
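
Because prompt-only formatting can drift silently, callers usually validate each reply before using it. The following is a minimal sketch of that idea, not the implementation of any particular framework: `ask_for_json` and `call_llm` are hypothetical names, and the "model" is a stand-in that returns canned replies so the example runs without an API key.

```python
import json
from typing import Callable, Dict, Set

def ask_for_json(call_llm: Callable[[str], str], prompt: str,
                 required_keys: Set[str], max_attempts: int = 3) -> Dict[str, object]:
    """Call the model until the reply is a JSON object containing the required keys."""
    last_error = "no attempts made"
    for _ in range(max_attempts):
        reply = call_llm(prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        last_error = "missing required keys"
    raise ValueError(f"model never produced valid output ({last_error})")

# Stand-in for a real model: the first reply is chatty, the second is clean JSON.
replies = iter(['Sure! {"Sentiment": "Positive"}',
                '{"Sentiment": "Positive", "Tense": "Present"}'])
result = ask_for_json(lambda _prompt: next(replies),
                      "It is a beautiful day", {"Sentiment", "Tense"})
print(result)
```

Retrying costs extra calls and still offers no guarantee, which is the motivation for constraining generation itself in the next section.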

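3. Constraining LLM output

There has been a lot of innovative work in this area. Here we explore three frameworks that each tackle the problem from a different angle, alongside fine-tuning as a further option. I was impressed by how each framework reaches similar results despite using a different approach.

GRAMMAR: constrain the model's output with a grammar. For example, you can force the model to output only JSON.
KOR: a half-baked prototype that "helps" you extract structured data from text using LLMs.
LM-Format-Enforcer: enforce the output format (JSON Schema, Regex, etc.) of a language model.
Fine-tuning the LLM: teach the model to output JSON for the given input data.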

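3.1 Using grammar rules to force the model to output only JSON

In this approach, you run the model with llama.cpp and create a grammar file. GBNF (GGML BNF) is a format for defining formal grammars that constrain model output in llama.cpp.

Here is a simple grammar file I created for a basic test: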

root ::= answer
answer ::= "{"   ws   "\"id\":"   ws   number   ","   ws   "\"name\":"   ws   string   "}"
answerlist ::= "[]" | "["   ws   answer   (","   ws   answer)*   "]"
string ::= "\""   ([^"]*)   "\""
boolean ::= "true" | "false"
ws ::= [ \t\n]*
number ::= [0-9]+   "."?   [0-9]*
stringlist ::= "["   ws   "]" | "["   ws   string   (","   ws   string)*   ws   "]"
numberlist ::= "["   ws   "]" | "["   ws   number   (","   ws   number)*   ws   "]"
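
To get a feel for what the `answer` rule accepts, it can be transcribed into a regular expression, which is possible here only because the rule is not recursive. This is an illustrative sketch, not how llama.cpp evaluates grammars; the constant names are my own.

```python
import re

WS = r"[ \t\n]*"             # ws: any run of spaces, tabs, or newlines
NUMBER = r"[0-9]+\.?[0-9]*"  # number: digits, optional dot, optional digits
STRING = r'"[^"]*"'          # string: a quote, any non-quote characters, a quote

# answer: "{" ws "id": ws number "," ws "name": ws string "}"
ANSWER = re.compile(
    r"\{" + WS + r'"id":' + WS + NUMBER + "," + WS
    + r'"name":' + WS + STRING + r"\}"
)

def accepted(text: str) -> bool:
    """True if the whole text is derivable from the answer rule."""
    return ANSWER.fullmatch(text) is not None

print(accepted('{ "id": 1, "name": "Mercury"}'))   # True
print(accepted('{"name": "Mercury", "id": 1}'))    # False: key order is fixed
print(accepted('{"id": 1, "name": "Mercury" }'))   # False: no ws allowed before "}"
```

The last two cases show that the grammar is stricter than JSON itself: it fixes the key order and where whitespace may appear.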

This is harder to read, but you can start from a schema definition that is easier to understand, like the following:

interface answer {
    id: number;
    name: string;
}

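Next, paste the schema into this online tool to generate the grammar file automatically, which saves a lot of trouble.

Now we have a grammar file that is ready to plug into llama.cpp. See the repository for more details on setting it up to run locally on your machine.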

## start with a prompt
 ./main -m ./models/Mistral-7B-Instruct-v0.1-Q8.gguf -n 256 --grammar-file grammars/answer.gbnf -p 'Q: Name the planets in the solar system? A:'
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =   64.00 MB
llama_new_context_with_model: compute buffer total size = 79.13 MB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MB
llama_new_context_with_model: total VRAM used: 73.00 MB (model: 0.00 MB, context: 73.00 MB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
 repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
 top_k = 40, tfs_z = 1.000, top_p = 0.950, typical_p = 1.000, temp = 0.800
 mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0

## response
Q: Name the planets in the solar system? A:{ "id": 1, "name": "Mercury"} [end of text]

llama_print_timings:        load time =     845.86 ms
llama_print_timings:      sample time =     157.01 ms /    16 runs   (    9.81 ms per token,   101.91 tokens per second)
llama_print_timings: prompt eval time =     649.35 ms /    13 tokens (   49.95 ms per token,    20.02 tokens per second)
llama_print_timings:        eval time =    3280.48 ms /    15 runs   (  218.70 ms per token,     4.57 tokens per second)
llama_print_timings:       total time =    4104.05 ms
Log end

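Done! The result is a valid JSON object: {"id": 1, "name": "Mercury"}.

Grammars are flexible enough to describe complex objects. Here is my second attempt: a receipt schema and its grammar file.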
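
In an application, the JSON still has to be separated from the echoed prompt and the trailing `[end of text]` marker. A minimal sketch, assuming the completion text has been captured as a string (for example from the process's standard output); `extract_json` is a hypothetical helper name.

```python
import json
from typing import Any

def extract_json(completion: str, prompt: str) -> Any:
    """Drop the echoed prompt, then decode the first JSON object that follows."""
    body = completion[len(prompt):] if completion.startswith(prompt) else completion
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in completion")
    # raw_decode parses one JSON value and ignores trailing text such as "[end of text]".
    value, _end = json.JSONDecoder().raw_decode(body, start)
    return value

prompt = "Q: Name the planets in the solar system? A:"
completion = prompt + '{ "id": 1, "name": "Mercury"} [end of text]'
print(extract_json(completion, prompt))
```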

## Receipt Type Definitions using Typescript.

interface RestaurantReceipt {
    restaurant: Restaurant;
    customer: Customer;
    order_date: string;
    total_price: number;
    tax_rate: number;
    tax_amount: number;
    discount_code: string;
    payment_method: string;
    card_type: string;
    card_number: string;
    expiration_month: number;
    expiration_year: number;
    cvv: string;
    shipping_address: string;
    items: Item[];
  }
   
  interface Restaurant {
    name: string;
    location: Location;
    year: number;
    phone_number: string;  
    email:string;  
  }
  
  interface Customer {
    first_name: string;
    last_name: string;
    email:string;
    phone_number: string;
  }
  
  interface Location {
    address: string;
    city: string;
    state: string;
    country: string;
  }
  
  interface Item {
    item_name: string;
    quantity: number;
    unit_price: number;
    description: string;
    item_total: number;
  }

The grammar file generated for this receipt:

## Generated Grammar used during LLMs generation.

root ::= RestaurantReceipt
Item ::= "{"   ws   "\"item_name\":"   ws   string   ","   ws   "\"quantity\":"   ws   number   ","   ws   "\"unit_price\":"   ws   number   ","   ws   "\"description\":"   ws   string   ","   ws   "\"item_total\":"   ws   number   "}"
Itemlist ::= "[]" | "["   ws   Item   (","   ws   Item)*   "]"
Location ::= "{"   ws   "\"address\":"   ws   string   ","   ws   "\"city\":"   ws   string   ","   ws   "\"state\":"   ws   string   ","   ws   "\"country\":"   ws   string   "}"
Locationlist ::= "[]" | "["   ws   Location   (","   ws   Location)*   "]"
Customer ::= "{"   ws   "\"first_name\":"   ws   string   ","   ws   "\"last_name\":"   ws   string   ","   ws   "\"email\":"   ws   string   ","   ws   "\"phone_number\":"   ws   string   "}"
Customerlist ::= "[]" | "["   ws   Customer   (","   ws   Customer)*   "]"
Restaurant ::= "{"   ws   "\"name\":"   ws   string   ","   ws   "\"location\":"   ws   Location   ","   ws   "\"year\":"   ws   number   ","   ws   "\"phone_number\":"   ws   string   ","   ws   "\"email\":"   ws   string   "}"
Restaurantlist ::= "[]" | "["   ws   Restaurant   (","   ws   Restaurant)*   "]"
RestaurantReceipt ::= "{"   ws   "\"restaurant\":"   ws   Restaurant   ","   ws   "\"customer\":"   ws   Customer   ","   ws   "\"order_date\":"   ws   string   ","   ws   "\"total_price\":"   ws   number   ","   ws   "\"tax_rate\":"   ws   number   ","   ws   "\"tax_amount\":"   ws   number   ","   ws   "\"discount_code\":"   ws   string   ","   ws   "\"payment_method\":"   ws   string   ","   ws   "\"card_type\":"   ws   string   ","   ws   "\"card_number\":"   ws   string   ","   ws   "\"expiration_month\":"   ws   number   ","   ws   "\"expiration_year\":"   ws   number   ","   ws   "\"cvv\":"   ws   string   ","   ws   "\"shipping_address\":"   ws   string   ","   ws   "\"items\":"   ws   Itemlist   "}"
RestaurantReceiptlist ::= "[]" | "["   ws   RestaurantReceipt   (","   ws   RestaurantReceipt)*   "]"
string ::= "\""   ([^"]*)   "\""
boolean ::= "true" | "false"
ws ::= [ \t\n]*
number ::= [0-9]+   "."?   [0-9]*
stringlist ::= "["   ws   "]" | "["   ws   string   (","   ws   string)*   ws   "]"
numberlist ::= "["   ws   "]" | "["   ws   number   (","   ws   number)*   ws   "]"

Then run llama.cpp:

## Constrained output with grammars
> llama.cpp supports grammars to constrain model output. For example, you can force the model to output JSON only:
 ./main -m ./models/Mistral-7B-Instruct-v0.1-Q8.gguf -n 256 --grammar-file grammars/json.gbnf -p 'give me a sample receipt:'

Output:

llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =   64.00 MB
llama_new_context_with_model: compute buffer total size = 79.13 MB
llama_new_context_with_model: VRAM scratch buffer: 73.00 MB
llama_new_context_with_model: total VRAM used: 73.00 MB (model: 0.00 MB, context: 73.00 MB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
 repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
 top_k = 40, tfs_z = 1.000, top_p = 0.950, typical_p = 1.000, temp = 0.800
 mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0

give me a sample receipt:{"receiptNumber":"12345","customerName":"John Smith","date":
"2021-01-01 10:30:00.000000",
"items": [
{
"itemId": "1",
"productId": "ABC123",
"quantity": 1,
"unitPrice": 19.99
},
{
"itemId": "2",
"productId": "DEF456",
"quantity": 2,
"unitPrice": 29.99
}
],
"subTotal": 59.98,
"taxAmount": 2.37,
"total": 62.35
} [end of text]

llama_print_timings:        load time =     842.78 ms
llama_print_timings:      sample time =    2477.51 ms /   177 runs   (   14.00 ms per token,    71.44 tokens per second)
llama_print_timings: prompt eval time =     509.36 ms /     9 tokens (   56.60 ms per token,    17.67 tokens per second)
llama_print_timings:        eval time =   38122.00 ms /   176 runs   (  216.60 ms per token,     4.62 tokens per second)
llama_print_timings:       total time =   41331.49 ms
Log end

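So far, the grammar reliably constrains the output to JSON, which looks like a promising solution. Note that the command above loads the generic grammars/json.gbnf, so the result is valid JSON but does not follow the RestaurantReceipt schema; pointing --grammar-file at the generated receipt grammar would enforce that structure as well. See my repository for the schema and grammar files I created for this test.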
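
A grammar guarantees only what it encodes, so it is worth checking the decoded object against the fields the application expects. The sketch below compares the top-level keys of the output with those of the `RestaurantReceipt` interface; `missing_keys` is a hypothetical helper, and the output string is an abbreviated copy of the run above with the items list shortened.

```python
import json
from typing import Set

# Top-level fields of the RestaurantReceipt interface defined earlier.
RECEIPT_KEYS = {
    "restaurant", "customer", "order_date", "total_price", "tax_rate",
    "tax_amount", "discount_code", "payment_method", "card_type",
    "card_number", "expiration_month", "expiration_year", "cvv",
    "shipping_address", "items",
}

def missing_keys(json_text: str, required: Set[str]) -> Set[str]:
    """Return the required top-level keys that the JSON object lacks."""
    return required - set(json.loads(json_text))

output = ('{"receiptNumber":"12345","customerName":"John Smith",'
          '"date":"2021-01-01 10:30:00.000000","items":[],'
          '"subTotal":59.98,"taxAmount":2.37,"total":62.35}')
print(sorted(missing_keys(output, RECEIPT_KEYS)))  # every receipt field except "items"
```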

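3.2 KOR: extracting structured data from text with an LLM

Some ideas for what Kor can be used for:

Extract data from text that matches an extraction schema.
Give AI assistants skills by precisely understanding a user request.
Provide natural-language access to an existing API.

See the repository link here for the notebook I created for this test.

For this test I will use the open-source Llama-2 model, since we all like saving the cost of the ChatGPT API.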

## download LLM model
from huggingface_hub import hf_hub_download
downloaded_model_path = hf_hub_download(repo_id="TheBloke/Llama-2-7b-Chat-GGUF", filename="llama-2-7b-chat.Q5_K_M.gguf")

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from kor.extraction import create_extraction_chain

# get model chain
llm = LlamaCpp(model_path=downloaded_model_path,temperature=0.8,verbose=True,echo=True,n_ctx=512)

DEFAULT_SYSTEM_PROMPT = """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
def get_prompt(message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f'<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{message} [/INST]'

Example 1: schema and chain, output a single JSON object

#from langchain.chat_models import ChatOpenAI
from kor import create_extraction_chain, Object, Text
from kor.nodes import Object, Text, Number

schema = Object(
    id="player",
    description=(
        "User is controlling a music player to select songs, pause or start them or play"
        " music by a particular artist."
    ),
    attributes=[
        Text(
            id="song",
            description="User wants to play this song",
            examples=[],
            many=True,
        ),
        Text(
            id="album",
            description="User wants to play this album",
            examples=[],
            many=True,
        ),
        Text(
            id="artist",
            description="Music by the given artist",
            examples=[("Songs by paul simon", "paul simon")],
            many=True,
        ),
        Text(
            id="action",
            description="Action to take one of: `play`, `stop`, `next`, `previous`.",
            examples=[
                ("Please stop the music", "stop"),
                ("play something", "play"),
                ("play a song", "play"),
                ("next song", "next"),
            ],
        ),
    ],
    many=False,
)
## chain
chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')

chain.run("play songs by paul simon and led zeppelin and the doors")['data']

## result 
{'player': {'artist': ['paul simon', 'led zeppelin', 'the doors']}}

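The result looks good and matches the schema definition for a single object. Kor also supports the more popular pydantic style of schema definition. Here is a second example that should produce a list of JSON objects.

Example 2: Pydantic schema, output a list of JSON objects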

from kor import from_pydantic
from typing import List, Optional
from pydantic import BaseModel, Field

## schema
class PlanetSchema(BaseModel):
    planet_name: str = Field(description="The name of the planet")

class PlanetList(BaseModel):
    planets: List[PlanetSchema]

schema, validator = from_pydantic(
    PlanetSchema,
    description="Planet Information",  
    many=True,  # <-- Note Many = True
)

chain = create_extraction_chain(llm, schema, validator=validator)

result = chain.run(("list planets in our solar system."))
result

## output
{'data': {'planetschema': []},
 'raw': '\n"planetname|name|\nMercury|4|244|0.387|\nVenus|10|210|0.936|\nEarth|5|127|1.000|\nMars|2|210|0.181|\nJupiter|15|890|4.35|\nSaturn|6|720|0.550|\nUranus|7|510|0.750|\nNeptune|8|490|1.778|"',
 'errors': [],
 'validated_data': []}

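Hmm, this is not the list of JSON objects I expected: `data` and `validated_data` are empty, even though the raw output does contain the correct planet names. This needs more investigation.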
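
As a diagnostic, the planet names can be recovered from the `raw` field directly, which confirms that the model answered correctly and only the decoding into objects failed. This is a workaround sketch, not part of Kor's API; `first_column` is a hypothetical helper that assumes the pipe-delimited layout seen in this particular output.

```python
from typing import List

def first_column(raw: str) -> List[str]:
    """Split pipe-delimited rows and return the first cell of each data row."""
    lines = [line.strip().strip('"') for line in raw.strip().splitlines()]
    rows = [line.split("|") for line in lines if line]
    return [row[0] for row in rows[1:] if row[0]]  # rows[0] is the header row

raw = ('\n"planetname|name|\nMercury|4|244|0.387|\nVenus|10|210|0.936|'
       '\nEarth|5|127|1.000|\nMars|2|210|0.181|\nJupiter|15|890|4.35|'
       '\nSaturn|6|720|0.550|\nUranus|7|510|0.750|\nNeptune|8|490|1.778|"')
print(first_column(raw))
```

Note that the extra numeric columns were never requested by the schema, so the row shape itself is a sign that the model did not follow the expected encoding.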

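3.3 LM-Format-Enforcer: enforcing the LLM's output format

LM-Format-Enforcer enforces the output format of an LLM (JSON Schema, Regex, and so on) and looks like the most promising of the frameworks here. According to its documentation, it generates JSON by manipulating the model's output tokens according to the schema.

See the notebook I created for this test. As in the KOR test, I will keep using the open-source Llama-2 model, since the framework supports it.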

## setup LLM model
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
downloaded_model_path = hf_hub_download(repo_id="TheBloke/Llama-2-7b-Chat-GGUF", filename="llama-2-7b-chat.Q5_K_M.gguf")
llm = Llama(model_path=downloaded_model_path)


DEFAULT_SYSTEM_PROMPT = """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""
def get_prompt(message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f'<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{message} [/INST]'

Because it manipulates token output, the framework is tightly coupled to the LLM inference framework. For llama.cpp, it needs a logits processor. See the code below:

## LM Format Enforcer Logits Processor
from typing import Optional
from llama_cpp import LogitsProcessorList
from lmformatenforcer import CharacterLevelParser
from lmformatenforcer.integrations.llamacpp import build_llamacpp_logits_processor
from lmformatenforcer import JsonSchemaParser
from pydantic import BaseModel
from typing import List
from IPython.display import display, Markdown

def display_header(text):
    display(Markdown(f'**{text}**'))

def display_content(text):
    display(Markdown(f'```\n{text}\n```'))

def llamacpp_with_character_level_parser(llm: Llama, prompt: str, character_level_parser: Optional[CharacterLevelParser]) -> str:
    logits_processors: Optional[LogitsProcessorList] = None
    if character_level_parser:
        logits_processors = LogitsProcessorList([build_llamacpp_logits_processor(llm, character_level_parser)])
    
    output = llm(prompt, logits_processor=logits_processors)
    text: str = output['choices'][0]['text']
    return text
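
Conceptually, a logits processor of this kind works by masking. The toy sketch below is not LM-Format-Enforcer's code; it uses a made-up four-token vocabulary and made-up scores to show the general technique: before each sampling step, tokens that would violate the format get a score of negative infinity, so they can never be chosen.

```python
import math
from typing import List, Set

def mask_logits(logits: List[float], allowed_token_ids: Set[int]) -> List[float]:
    """Keep the scores of allowed tokens; push every other token to -inf."""
    return [score if token_id in allowed_token_ids else -math.inf
            for token_id, score in enumerate(logits)]

def greedy_pick(logits: List[float]) -> int:
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)

# Toy vocabulary: the model prefers "Of", but a JSON object must start with "{".
vocab = ["Of", "{", "Sure", "["]
logits = [3.2, 1.1, 2.7, 0.4]
masked = mask_logits(logits, allowed_token_ids={vocab.index("{")})
print(vocab[greedy_pick(logits)], "->", vocab[greedy_pick(masked)])  # Of -> {
```

In the real integration, the set of allowed tokens is recomputed at every step from the schema and the text generated so far, which is why the processor needs access to the model's tokenizer.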

Now let's run a simple test that returns a single JSON object.

class PlayerSchema(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int

question = 'Please give me information about Michael Jordan. You MUST answer using the following json schema: '
question_with_schema = f'{question}{PlayerSchema.schema_json()}'
prompt = get_prompt(question_with_schema)

display_header("Standard LLM Output:")
result = llamacpp_with_character_level_parser(llm, prompt, None)
display_content(result)

## result 
 Of course! I'd be happy to provide information about Michael Jordan using the provided JSON schema.
{
"first_name": "Michael",
"last_name": "Jordan",
"year_of_birth": 1963,
"num_seasons_in_nba": 15
}
I hope this helps! Let me know if you have any other questions.

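The result is not bad: it contains a JSON object. However, an application consuming this output still needs extra parsing to strip the unwanted text. This is exactly where the framework helps: it keeps the unwanted text out of the output and returns only a JSON object.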

display_header("LLM Output with json schema enforcing:")
result = llamacpp_with_character_level_parser(llm, prompt, JsonSchemaParser(PlayerSchema.schema()))
display_content(result)

{ "first_name": "Michael", "last_name": "Jordan", "year_of_birth": 1963, "num_seasons_in_nba": 15 }
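
The practical difference between the two outputs shows up as soon as the application tries to decode them. A minimal sketch using only the standard library, with the two replies copied from above (the chatty one slightly shortened); `try_parse` is a hypothetical helper name.

```python
import json
from typing import Any, Optional

chatty = (
    " Of course! I'd be happy to provide information about Michael Jordan.\n"
    '{\n"first_name": "Michael",\n"last_name": "Jordan",\n'
    '"year_of_birth": 1963,\n"num_seasons_in_nba": 15\n}\n'
    "I hope this helps!"
)
enforced = ('{ "first_name": "Michael", "last_name": "Jordan", '
            '"year_of_birth": 1963, "num_seasons_in_nba": 15 }')

def try_parse(text: str) -> Optional[Any]:
    """Return the decoded object, or None if the text is not pure JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

print(try_parse(chatty))    # None: the surrounding prose breaks json.loads
print(try_parse(enforced))  # a dict that can be used directly
```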

Next, let's test generating a list of JSON objects, starting with the standard LLM output:

message="Q:please give me a list of planets in the solar system? A: "
prompt=get_prompt(message,DEFAULT_SYSTEM_PROMPT)
output = llm(prompt,max_tokens=512,stop=["Q:"])
text: str = output['choices'][0]['text']
display_header("LLM standard output")
print(text)

## LLM standard output

  Of course! I'd be happy to help you with that. The eight planets in our solar system are:
1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn
7. Uranus
8. Neptune

Now let's add LLM output enforcement along with a simple schema.

## llm
llm = Llama(model_path=downloaded_model_path, n_ctx=4096,n_threads=16,verbose=False)

from typing import List
from pydantic import BaseModel

## schema
class PlanetSchema(BaseModel):
    planet_name: str

class PlanetList(BaseModel):
    planets: List[PlanetSchema]

## question
question = 'please give me a list of planets in the solar system?. You MUST answer using the following json schema: '
question_with_schema = f'{question}{PlanetList.schema_json()}'
prompt = get_prompt(question_with_schema)
#display_content(prompt)

## response
display_header("LLM Output with json schema enforcing:")
result = llamacpp_with_character_level_parser(llm, prompt, JsonSchemaParser(PlanetList.schema()))
display_content(result)

## LLM Output with json schema enforcing:
{ "planets": [ 
{ "planet_name": "Mercury" }, 
{ "planet_name": "Venus" }, { "planet_name": "Earth" }, 
{ "planet_name": "Mars" }, { "planet_name": "Jupiter" }, 
{ "planet_name": "Saturn" }, { "planet_name": "Uranus" }, 
{ "planet_name": "Neptune" } 
] }