一文速学-零成本与数据沟通NL2SQL的概念和实现技术

news2024/12/23 2:37:39

前言

关于NL2SQL的技术,如果大家最近有关注AI圈的话,或多或少都有所了解。其实很多业务场景下,于用户而言更多的是想要获取到最终数据的呈现效果,关于数据是如何获取得到的学习成本,是尽可能越少越好。众所周知当学习成本越低,那么产品的获客率也越高,当然对于我们技术人员来说,更多的还是研发思维。最终我们开发的服务主要还是为了业务服务,NL2SQL必然是以后数据开发的趋势所在,因此我们数据开发人员来说,暂且不谈掌握这门技术,清楚理念还是十分必要的。

NL2SQL技术概念

NL2SQL(Natural Language to SQL)是自然语言处理和数据库查询相结合的一项技术,旨在将用户以自然语言输入的查询转换为SQL查询语句,从而实现自然语言问答与数据库之间的自动交互。就按照企业日常报表业务,按照研发思路,我们首先可以通过UI或者页面前端获取到客户的文本信息,传输到后端进行落库,然后工单展示或者是直接进行数据库查询,由数据开发人员编写SQL语言,最终查询得到结果再进行数据可视化展示。

在这里插入图片描述那么按照NL2SQL的理念来说,整个数据查询流程可以被大大简化。首先用户不需要再通过复杂的页面选择或表单提交,甚至不需要对数据库结构和SQL语言有任何了解。用户可以直接在前端输入自然语言的问题,例如“查询上个月的销售报表”或"显示2023年所有产品的库存情况",系统会自动将这些自然语言问题传递给后端。

在后端,NL2SQL技术会解析用户输入的自然语言,识别其中的意图、关键词,并结合具体的数据库模式,自动生成相应的SQL查询语句。这个过程涉及自然语言处理(NLP)、语义解析、数据库模式映射等多项核心技术。通过这些技术,系统能够理解用户的需求,并生成准确的SQL语句,从而直接查询数据库,获取数据。

查询结果同样可以直接返回给前端进行展示,或通过数据可视化工具进行图表化呈现。这种技术避免了传统方法中数据开发人员手动编写SQL查询的繁琐步骤,提升了响应速度。更重要的是,用户不再需要依赖技术人员进行数据查询,而是可以实时获得自己想要的结果。这不仅降低了学习成本,还能显著的提高客户的满意度和体验感。

在这里插入图片描述
可能光说概念大家还是有所不解,我们以一个实际场景来看:假设一位销售经理需要在月度会议前准备一份关于公司销售情况的报告,通常,整个过程可能会涉及多次向数据分析师或IT团队请求帮助,以编写SQL查询,导出数据并生成报告。那么传统方法一般为:

传统方法

  1. 自然语言需求:销售经理提出问题:“我想知道上个月每个产品的销售额是多少,并按销售额排序。”
  2. 沟通与转化:销售经理将这一需求发给数据分析师或IT人员。
  3. SQL查询编写:数据分析师根据需求,手动编写SQL查询:
SELECT product_name, SUM(sales_amount) AS total_sales
FROM sales
WHERE sale_date BETWEEN '2023-07-01' AND '2023-07-31'
GROUP BY product_name
ORDER BY total_sales DESC;
  1. 数据提取与导出:查询结果被导出并整理为Excel或其他格式,交给销售经理。

  2. 报告制作:销售经理将导出的数据整合进报告中,进一步处理可视化图表。

这种传统方法需要多个步骤,涉及不同角色的协作,尤其是在多个查询需求下,可能会导致反复沟通和修改SQL查询的过程。

NL2SQL技术

  1. 自然语言查询:销售经理直接在系统中输入问题:“显示上个月每个产品的销售额,并按销售额排序。”
  2. NL2SQL技术处理:系统通过自然语言处理,将该需求解析为SQL查询,自动生成如下SQL语句:
SELECT product_name, SUM(sales_amount) AS total_sales
FROM sales
WHERE sale_date BETWEEN '2023-07-01' AND '2023-07-31'
GROUP BY product_name
ORDER BY total_sales DESC;

  1. 自动执行查询并显示结果:系统执行查询,并将结果以表格或图表形式直接呈现给销售经理。

  2. 即时反馈与可视化:销售经理立即获取数据,并可以根据需要进一步调整查询,如“按产品类别分类显示销售额”或“查看去年同期的销售数据”。

在这里插入图片描述效果对比就十分明显了,使用NL2SQL后查询过程自动化,大幅缩短了数据获取的时间。销售经理无需学习SQL语言,仅凭自然语言就可以完成复杂查询,降低了技术门槛。而且可以基于效果实时验证,查询可以随时进行,不再依赖技术团队的实时对接,查询结果直接以可视化形式呈现,用户能够更直观理解数据,从而快速做出业务决策。

NL2SQL技术支持

NL2SQL工作流程

在这里插入图片描述
用户输入自然语言问题,例如“查找2020年销售额最高的产品。系统首先需要理解用户输入的查询意图。自然语言本质上是模棱两可的,使其容易受到多种解释的影响。解决这种歧义是自然语言处理中的一项关键任务,以确保人与机器之间的准确通信。在实践中,情况可能并非总是如此。最终用户可能不知道(全部或部分)列或表的语义。因此最大的问题在于如何让AI能够精确定位到数据的具体表字段信息。所以说做NL2SQL不可避免的需要业务强绑定,必然要对所在领域的数据集有清楚的认知,才能训练好语义大模型。

目前比较火的英文数据集有WikiSQL、Spider、WikiTableQuestions、ATIS等,中文数据集有刚刚结束的中文首届NL2SQL挑战赛公开的数据,各个数据集都有各自的特点,这里不详细开展论述,感兴趣的可以自行探索。
在这里插入图片描述
根据数据集中SQL涉及到的数据库表的个数不同,分为单标和多表;根据所生成的SQL结构中是否包含嵌套查询,将数据集分为有嵌套和无嵌套。有个十分有意思的比赛
大家感兴趣的可以去看看这个比赛的获奖作品和数据集,会对NL2SQL工作和研究有较为清晰的了解。
在这里插入图片描述
我们以一句实际语言来看整个SQL生成逻辑。

比如自然语言输入
“查找2023年所有销售额超过1000万的产品,并按销售额降序排列。”

目标SQL查询

SELECT product_name, sales_amount 
FROM sales 
WHERE year = 2023 AND sales_amount > 10000000 
ORDER BY sales_amount DESC;

1. 文本预处理

首先,系统对输入的自然语言进行预处理,准备后续的语义解析步骤。

  • 分词(Tokenization):将句子拆分为单独的词或词组。
    • 结果:[“查找”, “2023年”, “所有”, “销售额”, “超过”, “1000万”, “的”, “产品”, “并”, “按”, “销售额”, “降序”, “排列”]
    • 算法:基于规则或深度学习模型的分词工具,如SpaCy、NLTK。
  • 去除停用词:去掉“的”、“并”等不影响语义的词汇。
    • 结果:[“查找”, “2023年”, “销售额”, “超过”, “1000万”, “产品”, “按”, “销售额”, “降序”, “排列”]
    • 算法:通过停用词词典或手工定义的停用词表进行过滤。

2. 命名实体识别(NER, Named Entity Recognition)

系统需要识别出句子中的关键实体,如时间、数字和产品等。

  • 识别内容
    • 时间实体:2023年
    • 数值实体:1000万
    • 产品实体:产品
    • 操作实体:销售额,降序排列
    • 算法:NER模型,如基于BiLSTM-CRF或预训练模型(如BERT)的NER工具。
  • 输出
    • 时间:2023年
    • 数值:10000000
    • 操作:查找、超过、降序排列
    • 实体:产品、销售额

3. 意图识别

系统识别用户的查询意图,即“查询”操作。这里的意图识别是关键步骤,它决定了SQL语句的类型(SELECT查询)。

  • 识别出查询意图

    :查询数据,并按条件过滤。

    • 算法:基于分类模型(如SVM、LSTM)或预训练模型(如BERT)进行意图分类。
  • 结果:查询操作,按条件筛选并排序。

4. 句法解析(Syntax Parsing)

通过句法解析分析自然语言输入的结构,确定主语、谓语、宾语等元素之间的关系。这里的目标是识别出“查找”的对象是“产品”,“销售额超过1000万”是一个过滤条件。

  • 解析句法结构
    • 查找(谓语) -> 产品(宾语)
    • 销售额超过1000万(条件) -> 降序排列(排序方式)
    • 算法:依存句法分析(Dependency Parsing),使用工具如Stanford NLP或SpaCy。
  • 结果
    • 主谓宾关系:查找产品
    • 条件关系:销售额 > 1000万
    • 排序:按销售额降序排列

5. 数据库模式映射

系统需要将自然语言中的实体(如“产品”、“销售额”)映射到数据库中的具体表和字段。这个步骤要求系统理解数据库的模式(Schema)。

  • 映射自然语言到数据库字段
    • “产品” -> product_name
    • “销售额” -> sales_amount
    • “2023年” -> year
    • 算法:基于规则的模式映射或通过训练模型学习自然语言与数据库字段之间的对应关系。
  • 结果
    • product_name
    • sales_amount
    • year

6. SQL模板生成

系统将识别出的信息填充到SQL模板中。这里,模板生成的任务是将自然语言解析出的结构信息转换为SQL语句的各个子句(SELECT、FROM、WHERE、ORDER BY)。

SQL模板

SELECT {字段1, 字段2, ...}
FROM {表}
WHERE {条件1} AND {条件2} ...
ORDER BY {排序字段} {排序方式};

填充模板

  • SELECTproduct_name, sales_amount
  • FROMsales
  • WHEREyear = 2023 AND sales_amount > 10000000
  • ORDER BYsales_amount DESC
  • 算法:基于模板匹配或生成模型(如Seq2Seq、Transformer)生成SQL语句。

结果:

SELECT product_name, sales_amount 
FROM sales 
WHERE year = 2023 AND sales_amount > 10000000 
ORDER BY sales_amount DESC;

这里的prompt可以提供一个现在行业普遍认可的:

 system = """
Given the database schema below, generate a MySQL query based on the user's question. Ensure to consider totals from line items, inclusive date ranges, and correct data aggregation for summarization. Remember to handle joins, groupings, and orderings effectively.

Database schema:
- Customer (CustomerID, FirstName, LastName, Email, Phone, BillingAddress, ShippingAddress, CustomerSince, IsActive)
- Employee (EmployeeID, FirstName, LastName, Email, Phone, HireDate, Position, Salary)
- InventoryLog (LogID, ProductID, ChangeDate, QuantityChange, Notes)
- LineItem (LineItemID, SalesOrderID, ProductID, Quantity, UnitPrice, Discount, TotalPrice)
- Product (ProductID, ProductName, Description, UnitPrice, StockQuantity, ReorderLevel, Discontinued)
- SalesOrder (SalesOrderID, CustomerID, OrderDate, RequiredDate, ShippedDate, Status, Comments, PaymentMethod, IsPaid)
- Supplier (SupplierID, CompanyName, ContactName, ContactTitle, Address, Phone, Email)

Guidelines for SQL query generation:
1. **Ensure Efficiency and Performance**: Opt for JOINs over subqueries where possible, use indexes effectively, and mention any specific performance considerations to keep in mind.
2. **Adapt to Specific Analytical Needs**: Tailor WHERE clauses, JOIN operations, and aggregate functions to precisely meet the analytical question being asked.
3. **Complexity and Variations**: Include a range from simple to complex queries, illustrating different SQL functionalities such as aggregate functions, string manipulation, and conditional logic.
4. **Handling Specific Cases**: Provide clear instructions on managing NULL values, ensuring date ranges are inclusive, and handling special data integrity issues or edge cases.
5. **Explanation and Rationale**: After each generated query, briefly explain why this query structure was chosen and how it addresses the analytical need, enhancing understanding and ensuring alignment with requirements.

-- 1. Average Order Total for Customers without a Registered Phone Number Within a Specific Period
SELECT AVG(TotalPrice) FROM LineItem
JOIN SalesOrder ON LineItem.SalesOrderID = SalesOrder.SalesOrderID
JOIN Customer ON SalesOrder.CustomerID = Customer.CustomerID
WHERE Customer.Phone IS NULL AND SalesOrder.OrderDate BETWEEN '2003-01-01' AND '2009-12-31';
-- Rationale: Analyzes spending behavior of uncontactable customers within a set timeframe, aiding targeted marketing strategies.

-- 2. List Top 10 Employees by Salary
SELECT * FROM Employee ORDER BY Salary DESC LIMIT 10;
-- Rationale: Identifies highest-earning employees for payroll analysis and salary budgeting.

-- 3. Find the Total Quantity of Each Product Sold Last Month
SELECT Product.ProductName, SUM(LineItem.Quantity) AS TotalQuantitySold
FROM Product
JOIN LineItem ON Product.ProductID = LineItem.ProductID
JOIN SalesOrder ON LineItem.SalesOrderID = SalesOrder.SalesOrderID
WHERE SalesOrder.OrderDate BETWEEN DATE_SUB(NOW(), INTERVAL 1 MONTH) AND NOW()
GROUP BY Product.ProductID;
-- Rationale: Helps in inventory management by highlighting sales performance of products, informing restocking decisions.

-- 4. Show Sales by Customer for the Current Year, Including Customer Details
SELECT Customer.FirstName, Customer.LastName, SUM(LineItem.TotalPrice) AS TotalSales
FROM Customer
JOIN SalesOrder ON Customer.CustomerID = SalesOrder.CustomerID
JOIN LineItem ON SalesOrder.SalesOrderID = LineItem.SalesOrderID
WHERE YEAR(SalesOrder.OrderDate) = YEAR(CURDATE())
GROUP BY Customer.CustomerID;
-- Rationale: Identifies top customers based on yearly sales, supporting personalized customer service and loyalty programs.

-- 5. Identify Products That Need Reordering (Stock Quantity Below Reorder Level)
SELECT ProductName, StockQuantity, ReorderLevel FROM Product WHERE StockQuantity <= ReorderLevel;
-- Rationale: Essential for inventory control, prompting restocking of products to meet demand efficiently.

-- 6. Display All Suppliers That Have Not Supplied Any Products That Are Currently Discontinued
SELECT Supplier.CompanyName FROM Supplier
LEFT JOIN Product ON Supplier.SupplierID = Product.SupplierID
WHERE Product.Discontinued = 0
GROUP BY Supplier.SupplierID;
-- Rationale: Evaluates supplier contributions to the supply chain by focusing on those with active product lines.

Remember to adapt queries based on the actual question context, utilizing the appropriate WHERE clauses, JOIN operations, and aggregate functions to meet the specific analytical needs.

Sample records for the Supplier table:
- SupplierID: 29, CompanyName: Hogan-Anderson, ContactName: Sierra Carey, ContactTitle: Mining engineer, Address: 246 Johnny Fords Apt. 858, Williamsport, AK 96920, Phone: 232.945.6443, Email: rodney04@example.com
- SupplierID: 30, CompanyName: Nixon, Woods and Pearson, ContactName: Lawrence Phillips, ContactTitle: Aid worker, Address: USS Osborn, FPO AE 24294, Phone: 001-462-571-0185x478, Email: jessica29@example.org

Sample records for the Product table:
- ProductID: 1, ProductName: Reflect Sea, Description: Factor country center price pretty foreign theory paper fact machine two., UnitPrice: 191.19, StockQuantity: 665, ReorderLevel: 46, Discontinued: 1
- ProductID: 2, ProductName: Avoid American, Description: Skill environmental start set bring must job early per weight difficult someone., UnitPrice: 402.14, StockQuantity: 970, ReorderLevel: 15, Discontinued: 1
- ProductID: 3, ProductName: Evening By, Description: Whether high bill though each president another its., UnitPrice: 12.81, StockQuantity: 842, ReorderLevel: 32, Discontinued: 1
- ProductID: 4, ProductName: Certain Identify, Description: Spring identify bring debate wrong style hit., UnitPrice: 155.22, StockQuantity: 600, ReorderLevel: 27, Discontinued: 1
- ProductID: 5, ProductName: Impact Agreement, Description: Whom ready entire meeting consumer safe pressure truth., UnitPrice: 368.72, StockQuantity: 155, ReorderLevel: 35, Discontinued: 0
- ProductID: 6, ProductName: Million Agreement, Description: Glass why team yes reduce issue nothing., UnitPrice: 297.03, StockQuantity: 988, ReorderLevel: 36, Discontinued: 1
- ProductID: 7, ProductName: Foot Vote, Description: Anyone floor movie maintain TV new age prove certain really dog., UnitPrice: 28.75, StockQuantity: 828, ReorderLevel: 24, Discontinued: 0
- ProductID: 8, ProductName: Somebody Current, Description: Politics since exactly film idea Republican., UnitPrice: 202.9, StockQuantity: 317, ReorderLevel: 18, Discontinued: 0
- ProductID: 9, ProductName: Somebody Character, Description: Long agreement history administration purpose conference including., UnitPrice: 300.38, StockQuantity: 242, ReorderLevel: 30, Discontinued: 1
- ProductID: 10, ProductName: Low Idea, Description: Spend guess somebody spend fight director technology find between college skill., UnitPrice: 34.68, StockQuantity: 65, ReorderLevel: 27, Discontinued: 0

Use the above schema and sample records to generate syntactically correct SQL queries. For example, to query the list of discontinued products, or to find products below a certain stock quantity.

Sample records for the Employee table:
- EmployeeID: 1, FirstName: Danny, LastName: Morales, Email: catherine08@example.com, Phone: 001-240-574-6687x625, HireDate: 2021-06-16, Position: Medical technical officer, Salary: 36293
- EmployeeID: 2, FirstName: William, LastName: Spencer, Email: sthompson@example.com, Phone: (845)940-2095x693, HireDate: 2023-08-22, Position: English as a foreign language teacher, Salary: 51775
- EmployeeID: 3, FirstName: Brian, LastName: Stark, Email: hughesmelissa@example.com, Phone: 780.299.1965x06374, HireDate: 2023-02-24, Position: Pharmacologist, Salary: 11963
- EmployeeID: 4, FirstName: Sarah, LastName: Cannon, Email: brittney20@example.com, Phone: 512.717.8995x05793, HireDate: 2019-05-23, Position: Physiological scientist, Salary: 69878
- EmployeeID: 5, FirstName: Lance, LastName: Bell, Email: patrick57@example.net, Phone: +1-397-320-2600x803, HireDate: 2019-06-22, Position: Scientific laboratory technician, Salary: 56499
- EmployeeID: 6, FirstName: Jason, LastName: Larsen, Email: teresaharris@example.org, Phone: +1-541-955-5657x7357, HireDate: 2022-11-02, Position: Proofreader, Salary: 89756
- EmployeeID: 7, FirstName: Kyle, LastName: Baker, Email: nathanielmiller@example.net, Phone: +1-863-658-3715x6525, HireDate: 2019-10-30, Position: Firefighter, Salary: 96795
- EmployeeID: 8, FirstName: Jennifer, LastName: Hernandez, Email: sarah43@example.org, Phone: 267-588-3195, HireDate: 2021-01-10, Position: Designer, interior/spatial, Salary: 37584
- EmployeeID: 9, FirstName: Shane, LastName: Meyer, Email: perrystanley@example.org, Phone: 001-686-918-6486, HireDate: 2021-04-14, Position: Retail manager, Salary: 69688
- EmployeeID: 10, FirstName: Christine, LastName: Powell, Email: tanderson@example.org, Phone: 427.468.2131, HireDate: 2019-05-11, Position: Sports administrator, Salary: 39962

Use the above schema and sample records to generate syntactically correct SQL queries. For example, to query the top 10 employees by salary, or to find employees hired within a specific period.

Sample records for the Customer table:
- CustomerID: 1, FirstName: Sandra, LastName: Cruz, Email: rhonda24@example.net, Phone: 511-949-6987x21174, BillingAddress: "18018 Kyle Streets Apt. 606, Shaneville, AZ 85788", ShippingAddress: "18018 Kyle Streets Apt. 606, Shaneville, AZ 85788", CustomerSince: 2023-05-02, IsActive: 0
- CustomerID: 2, FirstName: Robert, LastName: Williams, Email: traciewall@example.net, Phone: 944-649-2491x60774, BillingAddress: "926 Mitchell Pass Apt. 342, Brianside, SC 83374", ShippingAddress: "926 Mitchell Pass Apt. 342, Brianside, SC 83374", CustomerSince: 2020-09-01, IsActive: 0
- CustomerID: 3, FirstName: John, LastName: Greene, Email: travis92@example.org, Phone: 279.334.1551, BillingAddress: "36019 Bill Manors Apt. 219, Dominiquefort, AK 55904", ShippingAddress: "36019 Bill Manors Apt. 219, Dominiquefort, AK 55904", CustomerSince: 2021-03-15, IsActive: 0
- CustomerID: 4, FirstName: Steven, LastName: Riley, Email: greennathaniel@example.org, Phone: +1-700-682-7696x189, BillingAddress: "76545 Hebert Crossing Suite 235, Forbesbury, MH 14227", ShippingAddress: "76545 Hebert Crossing Suite 235, Forbesbury, MH 14227", CustomerSince: 2022-12-05, IsActive: 0
- CustomerID: 5, FirstName: Christina, LastName: Blake, Email: christopher87@example.net, Phone: 584.263.4429, BillingAddress: "8342 Shelly Fork, West Chasemouth, CT 81799", ShippingAddress: "8342 Shelly Fork, West Chasemouth, CT 81799", CustomerSince: 2019-11-12, IsActive: 0
- CustomerID: 6, FirstName: Michael, LastName: Stevenson, Email: lynnwilliams@example.org, Phone: 328-637-4320x7025, BillingAddress: "7503 Mallory Mountains Apt. 199, Meganport, MI 81064", ShippingAddress: "7503 Mallory Mountains Apt. 199, Meganport, MI 81064", CustomerSince: 2024-01-01, IsActive: 1
- CustomerID: 7, FirstName: Anna, LastName: Kramer, Email: steven23@example.org, Phone: +1-202-719-6886x844, BillingAddress: "295 Mcgee Fort, Manningberg, PR 93309", ShippingAddress: "295 Mcgee Fort, Manningberg, PR 93309", CustomerSince: 2022-03-06, IsActive: 1
- CustomerID: 8, FirstName: Michael, LastName: Sullivan, Email: bbailey@example.com, Phone: 988.368.5033, BillingAddress: "772 Bruce Motorway Suite 583, Powellbury, MH 42611", ShippingAddress: "772 Bruce Motorway Suite 583, Powellbury, MH 42611", CustomerSince: 2019-03-23, IsActive: 1
- CustomerID: 9, FirstName: Kevin, LastName: Moody, Email: yoderjennifer@example.org, Phone: 3425196543, BillingAddress: "371 Lee Lake, New Michaelport, CT 99382", ShippingAddress: "371 Lee Lake, New Michaelport, CT 99382", CustomerSince: 2023-12-03, IsActive: 1
- CustomerID: 10, FirstName: Jeremy, LastName: Mejia, Email: spencersteven@example.org, Phone: 449.324.7097, BillingAddress: "90137 Harris Garden, Matthewville, IA 39321", ShippingAddress: "90137 Harris Garden, Matthewville, IA 39321", CustomerSince: 2019-05-20, IsActive: 1

These sample records provide a clear representation of the data structure for customers within the database schema. Use these details to assist in generating queries that involve customer information, such as filtering active customers, summarizing sales by customer, or identifying long-term customers.

Sample records for the InventoryLog table:
- LogID: 1, ProductID: 301, ChangeDate: 2023-09-08, QuantityChange: 84, Notes: Inventory increased
- LogID: 2, ProductID: 524, ChangeDate: 2023-08-09, QuantityChange: -84, Notes: Inventory decreased
- LogID: 3, ProductID: 183, ChangeDate: 2023-04-17, QuantityChange: -51, Notes: Inventory decreased
- LogID: 4, ProductID: 390, ChangeDate: 2023-02-27, QuantityChange: 80, Notes: Inventory increased
- LogID: 5, ProductID: 737, ChangeDate: 2023-11-15, QuantityChange: 24, Notes: Inventory increased
- LogID: 6, ProductID: 848, ChangeDate: 2023-11-22, QuantityChange: 69, Notes: Inventory increased
- LogID: 7, ProductID: 534, ChangeDate: 2023-06-06, QuantityChange: -61, Notes: Inventory decreased
- LogID: 8, ProductID: 662, ChangeDate: 2024-01-16, QuantityChange: 70, Notes: Inventory increased
- LogID: 9, ProductID: 969, ChangeDate: 2024-01-07, QuantityChange: -25, Notes: Inventory decreased
- LogID: 10, ProductID: 640, ChangeDate: 2023-08-08, QuantityChange: -13, Notes: Inventory decreased

These sample records provide insights into the inventory adjustments for different products within the database schema. Utilize these details to assist in generating queries that track inventory changes, analyze stock levels, or evaluate inventory management efficiency.

Sample records for the LineItem table:
- LineItemID: 1, SalesOrderID: 280, ProductID: 290, Quantity: 3, UnitPrice: 84.59, Discount: NULL, TotalPrice: 253.77
- LineItemID: 2, SalesOrderID: 94, ProductID: 249, Quantity: 6, UnitPrice: 88.7, Discount: NULL, TotalPrice: 532.2
- LineItemID: 3, SalesOrderID: 965, ProductID: 247, Quantity: 1, UnitPrice: 43.44, Discount: NULL, TotalPrice: 43.44
- LineItemID: 4, SalesOrderID: 173, ProductID: 16, Quantity: 10, UnitPrice: 26.3, Discount: NULL, TotalPrice: 263
- LineItemID: 5, SalesOrderID: 596, ProductID: 191, Quantity: 9, UnitPrice: 59.44, Discount: NULL, TotalPrice: 534.96
- LineItemID: 6, SalesOrderID: 596, ProductID: 308, Quantity: 8, UnitPrice: 33.11, Discount: NULL, TotalPrice: 264.88
- LineItemID: 7, SalesOrderID: 960, ProductID: 758, Quantity: 5, UnitPrice: 64.47, Discount: NULL, TotalPrice: 322.35
- LineItemID: 8, SalesOrderID: 148, ProductID: 288, Quantity: 5, UnitPrice: 65.21, Discount: NULL, TotalPrice: 326.05
- LineItemID: 9, SalesOrderID: 974, ProductID: 706, Quantity: 3, UnitPrice: 59.86, Discount: NULL, TotalPrice: 179.58
- LineItemID: 10, SalesOrderID: 298, ProductID: 998, Quantity: 2, UnitPrice: 75.79, Discount: NULL, TotalPrice: 151.58

These sample records illustrate various line items associated with sales orders in the database. These details help in constructing queries to analyze sales performance, product popularity, pricing strategies, and overall sales revenue.

Sample records for the SalesOrder table:
- SalesOrderID: 1, CustomerID: 12, OrderDate: 2022-11-05, RequiredDate: 2022-12-02, ShippedDate: 2022-11-25, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
- SalesOrderID: 2, CustomerID: 56, OrderDate: 2022-02-22, RequiredDate: 2022-03-08, ShippedDate: 2022-03-17, Status: Completed, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
- SalesOrderID: 3, CustomerID: 63, OrderDate: 2023-03-20, RequiredDate: 2023-03-27, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
- SalesOrderID: 4, CustomerID: 21, OrderDate: 2023-04-29, RequiredDate: 2023-05-26, ShippedDate: 2023-05-14, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
- SalesOrderID: 5, CustomerID: 16, OrderDate: 2022-11-05, RequiredDate: 2022-11-30, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
- SalesOrderID: 6, CustomerID: 46, OrderDate: 2023-10-06, RequiredDate: 2023-10-27, ShippedDate: NULL, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
- SalesOrderID: 7, CustomerID: 47, OrderDate: 2023-02-08, RequiredDate: 2023-02-25, ShippedDate: 2023-03-03, Status: Shipped, Comments: NULL, PaymentMethod: NULL, IsPaid: 1
- SalesOrderID: 8, CustomerID: 70, OrderDate: 2022-07-29, RequiredDate: 2022-08-18, ShippedDate: 2022-08-10, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
- SalesOrderID: 9, CustomerID: 14, OrderDate: 2022-03-29, RequiredDate: 2022-04-15, ShippedDate: 2022-04-17, Status: Completed, Comments: NULL, PaymentMethod: NULL, IsPaid: 0
- SalesOrderID: 10, CustomerID: 31, OrderDate: 2024-01-12, RequiredDate: 2024-01-31, ShippedDate: 2024-02-07, Status: Pending, Comments: NULL, PaymentMethod: NULL, IsPaid: 0

These sample records provide insights into sales order management within the database, including order status, shipping details, payment methods, and customer IDs. This information is crucial for analyzing sales processes, order fulfillment rates, customer engagement, and payment transactions.

"""

7. SQL查询优化

在生成SQL语句之后,系统可以对SQL查询进行优化,以提高查询效率。例如,添加索引、优化查询计划等。

  • 优化步骤
    • 确定索引:在yearsales_amount字段上检查是否有适当的索引。
    • 重写查询:如果有需要,系统可以通过查询优化器重写查询以提高性能。
    • 算法:查询优化算法,如基于成本的查询优化器(Cost-Based Optimizer, CBO)。
  • 结果:优化后的SQL语句(此例中原SQL语句已经较为简单,无需进一步优化)。

8. SQL执行与结果展示

优化后的SQL语句被发送到数据库执行,数据库返回结果集。系统会将查询结果转换为用户友好的形式进行展示,如表格、图表或其他可视化形式。

  • SQL执行:通过数据库连接,执行生成的SQL查询。

  • 结果展示:将数据库返回的结果集以表格或图表的形式呈现给用户。

    • 算法:基于数据库的查询引擎进行执行,结果展示使用可视化工具(如Matplotlib、Plotly)。
  • 示例输出

    • product_namesales_amount
      产品A15000000
      产品B12000000
      产品C11000000

评价该模型的两个指标:精确匹配率、执行正确率

  • Execution Accuracy
    • 定义:计算SQL执行结果正确的数量在数据集中的比例,结果存在高估的可能。
  • Exact Match
    • 定义:计算模型生成的SQL和标注SQL的匹配程度,结果存在低估的可能。

在这里插入图片描述
其中, N表示数据量, S Q L ′ SQL' SQL S Q L SQL SQL分别代表预测的SQL语句和真实的SQL语句, S c o r e l f Score_{lf} Scorelf表示 L o g i c F o r m LogicForm LogicForm准确率; Y ′ Y' Y Y Y Y分别代表预测的SQL和真实的SQL的执行结果, S c o r e e x Score_{ex} Scoreex表示 E x e c u t i o n Execution Execution准确率。

总结

NL2SQL的出现,彻底改变了人与数据交互的方式。它通过将复杂的SQL查询隐藏在自然语言输入背后,极大地降低了数据获取的门槛,让业务人员无需依赖技术背景就能直接获取所需的信息。随着自然语言处理技术的不断进步,NL2SQL的应用场景将愈加广泛,覆盖从企业报表到智能客服等各个领域。未来,随着模型的泛化能力增强和实时性能优化,我们可以期待NL2SQL技术在数据驱动的决策中扮演更加重要的角色,让“零成本与数据沟通”真正成为可能。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2085845.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

零知识证明-椭圆曲线(四)

前言 零知识证明(Zero—Knowledge Proof)&#xff0c;是指一种密码学工具&#xff0c;允许互不信任的通信双方之间证明某个命题的有效性&#xff0c;同时不泄露任何额外信息 上章介绍了基础数字知识&#xff0c;这章主要讲 椭圆曲线 方程 2&#xff1a;椭圆曲线方程 y2axybyx3…

知乎知+和信息流广告报价,知乎推广多少钱?

知乎作为中国领先的问答社区&#xff0c;凭借其高质量的内容和庞大的用户群体&#xff0c;已成为众多品牌竞相追逐的营销高地。如何在知乎平台上精准投放广告&#xff0c;实现品牌价值的最大化&#xff0c;成为了众多企业面临的难题。云衔科技&#xff0c;作为数字化营销解决方…

基于MATLAB的涡流函数方法案例+代码

前言 这里介绍一下相关理论和代码----基于MATLAB使用伪谱离散化 三阶龙格库塔时间积分实现涡流函数方法的CFD案例 1. 代码详解 这段 MATLAB 代码实现了一个二维湍流模拟&#xff0c;使用伪谱法在周期性边界条件下解算非线性涡度-流函数方程&#xff1a; M 256; % number o…

驱动开发系列14 - Wayland 详解

目录 一:概述 二:操作系统如何支持 Wayland 三:显卡驱动如何支持 Wayland 四:Wayland 协议介绍 一:概述 Wayland 是一种通信协议,规定了显示服务器与其客户端之间的通信,以及该协议的 C 语言库实现。使用 Wayland 协议的显示服务器称为 Wayland 合成器,因…

SpringBoot项目中mybatis执行sql很慢的排查改造过程(Interceptor插件、fetchSize、隐式转换等)

刚入职公司&#xff0c;就发现公司项目跑sql特别慢&#xff0c;差不多一万条数据插入到数据库要5秒以上&#xff08;没有听错&#xff0c;就是这个速度&#xff09;&#xff0c;查询修改删除也是特别慢。直到22年年底实在是受不了了&#xff0c;我就去排查了一下。 用的是Oracl…

大模型之二十八-语音识别Whisper进阶

在上一篇博客大模型之二十七-语音识别Whisper实例浅析中遗留了几个问题&#xff0c;这里来看一下前两个问题。 1.如果不是Huggingface上可以下载的数据该怎么办&#xff1f; 2.上面的代码是可以训练了&#xff0c;但是训练的时候loss真的会和我们预期一致吗&#xff1f;比如如下…

【netty系列-08】深入Netty组件底层原理和基本实现

Netty系列整体栏目 内容链接地址【一】深入理解网络通信基本原理和tcp/ip协议https://zhenghuisheng.blog.csdn.net/article/details/136359640【二】深入理解Socket本质和BIOhttps://zhenghuisheng.blog.csdn.net/article/details/136549478【三】深入理解NIO的基本原理和底层…

第一个go程序

package main import "fmt" func main(){fmt.Println("Hello World") }hello.go所在目录 运行go程序

美团代付多模版三合一源码 附教程

简介 美团代付多模版三合一源码 附教程 界面

这一轮医疗数字化,沃可趣以医疗专业人员交流成长为中心

沃可趣看见医疗行业人员需求痛点&#xff0c;量身打造数字服务平台&#xff0c;促进知识分享&#xff0c;赋能活动组织。 从电子病历的普及到远程医疗的兴起&#xff0c;从人工智能辅助诊断到大数据在医疗管理中的应用&#xff0c;科技进步正在大力推动医疗领域的发展。然而&a…

Ubuntu系统本地搭建WordPress网站并一键发布内网站点至公网实战

&#x1f49d;&#x1f49d;&#x1f49d;欢迎来到我的博客&#xff0c;很高兴能够在这里和您见面&#xff01;希望您在这里可以感受到一份轻松愉快的氛围&#xff0c;不仅可以获得有趣的内容和知识&#xff0c;也可以畅所欲言、分享您的想法和见解。 推荐:kwan 的首页,持续学…

嵌入式技术文件、学习资料、在线工具、学习网站、技术论坛,非常全面的分享~~~

一、数据手册查询及下载网站 1、【ALLDATASHEET 自称是最大的在线电子元件数据的搜索引擎】ALLDATASHEETCN.COM - 电子元件和半导体及其他半导体的数据表搜索网站。电子元件和半导体, 集成电路, 二极管, 三端双向可控硅 和其他半导体的https://www.alldatasheetcn.com/ 2、【…

defineProps、defineEmits、 defineExpose的TS写法

小满视频 1 defineProps&#xff1a;父向子传递数据 作用&#xff1a;父组件向子组件传递数据 1.1 传递纯类型参数的方式来声明 父组件中的代码&#xff1a; 父组件App.vue <template><div><span>传递给子组件的响应式数据&#xff1a;</span>&l…

【循环顺序队的实现】

1.队列的逻辑结构 与 抽象数据类型定义 先进先出的线性表 在顺序队列中&#xff0c;我们使用头指针front指向队首元素&#xff1b;用尾指针rear指向队尾元素的下一个位置&#xff08;当然这里的指针是用下标模拟出来的&#xff09; 同时顺序队列中的元素当然是用数组来存储的 …

【系统架构设计】嵌入式系统设计(1)

【系统架构设计】嵌入式系统设计&#xff08;1&#xff09; 嵌入式系统概论嵌入式系统的组成硬件嵌入式处理器总线存储器I/O 设备与接口 软件 嵌入式开发平台与调试环境交叉平台开发环境交叉编译环境调试 嵌入式网络系统嵌入式数据库管理系统实时系统与嵌入式操作系统嵌入式系统…

【Qt笔记】QToolButton控件详解

目录 一、前言 二、QToolButton的基本特性 2.1 图标和文本 2.2 自动提升 2.3 下拉菜单 2.4 工具提示 2.5 弹出模式 三、高级功能 3.1 自定义大小与形状 3.2 检查框与单选按钮 3.3 动画效果 四、常用方法与信号槽 常用方法 信号槽 五、实际应用示例 说明 六、总…

ESP32 CYD 使用 LVGL 在屏幕上显示图像 | Random Nerd Tutorials

在本指南中&#xff0c;你将学习如何使用LVGL在ESP32 Cheap Yellow Display (CYD) 板上处理和加载图像。ESP32将使用Arduino IDE进行编程。 对ESP32 Cheap Yellow Display不熟悉&#xff1f; 从这里开始&#xff1a;开始使用ESP32 Cheap Yellow Display Board – CYD (ESP32-2…

线性代数 第三讲 线性相关无关 线性表示

线性代数 第三讲 线性相关无关 线性表示 文章目录 线性代数 第三讲 线性相关无关 线性表示1.向量运算1.线性相关与线性无关1.1 线性相关与线性无关基本概念 2.线性表示&#xff08;线性组合&#xff09;3.线性相关无关与线性表示的定理大总结3.1 向量β可由向量组线性表出的同义…

心觉:潜意识显化很简单,只是很多人想复杂了

很多人知道潜意识的力量很大&#xff0c;是意识力量的30000倍以上 也知道该怎么显化自己的潜意识 但是就是做不到 这就像很多肥胖的人知道运动可以减肥 知道减肥之后就可以穿漂亮的衣服 知道减肥之后自己有多帅多美 但是就是迈不开腿 根本原因是你的潜意识和意识上的认知不…

RenderMan v26.2更新内容!云渲染平台支持新版本

皮克斯的最新RenderMan v26.2版本带来了一系列激动人心的新特性和改进&#xff0c;进一步巩固了其在高端渲染领域的领导地位&#xff0c;为艺术家们提供了更丰富的创意工具和更流畅的工作流程。作为老牌的云渲染农场&#xff0c;瑞云依然支持新版本的使用。 RenderMan v26.2更新…