概要
我们在使用SQL Server生成报表统计数据的时候,经常使用sum,count等聚合函数,有时候还会配合case when语句一边执行过滤操作,一边进行聚合。
本文介绍一个在使用中的常见的错误和解决方法。
设计和实现
下面是一张订单记录表,我们要统计2019年,2020年和2021年每个客户的的订单金额总数和订单数量。
我们使用常见的聚合函数sum和count 实现统计功能:
SELECT customer_id,
customer_name,
sum(case
WHEN format(billing_creation_date, 'yyyy') = '2019' THEN
billing_amount
ELSE 0
END ) AS amount_2019, sum(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE 0
END ) AS amount_2020, sum(case
WHEN format(billing_creation_date, 'yyyy') = '2021' THEN
billing_amount
ELSE 0
END ) AS amount_2021, count(case
WHEN format(billing_creation_date, 'yyyy') = '2019' THEN
billing_amount
ELSE 0
END ) AS cnt_2019, count(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE 0
END ) AS cnt_2020, count(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE 0
END ) AS cnt_2021
FROM billings
GROUP BY customer_id, customer_name
代码执行结果如下:
显然,订单个数的统计是错误的。
我们以2019年为例,在进行sum操作时候,如果是2019年的数据,则进行累加;如果不是2019年,也进行累加,但是累加数为0, 所以结果正确。
但是我们计算订单个数时候,如果是2019年的数据,则该条记录符合要求,返回一个数字,订单数加1;如果不是2019年,返回0, 0也是一个数字,所以订单数也会加1,只要有返回值,count就会自动加1。
解决方法就是如果是非2019年的记录,不再返回任何数字或字符串,只返回null。这样count在统计时候,就不会自动加1。
正确代码如下:
SELECT customer_id,
customer_name,
sum(case
WHEN format(billing_creation_date, 'yyyy') = '2019' THEN
billing_amount
ELSE 0
END ) AS amount_2019, sum(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE 0
END ) AS amount_2020, sum(case
WHEN format(billing_creation_date, 'yyyy') = '2021' THEN
billing_amount
ELSE 0
END ) AS amount_2021, count(case
WHEN format(billing_creation_date, 'yyyy') = '2019' THEN
billing_amount
ELSE null
END ) AS cnt_2019, count(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE null
END ) AS cnt_2020, count(case
WHEN format(billing_creation_date, 'yyyy') = '2020' THEN
billing_amount
ELSE null
END ) AS cnt_2021
FROM billings
GROUP BY customer_id, customer_name
附录
建表语句:
if OBJECT_ID('billings','U') is not null
drop table billings
create table billings(
id tinyint primary key identity(1,1),
customer_id tinyint not null,
customer_name nvarchar(10) not null,
billing_id char(3) not null,
billing_creation_date date not null,
billing_amount int not null
)
insert into billings (customer_id, customer_name, billing_id, billing_creation_date, billing_amount) values
(1, 'A', 'k11', '10-10-2020', 100),
(1, 'A', 'k12', '11-11-2020', 150),
(1, 'A', 'k13', '12-11-2021', 100),
(2, 'B', 'k34', '10-11-2019', 150),
(2, 'B', 'k35', '11-11-2020', 200),
(2, 'B', 'k36', '12-11-2021', 250),
(3, 'C', 'k47', '01-01-2018', 100),
(3, 'C', 'k48', '05-01-2019', 250),
(3, 'C', 'k49', '06-01-2021', 300)