hiveSql 京东面试题-有效值问题
- 需求
- 准备数据
- 分析
- 实现
- 最后
需求
有入库成本表,一个商品每次入库就会产生一条数据,里面包含商品id,入库时间time,以及入库采购的成本。但由于某些某些原因,导致表中某些数据的成本是有丢失的。
现在的逻辑是:当成本丢失时,有两种取成本的方式,现在需要把两种成本都取出来,最后取2次成本的平均值作为本次入库的成本。取数逻辑如下:
- 1、取同一个商品最近一次之前入库的有效成本,即丢失成本商品的丢失成本当前数据的前一条有效成本数据
- 2、取同一个商品最近一次之后入库的有效成本,即丢失成本商品的丢失成本当前数据的后一条有效成本数据
- 3、上述中结果依然有无效值时,记为0
具体数据如下:
可见截图中商品id为2的商品在2022-12-02号和2022-12-03号的入库成本丢失,按照上述取数逻辑,会生成两个新的字段last_cost、next_cost。其中
last_cost是当前丢失成本数据的前一条有效成本数据;
next_cost是当前丢失成本数据的后一条有效成本数据。
还是看商品id为2的数据,在2022-12-02号这条丢失成本数据中:
它的last_cost是商品id同样是2,且它的上一条有效成本数据,即2022-12-01的150,
它的next_cost是商品id同样是2,且它的下一条有效成本数据,即2022-12-04的200。
即上截图中第一条填充色为红色的数据行。
同理id为2的2022-12-03号数据也是它的上一行有效成本 和 它的下一条有效成本。
最后一条商品id为4的2022-12-05号丢失成本数据中next_cost为0,因为它没有下一条有效成本。(即上述逻辑3)
准备数据
select '1' as id, '2022-12-01' as itime, 120 as price
union all
select '2' as id, '2022-12-01' as itime, 150 as price
union all
select '2' as id, '2022-12-02' as itime, null as price
union all
select '2' as id, '2022-12-03' as itime, null as price
union all
select '2' as id, '2022-12-04' as itime, 200 as price
union all
select '2' as id, '2022-12-05' as itime, 210 as price
union all
select '3' as id, '2022-12-06' as itime, 300 as price
union all
select '3' as id, '2022-12-07' as itime, null as price
union all
select '3' as id, '2022-12-08' as itime, 400 as price
union all
select '4' as id, '2022-12-01' as itime, 140 as price
union all
select '4' as id, '2022-12-02' as itime, null as price
union all
select '4' as id, '2022-12-03' as itime, null as price
union all
select '4' as id, '2022-12-04' as itime, 200 as price
union all
select '4' as id, '2022-12-05' as itime, null as price
分析
上述需求中可以看出,其实想要补充丢失成本行的数据,只要拿到相对当前丢失成本数据的前、后同商品的最近有效成本,不论有多少条连续的丢失成本数据行,见下图:
只要做到将丢失成本数据行与它的前、后有效成本利用重分组思想将他们分组在一组中,取组内max值即可。
实现
一、分组
with tmp as (
select '1' as id, '2022-12-01' as itime, 120 as price
union all
select '2' as id, '2022-12-01' as itime, 150 as price
union all
select '2' as id, '2022-12-02' as itime, null as price
union all
select '2' as id, '2022-12-03' as itime, null as price
union all
select '2' as id, '2022-12-04' as itime, 200 as price
union all
select '2' as id, '2022-12-05' as itime, 210 as price
union all
select '3' as id, '2022-12-06' as itime, 300 as price
union all
select '3' as id, '2022-12-07' as itime, null as price
union all
select '3' as id, '2022-12-08' as itime, 400 as price
union all
select '4' as id, '2022-12-01' as itime, 140 as price
union all
select '4' as id, '2022-12-02' as itime, null as price
union all
select '4' as id, '2022-12-03' as itime, null as price
union all
select '4' as id, '2022-12-04' as itime, 200 as price
union all
select '4' as id, '2022-12-05' as itime, null as price
)
select
id,itime,price,
sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp
先利用重分组思想,根据price值是否为null为界限,顺序,逆序sum开窗,即可将丢失成本数据与它相对应的前、后有效成本分到同一组中。
last_cost分组,可见丢失成本的数据行已经和它的前一行有效成本行分在一组
next_cost分组,可见丢失成本的数据行已经和它的后一行有效成本行分在一组
二、组内取最大price
按照商品id和last_index、next_index分组,取组内最大的price,其中nullprice赋0值。
with tmp as (
select '1' as id, '2022-12-01' as itime, 120 as price
union all
select '2' as id, '2022-12-01' as itime, 150 as price
union all
select '2' as id, '2022-12-02' as itime, null as price
union all
select '2' as id, '2022-12-03' as itime, null as price
union all
select '2' as id, '2022-12-04' as itime, 200 as price
union all
select '2' as id, '2022-12-05' as itime, 210 as price
union all
select '3' as id, '2022-12-06' as itime, 300 as price
union all
select '3' as id, '2022-12-07' as itime, null as price
union all
select '3' as id, '2022-12-08' as itime, 400 as price
union all
select '4' as id, '2022-12-01' as itime, 140 as price
union all
select '4' as id, '2022-12-02' as itime, null as price
union all
select '4' as id, '2022-12-03' as itime, null as price
union all
select '4' as id, '2022-12-04' as itime, 200 as price
union all
select '4' as id, '2022-12-05' as itime, null as price
)
select
id,itime,price,
max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
max(if(price is null,0,price)) over(partition by id,next_index) as next_price
from
(select
id,itime,price,
sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp
) t;
三、取平均值作为最后的成本
取每条数据last_price和next_price的平均值作为最后的成本数据
with tmp as (
select '1' as id, '2022-12-01' as itime, 120 as price
union all
select '2' as id, '2022-12-01' as itime, 150 as price
union all
select '2' as id, '2022-12-02' as itime, null as price
union all
select '2' as id, '2022-12-03' as itime, null as price
union all
select '2' as id, '2022-12-04' as itime, 200 as price
union all
select '2' as id, '2022-12-05' as itime, 210 as price
union all
select '3' as id, '2022-12-06' as itime, 300 as price
union all
select '3' as id, '2022-12-07' as itime, null as price
union all
select '3' as id, '2022-12-08' as itime, 400 as price
union all
select '4' as id, '2022-12-01' as itime, 140 as price
union all
select '4' as id, '2022-12-02' as itime, null as price
union all
select '4' as id, '2022-12-03' as itime, null as price
union all
select '4' as id, '2022-12-04' as itime, 200 as price
union all
select '4' as id, '2022-12-05' as itime, null as price
)
select
id,itime,price,
case when price is null then (last_price + next_price) / 2 else price end as last_price
from
(select
id,itime,price,
max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
max(if(price is null,0,price)) over(partition by id,next_index) as next_price
from
(select
id,itime,price,
sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp
) t
) t1;
最后
喜欢的点赞、关注、收藏吧~ 你的支持是最大的创作动力~~