hiveSql JD.com Interview Question: the Valid-Value Problem

2024/11/17 9:26:04


    • Requirement
    • Preparing the data
    • Analysis
    • Implementation
    • Final words

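Requirement

There is an inbound-cost table. Each time a product is received into stock, one row is written containing the product id, the receiving time itime, and the purchase cost of that batch. For various reasons, however, the cost has been lost in some rows.
The rule is: when a cost is missing, there are two ways to obtain it. Both values must be fetched, and their average is used as the cost of that inbound record. The retrieval logic is:

  • 1. Take the nearest earlier valid cost of the same product, i.e. the valid cost in the closest preceding row relative to the row whose cost is missing.
  • 2. Take the nearest later valid cost of the same product, i.e. the valid cost in the closest following row relative to the row whose cost is missing.
  • 3. If either value is still invalid after the steps above (no such row exists), record it as 0.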
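The sample data is as follows:
[Image: sample data, with the rows whose cost is missing highlighted in red]
In the screenshot, product id 2 has lost its cost on 2022-12-02 and 2022-12-03. Following the logic above, two new fields are produced, last_cost and next_cost, where
last_cost is the nearest earlier valid cost for the row whose cost is missing;
next_cost is the nearest later valid cost for the row whose cost is missing.

Take product id 2 again. For its 2022-12-02 row, whose cost is missing:
its last_cost is the previous valid cost of the same product id 2, i.e. 150 on 2022-12-01;
its next_cost is the next valid cost of the same product id 2, i.e. 200 on 2022-12-04.
This is the first red-highlighted row in the screenshot above.

Likewise, the 2022-12-03 row of id 2 takes its previous valid cost and its next valid cost.
In the last missing-cost row, product id 4 on 2022-12-05, next_cost is 0 because there is no later valid cost (rule 3 above).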
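The rule can be summarized as a formula. This is a sketch of the logic as stated above, where $p_{\text{last}}$ and $p_{\text{next}}$ are notation introduced here for last_cost and next_cost. The two cases just discussed are worked out below it:

```latex
\text{cost} =
\begin{cases}
\text{price}, & \text{price is not null} \\[4pt]
\dfrac{p_{\text{last}} + p_{\text{next}}}{2}, & \text{price is null}
\end{cases}
\qquad
p_{\text{last}} := 0 \text{ (resp. } p_{\text{next}} := 0\text{) if no earlier (resp. later) valid cost exists}

\text{id } 2,\ \text{2022-12-02}:\quad \frac{150 + 200}{2} = 175
\qquad
\text{id } 4,\ \text{2022-12-05}:\quad \frac{200 + 0}{2} = 100
```

Note that rule 3 pulls the average toward 0 when one side is missing. For example, the last row of id 4 gets 100 rather than 200.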

Preparing the data

select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price

[Image: output of the query above]

Analysis

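From the requirement, filling in a missing-cost row only requires the nearest valid costs of the same product before and after it, regardless of how many consecutive rows have lost their cost, as shown below:
[Image: consecutive missing-cost rows between their nearest earlier and later valid costs]
All we need is the regrouping idea. Put each missing-cost row into one group with its earlier valid cost and into another group with its later valid cost, then take the max value within each group.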

Implementation

I. Grouping

with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)

select 
    id,itime,price,
    sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
    sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp

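First apply the regrouping idea. Use whether price is null as the boundary and compute two cumulative-sum windows, one in ascending order of itime and one in descending order. The running sum only increases on rows with a valid price. As a result, a missing-cost row inherits the index of the nearest valid row before it (ascending sum) or after it (descending sum). This puts each missing-cost row in the same group as its previous and next valid cost.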

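Grouping for last_cost: the missing-cost rows are now in the same group as the valid-cost row before them.
[Image: result of the last_index grouping]

Grouping for next_cost: the missing-cost rows are now in the same group as the valid-cost row after them.
[Image: result of the next_index grouping]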
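The grouping step can be written out as a short derivation. The symbols $v_j$, $L_i$ and $N_i$ are notation introduced here, not columns of the original query. For one product's rows $r_1, \dots, r_n$ sorted by itime, the two window sums compute:

```latex
v_j =
\begin{cases}
1, & \text{price}_j \text{ is not null} \\
0, & \text{price}_j \text{ is null}
\end{cases}
\qquad
L_i = \sum_{j=1}^{i} v_j \ (\texttt{last\_index}),
\qquad
N_i = \sum_{j=i}^{n} v_j \ (\texttt{next\_index})

\text{id } 2:\quad
\begin{array}{l|ccccc}
\text{itime} & \text{12-01} & \text{12-02} & \text{12-03} & \text{12-04} & \text{12-05} \\
\text{price} & 150 & \text{null} & \text{null} & 200 & 210 \\
L_i & 1 & 1 & 1 & 2 & 3 \\
N_i & 3 & 2 & 2 & 2 & 1
\end{array}
```

On a null row $v_i = 0$, so $L_i$ equals the value at the nearest valid row before it and $N_i$ equals the value at the nearest valid row after it. Both null rows therefore land in group $L = 1$ (together with 150) and in group $N = 2$ (together with 200).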
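II. Take the max price within each group
Partition by product id plus last_index, and separately by product id plus next_index. Take the largest price in each group, counting a null price as 0. Each group contains at most one valid cost, so the max is exactly that cost, or 0 if the group has none.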

with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
    max(if(price is null,0,price)) over(partition by id,next_index) as next_price
from
    (select 
        id,itime,price,
        sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
        sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
    from tmp
    ) t;

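[Image: last_price and next_price for each row]
III. Use the average as the final cost
For each row whose cost is missing, take the average of last_price and next_price as its final cost. Rows that already have a valid price keep it.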


with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    case when price is null then (last_price + next_price) / 2 else price end as final_price  -- averaged cost for missing rows, original price otherwise
from
    (select 
        id,itime,price,
        max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
        max(if(price is null,0,price)) over(partition by id,next_index) as next_price
    from
        (select 
            id,itime,price,
            sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
            sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
        from tmp
        ) t
    ) t1;

[Image: final result]
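For reference, here is a sketch of the same three steps folded into one query. It uses standard SQL constructs in place of Hive's `if()`: `CASE`/`COALESCE`, chained CTEs, and `CAST(NULL AS INT)` for the missing costs. The CTE names `idx` and `filled` and the output column `final_price` are illustrative names chosen here. The sketch assumes an engine with standard window functions (Hive, and presumably others such as PostgreSQL or SQLite 3.25+) and has not been checked against every engine.

```sql
-- Sketch: steps I-III in a single query (assumes standard window-function support).
-- Expected final_price on the rows with missing cost:
--   id 2: 2022-12-02 and 2022-12-03 -> 175  ((150 + 200) / 2)
--   id 3: 2022-12-07                -> 350  ((300 + 400) / 2)
--   id 4: 2022-12-02 and 2022-12-03 -> 170  ((140 + 200) / 2)
--   id 4: 2022-12-05                -> 100  ((200 + 0) / 2, no later valid cost)
WITH tmp AS (
              SELECT '1' AS id, '2022-12-01' AS itime, 120 AS price
    UNION ALL SELECT '2' AS id, '2022-12-01' AS itime, 150 AS price
    UNION ALL SELECT '2' AS id, '2022-12-02' AS itime, CAST(NULL AS INT) AS price
    UNION ALL SELECT '2' AS id, '2022-12-03' AS itime, CAST(NULL AS INT) AS price
    UNION ALL SELECT '2' AS id, '2022-12-04' AS itime, 200 AS price
    UNION ALL SELECT '2' AS id, '2022-12-05' AS itime, 210 AS price
    UNION ALL SELECT '3' AS id, '2022-12-06' AS itime, 300 AS price
    UNION ALL SELECT '3' AS id, '2022-12-07' AS itime, CAST(NULL AS INT) AS price
    UNION ALL SELECT '3' AS id, '2022-12-08' AS itime, 400 AS price
    UNION ALL SELECT '4' AS id, '2022-12-01' AS itime, 140 AS price
    UNION ALL SELECT '4' AS id, '2022-12-02' AS itime, CAST(NULL AS INT) AS price
    UNION ALL SELECT '4' AS id, '2022-12-03' AS itime, CAST(NULL AS INT) AS price
    UNION ALL SELECT '4' AS id, '2022-12-04' AS itime, 200 AS price
    UNION ALL SELECT '4' AS id, '2022-12-05' AS itime, CAST(NULL AS INT) AS price
),
idx AS (
    -- Running count of valid costs: forward (last_index) and backward (next_index).
    SELECT id, itime, price,
           SUM(CASE WHEN price IS NULL THEN 0 ELSE 1 END)
               OVER (PARTITION BY id ORDER BY itime)      AS last_index,
           SUM(CASE WHEN price IS NULL THEN 0 ELSE 1 END)
               OVER (PARTITION BY id ORDER BY itime DESC) AS next_index
    FROM tmp
),
filled AS (
    -- Each group holds at most one valid cost. MAX skips NULLs, and a group
    -- with no valid cost yields NULL, which COALESCE maps to 0 (rule 3).
    SELECT id, itime, price,
           COALESCE(MAX(price) OVER (PARTITION BY id, last_index), 0) AS last_price,
           COALESCE(MAX(price) OVER (PARTITION BY id, next_index), 0) AS next_price
    FROM idx
)
SELECT id, itime, price,
       CASE WHEN price IS NULL
            THEN (last_price + next_price) / 2.0  -- average of the two costs
            ELSE price
       END AS final_price
FROM filled
ORDER BY id, itime;
```

`COALESCE(MAX(price) OVER (...), 0)` relies on `MAX` ignoring NULLs. A group with no valid cost, such as the trailing row of id 4 in the next_index grouping, therefore becomes 0, as rule 3 requires. Dividing by `2.0` avoids integer division on engines where `int / int` truncates. Hive's `/` already returns a double.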

Final words

If you found this helpful, please like, follow, and bookmark. Your support is my biggest motivation to keep writing.

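This article was contributed by an internet user. The views expressed are the author's own and do not represent the position of this site, which only provides information storage space, does not own the rights, and bears no related legal responsibility. If you repost it, please cite the source: http://www.coloradmin.cn/o/137782.html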

