Postgresql 根据单列或几列分组去重row_number() over() partition by

news2024/9/29 15:28:22

Postgresql 根据单列或几列分组去重row_number() over() partition by

一般用于单列或者几列需要去重后进行计算值的

count(distinct(eid)) 可以

比如有个例子,需要根据名称,城市去筛选覆盖的道路长度,以月因为建立了唯一索引是ok的,年时可能会有重复的,如何去重呢?用窗口函数:row_number() over() partition by
count(distinct(length)) 不行,因为很多道路数据本就有相同的长度

1. 效果图

可以看到 distinctCnt > Cnt说明有重复,点开string_agg的结果发现确实是有重复;, 这样计算其所对应的length值肯定偏大。
在这里插入图片描述
去重后效果图如下: 把所有的聚合条件都写在partition by后边。
可以看到后边的里程和也正常了不少。
试验发现pname有无差别不大,可能是因为构造的数据集小;,但其实是需要的;
在这里插入图片描述
以第一条数据去验证:

2. 源码

2.1 建表,构建数据

drop table if exists t_pa_cover;
create table if not exists t_pa_cover(
	pname text COLLATE pg_catalog."default" NOT NULL,
    upload_date varchar(12),
    city_code varchar(20) default '',
    link_pid varchar(20),
    link_length numeric default 0,
    create_time timestamp with time zone NOT NULL DEFAULT now(),
    constraint t_pa_cover_unique_key unique (pname,upload_date,city_code,link_pid)
);
COMMENT ON TABLE t_pa_cover IS '覆盖率中间表';
COMMENT ON COLUMN t_pa_cover.pname IS '名称';
COMMENT ON COLUMN t_pa_cover.upload_date IS '日期';
COMMENT ON COLUMN t_pa_cover.city_code IS '城市行政编码';
COMMENT ON COLUMN t_pa_cover.link_pid IS 'linkpid';
COMMENT ON COLUMN t_pa_cover.link_length IS 'linkpid长度m';
COMMENT ON COLUMN t_pa_cover.create_time IS '创建时间';

create index if not exists t_pa_cover_citycode on t_pa_cover(city_code);
create index if not exists t_pa_cover_pname on t_pa_cover(pname);
create index if not exists t_pa_cover_uploaddate on t_pa_cover(upload_date);

INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202201', '1101', 4721472607, 99.88);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202201', '1201', 4731620766, 64.96);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202201', '1301', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202202', '1101', 4732413545, 23.63);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202202', '1201', 4733766774, 17.97);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202202', '1301', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202203', '1101', 4732413545, 23.63);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202203', '1201', 4721472607, 99.88);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202203', '1301', 4733766774, 17.97);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202204', '1101', 4721472607, 99.88);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202204', '1201', 4738504835, 37.94);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202204', '1301', 4727435973, 39.05);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202205', '1101', 4737641033, 1.41);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202205', '1201', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202205', '1301', 4727435973, 39.05);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202206', '1101', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202206', '1201', 4737641033, 1.41);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202206', '1301', 4733766774, 17.97);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202207', '1101', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202207', '1201', 4740662897, 86.96);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202207', '1301', 4719251580, 43.12);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202208', '1101', 4719251580, 43.12);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202208', '1201', 4727435973, 39.05);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202208', '1301', 4725763511, 82.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202209', '1101', 4741477663, 35.39);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202209', '1201', 4738504835, 37.94);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202209', '1301', 4740789027, 5.36);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202210', '1101', 4721472607, 99.88);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202210', '1201', 4733766774, 17.97);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202210', '1301', 4732413545, 23.63);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202211', '1101', 4719251580, 43.12);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202211', '1201', 4740789027, 5.36);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202211', '1301', 4719251580, 43.12);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202212', '1101', 4740789027, 5.36);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202212', '1201', 4740662897, 86.96);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('aa', '202212', '1301', 4721472607, 99.88);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202201', '1101', 4738492963, 10.75);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202201', '1201', 4736532327, 44.78);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202201', '1301', 4740856924, 39.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202202', '1101', 4739710021, 85.77);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202202', '1201', 4736532327, 44.78);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202202', '1301', 4712358476, 44.06);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202203', '1101', 4734479408, 25.51);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202203', '1201', 4738273045, 99.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202203', '1301', 4740856924, 39.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202204', '1101', 4735500946, 49.98);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202204', '1201', 4738273045, 99.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202204', '1301', 4736169127, 58.38);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202205', '1101', 4736797286, 26.90);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202205', '1201', 4716723755, 89.29);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202205', '1301', 4740856924, 39.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202206', '1101', 4738492963, 10.75);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202206', '1201', 4735500946, 49.98);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202206', '1301', 4712358476, 44.06);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202207', '1101', 4716723755, 89.29);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202207', '1201', 4740108020, 77.72);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202207', '1301', 4730167080, 0.11);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202208', '1101', 4716723755, 89.29);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202208', '1201', 4738492963, 10.75);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202208', '1301', 4730167080, 0.11);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202209', '1101', 4716723755, 89.29);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202209', '1201', 4735500946, 49.98);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202209', '1301', 4712358476, 44.06);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202210', '1101', 4736532327, 44.78);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202210', '1201', 4738273045, 99.60);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202210', '1301', 4716723755, 89.29);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202211', '1101', 4740108020, 77.72);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202211', '1201', 4740108020, 77.72);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202211', '1301', 4741340832, 83.51);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202212', '1101', 4738492963, 10.75);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202212', '1201', 4734479408, 25.51);
INSERT INTO public.t_pa_cover(pname, upload_date, city_code, link_pid, link_length) VALUES ('bb', '202212', '1301', 4741340832, 83.51);

2.2 去重与没去重——sql对比

-- 有重复
select pname,
       substring(upload_date,0,5) as upDate,
	   city_code as cityCode,
	   count(distinct(link_pid)) distinctCnt,
	   count(link_pid) cnt,
	   string_agg(link_pid,','),
	   sum(link_length)
from t_pa_cover
group by pname,upDate,cityCode

-- 去重后
select pname,
       substring(upload_date,0,5) as upDate,
	   city_code as cityCode,
	   count(distinct(link_pid)) distinctCnt,
	   count(link_pid) cnt,
	   string_agg(link_pid,','),
	   sum(link_length)
from (
	select row_number() over(partition by pname,substring(upload_date,0,5),city_code,link_pid) as rn,
	a.*
	from t_pa_cover a
    where substring(upload_date,0,5) ='2022'
) b 
where b.rn=1
group by pname,upDate,cityCode;

参考

  • Postgresql语句持续更新
  • Postgresql大全
  • https://blog.csdn.net/wbj3106/article/details/82109077

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/339033.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

前端项目集成Vite配置一览无余

Vite配置文件 默认指定&#xff1a;vite.config.js。自定义指定&#xff1a;vite --config 自定义名称.js。 Vite相关命令 查看Vite有哪些命令&#xff1a;npx vite -help。 --host [host]// 指定域名 --port <port>// 指定端口 --https // 使用 TLSHTTP/2 --cors //…

Spring 中 ApplicationContext 和 BeanFactory 的区别

文章目录类图包目录不同国际化强大的事件机制&#xff08;Event&#xff09;底层资源的访问延迟加载常用容器类图 包目录不同 spring-beans.jar 中 org.springframework.beans.factory.BeanFactoryspring-context.jar 中 org.springframework.context.ApplicationContext 国际…

HTTP 和 HTTPS 的区别

文章目录前言一、HTTP 与 HTTPS 的基本概念HTTPHTTPS二、HTTP 和 HTTPS协议的区别前言 浏览网站时&#xff0c;我们会发现网址有两种格式&#xff0c;一种以http://开头&#xff0c;一种https://开头。好像这两种格式差别不大&#xff0c;只多了一个s&#xff0c;实际上他们有…

Java零基础教程——数组

目录数组静态初始化数组数组的访问数组的动态初始化元素默认值规则&#xff1a;数组的遍历数组遍历-求和冒泡排序数组的逆序交换数组 数组就是用来存储一批同种类型数据的容器。 20, 10, 80, 60, 90 int[] arr {20, 10, 80, 60, 90}; //位置 0 1 2 3 4数组的…

死锁的原因及解决方法

❣️关注专栏&#xff1a; JavaEE 死锁☘️1.什么是死锁☘️2.死锁的三个典型情况☘️2.1情况一☘️2.2情况二☘️2.2.1死锁的代码展示☘️2.3多个线程多把锁☘️3死锁产生的必要条件☘️3.1互斥性☘️3.2不可抢占☘️3.3请求和保持☘️3.4循环等待☘️4如何避免死锁☘️4.1避免…

【Spark分布式内存计算框架——Spark Core】6. RDD 持久化

3.6 RDD 持久化 在实际开发中某些RDD的计算或转换可能会比较耗费时间&#xff0c;如果这些RDD后续还会频繁的被使用到&#xff0c;那么可以将这些RDD进行持久化/缓存&#xff0c;这样下次再使用到的时候就不用再重新计算了&#xff0c;提高了程序运行的效率。 缓存函数 可以…

Kubernetes集群-部署Java项目

Kubernetes集群-部署Java项目&#xff08;SSG&#xff09; k8s部署项目java流程图 第一步 打包制作镜像 打包 java源码&#xff1a; application.properties #在有pom.xml的路径下执行 mvn clean package制作镜像&#xff1a; 将刚才打包后的文件夹传到&#xff0c;装有dock…

(考研湖科大教书匠计算机网络)第三章数据链路层-第十一节:虚拟局域网VLAN概述和实现机制

获取pdf&#xff1a;密码7281专栏目录首页&#xff1a;【专栏必读】考研湖科大教书匠计算机网络笔记导航 文章目录一&#xff1a;VLAN概述&#xff08;1&#xff09;分割广播域&#xff08;2&#xff09;虚拟局域网VLAN二&#xff1a;VLAN实现机制&#xff08;1&#xff09;IEE…

LeetCode题目笔记——15. 三数之和

文章目录题目描述题目链接题目难度——中等方法一&#xff1a;暴力&#xff08;参考&#xff09;代码/Python 参考方法二&#xff1a;哈希代码/Python参考方法三&#xff1a;排序双指针代码/Python代码/C总结题目描述 龙门阵&#xff1a;这个n个数之和是不是有什么深意啊&#…

Python中的类和对象(6)

文章目录1.多态2.类继承的多态3.自定义函数的多态4.鸭子类型5.思维导图1.多态 多态在编程中是一个非常重要的概念&#xff0c;它是指同一个运算符、函数或对象在不同的场景下&#xff0c;具有不同的作用效果&#xff0c;这么一个技能。 我们知道加号&#xff08;&#xff09;…

加载sklearn新闻数据集出错 fetch_20newsgroups() HTTPError: HTTP Error 403: Forbidden解决方案

大家好,我是爱编程的喵喵。双985硕士毕业,现担任全栈工程师一职,热衷于将数据思维应用到工作与生活中。从事机器学习以及相关的前后端开发工作。曾在阿里云、科大讯飞、CCF等比赛获得多次Top名次。喜欢通过博客创作的方式对所学的知识进行总结与归纳,不仅形成深入且独到的理…

大数据框架之Hadoop:入门(五)Hadoop编译源码(面试重点)

5.1 前期准备工作 1.CentOS联网 配置CentOS能连接外网。Linux虚拟机ping www.baidu.com 是畅通的 注意&#xff1a;采用root角色编译&#xff0c;减少文件夹权限出现问题 2.jar包准备(hadoop源码、JDK8、maven、ant 、protobuf) &#xff08;1&#xff09;hadoop-2.7.7-sr…

【TCP的拥塞控制】基于窗口的拥塞控制

TCP的拥塞窗口CWND大小和传输轮次n的关系如下所示。&#xff08;本题10分&#xff09; cwnd12481632333435363738394041422122232425261248N1234567891011121314151617181920212223242526 问题&#xff1a; &#xff08;1&#xff09;慢开始阶段的时间间隔&#xff1f;&#…

NFC enable NFC使能流程

同学,别退出呀,我可是全网最牛逼的 WIFI/BT/GPS/NFC分析博主,我写了上百篇文章,请点击下面了解本专栏,进入本博主主页看看再走呗,一定不会让你后悔的,记得一定要去看主页置顶文章哦。 NFC enable NFC使能流程 认识nfc系统如何工作,最好的方法就是了解nfc的各个流程,…

linux系统下SVN服务器搭建

linux新手&#xff0c;整了好几天才搞好&#xff0c;做下笔记以备后续使用&#xff1a; 1、下载svn服务器 yum -y install subversion 2、创建仓库 svnadmin create /opt/svn/pro/respos1 svnadmin create /opt/svn/pro/respos2 3、配置用户以及权限 1:cd到仓库目录下&#…

py3常用返回规则字符串的函数+ascii与char的转换

文章目录py3常用返回规则字符串的函数字符转ascii以及ascii转字符的方法为&#xff1a;py3常用返回规则字符串的函数 注明原来的网址为&#xff1a;https://docs.python.org/3.8/library/string.html string.ascii_letters 返回所有的大写、小写字母 string.ascii_lowercase 返…

寒假安全作业nginx-host绕过实例复现

1.测试环境搭建 LNMP架构的话&#xff0c;肯定就是linux、nginx、mysql、php四大组件。在后面的复现中我们还会用到https的一部分知识&#xff0c;故这里的nginx就需要使用虚拟主机并且配置https证书&#xff0c;且具有php解析功能。 1.1 基础nginx配置 #1.创建web目录 mkdir …

MySQL-InnoDB行格式浅析

简介 我们知道读写磁盘的速度非常慢&#xff0c;和内存读写差了几个数量级&#xff0c;所以当我们想从表中获取某些记录时&#xff0c; InnoDB 存储引擎需要一条一条的把记录从磁盘上读出来么&#xff1f; 不&#xff0c;那样会慢死&#xff0c;InnoDB 采取的方式是&#xff1a…

雅思经验(十三)

感觉这篇其实有点小难&#xff0c;我在精听的才发现那是六个实验对象&#xff0c;但是叫做six subjects在精听的时候感觉有些手忙脚乱&#xff0c;像是一团乱麻一样&#xff0c;但是也是没有什么关系。其实这才是查漏补缺&#xff0c;cello player 这是大提琴师violinists 小提…

08- 数据升维 (PolynomialFeatures) (机器学习)

在做数据升维的时候&#xff0c;最常见的手段就是将已知维度进行相乘&#xff08;或者自乘&#xff09;来构建新的维度 使用 np.concatenate()进行简单的&#xff0c;幂次合并&#xff0c;注意数据合并的方向axis 1 数据可视化时&#xff0c;注意切片&#xff0c;因为数据升维…