背景:
在项目开发中,有时候需要手动处理一下数据库表的数据。涉及到数据得到备份、恢复,清洗,计算,合并等操作。
举例记录一下最近对数据的一些处理过程。
1、对数据表进行数据量统计
select count(*) from table_a;
2、记住数据库表的数据,然后进行备份
create table table_a_bak as select * from table_a where 1 = 1;
3、我们再对数据进行处理之前一定记得进行备份,以防错误的操作后能将数据恢复,或最大程度的恢复,这是一个好习惯。然后我们才开始对数据进行处理。
4、建立一个临时表导入需要处理的数据。一般临时表最好是 tmp_开头,这也是一个好习惯,方便数据计算完后,对临时表进行删除,以免误删。
create table tmp_table_a as select * from table_a where 1 = 2;
5、对表数据进行去重的操作,先查询看看有多少手机号重复的记录。
select count(1) from (
select phone ,count(1) cn from table_a where phone is not null group by phone ) t where t.cn > 1 ;
6、多数据进行去重。先对table_a表进行接收
drop table tmp_table_a_1;
create table tmp_table_a_1 as
select * from (
select phone ,count(1) cn from table_a where phone is not null group by phone ) t where t.cn > 1 ;
7、tmp_table_a_1增加一个自增的id字段,先建立一个SEQUENCE
CREATE SEQUENCE tmp_table_a_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
给id增加自增约束
alter table tmp_table_a_1 alter column id set default nextval('tmp_table_a_id_seq');
8、将数据重新插回tmp_table_a_1 ,先清空tmp_table_a_1
清空数据
truncate table tmp_table_a_1 ;
插回数据
INSERT INTO tmp_table_a_1
( user_id, phone, user_name, id_card, wx_account, state, app_id, app_name, head_icon, sex, sex_name, reg_time, user_state, user_state_name, open_id, opt_user, birthday_type, birthday, org_name, last_trading_time, org_code, union_id, orig_phone, update_time, belong_code, belong_name, data_type, client_num, client_name, country_code, inviter, channel_id, channel_name, company)
select user_id, phone, user_name, id_card, wx_account, state, app_id, app_name, head_icon, sex, sex_name, reg_time, user_state, user_state_name, open_id, opt_user, birthday_type, birthday, org_name, last_trading_time, org_code, union_id, orig_phone, update_time, belong_code, belong_name, data_type, client_num, client_name, country_code, inviter, channel_id, channel_name, company
from table_a;
9、去重操作
DELETE FROM tmp_table_a_1
WHERE id
NOT IN (
SELECT max(id)
FROM tmp_table_a_1
GROUP BY phone
);
10、A表中有但是B表中没有的数据
drop table if exists tmp_table_a_2;
create table tmp_table_a_2 as
select t1.* from tmp_table_a_1 t1
left join tmp_table_b_1 t2
on t1.phone = t2.phone
where t2.phone is null;