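Problem description:
A customer migrated a database with expdp/impdp and then gathered statistics in the new environment. By noon on the day the migration finished, many SQL statements had become slow because their execution plans had changed. The test case below reproduces the problem.
1. Prepare the data
In the scott schema, create a table named test, insert 9999 rows, and update every row with id > 2 to id = 99, so that the data in the id column is heavily skewed.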
conn scott/tiger
drop table test purge;
create table test (id int, name varchar2(20));
insert into test select level lv, dbms_random.string('l',20) from dual connect by level < 10000;
update test set id = 99 where id > 2;
commit;
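As a quick sanity check on the skew, the distribution of ID values can be counted directly. This is a minimal sketch against the test table created above; given the insert and update statements, it should return three rows: one row each for ID 1 and ID 2, and 9997 rows for ID 99.

```sql
-- Row count per ID value in the test table created above.
-- Rows 1 and 2 keep their IDs; the other 9997 rows were all updated to 99.
select id, count(*) as row_cnt
  from scott.test
 group by id
 order by id;
```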
2. Query the TEST table by the ID column (so that the use of ID in a predicate is recorded in the col_usage$ dictionary table)
SQL> select * from test where id = 1;
ID NAME
---------- ------------------------------------------------------------
1 whkefbijsvipefdgnoez
SQL>
##### Flush the monitoring information, then query col_usage$ using the object_id of the TEST table
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
PL/SQL procedure successfully completed.
SQL> select * from col_usage$ where obj# = (select object_id from dba_objects where object_name = 'TEST' and owner='SCOTT');
      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     97981          1              1              0                 0           1          0          0 13-SEP-23
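The raw col_usage$ row only identifies the column by its internal number (INTCOL#). As a hedged sketch, the query below joins to SYS.COL$ to show the column name next to the predicate counters. It assumes a session that can read the SYS dictionary tables (for example, connected as SYSDBA); the column aliases are illustrative only.

```sql
-- Assumption: the session can read SYS.COL_USAGE$ and SYS.COL$ (e.g. SYSDBA).
-- Shows which columns of SCOTT.TEST have been used in predicates, by name.
select c.name as column_name,
       u.equality_preds,
       u.range_preds,
       u.like_preds,
       u.null_preds,
       u.timestamp
  from sys.col_usage$ u
  join sys.col$ c
    on c.obj# = u.obj#
   and c.intcol# = u.intcol#
 where u.obj# = (select object_id
                   from dba_objects
                  where owner = 'SCOTT'
                    and object_name = 'TEST'
                    and object_type = 'TABLE')
 order by c.intcol#;
```

In this test case, the equality count comes from the `where id = 1` query and the range count from the `update ... where id > 2` statement.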
3. Gather statistics on the TEST table, leaving all statistics parameters at their default values
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST');
PL/SQL procedure successfully completed.
SQL>
4. Check the column statistics of the TEST table
The ID column now has a histogram.
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct distinct_num,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME 9999 9999 100 NONE 1
ID 9999 3 .03 FREQUENCY 3
SQL>
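To look at the histogram itself rather than just its type, the buckets can be read from DBA_TAB_HISTOGRAMS. This is a small sketch for the ID column of SCOTT.TEST. For a frequency histogram, each ENDPOINT_VALUE is a distinct column value and ENDPOINT_NUMBER is a cumulative row count, which may be based on a sample rather than the whole table.

```sql
-- One row per histogram bucket on SCOTT.TEST.ID.
-- For a FREQUENCY histogram, ENDPOINT_VALUE is the column value and
-- ENDPOINT_NUMBER is the cumulative number of (possibly sampled) rows.
select endpoint_number,
       endpoint_value
  from dba_tab_histograms
 where owner = 'SCOTT'
   and table_name = 'TEST'
   and column_name = 'ID'
 order by endpoint_number;
```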
5. Export TEST
The statistics are exported together with the table.
[oracle@11g ~]$ cat expdp_full_data.par
userid="/ as sysdba"
directory=MY_DIR
dumpfile=expdp_full_data_%U.dmp
logfile=expdp_full_data.log
PARALLEL=16
CLUSTER=N
tables=scott.test
compression=all
[oracle@11g ~]$ expdp parfile=expdp_full_data.par
Export: Release 11.2.0.4.0 - Production on Wed Sep 13 14:57:55 2023
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYS"."SYS_EXPORT_TABLE_01": /******** AS SYSDBA parfile=expdp_full_data.par
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 384 KB
. . exported "SCOTT"."TEST" 129.0 KB 9999 rows
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_01 is:
/home/oracle/dir/expdp_full_data_01.dmp
/home/oracle/dir/expdp_full_data_02.dmp
Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Wed Sep 13 14:58:03 2023 elapsed 0 00:00:06
[oracle@11g ~]$
6. Drop the TEST table
SQL> drop table test purge;
Table dropped.
SQL>
7. Re-import the TEST table
[oracle@11g ~]$ cat impdp_full_data.par
userid="/ as sysdba"
directory=MY_DIR
dumpfile=expdp_full_data_%U.dmp
logfile=impdp_full_data.log
PARALLEL=16
CLUSTER=N
full=y
[oracle@11g ~]$ impdp parfile=impdp_full_data.par
Import: Release 11.2.0.4.0 - Production on Wed Sep 13 15:00:23 2023
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Master table "SYS"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_01": /******** AS SYSDBA parfile=impdp_full_data.par
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SCOTT"."TEST" 129.0 KB 9999 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "SYS"."SYS_IMPORT_FULL_01" successfully completed at Wed Sep 13 15:00:27 2023 elapsed 0 00:00:03
[oracle@11g ~]$
8. Check the statistics of the TEST table
The histogram is still there after the import.
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct distinct_num,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME 9999 9999 100 NONE 1
ID 9999 3 .03 FREQUENCY 3
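9. Gather statistics again
Before gathering, note that col_usage$ has no rows for TEST any more: the re-imported table is a new object with a new object_id, and no column usage has been recorded for it yet.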
SQL> select * from col_usage$ where obj# = (select object_id from dba_objects where object_name = 'TEST' and owner='SCOTT');
no rows selected
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST');
PL/SQL procedure successfully completed.
SQL>
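10. Check the statistics of the TEST table once more
The histogram on ID is gone. With skewed data and no histogram, the optimizer's row estimates, and therefore the execution plan, may be inaccurate.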
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct distinct_num,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME 9999 9999 100 NONE 1
ID 9999 3 .03 NONE 1
SQL>
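Why the missing histogram matters can be seen from the optimizer's row estimates. Without a histogram the optimizer assumes the values are evenly distributed, so any equality predicate on ID is estimated at roughly NUM_ROWS / NUM_DISTINCT = 9999 / 3 = 3333 rows, although `id = 1` really returns 1 row and `id = 99` returns 9997. A minimal way to check this is to compare the Rows column of the two plans below. Note that this test table has no index, so only the estimate changes here, not the access path; on real tables with indexes and joins, such misestimates are what change the plan.

```sql
-- Compare the estimated row counts (Rows column) for a rare and a common value.
-- Without a histogram on ID, both estimates should be about 3333.
explain plan for select * from scott.test where id = 1;
select * from table(dbms_xplan.display);

explain plan for select * from scott.test where id = 99;
select * from table(dbms_xplan.display);
```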
##########
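The earlier steps are the same, but if statistics are gathered with the repeat method after the import, the histogram is kept. The first query below shows the statistics right after the import, the second shows them after gathering with size repeat.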
SQL> set wrap off
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct distinct_num,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME 9999 9999 100 NONE 1
ID 9999 3 .03 FREQUENCY 3
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST',ESTIMATE_PERCENT=>10,method_opt=>'for all columns size repeat',cascade=>true,force=>true,degree=>8);
PL/SQL procedure successfully completed.
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct distinct_num,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.owner = 'SCOTT'
11 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME 10009 10009 100 NONE 1
ID 10009 3 .03 FREQUENCY 3
SQL>
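In summary, after migrating a database with expdp/impdp, gathering statistics with the default method (for all columns size auto) can drop the column histograms that were imported with the data. With size auto, a histogram is only built on a column that col_usage$ shows has been used in predicates, and that usage data is empty for the newly imported tables, as the empty col_usage$ query in step 9 shows. Execution plans can then differ from those on the source database, and SQL can run more slowly.
After the database has been running for a while, more and more columns are recorded in col_usage$, and the default method gradually builds the histograms again. If a SQL statement runs into problems in production in the meantime, gather statistics on its tables manually: because the statement has already run, the columns in its WHERE clause are already recorded in col_usage$, so even the auto method will build histograms on those columns.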
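One possible way to act on this right after a migration is sketched below; it is an illustration under stated assumptions, not a procedure taken from the case itself. It first lists the columns that carry a histogram (run it on the source database, or on the target before re-gathering), then sets SIZE REPEAT as the METHOD_OPT preference for the table so that a later gather with default parameters keeps the imported histograms. The SCOTT.TEST names are the ones from the test case; applying this to other schemas or tables is up to the reader.

```sql
-- 1) List the columns that currently have a histogram, for comparison with the
--    state after statistics are re-gathered.
select owner, table_name, column_name, histogram, num_buckets
  from dba_tab_col_statistics
 where owner = 'SCOTT'
   and histogram <> 'NONE'
 order by table_name, column_name;

-- 2) Assumption: keeping the existing histograms is acceptable for this table.
--    Make SIZE REPEAT the table-level METHOD_OPT preference.
begin
  dbms_stats.set_table_prefs(
    ownname => 'SCOTT',
    tabname => 'TEST',
    pname   => 'METHOD_OPT',
    pvalue  => 'FOR ALL COLUMNS SIZE REPEAT');
end;
/

-- 3) A gather with default parameters now picks up the table preference.
exec dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'TEST');
```

Once the workload has populated col_usage$ again, the preference could be removed with `dbms_stats.delete_table_prefs('SCOTT', 'TEST', 'METHOD_OPT')` so that the table returns to the default auto behavior.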