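[2023-09-13] Performance problems caused by statistics after migrating a database with EXPDP/IMPDP

Problem description:

A customer migrated a database with expdp/impdp and then gathered statistics in the new environment. Around noon on the day the migration finished, many SQL statements slowed down because their execution plans had changed. The test case below reproduces the problem.

1. Prepare the data

Create the test table in the scott schema, insert 9999 rows, and update every row with id > 2 to 99, so the id column becomes heavily skewed.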

conn scott/tiger
drop table test purge;
create table test (id int, name varchar2(20));
insert into test select level lv, dbms_random.string('l',20) from dual connect by level < 10000;
update test set id = 99 where id > 2;
commit;
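
As a quick sanity check on the skew, a query along these lines could be run right after the commit. It is a minimal sketch and not part of the original test; with the data built above, ids 1 and 2 should each appear once and all remaining rows should carry id 99.

```sql
-- Count rows per ID value to confirm the skew in SCOTT.TEST
select id, count(*) as row_cnt
from test
group by id
order by id;
```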

2. Query the TEST table by ID (this records the ID column in the col_usage$ table)

SQL> select * from test where id = 1;

        ID NAME
---------- ------------------------------------------------------------
         1 whkefbijsvipefdgnoez

SQL> 

#####Flush the monitoring info, then query col_usage$ by the object_id of the test table
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> select * from col_usage$ where obj# = (select object_id from dba_objects where object_name = 'TEST' and owner='SCOTT');

      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS
---------- ---------- -------------- -------------- -----------------
RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
----------- ---------- ---------- ---------------
     97981          1              1              0                 0
          1          0          0 13-SEP-23
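
The raw col_usage$ rows identify a column only by INTCOL#. A possible way to read the same information by column name is to join to DBA_TAB_COLS on INTERNAL_COLUMN_ID. This is a sketch that assumes a session with access to the SYS-owned table, as in the transcript above; it is not output from the original test.

```sql
-- Show recorded predicate usage for SCOTT.TEST with column names
select c.column_name,
       u.equality_preds,
       u.range_preds,
       u.like_preds,
       u.null_preds,
       u.timestamp
from sys.col_usage$ u
join dba_objects o
  on o.object_id = u.obj#
join dba_tab_cols c
  on c.owner = o.owner
 and c.table_name = o.object_name
 and c.internal_column_id = u.intcol#
where o.owner = 'SCOTT'
  and o.object_name = 'TEST'
  and o.object_type = 'TABLE';
```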

3. Gather statistics on the TEST table, leaving every statistics parameter at its default

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST');

PL/SQL procedure successfully completed.

SQL>
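
To see which method_opt a default gather actually uses for this table, the current preference can be looked up. This is a minimal sketch; on an instance where the preference has not been changed it is expected to return FOR ALL COLUMNS SIZE AUTO.

```sql
-- Effective METHOD_OPT preference for SCOTT.TEST
select dbms_stats.get_prefs('METHOD_OPT', 'SCOTT', 'TEST') as method_opt
from dual;
```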

4. Check the column statistics of the TEST table

The ID column now has a histogram.

SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
  2  b.num_rows,
  3  a.num_distinct distinct_num,
  4  round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5  a.histogram,
  6  a.num_buckets
  7  from dba_tab_col_statistics a, dba_tables b
  8  where a.owner = b.owner
  9  and a.table_name = b.table_name
 10  and a.owner = 'SCOTT'
 11  and a.table_name = 'TEST';

COLUMN_NAME                      NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM                                     NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME                                 9999         9999         100 NONE                                                    1
ID                                   9999            3         .03 FREQUENCY                                               3

SQL>

5. Export TEST

The statistics are exported along with the table.

[oracle@11g ~]$ cat expdp_full_data.par 
userid="/ as sysdba"
directory=MY_DIR
dumpfile=expdp_full_data_%U.dmp
logfile=expdp_full_data.log
PARALLEL=16
CLUSTER=N
tables=scott.test
compression=all
[oracle@11g ~]$ expdp parfile=expdp_full_data.par

Export: Release 11.2.0.4.0 - Production on Wed Sep 13 14:57:55 2023

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYS"."SYS_EXPORT_TABLE_01":  /******** AS SYSDBA parfile=expdp_full_data.par 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 384 KB
. . exported "SCOTT"."TEST"                              129.0 KB    9999 rows
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Master table "SYS"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_TABLE_01 is:
  /home/oracle/dir/expdp_full_data_01.dmp
  /home/oracle/dir/expdp_full_data_02.dmp
Job "SYS"."SYS_EXPORT_TABLE_01" successfully completed at Wed Sep 13 14:58:03 2023 elapsed 0 00:00:06

[oracle@11g ~]$

6. Drop the TEST table

SQL> drop table test purge;

Table dropped.

SQL>

7. Import the TEST table again

[oracle@11g ~]$ cat impdp_full_data.par 
userid="/ as sysdba"
directory=MY_DIR
dumpfile=expdp_full_data_%U.dmp
logfile=impdp_full_data.log
PARALLEL=16
CLUSTER=N
full=y
[oracle@11g ~]$ impdp parfile=impdp_full_data.par

Import: Release 11.2.0.4.0 - Production on Wed Sep 13 15:00:23 2023

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Master table "SYS"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_01":  /******** AS SYSDBA parfile=impdp_full_data.par 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SCOTT"."TEST"                              129.0 KB    9999 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "SYS"."SYS_IMPORT_FULL_01" successfully completed at Wed Sep 13 15:00:27 2023 elapsed 0 00:00:03

[oracle@11g ~]$

8. Check the statistics of the TEST table

The histogram is still there after the import.

SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
  2  b.num_rows,
  3  a.num_distinct distinct_num,
  4  round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5  a.histogram,
  6  a.num_buckets
  7  from dba_tab_col_statistics a, dba_tables b
  8  where a.owner = b.owner
  9  and a.table_name = b.table_name
 10  and a.owner = 'SCOTT'
 11  and a.table_name = 'TEST';

COLUMN_NAME                      NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM                                     NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME                                 9999         9999         100 NONE                                                    1
ID                                   9999            3         .03 FREQUENCY                                               3
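
Before regathering anything on a migrated schema, it may help to record which columns currently carry a histogram so the list can be compared afterwards. The query below is a sketch under that assumption; the SCOTT filter is only the schema used in this test.

```sql
-- List every column in the schema that currently has a histogram
select table_name,
       column_name,
       histogram,
       num_buckets
from dba_tab_col_statistics
where owner = 'SCOTT'
  and histogram <> 'NONE'
order by table_name, column_name;
```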

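9. Gather statistics again

First confirm that col_usage$ no longer has any row for the table (the re-imported table is a new object with a new object_id), then gather with the default parameters.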

SQL> select * from col_usage$ where obj# = (select object_id from dba_objects where object_name = 'TEST' and owner='SCOTT');


no rows selected

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST');

PL/SQL procedure successfully completed.

SQL>

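10. Check the statistics of the TEST table once more

The histogram on ID is gone. With skewed data and no histogram, the optimizer may produce inaccurate execution plans.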

SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
  2  b.num_rows,
  3  a.num_distinct distinct_num,
  4  round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5  a.histogram,
  6  a.num_buckets
  7  from dba_tab_col_statistics a, dba_tables b
  8  where a.owner = b.owner
  9  and a.table_name = b.table_name
 10  and a.owner = 'SCOTT'
 11  and a.table_name = 'TEST';

COLUMN_NAME                      NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM                                     NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME                                 9999         9999         100 NONE                                                    1
ID                                   9999            3         .03 NONE                                                    1

SQL>
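
One way to see the effect on the optimizer is to compare the estimated row counts for a rare and a common value. This is a sketch, not part of the original test. Without a histogram the optimizer assumes a uniform distribution, so both estimates are expected to be about num_rows / num_distinct (roughly 3333 here) rather than close to the real counts. The test table has no index, so only the Rows column of the plan should differ, not the access path.

```sql
-- Estimated cardinality for a rare value
explain plan for select * from test where id = 1;
select * from table(dbms_xplan.display);

-- Estimated cardinality for the dominant value
explain plan for select * from test where id = 99;
select * from table(dbms_xplan.display);
```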

##########

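The preceding steps are the same, but if statistics are gathered with the repeat method after the import, the histogram is preserved. The first query below shows the statistics right after the import, the second shows them after the gather (NUM_ROWS becomes an estimate there because the gather samples 10 percent of the rows).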

SQL> set wrap off
SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
  2  b.num_rows,
  3  a.num_distinct distinct_num,
  4  round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5  a.histogram,
  6  a.num_buckets
  7  from dba_tab_col_statistics a, dba_tables b
  8  where a.owner = b.owner
  9  and a.table_name = b.table_name
 10  and a.owner = 'SCOTT'
 11  and a.table_name = 'TEST';

COLUMN_NAME                      NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM                                     NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME                                 9999         9999         100 NONE                                                    1
ID                                   9999            3         .03 FREQUENCY                                               3

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'SCOTT',tabname=>'TEST',ESTIMATE_PERCENT=>10,method_opt=>'for all columns size repeat',cascade=>true,force=>true,degree=>8);

PL/SQL procedure successfully completed.

SQL> set lin 200
SQL> col COLUMN_NAME for a30
SQL> select a.column_name,
  2  b.num_rows,
  3  a.num_distinct distinct_num,
  4  round(a.num_distinct / b.num_rows * 100, 2) selectivity,
  5  a.histogram,
  6  a.num_buckets
  7  from dba_tab_col_statistics a, dba_tables b
  8  where a.owner = b.owner
  9  and a.table_name = b.table_name
 10  and a.owner = 'SCOTT'
 11  and a.table_name = 'TEST';

COLUMN_NAME                      NUM_ROWS DISTINCT_NUM SELECTIVITY HISTOGRAM                                     NUM_BUCKETS
------------------------------ ---------- ------------ ----------- --------------------------------------------- -----------
NAME                                10009        10009         100 NONE                                                    1
ID                                  10009            3         .03 FREQUENCY                                               3

SQL>

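So after migrating a database with expdp/impdp, gathering statistics with the default method can drop the histograms on columns. The imported tables are new objects with no column usage recorded in col_usage$, so the default method (for all columns size auto) has no reason to build histograms for them. Execution plans then differ from those in the source database and SQL runs less efficiently.

Once the database has been running for a while, more and more columns are recorded in col_usage$, and the default method gradually builds the column histograms again. If a SQL statement runs into problems in production, gather statistics manually: the statement has already run, so the columns in its WHERE clause are already recorded in col_usage$ and the auto method will build histograms on them as well.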
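
Following the repeat test above, one possible way to keep the imported histograms for a whole schema is sketched below. The schema name SCOTT and the choice between a one-off gather and a temporary schema preference are assumptions for illustration, not steps from the original case.

```sql
-- Option 1: one-off gather right after the import, keeping existing histograms
begin
  dbms_stats.gather_schema_stats(
    ownname    => 'SCOTT',
    method_opt => 'FOR ALL COLUMNS SIZE REPEAT',
    cascade    => true);
end;
/

-- Option 2: make REPEAT the preference for the schema's existing tables,
-- so that gathers using default parameters keep the histograms as well
begin
  dbms_stats.set_schema_prefs('SCOTT', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE REPEAT');
end;
/

-- Later, once col_usage$ has been repopulated by the workload, switch back to AUTO
begin
  dbms_stats.set_schema_prefs('SCOTT', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO');
end;
/
```

REPEAT only refreshes histograms that already exist, so it will not add one to a column that becomes skewed later. That is why the sketch switches back to AUTO once the workload has run for a while.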