Hive MetaStore 的ER 图如下。
部分表结构和说明。
CTLGS(CATALOGS)
catalogs 可以隔离元数据。默认只有1行。一个 CATALOG 可以有多个数据库。
mysql> DESC CTLGS;
+--------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| CTLG_ID | bigint(20) | NO | PRI | NULL | |
| NAME | varchar(256) | YES | UNI | NULL | |
| DESC | varchar(4000) | YES | | NULL | |
| LOCATION_URI | varchar(4000) | NO | | NULL | |
+--------------+---------------+------+-----+---------+-------+
记录示例
mysql> SELECT * FROM CTLGS \G
*************************** 1. row ***************************
CTLG_ID: 1
NAME: hive
DESC: Default catalog for Hive
LOCATION_URI: bos://bmr-rd-wh/houzhizhen/warehouse
1 row in set (0.00 sec)
DBS(数据库表)
每个数据库一条记录。
+-----------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| DB_ID | bigint(20) | NO | PRI | NULL | |
| DESC | varchar(4000) | YES | | NULL | |
| DB_LOCATION_URI | varchar(4000) | NO | | NULL | |
| NAME | varchar(128) | YES | MUL | NULL | |
| OWNER_NAME | varchar(128) | YES | | NULL | |
| OWNER_TYPE | varchar(10) | YES | | NULL | |
| CTLG_NAME | varchar(256) | NO | MUL | hive | |
+-----------------+---------------+------+-----+---------+-------+
- 记录示例
mysql> SELECT * FROM DBS WHERE NAME='tpcds_bos_orc_3' \G;
*************************** 1. row ***************************
DB_ID: 7
DESC: NULL
DB_LOCATION_URI: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db
NAME: tpcds_bos_orc_3
OWNER_NAME: hive
OWNER_TYPE: USER
CTLG_NAME: hive
TBLS(表记录)
- 表结构
mysql> DESC TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID | bigint(20) | NO | PRI | NULL | |
| CREATE_TIME | int(11) | NO | | NULL | |
| DB_ID | bigint(20) | YES | MUL | NULL | |
| LAST_ACCESS_TIME | int(11) | NO | | NULL | |
| OWNER | varchar(767) | YES | | NULL | |
| OWNER_TYPE | varchar(10) | YES | | NULL | |
| RETENTION | int(11) | NO | | NULL | |
| SD_ID | bigint(20) | YES | MUL | NULL | |
| TBL_NAME | varchar(256) | YES | MUL | NULL | |
| TBL_TYPE | varchar(128) | YES | | NULL | |
| VIEW_EXPANDED_TEXT | mediumtext | YES | | NULL | |
| VIEW_ORIGINAL_TEXT | mediumtext | YES | | NULL | |
| IS_REWRITE_ENABLED | bit(1) | NO | | b'0' | |
+--------------------+--------------+------+-----+---------+-------+
- 记录示例
mysql> select * from TBLS WHERE DB_ID=7 AND TBL_NAME='store_sales' \G
*************************** 1. row ***************************
TBL_ID: 42
CREATE_TIME: 1688096043
DB_ID: 7
LAST_ACCESS_TIME: 0
OWNER: hive
OWNER_TYPE: USER
RETENTION: 0
SD_ID: 42
TBL_NAME: store_sales
TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
IS_REWRITE_ENABLED:
1 row in set (0.00 sec)
PARTITION_KEYS(表的分区字段)
mysql> SELECT * FROM PARTITION_KEYS WHERE TBL_ID=42 ORDER BY INTEGER_IDX;
+--------+--------------+-----------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------------+-----------+-------------+
| 42 | NULL | ss_sold_date_sk | bigint | 0 |
+--------+--------------+-----------------+-----------+-------------+
PARTITIONS(分区信息)
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| PART_ID | bigint(20) | NO | PRI | NULL | |
| CREATE_TIME | int(11) | NO | | NULL | |
| LAST_ACCESS_TIME | int(11) | NO | | NULL | |
| PART_NAME | varchar(767) | YES | MUL | NULL | |
| SD_ID | bigint(20) | YES | MUL | NULL | |
| TBL_ID | bigint(20) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
- 记录示例
可以看到,同一张表,不同分区的SD_ID 不一样。表的 SD_ID 是42,这两个分区的SD_ID分别为 43 和 44。
mysql> select * from PARTITIONS where TBL_ID=42 limit 2 \G
*************************** 1. row ***************************
PART_ID: 1
CREATE_TIME: 0
LAST_ACCESS_TIME: 0
PART_NAME: ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
SD_ID: 43
TBL_ID: 42
*************************** 2. row ***************************
PART_ID: 2
CREATE_TIME: 0
LAST_ACCESS_TIME: 0
PART_NAME: ss_sold_date_sk=2452641
SD_ID: 44
TBL_ID: 42
PARTITION_KEY_VALS
mysql> DESC PARTITION_KEY_VALS;
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| PART_ID | bigint(20) | NO | PRI | NULL | |
| PART_KEY_VAL | varchar(256) | YES | | NULL | |
| INTEGER_IDX | int(11) | NO | PRI | NULL | |
+--------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
- 记录示例
以下记录说明,分区的值是 2452641,由于分区 KEY 是 ss_sold_date_sk。
mysql> select * from PARTITION_KEY_VALS where PART_ID=2 order by INTEGER_IDX;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
| 2 | 2452641 | 0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)
SDS(序列化信息)
mysql> DESC SDS;
+---------------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+---------------+------+-----+---------+-------+
| SD_ID | bigint(20) | NO | PRI | NULL | |
| CD_ID | bigint(20) | YES | MUL | NULL | |
| INPUT_FORMAT | varchar(4000) | YES | | NULL | |
| IS_COMPRESSED | bit(1) | NO | | NULL | |
| IS_STOREDASSUBDIRECTORIES | bit(1) | NO | | NULL | |
| LOCATION | varchar(4000) | YES | | NULL | |
| NUM_BUCKETS | int(11) | NO | | NULL | |
| OUTPUT_FORMAT | varchar(4000) | YES | | NULL | |
| SERDE_ID | bigint(20) | YES | MUL | NULL | |
+---------------------------+---------------+------+-----+---------+-------+
- 记录示例
只要表的字段不变,分区对应的 CD_ID 不变。
mysql> select * from SDS WHERE SD_ID in(42,43,44)\G
*************************** 1. row ***************************
SD_ID: 42
CD_ID: 42
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
SERDE_ID: 42
*************************** 2. row ***************************
SD_ID: 43
CD_ID: 42
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
SERDE_ID: 43
*************************** 3. row ***************************
SD_ID: 44
CD_ID: 42
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales/ss_sold_date_sk=2452641
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
SERDE_ID: 44
CDS
从 SDS 表的每条记录都有一个 CD_ID。cds 表非常简单,只有一个字段。
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| CD_ID | bigint(20) | NO | PRI | NULL | |
+-------+------------+------+-----+---------+-------+
COLUMNS_V2
表的列信息
mysql> desc COLUMNS_V2;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| CD_ID | bigint(20) | NO | PRI | NULL | |
| COMMENT | varchar(256) | YES | | NULL | |
| COLUMN_NAME | varchar(128) | NO | PRI | NULL | |
| TYPE_NAME | mediumtext | YES | | NULL | |
| INTEGER_IDX | int(11) | NO | | NULL | |
+-------------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=42 ORDER BY INTEGER_IDX;
+-------+---------+-----------------------+--------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-----------------------+--------------+-------------+
| 42 | NULL | ss_sold_time_sk | bigint | 0 |
| 42 | NULL | ss_item_sk | bigint | 1 |
| 42 | NULL | ss_customer_sk | bigint | 2 |
| 42 | NULL | ss_cdemo_sk | bigint | 3 |
| 42 | NULL | ss_hdemo_sk | bigint | 4 |
| 42 | NULL | ss_addr_sk | bigint | 5 |
| 42 | NULL | ss_store_sk | bigint | 6 |
| 42 | NULL | ss_promo_sk | bigint | 7 |
| 42 | NULL | ss_ticket_number | bigint | 8 |
| 42 | NULL | ss_quantity | int | 9 |
| 42 | NULL | ss_wholesale_cost | decimal(7,2) | 10 |
| 42 | NULL | ss_list_price | decimal(7,2) | 11 |
| 42 | NULL | ss_sales_price | decimal(7,2) | 12 |
| 42 | NULL | ss_ext_discount_amt | decimal(7,2) | 13 |
| 42 | NULL | ss_ext_sales_price | decimal(7,2) | 14 |
| 42 | NULL | ss_ext_wholesale_cost | decimal(7,2) | 15 |
| 42 | NULL | ss_ext_list_price | decimal(7,2) | 16 |
| 42 | NULL | ss_ext_tax | decimal(7,2) | 17 |
| 42 | NULL | ss_coupon_amt | decimal(7,2) | 18 |
| 42 | NULL | ss_net_paid | decimal(7,2) | 19 |
| 42 | NULL | ss_net_paid_inc_tax | decimal(7,2) | 20 |
| 42 | NULL | ss_net_profit | decimal(7,2) | 21 |
+-------+---------+-----------------------+--------------+-------------+