How to Quickly Import Hive Daily-Partitioned Tables into StarRocks

2025/2/25 21:34:03

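1. Background

Business status: The group uses FineBI for data presentation and report analysis. After nearly two years of BI development, BI for the supply-chain and marketing domains has taken shape and become systematic. The data warehouse holds 60 TB, FineBI has about 8,000 datasets and about 1,600 published reports, and the number of reports grows by about 40 per month.

Technical status: The data pipeline is: business system databases -> Hive data warehouse -> PG export database -> FineBI dataset extraction -> BI reports. This pipeline has run into several problems. This article does not cover all of them; it only discusses how the excessive data volume in the PG export database causes FineBI's daily data refresh tasks to stall.

Problem description: The group's supply-chain material account-aging and inventory-aging tables are partitioned by day, grow by about 1 million rows per day, and keep all historical data. Within one year a single such table approaches 160 GB. FineBI's early-morning refresh tasks that extract this data stall (FineBI is configured with only 4 background refresh threads), which delays most other data refreshes. Even with more background refresh threads (the recommended value is based on the number of CPU cores; too many threads degrade report-viewing performance), extracting this data still puts pressure on the PG export database.

Approach: 1) optimize the report logic to reduce the data volume; 2) switch the query engine: replace PG with StarRocks and connect FineBI directly to StarRocks, reducing BI data extraction. This article only discusses the second option.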

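2. Initial Experiment

Idea:

Use Broker Load to import the data of the daily-partitioned Hive table into StarRocks. The StarRocks table is also dynamically partitioned by day.

Related documentation:

  • See the official Broker Load documentation to work out the load configuration.
  • See the official dynamic partitioning documentation to work out the CREATE TABLE statement for a daily dynamically partitioned StarRocks table.

Problems:

Using Broker Load ran into the following problems:

  • The Parquet files on HDFS hold several years of partition data; some partitions are monthly, others daily.
  • Some partitions have different fields: some contain fields A, B, C, some contain A, B, D, and some contain A, B, C, D.
  • When loading, StarRocks does not allow the target table to have more fields than the source files.

The official documentation gives the following script, which creates a table named site_access that supports dynamic partitioning, configured through PROPERTIES. The partition range covers the 3 days before and after the current date, 6 days in total.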

CREATE TABLE site_access(
event_day DATE,
site_id INT DEFAULT '10',
city_code VARCHAR(100),
user_name VARCHAR(32) DEFAULT '',
pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(event_day, site_id, city_code, user_name)
PARTITION BY RANGE(event_day)(
PARTITION p20200321 VALUES LESS THAN ("2020-03-22"),
PARTITION p20200322 VALUES LESS THAN ("2020-03-23"),
PARTITION p20200323 VALUES LESS THAN ("2020-03-24"),
PARTITION p20200324 VALUES LESS THAN ("2020-03-25")
)
DISTRIBUTED BY HASH(event_day, site_id) BUCKETS 32
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);

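Users with Hive experience who are new to StarRocks may wonder, when modeling their own table on the SQL above: what should go inside the parentheses of PARTITION BY RANGE(event_day)(...)? Take, for example, the following Hive table that needs to be migrated: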

 -- drop table if exists bda${db_para}.bda_inv_item_age_dtl;
create table if not exists bda${db_para}.bda_inv_item_age_dtl (
    stat_date             string comment 'statistics date',
    entr_date             string comment 'stock-in date',
    item_id               string comment 'item id',
    itm_cd                string comment 'item code',
    org_id                string comment 'inventory organization id',
    org_cd                string comment 'inventory organization code',
    sub_invtr_cd          string comment 'subinventory code',
    bch_nbr               string comment 'batch number',
    entr_qty              string comment 'stock-in quantity',
    total_qty             string comment 'back-calculated total stock-in quantity',
    left_qty              string comment 'remaining stock quantity',
    alloc_qty             string comment 'allocated stock quantity',
    stock_age             string comment 'inventory age',
    item_cost             string comment 'PAC cost',
    actual_cost           string comment 'actual cost',
    item_business         string comment 'business units using the item in the last three months',
    pch_big_ctg           string comment 'purchasing category, major class',
    pch_med_ctg           string comment 'purchasing category, middle class',
    pch_sml_ctg           string comment 'purchasing category, minor class',
    ship_customer_name    string comment 'order ship-to customer',
    team_bu_name          string comment 'team',
    real_customer_name    string comment 'actual customer',
    item_category         string comment 'item category',
    prod_model_id         string comment 'product model id',
    prod_model            string comment 'product model',
    job_belong_bu         string comment 'business unit owning the work order',
    item_bu               string comment 'business units using the item in the last three months',
    pac_cost              string comment 'PAC unit cost',
    item_category_bg      string comment 'item major category')
comment 'inventory age'
partitioned by(part_dt string)
row format delimited fields terminated by '\036'
stored as parquet;

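-- add by tjl 2022.03.10
-- alter table bda${db_para}.bda_inv_item_age_dtl add columns(dept_name string comment 'department owning the work order') cascade;

-- 2022.04.18: xxx: added work order creation time and creator
alter table bda${db_para}.bda_inv_item_age_dtl add columns(wdj_creation_date string comment 'work order creation time') cascade;
alter table bda${db_para}.bda_inv_item_age_dtl add columns(created_by string comment 'work order creator') cascade;

Note: the table's columns have changed here. When a Parquet-format partitioned table has columns modified or added via ALTER, how do the Parquet files change? How should the historical data in old partitions be handled?

The HDFS files look like this: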

drwxrwx--x+  - hive hive          0 2018-08-30 03:51 warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2018-08-29
# about ten thousand lines omitted here
drwxrwx--x+  - hive hive          0 2022-11-26 03:55 warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2022-11-25
drwxrwx--x+  - hive hive          0 2022-11-27 03:52 warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2022-11-26
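
On the question of what goes inside PARTITION BY RANGE(...)( ): in the official example each clause is the prefix followed by the day in yyyyMMdd form, with the following day as the exclusive upper bound. The Python sketch below generates such clauses for an arbitrary date range, which could help when historical daily partitions such as the part_dt directories listed above need explicit partitions. The function name daily_partition_clauses is made up for illustration, and whether explicit clauses are needed at all depends on how the target table is defined.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_partition_clauses(first_day: date, last_day: date, prefix: str = "p") -> list[str]:
    """Return one RANGE partition clause per day in [first_day, last_day]."""
    clauses = []
    day = first_day
    while day <= last_day:
        upper = day + timedelta(days=1)  # exclusive upper bound of this partition
        clauses.append(
            f'PARTITION {prefix}{day:%Y%m%d} VALUES LESS THAN ("{upper.isoformat()}")'
        )
        day = upper
    return clauses
```

Calling daily_partition_clauses(date(2020, 3, 21), date(2020, 3, 24)) yields the same four clauses as the official site_access example; joining the list with ",\n" gives the text to place between the parentheses.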

Discussion:

  • StarRocks offers several table models. Which one fits this scenario?
  • How should the StarRocks table be created?
  • How should the Broker Load job be created?

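3. Problem Analysis

  • For a Parquet-format Hive table, what is the workflow and underlying mechanism of add columns?

1) In the Hive metastore, a column is appended to the end of the table definition.

2) For a partitioned Hive table, the cascade keyword is required so that the metadata of existing partitions is updated as well.

3) With storage formats such as Parquet or ORC, the content stored in the files is not rewritten, so the files do not change.

4) How does Hive recognize that old partitions lack the newly added columns?

Reference: [Hive] Alter Table/Partition/Column - Huawei Enterprise Support Community

In short, without the cascade keyword only the table-level metadata (the table schema) of bda_inv_item_age_dtl is modified in Hive. Data written to new partitions is recognized and displayed correctly. For an old partition, even if it is rewritten with INSERT OVERWRITE so that it contains the new columns, Hive queries will not work correctly unless the partition is dropped and recreated.

Reference: partitioning - how to add columns to existing hive partitioned table? - Stack Overflow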
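
One possible answer to the open question about old partitions, stated here as an assumption rather than something taken from the references: if the reader resolves Parquet columns by name, a column that is absent from an old file simply comes back as NULL. The sketch below models that behaviour with plain dictionaries; read_row_by_name is a hypothetical helper, not a Hive API.

```python
from __future__ import annotations


def read_row_by_name(file_row: dict[str, str], table_columns: list[str]) -> dict[str, str | None]:
    """Project a row stored in an old file onto the current table schema.

    Columns that the file does not contain are returned as None (NULL).
    """
    return {col: file_row.get(col) for col in table_columns}
```

For a row written before dept_name existed, the projected row carries dept_name as None while the stored columns keep their values.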

  • Schema changes in the Parquet files

Inspect the schema of the earliest partition, 2016-06-30:

[cloud@dp-master001 ~]$ sudo hdfs dfs -get warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2016-06-30/000000_0
[cloud@dp-master001 ~]$ /opt/cloudera/parcels/CDH/lib/parquet/bin/parquet-tools schema -d 000000_0 >> hive20160630.sql
[cloud@dp-master001 ~]$ cat hive20160630.sql 
message hive_schema {
  optional binary stat_date (UTF8);
  optional binary entr_date (UTF8);
  optional binary item_id (UTF8);
  optional binary itm_cd (UTF8);
  optional binary org_id (UTF8);
  optional binary org_cd (UTF8);
  optional binary sub_invtr_cd (UTF8);
  optional binary bch_nbr (UTF8);
  optional binary entr_qty (UTF8);
  optional binary total_qty (UTF8);
  optional binary left_qty (UTF8);
  optional binary alloc_qty (UTF8);
  optional binary stock_age (UTF8);
  optional binary item_cost (UTF8);
  optional binary actual_cost (UTF8);
  optional binary item_business (UTF8);
  optional binary pch_big_ctg (UTF8);
  optional binary pch_med_ctg (UTF8);
  optional binary pch_sml_ctg (UTF8);
  optional binary ship_customer_name (UTF8);
  optional binary team_bu_name (UTF8);
  optional binary real_customer_name (UTF8);
  optional binary item_category (UTF8);
  optional binary prod_model_id (UTF8);
  optional binary prod_model (UTF8);
  optional binary job_belong_bu (UTF8);
  optional binary item_bu (UTF8);
  optional binary pac_cost (UTF8);
  optional binary item_category_bg (UTF8);
}
creator: parquet-mr version 1.5.0-cdh5.15.2 (build ${buildNumber})

file schema: hive_schema
----------------------------------------------------------------------------------------------------
stat_date: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_date: OPTIONAL BINARY O:UTF8 R:0 D:1
item_id: OPTIONAL BINARY O:UTF8 R:0 D:1
itm_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
org_id: OPTIONAL BINARY O:UTF8 R:0 D:1
org_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
sub_invtr_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
bch_nbr: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
total_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
left_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
alloc_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
stock_age: OPTIONAL BINARY O:UTF8 R:0 D:1
item_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
actual_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_business: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_big_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_med_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_sml_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
ship_customer_name: OPTIONAL BINARY O:UTF8 R:0 D:1
team_bu_name: OPTIONAL BINARY O:UTF8 R:0 D:1
real_customer_name: OPTIONAL BINARY O:UTF8 R:0 D:1
item_category: OPTIONAL BINARY O:UTF8 R:0 D:1
prod_model_id: OPTIONAL BINARY O:UTF8 R:0 D:1
prod_model: OPTIONAL BINARY O:UTF8 R:0 D:1
job_belong_bu: OPTIONAL BINARY O:UTF8 R:0 D:1
item_bu: OPTIONAL BINARY O:UTF8 R:0 D:1
pac_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_category_bg: OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:190899 TS:8780200
----------------------------------------------------------------------------------------------------
stat_date:  BINARY UNCOMPRESSED DO:0 FPO:4 SZ:204/204/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
entr_date:  BINARY UNCOMPRESSED DO:0 FPO:208 SZ:249018/249018/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
item_id:  BINARY UNCOMPRESSED DO:0 FPO:249226 SZ:245992/245992/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
itm_cd:  BINARY UNCOMPRESSED DO:0 FPO:495218 SZ:409890/409890/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
org_id:  BINARY UNCOMPRESSED DO:0 FPO:905108 SZ:3641/3641/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
org_cd:  BINARY UNCOMPRESSED DO:0 FPO:908749 SZ:3637/3637/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
sub_invtr_cd:  BINARY UNCOMPRESSED DO:0 FPO:912386 SZ:187698/187698/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
bch_nbr:  BINARY UNCOMPRESSED DO:0 FPO:1100084 SZ:286227/286227/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
entr_qty:  BINARY UNCOMPRESSED DO:0 FPO:1386311 SZ:495330/495330/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
total_qty:  BINARY UNCOMPRESSED DO:0 FPO:1881641 SZ:59/59/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
left_qty:  BINARY UNCOMPRESSED DO:0 FPO:1881700 SZ:486445/486445/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
alloc_qty:  BINARY UNCOMPRESSED DO:0 FPO:2368145 SZ:536856/536856/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
stock_age:  BINARY UNCOMPRESSED DO:0 FPO:2905001 SZ:243903/243903/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
item_cost:  BINARY UNCOMPRESSED DO:0 FPO:3148904 SZ:2322210/2322210/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,PLAIN,RLE
actual_cost:  BINARY UNCOMPRESSED DO:0 FPO:5471114 SZ:2922009/2922009/1.00 VC:190899 ENC:BIT_PACKED,PLAIN,RLE
item_business:  BINARY UNCOMPRESSED DO:0 FPO:8393123 SZ:1817/1817/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
pch_big_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8394940 SZ:4757/4757/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
pch_med_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8399697 SZ:13669/13669/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
pch_sml_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8413366 SZ:27595/27595/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
ship_customer_name:  BINARY UNCOMPRESSED DO:0 FPO:8440961 SZ:26641/26641/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
team_bu_name:  BINARY UNCOMPRESSED DO:0 FPO:8467602 SZ:12107/12107/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
real_customer_name:  BINARY UNCOMPRESSED DO:0 FPO:8479709 SZ:33/33/1.00 VC:190899 ENC:BIT_PACKED,PLAIN,RLE
item_category:  BINARY UNCOMPRESSED DO:0 FPO:8479742 SZ:399/399/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
prod_model_id:  BINARY UNCOMPRESSED DO:0 FPO:8480141 SZ:36922/36922/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
prod_model:  BINARY UNCOMPRESSED DO:0 FPO:8517063 SZ:33472/33472/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
job_belong_bu:  BINARY UNCOMPRESSED DO:0 FPO:8550535 SZ:3769/3769/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
item_bu:  BINARY UNCOMPRESSED DO:0 FPO:8554304 SZ:33/33/1.00 VC:190899 ENC:BIT_PACKED,PLAIN,RLE
pac_cost:  BINARY UNCOMPRESSED DO:0 FPO:8554337 SZ:225484/225484/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE
item_category_bg:  BINARY UNCOMPRESSED DO:0 FPO:8779821 SZ:383/383/1.00 VC:190899 ENC:PLAIN_DICTIONARY,BIT_PACKED,RLE

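Inspect the schema of a Parquet file from one of the latest partitions (the latest partition is 2022-11-26; the command below fetches part_dt=2022-11-24):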

[cloud@dp-master001 ~]$ sudo hdfs dfs -get warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2022-11-24/000000_0
[cloud@dp-master001 ~]$ /opt/cloudera/parcels/CDH/lib/parquet/bin/parquet-tools schema -d 000000_0 >> hive.sql
[cloud@dp-master001 ~]$ cat hive.sql 
message hive_schema {
  optional binary stat_date (UTF8);
  optional binary entr_date (UTF8);
  optional binary item_id (UTF8);
  optional binary itm_cd (UTF8);
  optional binary org_id (UTF8);
  optional binary org_cd (UTF8);
  optional binary sub_invtr_cd (UTF8);
  optional binary bch_nbr (UTF8);
  optional binary entr_qty (UTF8);
  optional binary total_qty (UTF8);
  optional binary left_qty (UTF8);
  optional binary alloc_qty (UTF8);
  optional binary stock_age (UTF8);
  optional binary item_cost (UTF8);
  optional binary actual_cost (UTF8);
  optional binary item_business (UTF8);
  optional binary pch_big_ctg (UTF8);
  optional binary pch_med_ctg (UTF8);
  optional binary pch_sml_ctg (UTF8);
  optional binary ship_customer_name (UTF8);
  optional binary team_bu_name (UTF8);
  optional binary real_customer_name (UTF8);
  optional binary item_category (UTF8);
  optional binary prod_model_id (UTF8);
  optional binary prod_model (UTF8);
  optional binary job_belong_bu (UTF8);
  optional binary item_bu (UTF8);
  optional binary pac_cost (UTF8);
  optional binary item_category_bg (UTF8);
  optional binary dept_name (UTF8);
  optional binary wdj_creation_date (UTF8);
  optional binary created_by (UTF8);
}

creator: parquet-mr version 1.5.0-cdh5.15.2 (build ${buildNumber})

file schema: hive_schema
----------------------------------------------------------------------------------------------------
stat_date: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_date: OPTIONAL BINARY O:UTF8 R:0 D:1
item_id: OPTIONAL BINARY O:UTF8 R:0 D:1
itm_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
org_id: OPTIONAL BINARY O:UTF8 R:0 D:1
org_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
sub_invtr_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
bch_nbr: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
total_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
left_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
alloc_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
stock_age: OPTIONAL BINARY O:UTF8 R:0 D:1
item_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
actual_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_business: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_big_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_med_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_sml_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
ship_customer_name: OPTIONAL BINARY O:UTF8 R:0 D:1
team_bu_name: OPTIONAL BINARY O:UTF8 R:0 D:1
real_customer_name: OPTIONAL BINARY O:UTF8 R:0 D:1
item_category: OPTIONAL BINARY O:UTF8 R:0 D:1
prod_model_id: OPTIONAL BINARY O:UTF8 R:0 D:1
prod_model: OPTIONAL BINARY O:UTF8 R:0 D:1
job_belong_bu: OPTIONAL BINARY O:UTF8 R:0 D:1
item_bu: OPTIONAL BINARY O:UTF8 R:0 D:1
pac_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_category_bg: OPTIONAL BINARY O:UTF8 R:0 D:1
dept_name: OPTIONAL BINARY O:UTF8 R:0 D:1
wdj_creation_date: OPTIONAL BINARY O:UTF8 R:0 D:1
created_by: OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:428480 TS:21535478
----------------------------------------------------------------------------------------------------
stat_date:  BINARY UNCOMPRESSED DO:0 FPO:4 SZ:381/381/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
entr_date:  BINARY UNCOMPRESSED DO:0 FPO:385 SZ:608962/608962/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
item_id:  BINARY UNCOMPRESSED DO:0 FPO:609347 SZ:774164/774164/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
itm_cd:  BINARY UNCOMPRESSED DO:0 FPO:1383511 SZ:1185783/1185783/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
org_id:  BINARY UNCOMPRESSED DO:0 FPO:2569294 SZ:49829/49829/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
org_cd:  BINARY UNCOMPRESSED DO:0 FPO:2619123 SZ:49810/49810/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
sub_invtr_cd:  BINARY UNCOMPRESSED DO:0 FPO:2668933 SZ:539203/539203/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
bch_nbr:  BINARY UNCOMPRESSED DO:0 FPO:3208136 SZ:582198/582198/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
entr_qty:  BINARY UNCOMPRESSED DO:0 FPO:3790334 SZ:1028459/1028459/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
total_qty:  BINARY UNCOMPRESSED DO:0 FPO:4818793 SZ:141/141/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
left_qty:  BINARY UNCOMPRESSED DO:0 FPO:4818934 SZ:1074391/1074391/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
alloc_qty:  BINARY UNCOMPRESSED DO:0 FPO:5893325 SZ:1063130/1063130/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
stock_age:  BINARY UNCOMPRESSED DO:0 FPO:6956455 SZ:599509/599509/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
item_cost:  BINARY UNCOMPRESSED DO:0 FPO:7555964 SZ:5625018/5625018/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,PLAIN,RLE
actual_cost:  BINARY UNCOMPRESSED DO:0 FPO:13180982 SZ:6410120/6410120/1.00 VC:428480 ENC:BIT_PACKED,PLAIN,RLE
item_business:  BINARY UNCOMPRESSED DO:0 FPO:19591102 SZ:114122/114122/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
pch_big_ctg:  BINARY UNCOMPRESSED DO:0 FPO:19705224 SZ:33/33/1.00 VC:428480 ENC:BIT_PACKED,PLAIN,RLE
pch_med_ctg:  BINARY UNCOMPRESSED DO:0 FPO:19705257 SZ:33/33/1.00 VC:428480 ENC:BIT_PACKED,PLAIN,RLE
pch_sml_ctg:  BINARY UNCOMPRESSED DO:0 FPO:19705290 SZ:33/33/1.00 VC:428480 ENC:BIT_PACKED,PLAIN,RLE
ship_customer_name:  BINARY UNCOMPRESSED DO:0 FPO:19705323 SZ:37444/37444/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
team_bu_name:  BINARY UNCOMPRESSED DO:0 FPO:19742767 SZ:17370/17370/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
real_customer_name:  BINARY UNCOMPRESSED DO:0 FPO:19760137 SZ:18033/18033/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
item_category:  BINARY UNCOMPRESSED DO:0 FPO:19778170 SZ:917/917/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
prod_model_id:  BINARY UNCOMPRESSED DO:0 FPO:19779087 SZ:148421/148421/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
prod_model:  BINARY UNCOMPRESSED DO:0 FPO:19927508 SZ:113394/113394/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
job_belong_bu:  BINARY UNCOMPRESSED DO:0 FPO:20040902 SZ:16902/16902/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
item_bu:  BINARY UNCOMPRESSED DO:0 FPO:20057804 SZ:91023/91023/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
pac_cost:  BINARY UNCOMPRESSED DO:0 FPO:20148827 SZ:791340/791340/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
item_category_bg:  BINARY UNCOMPRESSED DO:0 FPO:20940167 SZ:673/673/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
dept_name:  BINARY UNCOMPRESSED DO:0 FPO:20940840 SZ:7675/7675/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
wdj_creation_date:  BINARY UNCOMPRESSED DO:0 FPO:20948515 SZ:574110/574110/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
created_by:  BINARY UNCOMPRESSED DO:0 FPO:21522625 SZ:12857/12857/1.00 VC:428480 ENC:BIT_PACKED,PLAIN_DICTIONARY,RLE
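
Comparing two such dumps by eye is error-prone, so the comparison can be scripted. The sketch below parses only the `message hive_schema { ... }` part of the parquet-tools output shown above and lists the columns that one file has and the other lacks. The function names are illustrative, and the parser assumes flat schemas with one field per line, as in these dumps.

```python
from __future__ import annotations

import re

# Matches lines such as "  optional binary stat_date (UTF8);"
_FIELD = re.compile(r"^\s*(?:optional|required|repeated)\s+\S+\s+(\w+)")


def parquet_columns(schema_text: str) -> list[str]:
    """Extract column names, in file order, from the `message ... { }` block."""
    columns = []
    inside = False
    for line in schema_text.splitlines():
        if line.startswith("message "):
            inside = True
            continue
        if inside and line.strip() == "}":
            break  # everything after the closing brace is per-column statistics
        if inside:
            match = _FIELD.match(line)
            if match:
                columns.append(match.group(1))
    return columns


def added_columns(old: list[str], new: list[str]) -> list[str]:
    """Columns present in `new` but absent from `old`, in the order of `new`."""
    known = set(old)
    return [c for c in new if c not in known]
```

Applied to the contents of hive20160630.sql and hive.sql, added_columns would report dept_name, wdj_creation_date and created_by, the three fields that appear only in the newer dump.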

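4. Solution Validation

4.1 Creating the StarRocks Tables

  • Which StarRocks table model should be chosen?

The BI reports mainly monitor daily and monthly trends of inventory age and account age (which affect impairment provisions for materials and finished goods), and the analysis is mostly done on detail-level data, so the StarRocks Duplicate Key (detail) model is chosen. The table is expected to be partitioned by day, as in Hive. Broker Load can ignore extra fields in the source files, but it has no mechanism to ignore, or fill with default values, fields that exist only in the target table. Inspecting the source shows that the earliest partition (2016-06-30) and the latest partition (2022-11-26) have different fields (according to the DDL, three fields were added on 2022.03.10 and 2022.04.18). One option is therefore to create one StarRocks table for the data before 03-10 and another for the data after it, and to consolidate the data once it has been synchronized into StarRocks. For simplicity, the data before 03-10 is stored in a single table with one partition.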
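
The loading rule described above (extra fields in a file are ignored, but every field of the target table must exist in the file) can be written down as a small check that decides which target table a given partition can be loaded into. This is a sketch of the rule as this article observed it, not of StarRocks internals; missing_in_file and pick_target are hypothetical helper names.

```python
from __future__ import annotations


def missing_in_file(table_columns: list[str], file_columns: list[str]) -> list[str]:
    """Table columns that the Parquet file cannot supply."""
    available = set(file_columns)
    return [c for c in table_columns if c not in available]


def pick_target(file_columns: list[str], candidates: dict[str, list[str]]) -> str | None:
    """Return the widest candidate table fully covered by the file, or None."""
    loadable = [
        (len(cols), name)
        for name, cols in candidates.items()
        if not missing_in_file(cols, file_columns)
    ]
    return max(loadable)[1] if loadable else None
```

A partition whose file lacks even one column of every candidate table returns None, which is the situation that makes a single pre-03-10 table insufficient once older partitions with fewer fields are included.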

  • How to create the StarRocks tables

Single-partition table for the data before 03-10:

drop table  if  exists bda.bda_inv_item_age_dtl0310;
create table if not exists bda.bda_inv_item_age_dtl0310(
    stat_date date,
    entr_date date,
    item_id varchar(10),
    itm_cd varchar(500),
    org_id varchar(5),
    org_cd varchar(500),
    sub_invtr_cd varchar(20),
    bch_nbr varchar(50),
    entr_qty decimal,
    total_qty decimal,
    left_qty decimal,
    alloc_qty decimal,
    stock_age decimal,
    item_cost decimal,
    actual_cost decimal,
    item_business varchar(500),
    pch_big_ctg varchar(500),
    pch_med_ctg varchar(500),
    pch_sml_ctg varchar(500),
    ship_customer_name varchar(500),
    team_bu_name varchar(500),
    real_customer_name varchar(500),
    item_category varchar(500),
    prod_model_id varchar(500),
    prod_model varchar(500),
    job_belong_bu varchar(500),
    item_bu varchar(500),
    pac_cost decimal,
    item_category_bg varchar(500)
)
DUPLICATE KEY(stat_date,entr_date,item_id)
DISTRIBUTED BY HASH(entr_date, item_id) BUCKETS 32;

Dynamically partitioned table for the data after 03-10:

create table if not exists bda.bda_inv_item_age_dtl_part(
    stat_date date,
    entr_date date,
    item_id varchar(10),
    itm_cd varchar(500),
    org_id varchar(5),
    org_cd varchar(500),
    sub_invtr_cd varchar(20),
    bch_nbr varchar(50),
    entr_qty varchar(500),
    total_qty varchar(500),
    left_qty varchar(500),
    alloc_qty varchar(500),
    stock_age varchar(500),
    item_cost varchar(500),
    actual_cost varchar(500),
    item_business varchar(500),
    pch_big_ctg varchar(500),
    pch_med_ctg varchar(500),
    pch_sml_ctg varchar(500),
    ship_customer_name varchar(500),
    team_bu_name varchar(500),
    real_customer_name varchar(500),
    item_category varchar(500),
    prod_model_id varchar(500),
    prod_model varchar(500),
    job_belong_bu varchar(500),
    item_bu varchar(500),
    pac_cost varchar(500),
    item_category_bg varchar(500),
    dept_name varchar(300),
    wdj_creation_date varchar(200),
    created_by varchar(200)
)
DUPLICATE KEY(stat_date,entr_date,item_id)
PARTITION BY RANGE(stat_date)()
DISTRIBUTED BY HASH(entr_date, item_id) BUCKETS 32
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "1",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);

4.2 Broker Load Synchronization Jobs

  • How to create the Broker Load jobs

Reference: FileSystem (Apache Hadoop Main 3.3.4 API)

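Load job for the partitions before 03-10 (the year range is written as a {..} alternation because, in Hadoop glob syntax, a [..] class matches only a single character; the target is the bda_inv_item_age_dtl0310 table created above):

LOAD LABEL bda.bda_inv_item_age_dtl0310_before
(
DATA INFILE("hdfs://10.21.25.161:8020/user/hive/warehouse/bda.db/bda_inv_item_age_dtl/part_dt={2016,2017,2018,2019,2020,2021}-*/*")
INTO TABLE `bda_inv_item_age_dtl0310`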
COLUMNS TERMINATED BY "\036"
FORMAT AS "parquet"
where entr_date is not null 
)
WITH BROKER broker198
(
"hadoop.security.authentication"="kerberos",
"kerberos_principal"="hdfs",
"kerberos_keytab"="/opt/StarRocks/kerberos/hdfs.keytab"
);

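Running the job still failed with a missing-field error. Bisecting through the HDFS partition files one by one showed that, among the partitions before 2022-03-10, some are monthly and some are daily, and their fields differ from partition to partition. For example, here are a partition from 2018 and one from 2020-02: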

[cloud@dp-master001 ~]$ /opt/cloudera/parcels/CDH/lib/parquet/bin/parquet-tools schema -d 000000_0_2018_12_02 >> hive20181202.sql
[cloud@dp-master001 ~]$ cat hive20181202.sql 
message hive_schema {
  optional binary stat_date (UTF8);
  optional binary entr_date (UTF8);
  optional binary item_id (UTF8);
  optional binary itm_cd (UTF8);
  optional binary org_id (UTF8);
  optional binary org_cd (UTF8);
  optional binary sub_invtr_cd (UTF8);
  optional binary bch_nbr (UTF8);
  optional binary entr_qty (UTF8);
  optional binary total_qty (UTF8);
  optional binary left_qty (UTF8);
  optional binary alloc_qty (UTF8);
  optional binary stock_age (UTF8);
  optional binary item_cost (UTF8);
  optional binary actual_cost (UTF8);
  optional binary item_business (UTF8);
}

creator: parquet-mr version 1.5.0-cdh5.15.2 (build ${buildNumber})

file schema: hive_schema
----------------------------------------------------------------------------------------------------
stat_date: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_date: OPTIONAL BINARY O:UTF8 R:0 D:1
item_id: OPTIONAL BINARY O:UTF8 R:0 D:1
itm_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
org_id: OPTIONAL BINARY O:UTF8 R:0 D:1
org_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
sub_invtr_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
bch_nbr: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
total_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
left_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
alloc_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
stock_age: OPTIONAL BINARY O:UTF8 R:0 D:1
item_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
actual_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_business: OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:244224 TS:12749582
----------------------------------------------------------------------------------------------------
stat_date:  BINARY UNCOMPRESSED DO:0 FPO:4 SZ:263/263/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
entr_date:  BINARY UNCOMPRESSED DO:0 FPO:267 SZ:354158/354158/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
item_id:  BINARY UNCOMPRESSED DO:0 FPO:354425 SZ:746753/746753/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
itm_cd:  BINARY UNCOMPRESSED DO:0 FPO:1101178 SZ:1059151/1059151/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
org_id:  BINARY UNCOMPRESSED DO:0 FPO:2160329 SZ:52235/52235/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
org_cd:  BINARY UNCOMPRESSED DO:0 FPO:2212564 SZ:52222/52222/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
sub_invtr_cd:  BINARY UNCOMPRESSED DO:0 FPO:2264786 SZ:279733/279733/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
bch_nbr:  BINARY UNCOMPRESSED DO:0 FPO:2544519 SZ:633935/633935/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
entr_qty:  BINARY UNCOMPRESSED DO:0 FPO:3178454 SZ:690868/690868/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
total_qty:  BINARY UNCOMPRESSED DO:0 FPO:3869322 SZ:100/100/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
left_qty:  BINARY UNCOMPRESSED DO:0 FPO:3869422 SZ:686921/686921/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
alloc_qty:  BINARY UNCOMPRESSED DO:0 FPO:4556343 SZ:747099/747099/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
stock_age:  BINARY UNCOMPRESSED DO:0 FPO:5303442 SZ:345381/345381/1.00 VC:244224 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
item_cost:  BINARY UNCOMPRESSED DO:0 FPO:5648823 SZ:3316772/3316772/1.00 VC:244224 ENC:RLE,PLAIN,BIT_PACKED
actual_cost:  BINARY UNCOMPRESSED DO:0 FPO:8965595 SZ:3783958/3783958/1.00 VC:244224 ENC:RLE,PLAIN,BIT_PACKED
item_business:  BINARY UNCOMPRESSED DO:0 FPO:12749553 SZ:33/33/1.00 VC:244224 ENC:RLE,PLAIN,BIT_PACKED
[cloud@dp-master001 ~]$ sudo hdfs dfs -get warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2018-12-15/000000_0
get: `warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2018-12-15/000000_0': No such file or directory
[cloud@dp-master001 ~]$ sudo hdfs dfs -get warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2020-02-26/000000_0
[cloud@dp-master001 ~]$ mv 000000_0 000000_0_2020_02_26
[cloud@dp-master001 ~]$ /opt/cloudera/parcels/CDH/lib/parquet/bin/parquet-tools schema -d 000000_0_2020_02_26 >> hive20200226.sql
[cloud@dp-master001 ~]$ cat hive20200226.sql 
message hive_schema {
  optional binary stat_date (UTF8);
  optional binary entr_date (UTF8);
  optional binary item_id (UTF8);
  optional binary itm_cd (UTF8);
  optional binary org_id (UTF8);
  optional binary org_cd (UTF8);
  optional binary sub_invtr_cd (UTF8);
  optional binary bch_nbr (UTF8);
  optional binary entr_qty (UTF8);
  optional binary total_qty (UTF8);
  optional binary left_qty (UTF8);
  optional binary alloc_qty (UTF8);
  optional binary stock_age (UTF8);
  optional binary item_cost (UTF8);
  optional binary actual_cost (UTF8);
  optional binary item_business (UTF8);
  optional binary pch_big_ctg (UTF8);
  optional binary pch_med_ctg (UTF8);
  optional binary pch_sml_ctg (UTF8);
}

creator: parquet-mr version 1.5.0-cdh5.15.2 (build ${buildNumber})

file schema: hive_schema
----------------------------------------------------------------------------------------------------
stat_date: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_date: OPTIONAL BINARY O:UTF8 R:0 D:1
item_id: OPTIONAL BINARY O:UTF8 R:0 D:1
itm_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
org_id: OPTIONAL BINARY O:UTF8 R:0 D:1
org_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
sub_invtr_cd: OPTIONAL BINARY O:UTF8 R:0 D:1
bch_nbr: OPTIONAL BINARY O:UTF8 R:0 D:1
entr_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
total_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
left_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
alloc_qty: OPTIONAL BINARY O:UTF8 R:0 D:1
stock_age: OPTIONAL BINARY O:UTF8 R:0 D:1
item_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
actual_cost: OPTIONAL BINARY O:UTF8 R:0 D:1
item_business: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_big_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_med_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1
pch_sml_ctg: OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:242410 TS:9017086
----------------------------------------------------------------------------------------------------
stat_date:  BINARY UNCOMPRESSED DO:0 FPO:4 SZ:263/263/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
entr_date:  BINARY UNCOMPRESSED DO:0 FPO:267 SZ:317433/317433/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
item_id:  BINARY UNCOMPRESSED DO:0 FPO:317700 SZ:562585/562585/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
itm_cd:  BINARY UNCOMPRESSED DO:0 FPO:880285 SZ:900712/900712/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
org_id:  BINARY UNCOMPRESSED DO:0 FPO:1780997 SZ:42974/42974/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
org_cd:  BINARY UNCOMPRESSED DO:0 FPO:1823971 SZ:42965/42965/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
sub_invtr_cd:  BINARY UNCOMPRESSED DO:0 FPO:1866936 SZ:294784/294784/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
bch_nbr:  BINARY UNCOMPRESSED DO:0 FPO:2161720 SZ:437366/437366/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
entr_qty:  BINARY UNCOMPRESSED DO:0 FPO:2599086 SZ:691103/691103/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
total_qty:  BINARY UNCOMPRESSED DO:0 FPO:3290189 SZ:100/100/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
left_qty:  BINARY UNCOMPRESSED DO:0 FPO:3290289 SZ:707584/707584/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
alloc_qty:  BINARY UNCOMPRESSED DO:0 FPO:3997873 SZ:754179/754179/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
stock_age:  BINARY UNCOMPRESSED DO:0 FPO:4752052 SZ:310402/310402/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
item_cost:  BINARY UNCOMPRESSED DO:0 FPO:5062454 SZ:33/33/1.00 VC:242410 ENC:RLE,PLAIN,BIT_PACKED
actual_cost:  BINARY UNCOMPRESSED DO:0 FPO:5062487 SZ:3631076/3631076/1.00 VC:242410 ENC:RLE,PLAIN,BIT_PACKED
item_business:  BINARY UNCOMPRESSED DO:0 FPO:8693563 SZ:67646/67646/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
pch_big_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8761209 SZ:63343/63343/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
pch_med_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8824552 SZ:83151/83151/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
pch_sml_ctg:  BINARY UNCOMPRESSED DO:0 FPO:8907703 SZ:109387/109387/1.00 VC:242410 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
[cloud@dp-master001 ~]$ 

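This makes the data load harder, because the schema of every partition has to be checked. Why did this happen? And once it has happened, how can the data be synchronized quickly?

Load job for the partitions after 03-10 (months 03 through 12 of 2022, written as a {..} alternation because a [..] class in Hadoop glob syntax matches only a single character):

LOAD LABEL bda.bda_inv_item_age_dtl0310_after
(
DATA INFILE("hdfs://10.21.25.161:8020/user/hive/warehouse/bda.db/bda_inv_item_age_dtl/part_dt=2022-{03,04,05,06,07,08,09,10,11,12}-*/*")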
INTO TABLE `bda_inv_item_age_dtl_part`
COLUMNS TERMINATED BY "\036"
FORMAT AS "parquet"
where entr_date is not null 
)
WITH BROKER broker198
(
"hadoop.security.authentication"="kerberos",
"kerberos_principal"="hdfs",
"kerberos_keytab"="/opt/StarRocks/kerberos/hdfs.keytab"
);
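
Both load statements select partitions through a glob in DATA INFILE. Under the Hadoop FileSystem glob rules referenced above, [abc] and [a-b] match exactly one character, while {ab,cd} matches one of several strings, so a range of years or months has to be spelled out as a brace alternation. The sketch below builds such paths; the function names and the shortened base path are illustrative only.

```python
from __future__ import annotations


def brace_glob(values: list[str]) -> str:
    """Build a Hadoop-style alternation such as {2016,2017}; one value needs no braces."""
    if not values:
        raise ValueError("at least one value is required")
    return values[0] if len(values) == 1 else "{" + ",".join(values) + "}"


def partition_glob(base: str, fixed_prefix: str, alternatives: list[str]) -> str:
    """Glob for every file under part_dt=<fixed_prefix><alternative>-* below `base`."""
    return f"{base}/part_dt={fixed_prefix}{brace_glob(alternatives)}-*/*"


# Years 2016..2021, then months 03..12 of 2022 (base path is a placeholder).
years = [str(y) for y in range(2016, 2022)]
months = [f"{m:02d}" for m in range(3, 13)]
before_glob = partition_glob("/warehouse/t", "", years)
after_glob = partition_glob("/warehouse/t", "2022-", months)
```

Generating the pattern rather than typing it by hand also makes it easy to split a load into smaller batches, for example one year per LOAD LABEL.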

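Thoughts on the final solution:

Option 1:

  Modify the StarRocks source code so that, when the target table has more fields than the source files, the extra fields are filled with default values.

Option 2:

  • Use Python to read the schema of each HDFS partition and create temporary StarRocks tables according to those schemas.
  • After every partition has been loaded successfully, merge the temporary tables into the StarRocks target table automatically.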
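
A minimal sketch of how Option 2 might be organized, assuming the column list of each partition has already been collected (for instance from parquet-tools output). Partitions with identical columns are grouped so that each group needs only one temporary table, and the merge step pads the columns a group lacks with NULL. All names here are hypothetical, and type conversion (for example string to DATE or DECIMAL) is deliberately left out.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_schema(partition_columns: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Map each distinct column tuple to the sorted partitions that share it."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for partition, columns in partition_columns.items():
        groups[tuple(columns)].append(partition)
    return {cols: sorted(parts) for cols, parts in groups.items()}


def merge_sql(target: str, target_columns: list[str], staging: str, staging_columns: list[str]) -> str:
    """INSERT ... SELECT that pads columns missing from the staging table with NULL."""
    present = set(staging_columns)
    select_list = ", ".join(c if c in present else f"NULL AS {c}" for c in target_columns)
    return (
        f"INSERT INTO {target} ({', '.join(target_columns)}) "
        f"SELECT {select_list} FROM {staging};"
    )
```

Grouping first keeps the number of temporary tables equal to the number of distinct schemas rather than the number of partitions, which matters when there are thousands of daily partitions but only a handful of schema versions.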

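Option 3:

  • Use DataX to synchronize the Hive table, with one synchronization job configured per partition. The advantage is that differences in table fields need not be considered. However, with many partitions, configuring the jobs is tedious (they can be generated by a script), and for large data volumes the throughput is lower than with Broker Load.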

 
