电商数仓项目----笔记七(数仓DIM层)

news2024/10/1 23:28:38

所谓的维度层其实就是分析数据的角度,维度层保存的表其实是分析数据的角度,比如:

        --性别,年龄,品牌,品类

这层的表主要用于统计分析,因此DIM层的数据存储格式为orc列式存储+snappy压缩(时间短)

orc列式存储的好处:

  1. 查询的时候不需要扫描全部的数据,而只需要读取每次查询涉及的列,这样可以将I/O消耗降低N倍,另外可以保存每一列的统计信息(min、max、sum等),实现部分的谓词下推。
  2. 由于每一列的成员都是同构的,可以针对不同的数据类型使用更高效的数据压缩算法,进一步减小I/O。
  3. 由于每一列的成员的同构性,可以使用更加适合CPU pipeline的编码方式,减小CPU的缓存失效。

维度表的设计

        一个维度就是一张表,从实践的角度来讲,不同的维度就是这张表的字段,可以达到解耦的目的。如果维度特别简单,可以不用创建表,可以在事实表直接使用。

        字段:只要能用来分析的维度,都是字段;

        数据(字段)来源:参考业务数据的表字段:

                -- 主维表:业务数据库主要用于分析维度字段的表;

                -- 相关维表:业务数据库相关用于分析维度字段的表;

        维度字段的确定:

                尽可能生成丰富的维度属性:字段越多越好;

                编码和文字共存(0男/1女);

                计算通用的维度属性;

下面举几个例子:

优惠券维度表 

从主维表和相关维表分析:

        主维表:coupon_info,相关维表:coupon_range,coupon_use,但是coupon_use算是一种行为概念,并不属于状态,状态才是用来做分析的。但是在coupon_info里面也有range相关字段,因此发生了冗余,只需关注coupon_info即可。

coupon_info长这样:

我们这样设计:

DROP TABLE IF EXISTS dim_coupon_full;
CREATE EXTERNAL TABLE dim_coupon_full
(
    `id`               STRING COMMENT '购物券编号',
    `coupon_name`      STRING COMMENT '购物券名称',
    `coupon_type_code` STRING COMMENT '购物券类型编码',
    `coupon_type_name` STRING COMMENT '购物券类型名称',
    `condition_amount` DECIMAL(16, 2) COMMENT '满额数',
    `condition_num`    BIGINT COMMENT '满件数',
    `activity_id`      STRING COMMENT '活动编号',
    `benefit_amount`   DECIMAL(16, 2) COMMENT '减金额',
    `benefit_discount` DECIMAL(16, 2) COMMENT '折扣',
    `benefit_rule`     STRING COMMENT '优惠规则:满元*减*元,满*件打*折',
    `create_time`      STRING COMMENT '创建时间',
    `range_type_code`  STRING COMMENT '优惠范围类型编码',
    `range_type_name`  STRING COMMENT '优惠范围类型名称',
    `limit_num`        BIGINT COMMENT '最多领取次数',
    `taken_count`      BIGINT COMMENT '已领取次数',
    `start_time`       STRING COMMENT '可以领取的开始日期',
    `end_time`         STRING COMMENT '可以领取的结束日期',
    `operate_time`     STRING COMMENT '修改时间',
    `expire_time`      STRING COMMENT '过期时间'
) COMMENT '优惠券维度表'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dim/dim_coupon_full/'
    TBLPROPERTIES ('orc.compress' = 'snappy');

        其中不一样的地方有我们将ODS层原表的coupon_type分解为了coupon_type_code和coupon_type_name。将range_type分解为了range_type_code,range_type_name,并且增加了benefit_rule字段(优惠规则)。这样做符合我们上面说的编码和文字共存规则。

数据装载

        我们的表主要从coupon_info和base_dic(字典表)中取得:

        记住这里的主维表是coupon_info,因此我们先select coupon_info这张表,select里面的字段依照我们建表语句里面的字段先写好,当然其中肯定会有几个字段会报红,没关系我们后面还要join 操作,其中coupon_type_code,coupon_type_name,range_type_code,range_type_name字段是找不到的,因此需要join操作。我们join base_dic字典表:

join base_dic两次分别得到coupon_type_code,coupon_type_name字段和range_type_code,range_type_name字段;

        接下来是benefit_rule字段,这里需要我们自行拼接。拼接逻辑如下:

case coupon_type
        when '3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3202' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3203' then concat('减',benefit_amount,'元')
    end benefit_rule,

 完整是这样:

select
    id,
    coupon_name,
    coupon_type,
    coupon_dic.dic_name,
    condition_amount,
    condition_num,
    activity_id,
    benefit_amount,
    benefit_discount,
    case coupon_type
        when '3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3202' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3203' then concat('减',benefit_amount,'元')
    end benefit_rule,
    create_time,
    range_type,
    range_dic.dic_name,
    limit_num,
    taken_count,
    start_time,
    end_time,
    operate_time,
    expire_time
from
(
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ods_coupon_info_full
    where dt='2020-06-14'
)ci
left join
(
    select
        dic_code,
        dic_name
    from ods_base_dic_full
    where dt='2020-06-14'
    and parent_code='32'
)coupon_dic
on ci.coupon_type=coupon_dic.dic_code
left join
(
    select
        dic_code,
        dic_name
    from ods_base_dic_full
    where dt='2020-06-14'
    and parent_code='33'
)range_dic
on ci.range_type=range_dic.dic_code;

数据装载我们只需要前面加上下面这一句即可:

insert overwrite table dim_coupon_full partition(dt='2020-06-14')

我们的Dim层优惠券维度表就设计完啦。

活动维度表

        同样的,找到主维表和相关维表。

        activity_info ,activity_rule,activity_sku:我们分析的更多的是活动规则,而不是活动本身,所以主维表是activity_rule,相关维表是activity_info。

我们这样设计:

DROP TABLE IF EXISTS dim_activity_full;
CREATE EXTERNAL TABLE dim_activity_full
(
    `activity_rule_id`   STRING COMMENT '活动规则ID',
    `activity_id`        STRING COMMENT '活动ID',
    `activity_name`      STRING COMMENT '活动名称',
    `activity_type_code` STRING COMMENT '活动类型编码',
    `activity_type_name` STRING COMMENT '活动类型名称',
    `activity_desc`      STRING COMMENT '活动描述',
    `start_time`         STRING COMMENT '开始时间',
    `end_time`           STRING COMMENT '结束时间',
    `create_time`        STRING COMMENT '创建时间',
    `condition_amount`   DECIMAL(16, 2) COMMENT '满减金额',
    `condition_num`      BIGINT COMMENT '满减件数',
    `benefit_amount`     DECIMAL(16, 2) COMMENT '优惠金额',
    `benefit_discount`   DECIMAL(16, 2) COMMENT '优惠折扣',
    `benefit_rule`       STRING COMMENT '优惠规则',
    `benefit_level`      STRING COMMENT '优惠级别'
) COMMENT '活动信息表'
    PARTITIONED BY (`dt` STRING)
    STORED AS ORC
    LOCATION '/warehouse/gmall/dim/dim_activity_full/'
    TBLPROPERTIES ('orc.compress' = 'snappy');

数据装载:

insert overwrite table dim_activity_full partition(dt='2020-06-14')
select
    `activity_rule_id`   ,--STRING COMMENT '活动规则ID',
    `activity_id`        ,--STRING COMMENT '活动ID',
    `activity_name`      ,--STRING COMMENT '活动名称',
    `activity_type_code` ,--STRING COMMENT '活动类型编码',
    `activity_type_name` ,--STRING COMMENT '活动类型名称',
    `activity_desc`      ,--STRING COMMENT '活动描述',
    `start_time`         ,--STRING COMMENT '开始时间',
    `end_time`           ,--STRING COMMENT '结束时间',
    `create_time`        ,--STRING COMMENT '创建时间',
    `condition_amount`   ,--DECIMAL(16, 2) COMMENT '满减金额',
    `condition_num`      ,--BIGINT COMMENT '满减件数',
    `benefit_amount`     ,--DECIMAL(16, 2) COMMENT '优惠金额',
    `benefit_discount`   ,--DECIMAL(16, 2) COMMENT '优惠折扣',
    `benefit_rule`       ,--STRING COMMENT '优惠规则',
    `benefit_level`      --STRING COMMENT '优惠级别'
from
    (select
         id `activity_rule_id`   ,--STRING COMMENT '活动规则ID',
        `activity_id`        ,--STRING COMMENT '活动ID',
        --`activity_name`      ,--STRING COMMENT '活动名称',
        activity_type `activity_type_code` ,--STRING COMMENT '活动类型编码',
        --`activity_type_name` ,--STRING COMMENT '活动类型名称',
        --`activity_desc`      ,--STRING COMMENT '活动描述',
        --`start_time`         ,--STRING COMMENT '开始时间',
        --`end_time`           ,--STRING COMMENT '结束时间',
         dt create_time                   ,--STRING COMMENT '创建时间',
        `condition_amount`   ,--DECIMAL(16, 2) COMMENT '满减金额',
        `condition_num`      ,--BIGINT COMMENT '满减件数',
        `benefit_amount`     ,--DECIMAL(16, 2) COMMENT '优惠金额',
        `benefit_discount`   ,--DECIMAL(16, 2) COMMENT '优惠折扣',
        case activity_type
            when '3101' then concat('满',condition_amount,'元减',benefit_amount,'元')
            when '3102' then concat('满',condition_num,'件打',benefit_discount,'折')
            when '3103' then concat('打',benefit_discount,'折')
        end `benefit_rule`       ,--STRING COMMENT '优惠规则',
        `benefit_level`      --STRING COMMENT '优惠级别'
    from ods_activity_rule_full
    where dt='2020-06-14')rule
left join
    (
    select
        id,
        activity_name,
        activity_desc,
        start_time,
        end_time
    from ods_activity_info_full
    where dt='2020-06-14') info
on rule.activity_id=info.id
left join (
    select
        dic_code,
        dic_name activity_type_name
    from ods_base_dic_full
    where dt='2020-06-14' and parent_code='31'
        )dic on rule.activity_type_code=dic.dic_code

        整体思路就是先将create表中的字段复制到select 主维表的语句中,爆红的字段我们一一给他们join出来,或在join的那张表中给他们查询出来,这里就不详细分析了。

日期维度表

建表语句

DROP TABLE IF EXISTS dim_date;
CREATE EXTERNAL TABLE dim_date
(
    `date_id`    STRING COMMENT '日期ID',
    `week_id`    STRING COMMENT '周ID,一年中的第几周',
    `week_day`   STRING COMMENT '周几',
    `day`        STRING COMMENT '每月的第几天',
    `month`      STRING COMMENT '一年中的第几月',
    `quarter`    STRING COMMENT '一年中的第几季度',
    `year`       STRING COMMENT '年份',
    `is_workday` STRING COMMENT '是否是工作日',
    `holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
    STORED AS ORC
    LOCATION '/warehouse/gmall/dim/dim_date/'
    TBLPROPERTIES ('orc.compress' = 'snappy');

数据装载

        通常情况下,时间维度表的数据并不是来自于业务系统,而是手动写入,并且由于时间维度表数据的可预见性,无须每日导入,一般可一次性导入一年的数据。

(1)创建临时表

DROP TABLE IF EXISTS tmp_dim_date_info;
CREATE EXTERNAL TABLE tmp_dim_date_info (
    `date_id` STRING COMMENT '日',
    `week_id` STRING COMMENT '周ID',
    `week_day` STRING COMMENT '周几',
    `day` STRING COMMENT '每月的第几天',
    `month` STRING COMMENT '第几月',
    `quarter` STRING COMMENT '第几季度',
    `year` STRING COMMENT '年',
    `is_workday` STRING COMMENT '是否是工作日',
    `holiday_id` STRING COMMENT '节假日'
) COMMENT '时间维度表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';

将数据文件上传到HFDS上临时表路径/warehouse/gmall/tmp/tmp_dim_date_info 

(3)执行以下语句将其导入时间维度表

insert overwrite table dim_date select * from tmp_dim_date_info;

 

用户维度表

        用户维度表这里我们使用拉链表,来记录用户姓名的变更或者用户的增加减少。

(1)数据装载过程

(2)数据流向 

首日装载 

我们的用户数据在进行首日装载和后续的变更都是insert overwrite到9999-12-31的分区,首日装载如下:

insert overwrite table dim_user_zip partition (dt='9999-12-31')
select
    data.id,
    data.login_name,
    data.nick_name,
    md5(data.name),
    md5(data.phone_num),
    md5(data.email),
    data.user_level,
    data.birthday,
    data.gender,
    data.create_time,
    data.operate_time,
    '2020-06-14' start_date,
    '9999-12-31' end_date
from ods_user_info_inc
where dt='2020-06-14'
and type='bootstrap-insert';

每日装载

        装载思路:

         装载语句:

with
tmp as
(
    select
        old.id old_id,
        old.login_name old_login_name,
        old.nick_name old_nick_name,
        old.name old_name,
        old.phone_num old_phone_num,
        old.email old_email,
        old.user_level old_user_level,
        old.birthday old_birthday,
        old.gender old_gender,
        old.create_time old_create_time,
        old.operate_time old_operate_time,
        old.start_date old_start_date,
        old.end_date old_end_date,
        new.id new_id,
        new.login_name new_login_name,
        new.nick_name new_nick_name,
        new.name new_name,
        new.phone_num new_phone_num,
        new.email new_email,
        new.user_level new_user_level,
        new.birthday new_birthday,
        new.gender new_gender,
        new.create_time new_create_time,
        new.operate_time new_operate_time,
        new.start_date new_start_date,
        new.end_date new_end_date
    from
    (
        select
            id,
            login_name,
            nick_name,
            name,
            phone_num,
            email,
            user_level,
            birthday,
            gender,
            create_time,
            operate_time,
            start_date,
            end_date
        from dim_user_zip
        where dt='9999-12-31'
    )old
    full outer join
    (
        select
            id,
            login_name,
            nick_name,
            md5(name) name,
            md5(phone_num) phone_num,
            md5(email) email,
            user_level,
            birthday,
            gender,
            create_time,
            operate_time,
            '2020-06-15' start_date,
            '9999-12-31' end_date
        from
        (
            select
                data.id,
                data.login_name,
                data.nick_name,
                data.name,
                data.phone_num,
                data.email,
                data.user_level,
                data.birthday,
                data.gender,
                data.create_time,
                data.operate_time,
                row_number() over (partition by data.id order by ts desc) rn
            from ods_user_info_inc
            where dt='2020-06-15'
        )t1
        where rn=1
    )new
    on old.id=new.id
)
insert overwrite table dim_user_zip partition(dt)
select
    if(new_id is not null,new_id,old_id),
    if(new_id is not null,new_login_name,old_login_name),
    if(new_id is not null,new_nick_name,old_nick_name),
    if(new_id is not null,new_name,old_name),
    if(new_id is not null,new_phone_num,old_phone_num),
    if(new_id is not null,new_email,old_email),
    if(new_id is not null,new_user_level,old_user_level),
    if(new_id is not null,new_birthday,old_birthday),
    if(new_id is not null,new_gender,old_gender),
    if(new_id is not null,new_create_time,old_create_time),
    if(new_id is not null,new_operate_time,old_operate_time),
    if(new_id is not null,new_start_date,old_start_date),
    if(new_id is not null,new_end_date,old_end_date),
    if(new_id is not null,new_end_date,old_end_date) dt
from tmp
union all
select
    old_id,
    old_login_name,
    old_nick_name,
    old_name,
    old_phone_num,
    old_email,
    old_user_level,
    old_birthday,
    old_gender,
    old_create_time,
    old_operate_time,
    old_start_date,
    cast(date_add('2020-06-15',-1) as string) old_end_date,
    cast(date_add('2020-06-15',-1) as string) dt
from tmp
where old_id is not null
and new_id is not null;

 

数据装载脚本

首日装载脚本

#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
   do_date=$2
else 
   echo "请传入日期参数"
   exit
fi 

dim_user_zip="
insert overwrite table ${APP}.dim_user_zip partition (dt='9999-12-31')
select
    data.id,
    data.login_name,
    data.nick_name,
    md5(data.name),
    md5(data.phone_num),
    md5(data.email),
    data.user_level,
    data.birthday,
    data.gender,
    data.create_time,
    data.operate_time,
    '$do_date' start_date,
    '9999-12-31' end_date
from ${APP}.ods_user_info_inc
where dt='$do_date'
and type='bootstrap-insert';
"

dim_sku_full="
with
sku as
(
    select
        id,
        price,
        sku_name,
        sku_desc,
        weight,
        is_sale,
        spu_id,
        category3_id,
        tm_id,
        create_time
    from ${APP}.ods_sku_info_full
    where dt='$do_date'
),
spu as
(
    select
        id,
        spu_name
    from ${APP}.ods_spu_info_full
    where dt='$do_date'
),
c3 as
(
    select
        id,
        name,
        category2_id
    from ${APP}.ods_base_category3_full
    where dt='$do_date'
),
c2 as
(
    select
        id,
        name,
        category1_id
    from ${APP}.ods_base_category2_full
    where dt='$do_date'
),
c1 as
(
    select
        id,
        name
    from ${APP}.ods_base_category1_full
    where dt='$do_date'
),
tm as
(
    select
        id,
        tm_name
    from ${APP}.ods_base_trademark_full
    where dt='$do_date'
),
attr as
(
    select
        sku_id,
        collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value_full
    where dt='$do_date'
    group by sku_id
),
sale_attr as
(
    select
        sku_id,
        collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value_full
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_full partition(dt='$do_date')
select
    sku.id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.is_sale,
    sku.spu_id,
    spu.spu_name,
    sku.category3_id,
    c3.name,
    c3.category2_id,
    c2.name,
    c2.category1_id,
    c1.name,
    sku.tm_id,
    tm.tm_name,
    attr.attrs,
    sale_attr.sale_attrs,
    sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"

dim_province_full="
insert overwrite table ${APP}.dim_province_full partition(dt='$do_date')
select
    province.id,
    province.name,
    province.area_code,
    province.iso_code,
    province.iso_3166_2,
    region_id,
    region_name
from
(
    select
        id,
        name,
        region_id,
        area_code,
        iso_code,
        iso_3166_2
    from ${APP}.ods_base_province_full
    where dt='$do_date'
)province
left join
(
    select
        id,
        region_name
    from ${APP}.ods_base_region_full
    where dt='$do_date'
)region
on province.region_id=region.id;
"

dim_coupon_full="
insert overwrite table ${APP}.dim_coupon_full partition(dt='$do_date')
select
    id,
    coupon_name,
    coupon_type,
    coupon_dic.dic_name,
    condition_amount,
    condition_num,
    activity_id,
    benefit_amount,
    benefit_discount,
    case coupon_type
        when '3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3202' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3203' then concat('减',benefit_amount,'元')
    end benefit_rule,
    create_time,
    range_type,
    range_dic.dic_name,
    limit_num,
    taken_count,
    start_time,
    end_time,
    operate_time,
    expire_time
from
(
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ${APP}.ods_coupon_info_full
    where dt='$do_date'
)ci
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='32'
)coupon_dic
on ci.coupon_type=coupon_dic.dic_code
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='33'
)range_dic
on ci.range_type=range_dic.dic_code;
"

dim_activity_full="
insert overwrite table ${APP}.dim_activity_full partition(dt='$do_date')
select
    rule.id,
    info.id,
    activity_name,
    rule.activity_type,
    dic.dic_name,
    activity_desc,
    start_time,
    end_time,
    create_time,
    condition_amount,
    condition_num,
    benefit_amount,
    benefit_discount,
    case rule.activity_type
        when '3101' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3102' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3103' then concat('打',10*(1-benefit_discount),'折')
    end benefit_rule,
    benefit_level
from
(
    select
        id,
        activity_id,
        activity_type,
        condition_amount,
        condition_num,
        benefit_amount,
        benefit_discount,
        benefit_level
    from ${APP}.ods_activity_rule_full
    where dt='$do_date'
)rule
left join
(
    select
        id,
        activity_name,
        activity_type,
        activity_desc,
        start_time,
        end_time,
        create_time
    from ${APP}.ods_activity_info_full
    where dt='$do_date'
)info
on rule.activity_id=info.id
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='31'
)dic
on rule.activity_type=dic.dic_code;
"

case $1 in
"dim_user_zip")
    hive -e "$dim_user_zip"
;;
"dim_sku_full")
    hive -e "$dim_sku_full"
;;
"dim_province_full")
    hive -e "$dim_province_full"
;;
"dim_coupon_full")
    hive -e "$dim_coupon_full"
;;
"dim_activity_full")
    hive -e "$dim_activity_full"
;;
"all")
    hive -e "$dim_user_zip$dim_sku_full$dim_province_full$dim_coupon_full$dim_activity_full"
;;
esac

每日装载脚本

#!/bin/bash

APP=gmall

# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$2" ] ;then
    do_date=$2
else 
    do_date=`date -d "-1 day" +%F`
fi

dim_user_zip="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp as
(
    select
        old.id old_id,
        old.login_name old_login_name,
        old.nick_name old_nick_name,
        old.name old_name,
        old.phone_num old_phone_num,
        old.email old_email,
        old.user_level old_user_level,
        old.birthday old_birthday,
        old.gender old_gender,
        old.create_time old_create_time,
        old.operate_time old_operate_time,
        old.start_date old_start_date,
        old.end_date old_end_date,
        new.id new_id,
        new.login_name new_login_name,
        new.nick_name new_nick_name,
        new.name new_name,
        new.phone_num new_phone_num,
        new.email new_email,
        new.user_level new_user_level,
        new.birthday new_birthday,
        new.gender new_gender,
        new.create_time new_create_time,
        new.operate_time new_operate_time,
        new.start_date new_start_date,
        new.end_date new_end_date
    from
    (
        select
            id,
            login_name,
            nick_name,
            name,
            phone_num,
            email,
            user_level,
            birthday,
            gender,
            create_time,
            operate_time,
            start_date,
            end_date
        from ${APP}.dim_user_zip
        where dt='9999-12-31'
    )old
    full outer join
    (
        select
            id,
            login_name,
            nick_name,
            md5(name) name,
            md5(phone_num) phone_num,
            md5(email) email,
            user_level,
            birthday,
            gender,
            create_time,
            operate_time,
            '$do_date' start_date,
            '9999-12-31' end_date
        from
        (
            select
                data.id,
                data.login_name,
                data.nick_name,
                data.name,
                data.phone_num,
                data.email,
                data.user_level,
                data.birthday,
                data.gender,
                data.create_time,
                data.operate_time,
                row_number() over (partition by data.id order by ts desc) rn
            from ${APP}.ods_user_info_inc
            where dt='$do_date'
        )t1
        where rn=1
    )new
    on old.id=new.id
)
insert overwrite table ${APP}.dim_user_zip partition(dt)
select
    if(new_id is not null,new_id,old_id),
    if(new_id is not null,new_login_name,old_login_name),
    if(new_id is not null,new_nick_name,old_nick_name),
    if(new_id is not null,new_name,old_name),
    if(new_id is not null,new_phone_num,old_phone_num),
    if(new_id is not null,new_email,old_email),
    if(new_id is not null,new_user_level,old_user_level),
    if(new_id is not null,new_birthday,old_birthday),
    if(new_id is not null,new_gender,old_gender),
    if(new_id is not null,new_create_time,old_create_time),
    if(new_id is not null,new_operate_time,old_operate_time),
    if(new_id is not null,new_start_date,old_start_date),
    if(new_id is not null,new_end_date,old_end_date),
    if(new_id is not null,new_end_date,old_end_date) dt
from tmp
union all
select
    old_id,
    old_login_name,
    old_nick_name,
    old_name,
    old_phone_num,
    old_email,
    old_user_level,
    old_birthday,
    old_gender,
    old_create_time,
    old_operate_time,
    old_start_date,
    cast(date_add('$do_date',-1) as string) old_end_date,
    cast(date_add('$do_date',-1) as string) dt
from tmp
where old_id is not null
and new_id is not null;
"

dim_sku_full="
with
sku as
(
    select
        id,
        price,
        sku_name,
        sku_desc,
        weight,
        is_sale,
        spu_id,
        category3_id,
        tm_id,
        create_time
    from ${APP}.ods_sku_info_full
    where dt='$do_date'
),
spu as
(
    select
        id,
        spu_name
    from ${APP}.ods_spu_info_full
    where dt='$do_date'
),
c3 as
(
    select
        id,
        name,
        category2_id
    from ${APP}.ods_base_category3_full
    where dt='$do_date'
),
c2 as
(
    select
        id,
        name,
        category1_id
    from ${APP}.ods_base_category2_full
    where dt='$do_date'
),
c1 as
(
    select
        id,
        name
    from ${APP}.ods_base_category1_full
    where dt='$do_date'
),
tm as
(
    select
        id,
        tm_name
    from ${APP}.ods_base_trademark_full
    where dt='$do_date'
),
attr as
(
    select
        sku_id,
        collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
    from ${APP}.ods_sku_attr_value_full
    where dt='$do_date'
    group by sku_id
),
sale_attr as
(
    select
        sku_id,
        collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
    from ${APP}.ods_sku_sale_attr_value_full
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dim_sku_full partition(dt='$do_date')
select
    sku.id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.is_sale,
    sku.spu_id,
    spu.spu_name,
    sku.category3_id,
    c3.name,
    c3.category2_id,
    c2.name,
    c2.category1_id,
    c1.name,
    sku.tm_id,
    tm.tm_name,
    attr.attrs,
    sale_attr.sale_attrs,
    sku.create_time
from sku
left join spu on sku.spu_id=spu.id
left join c3 on sku.category3_id=c3.id
left join c2 on c3.category2_id=c2.id
left join c1 on c2.category1_id=c1.id
left join tm on sku.tm_id=tm.id
left join attr on sku.id=attr.sku_id
left join sale_attr on sku.id=sale_attr.sku_id;
"

dim_province_full="
insert overwrite table ${APP}.dim_province_full partition(dt='$do_date')
select
    province.id,
    province.name,
    province.area_code,
    province.iso_code,
    province.iso_3166_2,
    region_id,
    region_name
from
(
    select
        id,
        name,
        region_id,
        area_code,
        iso_code,
        iso_3166_2
    from ${APP}.ods_base_province_full
    where dt='$do_date'
)province
left join
(
    select
        id,
        region_name
    from ${APP}.ods_base_region_full
    where dt='$do_date'
)region
on province.region_id=region.id;
"

dim_coupon_full="
insert overwrite table ${APP}.dim_coupon_full partition(dt='$do_date')
select
    id,
    coupon_name,
    coupon_type,
    coupon_dic.dic_name,
    condition_amount,
    condition_num,
    activity_id,
    benefit_amount,
    benefit_discount,
    case coupon_type
        when '3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3202' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3203' then concat('减',benefit_amount,'元')
    end benefit_rule,
    create_time,
    range_type,
    range_dic.dic_name,
    limit_num,
    taken_count,
    start_time,
    end_time,
    operate_time,
    expire_time
from
(
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ${APP}.ods_coupon_info_full
    where dt='$do_date'
)ci
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='32'
)coupon_dic
on ci.coupon_type=coupon_dic.dic_code
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='33'
)range_dic
on ci.range_type=range_dic.dic_code;
"

dim_activity_full="
insert overwrite table ${APP}.dim_activity_full partition(dt='$do_date')
select
    rule.id,
    info.id,
    activity_name,
    rule.activity_type,
    dic.dic_name,
    activity_desc,
    start_time,
    end_time,
    create_time,
    condition_amount,
    condition_num,
    benefit_amount,
    benefit_discount,
    case rule.activity_type
        when '3101' then concat('满',condition_amount,'元减',benefit_amount,'元')
        when '3102' then concat('满',condition_num,'件打',10*(1-benefit_discount),'折')
        when '3103' then concat('打',10*(1-benefit_discount),'折')
    end benefit_rule,
    benefit_level
from
(
    select
        id,
        activity_id,
        activity_type,
        condition_amount,
        condition_num,
        benefit_amount,
        benefit_discount,
        benefit_level
    from ${APP}.ods_activity_rule_full
    where dt='$do_date'
)rule
left join
(
    select
        id,
        activity_name,
        activity_type,
        activity_desc,
        start_time,
        end_time,
        create_time
    from ${APP}.ods_activity_info_full
    where dt='$do_date'
)info
on rule.activity_id=info.id
left join
(
    select
        dic_code,
        dic_name
    from ${APP}.ods_base_dic_full
    where dt='$do_date'
    and parent_code='31'
)dic
on rule.activity_type=dic.dic_code;
"

case $1 in
"dim_user_zip")
    hive -e "$dim_user_zip"
;;
"dim_sku_full")
    hive -e "$dim_sku_full"
;;
"dim_province_full")
    hive -e "$dim_province_full"
;;
"dim_coupon_full")
    hive -e "$dim_coupon_full"
;;
"dim_activity_full")
    hive -e "$dim_activity_full"
;;
"all")
    hive -e "$dim_user_zip$dim_sku_full$dim_province_full$dim_coupon_full$dim_activity_full"
;;
esac

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1332716.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

细粒度语义对齐

抽象 大规模的视觉语言预训练在广泛的下游任务中显示出令人印象深刻的进展。现有方法主要通过图像和文本的全局表示的相似性或对图像和文本特征的高级跨模态关注来模拟跨模态对齐。然而,他们未能明确学习视觉区域和文本短语之间的细粒度语义对齐,因为只有…

Spring系列学习一、Spring框架的概论

Spring框架的概论 一、 Spring框架的起源与历史二、 Spring框架的核心理念与特点三、 Spring与其他框架的对比1、首先介绍下Spring与其平替的EJB的对比:2、接下来介绍下Spring与基于Java EE原生技术的对比3、Spring与Hibernate的对比4、Spring与Struts的对比 四、Sp…

docker的基本管理和相关概念

docker的基本管理和概念 docker:开源的应用容器引擎。基于go语言开发的。运行在linux系统当中的开源的,轻量级的“虚拟机” docker的容器技术可以在一台主机上轻松的为任何应用创建一个轻量级的,可移植的,自给自足的容器 docke…

计算机网络——数据链路层(三)

前言: 前面我们已经对计算机网络的物理层有了一个大概的了解,今天我们学习的是物理层服务的上一层数据链路层,位于物理层和网络层之间。数据链路层在物理层提供的服务的基础上向网络层提供服务,其最基本的服务是将源自物理层来的数据可靠地传…

RabbmitMQ基础

RabbmitMQ基础 1.1 什么是MQ MQ(Message Queue),从字面意思看,本质是个队列,FIFO先入先出,队列中存放的是message。是一种跨进程的通信机制,用于上下游传递消息。在互联网架构中,MQ是一种非常常见的上下游…

DDS深度揭秘:数字音乐的心灵密码

嗨,音乐奇迹的探索者们!今天,我们将揭开音乐数字化的神秘面纱,深入探讨Direct Digital Synthesis(DDS)这项数字音乐技术。别担心,我们将用通俗易懂的语言为你解析DDS是什么、它是如何工作的&…

零基础学人工智能:TensorFlow 入门例子

识别手写图片 因为这个例子是 TensorFlow 官方的例子,不会说的太详细,会加入了一点个人的理解,因为TensorFlow提供了各种工具和库,帮助开发人员构建和训练基于神经网络的模型。TensorFlow 中最重要的概念是张量(Tenso…

【论文解读】3D视觉标定的显式文本解耦和密集对齐(CVPR 2023)

来源:投稿 作者:橡皮 编辑:学姐 论文链接:https://arxiv.org/abs/2209.14941 开源代码:https://github.com/yanmin-wu/EDA 图1所示。文本解耦,密集对齐的3D视觉标定。文本中的不同颜色对应不同的解耦分量。…

Pinely Round 3 (Div. 1 + Div. 2) E. Multiple Lamps(思维题 暴力 补写法)

题目 n(n<2e5)盏灯&#xff0c;编号1到n&#xff0c;一开始全是熄灭的 n个开关&#xff0c;第i个控制着所有i的倍数&#xff0c;按下i的时候&#xff0c;i、2i、...这些灯的状态会被翻转&#xff0c; m(m<2e5)个限制&#xff0c;第j条限制形如uj,vj&#xff0c;表示uj…

智能优化算法应用:基于向量加权平均算法3D无线传感器网络(WSN)覆盖优化 - 附代码

智能优化算法应用&#xff1a;基于向量加权平均算法3D无线传感器网络(WSN)覆盖优化 - 附代码 文章目录 智能优化算法应用&#xff1a;基于向量加权平均算法3D无线传感器网络(WSN)覆盖优化 - 附代码1.无线传感网络节点模型2.覆盖数学模型及分析3.向量加权平均算法4.实验参数设定…

springboot整合JPA 多表关联 :一对多 多对多

补充一下自定义SQL 这是连表查询&#xff0c;可以任意查出字符&#xff0c;用Map接收 Testvoid test3() {JPAQueryFactory jpaQueryFactory new JPAQueryFactory(em);QStudent student QStudent.student;QMessage message QMessage.message;//constructor(StuMesDto.class, …

【pynput】鼠标行为追踪并模拟

文章目录 前言基本思路安装依赖包实时鼠标捕获捕获鼠标位置捕获鼠标事件记录点击内容 效果图 利用本文内容从事的任何犯法行为和开发与本人无关&#xff0c;请理性利用技术服务大家&#xff0c;创建美好和谐的社会&#xff0c;让人们生活从繁琐中变得更加具有创造性&#xff01…

7. 结构型模式 - 代理模式

亦称&#xff1a; Proxy 意图 代理模式是一种结构型设计模式&#xff0c; 让你能够提供对象的替代品或其占位符。 代理控制着对于原对象的访问&#xff0c; 并允许在将请求提交给对象前后进行一些处理。 问题 为什么要控制对于某个对象的访问呢&#xff1f; 举个例子&#xff…

Oracle研学-查询

学自B站黑马程序员 1.单表查询 //查询水表编号为 30408 的业主记录 select * from T_OWNERS where watermeter30408 //查询业主名称包含“刘”的业主记录 select * from t_owners where name like %刘% //查询业主名称包含“刘”的并且门牌号包含 5 的业主记录 select * from…

Redis-运维

转自 极客时间 Redis 亚风 原文视频&#xff1a;https://u.geekbang.org/lesson/535?article681062 Redis 同步 Redis主从数据同步,主从第⼀次同步是全量同步 replicaof 主机 端口 #当前这个机器做Master的备份master如何判断slave是不是第⼀次来同步数据&#xff1a; Repl…

linux循环调度执行

9.2 循环调度执行 9.2.1 简介 cron的概念和crontab是不可分割的。 ​ crontab是一个命令&#xff0c;常见于Unix和Linux的操作系统之中用于设置周期性被执行的指令。 ​ 该命令从标准输入设备读取指令&#xff0c;并将其存放于“crontab”文件中&#xff0c;以供之后读取和执…

基于AT89C51单片机的8位密码锁仿真与实物制作

点击链接获取Keil源码与Project Backups仿真图&#xff1a; https://download.csdn.net/download/qq_64505944/88657969?spm1001.2014.3001.5503 源码获取 C 源码仿真图毕业设计实物制作步骤01 摘要 在日常的生活和工作中, 住宅与部门的安全防范、单位的文件档案、财务报表…

在使用 npm install的时候提示node-sass command faile 解决方案

在使用npm install的时候错误提示node-sass 相关的。错误信息如下图&#xff1a; 解决方法&#xff08;PS&#xff1a;凯哥的不适用&#xff09; 出现这种问题基本是由于node版本与sass版本不匹配导致的 方案1&#xff1a;卸载node&#xff0c;安装对应版本 方案2&#xff1…

关键字:protected关键字

在 Java 中&#xff0c;protected 是一个访问修饰符&#xff0c;用于修饰类成员&#xff08;成员变量、成员方法和构造方法&#xff09;。当一个类成员被声明为 protected 时&#xff0c;它可以在同一包中的其他类以及子类中被访问。 以下是 protected 关键字的解析&#xff1a…

5. 创建型模式 - 单例模式

亦称&#xff1a; 单件模式、Singleton 意图 单例模式是一种创建型设计模式&#xff0c; 让你能够保证一个类只有一个实例&#xff0c; 并提供一个访问该实例的全局节点。 问题 单例模式同时解决了两个问题&#xff0c; 所以违反了单一职责原则&#xff1a; 保证一个类只有一…