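Using Sqoop
- 1. Importing data
- 2. Importing data from MySQL into Hive
- 2.1 Importing the user information table
- 2.2 Importing the order table
- 2.3 Importing the goods information table
- 2.4 Importing the country information table
- 2.5 Importing the province information table
- 2.6 Importing the city information table
- 2.7 Creating the Hive temporary table file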
Before using Sqoop, start Hadoop, YARN, and the source MySQL database.
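One way to confirm these prerequisites before running an import is to look for the expected daemons in the output of jps. The sketch below assumes a typical single-node setup: NameNode, DataNode, ResourceManager, and NodeManager are the standard Hadoop and YARN process names, while the helper missing_daemons is hypothetical. MySQL is not a Java process, so it has to be checked separately.

```shell
#!/bin/sh
# Hypothetical helper: print each required daemon that is absent
# from the given `jps` output (one "<pid> <name>" entry per line).
missing_daemons() {
  jps_output="$1"
  for daemon in NameNode DataNode ResourceManager NodeManager; do
    # Compare the whole second field, so "SecondaryNameNode"
    # is not mistaken for "NameNode"
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      printf '%s\n' "$daemon"
    fi
  done
}

# Usage on the cluster node (jps ships with the JDK):
#   missing_daemons "$(jps)"
```

An empty result means all four daemons were found; any printed name is a service that still needs to be started.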
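1. Importing data
In Sqoop, "import" means transferring data from a system outside the big data cluster (a relational database) into the cluster (HDFS, Hive). It is done with the import keyword.
2. Importing data from MySQL into Hive
2.1 Importing the user information table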
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table t_user_info \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.ods_user_info
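The \ at the end of each line is a line-continuation character. The command can be written on a single line, but splitting the arguments across lines is easier to read. --num-mappers sets the number of map tasks: this table is small, so 1 is enough, and the value can be raised when the data volume is large. --fields-terminated-by sets the field delimiter.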
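As a hedged illustration of the --num-mappers remark: when more than one mapper is requested, Sqoop needs a column to split the rows on, which it takes from the table's primary key or from --split-by. The sketch below only prints the command it would run. The helper build_import and the choice of user_id as the split column are assumptions (user_id appears in the Hive queries later in this document, but the MySQL schema is not shown).

```shell
#!/bin/sh
# Hypothetical helper: print a sqoop import command for one table.
# $1 = MySQL table, $2 = Hive table, $3 = mapper count, $4 = split column (optional)
build_import() {
  cmd="sqoop import --connect jdbc:mysql://bigdata03:3306/mall"
  cmd="$cmd --username root --password 111111"
  cmd="$cmd --table $1 --num-mappers $3"
  # With several mappers, name the column used to partition the rows
  if [ "$3" -gt 1 ] && [ -n "${4:-}" ]; then
    cmd="$cmd --split-by $4"
  fi
  cmd="$cmd --hive-import --fields-terminated-by ',' --hive-overwrite --hive-table $2"
  printf '%s\n' "$cmd"
}

# Small table: one mapper, as in the command above
build_import t_user_info mall_bigdata.ods_user_info 1
# Larger table: four mappers splitting on user_id (assumed column)
build_import t_user_info mall_bigdata.ods_user_info 4 user_id
```

Printing the command first makes it easy to review the flags before pasting the line into the shell.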
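Note: two terminal tabs are open on the bigdata03 virtual machine, one for entering shell commands and one for the Hive command-line interface, where the data is queried.
The Hive import is complete.
Check the imported user information table data: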
select * from mall_bigdata.ods_user_info;
2.2 Importing the order table
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table t_sale_order \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.ods_sale_info
Check the imported order table data:
select * from mall_bigdata.ods_sale_info;
2.3 Importing the goods information table
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table dim_goods_info \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.dim_goods_info
select * from mall_bigdata.dim_goods_info;
2.4 Importing the country information table
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table dim_country_info \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.dim_country_info
select * from mall_bigdata.dim_country_info;
2.5 Importing the province information table
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table dim_province_info \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.dim_province_info
select * from mall_bigdata.dim_province_info;
2.6 Importing the city information table
sqoop import \
--connect jdbc:mysql://bigdata03:3306/mall \
--username root \
--password 111111 \
--table dim_city_info \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "," \
--hive-overwrite \
--hive-table mall_bigdata.dim_city_info
select * from mall_bigdata.dim_city_info;
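The six imports above differ only in the source table and the target Hive table, so they can be driven from a single list. The following is a minimal sketch rather than the author's script: the function name import_all and the DRY_RUN switch are hypothetical, and with DRY_RUN left at its default of 1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Source MySQL table and target Hive table for sections 2.1 to 2.6
TABLE_PAIRS="t_user_info:ods_user_info
t_sale_order:ods_sale_info
dim_goods_info:dim_goods_info
dim_country_info:dim_country_info
dim_province_info:dim_province_info
dim_city_info:dim_city_info"

DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to really run sqoop

import_all() {
  for pair in $TABLE_PAIRS; do
    src="${pair%%:*}"   # text before the colon
    dst="${pair##*:}"   # text after the colon
    set -- sqoop import \
      --connect jdbc:mysql://bigdata03:3306/mall \
      --username root --password 111111 \
      --table "$src" --num-mappers 1 \
      --hive-import --fields-terminated-by "," --hive-overwrite \
      --hive-table "mall_bigdata.$dst"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      "$@" || return 1   # stop at the first failed import
    fi
  done
}

import_all
```

Keeping the table pairs in one place means a new dimension table only needs one extra line in TABLE_PAIRS.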
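2.7 Creating the Hive temporary table file
Create the Hive temporary table script tmp_dwd_user_info.sql and upload it to /opt/file:
-- Switch to the Hive database
use mall_bigdata;
-- Fill in the country, province, and city names for the user information table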
create table if not exists mall_bigdata.tmp_dwd_user_info
as
select
user_id
,user_name
,sex
,age
,country_name
,province_name
,city_name
from
(select
user_id
,user_name
,sex
,age
,country_code
,province_code
,city_code
from ods_user_info
) a
left join
(select
country_code
,country_name
from dim_country_info
) b
on a.country_code=b.country_code
left join
(select
province_code
,province_name
,country_code
from dim_province_info
) c
on a.province_code=c.province_code and a.country_code=c.country_code
left join
(select
city_code
,city_name
,province_code
from dim_city_info
) d
on a.city_code=d.city_code and a.province_code=d.province_code;
Run this Hive file with hive -f, then query the resulting table:
select * from tmp_dwd_user_info;
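Create the Hive table script dwd_sale_order_detail.sql in the /opt/file/ directory:
-- Switch to the Hive database
use mall_bigdata;
-- Fill in the country, province, and city names for the user information table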
create table if not exists mall_bigdata.tmp_dwd_user_info
as
select
user_id
,user_name
,sex
,age
,country_name
,province_name
,city_name
from
(select
user_id
,user_name
,sex
,age
,country_code
,province_code
,city_code
from ods_user_info
) a
left join
(select
country_code
,country_name
from dim_country_info
) b
on a.country_code=b.country_code
left join
(select
province_code
,province_name
,country_code
from dim_province_info
) c
on a.province_code=c.province_code and a.country_code=c.country_code
left join
(select
city_code
,city_name
,province_code
from dim_city_info
) d
on a.city_code=d.city_code and a.province_code=d.province_code;
-- Fill in the goods name for the order table
-- Keep only the order records whose country name is 中国 (China)
create table if not exists mall_bigdata.dwd_sale_order_detail
as
select
sale_id,
a.user_id,
user_name,
sex,
age,
country_name,
province_name,
city_name,
a.goods_id,
goods_name,
price,
sale_count,
total_price,
create_time
from
(select
sale_id
,user_id
,goods_id
,price
,sale_count
,total_price
,create_time
from ods_sale_info) a
left join
(
select
goods_id
,goods_name
from dim_goods_info
) b
on a.goods_id=b.goods_id
left join
(
select
user_id
,user_name
,sex
,age
,country_name
,province_name
,city_name
from tmp_dwd_user_info
)c
on a.user_id=c.user_id
where country_name='中国';
-- Drop the temporary table
--drop table if exists mall_bigdata.tmp_dwd_user_info;
Run the SQL file from the /opt/file/ directory:
hive -f dwd_sale_order_detail.sql
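A small wrapper can make this last step a little safer by checking that the script exists before handing it to Hive. This is a sketch under assumptions: run_hive_file is a hypothetical helper, HIVE_BIN defaults to the hive command on the PATH, and /opt/file is the upload directory mentioned earlier.

```shell
#!/bin/sh
HIVE_BIN="${HIVE_BIN:-hive}"   # override to try the helper without Hive

# Hypothetical helper: run one .sql file through `hive -f`
run_hive_file() {
  if [ ! -f "$1" ]; then
    echo "missing script: $1" >&2
    return 1
  fi
  "$HIVE_BIN" -f "$1"
}

# Usage on the cluster node:
#   run_hive_file /opt/file/dwd_sale_order_detail.sql
```

Because the helper returns a non-zero status when the file is missing or Hive fails, it can be chained with && in front of a follow-up query.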