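1. Loading data into a table (Load)
1) Syntax
hive> load data [local] inpath 'path to the data' [overwrite] into table student [partition (partcol1=val1,…)];
(1) load data: load data into a table
(2) local: load the data from the local filesystem into the Hive table; if omitted, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: replace the data already in the table; if omitted, the data is appended
(5) into table: names the table to load into
(6) student: the specific target table
(7) partition: load into the specified partition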
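As an illustration of how the optional clauses combine, the sketch below loads a local file into a single partition and replaces that partition's contents. The table student_by_month, its month partition column, and the tab delimiter are assumptions made for this example; only the file path is taken from the examples in this section.

```sql
-- Hypothetical partitioned table: the name student_by_month, the
-- partition column month, and the tab delimiter are assumptions.
create table if not exists student_by_month(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- local + overwrite + partition: copy a local file into one partition,
-- replacing whatever that partition already holds.
load data local inpath '/home/atguigu/student.txt'
overwrite into table student_by_month partition (month='201709');
```

Dropping overwrite from the statement would append the file to the partition instead.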
2) Hands-on examples
(0) Create a table
create table student(id string, name string) row format delimited fields terminated by '\t';
(1) Load a local file into Hive
hive (hive3)> load data local inpath '/home/atguigu/student.txt' into table hive3.student;
Loading data to table hive3.student
OK
Time taken: 1.007 seconds
Query the result:
hive (hive3)> select * from student;
OK
student.id student.name
1001 ss1
1002 ss2
1003 ss3
1004 ss4
1005 ss5
1006 ss6
1007 ss7
1008 ss8
1009 ss9
1010 ss10
1011 ss11
1012 ss12
1013 ss13
1014 ss14
1015 ss15
1016 ss16
1001 zzz
1002 ddd
1111 ccc
(2) Load an HDFS file into Hive
Upload the file to HDFS
hive (default)> dfs -put /opt/module/hive/data/student.txt /user/atguigu/hive;
Load the data from HDFS
hive (default)> load data inpath '/user/atguigu/hive/student.txt' into table default.student;
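A load without local moves the source file into the table's directory rather than copying it, which is easy to overlook. One way to observe this is to list the HDFS directories around the load. The warehouse path below is an assumption (a managed table in the default database under the default warehouse location); adjust it to your configuration.

```sql
-- Before the load: student.txt should still be in the upload directory.
dfs -ls /user/atguigu/hive;

load data inpath '/user/atguigu/hive/student.txt' into table default.student;

-- After the load: the file is no longer in the upload directory ...
dfs -ls /user/atguigu/hive;
-- ... because it was moved under the table directory
-- (assumed here to be the default warehouse location).
dfs -ls /user/hive/warehouse/student;
```

With local, the file is copied instead, so the original on the local filesystem stays in place.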
(3) Load data, overwriting the data already in the table
Upload the file to HDFS
hive (default)> dfs -put /opt/module/data/student.txt /user/atguigu/hive;
Load the data, overwriting the data already in the table
hive (hive3)> load data local inpath '/home/atguigu/student.txt' overwrite into table hive3.student;
Query the table after the overwrite:
hive (hive3)> select * from student;
OK
student.id student.name
1001 zzz
1002 ddd
1111 ccc
Time taken: 0.747 seconds, Fetched: 3 row(s)
2. Inserting data into a table with a query (Insert)
1) Create a table
hive (default)> create table student_par(id int, name string) row format delimited fields terminated by '\t';
OK
Time taken: 3.124 seconds
2) Basic insert. Inserting data launches a MapReduce job, and rows inserted this way are placed at the front of the table.
hive (default)> insert into table student_par values(1,'wangwu'),(2,'zhaoliu');
Automatically selecting local only mode for query
Query ID = atguigu_20211217144118_3d15bc1c-c822-41f3-a62b-62e1e57fa3a2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-12-17 14:41:21,668 Stage-1 map = 0%, reduce = 0%
2021-12-17 14:41:22,688 Stage-1 map = 100%, reduce = 0%
2021-12-17 14:41:23,694 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1983525660_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop100:8020/user/hive/warehouse/student_par/.hive-staging_hive_2021-12-17_14-41-18_153_8125208710986864882-1/-ext-10000
Loading data to table default.student_par
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 82823431 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
col1 col2
Time taken: 8.727 seconds
3) Basic-mode insert (from the result of a query on a single table)
The following overwrites student_par with the contents of the student table in the hive3 database.
hive (default)> insert overwrite table student_par select id, name from hive3.student;
OK
id name
Time taken: 5.391 seconds
hive (default)> select * from student_par;
OK
student_par.id student_par.name
1001 zzz
1002 ddd
1111 ccc
Time taken: 0.275 seconds, Fetched: 3 row(s)
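insert into: appends the data to the table or partition; existing data is not deleted
insert overwrite: replaces the data already in the table
Note: insert does not support inserting into only some of the columns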
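The difference between the two forms can be checked directly on the tables used above. This is only a sketch; the inserted row (3, 'lisi') is made up for illustration.

```sql
-- insert into appends: rows already in student_par are kept.
insert into table student_par values (3, 'lisi');
select count(*) from student_par;

-- insert overwrite replaces: afterwards student_par holds only the
-- rows returned by the select.
insert overwrite table student_par select id, name from hive3.student;
select count(*) from student_par;
```

Comparing the two counts shows whether the earlier rows survived.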
4) Multi-table (multi-partition) insert mode (from the results of queries on multiple tables)
hive (default)> from student
insert overwrite table student partition(month='201707')
select id, name where month='201709'
insert overwrite table student partition(month='201706')
select id, name where month='201709';
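The statement above presupposes a student table partitioned by month. A self-contained sketch of the same pattern follows, using a hypothetical partitioned table student_by_month (an assumption, not defined in this section) with hive3.student as the source. The filter conditions are illustrative only. The source is named once in the from clause, and each insert clause writes to its own partition.

```sql
-- Hypothetical target table; name, partition column, and delimiter
-- are assumptions for this sketch.
create table if not exists student_by_month(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- One from clause feeds several insert clauses.
from hive3.student
insert overwrite table student_by_month partition(month='201706')
  select id, name where id = '1001'
insert overwrite table student_by_month partition(month='201707')
  select id, name where id <> '1001';
```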
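3. Creating a table and loading data from a query (As Select)
See section 4.5.1, Creating tables, for details. This creates a table from a query result (the rows returned by the query are written into the newly created table).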
hive (default)> create table if not exists student1 as select id,name from hive3.student;
OK
id name
Time taken: 3.2 seconds
hive (default)> select * from student1;
OK
student1.id student1.name
1001 zzz
1002 ddd
1111 ccc
Time taken: 0.231 seconds, Fetched: 3 row(s)
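The select part can carry a filter, and a row format clause can be placed before as. The following is a sketch only: the table name student_zzz, the where condition, and the tab delimiter are assumptions chosen for illustration.

```sql
-- Column names and types of the new table are derived from the select list.
create table if not exists student_zzz
row format delimited fields terminated by '\t'
as select id, name from hive3.student where name = 'zzz';
```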
4. Specifying the data path with Location when creating a table
1) Upload the data to HDFS
hive (default)> dfs -mkdir /student;
hive (default)> dfs -put /home/atguigu/student.txt /student;
2) Create the table and specify its location on HDFS (preferably as an external table)
hive (default)> create external table if not exists student5(id int, name string) row format delimited fields terminated by '\t' location '/student';
OK
Time taken: 0.482 seconds
3) Query the data
hive (default)> select * from student5;
OK
student5.id student5.name
1001 zzz
1002 ddd
1111 ccc
Time taken: 0.45 seconds, Fetched: 3 row(s)
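An external table is preferred here because dropping it removes only the metadata and leaves the files under the location untouched. A sketch of how this could be checked, using a throwaway table name student5_tmp (hypothetical) over the same directory and assuming a tab delimiter:

```sql
-- Throwaway external table over the same directory (hypothetical name).
create external table if not exists student5_tmp(id int, name string)
row format delimited fields terminated by '\t'
location '/student';

-- Dropping an external table deletes its metadata only.
drop table student5_tmp;

-- The data file should still be listed here.
dfs -ls /student;
```

Had the table been a managed one, dropping it would also have deleted the directory contents.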
5. Importing data into a specified Hive table (Import)
Note: export the data with export first, then import it (otherwise an invalid-path error is reported).
hive (default)> import table student5 from '/user/hive/warehouse/student/student.txt';
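A sketch of the full round trip that the note describes. The export directory /user/atguigu/export/student and the target table name student_imported are assumptions; the export directory is expected not to contain earlier data. export writes the table's data together with a metadata file into that directory, and import reads from the same directory.

```sql
-- Step 1: export an existing table to an HDFS directory (hypothetical path).
export table default.student to '/user/atguigu/export/student';

-- Step 2: import from that same directory into a new table
-- (hypothetical name); the table is created from the exported metadata.
import table student_imported from '/user/atguigu/export/student';

select * from student_imported;
```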