【大数据】Hive SQL语言（学习笔记）

news2026/2/14 22:25:47

一、DDL数据定义语言

1、建库

1）数据库结构

默认的数据库叫做default，存储于HDFS的：/user/hive/warehouse

用户自己创建的数据库存储位置：/user/hive/warehouse/database_name.db

2）创建数据库

create (database|schema) [if not exists] database_name
[comment database_comment]
[location hdfs_path]
[with dbproperties (property_name=property_value, ...)];

comment：数据库的注释说明语句
location：指定数据库在HDFS存储位置，默认/user/hive/warehouse/dbname.db
with dbproperties：用于指定一些数据库的属性配置。

3）删除数据库

drop (database|schema) [if exists] database_name [restrict|cascade];

默认行为是RESTRICT，这意味着仅在数据库为空时才删除它。

若要删除带有表的数据库（不为空的数据库），可以使用CASCADE。

2、建表

1）建表语法树

create table  [if not exists] [db_name.]table_name
(col_name data_type [comment col_comment], ... )
[comment table_comment]
[row format delimited …];

-- 创建数据库并切换使用
create database if not exists itheima;
use itheima;
-- ddl create table
create table t_archer(
    id int comment "ID",
    name string,
    hp_max int,
    mp_max int,
    attack_max int,
    defense_max int,
    attack_range string,
    role_main string,
    role_assist string
)
row format delimited
fields terminated by "\t";

2）指定分隔符

ROW FORMAT DELIMITED：用于指定字段之间等相关的分隔符才能正确的读取解析数据。

建表时如果没有row format语法指定分隔符，则采用默认分隔符（‘\001’）。

3）Hive注释信息中文乱码解决

-- 注意 下面sql语句是需要在MySQL中执行  修改Hive存储的元数据信息（metadata）
use hive3;
show tables;

alter table hive3.COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table hive3.TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table hive3.PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 ;
alter table hive3.PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table hive3.INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;

3、show语法

Show相关的语句可以帮助用户查询相关信息

-- 1、显示所有数据库 SCHEMAS和DATABASES的用法 功能一样
show databases;
show schemas;

-- 2、显示当前数据库所有表
show tables;
SHOW TABLES [IN database_name]; --指定某个数据库

-- 3、查询显示一张表的元数据信息
desc formatted t_team_ace_player;

二、DML数据操纵语言

1、Load 加载

在Hive中建表成功之后，就会在HDFS上创建一个与之对应的文件夹，文件夹名是表名

文件夹父路径：/user/hive/warehouse/xxx.db

文件夹父路径是由参数hive.metastore.warehouse.dir控制

Load语法：

load data [local] inpath 'filepath' [overwrite] into table tablename;

filepath：表示待移动数据的路径。
指定local：将在本地文件系统（Hiveserver2服务所在机器）中查找文件路径。
没有指定local：直接使用这个URI。

2、Insert 插入

插入数据会触发MapReduce，慢死算了…😅

insert + select：将后面查询返回的结果作为内容插入到指定表中，需要保证查询结果列的数目和需要插入数据表格的列数目一致。

insert into table student_from_insert select num,name from student;

三、DQL数据查询语言

就是正常的sql语言 😅

四、Hive 函数

1、函数分类

UDF：普通函数，一进一出

UDAF：聚合函数，多进一出

UDTF：表生成函数，一进多出

2、内置函数

1）字符串函数

-- Hive 常用的内置函数
show functions;
describe function extended count;


-- String Functions 字符串函数
select length("itcast");
select reverse("itcast");

select concat("angela","baby");
-- 带分隔符字符串连接函数：concat_ws(separator, [string | array(string)]+)
select concat_ws('.', 'www', array('itcast', 'cn'));

-- 字符串截取函数：substr(str, pos[, len]) 或者  substring(str, pos[, len])
select substr("angelababy",-2); --pos是从1开始的索引，如果为负数则倒着数
select substr("angelababy",2,2);

-- 分割字符串函数: split(str, regex)
-- split针对字符串数据进行切割 返回是数组array 可以通过数组的下标取内部的元素 注意下标从0开始的
select split('apache hive', ' ');
select split('apache hive', ' ')[0];
select split('apache hive', ' ')[1];

2）日期函数

-- Date Functions 日期函数
-- 获取当前日期: current_date
select current_date();
-- 获取当前UNIX时间戳函数: unix_timestamp
select unix_timestamp();
-- 日期转UNIX时间戳函数: unix_timestamp
select unix_timestamp("2011-12-07 13:01:03");
-- 指定格式日期转UNIX时间戳函数: unix_timestamp
select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
-- UNIX时间戳转日期函数: from_unixtime
select from_unixtime(1618238391);
select from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
-- 日期比较函数: datediff  日期格式要求'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'
select datediff('2012-12-08','2012-05-09');
-- 日期增加函数: date_add
select date_add('2012-02-28',10);
-- 日期减少函数: date_sub
select date_sub('2012-01-1',10);

3）数学函数

-- Mathematical Functions 数学函数
-- 取整函数: round  返回double类型的整数值部分 （遵循四舍五入）
select round(3.1415926);
-- 指定精度取整函数: round(double a, int d) 返回指定精度d的double类型
select round(3.1415926,4);
-- 取随机数函数: rand 每次执行都不一样 返回一个0到1范围内的随机数
select rand();
-- 指定种子取随机数函数: rand(int seed) 得到一个稳定的随机数序列
select rand(3);

4）条件函数

-- Conditional Functions 条件函数
-- 使用之前课程创建好的student表数据
select * from student limit 3;

-- if条件判断: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
select if(1=2,100,200);
select if(sex ='男','M','W') from student limit 3;

-- 空值转换函数: nvl(T value, T default_value)
select nvl("allen","itcast");
select nvl(null,"itcast");

-- 条件转换函数: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
select case sex when '男' then 'male' else 'female' end from student limit 3;