【hudi】数据湖客户端运维工具Hudi-Cli实战

news2025/1/11 5:58:02

数据湖客户端运维工具Hudi-Cli实战

help

hudi:student_mysql_cdc_hudi_fl->help
AVAILABLE COMMANDS

Archived Commits Command
       trigger archival: trigger archival
       show archived commits: Read commits from archived files and show details
       show archived commit stats: Read commits from archived files and show details

Bootstrap Command
       bootstrap run: Run a bootstrap action for current Hudi table
       bootstrap index showmapping: Show bootstrap index mapping
       bootstrap index showpartitions: Show bootstrap indexed partitions

Built-In Commands
       help: Display help about available commands
       stacktrace: Display the full stacktrace of the last error.
       clear: Clear the shell screen.
       quit, exit: Exit the shell.
       history: Display or save the history of previously run commands
       version: Show version info
       script: Read and execute commands from a file.

Cleans Command
       cleans show: Show the cleans
       clean showpartitions: Show partition level details of a clean
       cleans run: run clean

Clustering Command
       clustering run: Run Clustering
       clustering scheduleAndExecute: Run Clustering. Make a cluster plan first and execute that plan immediately
       clustering schedule: Schedule Clustering

Commits Command
       commits compare: Compare commits with another Hoodie table
       commits sync: Sync commits with another Hoodie table
       commit showpartitions: Show partition level details of a commit
       commits show: Show the commits
       commits showarchived: Show the archived commits
       commit showfiles: Show file level details of a commit
       commit show_write_stats: Show write stats of a commit

Compaction Command
       compaction run: Run Compaction for given instant time
       compaction scheduleAndExecute: Schedule compaction plan and execute this plan
       compaction showarchived: Shows compaction details for a specific compaction instant
       compaction repair: Renames the files to make them consistent with the timeline as dictated by Hoodie metadata. Use when compaction unschedule fails partially.
       compaction schedule: Schedule Compaction
       compaction show: Shows compaction details for a specific compaction instant
       compaction unscheduleFileId: UnSchedule Compaction for a fileId
       compaction validate: Validate Compaction
       compaction unschedule: Unschedule Compaction
       compactions show all: Shows all compactions that are in active timeline
       compactions showarchived: Shows compaction details for specified time window

Diff Command
       diff partition: Check how file differs across range of commits. It is meant to be used only for partitioned tables.
       diff file: Check how file differs across range of commits

Export Command
       export instants: Export Instants and their metadata from the Timeline

File System View Command
       show fsview all: Show entire file-system view
       show fsview latest: Show latest file-system view

HDFS Parquet Import Command
       hdfsparquetimport: Imports Parquet table to a hoodie table

Hoodie Log File Command
       show logfile records: Read records from log files
       show logfile metadata: Read commit metadata from log files

Hoodie Sync Validate Command
       sync validate: Validate the sync by counting the number of records

Kerberos Authentication Command
       kerberos kdestroy: Destroy Kerberos authentication
       kerberos kinit: Perform Kerberos authentication

Markers Command
       marker delete: Delete the marker

Metadata Command
       metadata stats: Print stats about the metadata
       metadata list-files: Print a list of all files in a partition from the metadata
       metadata list-partitions: List all partitions from metadata
       metadata validate-files: Validate all files in all partitions from the metadata
       metadata delete: Remove the Metadata Table
       metadata create: Create the Metadata Table if it does not exist
       metadata init: Update the metadata table from commits since the creation
       metadata set: Set options for Metadata Table

Repairs Command
       repair deduplicate: De-duplicate a partition path contains duplicates & produce repaired files to replace with
       rename partition: Rename partition. Usage: rename partition --oldPartition <oldPartition> --newPartition <newPartition>
       repair overwrite-hoodie-props: Overwrite hoodie.properties with provided file. Risky operation. Proceed with caution!
       repair migrate-partition-meta: Migrate all partition meta file currently stored in text format to be stored in base file format. See HoodieTableConfig#PARTITION_METAFILE_USE_DATA_FORMAT.
       repair addpartitionmeta: Add partition metadata to a table, if not present
       repair deprecated partition: Repair deprecated partition ("default"). Re-writes data from the deprecated partition into __HIVE_DEFAULT_PARTITION__
       repair show empty commit metadata: show failed commits
       repair corrupted clean files: repair corrupted clean files

Rollbacks Command
       show rollback: Show details of a rollback instant
       commit rollback: Rollback a commit
       show rollbacks: List all rollback instants

Savepoints Command
       savepoint rollback: Savepoint a commit
       savepoints show: Show the savepoints
       savepoint create: Savepoint a commit
       savepoint delete: Delete the savepoint

Spark Env Command
       set: Set spark launcher env to cli
       show env: Show spark launcher env by key
       show envs all: Show spark launcher envs

Stats Command
       stats filesizes: File Sizes. Display summary stats on sizes of files
       stats wa: Write Amplification. Ratio of how many records were upserted to how many records were actually written

Table Command
       table update-configs: Update the table configs with configs with provided file.
       table recover-configs: Recover table configs, from update/delete that failed midway.
       refresh, metadata refresh, commits refresh, cleans refresh, savepoints refresh: Refresh table metadata
       create: Create a hoodie table if not present
       table delete-configs: Delete the supplied table configs from the table.
       fetch table schema: Fetches latest table schema
       connect: Connect to a hoodie table
       desc: Describe Hoodie Table properties

Temp View Command
       temp_query, temp query: query against created temp view
       temps_show, temps show: Show all views name
       temp_delete, temp delete: Delete view name

Timeline Command
       metadata timeline show incomplete: List all incomplete instants in active timeline of metadata table
       metadata timeline show active: List all instants in active timeline of metadata table
       timeline show incomplete: List all incomplete instants in active timeline
       timeline show active: List all instants in active timeline

Upgrade Or Downgrade Command
       downgrade table: Downgrades a table
       upgrade table: Upgrades a table

Utils Command
       utils loadClass: Load a class

kerberos

kerberos kinit --principal xxx@XXXXX.COM --keytab /xxx/kerberos/xxx.keytab

在这里插入图片描述
先看下样例表的表结构:
分区表哦!

-- FLink SQL建表语句
create table student_mysql_cdc_hudi_fl(
  `_hoodie_commit_time` string comment 'hoodie commit time',
  `_hoodie_commit_seqno` string comment 'hoodie commit seqno',
  `_hoodie_record_key` string comment 'hoodie record key',
  `_hoodie_partition_path` string comment 'hoodie partition path',
  `_hoodie_file_name` string comment 'hoodie file name',
  `s_id` bigint not null comment '主键',
  `s_name` string not null comment '姓名',
  `s_age` int comment '年龄',
  `s_sex` string comment '性别',
  `s_part` string not null comment '分区字段',
  `create_time` timestamp(6) not null comment '创建时间',
  `dl_ts` timestamp(6) not null,
  `dl_s_sex` string not null,
  PRIMARY KEY(s_id) NOT ENFORCED
)PARTITIONED BY (`dl_s_sex`) with ( 
,'connector' = 'hudi'
,'hive_sync.table' = 'student_mysql_cdc_hudi'
,'hoodie.datasource.write.drop.partition.columns' = 'true'
,'hoodie.datasource.write.hive_style_partitioning' = 'true'
,'hoodie.datasource.write.partitionpath.field' = 'dl_s_sex'
,'hoodie.datasource.write.precombine.field' = 'dl_ts'
,'path' = 'hdfs://xxx/hudi_db.db/student_mysql_cdc_hudi'
,'precombine.field' = 'dl_ts'
,'primaryKey' = 's_id'
)

table

connect

connect --path /xxx/hudi_db.db/student_mysql_cdc_hudi

在这里插入图片描述

desc

desc

在这里插入图片描述

refresh

refresh

在这里插入图片描述

fetch table schema

fetch table schema

在这里插入图片描述

  "type" : "record",
  "name" : "student_mysql_cdc_hudi_fl_record",
  "namespace" : "hoodie.student_mysql_cdc_hudi_fl",
  "fields" : [ {
    "name" : "_hoodie_commit_time",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_commit_seqno",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_record_key",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_partition_path",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_file_name",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "_hoodie_operation",
    "type" : [ "null", "string" ],
    "doc" : "",
    "default" : null
  }, {
    "name" : "s_id",
    "type" : "long"
  }, {
    "name" : "s_name",
    "type" : "string"
  }, {
    "name" : "s_age",
    "type" : [ "null", "int" ],
    "default" : null
  }, {
    "name" : "s_sex",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "s_part",
    "type" : "string"
  }, {
    "name" : "create_time",
    "type" : {
      "type" : "long",
      "logicalType" : "timestamp-micros"
    }
  }, {
    "name" : "dl_ts",
    "type" : {
      "type" : "long",
      "logicalType" : "timestamp-micros"
    }
  }, {
    "name" : "dl_s_sex",
    "type" : "string"
  } ]
}

commit

commits show

commits show --sortBy "Total Bytes Written" --desc true --limit 10

在这里插入图片描述

commits showarchived

commits showarchived

在这里插入图片描述

commit showfiles

commit showfiles --commit 20230915164442583

在这里插入图片描述

commit showfiles --commit 20230915164442583 --sortBy "Partition Path"

在这里插入图片描述

commit showpartitions

commit showpartitions --commit 20230915164442583

在这里插入图片描述

commit showpartitions --commit 20230915164442583 --sortBy "Total Bytes Written" --desc true --limit 10

在这里插入图片描述

commit show_write_stats

commit show_write_stats --commit 20230915164442583

在这里插入图片描述

File System View

show fsview all

show fsview all

在这里插入图片描述

show fsview latest

show fsview latest --partitionPath dl_s_sex=female

在这里插入图片描述

Log File

show logfile records

# 注意10 是需要取数据记录条数
show logfile records 10 /xxx/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/.bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0

在这里插入图片描述
数据是json格式的:

{
  "_hoodie_commit_time": "20230915163856302",
  "_hoodie_commit_seqno": "20230915163856302_0_83",
  "_hoodie_record_key": "88",
  "_hoodie_partition_path": "dl_s_sex=female",
  "_hoodie_file_name": "bf4b06b4-e897-42df-8a3c-a3a2f737d367",
  "_hoodie_operation": "I",
  "s_id": 88,
  "s_name": "傅亮",
  "s_age": 4,
  "s_sex": "female",
  "s_part": "2017/11/20",
  "create_time": 790128367000000,
  "dl_ts": -28800000000,
  "dl_s_sex": "female"
}

show logfile metadata

show logfile metadata /xxx/xxx/hive/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/dl_create_time_yyyy=1971/dl_create_time_mm=03/.dadac2dd-7e5e-46c3-9b27-f1f03e04a90c_20230915151426134.log.1_0

图片中还有FooterMetadata列没显示全
在这里插入图片描述

{
  "SCHEMA": "{\"type\":\"record\",\"name\":\"student_mysql_cdc_hudi_fl_record\",\"namespace\":\"hoodie.student_mysql_cdc_hudi_fl\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_operation\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"s_id\",\"type\":\"long\"},{\"name\":\"s_name\",\"type\":\"string\"},{\"name\":\"s_age\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"s_sex\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"s_part\",\"type\":\"string\"},{\"name\":\"create_time\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_ts\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_s_sex\",\"type\":\"string\"}]}",
  "INSTANT_TIME": "20230915164442583"
}

differ

diff partition

diff partition dl_s_sex=female

在这里插入图片描述

differ file
# 需要提供FileID。就是log文件的部分
# 如log文件:.bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0
diff file bf4b06b4-e897-42df-8a3c-a3a2f737d367

在这里插入图片描述在这里插入图片描述

rollbacks

show rollbacks

show rollbacks

在这里插入图片描述

stats

stats filesizes

stats filesizes --partitionPath dl_s_sex=female --sortBy "95th" --desc true --limit 3

在这里插入图片描述

stats wa

stats wa

在这里插入图片描述

compaction

compactions show all

compactions show all

待续!!!

compactions showarchived

compactions showarchived

在这里插入图片描述

compaction showarchived

compaction showarchived 20230915200042501

在这里插入图片描述

compaction show

compaction show 20230915174042680

在这里插入图片描述

参考文章:
Apache Hudi数据湖hudi-cli客户端使用

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1023956.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

<硬件设计>运放+三极管组成的恒流源VI电路设计与分析

目录 01 原理介绍&描述 运放的虚短和虚断 02 恒流源描述&分析 简单恒流源电路 简单恒流源电路描述 恒流源电路分析 恒流源VI电路 恒流源VI电路描述 恒流源VI电路分析 恒流源应用场景 03 恒流源VI电路示例 示例原理图&描述 恒流原理分析 恒流原理 恒…

毕业设计|基于51单片机的空气质量检测PM2.5粉尘检测温度设计

基于51单片机的空气质量检测PM2.5粉尘检测温度设计 1、项目简介1.1 系统构成1.2 系统功能 2、部分电路设计2.1 LED信号指示灯电路设计2.2 LCD1602显示电路2.3 PM2.5粉尘检测电路设计 3、部分代码展示3.1 串口初始化3.1 定时器初始化3.2 LCD1602显示函数 4 演示视频及代码资料获…

Java实现截取视频第一帧

目录 前言 一、通过Java借助第三方库实现 1.引用ffmpeg 使用maven&#xff0c;导入pom依赖&#xff1a; 工具类 2.引用jcodec 二、使用第三方存储自带的方法实现&#xff08;如阿里云OSS、华为云OBS&#xff09; 前言 在实际项目中&#xff0c;会遇到上传视频后&#xf…

SpringBoot轻松实现项目集成Knife4j接口文档

Knife4j 介绍 Knife4j 官网 Knife4j是一款基于Swagger生成API文档的增强工具&#xff0c;它简化了开发者构建和管理RESTful API文档的过程。通过自动扫描项目中的接口信息&#xff0c;Knife4j能够生成详细、易读的API文档&#xff0c;无需手动编写和维护。它提供交互式的接口调…

以太网传输距离以及延长办法

以太网传输距离与介质 以太网的标准传输距离取决于不同的以太网类型和传输介质。以下是一些常见的以太网类型和它们的标准传输距离&#xff1a; 以太网&#xff08;Ethernet&#xff09;&#xff1a;传输距离最长为100米&#xff0c;使用双绞线作为传输介质。 快速以太网&…

我的Qt作品(19)使用Qt写一个轻量级的视觉框架---第2章,实现思维导图方式的流程图运行

上次写的第1章介绍了主界面的设计。 https://blog.csdn.net/libaineu2004/article/details/130277151 本次是第2章&#xff0c;主要介绍流程图的运行。 本作品采用的是QtOpenCV组合方式开发。流程图的设计思想其实就是数据结构的【图】。通过遍历每个节点来执行各个算法。 1…

深度学习数据集的文本制作和读取

文章目录 制作数据集的文本文件读取文本文件 制作数据集的文本文件 import os from os.path import join import random import config args config.argsclass SplitDataset:def __init__(self):self.data_root_path args.data_root_pathself.dataset_split_rate args.data…

【网络应用与安全】第一次作业

文章目录 一、熟悉实验室运行环境1 - 登录2 - 熟悉Linux环境3 - 远程登录4 - 使用Git 二、网络延迟三、网络应用四、HTTP五、Network Port六、TCP Protocol七、实验室系统1 - LDAP2 - Kerberos3 - Ansible 八、Linux运行环境和Nginx1 - 安装Ubuntu22.04.3LTS版本2 - 安装Nginx3…

Linux:基础开发工具之yum,vim,gcc的使用

文章目录 yumvimgcc 本篇主要总结的是Linux下开发工具 yumvimgcc/g yum 什么是yum&#xff1f; 不管是在手机移动端还是pc端&#xff0c;不管是什么操作系统&#xff0c;当用户想要下载一些内容或者工具的时候&#xff0c;都需要到一个特定的位置进行下载&#xff0c;例如在…

图片格式大全

青春不能回头&#xff0c;青春也没有终点。 大全介绍 图片格式有多种&#xff0c;每种格式都有其独特的特性和用途。以下是一些常见的图片格式以及它们的介绍&#xff1a; JPEG&#xff08;Joint Photographic Experts Group&#xff09;&#xff1a; 文件扩展名&#xff1a;…

Whisper + NemoASR + ChatGPT 实现语言转文字、说话人识别、内容总结等功能

引言 2023年&#xff0c;IT领域的焦点无疑是ChatGPT&#xff0c;然而&#xff0c;同属OpenAI的开源产品Whisper似乎鲜少引起足够的注意。 Whisper是一款自动语音识别系统&#xff0c;可以识别来自99种不同语言的语音并将其转录为文字。 如果说ChatGPT为计算机赋予了大脑&…

解决flutter不识别yaml里面配置的git项目

解决办法找到相应的 git路径&#xff0c;然后手动 git pull 暂时先用这个笨方法&#xff0c;后面有更好的解决办法了再说 studio 自己拉取的项目里面没有ios 和lib包

知识付费平台开发技术实践:构建数字学习的未来

引言 知识付费平台的兴起正在塑造着数字学习的未来。本文将介绍一些关键的技术实践&#xff0c;帮助开发者构建强大的知识付费平台&#xff0c;提供出色的数字学习体验。 1. 选择适当的技术栈 在开始知识付费平台的开发之前&#xff0c;首要任务是选择适当的技术栈。这包括…

App测试中iOS和Android的差异

1、系统版本&#xff1a; iOS和Android系统版本的更新速度、使用人数比例以及功能的不同都可能导致应用程序在不同操作系统版本上的表现和兼容性存在区别。 例如&#xff0c;在iOS平台上&#xff0c;很多用户会更快地升级到最新版本的iOS系统&#xff0c;而在Android平台上&a…

如何用C语言实现 IoT Core

涂鸦 IoT Core SDK 使用 C 语言实现&#xff0c;支持涂鸦设备模型协议&#xff0c;适用于开发者自主开发硬件设备逻辑业务接入涂鸦。 功能概述 涂鸦 IoT Core SDK 提供设备激活、发送上下行 DP 和固件 OTA 升级等基础业务接口封装。SDK 不依赖具体设备平台及操作系统环境&…

Java毕业设计-基于SpringBoot的租房网站的设计与实现

大家好&#xff0c;今天为大家打来的是基于SpringBoot的租房网站的设计与实现 博主介绍&#xff1a;✌程序员徐师兄、7年大厂程序员经历。全网粉丝30W、csdn博客专家、掘金/华为云/阿里云/InfoQ等平台优质作者、专注于Java技术领域和毕业项目实战✌ 文章目录 一、前言介绍二、主…

如何理解高效IO

目录 前言 1.如何理解高效的IO 2.五种IO模型 3.非阻塞IO 4.非阻塞代码编写 总结 前言 哈喽&#xff0c;很高兴和大家见面&#xff01;今天我们要介绍的关于IO的话题&#xff0c;在计算机中IO是非常常规的操作&#xff0c;例如将数据显示到外设&#xff0c;或者将数据从主…

将本地前端工程中的npm依赖上传到Nexus

【问题背景】 用Nexus搭建了内网的依赖仓库&#xff0c;需要将前端工程中node_modules中的依赖上传到Nexus上&#xff0c;但是node_modules中的依赖已经是解压后的状态&#xff0c;如果直接机械地将其简单地打包上传到Nexus&#xff0c;那么无法通过npm install下载使用。故有…

安防监控系统/视频云存储/监控平台EasyCVR服务器解释器出现变更该如何修改?

安防视频监控/视频集中存储/云存储/磁盘阵列EasyCVR平台可拓展性强、视频能力灵活、部署轻快&#xff0c;可支持的主流标准协议有国标GB28181、RTSP/Onvif、RTMP等&#xff0c;以及支持厂家私有协议与SDK接入&#xff0c;包括海康Ehome、海大宇等设备的SDK等。平台既具备传统安…

软件测试/测试开发丨利用人工智能ChatGPT批量生成测试数据

点此获取更多相关资料 简介 测试数据是指一组专注于为测试服务的数据&#xff0c;既可以作为功能的输入去验证输出&#xff0c;也可以去触发各类异常场景。 测试数据的设计尤为重要&#xff0c;等价类、边界值、正交法等测试用例设计方法都是为了更全面地设计对应的测试数据…