I. Download the Spark and Kyuubi packages
Download Spark from the official site:
https://spark.apache.org/downloads.html
Download Kyuubi from the official site:
https://www.apache.org/dyn/closer.lua/kyuubi/kyuubi-1.9.0/apache-kyuubi-1.9.0-bin.tgz
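For reference, both packages can be fetched and unpacked from the shell. A minimal sketch, assuming the Spark 3.5.1 / Hadoop 3 build used later in this guide and the /home/soft install directory from kyuubi-env.sh (adjust mirrors and paths to your environment):

wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
wget https://archive.apache.org/dist/kyuubi/kyuubi-1.9.0/apache-kyuubi-1.9.0-bin.tgz
tar -zxf spark-3.5.1-bin-hadoop3.tgz -C /home/soft/
tar -zxf apache-kyuubi-1.9.0-bin.tgz -C /home/soft/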
II. Deploy Spark
1. Configure spark-env.sh
YARN_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/etc/hadoop
2. To use the Hive metastore from Spark, copy Hive's hive-site.xml into Spark's conf directory, as shown below.
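For example, on this CDH cluster (a sketch; the parcel path matches the one used in kyuubi-env.sh below, and the target is the SPARK_HOME unpacked above):

cp /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hive/conf/hive-site.xml \
   /home/soft/spark-3.5.1-bin-hadoop3/conf/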
III. Configure the Kyuubi environment
1. kyuubi-defaults.conf
kyuubi.frontend.bind.host bigdata30
kyuubi.frontend.protocols THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port 10009
# kyuubi.frontend.rest.bind.port 10099
#
kyuubi.engine.type SPARK_SQL
kyuubi.engine.share.level USER
# kyuubi.session.engine.initialize.timeout PT3M
# High availability
kyuubi.ha.enabled true
kyuubi.ha.client.class org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient
kyuubi.ha.addresses bigdata30:2181,bigdata31:2181,bigdata32:2181
kyuubi.ha.namespace kyuubi
# Required if Kerberos is enabled:
# kyuubi.ha.zookeeper.auth.type KERBEROS
kyuubi.ha.zookeeper.auth.principal zookeeper/_HOST@HADOOP.COM
kyuubi.ha.zookeeper.auth.keytab /etc/security/keytabs/zookeeper.keytab
# Kyuubi server Kerberos authentication
kyuubi.authentication KERBEROS
kyuubi.kinit.principal hive/_HOST@HADOOP.COM
kyuubi.kinit.keytab /etc/security/keytabs/hive.keytab
# Kyuubi engine exec pool
kyuubi.backend.engine.exec.pool.size 30
kyuubi.backend.engine.exec.pool.wait.queue.size 100
# Spark
spark.master yarn
# spark.driver.memory 2g
# spark.executor.memory 4g
# spark.driver.cores 1
# spark.executor.cores 3
# Spark SQL optimization (AQE)
spark.sql.adaptive.enabled true
spark.sql.adaptive.forceApply false
spark.sql.adaptive.logLevel info
spark.sql.adaptive.advisoryPartitionSizeInBytes 256m
spark.sql.adaptive.coalescePartitions.enabled true
spark.sql.adaptive.coalescePartitions.minPartitionNum 1
spark.sql.adaptive.coalescePartitions.initialPartitionNum 1
spark.sql.adaptive.fetchShuffleBlocksInBatch true
spark.sql.adaptive.localShuffleReader.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 400m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin 0.2
# spark.sql.adaptive.optimizer.excludedRules
spark.sql.autoBroadcastJoinThreshold -1
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
# Static resource allocation
# spark.executor.instances 2
# spark.executor.cores 2
# spark.executor.memory 2g
# Dynamic resource allocation
spark.dynamicAllocation.enabled true
# set spark.shuffle.service.enabled to false if you prefer shuffle tracking over the external shuffle service (ESS)
# spark.shuffle.service.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 5
# spark.executor.cores 3
# spark.executor.memory 4g
spark.dynamicAllocation.executorAllocationRatio 0.5
spark.dynamicAllocation.executorIdleTimeout 60s
spark.dynamicAllocation.cachedExecutorIdleTimeout 30min
# true to prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled true
spark.dynamicAllocation.shuffleTracking.timeout 30min
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 1s
spark.cleaner.periodicGC.interval 5min
# Per-user override: prefix a property with ___{user}___, e.g. for the user hive
# ___hive___.spark.dynamicAllocation.maxExecutors 10
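Since HA discovery depends on the ZooKeeper quorum configured above, it is worth confirming the quorum is reachable before starting Kyuubi. A quick check, assuming the zookeeper-client wrapper that CDH ships (zkCli.sh works the same way):

zookeeper-client -server bigdata30:2181,bigdata31:2181,bigdata32:2181 ls /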
2. kyuubi-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_131
export SPARK_HOME=/home/soft/spark-3.5.1-bin-hadoop3
# export FLINK_HOME=/opt/flink
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hive
# export FLINK_HADOOP_CLASSPATH=/path/to/hadoop-client-runtime-3.3.2.jar:/path/to/hadoop-client-api-3.3.2.jar
# export HIVE_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common/lib/commons-collections-3.2.2.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.1.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.1.0.jar:${HADOOP_HOME}/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop
export YARN_CONF_DIR=/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/etc/hadoop
export KYUUBI_JAVA_OPTS="-Xmx10g -XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=1024m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark -XX:+UseGCOverheadLimit -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./logs -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -verbose:gc -Xloggc:./logs/kyuubi-server-gc-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=20M"
export KYUUBI_BEELINE_OPTS="-Xmx2g -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+UseCondCardMark"
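Before the first start, it helps to confirm these variables point at real installations, for example (run from the Kyuubi install directory):

source conf/kyuubi-env.sh
$SPARK_HOME/bin/spark-submit --version    # should report Spark 3.5.1
ls "$HADOOP_CONF_DIR" "$YARN_CONF_DIR"    # both directories must exist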
3. Create the keytab files at the paths configured above
For an HA deployment, place the keytab files on every Kyuubi node; a sketch for creating and distributing them follows the listing below.
[root@bigdata31 ~]# ll /etc/security/keytabs/
total 12
-rw-r--r-- 1 root root  970 Apr 28 23:12 hive.keytab
-rw-r--r-- 1 root root 1040 Apr 28 21:47 zookeeper.keytab
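If the keytabs do not exist yet, they can be exported on the KDC host and distributed. A sketch assuming an MIT Kerberos KDC; because kyuubi.kinit.principal uses _HOST, each node's principal needs an entry in the keytab (-norandkey keeps existing service keys intact):

kadmin.local -q "xst -norandkey -k /etc/security/keytabs/hive.keytab hive/bigdata30@HADOOP.COM hive/bigdata31@HADOOP.COM hive/bigdata32@HADOOP.COM"
kadmin.local -q "xst -norandkey -k /etc/security/keytabs/zookeeper.keytab zookeeper/bigdata30@HADOOP.COM zookeeper/bigdata31@HADOOP.COM zookeeper/bigdata32@HADOOP.COM"
scp /etc/security/keytabs/*.keytab bigdata31:/etc/security/keytabs/
scp /etc/security/keytabs/*.keytab bigdata32:/etc/security/keytabs/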
4. Start and stop
sudo -u hive bin/kyuubi start
sudo -u hive bin/kyuubi stop
or
sudo -u hive bin/kyuubi restart
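After startup, a few quick checks confirm the server is healthy (a sketch; the port and ZooKeeper namespace follow kyuubi-defaults.conf above):

ss -ltnp | grep 10009                               # Thrift frontend is listening
zookeeper-client -server bigdata30:2181 ls /kyuubi  # server registered for HA discovery
tail -n 100 logs/*.out                              # server log; watch for startup errors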
IV. Test the connection
1. Connect with beeline
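With Kerberos enabled, the client needs a valid TGT before beeline can authenticate. For example, reusing the hive keytab placed earlier (any principal allowed by the cluster works):

kinit -kt /etc/security/keytabs/hive.keytab hive/bigdata30@HADOOP.COM
klist   # confirm the ticket was obtained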
1.1 Non-HA connection
[root@bigdata30 apache-kyuubi-1.9.0-bin]# beeline -u 'jdbc:hive2://bigdata30:10009/;principal=hive/_HOST@HADOOP.COM'
Connecting to jdbc:hive2://bigdata30:10009/;principal=hive/_HOST@HADOOP.COM
Connected to: Spark SQL (version 3.5.1)
Driver: Hive JDBC (version 2.1.1-cdh6.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.2.1 by Apache Hive
0: jdbc:hive2://bigdata30:10009/>
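The same URL also works for a one-shot smoke test that forces the Spark engine to launch (the first statement may take a while as the engine bootstraps on YARN):

beeline -u 'jdbc:hive2://bigdata30:10009/;principal=hive/_HOST@HADOOP.COM' -e 'select version();'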
1.2 HA connection (ZooKeeper discovery)
[root@bigdata30 apache-kyuubi-1.9.0-bin]# beeline -u 'jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/_HOST@HADOOP.COM'
Connecting to jdbc:hive2://bigdata30:2181,bigdata31:2181,bigdata32:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/bigdata30@HADOOP.COM
24/04/28 22:56:40 [main]: INFO jdbc.HiveConnection: Connected to 10.8.3.30:10009
Connected to: Spark SQL (version 3.5.1)
Driver: Hive JDBC (version 2.1.1-cdh6.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.1.1-cdh6.2.1 by Apache Hive
0: jdbc:hive2://bigdata30:2181,bigdata31:2181>
2. Connect with DBeaver
Driver package: hive-jdbc-uber-2.6.3.0-235.jar
2.1 Non-HA connection
URL template:
jdbc:hive2://{host}[:{port}][/{database}];AuthMech=1;KrbRealm=HADOOP.COM;KrbHostFQDN={host};KrbServiceName={server};KrbAuthType=2;principal={user}/_HOST@HADOOP.COM
Connection URL for this cluster:
jdbc:hive2://bigdata30:10009/default;AuthMech=1;KrbRealm=HADOOP.COM;KrbHostFQDN=bigdata30;KrbServiceName=hive;KrbAuthType=2;principal=hive/_HOST@HADOOP.COM
There is also a beeline-style URL that is much more compact and easier to read:
jdbc:hive2://{host}[:{port}][/{database}];principal={user}/_HOST@HADOOP.COM
jdbc:hive2://bigdata30:10009/default;principal=hive/_HOST@HADOOP.COM
Note that the database must be specified in the URL, otherwise the connection fails with an error.