Syncing Data from a Hive Table to an Elasticsearch Cluster (2023-08-30)

Background: in a recent project I needed to sync data from a Hive table into an Elasticsearch cluster. I had never done this before; after reading a few posts I found an approach that works well, so I'm writing it down here.

My environment is as follows:

Software         Version
Hadoop           3.3.0
Hive             3.1.3
JDK              1.8
Elasticsearch    7.10.2
Kibana           7.10.2
Logstash         7.10.2
ES-Hadoop        7.10.2

Introducing ES-Hadoop

The relationship between Hadoop, Hive, and Elasticsearch is shown in the figure below. In the middle sits a component called ES-Hadoop, the bridge between Hadoop and Elasticsearch. Elastic provides it on the official site, and it solves the problem of syncing data between Hadoop and Elasticsearch.

[Figure: ES-Hadoop sits between Hadoop/Hive and the Elasticsearch cluster]

The concrete steps for the data sync are below.

Step 1: Download the ES-Hadoop package from the Elasticsearch official site

Note: the ES-Hadoop version must match your Elasticsearch version.
Official download page: the Elasticsearch downloads site (linked in the original post).
Type es-hadoop into the search box, then under "version" pick the one that matches your Elasticsearch.
[Screenshot: the ES-Hadoop download page]
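If you prefer the command line, downloading and unpacking can look like this (a sketch — the URL follows Elastic's usual artifact naming, so adjust the version to yours; the jars normally sit in the dist/ directory of the unpacked folder):

wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-7.10.2.zip
unzip elasticsearch-hadoop-7.10.2.zip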

Step 2: Upload the package to the cluster; after unpacking, put the jar it contains into a new directory on HDFS

The command is as follows:

hadoop fs -put elasticsearch-hadoop-7.10.2.jar /user/hive/warehouse/es_hadoop/
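If the target directory does not exist yet, create it first, and list it afterwards to verify the upload (same path as above):

hadoop fs -mkdir -p /user/hive/warehouse/es_hadoop/
hadoop fs -ls /user/hive/warehouse/es_hadoop/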

Step 3: Add the jar to Hive

In the Hive CLI, run the following command:

add jar hdfs:///user/hive/warehouse/es_hadoop/elasticsearch-hadoop-7.10.2.jar;
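To confirm the jar is registered in the current session, you can run Hive's list jars command (it only reflects the current session):

list jars;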

Step 4: Create a temporary Hive table for testing

Create a table in Hive and add some test data.

CREATE EXTERNAL TABLE `hive_test`(
  `name` string, 
  `age` int, 
  `hight` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'=',', 
  'serialization.format'=',') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://master:9000/user/hive/warehouse/pdata_dynamic.db/hive_test';
With the data added, it displays as follows:
hive> select * from hive_test;
OK
吴占喜  30      175
令狐冲  50      180
任我行  60      160
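The post does not show how these rows were loaded; since hive_test is an external comma-delimited text table, one way is simply to drop a CSV into its HDFS location (a sketch; the file name is made up):

printf '吴占喜,30,175\n令狐冲,50,180\n任我行,60,160\n' > /tmp/hive_test.csv
hadoop fs -put /tmp/hive_test.csv hdfs://master:9000/user/hive/warehouse/pdata_dynamic.db/hive_test/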

Step 5: Create the Hive-to-ES mapping table

CREATE EXTERNAL TABLE `es_hadoop_cluster`(
  `name` string COMMENT 'from deserializer',
  `age` string COMMENT 'from deserializer',
  `hight` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY
  'org.elasticsearch.hadoop.hive.EsStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1')
LOCATION
  'hdfs://master:9000/user/hive/warehouse/pdata_dynamic.db/es_hadoop_cluster'
TBLPROPERTIES (
  'bucketing_version'='2',
  'es.batch.write.retry.count'='6',
  'es.batch.write.retry.wait'='60s',
  'es.index.auto.create'='TRUE',
  'es.index.number_of_replicas'='0',
  'es.index.refresh_interval'='-1',
  'es.mapping.name'='name:name,age:age,hight:hight',
  'es.nodes'='172.16.27.133:9200',
  'es.nodes.wan.only'='TRUE',
  'es.resource'='hive_to_es/_doc');

Parameter reference for the mapping table

Parameter | Value | Description
bucketing_version | 2 |
es.batch.write.retry.count | 6 |
es.batch.write.retry.wait | 60s |
es.index.auto.create | TRUE | Whether to auto-create a missing index when writing to Elasticsearch through a Hadoop component: true = create it automatically; false = don't.
es.index.number_of_replicas | 0 |
es.index.refresh_interval | -1 |
es.mapping.name | name:name,age:age,hight:hight | Field mapping between the Hive table and the ES index.
es.nodes | 172.16.27.133:9200 | Address of the Elasticsearch instance; an internal address is recommended.
es.nodes.wan.only | TRUE | For ES clusters reached through a virtual IP in the cloud; controls whether node sniffing is performed: true = set; false = not set.
es.resource | hive_to_es/_doc | Index name in the ES cluster.
es.nodes.discovery | TRUE | Whether to disable node discovery: true = disable; false = don't.
es.input.use.sliced.partitions | TRUE | Whether to use sliced partitions: true = use them (this can make the index's pre-read phase noticeably longer, sometimes far longer than the query itself, so false is recommended for better query efficiency); false = don't.
es.read.metadata | FALSE | Enable this property when working with Elasticsearch internal fields such as _id.
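Before running the sync, it is worth confirming that the machine running Hive can actually reach the address configured in es.nodes (a quick check using the address above):

curl http://172.16.27.133:9200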

Step 6: Write a sync SQL statement and test it

The sync SQL:

insert into es_hadoop_cluster select * from hive_test;

It failed — nice, a good opportunity to show how to fix this error.
The stack trace contains this line: Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Cause: add jar xxx in Hive only takes effect in the current session.
Fixes:
1. Run the add jar command again each time before executing the insert:
add jar hdfs:///user/hive/warehouse/es_hadoop/elasticsearch-hadoop-7.10.2.jar;
2. Load the jar permanently (the post promises details later; see the sketch below).
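The post leaves option 2 for later; one common approach (my sketch, not from the original post) is to copy the single connector jar into Hive's lib directory, or to point hive.aux.jars.path in hive-site.xml at the jar on HDFS, so every session picks it up:

# copy only the one connector jar into Hive's lib directory (path matches the Hive install seen later in this post)
cp elasticsearch-hadoop-7.10.2.jar /root/soft/hive/apache-hive-3.1.3-bin/lib/

As Step 7 below shows, do not copy all the jars from the package — multiple ES-Hadoop jars on the classpath cause a version-conflict error.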

Query ID = root_20230830041146_d125a01c-4174-48a5-8b9c-a15b41d27403
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0004, Tracking URL = http://master:8088/proxy/application_1693326922484_0004/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:11:51,842 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:12:45,532 Stage-2 map = 100%,  reduce = 0%
Ended Job = job_1693326922484_0004 with errors
0    [582efbca-df6e-4f87-b4a4-5a5f03667fd9 main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = job_1693326922484_0004 with errors
Error during job, obtaining debugging information...
1    [Thread-33] ERROR org.apache.hadoop.hive.ql.exec.Task  - Error during job, obtaining debugging information...
Examining task ID: task_1693326922484_0004_m_000000 (and more) from job job_1693326922484_0004
3    [Thread-34] ERROR org.apache.hadoop.hive.ql.exec.Task  - Examining task ID: task_1693326922484_0004_m_000000 (and more) from job job_1693326922484_0004

Task with the most failures(4): 
-----
Task ID:
  task_1693326922484_0004_m_000000

URL:
  http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0004&tipid=task_1693326922484_0004_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: hdfs://master:9000/tmp/hive/root/582efbca-df6e-4f87-b4a4-5a5f03667fd9/hive_2023-08-30_04-11-47_010_5312656494742717839-1/-mr-10002/ae1cf98b-a157-4266-92b3-7d618b411f00/map.xml
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:502)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:335)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:435)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:881)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:874)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:716)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:185)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:203)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:210)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:729)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:463)
        ... 13 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
        ... 60 more


98   [Thread-33] ERROR org.apache.hadoop.hive.ql.exec.Task  - 
Task with the most failures(4): 
-----
Task ID:
  task_1693326922484_0004_m_000000

URL:
  http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0004&tipid=task_1693326922484_0004_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: hdfs://master:9000/tmp/hive/root/582efbca-df6e-4f87-b4a4-5a5f03667fd9/hive_2023-08-30_04-11-47_010_5312656494742717839-1/-mr-10002/ae1cf98b-a157-4266-92b3-7d618b411f00/map.xml
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:502)
        at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:335)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:435)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:881)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:874)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:716)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:185)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:326)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:314)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:759)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObjectOrNull(SerializationUtilities.java:203)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:180)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:218)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
        at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
        at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:210)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectByKryo(SerializationUtilities.java:729)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:463)
        ... 13 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
        ... 60 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
106  [582efbca-df6e-4f87-b4a4-5a5f03667fd9 main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
With the jar added, it failed again — very nice, another chance to walk through an error 😄

The error output is shown below.
Cause analysis: look closely at the line Error: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpConnectionManager — the commons-httpclient jar is missing.
Fix: import the httpclient jar the same way as the es-hadoop jar above.
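Importing it the same way means putting the jar on HDFS first (a sketch; the version matches the add jar command used further down):

hadoop fs -put commons-httpclient-3.1.jar /user/hive/warehouse/es_hadoop/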

hive> insert into es_hadoop_cluster select * from hive_test;
Query ID = root_20230830041906_20c85a80-072a-4023-b5ab-8b532e5db092
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0005, Tracking URL = http://master:8088/proxy/application_1693326922484_0005/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0005
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:19:15,561 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:19:34,095 Stage-2 map = 100%,  reduce = 0%
Ended Job = job_1693326922484_0005 with errors
408543 [582efbca-df6e-4f87-b4a4-5a5f03667fd9 main] ERROR org.apache.hadoop.hive.ql.exec.Task  - Ended Job = job_1693326922484_0005 with errors
Error during job, obtaining debugging information...
408543 [Thread-57] ERROR org.apache.hadoop.hive.ql.exec.Task  - Error during job, obtaining debugging information...
Examining task ID: task_1693326922484_0005_m_000000 (and more) from job job_1693326922484_0005
408547 [Thread-58] ERROR org.apache.hadoop.hive.ql.exec.Task  - Examining task ID: task_1693326922484_0005_m_000000 (and more) from job job_1693326922484_0005

Task with the most failures(4): 
-----
Task ID:
  task_1693326922484_0005_m_000000

URL:
  http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0005&tipid=task_1693326922484_0005_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpConnectionManager
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
        at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
        at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
        at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
        at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
        at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:987)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)


408555 [Thread-57] ERROR org.apache.hadoop.hive.ql.exec.Task  - 
Task with the most failures(4): 
-----
Task ID:
  task_1693326922484_0005_m_000000

URL:
  http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0005&tipid=task_1693326922484_0005_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpConnectionManager
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
        at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
        at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
        at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
        at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
        at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
        at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:987)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
        at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
408563 [582efbca-df6e-4f87-b4a4-5a5f03667fd9 main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the Hive CLI, add the httpclient jar, then re-run the insert statement.

The command is as follows:

add jar hdfs:///user/hive/warehouse/es_hadoop/commons-httpclient-3.1.jar;

It ran successfully 😄

hive> insert into es_hadoop_cluster select * from hive_test;
Query ID = root_20230830043006_1cf3a042-e5e4-4bec-8cee-49e670ac9b49
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0006, Tracking URL = http://master:8088/proxy/application_1693326922484_0006/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0006
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:30:17,504 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:30:21,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.79 sec
MapReduce Total cumulative CPU time: 790 msec
Ended Job = job_1693326922484_0006
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 0.79 sec   HDFS Read: 6234 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 790 msec
OK
Time taken: 16.482 seconds

Step 7: Query the mapping table, then check on the ES cluster whether the data synced

1. Query the mapping table

Nice, another error — shown below.
Cause analysis: when I set this up earlier I had copied all the jars from the unpacked package into Hive's lib directory; it turns out only one is needed, so the rest have to go.
Fix: delete the extra jars from Hive's lib directory.

hive> select * from es_hadoop_cluster;
OK
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:216)
        at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:414)
        at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:115)
        at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:51)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2691)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-hive-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-mr-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-pig-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-spark-20_2.11-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-storm-7.17.6.jar
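Based on the paths listed in the error, the cleanup might look like this (a sketch; keep only the single elasticsearch-hadoop jar):

cd /root/soft/hive/apache-hive-3.1.3-bin/lib
rm elasticsearch-hadoop-hive-7.17.6.jar elasticsearch-hadoop-mr-7.17.6.jar \
   elasticsearch-hadoop-pig-7.17.6.jar elasticsearch-spark-20_2.11-7.17.6.jar \
   elasticsearch-storm-7.17.6.jar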

After deleting the extra jars, run the insert once more and query again. As shown below, querying in Hive now works (each row appears twice because the insert has now run successfully twice):

hive> select * from es_hadoop_cluster ;
OK
吴占喜  30      175
令狐冲  50      180
任我行  60      160
吴占喜  30      175
令狐冲  50      180
任我行  60      160
2. Query on the ES cluster

Query the index that was configured when the mapping table was created; the data shows up correctly — nice.
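From the command line, the check can be as simple as the following, with the index name and address taken from the mapping table's properties above:

curl 'http://172.16.27.133:9200/hive_to_es/_search?pretty'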
[Screenshot: the synced documents returned by the Elasticsearch query]
Happy 😄 — questions and comments are welcome.