Common Hadoop Problems


Error 1: is group-writable, and the group is not root.  Its permissions are 0775,

When the DataNode starts, the log reports:

1.“xxxx” is group-writable, and the group is not root.  Its permissions are 0775, and it is owned by gid 3245.  Please fix this or select a different socket path.

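The message shows that a directory in the Hadoop tree has the wrong permissions (it is group-writable), so reset the directory permissions:

chmod 755 hadoop/

Start the DataNode again and check the log; the same error comes back: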

xxx  is group-writable, and the group is not root.  Its permissions are 0775, and it is owned by gid 3245.  Please fix this or select a different socket path

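With no other option left, I generated the socket file in a different directory and set the permissions to 755, after which the DataNode started normally.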
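My understanding is that this check is applied to every directory on the way to the socket file, not only to the last one, which may be why fixing a single directory did not help. Below is a minimal sketch for spotting the offending directory. It assumes GNU `stat`; `list_writable_components` is a name made up for this example, and it only looks at the mode bits (it does not compare owner or group IDs).

```shell
# Print "MODE PATH" for every existing component of a path whose mode has
# the group-write (020) or other-write (002) bit set.
# Hypothetical helper; assumes GNU stat (-c '%a' prints the octal mode).
list_writable_components() (
    path="$1"
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        if [ -e "$path" ]; then
            mode=$(stat -c '%a' "$path")
            # The leading 0 makes the shell read the mode as an octal number.
            if [ $(( 0$mode & 022 )) -ne 0 ]; then
                printf '%s %s\n' "$mode" "$path"
            fi
        fi
        # Move one level up; stops at "/" (absolute) or "." (relative).
        path=$(dirname "$path")
    done
)
```

Running `list_writable_components /app/log4x/apps/hadoop/etc/DN_PORT` walks upward from the socket path and lists each component that is group- or world-writable, deepest first.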

Error 2: we cannot start a localDataXceiverServer because libhadoop cannot be loaded.

java.lang.RuntimeException: Although a UNIX domain socket path is configured as /app/log4x/apps/hadoop/etc/DN_PORT, we cannot start a localDataXceiverServer because libhadoop cannot be loaded.

1. Check the host

# hadoop checknative -a
23/11/07 21:13:08 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
23/11/07 21:13:08 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
23/11/07 21:13:08 DEBUG util.NativeCodeLoader: java.library.path=:/app/log4x/apps/hadoop/lib/native/Linux-amd64-64/:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
23/11/07 21:13:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/07 21:13:08 DEBUG util.Shell: setsid exited with exit code 0
Native library checking:
hadoop:  false 
zlib:    false 
snappy:  false 
lz4:     false 
bzip2:   false 
openssl: false 
23/11/07 21:13:08 INFO util.ExitUtil: Exiting with status 1

2. Add configuration

In ~/.bash_profile, add a setting that points to the native library directory:

export JAVA_LIBRARY_PATH=/app/log4x/apps/hadoop/lib/native

source ~/.bash_profile

# hadoop checknative -a
23/11/08 16:06:00 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
23/11/08 16:06:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /app/log4x/apps/hadoop/lib/native/libhadoop.so
zlib:    true /lib64/libz.so.1
snappy:  true /lib64/libsnappy.so.1
lz4:     true revision:99
bzip2:   false 
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
23/11/08 16:06:00 INFO util.ExitUtil: Exiting with status 1
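In the first run above, java.library.path pointed at lib/native/Linux-amd64-64/, while the second run shows the library being loaded from lib/native/libhadoop.so. Here is a small sketch for checking which directory of a colon-separated search path actually holds the library. `find_native_lib` is a hypothetical helper written for this example, not a Hadoop command.

```shell
# Print the first "DIR/libhadoop.so" found in a colon-separated search path.
# Returns non-zero when no directory contains the library.
# Hypothetical helper; the body runs in a subshell so IFS and "set -f"
# do not leak into the caller.
find_native_lib() (
    search_path="$1"
    lib_name="${2:-libhadoop.so}"
    IFS=':'
    set -f                              # no glob expansion while splitting
    for dir in $search_path; do
        dir=${dir%/}                    # drop one trailing slash, if any
        [ -n "$dir" ] || continue       # skip empty entries such as a leading ':'
        if [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
            return 0
        fi
    done
    return 1
)
```

For example, `find_native_lib "$JAVA_LIBRARY_PATH"` can be run before `hadoop checknative -a` to see whether the exported directory contains libhadoop.so at all.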

Error 3: IPC's epoch 1 is not the current writer epoch  0

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.255.33.120:43001: IPC's epoch 1 is not the current writer epoch  0

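Odd; I hit this problem as well, so I looked at how someone else had solved it:

    1. I first pasted the key part of the error, "IPC's epoch  is less than the last promised epoch", into Google. Most of the answers attributed it to network problems.
    2. Following that lead, the logs showed that every time the other NameNode started, it probed port 8485 of the three JournalNode services and the probe was reported as failed,
        which made a network problem the most likely cause. The checks were:
        ifconfig -a, to see whether the network interface is dropping packets;
        /etc/sysconfig/selinux, to confirm that SELINUX=disabled is set;
        /etc/init.d/iptables status, to see whether the firewall is running. Our Hadoop runs on an internal network and I remembered the firewall being off at deployment time, so this looked like the culprit.
        /etc/init.d/iptables stop
        I checked the firewalls on the three JournalNode servers one after another; all of them were inexplicably running, so I stopped them at once.
        After restarting both NameNodes, the logs were back to normal.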

I checked my own environment:

1. SELINUX is disabled on every host

# ansible -i hosts tt -m shell -a "sudo cat /etc/sysconfig/selinux  | grep SELINUX"
[WARNING]: Consider using 'become', 'become_method', and 'become_user' rather than running sudo
host_121 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 
host_118 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 
host_119 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 
host_122 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 
host_120 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 
host_126 | CHANGED | rc=0 >>
# SELINUX= can take one of these three values:
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
SELINUXTYPE=targeted 

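2. Check the firewall status

Some hosts have it stopped and some have it running. I first suspected the firewall, but another Hadoop environment runs on these same hosts under a different user name, and a firewall would not affect only the current user.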

# ansible -i hosts tt -m shell -a "systemctl status iptables |grep  Active:"
host_118 | CHANGED | rc=0 >>
   Active: failed (Result: exit-code) since ? 2022-04-26 11:04:51 CST; 1 years 6 months ago
host_121 | CHANGED | rc=0 >>
   Active: active (exited) since ? 2022-04-26 11:05:40 CST; 1 years 6 months ago
host_119 | CHANGED | rc=0 >>
   Active: active (exited) since ? 2022-04-26 10:59:41 CST; 1 years 6 months ago
host_122 | CHANGED | rc=0 >>
   Active: active (exited) since ? 2021-03-18 17:13:00 CST; 2 years 7 months ago
host_120 | CHANGED | rc=0 >>
   Active: failed (Result: exit-code) since ? 2022-04-26 11:05:06 CST; 1 years 6 months ago
host_126 | CHANGED | rc=0 >>
   Active: active (exited) since ? 2021-03-18 15:58:10 CST; 2 years 7 months ago

3. Check the JournalNode log

2023-11-08 11:46:08,640 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting journal Storage Directory /app/log4x/apps/hadoop/jn/log4x-hcluster with nsid: 1531841238
2023-11-08 11:46:08,642 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /app/log4x/apps/hadoop/jn/log4x-hcluster/in_use.lock acquired by nodename 3995@hkcrmlog04
2023-11-08 11:49:37,343 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Updating lastPromisedEpoch from 0 to 1 for client /10.255.33.121
2023-11-08 11:49:37,345 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 43001, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 10.255.33.121:51086 Call#65 Retry#0
java.io.IOException: IPC's epoch 1 is not the current writer epoch  0
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:445)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)

So it really had nothing to do with the firewall. After formatting the NameNode, I created the socket at the socket path configured in hdfs-site.xml.

# ll
total 4
srwxr-xr-x 1 log4x log4x    0 Nov  8 15:05 DN_PORT
drwxr-xr-x 2 log4x log4x 4096 Nov  8 14:47 hadoop
[log4x@hkcrmlog04 etc]$ pwd
/app/log4x/apps/hadoop/etc
[log4x@hkcrmlog04 etc]$ grep -ir 'DN_PORT' hadoop/
hadoop/hdfs-site.xml.bajk:              <value>/app/ailog4x/hadoop/etc/DN_PORT</value>
hadoop/hdfs-site.xml.1107bak:           <value>/app/log4x/apps/hadoop/etc/DN_PORT</value>
hadoop/hdfs-site.xml:           <value>/app/log4x/apps/hadoop/etc/DN_PORT</value>

4. Notes

4.1 nc -Ul DN_PORT  creates the socket. If the command is missing, install it with yum install -y nc

4.2 chmod 666 DN_PORT  sets the socket permissions to 666

4.3 chmod -R 755 hadoop/   sets the whole hadoop directory to 755
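Before restarting the DataNode it can help to confirm that the configured path really is a socket and to see its mode. A minimal sketch, assuming GNU `stat`; `describe_socket_path` is a hypothetical helper name used only for this example.

```shell
# Report whether a path is missing, a socket (with its octal mode),
# or some other kind of file. Hypothetical helper; assumes GNU stat.
describe_socket_path() (
    path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ -S "$path" ]; then
        # -S is true only for socket files, like the "s" in "srwxr-xr-x".
        echo "socket $(stat -c '%a' "$path")"
    else
        echo "not-a-socket"
    fi
)
```

For example, `describe_socket_path /app/log4x/apps/hadoop/etc/DN_PORT` reports the type and mode of the path that hdfs-site.xml points to.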

Error 4: Operation category JOURNAL is not supported in state standby

 Operation category JOURNAL is not supported in state standby...

 Call From hkcrmlog03/10.255.33.120 to hkcrmlog04:40101 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1474)
        at org.apache.hadoop.ipc.Client.call(Client.java:1401)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy15.rollEditLog(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523)
        at org.apache.hadoop.ipc.Client.call(Client.java:1440)

1. Check the state (both NameNodes report standby, so l4xnn1 is then manually switched to active)

# bin/hdfs haadmin -getServiceState l4xnn2
standby
[log4x@hkcrmlog03 hadoop]$ 
[log4x@hkcrmlog03 hadoop]$ bin/hdfs haadmin -getServiceState l4xnn1
standby

# bin/hdfs haadmin -transitionToActive --forcemanual l4xnn1
You have specified the forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

You may abort safely by answering 'n' or hitting ^C now.

Are you sure you want to continue? (Y or N) y
23/11/08 15:46:00 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at /10.255.33.121:40101
23/11/08 15:46:00 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at /10.255.33.120:40101
[log4x@hkcrmlog03 hadoop]$ 
[log4x@hkcrmlog03 hadoop]$ 
[log4x@hkcrmlog03 hadoop]$ 
[log4x@hkcrmlog03 hadoop]$ bin/hdfs haadmin -getServiceState l4xnn1             
active
[log4x@hkcrmlog03 hadoop]$ bin/hdfs haadmin -getServiceState l4xnn2             
standby

The NameNode log is back to normal.
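Both NameNodes reporting standby, as in the first check above, means that neither of them is serving as the active node. Here is a small sketch that turns the outputs of -getServiceState into a one-word verdict. `classify_ha_state` is a hypothetical helper written for this example, not part of the hdfs CLI.

```shell
# Take one HA state string per NameNode ("active" or "standby") and print:
#   no-active        when none is active
#   ok               when exactly one is active
#   multiple-active  when more than one is active
# Hypothetical helper, not part of the hdfs CLI.
classify_ha_state() (
    active=0
    for state in "$@"; do
        if [ "$state" = "active" ]; then
            active=$((active + 1))
        fi
    done
    case "$active" in
        0) echo "no-active" ;;
        1) echo "ok" ;;
        *) echo "multiple-active" ;;
    esac
)
```

With the NameNode IDs used here it could be called as `classify_ha_state "$(bin/hdfs haadmin -getServiceState l4xnn1)" "$(bin/hdfs haadmin -getServiceState l4xnn2)"`.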

Error 5: Error replaying edit log at offset 0.  Expected transaction ID was 1

org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0.  Expected transaction ID was 1

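I re-ran the format on the node. Note that -format wipes the NameNode's existing metadata, so this only suits a cluster whose data can be discarded:

# $HADOOP_PREFIX/bin/hdfs namenode -format

Then I started the NameNode: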

# $HADOOP_PREFIX/sbin/hadoop-daemon.sh --script hdfs start namenode
$ bin/hdfs haadmin -getServiceState l4xnn1
active
[log4x@hkcrmlog03 hadoop]$ 
[log4x@hkcrmlog03 hadoop]$ bin/hdfs haadmin -getServiceState l4xnn2
standby

The NameNode is back to normal.

Error 6: journal Storage Directory

journal Storage Directory /app/log4x/apps/hadoop/jn/log4x-hcluster: NameNode

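This is most likely because the metadata held by the JournalNodes is inconsistent with the NameNode's; 2 of the 3 machines reported this error.

Start the JournalNode on nn1, then run hdfs namenode -initializeSharedEdits so that the JournalNodes are brought in line with the NameNode. After restarting the NameNode the problem was gone.

Then check the state of the two NameNodes.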

Error 7: Decided to synchronize log to startTxId: 1

 Decided to synchronize log to startTxId: 1

The NameNode metadata is corrupted and needs to be repaired.

Fix: run NameNode recovery:

hadoop namenode -recover

 
