ambari-hdp启动yarn报错Corruption: checksum mismatch
页面报错
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 102, in <module>
Nodemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/nodemanager.py", line 53, in start
service('nodemanager',action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 110, in service
Execute(daemon_cmd, not_if=check_process, environment=hadoop_env_exports)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh su yarn -l -s /bin/bash -c 'ulimit -c unlimited; /usr/hdp/3.1.5.0-152/hadoop-yarn/bin/yarn --config /usr/hdp/3.1.5.0-152/hadoop/conf --daemon start nodemanager'' returned 1.
后端日志报错
r:/usr/hdp/3.1.5.0-152/tez/lib/tez.tar.gz
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r e235cd6e7bfe0fe31ee6448b612862c53b45dda9; compiled by 'jenkins' on 2019-12-12T19:46Z
STARTUP_MSG: java = 1.8.0_322
************************************************************/
2024-09-02 15:15:03,986 INFO nodemanager.NodeManager (LogAdapter.java:info(51)) - registered UNIX signal handlers for [TERM, HUP, INT]
2024-09-02 15:15:04,722 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:openDatabase(1540)) - Using state database at /udata/var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2024-09-02 15:15:04,765 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1543)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1531)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:353)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:285)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:358)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2024-09-02 15:15:04,769 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:285)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:358)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
**解决:**从日志中得知是某个文件校验不通过,造成nodemanager无法启动,页面上停了yarn。尝试删除报错的机器上的文件(hadoop-yarn路径地址根据实际配置)如:/var/log/hadoop-yarn/ 重启yarn然后nodemanager正常启动,问题得以解决。