Contents
1. Problem Description
2. Solution
1. Problem Description
Starting the Metrics Collector component of Ambari Metrics fails with the following error:
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/metrics_collector.py", line 90, in <module>
AmsCollector().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 355, in execute
self.execute_prefix_function(self.command_name, 'post', env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 382, in execute_prefix_function
method(env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 424, in post_start
raise Fail("Pid file {0} doesn't exist after starting of the component.".format(pid_file))
resource_management.core.exceptions.Fail: Pid file /var/run/ambari-metrics-collector//hbase-ams-master.pid doesn't exist after starting of the component.
stdout:
2023-05-16 13:47:05,744 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=3.1.4.0-315 -> 3.1.4.0-315
2023-05-16 13:47:05,761 - Using hadoop conf dir: /usr/hdp/3.1.4.0-315/hadoop/conf
2023-05-16 13:47:05,915 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=3.1.4.0-315 -> 3.1.4.0-315
2023-05-16 13:47:05,919 - Using hadoop conf dir: /usr/hdp/3.1.4.0-315/hadoop/conf
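The log above shows that the missing PID file is hbase-ams-master.pid, which points to a problem with the HBase instance embedded in AMS. The fix I settled on was therefore to delete its data and let the service rebuild the database automatically.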
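To make that reading of the log concrete, here is a minimal Python sketch that pulls the PID file path out of the `Fail` message and checks whether it names a live process. The helper names `extract_pid_file` and `pid_is_running` are hypothetical and are not part of Ambari; the error line is copied from the stderr output above.

```python
import os
import posixpath
import re

# Error line copied from the stderr output above.
ERROR_LINE = (
    "resource_management.core.exceptions.Fail: Pid file "
    "/var/run/ambari-metrics-collector//hbase-ams-master.pid "
    "doesn't exist after starting of the component."
)


def extract_pid_file(message):
    """Return the normalized PID file path named in the Fail message, or None."""
    match = re.search(r"Pid file (\S+) doesn't exist", message)
    if match is None:
        return None
    # normpath collapses the doubled slash that appears in the log.
    return posixpath.normpath(match.group(1))


def pid_is_running(pid_file):
    """Return True only if pid_file exists and names a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without killing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    pid_file = extract_pid_file(ERROR_LINE)
    print(pid_file, pid_is_running(pid_file))
```

A file name beginning with `hbase-` is the hint that the embedded HBase master, not the collector wrapper itself, failed to come up.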
2. Solution
Put AMS into maintenance mode:
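Maintenance mode is normally switched on from the Ambari UI. As a rough alternative, the sketch below builds the equivalent Ambari REST call without sending it. The server URL, cluster name, and credentials are placeholders, and the function name `build_maintenance_request` is hypothetical; only the service name AMBARI_METRICS comes from the traceback above.

```python
import base64
import json
import urllib.request


def build_maintenance_request(base_url, cluster, user, password, state="ON"):
    """Build (but do not send) the PUT request that sets AMS maintenance mode."""
    url = "{0}/api/v1/clusters/{1}/services/AMBARI_METRICS".format(
        base_url.rstrip("/"), cluster
    )
    payload = {
        # The context string is free-form text shown in Ambari's operation log.
        "RequestInfo": {"context": "Turn {0} Maintenance Mode for AMS".format(state.title())},
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }
    token = base64.b64encode(
        "{0}:{1}".format(user, password).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "Basic " + token,
            "X-Requested-By": "ambari",  # Ambari expects this header on modifying calls
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values: replace with the real server, cluster, and account.
    request = build_maintenance_request(
        "http://ambari.example.com:8080", "mycluster", "admin", "admin"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Passing `state="OFF"` builds the request that takes the service back out of maintenance mode once the cleanup is done.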
In the Ambari UI, open the AMS Configs page and look up the data directory settings, such as 'hbase.rootdir'.
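The same values can also be read from a Hadoop-style XML configuration file, if a rendered copy of the AMS HBase configuration exists on the collector host. The sketch below assumes such a file is available; its location is not given in this post and varies by installation. The sample XML is illustrative only: the 'hbase.rootdir' value is made up, and `read_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real values come from your own AMS configuration.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///var/lib/ambari-metrics-collector/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/var/lib/ambari-metrics-collector/hbase-tmp</value>
  </property>
</configuration>"""


def read_properties(xml_text, wanted):
    """Return {name: value} for the requested property names found in xml_text."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in wanted:
            found[name] = (prop.findtext("value") or "").strip()
    return found


if __name__ == "__main__":
    settings = read_properties(SAMPLE_SITE_XML, {"hbase.rootdir", "hbase.tmp.dir"})
    for name, value in sorted(settings.items()):
        print(name, "=", value)
```

Either way, write the values down before deleting anything, since they determine which directories the following steps touch.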
On the host that runs the Metrics Collector, delete all files under the following directory:
/var/lib/ambari-metrics-collector/hbase-tmp/
## By default the path starts with /var/lib; a customized location looks like:
/app/var/lib/ambari-metrics-collector
AMS data is stored under the path configured as 'hbase.rootdir'. Back it up first, then delete the corresponding AMS data in that directory.
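The back-up-then-delete steps above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure from the referenced documentation. It assumes the directories are on the local filesystem (it does not apply if 'hbase.rootdir' points to HDFS) and that the collector is not writing to them while it runs. The function name `backup_and_clear` and the backup location are hypothetical.

```python
import os
import shutil
import time


def backup_and_clear(directory, backup_root):
    """Archive `directory` into backup_root, then delete its contents.

    The directory itself is kept so the service can repopulate it on restart.
    Returns the path of the created .tar.gz archive.
    """
    directory = os.path.abspath(directory)
    backup_root = os.path.abspath(backup_root)
    if not os.path.isdir(directory):
        raise ValueError("not a directory: {0}".format(directory))
    # Refuse to place the backup inside the directory that is about to be emptied.
    if os.path.commonpath([directory, backup_root]) == directory:
        raise ValueError("backup_root must be outside {0}".format(directory))

    os.makedirs(backup_root, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = os.path.join(
        backup_root, "{0}-{1}".format(os.path.basename(directory), stamp)
    )
    archive = shutil.make_archive(base_name, "gztar", root_dir=directory)

    # Remove the contents only after the archive has been written.
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return archive
```

For example, `backup_and_clear("/var/lib/ambari-metrics-collector/hbase-tmp", "/root/ams-backup")` would archive and empty the temporary directory named above; `/root/ams-backup` is only a placeholder for a location with enough free space.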
Restart AMS from Ambari. After the restart, the component runs normally.
References:
[Ambari] ambari metrics startup error: hbase-xxx-master.pid doesn't exist (时间的美景's blog, CSDN)
Cleaning up Ambari Metrics System Data (Apache Ambari, Apache Software Foundation)