MySQL-MHA搭建、故障测试

一、架构说明

MHA（Master High Availability）是一个用于 MySQL 主从复制管理和自动故障转移的开源工具集。MHA 的主要目的是提供 MySQL 环境的高可用性和自动故障转移功能，确保在主库发生故障时能够快速切换到备库，降低业务中断时间。

MHA 主要由以下几个组件组成：

1.Manager：管理节点，负责监控主库的状态，并处理自动故障转移。Manager 会不断检查数据库的主从同步状态，当主库发生故障时，会自动将备库提升为新的主库，并重新配置其他从库的复制关系。
如果主从延迟，则等待所有relay log全部应用后，将VIP漂移到新主库上，整个故障转移过程对应用程序完全透明。

2.Node：节点，即 MySQL 实例。MHA 通过在每个 MySQL 实例上部署 MHA 脚本来对节点进行监控和操作。

3.SSH：MHA 使用 SSH 协议来进行节点之间的通信和操作。确保在使用 MHA 之前，通过 SSH 配置了节点之间的互信。

MHA 的工作流程通常如下：

1.Manager 通过 SSH 登录到节点上监控 MySQL 主库的状态，包括主从复制延迟、节点健康状况等。

2.当发现主库发生故障或不可用时，Manager 会触发自动故障转移流程，将备库晋升为新的主库。

3.Manager 会更新其他从库的复制关系，确保它们能够正确复制新的主库。
在这里插入图片描述

主从复制（一主一从） + 增强半同步 + GTID + MHA（ping_type=INSERT）

架构	操作系统	IP	VIP	主机名	服务版本	端口	磁盘空间	内存	CPU
MHA	redhat7.9	192.168.111.34	192.168.111.36	mha-master01	mysql-8.0.35、manager-0.58、node-0.58	3307	20G	4G	8C
–	–	192.168.111.35	192.168.111.36	mha-slave01	mysql-8.0.35、manager-0.58、node-0.58	3307	20G	4G	8C

注：该方案仅支持一主一从
缺点：
1.manager管理节点存在单点故障
2. io_threads异常或者延迟时，有丢失数据的风险（主要由网络故障引发，所以主从节点必须在同一个网段)

二、环境准备

2.1 环境准备、搭建主从在此链接跳转

删除依赖冲突
yum remove mariadb*

注意：搭建MHA需要的依赖包如下（两个节点都安装）

perl依赖

yum -y install perl-Module-Install perl-Module-Build

MySQL驱动

yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yum install perl-Class-Load perl-MIME-Lite perl-Mail-Send perl-Mail-Sender perl-Mail-Sendmail perl-Params-Validate perl-Sys-Syslog perl-Email-Date-Format perl-MIME-Types

以上依赖包如果yum源没有则手动下载安装，依赖包可在以下网址搜索
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-Config-Tiny-2.14-7.el7.noarch
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-MIME-Lite-3.030-1.el7.noarch
perl-Mail-Sender-0.8.23-1.el7.noarch
perl-Mail-Sendmail-0.79-21.el7.noarch
perl-Email-Date-Format-1.002-15.el7.noarch
perl-MIME-Types-1.38-2.el7.noarch

若出现libmysqlclient.so.18报错问题以下是解决方式。

https://rpm.pbone.net/
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/x86_64/Packages/p/perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/ppc64/Packages/p/perl-MIME-Lite-3.030-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2019-05-29/ppc64le/Packages/p/perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/ppc64le/Packages/p/perl-Email-Date-Format-1.002-15.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-04-20/x86_64/Packages/p/perl-MIME-Types-1.38-2.el7.noarch.rpm

安装mysql驱动
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-common-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-5.6/MySQL-shared-5.6.51-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm

rpm -ivh mysql-community-common-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
rpm -ivh MySQL-shared-5.6.51-1.el7.x86_64.rpm <---> 与mysql-community-libs 互补

2.2 数据库用户权限管理

2.2.1 权限管理

在主从环境按照以下建议赋予权限，禁止赋予all privileges权限
普通用户DDL权限控制

管理员账号，账号名统一为admin_root,由项目负责人管理，用于数据初始化等临时操作

create user 'admin_root'@'%' identified by 'myttrepl@2222#TO'; 
grant alter,Alter routine,Create,Create routine,Create view,Delete,Drop,Select,Insert,Show view,Update on *.* to 'admin_root'@'%';
grant Event,Execute,Index,Lock tables,Process,References,Reload,Show databases,Trigger on *.* to 'admin_root'@'%';

应用账号权限
#无特殊需求是，默认赋予以下权限

create user 'app_du'@'%' identified by 'myttrepl@2222#TO'; 
grant select,update,delete,insert,Show view,Trigger,Execute,Alter routine on db.* to 'app_du'@'%';
grant event on *.* to 'app_du'@'%';

如果主库宕机后，能够快速上线，要避免业务写从库，这样主库可以以slave身份重新加入，并保持数据同步。
处理方式：不给业务分配超级权限账号，就可以避免从库写入（因为从库始终处于read_only模式）

2.3 配置互信

方法1：使用ssh-copy命令配置（分别在2个节点执行）

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.34
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.35

方法2：手工配置
执行ssh-keygen -t rsa生成一对密钥
然后将公钥（id_rsa.pub）分别写入34、35两台机器的/root/.ssh/authorized_keys中

测试是否成功

ssh 192.168.111.34 date  && ssh 192.168.111.35 date

三、安装MHA

3.1 创建目录

mkdir -p /data/mha/{scripts,conf,manager}

3.2 配置在线切换、故障切换检测脚本

vi /data/mha/scripts/master_ip_failover_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
 
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
 
my $vip = '192.168.111.36/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";
 
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
 
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
 
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
 
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

vi /data/mha/scripts/master_ip_online_change_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";

#!/usr/bin/env perl
use strict;  
use warnings FATAL =>'all';  
use Getopt::Long;   
my $vip = '192.168.111.36/24';  # Virtual IP  
my $key = "1";  
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";  
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";  
my $exit_code = 0;  
my (  
  $command,              $orig_master_is_new_slave, $orig_master_host,  
  $orig_master_ip,       $orig_master_port,         $orig_master_user,  
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  
  $new_master_ip,        $new_master_port,          $new_master_user,  
  $new_master_password,  $new_master_ssh_user,  
);  
GetOptions(  
  'command=s'                => \$command,  
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  
  'orig_master_host=s'       => \$orig_master_host,  
  'orig_master_ip=s'         => \$orig_master_ip,  
  'orig_master_port=i'       => \$orig_master_port,  
  'orig_master_user=s'       => \$orig_master_user,  
  'orig_master_password=s'   => \$orig_master_password,  
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  
  'new_master_host=s'        => \$new_master_host,  
  'new_master_ip=s'          => \$new_master_ip,  
  'new_master_port=i'        => \$new_master_port,  
  'new_master_user=s'        => \$new_master_user,  
  'new_master_password=s'    => \$new_master_password,  
  'new_master_ssh_user=s'    => \$new_master_ssh_user,  
);  
exit &main();  
sub main {  
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  
  
if ( $command eq "stop" || $command eq "stopssh" ) {  
  
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  
        # If you manage master ip address at global catalog database,  
        # invalidate orig_master_ip here.  
        my $exit_code = 1;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";  
            print "***************************************************************\n\n\n\n";  
&stop_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn "Got Error: $@\n";  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "start" ) {  
  
        # all arguments are passed.  
        # If you manage master ip address at global catalog database,  
        # activate new_master_ip here.  
        # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;  
        eval {  
            print "\n\n\n***************************************************************\n";  
            print "Enabling the VIP - $vip on new master: $new_master_host \n";  
            print "***************************************************************\n\n\n\n";  
&start_vip();  
            $exit_code = 0;  
        };  
        if ($@) {  
            warn $@;  
            exit $exit_code;  
        }  
        exit $exit_code;  
}  
elsif ( $command eq "status" ) {  
        print "Checking the Status of the script.. OK \n";  
        `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;  
        exit 0;  
}  
else {  
&usage();  
        exit 1;  
}  
}  
# A simple system call that enable the VIP on the new master  
sub start_vip() {  
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;  
}  
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {  
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;  
}  
sub usage {  
print  
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";  
}

chmod +x /data/mha/scripts/master_ip_failover_vip
chmod +x /data/mha/scripts/master_ip_online_change_vip

3.3 安装MHA软件

mha4mysql-node-0.58.tar.gz —> 两节点都安装
mha4mysql-manager-0.58.tar.gz —> 两节点都安装

cd /data/mha
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
tar -zxvf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-node-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/

tar -zxvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-manager-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/

3.4 在manager节点配置masterha.cnf、app.cnf（两个节点都配置）

创建mha管理账号(仅在节点1创建)

create user 'i_mha'@'192.168.%.%' identified by "myttrepl@2222#TO";
GRANT ALL PRIVILEGES ON *.* TO 'i_mha'@'192.168.%.%' with grant option;

mkdir -p /data/mha/masterha/
vi /data/mha/conf/masterha.cnf

[server default]
user=i_mha		#设置监控用户mha
password=myttrepl@2222#TO		#设置mysql中mha用户的密码，这个密码是前文中创建监控用户的那个密码

repl_user=repl		#设置复制用户的用户repl
repl_password=myttrepl@2222#TO		#设置复制用户repl的密码
ssh_user=root		#设置ssh的登录用户名
ping_interval=10		#设置监控主库，发送ping包的时间间隔10秒，默认是3秒，尝试三次没有回应的时候自动进行failover
ping_type=CONNECT		
remote_workdir=/data/mha/masterha/		#manager工作目录

vi /data/mha/conf/app.cnf

manager_workdir=/data/mha/manager/		#manager工作目录
manager_log=/data/mha/manager/manager.log	#manager日志
#workdir on the node for mysql server
master_binlog_dir=/data/mysql8.0.35/3307/binlog/		#master保存binlog的位置，这里的路径要与master里配置的binlog的路径一致，以便MHA能找到
#自动故障切换master脚本
master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip
#手动切换时运行的脚本
master_ip_online_change_script= /data/mha/scripts/master_ip_online_change_vip
#指定检查的从服务器IP地址
secondary_check_script = /usr/local/bin/masterha_secondary_check -s 192.168.111.34 -s 192.168.111.35	

[server1]
hostname=192.168.111.34
ssh_port=22
port=3307
#设置为候选master，设置该参数以后，发生主从切换以后将会将此从库提升为主库，即使这个主库不是集群中最新的slave
candidate_master=1
#默认情况下如果一个slave落后master 超过100M的relay logs的话，MHA将不会选择该slave作为一个新的master， 因为对于这个slave的恢复需要花费很长时间；通过设置check_repl_delay=0，MHA触发切换在选择一个新的master的时候将会忽略复制延时，这个参数对于设置了candidate_master=1的主机非常有用，因为这个候选主在切换的过程中一定是新的master
check_repl_delay=0 #忽略relay logs日志的复制延迟

[server2]
hostname=192.168.111.35
ssh_port=22
port=3307
candidate_master=1
check_repl_delay=0

3.5 检测互信

[root@mhaserver01 conf]# masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Thu Feb 22 16:34:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Feb 22 16:34:13 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Starting SSH connection tests..
Thu Feb 22 16:34:13 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.34(192.168.111.34:22) to root@192.168.111.35(192.168.111.35:22)..
Thu Feb 22 16:34:13 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.35(192.168.111.35:22) to root@192.168.111.34(192.168.111.34:22)..
Thu Feb 22 16:34:14 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [info] All SSH connection tests passed successfully.

3.6 检测主从状态

[root@mhaserver01 conf]# masterha_check_repl --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Fri Feb 23 10:50:09 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 23 10:50:09 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 23 10:50:10 2024 - [info] GTID failover mode = 1
Fri Feb 23 10:50:10 2024 - [info] Dead Servers:
Fri Feb 23 10:50:10 2024 - [info] Alive Servers:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)
Fri Feb 23 10:50:10 2024 - [info] Alive Slaves:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)  Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Fri Feb 23 10:50:10 2024 - [info]     GTID ON
Fri Feb 23 10:50:10 2024 - [info]     Replicating from 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Feb 23 10:50:10 2024 - [info] Current Alive Master: 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] Checking slave configurations..
Fri Feb 23 10:50:10 2024 - [info]  read_only=1 is not set on slave 192.168.111.35(192.168.111.35:3307).
Fri Feb 23 10:50:10 2024 - [info] Checking replication filtering settings..
Fri Feb 23 10:50:10 2024 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 23 10:50:10 2024 - [info]  Replication filtering check ok.
Fri Feb 23 10:50:10 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 23 10:50:10 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 23 10:50:11 2024 - [info] HealthCheck: SSH to 192.168.111.34 is reachable.
Fri Feb 23 10:50:11 2024 - [info]
192.168.111.34(192.168.111.34:3307) (current master)
 +--192.168.111.35(192.168.111.35:3307)

Fri Feb 23 10:50:11 2024 - [info] Checking replication health on 192.168.111.35..
Fri Feb 23 10:50:11 2024 - [info]  ok.
Fri Feb 23 10:50:11 2024 - [info] Checking master_ip_failover_script status:
Fri Feb 23 10:50:11 2024 - [info]   /data/mha/scripts/master_ip_failover_vip --command=status --ssh_user=root --orig_master_host=192.168.111.34 --orig_master_ip=192.168.111.34 --orig_master_port=3307

IN SCRIPT TEST====/usr/sbin/ifconfig ens32:1 down==/usr/sbin/ifconfig ens32:1 192.168.111.36/24===

Checking the Status of the script.. OK
Fri Feb 23 10:50:11 2024 - [info]  OK.
Fri Feb 23 10:50:11 2024 - [warning] shutdown_script is not defined.
Fri Feb 23 10:50:11 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

3.7 开启MHA-manager监控进程（mhamaster02启用manager管理）

注意：第一次启动，主库上的VIP 不会自动绑定，需要手动去绑定，主库发生故障切换会进行vip的漂移

#起服务
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
#停服务
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

3.8 检测MHA状态

masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:28984) is running(0:PING_OK), master:192.168.111.34

##查看MHA日志，看到当前matser是192.168.111.34
cat /data/mha/manager/manager.log |grep "current master"
Mon Feb 26 16:43:12 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.111.34(192.168.111.34:3307) (current master)

/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.34 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 343307 |
+---------------+--------+
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.35 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 353307 |
+---------------+--------+

3.9 MHA-VIP 配置

mha-master01节点手动执行vip命令

#挂载vip
/sbin/ifconfig ens32:1 192.168.111.36/24

#卸载vip
/usr/sbin/ip addr del 192.168.111.36/24 dev ens32:1

3.10 编写脚本

3.10.1 登录数据库脚本

vi /data/mha/scripts/mha_login_mysql
#!/bin/bash
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p'y-yxoqupe33Z'

ln -s /data/mha/scripts/mha_login_mysql /usr/local/bin/mha_login_mysql

3.10.2 在线切换主从脚本

vi /data/mha/scripts/mha_switchover
#!/bin/bash
masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60

ln -s /data/mha/scripts/mha_switchover /usr/local/bin/mha_switchover

chmod +x /usr/local/bin/mha_login_mysql
chmod +x /usr/local/bin/mha_switchover

四、故障模拟测试

4.1 在线手工切换（维护切换，需要把MHA监控进程关掉）：

masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60

–orig_master_is_new_slave：把旧的master配置为从库
–running_updates_limit=60：如果主从库同步延迟在60s内都允许切换，但是但是切换的时间长短是由recover时relay 日志的大小决定

切换成功需要看到类似下面的提示：

[info] Switching master to 192.168.111.35(192.168.111.35:3307) completed successfully.

同时要查看VIP是否已经漂移到了新的主库上面，新的主库是否可读写，最后在新的主库上起manager服务。

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &

masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:4996) is running(0:PING_OK), master:192.168.111.35

附：MHA在线切换的原理

1. 检查当前的配置信息及主从服务器的信息
包括读取MHA的配置文件/data/mha/conf/app.cnf及检查当前slave的健康状态

2. 阻止对当前master的更新
主要通过如下步骤：
   2.1.等待1.5s（$time_until_kill_threads*100ms），等待当前连接断开。
   2.2 执行 read_only=1，阻止新的DML操作
   2.3 等待0.5s，等待当前DML操作完成。
   2.4 kill掉所有连接。
   2.5 FLUSH NO_WRITE_TO_BINLOG TABLES
   2.6 FLUSH TABLES WITH READ LOCK
   
3. 等待新master执行完所有的relay log
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

4. 将新master的read_only设置为off，并添加VIP

5. slave切换到新master上。
   5.1 等待slave（192.168.244.30）应用完原主从复制产生的relay log，然后执行change master操作切换到新master上。
   5.2 释放原master上加的锁。
   5.3 因masterha_master_switch命令行中带有--orig_master_is_new_slave参数，故原master也切换为新master的从。

6. 清理新master的相关信息。
主要是执行了reset slave all操作，清除之前的复制信息。

4.2 故障手工切换（MHA进程没启动或者挂了的同时主库也挂了）：

4.2.1 模拟场景

#主库35停manager监控服务
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

#主库35停数据库服务
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqladmin --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p shutdown

4.2.2 故障手工切换命令

masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --dead_master_host=192.168.111.35 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.111.34  --new_master_port=3307 --ignore_last_failover

切换成功需要看到类似如下提示：

Started manual(interactive) failover.
Invalidated master IP address on 192.168.111.35(192.168.111.35:3307)
Selected 192.168.111.34(192.168.111.34:3307) as a new master.
192.168.111.34(192.168.111.34:3307): OK: Applying all logs succeeded.
192.168.111.34(192.168.111.34:3307): OK: Activated master IP address.
192.168.111.34(192.168.111.34:3307): Resetting slave info succeeded.
Master failover to 192.168.111.34(192.168.111.34:3307) completed successfully.

表示成功切换，切换成功后，查看VIP是否漂移到了从库上(切换成功后，MHA进程会自动停止)，同时查看数据库是否可读写。

故障主库起来后，需要确认数据是否跟新的主库一样，如果一样，那么就把故障主库作为新的从库加入新主库下。然后在主库上启动MHA进程。

4.2.3 将故障主库35数据库服务拉起，然后加入集群成为从库。

nohup /data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqld_safe --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf &

#数据库服务拉起来后，配置从库
change master to
master_host='192.168.111.34',
master_port=3307,
master_user='repl',
master_password='myttrepl@2222#TO',
master_auto_position=1;

start slave;	//启动slave进程

 --查看slave状态
确认IO线程、SQL线程都以运行
mysql> show slave status\G;	//查看当前的从库状态
    Slave_IO_Running: Yes			//IO线程已运行
     Slave_SQL_Running: Yes	//SQL线程已运行
    mysql> show slave status\G;	//查看当前的从库状态

4.2.4 主库启动manager服务并查看MHA状态

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
	
masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:14837) is running(0:PING_OK), master:192.168.111.34

五、MHA 日常维护命令集：

1 查看ssh 登陆是否成功

masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

2 查看主从同步情况

masterha_check_repl  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

3 检查启动的状态

masterha_check_status  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

4 停止mha

masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

5 启动mha

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &

注意：当有slave 节点宕掉的情况是启动不了的，加上–ignore_fail_on_start 即使有节点宕掉也能启动mha，需要在配置文件中设置ignore_fail=1

6 failover 后下次重启

每次failover 切换后会在管理目录生成文件app.failover.complete ，下次在切换的时候会发现有这个文件导致切换不成功，需要手动清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上参数--ignore_last_failover

7 手工failover

手工failover 场景，master 死掉，但是masterha_manager 没有开启，可以通过手工failover：
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --dead_master_host=old_ip --dead_port=port --master_state=dead --new_master_host=new_ip --new_master_port=port --ignore_last_failover

8 masterha_manager 是一种监视和故障转移的程序。另一方面,masterha_master_switch 程序不监控主库。masterha_master_switch 可以用于主库故障转移,也可用于在线总开关。

9 手动在线切换(比如做维护切换时)

masterha_master_switch ---global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 --orig_master_is_new_slave
或者
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 -orig_master_is_new_slave --running_updates_limit=10000

--orig_master_is_new_slave 切换时加上此参数是将原master 变为slave 节点，如果不加此参数，原来的master 将不启动
--running_updates_limit=10000 切换时候选master 如果有延迟的话，mha 切换不能成功，加上此参数表示延迟在此时间范围内都可切换（单位为s），但是切换的时间长短是由recover时relay 日志的大小决定

手动在线切换mha，切换时需要将在运行的mha 停掉后才能切换。
在备库先执行DDL，一般先stop slave，一般不记录mysql 日志，可以通过set SQL_LOG_BIN =0 实现。然后进行一次主备切换操作，再在原来的主库上执行			DDL。这种方法适用于增减索引，如果是增加字段就需要额外注意。