Causes of MySQL master-slave replication failures in production environments
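Experiment 1:
Goal: fix a master-slave desync (in this example the SQL thread has stopped)
Method: simulate the failure
1. Create a database named yingying on the SLAVE.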
mysql> create database yingying;
2. Create a database named yingying on the MASTER.
mysql> create database yingying;
3. Check the status on the SLAVE.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.254.253
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000004
Read_Master_Log_Pos: 384
Relay_Log_File: localhost-relay-bin.000010
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql-bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1007 // MySQL error code
Last_Error: Error 'Can't create database 'yingying'; database exists' on query. Default database:
'yingying'. Query: 'create database yingying'
Skip_Counter: 0
Exec_Master_Log_Pos: 293
Relay_Log_Space: 504
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1007
Last_SQL_Error: Error 'Can't create database 'yingying'; database exists' on query. Default database:
'yingying'. Query: 'create database yingying'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
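The fields that matter in the output above are Slave_IO_Running, Slave_SQL_Running, Last_SQL_Errno and Last_SQL_Error. As a minimal sketch, assuming the \G output has been captured as text (for example with mysql -e), the check could be scripted as follows. The function names parse_slave_status and diagnose are hypothetical helpers for illustration, not part of any MySQL tool.

```python
import re

# A field line looks like "Slave_SQL_Running: No"; wrapped error text does not.
KEY_VALUE = re.compile(r"^(\w+):\s*(.*)$")
ROW_FOOTER = re.compile(r"^\d+ rows? in set")


def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "*** 1. row ***" banner and the row-count footer.
        if not line or line.startswith("*") or ROW_FOOTER.match(line):
            continue
        match = KEY_VALUE.match(line)
        if match:
            last_key = match.group(1)
            status[last_key] = match.group(2)
        elif last_key is not None:
            # A long Last_Error value can wrap onto the next line: append it.
            status[last_key] = (status[last_key] + " " + line).strip()
    return status


def diagnose(status):
    """Summarize which replication thread, if any, has stopped."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if io_ok and sql_ok:
        return "replication running"
    if io_ok:
        return "SQL thread stopped, error %s: %s" % (
            status.get("Last_SQL_Errno", "?"),
            status.get("Last_SQL_Error", ""),
        )
    return "IO thread not running: %s" % status.get("Last_IO_Error", "")
```

Fed the status shown above, diagnose reports the stopped SQL thread together with error 1007, which is the case this experiment reproduces.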
4. Fixes:
------------------ without GTID -----------
1. If the data in that database is not needed, drop the database on the slave
mysql> drop database yingying;
Query OK, 0 rows affected (0.31 sec)
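2. Skip the failing event
mysql> stop slave;
mysql> set global sql_slave_skip_counter=1; // skip the next 1 event from the master; the value can be larger, but the more events are skipped, the more data is lost. The skip takes effect when the slave is started again.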
------GTID---------
Replication error:
Last_Errno: 1007
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'df8f817c-f215-11e8-83e4-525400950067:5' at master log mysql-bin.000004, end_log_pos 359. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
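The slave reports error 1007 because the database name already exists on the slave and the master then created a database with the same name.
Analysis and fix:
Inspect the contents of binlog file mysql-bin.000004 on the master. It shows that when replication was first set up, the master and the slave had the same server_id, which made replication fail, so the master's server_id was changed. That change is recorded in mysql-bin.000004 (the event headers show server id 2333306 at 15:51:00 and 1313306 at 16:09:58), and the GTID of the committed transaction identifies the event that failed.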
[root@VM_82_178_centos binlog]# /usr/local/mysql7/bin/mysqlbinlog -vv --base64-output=decode-rows mysql-bin.000004|grep -B 25 'end_log_pos 359'
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
#at 4
#181127 15:51:00 server id 2333306 end_log_pos 123 CRC32 0x4acc26d3 Start: binlog v 4, server v 5.7.24-log created 181127 15:51:00
#Warning: this binlog is either in use or was not closed properly.
#at 123
#181127 15:51:00 server id 2333306 end_log_pos 194 CRC32 0x37fe34ae Previous-GTIDs
#df8f817c-f215-11e8-83e4-525400950067:1-4
#at 194
#181127 16:09:58 server id 1313306 end_log_pos 259 CRC32 0x6d246cbf GTID last_committed=0 sequence_number=1 rbr_only=no
SET @@SESSION.GTID_NEXT= 'df8f817c-f215-11e8-83e4-525400950067:5'/*!*/;
#at 259
#181127 16:09:58 server id 1313306 end_log_pos 359 CRC32 0x37df96be Query thread_id=6 exec_time=0 error_code=0
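The GTID to skip appears twice above: in the Last_Error message and in the SET @@SESSION.GTID_NEXT line of the mysqlbinlog dump. A minimal sketch of pulling it out of either text is shown here. The helper names failed_transaction and gtids_in_binlog_dump are hypothetical, and the regular expressions assume the message wording shown in this example (MySQL 5.7.24); other versions may phrase the error differently.

```python
import re

# Matches: transaction '<uuid>:<n>' at master log <file>, end_log_pos <pos>
GTID_IN_ERROR = re.compile(
    r"transaction '([0-9a-fA-F-]{36}:\d+)' at master log ([^\s,]+), "
    r"end_log_pos (\d+)"
)
# Matches: SET @@SESSION.GTID_NEXT= '<uuid>:<n>'
GTID_NEXT_IN_BINLOG = re.compile(r"GTID_NEXT\s*=\s*'([0-9a-fA-F-]{36}:\d+)'")


def failed_transaction(last_error):
    """Return the GTID, binlog file and end position named in Last_Error."""
    match = GTID_IN_ERROR.search(last_error)
    if match is None:
        return None
    return {
        "gtid": match.group(1),
        "log_file": match.group(2),
        "end_log_pos": int(match.group(3)),
    }


def gtids_in_binlog_dump(dump_text):
    """Return every GTID assigned via GTID_NEXT in mysqlbinlog text output."""
    return GTID_NEXT_IN_BINLOG.findall(dump_text)
```

Comparing the two results is a cheap sanity check before skipping anything: the GTID from the error message should also appear in the dump of the binlog file the message names.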
Skip this transaction on the slave:
root@localhost [(none)]>show variables like "%gtid%";
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| binlog_gtid_simple_recovery      | ON        |
| enforce_gtid_consistency         | ON        |
| gtid_executed_compression_period | 1000      |
| gtid_mode                        | ON        |
| gtid_next                        | AUTOMATIC |
| gtid_owned                       |           |
| gtid_purged                      |           |
| session_track_gtids              | OFF       |
+----------------------------------+-----------+
8 rows in set (0.00 sec)
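Inject an empty transaction (sql_slave_skip_counter cannot be used when gtid_mode is ON):
root@localhost [(none)]>stop slave sql_thread; ### this step can be skipped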
Query OK, 0 rows affected (0.02 sec)
root@localhost [(none)]>set gtid_next='df8f817c-f215-11e8-83e4-525400950067:5';
Query OK, 0 rows affected (0.00 sec)
root@localhost [(none)]>begin;
Query OK, 0 rows affected (0.00 sec)
root@localhost [(none)]>commit;
Query OK, 0 rows affected (0.00 sec)
root@localhost [(none)]>show master status\G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 356
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: df8f817c-f215-11e8-83e4-525400950067:5
1 row in set (0.01 sec)
root@localhost [(none)]>set gtid_next='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)
root@localhost [(none)]>start slave sql_thread;
Query OK, 0 rows affected (0.02 sec)
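The statements just run always follow the same order: set gtid_next to the failing GTID, commit an empty transaction, restore AUTOMATIC, restart the SQL thread. As a sketch only, the sequence can be generated from a GTID so that it is reviewed before being pasted into the mysql client on the slave. The function name empty_transaction_statements is a hypothetical helper, and it deliberately accepts only a single uuid:number GTID, not a range.

```python
import re

# One GTID: a 36-character server UUID, a colon, and a transaction number.
SINGLE_GTID = re.compile(
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:\d+"
)


def empty_transaction_statements(gtid):
    """Build the statements that skip one GTID by committing it as empty."""
    if SINGLE_GTID.fullmatch(gtid) is None:
        raise ValueError("expected a single GTID like uuid:5, got %r" % gtid)
    return [
        "STOP SLAVE SQL_THREAD;",         # harmless if already stopped
        "SET GTID_NEXT='%s';" % gtid,     # claim the failing GTID
        "BEGIN;",
        "COMMIT;",                        # empty transaction marks it executed
        "SET GTID_NEXT='AUTOMATIC';",     # return to normal GTID assignment
        "START SLAVE SQL_THREAD;",
    ]


if __name__ == "__main__":
    for stmt in empty_transaction_statements(
        "df8f817c-f215-11e8-83e4-525400950067:5"
    ):
        print(stmt)
```

Rejecting ranges is a design choice for safety: each skipped GTID is a transaction whose changes never reach the slave, so skipping should stay explicit and one at a time.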
At this point the replication error is resolved:
[root@localhost data]# mysql -e "show slave status\G" |grep -i "yes"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
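mysql> start slave; // required after the non-GTID skip, where stop slave was issued
Note: for ordinary internet services, skipping the error is usually acceptable, provided you have confirmed that it does not affect the business. In production, getting replication running again usually matters more at that moment than the data difference between master and slave; if data consistency also matters, schedule a time afterwards to rebuild this slave. Whether data consistency or continuous replication is more important depends on the business.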
5. Verify:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.254.253
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000004
Read_Master_Log_Pos: 384
Relay_Log_File: localhost-relay-bin.000011
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql-bin.000004
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 384
Relay_Log_Space: 650
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
----------------------------------------------------------------------------------------------
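Experiment 2:
Goal: fix a master-slave desync by ignoring specific error numbers
Method: 1. In the [mysqld] section of my.cnf, add the skip-errors setting
slave-skip-errors = 1032,1062,1007   # error numbers can be looked up in the MySQL error code list
2. Restart the MySQL server
3. Create a database on the SLAVE // to verify
4. Create a database with the same name on the MASTER // to verify
5. Check the SLAVE status: show slave status;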
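Note: a version mismatch can also break replication. Replicating from a lower-version master to a higher-version slave generally works, but replicating from a higher version to a lower version does not.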
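Before relying on slave-skip-errors, it helps to check which errors a given value would actually swallow. The following is a small sketch under stated assumptions: it handles only a comma-separated list of error numbers plus the values all and off, and the helper names parse_skip_errors and would_skip are hypothetical. It does not read my.cnf or talk to the server.

```python
def parse_skip_errors(value):
    """Turn a slave-skip-errors value into a set of ints, or the string 'all'."""
    value = value.strip().lower()
    if value in ("", "off"):
        return set()
    if value == "all":
        return "all"
    # int() tolerates surrounding whitespace, so "1032, 1062" also parses.
    return {int(code) for code in value.split(",") if code.strip()}


def would_skip(value, errno):
    """Report whether replication error `errno` would be ignored."""
    codes = parse_skip_errors(value)
    return codes == "all" or errno in codes


if __name__ == "__main__":
    setting = "1032,1062,1007"
    for errno in (1007, 1062, 1146):
        print(errno, would_skip(setting, errno))
```

With the setting from this experiment, error 1007 (database exists) is ignored while an unlisted error such as 1146 still stops the SQL thread, which is usually what you want: each listed number is a class of inconsistency you have decided to tolerate.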