数据库高可用--MySQL--高可用之keepalive+MySQL

栏目:云苍穹知识作者:金蝶来源:金蝶云社区发布:2024-09-23浏览:1

数据库高可用--MySQL--高可用之keepalive+MySQL

1. 高可用方案说明

MySQL高可用方案主要目标是保证MySQL主库异常机后从库能够快速承接业务,从而保证业务的连续性。MySQL的高可用都是基于MySQL主从实现的,然后组合其他的中间件形成适合公司业务的高可用架构方案。高可用方案要考虑架构的稳定性,时效性,一致性,这三者做很好的平衡。

目前主流的MySQL高可用架构有keepalived+一主一从vip/dns+一主多从+mhavip/dns+一主多从+ Orchestrator还可以加入代理中间做负载均衡分库分表等,类似的中间件有haproxyproxysql

mycatdble,sharingsphere等

组件分类

Keepalivednsconsulzookeeper等这些主要是实现应用程序的统一的访问入口

MhaOrchestrator主要是实现主从关系的管理和自动切换

haproxyproxysqlmycatdble sharingsphere等主要是实现负载均衡,访问流量管理,分库分表

目前苍穹数据库都是一主一从架构,这样搭配keepalived,采用keepalived+一主一从的高可用方案是最合适的。这种方案最大的好处就是简单可控。本文档不会详细讲解高可用架构的安装部署,主要是测试MySQL异步和半同步的性能对比,高可用各种故障情况的测试。


测试服务器信息

IP地址

主从

版本

192.168.63.65

8.0.26

192.168.63.63

8.0.26





2. MySQL异步和半同步复制

异步复制:

应用发起数据更新(含insertupdatedelete操作)请求,master在执行完更新操作后立即向应用程序返回响应,然后master在向slave同步数据。

数据更新过程中master不需要等待slave的响应,因此异步复制的数据库实例通常具有较高的性能,且slave不可以用并不影响master对外提供服务。但因数据并不是实时同步到slave,而masterslave有延迟的情况下发生故障则有较小的概率会引起数据不一致。

半同步复制:

应用发起数据更新(含insertupdatedelete操作)请求,master在执行完更新操作后立即向slave复制数据,slave接收到数据并写到relaylog中(不需要执行)后才向master返回成功的信息,master必须在接受到slave的成功信息后再向应用程序返回响应。

仅在数据复制发生异常(slave节点不可以用或者数据复制所用的网络发生异常)的情况下,master会暂停(MySQL默认10s左右)对应用的响应,将复制方式降为异步复制。当数据复制恢复正常,将恢复为半同步复制。


2.1MySQL安装和主从搭建

1.创建用户

/usr/sbin/groupadd mysql

/usr/sbin/useradd -g mysql mysql


2.解压文件

tar -xvJf mysql-8.0.26-linux-glibc2.12-x86_64.tar.xz -C /usr/local

cd /usr/local

ln -s mysql-8.0.26-linux-glibc2.12-x86_64 mysql80


3.创建目录结构

mkdir -pv /data/dbdata/mysqldata

mkdir -pv /data/dbdata/mysqllog

mkdir -pv /data/dbdata/mysqllog/binlog

mkdir -pv /data/dbdata/mysqllog/relay-log


4.设置配置文件

拷贝模板my.cnf /etc/my.cnf

修改目录权限

chown -R mysql.mysql /data/dbdata

chown -R mysql.mysql /usr/local/mysql80


5.初始化数据库

/usr/local/mysql80/bin/mysqld --defaults-file=/home/mysql3308/etc/my.cnf --initialize --user=mysql


建立主从关系在从库执行

CHANGE MASTER TO MASTER_HOST='192.168.63.65',MASTER_PORT=3307,MASTER_USER='repl',MASTER_PASSWORD='replPAS',MASTER_LOG_FILE='binlog.000105',MASTER_LOG_POS=156;


2.2半同步安装部署

主库查看是否安装半同步复制插件:rpl_semi_sync_master

show plugins;

使用二进制或者源码编译安装,用find查找模块是否存在

find / -name "semisync*"


主和所有从节点均需安装并开启插件

主库安装

install plugin rpl_semi_sync_master soname 'semisync_master.so';

从库安装

install plugin rpl_semi_sync_slave soname 'semisync_slave.so';

show variables like '%semi%';

+-------------------------------------------+------------+

| Variable_name                             | Value      |

+-------------------------------------------+------------+

| rpl_semi_sync_master_enabled              | OFF        |

| rpl_semi_sync_master_timeout              | 10000      |

| rpl_semi_sync_master_trace_level          | 32         |

| rpl_semi_sync_master_wait_for_slave_count | 1          |

| rpl_semi_sync_master_wait_no_slave        | ON         |

| rpl_semi_sync_master_wait_point           | AFTER_SYNC |

+-------------------------------------------+------------+


主相关系统变量

rpl_semi_sync_master_enabled:主库是否启用了半同步复制,默认为OFF

rpl_semi_sync_master_timeout:等待从库的ACK回复的超时时间,默认为10秒。

rpl_semi_sync_master_trace_level:半同步复制时主库的调试级别。

rpl_semi_sync_master_wait_for_slave_count:主库在超时时间内需要收到多少个ACK回复才认为此次提交成功,否则就降级为异步复制。该变量在MySQL 5.7.3才提供,在此之前的版本都默认为收到1ACK则确认成功,且不可更改。MySQL 5.7.3之后该变量的默认值也是1

rpl_semi_sync_master_wait_no_slave:默认值为ON,当状态变量Rpl_semi_sync_master_clients中的值小于rpl_semi_sync_master_wait_for_slave_count时,Rpl_semi_sync_master_status依旧为ON,只有当事务提交后等待rpl_semi_sync_master_timeout超时后,Rpl_semi_sync_master_status才会变为OFF,即降级为异步复制;为OFF时,当状态变量Rpl_semi_sync_master_clients中的值小于rpl_semi_sync_master_wait_for_slave_count时,Rpl_semi_sync_master_status立即显示为OFF,即立即降级为异步复制。

rpl_semi_sync_master_wait_point:控制主库上commit、接收ack、返回消息给客户端的时间点。值为 AFTER_SYNC AFTER_COMMIT。该选项是MySQL5.7.2后引入的,默认值为 AFTER_SYNC。此版本之前,等价于使用了 AFTER_COMMIT 模式。

从库

show variables like '%semi%';

+---------------------------------+-------+

| Variable_name                   | Value |

+---------------------------------+-------+

| rpl_semi_sync_slave_enabled     | OFF   |

| rpl_semi_sync_slave_trace_level | 32    |

+---------------------------------+-------+


从相关系统变量:

rpl_semi_sync_slave_enabled从库是否开启半同步复制。

rpl_semi_sync_slave_trace_level从库的调试级别。


启用半同步复制

--

set global rpl_semi_sync_master_enabled = 1;

--

set global rpl_semi_sync_slave_enabled = 1;


长期生效修改my.conf配置文件

#主库

plugin-load="rpl_semi_sync_master=semisync_master.so"

rpl_semi_sync_master_enabled=1


#主库

plugin-load="rpl_semi_sync_slave=semisync_slave.so"

rpl_semi_sync_slave_enabled=1


启从库上的I/O线程

stop slave io_thread;

start slave io_thread;


查看半同步是否在运行

#主库

mysql> show status like 'Rpl_semi_sync_master_status';

+-----------------------------+-------+

| Variable_name               | Value |

+-----------------------------+-------+

| Rpl_semi_sync_master_status | ON    |

+-----------------------------+-------+

1 row in set (0.01 sec)


#从库

mysql> show status like 'Rpl_semi_sync_slave_status';

+----------------------------+-------+

| Variable_name              | Value |

+----------------------------+-------+

| Rpl_semi_sync_slave_status | ON    |

+----------------------------+-------+

1 row in set (0.00 sec)


这两个变量常用来监控主从是否运行在半同步复制模式下。至此,MySQL半同步复制搭建完毕。

从库上只有一个半同步复制相关的状态变量Rpl_semi_sync_slave_status,为ON时表示从库使用半同步复制,OFF表示从库使用异步复制。


主库有14个半同步复制相关的状态变量

Rpl_semi_sync_master_clients:当前连接了多少个半同步从库。

Rpl_semi_sync_master_net_avg_wait_time:主库等待从库回复的平均时间,以微秒为单位。此变量始终为0,不推荐使用,并且将在以后的版本中删除。

Rpl_semi_sync_master_net_wait_time:主库等待从库回复的总时间,以微秒为单位。此变量始终为0,不推荐使用,并且将在以后的版本中删除。

Rpl_semi_sync_master_net_waits:主库等待从库回复的总次数。

Rpl_semi_sync_master_no_times:主库关闭半同步复制的次数。

Rpl_semi_sync_master_no_tx从库未成功确认的事务数。

Rpl_semi_sync_master_status:为ON时表示主库使用半同步复制,OFF表示主库使用异步复制。

Rpl_semi_sync_master_timefunc_failures:调用gettimeofday等时间函数时主库失败的次数。

Rpl_semi_sync_master_tx_avg_wait_time:主库等待一个事务的平均时间,以微秒为单位。

Rpl_semi_sync_master_tx_wait_time:主等待事务的总时间,以微秒为单位。

Rpl_semi_sync_master_tx_waits:主库等待事务的总次数。

Rpl_semi_sync_master_wait_pos_backtraverse:主库等待事件的二进制坐标低于之前等待事件的总次数。当事务开始等待回复的顺序与其二进制日志事件的写入顺序不同时,就会发生这种情况。

Rpl_semi_sync_master_wait_sessions:当前等待从库回复的会话数。

Rpl_semi_sync_master_yes_tx从库成功确认的事务数。


2.3异步和半同步性能测试对比

1.导入47G dump数据文件到数据库

Mysqldump  -uadmin h192.168.63.65 -P3307 pxxxx < pt_bizbl_big_wfs.sql

异步:耗时(s): 811s

半同步:耗时(s): 1021s


2.使用sysbench测试

sysbench /usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=192.168.63.65 --mysql-port=3307 --mysql-user=admin --mysql-password='kd$2020' --mysql-db=testdb --db-driver=mysql --oltp-test-mode=complex --oltp-tables-count=32 --oltp-table-size=1000000 --threads=64 --time=120 --report-interval=10  prepare


sysbench /usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-host=192.168.63.65 --mysql-port=3307 --mysql-user=admin --mysql-password='kd$2020' --mysql-db=testdb --db-driver=mysql --oltp-test-mode=complex --oltp-tables-count=32 --oltp-table-size=1000000 --threads=64 --time=120 --report-interval=10  run


异步QPS:3271.60 per sec

半同步QPS:2934.64 per sec


测试方式

异步复制

半同步复制

性能损耗

导入数据测试[单位s]

811s

1021s

25.89%

Sysben测试[单位qps]

3271.60 per sec

2934.64 per sec

11.48%


测试结论:半同步复制比异步复制性能要下降10-25%之间


3. keepalived高可用测试

目前苍穹采用这种高可用方案,优先保证数据一致性,只有在严格的情况下才会触发VIP漂移,并且切换到从库后需要手动关闭只读属性。mysql发生oomcarsh 不会切换,mysql异常关闭后如果能够重新启动也不会切换,数据库压力高,超过最大连接数等不会切换

相对于目前私有云公有云keepalived+一主一从方案,本方案有以下几点改进

keepalived无主备概念,无优先级等级,不抢占模式,谁先启动谁就是master

整个运行过程中不会关闭keepalived,除非需要长时间停掉mysql要手动停止keepalive,减少运维复杂

Mysql挂掉后,会判断主从延迟情况,如果同步不一致不会做切换,如果同步一致后会做切换

mysql发生oomcarsh 不会切换,mysql挂掉自动重启一次

如果高可用要降级,可以停掉从库的keepalived,全程手动切换,不会导致数据一致性问题

Keepalived日志独立,不和系统日志打印在一起,方便查看问题

主从同步检测不依赖主库,当主库机无法ssh登录时不会检测失败

MySQL高可用严重依赖主从同步,如果主从中断,延迟了,主机挂掉的时候从库就不能及时接管业务,这样就等于没有高可用。所以MySQL主从同步一定要做好监控,这个告警等级要和MySQL服务down掉一样高。


3.1. Keepalived部署常见问题

keepalived配置文件

备采用相同的配置

Cat /etc/keepalived.conf

global_defs {

router_id mysql_65_63

script_user root

enable_script_security

}

vrrp_script chk_mysql {

script "/etc/keepalived/check_mysql.sh"

       interval 30

       fall 3

}

vrrp_instance VI_1 {

state BACKUP

interface eth0

virtual_router_id 69

mcast_src_ip 192.168.63.65

priority 100

nopreempt

advert_int 3

authentication {

auth_type PASS

auth_pass kingdee2020

}

track_script {

chk_mysql

}

virtual_ipaddress {

192.168.63.69

}

#notify_master /etc/keepalived/readonly_off.sh

notify_backup /etc/keepalived/readonly_on.sh

notify_stop   /etc/keepalived/readonly_on.sh

}


常见报错信息

报错Configuration file '/etc/keepalived/keepalived.conf' is not a regular non-executable file

修改配置文件权限

chmod 664 /etc/keepalived/keepalived.conf


日志中输出如下

WARNING - default user 'keepalived_script' for script execution does not exist - please create

添加脚本运行任务

script_user root

enable_script_security


如果报这种错误,说明脚本执行时间超过interval,没有执行完成就要重新开始执行

Track script chk_mysql is being timed out, expect idle - skipping run

修改参数interval 调大间隔时间


MySQL检测脚本

script "/etc/keepalived/check_mysql.sh"脚本


脚本中的变量要根据实际情况替换当前环境的值

#变量需要根据实际情况修改

my_cnf="/etc/my.cnf"

mysql_bin="/usr/local/mysql80/bin"

mysql_slave_ip="192.168.63.63"

mysql_slave_port="3307"

mysql_slave_user="admin"

mysql_slave_passwd='kd$2020'


mysql_service="mysqld_8.0_3307"  ##初始安装需要修改


日志输出配置

日志默认是输出到/var/log/message,如果需要改变keepalive日志输出的位置

配置的是KEEPALIVED_OPTIONS="-D -d -S 0";而-Ssyslogfacility0表示放在local0,在/etc/rsyslog.conf 中配置local0.* /var/log/keepalived.log

systemctl restart rsyslog


直接修改/usr/lib/systemd/system/keepalived.service

ExecStart=/usr/local/keepalived/sbin/keepalived -D -d -S 0

systemctl daemon-reload


systemctl restart keepalived


主库keepalive成功启动日志如下

Jun 23 10:30:03 centos7-65 Keepalived[47088]: Starting Keepalived v1.4.0 (12/29,2017)

Jun 23 10:30:03 centos7-65 Keepalived[47088]: Running on Linux 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 (built for Linux 3.10.0)

Jun 23 10:30:03 centos7-65 Keepalived[47088]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 10:30:03 centos7-65 systemd: PID file /var/run/keepalived.pid not readable (yet?) after start.

Jun 23 10:30:03 centos7-65 Keepalived[47089]: Starting Healthcheck child process, pid=47090

Jun 23 10:30:03 centos7-65 systemd: Started Keepalived.

Jun 23 10:30:03 centos7-65 Keepalived[47089]: Starting VRRP child process, pid=47091

Jun 23 10:30:03 centos7-65 Keepalived_healthcheckers[47090]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Registering Kernel netlink reflector

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Registering Kernel netlink command channel

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Registering gratuitous ARP shared channel

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Truncating auth_pass to 8 characters

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) removing protocol VIPs.

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: Using LinkWatch kernel netlink reflector...

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Entering BACKUP STATE

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]

Jun 23 10:30:03 centos7-65 Keepalived_vrrp[47091]: VRRP_Script(chk_mysql) succeeded   ###执行脚本

Jun 23 10:30:12 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Transition to MASTER STATE

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Entering MASTER STATE  ###进入MASTER状态

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) setting protocol VIPs.

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:15 centos7-65 avahi-daemon[2226]: Registering new address record for 192.168.63.69 on eth0.IPv4.  ###注册虚拟ip地址

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 10:30:20 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69


备库keepalive启动日志

Jun 23 11:17:42 tikv3 systemd: Starting Keepalived...

Jun 23 11:17:42 tikv3 Keepalived[88839]: Starting Keepalived v1.4.0 (12/29,2017)

Jun 23 11:17:42 tikv3 Keepalived[88839]: Running on Linux 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 (built for Linux 3.10.0)

Jun 23 11:17:42 tikv3 Keepalived[88839]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 11:17:42 tikv3 Keepalived[88840]: Starting Healthcheck child process, pid=88841

Jun 23 11:17:42 tikv3 Keepalived[88840]: Starting VRRP child process, pid=88842

Jun 23 11:17:42 tikv3 Keepalived_healthcheckers[88841]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Registering Kernel netlink reflector

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Registering Kernel netlink command channel

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Registering gratuitous ARP shared channel

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Opening file '/etc/keepalived/keepalived.conf'.

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Truncating auth_pass to 8 characters

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: VRRP_Instance(VI_1) removing protocol VIPs.

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: Using LinkWatch kernel netlink reflector...

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: VRRP_Instance(VI_1) Entering BACKUP STATE  ###进入BACKUP状态

Jun 23 11:17:42 tikv3 Keepalived_vrrp[88842]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]


3.2. 高可用模拟故障测试

机器+keepalived+脚本检测,三种状态等同,一般情况keepalive不会挂掉,一般只是人为关闭或者和机器同时挂掉

故障编号

故障说明

解释

RT_MySQL_01

只启动单个keepalived

作用就是启用一个VIP,出现故障手动启keepaliveed来做vip转移,目前有部分客户采用次方案

RT_MySQL_02

主库MySQL挂掉,主从同步正常

Oomcrash recovery不会切换,sleep 10 等待自己启动mysql进程不在还会自动重启一下,这个可以看情况也可以去掉如果确实需要停掉mysql主库,又不想vip漂移,可以先停掉从的keepalived然后停掉主的keepalived

RT_MySQL_03

主库MySQL挂掉,主从同步不一致

当主从同步不一致不能切换会导致数据一致如果同步跟上一致了会发生切换

RT_MySQL_04

主库keepalive挂掉

一般情况不会出现除非是人为停掉或者机器挂掉这种不会做mysql的任何检测包括主从直接把vip漂移过去从库只读关闭要人工确认

RT_MySQL_05

主库机器挂掉

同上

RT_MySQL_06

主库和从库网络之间不通

这种情况会存在多个vip的情况出现脑裂情况需要手工停掉一个keepalived

RT_MySQL_07

主库负载高

检测mysql进程是否存在出现负载高,数据库hang住,超过连接数等都要手工处理,不会做切换

RT_MySQL_08

从库挂

无影响从库检测也会执行也会执行check_mysql.sh逻辑,会把从库重启一次

RT_MySQL_09

从库keepalived挂掉

无影响

RT_MySQL_10

主库挂掉,从库挂keepalived运行正常

VIP消失


RT_MySQL_01

只启动单个keepalive,提供VIP的服务,这种情况下vip不会漂移,有故障发生手动启动从库的keepalive。这样方案的好处就是不会有数据不一致的问题。

关闭mysql服务,VIPdownkeepalived输出信息如下

Jun 23 10:59:03 centos7-65 Keepalived_vrrp[47091]: VRRP_Script(chk_mysql) failed (exited with status 1)

Jun 23 10:59:05 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Entering FAULT STATE

Jun 23 10:59:05 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) removing protocol VIPs.

Jun 23 10:59:05 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Now in FAULT state

Jun 23 10:59:05 centos7-65 avahi-daemon[2226]: Withdrawing address record for 192.168.63.69 on eth0.


重新把mysql拉起来,keepalive日志,vip重新注册成功

Jun 23 11:04:06 centos7-65 Keepalived_vrrp[47091]: VRRP_Script(chk_mysql) succeeded

Jun 23 11:04:12 centos7-65 Keepalived_vrrp[47091]: Kernel is reporting: interface eth0 UP

Jun 23 11:04:12 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1): Entering BACKUP STATE

Jun 23 11:04:22 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Transition to MASTER STATE

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Entering MASTER STATE

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) setting protocol VIPs.

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:25 centos7-65 avahi-daemon[2226]: Registering new address record for 192.168.63.69 on eth0.IPv4.

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jun 23 11:04:30 centos7-65 Keepalived_vrrp[47091]: Sending gratuitous ARP on eth0 for 192.168.63.69


测试结果【mysql挂掉VIPdown掉】


RT_MySQL_02

主库MySQL挂掉,主从同步正常,其中会排查掉oomcrash挂掉的情况

第一种测试kill -9 `pidof mysqld` 模拟carsh或者oom情况

mysql日志输出

2022-07-13T13:43:55.176671+08:00 0 [Warning] [MY-010097] [Server] Insecure configuration for --secure-file-priv: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path.

2022-07-13T13:43:55.176787+08:00 0 [System] [MY-010116] [Server] /usr/local/mysql80/bin/mysqld (mysqld 8.0.26) starting as process 97886

2022-07-13T13:43:55.188148+08:00 0 [Warning] [MY-000081] [Server] option 'innodb-ft-total-cache-size': unsigned value 6400000000 adjusted to 1600000000.

2022-07-13T13:43:55.188371+08:00 0 [Warning] [MY-000081] [Server] option 'innodb-ft-result-cache-limit': unsigned value 40000000000 adjusted to 4294967295.

2022-07-13T13:43:55.192936+08:00 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.

2022-07-13T13:44:03.815637+08:00 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.

2022-07-13T13:44:04.551760+08:00 0 [Warning] [MY-010736] [Server] Couldn't load plugin named 'rpl_semi_sync_master' with soname 'semisync_master.so'.

2022-07-13T13:44:04.558550+08:00 0 [Warning] [MY-010918] [Repl] 'rpl_semi_sync_master' is deprecated and will be removed in a future release. Please use rpl_semi_sync_source instead.

2022-07-13T13:44:04.597628+08:00 0 [System] [MY-010229] [Server] Starting XA crash recovery...

2022-07-13T13:44:04.610862+08:00 0 [System] [MY-010232] [Server] XA crash recovery finished.

keepalive无日志

测试结果【不会切换】


第二种情况关闭mysql,能够重启成功,这种下也不会切换

mysql日志输出

2022-07-13T14:12:33.068072+08:00 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26).

2022-07-13T14:12:43.160249+08:00 0 [System] [MY-010910] [Server] /usr/local/mysql80/bin/mysqld: Shutdown complete (mysqld 8.0.26)  MySQL Community Server - GPL.


2022-07-13T14:12:55.838827+08:00 0 [System] [MY-010116] [Server] /usr/local/mysql80/bin/mysqld (mysqld 8.0.26) starting as process 113285

2022-07-13T14:12:55.849266+08:00 0 [Warning] [MY-000081] [Server] option 'innodb-ft-total-cache-size': unsigned value 6400000000 adjusted to 1600000000.

2022-07-13T14:12:55.849551+08:00 0 [Warning] [MY-000081] [Server] option 'innodb-ft-result-cache-limit': unsigned value 40000000000 adjusted to 4294967295.

2022-07-13T14:12:55.853794+08:00 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.

2022-07-13T14:13:04.331102+08:00 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.


keepalive无日志

测试结果【不会切换】


第三种情况关闭mysql,并且重启不成功,这种情况需要发生切换

模拟重启不成功,修改mysql启动文件

把启动mysql的命令注释掉

#$basedir/bin/mysqld_safe --defaults-file=$my_cnf >/dev/null 2>&1 &

#wait_for_pid created "$!" "$mysqld_pid_file_path"; return_value=$?


mysql日志输出

2022-07-13T14:50:38.970201+08:00 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26).

2022-07-13T14:50:48.682089+08:00 0 [System] [MY-010910] [Server] /usr/local/mysql80/bin/mysqld: Shutdown complete (mysqld 8.0.26)  MySQL Community Server - GPL.


keepalive主日志

Jul 13 14:52:24 centos7-65 Keepalived_vrrp[111579]: VRRP_Script(chk_mysql) failed (exited with status 1)

Jul 13 14:52:26 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) Entering FAULT STATE

Jul 13 14:52:26 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) removing protocol VIPs.

Jul 13 14:52:26 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) Now in FAULT state


keepalive从日志

Jul 13 14:52:27 tikv3 Keepalived_vrrp[61230]: VRRP_Instance(VI_1) Transition to MASTER STATE

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: VRRP_Instance(VI_1) Entering MASTER STATE

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: VRRP_Instance(VI_1) setting protocol VIPs.

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:30 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69

Jul 13 14:52:35 tikv3 Keepalived_vrrp[61230]: Sending gratuitous ARP on eth0 for 192.168.63.69


测试结果【vip会切换到从库】


RT_MySQL_03

主库MySQL挂掉,主从同步不一致(可能是延迟,也能是中断了),这种情况不能切换

判断的条件如下,包含了延迟和中断的情况,主从跟上了会切换

[ -z "$master_log_file" ] || [ -z "$relay_master_log_file" ] || [ -z "$read_master_log_pos" ] || [ -z "$exec_master_log_pos" ]


比较主从同步一致情况      

[ ${master_log_file} == ${relay_master_log_file} ] && [ ${read_master_log_pos} -eq ${exec_master_log_pos} ]

sysbench 造数据,从库先stop slave掉,主库造数据完成后start slave,这样模拟主从延迟现象

主从中断情况不用模拟,文件名称都不相等,不存在主从同步一致的情况

mysql关闭

2022-07-13T16:33:22.616794+08:00 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26).

2022-07-13T16:33:24.465640+08:00 0 [System] [MY-010910] [Server] /usr/local/mysql80/bin/mysqld: Shutdown complete (mysqld 8.0.26)  MySQL Community Server - GPL.

主从检测不同步,keepalive无日志输出


主库关掉,从库不能连接主库,同步不一致

如果io线程已经把binlog日志全部同步过来,但是sql线程没有同步完成,等sql线程同步完成会触发切换

主从检测同步,keepalive输出

2022-07-13T17:18:47.134046+08:00 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.26).

2022-07-13T17:18:59.018091+08:00 0 [System] [MY-010910] [Server] /usr/local/mysql80/bin/mysqld: Shutdown complete (mysqld 8.0.26)  MySQL Community Server - GPL.


从库状态

************************* 1. row ***************************

Slave_IO_State: Reconnecting after a failed source event read

Master_Host: 192.168.63.65

Master_User: repl

Master_Port: 3307

Connect_Retry: 60

Master_Log_File: binlog.000123

Read_Master_Log_Pos: 876060889

Relay_Log_File: mysql_relay_bin6507.000039

Relay_Log_Pos: 876061098

Relay_Master_Log_File: binlog.000123

Slave_IO_Running: Connecting

Slave_SQL_Running: Yes

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table: mysql.%,performance_schema.%

Last_Errno: 0

Last_Error:

Skip_Counter: 0

Exec_Master_Log_Pos: 876060889

Relay_Log_Space: 876061355

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 2003

Last_IO_Error: error reconnecting to master 'repl@192.168.63.65:3307' - retry-time: 60 retries: 3 message: Can't connect to MySQL server on '192.168.63.65:3307' (111)

Last_SQL_Errno: 0

Last_SQL_Error:

Replicate_Ignore_Server_Ids:

Master_Server_Id: 6507

Master_UUID: c29e1834-c155-11ec-b3c0-861550fd19a2

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp: 220713 17:20:55

Last_SQL_Error_Timestamp:

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set:

Executed_Gtid_Set:

Auto_Position: 0

Replicate_Rewrite_DB:

Channel_Name:

Master_TLS_Version:

Master_public_key_path:

Get_master_public_key: 0

Network_Namespace:

1 row in set, 1 warning (0.00 sec)


当主从同步完成时会发生切换

keepalived切换日志

Jul 13 17:22:24 centos7-65 Keepalived_vrrp[111579]: VRRP_Script(chk_mysql) failed (exited with status 1)

Jul 13 17:22:25 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) Entering FAULT STATE

Jul 13 17:22:25 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) removing protocol VIPs.

Jul 13 17:22:25 centos7-65 Keepalived_vrrp[111579]: VRRP_Instance(VI_1) Now in FAULT state


从库keepalived 切换日志

Jul 13 17:22:26 tikv3 Keepalived_vrrp[73661]: VRRP_Instance(VI_1) Transition to MASTER STATE

Jul 13 17:22:29 tikv3 Keepal

数据库高可用--MySQL--高可用之keepalive+MySQL

1. 高可用方案说明MySQL高可用方案主要目标是保证MySQL主库异常宕机后从库能够快速承接业务,从而保证业务的连续性。MySQL的高可用都是基...
点击下载文档
确认删除?