How to Set MySQL Group Replication into Multi-Primary Mode

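MySQL Group Replication (hereafter MGR), released in MySQL 5.7.17, has been called by many the "regular army" of MySQL replication solutions. It can replace the existing MySQL Replication and Semisynchronous Replication in one stroke, and could even replace Galera, until now the most successful MySQL clustering solution.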

MGR has two modes: Single-Primary and Multi-Primary.

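In Single-Primary mode, no matter how many nodes the group has, only one node accepts writes and all the others are read-only. The writable node is called the primary. Only when the primary fails and is expelled from the group is a new primary elected from the remaining nodes and switched to writable mode.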

This process can be seen clearly in the log.

2017-03-16T02:04:32.689278Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-03-16T02:04:33.081743Z 0 [Note] Plugin group_replication reported: 'Unsetting super_read_only.'
2017-03-16T02:04:33.081756Z 28 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'

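In Multi-Primary mode, by contrast, every node is a primary and all of them can be read from and written to at the same time. That sounds better, but multi-primary is more complex, so enabling it comes with some functional restrictions; for example, Foreign Keys with Cascading Constraints are not supported.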
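
To make the restriction concrete, here is a minimal sketch of the kind of schema it affects. The table names `parent` and `child` are hypothetical and only serve as an illustration. The assumption is a multi-primary group with group_replication_enforce_update_everywhere_checks=ON, where writes to a table that carries a cascading foreign key are expected to fail at commit time.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
  id        INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  -- The cascading action is what multi-primary mode does not support.
  FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
) ENGINE=InnoDB;

INSERT INTO parent VALUES (1);

-- With the update-everywhere checks enabled, this write to `child`
-- is expected to be rejected when the transaction tries to commit.
INSERT INTO child VALUES (1, 1);
```

Dropping the cascading action, or handling the cascade in the application, is one way to keep such tables usable in a multi-primary group.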

In Multi-Primary mode, a node leaving the group no longer triggers an election, because every node is already a primary.

None of the above is the point of this article. The official 5.7.17 documentation already describes in detail how to set up Single-Primary MGR:
Deploying Group Replication in Single-Primary Mode

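For whatever reason, though, there is no separate section on how to set a group to Multi-Primary mode. The documentation only mentions it in passing near the end: "Multi-primary mode groups (members all configured with group_replication_single_primary_mode=OFF)", which at least tells the reader that the group_replication_single_primary_mode parameter is involved. That parameter is indeed all there is to it, but documentation this half-finished deserves the criticism.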

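The steps below set up Multi-Primary MGR, assuming the group is currently running in Single-Primary mode. Because group_replication_single_primary_mode must have the same value on every member, Group Replication has to be stopped on all nodes before the group is bootstrapped again.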

-- group_replication_single_primary_mode=ON means Single-Primary mode is enabled; changing it to OFF enables Multi-Primary mode.
(root@localhost) [(none)]> show variables like 'group_replication_single_primary_mode';
+----------------------------------------------------+-------------------------------------------------+
| Variable_name                                      | Value                                           |
+----------------------------------------------------+-------------------------------------------------+
| group_replication_single_primary_mode              | ON                                              |
+----------------------------------------------------+-------------------------------------------------+

-- While MGR is running, this parameter cannot be changed dynamically
(root@localhost) [(none)]> set global group_replication_single_primary_mode=off;
ERROR 3093 (HY000): Cannot change into or from single primary mode while Group Replication is running.

-- Stop replication
(root@localhost) [(none)]> stop GROUP_REPLICATION;
Query OK, 0 rows affected (8.67 sec)

-- Set the single-primary mode parameter to off
(root@localhost) [(none)]> set global group_replication_single_primary_mode=off;
Query OK, 0 rows affected (0.00 sec)

-- Setting this parameter to ON disallows some operations that could cause undetected data conflicts in multi-primary mode
(root@localhost) [(none)]> set global group_replication_enforce_update_everywhere_checks=ON;
Query OK, 0 rows affected (0.00 sec)

-- Mark this node as the first one to start MGR (the bootstrap node)
(root@localhost) [(none)]> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)

-- Start replication
(root@localhost) [(none)]> START GROUP_REPLICATION;
Query OK, 0 rows affected (1.29 sec)

-- Turn the bootstrap parameter off again so that a second replication group is not started by accident later
(root@localhost) [(none)]> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

-- The view now shows that only one node in the whole group is ONLINE
(root@localhost) [(none)]> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 72ad2062-08a3-11e7-a513-5bfce171938d | bogon       |       24801 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
1 row in set (0.00 sec)
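
Before adding more nodes, it can be worth confirming that the three parameters changed above ended up in the intended state. The query below is only a sketch of such a check; the column aliases are arbitrary names chosen here for readability.

```sql
-- Read back the settings changed in the steps above.
-- Expected state: single-primary mode disabled, update-everywhere
-- checks enabled, bootstrap flag disabled again.
SELECT @@GLOBAL.group_replication_single_primary_mode               AS single_primary_mode,
       @@GLOBAL.group_replication_enforce_update_everywhere_checks  AS update_everywhere_checks,
       @@GLOBAL.group_replication_bootstrap_group                   AS bootstrap_group;
```

Running the same query on each joining node before START GROUP_REPLICATION helps catch a member whose settings differ from the rest of the group.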

The second node can now be added.

-- Likewise, set the single-primary mode parameter to off
(root@localhost) [(none)]> set global group_replication_single_primary_mode=off;
Query OK, 0 rows affected (0.00 sec)

-- Set the update-everywhere checks parameter to on
(root@localhost) [(none)]> set global group_replication_enforce_update_everywhere_checks=ON;
Query OK, 0 rows affected (0.00 sec)

-- Start replication
(root@localhost) [(none)]> start group_replication;
Query OK, 0 rows affected (5.42 sec)

-- Checking the view again shows that the group now has two nodes
(root@localhost) [(none)]> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 9003e830-08a3-11e7-8ae3-e62d2f6366d2 | bogon       |       24802 | ONLINE       |
| group_replication_applier | 96fe2b5a-08a3-11e7-b383-d7f02f17e847 | localhost   |       24803 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

-- An empty group_replication_primary_member value means the group is running in Multi-Primary mode; otherwise it shows the primary node of a single-primary group
(root@localhost) [(none)]> SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member';
+----------------------------------+----------------+
| VARIABLE_NAME                    | VARIABLE_VALUE |
+----------------------------------+----------------+
| group_replication_primary_member |                |
+----------------------------------+----------------+
1 row in set (0.00 sec)
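
A more direct way to confirm multi-primary behaviour is to write from two different members. The statements below are a minimal sketch; the schema `mgr_test` and table `t1` are hypothetical names, and the table has a primary key and uses InnoDB because Group Replication requires both.

```sql
-- On every member: in multi-primary mode both values are expected
-- to be 0 (OFF), since no member is kept read-only.
SELECT @@GLOBAL.read_only, @@GLOBAL.super_read_only;

-- On one member (hypothetical schema and table names):
CREATE DATABASE IF NOT EXISTS mgr_test;
CREATE TABLE mgr_test.t1 (
  id   INT NOT NULL PRIMARY KEY,
  note VARCHAR(32)
) ENGINE=InnoDB;
INSERT INTO mgr_test.t1 VALUES (1, 'written on node A');

-- On a different member:
INSERT INTO mgr_test.t1 VALUES (2, 'written on node B');

-- On any member: both rows should be visible once the
-- transactions have been applied.
SELECT * FROM mgr_test.t1 ORDER BY id;
```

In a single-primary group the second INSERT would instead fail on a secondary, because secondaries run with super_read_only enabled.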

The third node can be added in the same way. In the current version, MGR supports at most 9 nodes in a single group.

For these parameters to survive a MySQL restart, they need to be added to the my.cnf file. A typical my.cnf configured for MGR looks like this.

[mysqld]

# server configuration
datadir=/Users/Kamus/mysql_data/s2
basedir=/usr/local/Cellar/mysql/5.7.17

port=24802
socket=/tmp/s2.sock

#
# Replication configuration parameters
#
server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

#
# Group Replication configuration
#
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot=off
loose-group_replication_local_address= "127.0.0.1:24902"
loose-group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903"
loose-group_replication_bootstrap_group= off

#
# Group Replication configuration for multi-primary mode
# (uncomment the two lines below to start in multi-primary mode)
#
# loose-group_replication_single_primary_mode=off
# loose-group_replication_enforce_update_everywhere_checks=ON