I. Introduction to repmgr

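repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It supports and enhances PostgreSQL's built-in streaming replication, which provides a single read/write primary server and one or more read-only standby servers, each holding a near-real-time copy of the primary's database.

It provides two main tools:

repmgr: a command-line tool for administrative tasks, such as setting up standby servers, promoting a standby to primary, switching over the primary and a standby, and displaying the status of the servers in the replication cluster.
repmgrd: a daemon that actively monitors the servers in the replication cluster. It monitors and records replication performance, performs failover by detecting failure of the primary and promoting the most suitable standby, and passes notifications about cluster events to a user-defined script that can perform tasks such as sending alerts by email.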

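II. Quick Setup of repmgr

1. Create the user and database

-- Create the user (superuser privileges are required)

create user repmgr with superuser password 'repmgr' connection limit 10;

-- Create the metadata database

create database repmgr owner repmgr;

-- repmgr creates a schema named repmgr to store its metadata tables, functions, and views; setting the repmgr user's search path as follows is recommended
ALTER USER repmgr SET search_path TO repmgr, "$user", public;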

2. Configure the authentication file pg_hba.conf

# Allow user repmgr to open replication connections from the local socket, 127.0.0.1, and the PostgreSQL subnet
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr <pg-subnet>/24 trust

# Allow user repmgr to connect to the repmgr database from the local socket, 127.0.0.1, and the PostgreSQL subnet
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr <pg-subnet>/24 trust
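
The six rules above repeat one pattern for two targets, so they can be generated instead of typed by hand. The following is a minimal sketch, not part of the original setup: the helper name `emit_hba_rules` is hypothetical, and the subnet 192.168.10.0/24 is an assumption inferred from the node addresses used in the later steps.

```shell
#!/bin/bash
# Hypothetical helper: print pg_hba.conf rules for the repmgr user.
# $1 = subnet in CIDR form (an assumption; substitute your own network).
emit_hba_rules() {
    local subnet="$1"
    local db
    # "replication" is the pseudo-database for streaming replication;
    # "repmgr" is the metadata database created in step 1.
    for db in replication repmgr; do
        printf 'local   %-12s repmgr                  trust\n' "$db"
        printf 'host    %-12s repmgr  %-16s trust\n' "$db" "127.0.0.1/32"
        printf 'host    %-12s repmgr  %-16s trust\n' "$db" "$subnet"
    done
}

# Print the rules for the assumed subnet.
emit_hba_rules "192.168.10.0/24"
```

The output can be appended to pg_hba.conf, after which PostgreSQL has to reload its configuration (for example with `pg_ctl reload`) before the rules take effect.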

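3. Set up passwordless login

Run the following as the postgres user on both servers, so that each node can log in to the other.
On node1 (192.168.10.90)
ssh-keygen -t rsa
ssh-copy-id postgres@192.168.10.91
ssh postgres@192.168.10.91 date

On node2 (192.168.10.91)
ssh-keygen -t rsa
ssh-copy-id postgres@192.168.10.90
ssh postgres@192.168.10.90 date

# Passwordless database login; the line format is hostname:port:database:username:password
vim ~/.pgpass
192.168.10.90:5432:repmgr:repmgr:repmgr
192.168.10.91:5432:repmgr:repmgr:repmgr

Restrict the permissions so that only the postgres user can read and write the file
chmod 600 ~/.pgpass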
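
The password file step can also be scripted. The sketch below is illustrative only: `write_pgpass` is a hypothetical helper, and it reuses the port, database, user, and password shown above. It writes to a scratch file so that an existing `~/.pgpass` is not overwritten by accident.

```shell
#!/bin/bash
# Hypothetical helper: write a password file for the repmgr user.
# $1 = target file, remaining arguments = node host addresses.
write_pgpass() {
    local file="$1"; shift
    local host
    # Create the file under a restrictive umask so it is never world-readable.
    ( umask 077; : > "$file" )
    for host in "$@"; do
        # Field order: hostname:port:database:username:password
        echo "${host}:5432:repmgr:repmgr:repmgr" >> "$file"
    done
    chmod 600 "$file"
}

# Example: write to a scratch file rather than the real ~/.pgpass.
write_pgpass "${TMPDIR:-/tmp}/pgpass.example" 192.168.10.90 192.168.10.91
```

On Unix systems libpq ignores a password file whose permissions are looser than 0600, which is why the helper sets the mode explicitly.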

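4. repmgr configuration file (primary node)

vim repmgr.conf

Note: repmgr.conf should not be stored inside the PostgreSQL data directory, because it could be overwritten when the PostgreSQL server is set up or reinitialized.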

node_id=1
node_name='node1'
conninfo='host=192.168.10.90 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgresql/data'

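5. Register the primary node

For repmgr to manage a replication cluster, the primary node must be registered with repmgr. Registration installs the repmgr extension and metadata objects and adds a metadata record for the primary server.

repmgr -f /etc/repmgr.conf primary register
The output includes a success message similar to the following
NOTICE: primary node record (ID: 1) registered

Run the following command to view the cluster status
repmgr -f /etc/repmgr.conf cluster show
A Status of running means the node is healthy
You can also log in to PostgreSQL as the repmgr user and query the repmgr database
select * from repmgr.nodes; the result matches the output of the command above

At this point the primary is set up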

6. Set up the standby (clone a standby server)

On the standby node, edit the configuration file: vim repmgr.conf

node_id=2
node_name='node2'
conninfo='host=192.168.10.91 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/storage/pgsql/data'

The connection information here is the standby's own
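
Because the per-node files differ only in a few values, they can be rendered from a template. This is a minimal sketch under assumptions: `render_repmgr_conf` is a hypothetical helper, and the port, user, database, and timeout are copied from the examples above.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal repmgr.conf for one node.
# Arguments: node_id, node_name, host, data_directory
render_repmgr_conf() {
    local id="$1" name="$2" host="$3" datadir="$4"
    cat <<EOF
node_id=${id}
node_name='${name}'
conninfo='host=${host} port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='${datadir}'
EOF
}

# Render the standby's file from this step to standard output.
render_repmgr_conf 2 node2 192.168.10.91 /home/storage/pgsql/data
```

Redirecting the output to a path outside the data directory, such as the /etc/repmgr.conf used by the commands in this guide, keeps the file safe when the data directory is rebuilt.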

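7. Clone the standby (required when no PostgreSQL standby instance exists yet)

Use the --dry-run option to check whether the standby can be cloned

repmgr -h 192.168.10.90 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run

Main checks

Checks that max_wal_senders is large enough: the clone needs at least two free WAL sender slots, because pg_basebackup with -X stream opens two replication connections
Checks the wal_log_hints parameter, which pg_rewind needs unless data checksums are enabled
Once the checks pass, the clone runs the backup command pg_basebackup -l "repmgr base backup" -D /home/storage/pgsql/data -h 192.168.10.90 -p 5432 -U repmgr -X stream

After the checks pass, run the clone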
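
The same two settings can be inspected by hand before running the dry run. The sketch below is a simplified stand-in for the checks, not repmgr's own logic: `check_clone_settings` is a hypothetical helper that reads `name|value` lines, the format produced by `psql -At`, and it looks only at the configured value of max_wal_senders rather than at the number of free slots.

```shell
#!/bin/bash
# Hypothetical pre-check: read "name|value" lines on stdin, as produced by
#   psql -At -c "SELECT name, setting FROM pg_settings
#                WHERE name IN ('max_wal_senders','wal_log_hints')"
# and report whether the values look suitable for cloning a standby.
check_clone_settings() {
    local name value rc=0
    while IFS='|' read -r name value; do
        case "$name" in
            max_wal_senders)
                # pg_basebackup -X stream opens two replication connections.
                if [ "$value" -lt 2 ]; then
                    echo "FAIL max_wal_senders=$value (need at least 2)"
                    rc=1
                else
                    echo "OK   max_wal_senders=$value"
                fi ;;
            wal_log_hints)
                # pg_rewind needs this unless data checksums are enabled.
                if [ "$value" = "on" ]; then
                    echo "OK   wal_log_hints=on"
                else
                    echo "WARN wal_log_hints=$value (pg_rewind may be unavailable)"
                fi ;;
        esac
    done
    return "$rc"
}

# Sample input standing in for real psql output.
printf 'max_wal_senders|10\nwal_log_hints|on\n' | check_clone_settings
```

The function returns a non-zero status only for the max_wal_senders failure, so it can gate the clone command in a script.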

repmgr -h 192.168.10.90 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone

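The clone does not start the standby. After starting PostgreSQL on the standby (for example, pg_ctl -D /home/storage/pgsql/data start), verify that replication is working

On the primary
SELECT * FROM pg_stat_replication;

On the standby
SELECT * FROM pg_stat_wal_receiver;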

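8. Register the standby node

Register the standby node with the following command
repmgr -f /etc/repmgr.conf standby register

Check the registration result
repmgr -f /etc/repmgr.conf cluster show
When everything is healthy, the output lists two nodes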
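
For scripted health checks, `repmgr cluster show` also accepts `--csv`, which prints one line per node. The sketch below is a hedged example: `summarize_cluster` is a hypothetical helper, and the column layout it assumes (node ID, availability with 0 meaning available, recovery state with 0 meaning primary and 1 meaning standby) should be checked against the documentation for the installed repmgr version.

```shell
#!/bin/bash
# Hypothetical helper: summarize `repmgr cluster show --csv` output read on stdin.
# Assumed column layout: node_id,availability,recovery_state
summarize_cluster() {
    local id avail recovery role rc=0
    while IFS=',' read -r id avail recovery; do
        # Skip blank lines.
        [ -z "$id" ] && continue
        case "$recovery" in
            0) role="primary" ;;
            1) role="standby" ;;
            *) role="unknown" ;;
        esac
        if [ "$avail" = "0" ]; then
            echo "node $id: up ($role)"
        else
            echo "node $id: DOWN"
            rc=1
        fi
    done
    return "$rc"
}

# Sample input standing in for: repmgr -f /etc/repmgr.conf cluster show --csv
printf '1,0,0\n2,0,1\n' | summarize_cluster
```

A non-zero return status signals that at least one node is unavailable, which makes the helper usable from cron or a monitoring agent.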

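9. How to switch over the primary and the standby

Run the switchover on the standby that is to become the new primary
repmgr -f repmgr.conf standby switchover -U repmgr --verbose
The command performs the following steps
shut down the primary -> promote the standby to primary -> restart the old primary as the new standby, rewinding it if necessary
After it succeeds, check the status
repmgr -f repmgr.conf cluster show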

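10. How to combine repmgr with a VIP or DNS scheme for high-availability switchover

The switchover that repmgr performs above happens only at the PostgreSQL level. If applications connect directly to the primary's IP address, they have to change that address after every switchover, which is clearly unreasonable.

A proper high-availability switchover should meet the following two requirements

After the switchover, applications do not need to change their entry point
After the switchover, applications have a reconnect mechanism so that they recover automatically

Options:

repmgr + VIP (applications connect to the VIP)
repmgr + DNS name (applications connect to the DNS name)

How to implement it:

Add the following to the repmgr configuration file repmgr.conf

# Configuration contents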
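
The reconnect requirement can be illustrated with a small retry loop. This is a minimal sketch rather than a recommendation for any particular client: `retry` is a hypothetical helper, and the host name in the commented example is a placeholder, not a value from this setup.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Arguments: max_attempts, delay_seconds, command...
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Harmless demonstration: the command succeeds on the first attempt.
retry 3 0 true

# Example against a database entry point (placeholder host, needs a reachable server):
# retry 5 2 psql -h db.example.internal -U repmgr -d repmgr -c 'SELECT 1'
```

Real applications usually get this behavior from their connection pool or driver; the loop only shows the shape of the logic.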
node_id=201
node_name='node1'
conninfo='host=192.168.10.90 port=5432 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'
replication_user='repmgr'
replication_type='physical'
repmgr_bindir='/home/postgres/soft/bin'
pg_bindir='/home/postgres/soft/bin'
monitoring_history=yes
monitor_interval_secs=5
log_level='debug'
log_file='/home/postgres/repmgr.log'
failover='automatic'
connection_check_type=ping
reconnect_attempts=3
reconnect_interval=10
promote_command='/home/postgres/repmgr_promote.sh'
follow_command='/home/postgres/repmgr_follow.sh %n'

repmgr_promote.sh

#!/bin/bash
# Append a timestamped message to the repmgr log.
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')]: $*" >> /home/postgres/repmgr.log; }

# 1. Remove the VIP from the old primary (192.168.10.90).
log "del VIP in 202 start"
/bin/ssh -t postgres@192.168.10.90 "/bin/sudo /usr/sbin/ip addr del 192.168.10.100/24 dev ens37"
log "del VIP in 202 finish"

# 2. Promote this standby to primary.
log "promote start"
repmgr standby promote -f /home/postgres/repmgr.conf --log-to-file
log "promote finish"

# 3. Bring the VIP up on this node, the new primary.
log "add VIP start"
/bin/sudo /usr/sbin/ip addr add 192.168.10.100/24 dev ens37
log "add VIP finish"
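
Hook scripts like this are hard to test on a live cluster, so a dry-run switch can help. The sketch below is an illustration under assumptions: the `run` and `vip` helpers and the `DRY_RUN` variable are hypothetical, while the VIP, device name, and binary paths are copied from the script above.

```shell
#!/bin/bash
# Values copied from the promote script; adjust them for your network.
VIP="192.168.10.100/24"
DEV="ens37"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Add or remove the VIP on the local host. $1 = add | del
vip() {
    case "$1" in
        add|del) run /bin/sudo /usr/sbin/ip addr "$1" "$VIP" dev "$DEV" ;;
        *) echo "usage: vip add|del" >&2; return 2 ;;
    esac
}

# Preview the command without touching the network configuration.
DRY_RUN=1 vip add
```

With `DRY_RUN` unset, the same call runs the real `ip addr` command, so the script can be rehearsed first and then used unchanged.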

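repmgr_follow.sh: tasks to handle after the switchover

#!/bin/bash
# $1 is the node ID of the new primary, passed by repmgrd through %n in follow_command.
echo "[$(date '+%Y-%m-%d %H:%M:%S')]: follow $1" >> /home/postgres/repmgr.log
/home/postgres/soft/bin/repmgr standby follow -f /home/postgres/repmgr.conf --upstream-node-id="$1" --log-to-file
You can also add any other handling you need to this script

The repmgr + DNS name option works in a similar way

The key is to implement the switchover logic as custom hook scripts called from promote_command and follow_command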
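
For the DNS option, the promote hook could repoint an A record instead of moving a VIP. The following sketch assumes a DNS server that accepts dynamic updates through `nsupdate`; the helper name `dns_repoint_batch`, the server name, and the record name are all placeholders and do not come from the original setup.

```shell
#!/bin/bash
# Hypothetical helper: print an nsupdate batch that repoints an A record.
# Arguments: dns_server, record_name, new_ip, ttl_seconds (default 30)
dns_repoint_batch() {
    local server="$1" name="$2" ip="$3" ttl="${4:-30}"
    cat <<EOF
server ${server}
update delete ${name} A
update add ${name} ${ttl} A ${ip}
send
EOF
}

# Preview the batch. To apply it, pipe the output into nsupdate,
# for example: dns_repoint_batch ... | nsupdate -k /path/to/keyfile
dns_repoint_batch ns1.example.internal pg-primary.example.internal. 192.168.10.91
```

A short TTL keeps clients from caching the old address for long, which matters more here than with a VIP because the change only takes effect as caches expire.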
