OS: CentOS 7.4
PostgreSQL: 9.6.9
pg_rman: REL9_6_STABLE

The whole point of a backup is being able to restore from it. A backup that cannot be restored is worthless.

$ pg_rman --help
pg_rman manage backup/recovery of PostgreSQL database.

Usage:
  pg_rman OPTION init
  pg_rman OPTION backup
  pg_rman OPTION restore
  pg_rman OPTION show [DATE]
  pg_rman OPTION show detail [DATE]
  pg_rman OPTION validate [DATE]
  pg_rman OPTION delete DATE
  pg_rman OPTION purge

pg_rman restore: full recovery

Stop the server, then delete the files under $PGDATA

# systemctl stop postgresql-9.6.service 
# ps -ef|grep -i post |grep -v grep

# su - postgres
$ cd $PGDATA/..
$ rm -rf ./data
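Deleting the data directory with rm -rf cannot be undone. A more cautious variant is sketched below as a POSIX shell function. The name move_pgdata_aside and the .bak.<timestamp> suffix are hypothetical, and the check assumes that a running (or crashed) server leaves postmaster.pid in the data directory: the function refuses to act while that file exists and renames the directory instead of deleting it.

```shell
# Hypothetical helper (not part of pg_rman): rename a stopped PostgreSQL data
# directory instead of deleting it, so the old files survive a failed restore.
move_pgdata_aside() {
    local pgdata target
    pgdata="${1:-}"
    pgdata="${pgdata%/}"               # drop a trailing slash so the copy is a sibling

    if [ -z "$pgdata" ] || [ ! -d "$pgdata" ]; then
        echo "usage: move_pgdata_aside PGDATA (an existing directory)" >&2
        return 1
    fi
    # Assumption: a running (or crashed) server leaves postmaster.pid behind,
    # so its presence means the directory must not be touched.
    if [ -e "$pgdata/postmaster.pid" ]; then
        echo "postmaster.pid found in $pgdata; stop PostgreSQL first" >&2
        return 1
    fi

    target="$pgdata.bak.$(date +%Y%m%d%H%M%S)"
    mv "$pgdata" "$target" || return 1
    echo "$target"                     # report where the old directory went
}
```

As the postgres user, move_pgdata_aside "$PGDATA" would replace rm -rf ./data, provided the disk has room for both the old directory and the restored one. Remove the printed .bak directory only after the restored cluster has been checked.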

Run pg_rman restore; by default it uses the most recent full backup

$ pg_rman show detail
======================================================================================================================
 StartTime           EndTime              Mode    Data  ArcLog  SrvLog   Total  Compressed  CurTLI  ParentTLI  Status 
======================================================================================================================
2018-12-03 13:07:24  2018-12-03 13:07:29  INCR  3186kB    67MB    ----  8138kB        true       1          0  OK
2018-12-03 13:04:57  2018-12-03 13:05:18  FULL   406MB    67MB    28kB   443MB       false       1          0  OK
2018-12-03 13:01:14  2018-12-03 13:01:20  INCR  2932kB   100MB    ----    11MB        true       1          0  OK
2018-12-03 12:16:05  2018-12-03 12:16:10  INCR  3203kB    67MB    ----  8083kB        true       1          0  OK
2018-12-03 11:53:39  2018-12-03 11:54:00  FULL   405MB    83MB  2231kB   461MB       false       1          0  OK
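pg_rman chooses the base backup for the restore itself, but it can be useful to see beforehand which full backup that will be. The sketch below, a hypothetical helper named latest_full_ok, reads the show detail output and prints the start time of the newest FULL backup whose Status is OK. It assumes the newest-first ordering and column layout shown above, and it ignores timelines and recovery targets, which pg_rman also takes into account (see the "calculating timeline branches" line in the restore log).

```shell
# Hypothetical helper: read `pg_rman show detail` output on stdin and print the
# StartTime of the newest FULL backup whose Status is OK.
# Assumes the newest-first order and column layout of pg_rman 1.3.x shown above.
latest_full_ok() {
    awk '
        # Data rows start with a date; header and ==== separator lines do not.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / && $5 == "FULL" && $NF == "OK" {
            print $1, $2       # StartTime = date + time
            exit
        }
    '
}
```

For the listing above, pg_rman show detail | latest_full_ok prints 2018-12-03 13:04:57, the same backup that the restore log reports as its base.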

$ pg_rman restore

WARNING: pg_controldata file "/var/lib/pgsql/9.6/data/global/pg_control" does not exist
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of latest full backup as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2018-12-03 13:04:57"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2018-12-03 13:04:57" backup, archive log files and server log files by SIZE
INFO: backup "2018-12-03 13:04:57" is valid
INFO: restoring database files from the full mode backup "2018-12-03 13:04:57"
INFO: searching incremental backup to be restored
INFO: validate: "2018-12-03 13:07:24" backup and archive log files by SIZE
INFO: backup "2018-12-03 13:07:24" is valid
INFO: restoring database files from the incremental mode backup "2018-12-03 13:07:24"
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2018-12-03 13:07:24" is valid
INFO: restoring WAL files from backup "2018-12-03 13:07:24"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.

Note the following log lines: the restore uses the full backup plus the incremental backup taken after it.
INFO: found the full backup can be used as base in recovery: "2018-12-03 13:04:57"

INFO: searching incremental backup to be restored
INFO: validate: "2018-12-03 13:07:24" backup and archive log files by SIZE

Inspect $PGDATA

$ cd $PGDATA/
$ ls -l
total 60
-rw-r--r--  1 postgres postgres   215 Dec  3 14:26 backup_label
drwx------ 11 postgres postgres   123 Dec  3 14:26 base
drwx------  2 postgres postgres  4096 Dec  3 14:26 global
drwx------  2 postgres postgres    18 Dec  3 14:26 pg_clog
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_commit_ts
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_dynshmem
-rw-------  1 postgres postgres  4260 Dec  3 14:26 pg_hba.conf
-rw-------  1 postgres postgres  1636 Dec  3 14:26 pg_ident.conf
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_log
drwx------  4 postgres postgres    68 Dec  3 14:26 pg_logical
drwx------  4 postgres postgres    36 Dec  3 14:26 pg_multixact
drwx------  2 postgres postgres    18 Dec  3 14:26 pg_notify
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_replslot
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_serial
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_snapshots
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_stat
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_stat_tmp
drwx------  2 postgres postgres    18 Dec  3 14:26 pg_subtrans
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_tblspc
drwx------  2 postgres postgres     6 Dec  3 14:26 pg_twophase
-rw-------  1 postgres postgres     4 Dec  3 14:26 PG_VERSION
drwx------  3 postgres postgres    28 Dec  3 14:26 pg_xlog
-rw-------  1 postgres postgres    88 Dec  3 14:26 postgresql.auto.conf
-rw-------  1 postgres postgres 22454 Dec  3 14:26 postgresql.conf
-rw-------  1 postgres postgres    60 Dec  3 14:26 postmaster.opts
-rw-r--r--  1 postgres postgres   118 Dec  3 14:26 recovery.conf

$ cat recovery.conf 
# recovery.conf generated by pg_rman 1.3.7
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_timeline = '1'

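With this recovery.conf, the server replays all available WAL and then comes up directly as a master. I recommend changing it to the following. standby_mode = on keeps the server in standby mode instead of promoting it; recovery_target_action only takes effect once a recovery target such as recovery_target_time is also set.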

restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_timeline = '1'
recovery_target_action = 'pause' 
standby_mode = on

Start PostgreSQL and watch the log

# systemctl start postgresql-9.6.service

# tail -f /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-12-03.csv

2018-12-03 14:47:28.672 CST,,,27907,,5c04d180.6d03,1,,2018-12-03 14:47:28 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2018-12-03 14:47:28.673 CST,,,27910,,5c04d180.6d06,1,,2018-12-03 14:47:28 CST,,0,LOG,00000,"database system was interrupted; last known up at 2018-12-03 13:07:24 CST",,,,,,,,,""
2018-12-03 14:47:29.454 CST,,,27910,,5c04d180.6d06,2,,2018-12-03 14:47:28 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2018-12-03 14:47:29.468 CST,,,27910,,5c04d180.6d06,3,,2018-12-03 14:47:28 CST,,0,LOG,00000,"restored log file ""0000000100000003000000D9"" from archive",,,,,,,,,""
2018-12-03 14:47:29.699 CST,,,27910,,5c04d180.6d06,4,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"redo starts at 3/D9000028",,,,,,,,,""
2018-12-03 14:47:29.701 CST,,,27910,,5c04d180.6d06,5,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"consistent recovery state reached at 3/D9000130",,,,,,,,,""
2018-12-03 14:47:29.701 CST,,,27907,,5c04d180.6d03,2,,2018-12-03 14:47:28 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2018-12-03 14:47:29.718 CST,,,27910,,5c04d180.6d06,6,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DA"" from archive",,,,,,,,,""
2018-12-03 14:47:29.899 CST,,,27910,,5c04d180.6d06,7,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DB"" from archive",,,,,,,,,""
2018-12-03 14:47:30.138 CST,,,27910,,5c04d180.6d06,8,,2018-12-03 14:47:28 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000DC"" from archive",,,,,,,,,""
2018-12-03 14:57:29.795 CST,,,27913,,5c04d181.6d09,1,,2018-12-03 14:47:29 CST,,0,LOG,00000,"restartpoint starting: time",,,,,,,,,""
2018-12-03 14:57:30.088 CST,,,27913,,5c04d181.6d09,2,,2018-12-03 14:47:29 CST,,0,LOG,00000,"restartpoint complete: wrote 0 buffers (0.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.293 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB",,,,,,,,,""
2018-12-03 14:57:30.088 CST,,,27913,,5c04d181.6d09,3,,2018-12-03 14:47:29 CST,,0,LOG,00000,"recovery restart point at 3/DB000220","last completed transaction was at log time 2018-12-03 13:07:27.977677+08",,,,,,,,""

LSN information of the "2018-12-03 13:04:57" base (full) backup:

# result
TIMELINEID=1
START_LSN=3/d7000028
STOP_LSN=3/d7000130
START_TIME='2018-12-03 13:04:57'
END_TIME='2018-12-03 13:05:18'
RECOVERY_XID=11034
RECOVERY_TIME='2018-12-03 13:05:17'

LSN information of the "2018-12-03 13:07:24" incremental backup:

# result
TIMELINEID=1
START_LSN=3/d9000028
STOP_LSN=3/d9000130
START_TIME='2018-12-03 13:07:24'
END_TIME='2018-12-03 13:07:29'
RECOVERY_XID=11038
RECOVERY_TIME='2018-12-03 13:07:27'

The log line "consistent recovery state reached at 3/D9000130" matches STOP_LSN=3/d9000130 of the "2018-12-03 13:07:24" incremental backup; after that point the server keeps applying archived WAL.
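The LSNs in these logs map directly onto WAL file names. Assuming the default 16 MB WAL segment size (fixed at build time in 9.6), the file name is the timeline, the part of the LSN before the slash, and the part after the slash divided by 16 MB, each written as 8 hex digits. That is why "redo starts at 3/D9000028" follows right after 0000000100000003000000D9 is restored. A sketch of this arithmetic as two hypothetical shell functions; on a running primary, the built-in SQL functions pg_xlogfile_name() and pg_xlog_location_diff() serve the same purpose.

```shell
# Convert an LSN written as "X/Y" (both hex) into one integer: X * 2^32 + Y.
lsn_to_int() {
    local hi="${1%%/*}" lo="${1#*/}"
    echo $(( (0x$hi << 32) | 0x$lo ))
}

# Name of the WAL segment file containing an LSN on a given timeline.
# Assumption: the default 16 MB (16777216-byte) WAL segment size.
lsn_to_walfile() {
    local tli="$1" hi="${2%%/*}" lo="${2#*/}"
    printf '%08X%08X%08X\n' "$tli" "$((0x$hi))" "$(( 0x$lo / 16777216 ))"
}
```

lsn_to_walfile 1 3/d9000130 prints 0000000100000003000000D9, and comparing lsn_to_int values confirms that the consistent point 3/D9000130 lies after the incremental backup's START_LSN 3/d9000028, in the same segment.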

Sometimes, starting the server after a restore fails with the following errors:

invalid primary checkpoint record
invalid secondary checkpoint record
could not locate a valid checkpoint record

In that case the only way out is to reset the WAL (xlog) and take the server out of recovery mode:

$ pg_resetxlog -f $PGDATA
$ mv $PGDATA/recovery.conf $PGDATA/recovery.done
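pg_resetxlog -f is a last resort: the PostgreSQL documentation warns that the cluster may afterwards contain inconsistent data and advises dumping it, running initdb and reloading. It is therefore worth keeping an untouched copy first (pg_resetxlog -n shows what would change without modifying anything). A sketch under these assumptions: the server is stopped, there is disk space for a full copy, and the binary is pg_resetxlog on PATH unless the hypothetical PG_RESETXLOG variable points elsewhere. The function name reset_after_failed_restore is likewise hypothetical.

```shell
# Hypothetical wrapper: keep a copy of the data directory, run the destructive
# pg_resetxlog -f, then take the cluster out of recovery mode.
# PG_RESETXLOG (hypothetical) may name a specific binary, e.g. /usr/pgsql-9.6/bin/pg_resetxlog.
reset_after_failed_restore() {
    local pgdata copy
    local resetxlog="${PG_RESETXLOG:-pg_resetxlog}"
    pgdata="${1:-}"
    pgdata="${pgdata%/}"
    if [ -z "$pgdata" ] || [ ! -d "$pgdata" ]; then
        echo "usage: reset_after_failed_restore PGDATA" >&2
        return 1
    fi
    if [ -e "$pgdata/postmaster.pid" ]; then
        echo "postmaster.pid found in $pgdata; stop PostgreSQL first" >&2
        return 1
    fi
    copy="$pgdata.before-reset.$(date +%Y%m%d%H%M%S)"
    cp -a "$pgdata" "$copy" || return 1        # -a keeps owner, mode and timestamps
    "$resetxlog" -f "$pgdata" >&2 || return 1  # its messages go to stderr
    if [ -e "$pgdata/recovery.conf" ]; then
        mv "$pgdata/recovery.conf" "$pgdata/recovery.done"
    fi
    echo "$copy"                               # where the pre-reset copy lives
}
```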

pg_rman restore: recovery to a specific point in time

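This kind of recovery is usually needed after a table, function or similar object has been dropped by mistake: you restore the backup on a separate host to a point in time just before the drop.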

$ pg_rman --help

Restore options:
  --recovery-target-time    time stamp up to which recovery will proceed
  --recovery-target-xid     transaction ID up to which recovery will proceed
  --recovery-target-inclusive whether we stop just after the recovery target
  --recovery-target-timeline  recovering into a particular timeline
  --hard-copy                 copying archivelog not symbolic link

These options have the same meaning as the corresponding parameters in recovery.conf; for details, see $PGHOME/share/recovery.conf.sample.

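--recovery-target-timeline TIMELINE
The timeline to recover to. If omitted, the timeline recorded in $PGDATA/global/pg_control is used; when that file is missing, as in the full-recovery log above, pg_rman uses the timeline of the latest full backup.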

--recovery-target-time TIMESTAMP
The point in time to recover to. If omitted, recovery continues to the end of the available WAL.

--recovery-target-xid XID
The transaction ID (XID) to recover to. If omitted, recovery continues through the last available XID.

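--recovery-target-inclusive
Whether recovery stops just after the target given above (--recovery-target-time or --recovery-target-xid), i.e. includes it, or just before it. The default is to include it (true). It is the same idea as an inclusive vs. exclusive bound in mathematics.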

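--hard-copy
Without this option, pg_rman creates symbolic links in the archive directory that point to the WAL files needed for recovery; with it, the files are copied instead.
Using --hard-copy is strongly recommended.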

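My own view: with or without --hard-copy, pg_rman modifies the files in the WAL archive, which is not good. It would be better to specify a separate directory and copy the archived WAL files needed for recovery into it.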
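pg_rman's help output above shows no such option, but the idea can be approximated by hand: copy the needed segments from the archive into a staging directory before running pg_rman restore, and point restore_command at the staging directory. The sketch below (hypothetical function stage_wal) copies every segment on the same timeline from a given first segment onward, plus any .history files; it assumes the standard 24-hex-digit WAL file names.

```shell
# Hypothetical helper (not part of pg_rman): copy the archived WAL needed for a
# restore into a separate staging directory, leaving the archive itself alone.
# Copies *.history files plus every segment on the same timeline as
# FIRST_SEGMENT whose log/segment number is at or after it.
stage_wal() {
    local src="${1:-}" dst="${2:-}" first="${3:-}" f name
    if [ ! -d "$src" ] || [ -z "$dst" ] || [ ${#first} -ne 24 ]; then
        echo "usage: stage_wal ARCHIVE_DIR STAGE_DIR FIRST_SEGMENT" >&2
        return 1
    fi
    case "$first" in *[!0-9A-F]*)
        echo "FIRST_SEGMENT must be 24 upper-case hex digits" >&2
        return 1 ;;
    esac
    mkdir -p "$dst" || return 1

    for f in "$src"/*; do
        [ -f "$f" ] || continue                      # also skips an unmatched glob
        name="${f##*/}"
        case "$name" in
            *.history) cp -p "$f" "$dst/" || return 1; continue ;;
        esac
        [ ${#name} -eq 24 ] || continue              # skips *.backup, *.partial, ...
        case "$name" in *[!0-9A-F]*) continue ;; esac
        # First 8 digits = timeline; only stage segments on the same timeline.
        [ "${name%????????????????}" = "${first%????????????????}" ] || continue
        # Last 16 digits = log id + segment number; compare them numerically.
        if [ $((0x${name#????????})) -ge $((0x${first#????????})) ]; then
            cp -p "$f" "$dst/" || return 1
        fi
    done
}
```

In the point-in-time example that follows, redo starts at 3/D5000028, i.e. in segment 0000000100000003000000D5, so one would run stage_wal /mnt/walbackup /mnt/walstage 0000000100000003000000D5 (/mnt/walstage being a hypothetical path) and set restore_command = 'cp /mnt/walstage/%f %p'.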

$ pg_rman restore --recovery-target-timeline='1' --recovery-target-time='2018-12-03 13:02:20' --hard-copy

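Because --hard-copy was used, the WAL files already present in the archive directory /mnt/walbackup were overwritten with copies, rather than being replaced by symbolic links (ln -s) pointing into the pg_rman full and incremental backups.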

$ vi $PGDATA/recovery.conf
restore_command = 'cp /mnt/walbackup/%f %p'
recovery_target_time = '2018-12-03 13:02:20'
recovery_target_timeline = '1'
recovery_target_action = 'pause' 
standby_mode = on

# systemctl start postgresql-9.6.service

# tail -f /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-12-03.csv
2018-12-03 15:31:07.051 CST,,,31159,,5c04dbba.79b7,1,,2018-12-03 15:31:06 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2018-12-03 15:31:07.074 CST,,,31162,,5c04dbbb.79ba,1,,2018-12-03 15:31:07 CST,,0,LOG,00000,"database system was interrupted; last known up at 2018-12-03 13:01:15 CST",,,,,,,,,""
2018-12-03 15:31:07.955 CST,,,31162,,5c04dbbb.79ba,2,,2018-12-03 15:31:07 CST,,0,LOG,00000,"starting point-in-time recovery to 2018-12-03 13:02:20+08",,,,,,,,,""
2018-12-03 15:31:07.992 CST,,,31162,,5c04dbbb.79ba,3,,2018-12-03 15:31:07 CST,,0,LOG,00000,"restored log file ""0000000100000003000000D5"" from archive",,,,,,,,,""
2018-12-03 15:31:08.166 CST,,,31162,,5c04dbbb.79ba,4,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"redo starts at 3/D5000028",,,,,,,,,""
2018-12-03 15:31:08.168 CST,,,31162,,5c04dbbb.79ba,5,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"consistent recovery state reached at 3/D50000F8",,,,,,,,,""
2018-12-03 15:31:08.169 CST,,,31159,,5c04dbba.79b7,2,,2018-12-03 15:31:06 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2018-12-03 15:31:08.183 CST,,,31162,,5c04dbbb.79ba,6,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D6"" from archive",,,,,,,,,""
2018-12-03 15:31:08.352 CST,,,31162,,5c04dbbb.79ba,7,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D7"" from archive",,,,,,,,,""
2018-12-03 15:31:08.569 CST,,,31162,,5c04dbbb.79ba,8,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"restored log file ""0000000100000003000000D8"" from archive",,,,,,,,,""
2018-12-03 15:31:08.786 CST,,,31162,,5c04dbbb.79ba,9,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"recovery stopping before commit of transaction 11034, time 2018-12-03 13:05:17.729183+08",,,,,,,,,""
2018-12-03 15:31:08.786 CST,,,31162,,5c04dbbb.79ba,10,,2018-12-03 15:31:07 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,,""
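Recovery stopped before transaction 11034, which is the RECOVERY_XID of the 2018-12-03 13:04:57 full backup listed above; it committed at 13:05:17, the first commit after the 13:02:20 target. When such restores are checked repeatedly, the stop point can be pulled out of the log. A sketch of a hypothetical helper, recovery_stop_xid, assuming the PostgreSQL 9.6 message wording shown above:

```shell
# Hypothetical helper: print the XID at which point-in-time recovery stopped,
# taken from the last "recovery stopping before|after commit|abort of
# transaction N" message in a PostgreSQL 9.6 server log (csvlog or stderr).
recovery_stop_xid() {
    sed -n 's/.*recovery stopping [a-z]* [a-z]* of transaction \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

Run against /var/lib/pgsql/9.6/data/pg_log/postgresql-2018-12-03.csv it should print 11034. Once the restored data has been checked, SELECT pg_is_xlog_replay_paused(); reports whether replay is still paused, and pg_xlog_replay_resume() continues it, as the log hint says.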

References:
https://github.com/ossc-db/pg_rman/tree/master
http://ossc-db.github.io/pg_rman/index.html
https://travis-ci.org/ossc-db/pg_rman

PostgreSQL physical backup with pg_rman, part 3: recovery
