Table of Contents
前言
在PaaS平台上,可能会部署各种数据库给业务使用,对于生产环境来说,除了尽可能保证这些服务的正常运行以及高可用外,还需要进行备份还原来保证极端情况下的数据恢复
本文将依次介绍如下数据库提供的原生备份还原方案:
- PostgreSQL
- MongoDB
- RabbitMQ
- Redis
- MariaDB
PostgreSQL备份还原方案
PostgreSQL
是流行的关系型数据库,本文先介绍PostgreSQL
的三种备份还原方案,然后针对SQL dump
给出备份还原(bur
)步骤
PostgreSQL Backup and Restore
PostgreSQL
官方给出了三种bur
方案:
- SQL dump
- File system level backup
- Continuous archiving
各有优缺点,对比如下:
SQL dump
The idea behind this dump method is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. PostgreSQL provides the utility program pg_dump for this purpose
SQL dump
方案原理:生成构成数据库的SQL文件,利用这些SQL语句可以还原出数据库
- Pros
- 1、pg_dump is a regular PostgreSQL client application (albeit a particularly clever one). This means that you can perform this backup procedure from any remote host that has access to the database.
- 2、pg_dump’s output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific.
- 3、pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server.
- 4、Dumps created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)
- Cons
- pg_dump does not need to dump the contents of indexes for example, just the commands to recreate them. However, taking a file system backup might be faster.
简单实用,适用于PostgreSQL
数据量小的情况下
File system level backup
An alternative backup strategy is to directly copy the files that PostgreSQL uses to store the data in the database
File system level backup
方案原理:直接拷贝PostgreSQL
数据文件,然后进行还原
- Pros
- Taking a file system backup might be faster than SQL dump
- Cons(There are two restrictions, however, which make this method impractical, or at least inferior to the pg_dump method)
- The database server must be shut down in order to get a usable backup. Half-way measures such as disallowing all connections will not work (in part because tar and similar tools do not take an atomic snapshot of the state of the file system, but also because of internal buffering within the server). Information about stopping the server can be found in Section 18.5. Needless to say, you also need to shut down the server before restoring the data.
- If you have dug into the details of the file system layout of the database, you might be tempted to try to back up or restore only certain individual tables or databases from their respective files or directories. This will not work because the information contained in these files is not usable without the commit log files, pg_xact/*, which contain the commit status of all transactions. A table file is only usable with this information. Of course it is also impossible to restore only a table and the associated pg_xact data because that would render all other tables in the database cluster useless. So file system backups only work for complete backup and restoration of an entire database cluster.
- Note that a file system backup will typically be larger than an SQL dump. (pg_dump does not need to dump the contents of indexes for example, just the commands to recreate them.) However, taking a file system backup might be faster.
不建议使用
Continuous Archiving and Point-in-Time Recovery (PITR)
At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ subdirectory of the cluster’s data directory. The log records every change made to the database’s data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by “replaying” the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state.
Continuous Archiving and Point-in-Time Recovery (PITR)
方案原理:利用WAL
的文件系统备份,对这些WAL
文件应答进行还原
- Pros
- We do not need a perfectly consistent file system backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we do not need a file system snapshot capability, just tar or a similar archiving tool.
- Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently.
- It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
- If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point we can bring up the second machine and it will have a nearly-current copy of the database.
- As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset. Also, it requires a lot of archival storage: the base backup might be bulky, and a busy system will generate many megabytes of WAL traffic that have to be archived. Still, it is the preferred backup technique in many situations where high reliability is needed.
- Cons
- This approach is more complex to administer than either of the previous approaches
复杂但是有效,适用于PostgreSQL
数据量大的情况下,不需要频繁做全量备份,只需要做增量备份,减少了备份时间
SQL dump bur步骤
SQL dump backup
pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:
pg_dumpall > dumpfile
SQL dump restore
The resulting dump can be restored with psql:
psql -f dumpfile postgres
Refs
MongoDB备份还原方案
When deploying MongoDB in production, you should have a strategy for capturing and restoring backups in the case of data loss events.
MongoDB Backup and Restore Methods
MongoDB
官方给出了四种bur
方案:
- Back Up with Atlas
MongoDB Atlas, the official MongoDB cloud service
- Back Up with MongoDB Cloud Manager or Ops Manager
MongoDB Cloud Manager is a hosted back up, monitoring, and automation service for MongoDB. MongoDB Cloud Manager supports backing up and restoring MongoDB replica sets and sharded clusters from a graphical user interface.
You can create a backup of a MongoDB deployment by making a copy of MongoDB’s underlying data files.
These filesystem snapshots, or “block-level” backup methods, use system level tools to create copies of the device that holds MongoDB’s data files. These methods complete quickly and work reliably, but require additional system configuration outside of MongoDB.
mongodump reads data from a MongoDB database and creates high fidelity BSON files which the mongorestore tool can use to populate a MongoDB database. mongodump and mongorestore are simple and efficient tools for backing up and restoring small MongoDB deployments, but are not ideal for capturing backups of larger systems.(For resilient and non-disruptive backups, use a file system or block-level disk snapshot function)
When connected to a MongoDB instance, mongodump can adversely affect mongod performance. If your data is larger than system memory, the queries will push the working set out of memory, causing page faults.
Use these tools for backups if other backup methods, such as MongoDB Cloud Manager or file system snapshots are unavailable.
NOTE
mongodump and mongorestore cannot be part of a backup strategy for 4.2+ sharded clusters that have sharded transactions in progress as these tools cannot guarantee a atomicity guarantees of data across the shards.
For 4.2+ sharded clusters with in-progress sharded transactions, for coordinated backup and restore processes that maintain the atomicity guarantees of transactions across shards, see:
For MongoDB 4.0 and earlier deployments, refer to the corresponding versions of the manual. For example:
- https://docs.mongodb.com/v4.0
- https://docs.mongodb.com/v3.6
- https://docs.mongodb.com/v3.4
Conclusion
- 1、
MongoDB Atlas
与MongoDB Cloud Manager or Ops Manager
需要特定使用场景,不通用,这里舍弃 - 2、
Underlying Data Files
是官方推荐的bur
方案,适用于大数据量的MongoDB
备份与还原,但是需要额外的系统配置,比如:LVM - 3、
mongodump
简单有效,适用于MongoDB
数据量较小的情况;另外这种方式会影响mongod
性能,而且数据量越大影响越大;mongodump
不能用于Sharded Cluster
,可用于Replica-Set
(Difference between Sharding And Replication on MongoDB)
这里由于我们的MongoDB
数据量较小,最多100M,所以采用mongodump
方案进行备份与还原
Back Up with mongodump
mongodump
The mongodump utility backs up data by connecting to a running mongod.
The utility can create a backup for an entire server, database or collection, or can use a query to backup just part of a collection.
#mongodump --host=mongodb1.example.net --port=3017 --username=user --password="pass" --out=/opt/backup/mongodump-2013-10-24
mongorestore
The mongorestore utility restores a binary backup created by mongodump. By default, mongorestore looks for a database backup in the dump/ directory.
#mongorestore --host=mongodb1.example.net --port=3017 --username=user --authenticationDatabase=admin /opt/backup/mongodump-2013-10-24
Refs
- Back Up and Restore with MongoDB Tools
- MongoDB Backup Methods
- Back Up and Restore with Filesystem Snapshots
- difference-between-sharding-and-replication-on-mongodb
RabbitMQ备份还原方案
RabbitMQ
包含两种数据类型:Definitions (Topology)+Messages:
- Definitions (Topology):Nodes and clusters store information that can be thought of schema, metadata or topology. Users, vhosts, queues, exchanges, bindings, runtime parameters all fall into this category.
- Messages:Each node has its own data directory and stores messages for the queues that have their master hosted on that node. Messages can be replicated between nodes using queue mirroring. Messages are stored in subdirectories of the node’s data directory.
它们的生命周期不同:Definitions
通常是静态的,生命周期长;而messages
通常是动态的,频繁地从生产者流向消费者,生命周期短:
When performing a backup, first step is deciding whether to back up only definitions or the message store as well. Because messages are often short-lived and possibly transient, backing them up from under a running node is highly discouraged and can lead to an inconsistent snapshot of the data. Definitions can only be backed up from a running node.
本文针对Definitions
备份还原进行叙述
Definitions 备份
Definitions
备份一般有两种方式:
- 1、definition export/import: 将
Definitions
导出为JSON file - 2、backed up manually: 直接copy
Definitions
数据目录
官方推荐采用第一种方式,也即:definition export/import
,而这种方式也有三种具体实现:
- 1、There’s a definitions pane on the Overview page
- 2、rabbitmqadmin provides a command that exports definitions
- 3、The GET /api/definitions API endpoint can be invoked directly
这里我们用API
方式进行备份:
curl -u xxx:xxx http://xxx/api/definitions -o definitions.json
Definitions 还原
Definitions
还原和备份一样,也有两种方式,这里用definition export/import
方式:
- 1、There’s a definitions pane on the Overview page
- 2、rabbitmqadmin provides a command that imports definitions
- 3、The POST /api/definitions API endpoint can be invoked directly
这里我们用API
方式进行备份:
curl -H "Content-Type: application/json" -X POST -u xxx:xxx http://xxx/api/definitions -d @definitions.json
Refs
Redis备份还原方案
Before starting this section, make sure to read the following sentence: Make Sure to Backup Your Database. Disks break, instances in the cloud disappear, and so forth: no backups means huge risk of data disappearing into /dev/null.
Backing up Redis data
Redis is very data backup friendly since you can copy RDB files while the database is running: the RDB is never modified once produced, and while it gets produced it uses a temporary name and is renamed into its final destination atomically using rename(2) only when the new snapshot is complete.
This means that copying the RDB file is completely safe while the server is running. This is what we suggest:
- Create a cron job in your server creating hourly snapshots of the RDB file in one directory, and daily snapshots in a different directory.
- Every time the cron script runs, make sure to call the find command to make sure too old snapshots are deleted: for instance you can take hourly snapshots for the latest 48 * hours, and daily snapshots for one or two months. Make sure to name the snapshots with data and time information.
- At least one time every day make sure to transfer an RDB snapshot outside your data center or at least outside the physical machine running your Redis instance.
按照官方推荐的方式对Redis
进行备份:
# generate dump.rdb
$redis-cli
127.0.0.1:6379> auth xxx
OK
127.0.0.1:6379> config get dir
1) "dir"
2) "/var/lib/redis"
127.0.0.1:6379> bgsave
Disaster recovery
Disaster recovery in the context of Redis is basically the same story as backups, plus the ability to transfer those backups in many different external data centers. This way data is secured even in the case of some catastrophic event affecting the main data center where Redis is running and producing its snapshots.
Since many Redis users are in the startup scene and thus don’t have plenty of money to spend we’ll review the most interesting disaster recovery techniques that don’t have too high costs.
- Amazon S3 and other similar services are a good way for implementing your disaster recovery system. Simply transfer your daily or hourly RDB snapshot to S3 in an encrypted form. You can encrypt your data using gpg -c (in symmetric encryption mode). Make sure to store your password in many different safe places (for instance give a copy to the most important people of your organization). It is recommended to use multiple storage services for improved data safety.
- Transfer your snapshots using SCP (part of SSH) to far servers. This is a fairly simple and safe route: get a small VPS in a place that is very far from you, install ssh there, and generate an ssh client key without passphrase, then add it in the authorized_keys file of your small VPS. You are ready to transfer backups in an automated fashion. Get at least two VPS in two different providers for best results.
按照官方推荐的方式对Redis
进行还原,分两种场景:
- 一、AOF = no
# In case it is not AOF mode. ('no'), the process is quite straightforward. First, just stop Redis server (new Redis):
$ /etc/init.d/redis-server stop
# Then, remove the current dumb.rdb file (if there is one) or rename it to dump.rdb.old
$ mv /var/lib/redis/dump.rdb /var/lib/redis/dump.rdb.back
# After confirming there is no dump.rdb file name in the folder, copy the good backup you took earlier into the /var/lib/redis folder (or which folder Redis is using in your env).
$ cp /backup/redis/dump.xxxxxx.rdb /var/lib/redis/dump.rdb
# Remember to set it to be owned by Redis user/group
$ chown redis:redis dump.rdb
# Now, just start the Redis server.
$ /etc/init.d/redis-server start
- 二、AOF = yes
# The first thing to do is the same as previous case, just stop the Redis server:
$ /etc/init.d/redis-server stop
# Then, remove the current dumb.rdb file (if there is one) or rename it to dump.rdb.old:
$ mv /var/lib/redis/dump.rdb /var/lib/redis/dump.rdb.old
$ mv /var/lib/redis/appendonly.aof /var/lib/redis/appendonly.aof.old
# Copy the good backup file in and correct its permission:
$ cp /backup/redis/dump.xxxxxx.rdb /var/lib/redis/dump.rdb
$ chown redis:redis /var/lib/redis/dump.rdb
# Next the important part is to disable AOF by editing /etc/redis/redis.conf file, set appendonly as "no":
#Verify appendonlu is set to "no":
$cat /etc/redis/redis.conf |grep 'appendonly '|cut -d' ' -f2
#Next start the Redis server and run the following command to create new appendonly.aof file:
$ /etc/init.d/redis-server start
$ redis-cli bgrewriteaof
# Check the progress (0 - done, 1 - not yet):
$ redis-cli info | grep aof_rewrite_in_progress
# You should see a new appendonly.aof file.
# Next, stop the server:
$ /etc/init.d/redis-server stop
# After it finished, enable AOF again by changing appendonly in /etc/redis/redis.conf file to yes
# Then start the Redis server again:
$ /etc/init.d/redis-server start
Addition
上述方案适用于单实例Redis
。如果想要备份还原Redis
集群,则需要另外研究解决方案,会更复杂很多:参考How to efficiently backup and restore a Redis cluster?
Refs
- persistence
- how-to-back-up-and-restore-your-redis-data-on-ubuntu-14-04
- How-to-Backup-and-Restore-Open-Source-Redis
- Restoring redis cluster snapshot rdb files on new setup
MariaDB备份还原方案
MariaDB History
MariaDB is a backward compatible, binary drop-in replacement of MySQL. What this means is:
- Data and table definition files (.frm) files are binary compatible.
- All client APIs, protocols and structs are identical.
- All filenames, binaries, paths, ports, sockets, and etc… should be the same.
- All MySQL connectors work unchanged with MariaDB.
- The mysql-client package also works with MariaDB server.
In most common practical scenarios, MariaDB version 5.x.y will work exactly like MySQL 5.x.y, MariaDB follows the version of MySQL, i.e. it’s version number is used to indicate with which MySQL version it’s compatible.
MariaDB originated as a fork of MySQL by Michael “Monty” Widenius, one of the original developers of MySQL and co-founder of MySQL Ab. The MariaDB Foundation acts as the custodian of MariaDB.
The main motivation behind MariaDB was to provide a floss version of MySQL, in case Oracle goes all corporate with MySQL. It’s worth noting that Monty was vocal against MySQL acquisition (via Sun’s acquisition) by Oracle.
Although MariaDB is supposed to be compatible with MySQL, for one reason or the other there are quite a few compatibility issues and different features:
- MariaDB includes all popular open source engines,
- MariaDB claims several speed improvements over MySQL, and
- there are a few new floss extensions that MySQL lacks
Finally, the name comes from Monty’s daughter Maria (the other one being My), as MySQL is now a registered trademark of Oracle Corporation.
简单的说,MariaDB 是 MySQL fork出来的一个分支,旨在开源。MariaDB兼容了MySQL很多方面,当然也对MySQL进行了优化,开发了许多特性,这里不详细展开,具体见MariaDB vs MySQL
Backup and Restore
官方提供两种备份机制:Logical vs Physical,如下:
- Logical: Logical backups consist of the SQL statements necessary to restore the data, such as CREATE DATABASE, CREATE TABLE and INSERT.
- Physical: Physical backups are performed by copying the individual data files or directories.
优缺点如下:
- logical backups are more flexible, as the data can be restored on other hardware configurations, MariaDB versions or even on another DBMS, while physical backups cannot be imported on significantly different hardware, a different DBMS, or potentially even a different MariaDB version.
- logical backups can be performed at the level of database and table, while physical databases are the level of directories and files. In the MyISAM and InnoDB storage engines, each table has an equivalent set of files. (In versions prior to MariaDB 5.5, by default a number of InnoDB tables are stored in the same file, in which case it is not possible to backup by table. See innodb_file_per_table.)
- logical backups are larger in size than the equivalent physical backup.
- logical backups takes more time to both backup and restore than the equivalent physical backup.
- log files and configuration files are not part of a logical backup
也即:logical 更加灵活且可控,但是相比 Physical 更加占用空间和耗时
Logical Backup and Restore
对于MariaDB数据量较小的情况我们采用逻辑备份方式:
mysqldump performs a logical backup. It is the most flexible way to perform a backup and restore, and a good choice when the data size is relatively small.
For large datasets, the backup file can be large, and the restore time lengthy.
mysqldump dumps the data into SQL format (it can also dump into other formats, such as CSV or XML) which can then easily be imported into another database. The data can be imported into other versions of MariaDB, MySQL, or even another DBMS entirely, assuming there are no version or DBMS-specific statements in the dump.
mysqldump dumps triggers along with tables, as these are part of the table definition. However, stored procedures, views, and events are not, and need extra parameters to be recreated explicitly (for example, –routines and –events). Procedures and functions are however also part of the system tables (for example mysql.proc).
这里我们介绍mysqldump
工具进行Logical Backup and Restore:
# backup
$ mysqldump -uxxx -pxxx db_name > backup-file.sql
# restore-phase-1(drop db)
$ mysqladmin -uxxx -pxxx -f drop db_name
Database "xxx" dropped
# restore-phase-2(recreate db)
$ mysqladmin -uxxx -pxxx create db_name
# restore-phase-3(restore db)
$ mysql -uxxx -pxxx db_name < backup-file.sql
具体可参考mysqldump
Physical Backup and Restore
对于MariaDB数据量较大的情况我们采用物理备份方式:
Mariabackup is an open source tool provided by MariaDB for performing physical online backups of InnoDB, Aria and MyISAM tables. For InnoDB, “hot online” backups are possible. It was originally forked from Percona XtraBackup 2.3.8. It is available on Linux and Windows.
这里我们介绍Mariabackup工具进行Physical Backup and Restore:
1、Mariabackup Backup
# install Mariabackup
$ yum install MariaDB-backup
# Backing up the Database Server
$ mariabackup --backup \
--target-dir=/var/mariadb/backup/ \
--user=mariabackup --password=mypassword
$ ls /var/mariadb/backup/
aria_log.0000001 mysql xtrabackup_checkpoints
aria_log_control performance_schema xtrabackup_info
backup-my.cnf test xtrabackup_logfile
ibdata1 xtrabackup_binlog_info
2、Mariabackup Restore
- Preparing the Backup
Preparing the Backup(Before you can restore from a backup, you first need to prepare it to make the data files consistent.)
$ mariabackup --prepare \
--target-dir=/var/mariadb/backup/
- Restoring the Backup
- First, stop the MariaDB Server process.
- Then, ensure that the datadir is empty.
- Then, run Mariabackup with one of the options mentioned above:
$ mariabackup --copy-back \ --target-dir=/var/mariadb/backup/
- Then, you may need to fix the file permissions.
When Mariabackup restores a database, it preserves the file and directory privileges of the backup. However, it writes the files to disk as the user and group restoring the database. As such, after restoring a backup, you may need to adjust the owner of the data directory to match the user and group for the MariaDB Server, typically mysql for both. For example, to recursively change ownership of the files to the mysql user and group, you could execute:
$ chown -R mysql:mysql /var/lib/mysql/
- Finally, start the MariaDB Server process.
具体可参考full-backup-and-restore-with-mariabackup
Refs
总结
数据库的备份还原是生产环境需要满足的特性,本文依次介绍了Postgres,MongoDB,RabbitMQ,Redis以及MariaDB数据库的原生备份还原方案,希望对项目实际落地环境数据库的运维有所助益