Bare metal disaster recovery with Bareos

Bare metal recovery, sometimes called bare metal restore, is a backup restoration method that allows sysadmins to restore a server to a previous state without having to install an operating system or other software first. Despite the word “metal”, this restoration method can be used on both physical and virtual servers.

Here at Sysbee, we prefer to use open source software whenever possible, and backup systems are no exception. Our backup solution of choice is Bareos. If the name sounds familiar, that’s probably because we’ve recently written about how to use Bareos with S3 storage backend.

Bare metal restore usually involves booting a server into some sort of rescue console or live OS environment, where a restore agent or similar bare metal restore software is readily available to perform the restoration process. Bareos doesn’t have its own bare metal restore agent, so for this purpose we use another open source tool – Relax-and-Recover (Rear). Rear isn’t designed specifically for Bareos. It’s a standalone utility that integrates nicely with many other backup solutions, such as Bacula, CommVault Galaxy, HP DataProtector, SEP Sesam, Symantec NetBackup, EMC NetWorker (Legato), FDR/Upstream, and IBM Tivoli Storage Manager.

Essentially, Rear collects various information about your storage layout: partition tables, Linux software RAID, LVM, encrypted volumes (LUKS), DRBD, multipath disks, HP SmartArray controllers, etc. With that information, Rear is able to recreate the complete filesystem layout prior to backup restoration. If you’re wondering how Rear can do that on a server that has experienced data loss and can’t boot its operating system, the answer is – bootable recovery media.
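Rear keeps the collected layout in plain-text files under /var/lib/rear/layout/. As a rough illustration (the devices, sizes, and field values below are made up, not taken from a real system), disklayout.conf describes disks, partitions, and filesystems one line each:

```
# Illustrative sketch of /var/lib/rear/layout/disklayout.conf
# (hypothetical devices and sizes)
disk /dev/sda 81923145728 gpt
part /dev/sda 81806360576 1048576 primary boot /dev/sda1
fs /dev/sda1 / ext4 uuid=<uuid> label= options=rw,relatime
```

During recovery, Rear replays this description to recreate the partition table and filesystems before any file data is restored.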

Rear can create a variety of bootable rescue media types: ISO (ISO9660), raw disk image (.raw.gz), USB (using extlinux), and OBDR tape. The recovery media can be stored both locally and, crucially, to a remote destination via SFTP, FTP(S), HTTP(S), HFTP, and Rsync protocol to name a few.
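Shipping the image off-host only takes a couple of configuration variables. A minimal sketch, assuming an ISO output and an SFTP destination (the host and path are placeholders, not our actual setup):

```
# Hypothetical /etc/rear/local.conf: build an ISO and upload it
# to a remote SFTP server instead of keeping it only locally
OUTPUT=ISO
OUTPUT_URL=sftp://rear-user@rescue.example.com/rear-images/
```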

Configuring Relax-and-Recover

Our customers use different infrastructure providers and server types, which means that we deal with both physical and virtual servers. Some are hosted on our infrastructure, some are on-premises, and others are on DigitalOcean, Vultr, AWS, etc. To make our life easier, we opted for a universal Rear setup that gives us a near-identical disaster recovery procedure, regardless of the infrastructure provider and server type.

We configured Rear to add a GRUB 2 menu entry and to store the bootable ISO locally (in the default /var/lib/rear/output/ directory). The configuration is very simple.

/etc/rear/site.conf

BACKUP=BAREOS
BAREOS_FILESET=LinuxAll
GRUB_RESCUE=1
BAREOS_RECOVERY_MODE=manual

The BACKUP variable instructs Rear to integrate with the Bareos backup system. BAREOS_FILESET, as the name suggests, tells Rear to use the Bareos fileset named LinuxAll. GRUB_RESCUE tells Rear to add a GRUB 2 menu entry for easy access to the recovery console, and BAREOS_RECOVERY_MODE instructs Rear that we’ll manually select the backup restore point during the bare metal restore process. If the last parameter is omitted, Rear automatically triggers restoration of the latest Bareos backup for that particular host, which in some cases is not ideal.
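With the configuration in place, the rescue image (and the GRUB 2 menu entry) is created by running Rear’s mkrescue workflow; something along these lines:

```
# Build/refresh the bootable rescue image; -v prints verbose progress
rear -v mkrescue

# The resulting ISO ends up in the default output directory
ls -lh /var/lib/rear/output/
```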

Sometimes partitions and filesystems change. For example, a partition or logical volume may be resized to ensure enough capacity for user data. It’s therefore important to regularly rebuild the recovery media, so that it contains the up-to-date information required for recreating filesystems. Thankfully, Rear offers a dead simple solution for that: a cron job that checks whether the storage configuration has changed and, if it has, rebuilds the recovery image. Here’s an example cron job that runs every day at 20:00:

0 20 * * * root rear checklayout || rear mkrescue

OK, so we’ve enabled GRUB integration and stored the recovery media locally. If the server fails to boot but we’re able to reach GRUB, we can still access the Rear recovery console by choosing the appropriate boot menu item.
But what if the server failure is more serious and we can’t reach GRUB at all? How are we going to get to the recovery ISO image? Well, we have Bareos for that. Since the recovery image is saved locally, it’s backed up with Bareos just like any other file on the system.

In case of serious failure, we can get to the recovery ISO image by restoring it to an alternate location. We can then use the ISO to boot the server into recovery mode. Depending on the server and infrastructure provider, we can mount the image using a remote management utility such as IPMI or iDRAC, attach the image directly to the VM, or boot the recovery image via the iPXE protocol.
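Restoring just the ISO to another machine can be done with a single bconsole restore command. A hypothetical example (the client names and the where path are placeholders, and the file= path assumes Rear’s default output location and ISO naming):

```
# In bconsole, restore only the recovery ISO from the failed client's
# latest backup to a directory on another machine running a Bareos FD
* restore client=sysbee-reartest.example restoreclient=admin-host.example file=/var/lib/rear/output/rear-sysbee-reartest.iso where=/tmp/rear-iso current
```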

Since our clients mostly use virtual machines, we tend to use the iPXE method the most. We found it to be flexible and applicable across different infrastructure providers.
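For iPXE boot, the kernel and the Rear initramfs can be pulled out of the restored ISO and served over HTTP; a hypothetical iPXE script (all URLs and file names are placeholders) then boots the rescue system:

```
#!ipxe
# Hypothetical iPXE script: fetch the kernel and initramfs extracted
# from the Rear recovery ISO and published on an internal HTTP server
kernel http://boot.example.com/rear/sysbee-reartest/kernel
initrd http://boot.example.com/rear/sysbee-reartest/initrd.cgz
boot
```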

Performing the bare metal restore

Once you boot into the Rear console, the bare metal restore process is pretty straightforward. The restore process is initiated by executing the rear recover command. Here’s an example output:

RESCUE sysbee-reartest:~ # rear recover
Relax-and-Recover 2.6 / Git
Running rear recover (PID 508)
Using log file: /var/log/rear/rear-sysbee-reartest.log
Running workflow recover within the ReaR rescue/recovery system
Will do driver migration (recreating initramfs/initrd)
Comparing disks
Device sda has expected (same) size 81923145728 bytes (will be used for 'recover')
Disk configuration looks identical
Proceed with 'recover' (yes) otherwise manual disk layout configuration is enforced
(default 'yes' timeout 30 seconds)

User confirmed to proceed with 'recover'
Start system layout restoration.
Disk '/dev/sda': creating 'gpt' partition table
Disk '/dev/sda': creating partition number 1 with name ''sda1''
Disk '/dev/sda': creating dummy partition number 2 with name 'dummy2' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 3 with name 'dummy3' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 4 with name 'dummy4' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 5 with name 'dummy5' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 6 with name 'dummy6' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 7 with name 'dummy7' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 8 with name 'dummy8' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 9 with name 'dummy9' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 10 with name 'dummy10' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 11 with name 'dummy11' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 12 with name 'dummy12' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 13 with name 'dummy13' (will be deleted later)
Disk '/dev/sda': creating partition number 14 with name ''sda14''
Disk '/dev/sda': creating partition number 15 with name ''sda15''
Disk '/dev/sda': deleting dummy partition number 2
Disk '/dev/sda': deleting dummy partition number 3
Disk '/dev/sda': deleting dummy partition number 4
Disk '/dev/sda': deleting dummy partition number 5
Disk '/dev/sda': deleting dummy partition number 6
Disk '/dev/sda': deleting dummy partition number 7
Disk '/dev/sda': deleting dummy partition number 8
Disk '/dev/sda': deleting dummy partition number 9
Disk '/dev/sda': deleting dummy partition number 10
Disk '/dev/sda': deleting dummy partition number 11
Disk '/dev/sda': deleting dummy partition number 12
Disk '/dev/sda': deleting dummy partition number 13
Disk '/dev/sda': resizing partition number 14 to original size
Creating filesystem of type ext4 with mount point / on /dev/sda1.
Mounting filesystem /
Creating filesystem of type vfat with mount point /boot/efi on /dev/sda15.
Mounting filesystem /boot/efi
Disk layout created.

Once the disk layout is created, Rear presents us with bconsole access because we opted for the manual Bareos restore method:

The system is now ready for a restore via Bareos. bconsole will be started for
you to restore the required files. It's assumed that you know what is necessary
to restore - typically it will be a full backup.

Do not exit 'bconsole' until all files are restored

WARNING: The new root is mounted under '/mnt/local'.

Press ENTER to start bconsole

Note: when you press Enter to start bconsole, you may hit a bug where the shell output is completely redirected to the /var/log/rear/rear-$HOSTNAME.log file. This means you’ll see neither your input nor your output.
To work around this bug, after you press Enter to start a “blind” bconsole session, open a second SSH session in another terminal window and start bconsole there. Alternatively, restore the backup using the Bareos web UI, because at this point bareos-fd is already running in the background.
It’s important to leave the first SSH session open the whole time.

Here’s an example of manual backup restore using bconsole:

RESCUE sysbee-reartest:~ # bconsole
Connecting to Director bareos-dir:9101
 Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
1000 OK: bareos-dir Version: 22.1.0 (13 June 2023)
Bareos subscription release.
Support available on https://www.bareos.com/support/
You are logged in as: sysbee-reartest.example

Enter a period (.) to cancel a command.
*restore
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"

First you select one or more JobIds that contain files
to be restored. You will be presented several methods
of specifying the JobIds. Then you will be allowed to
select which files from those JobIds are to be restored.

To select the JobIds, you have the following choices:
 1: List last 20 Jobs run
 2: List Jobs where a given File is saved
 3: Enter list of comma separated JobIds to select
 4: Enter SQL list command
 5: Select the most recent backup for a client
 6: Select backup for a client before a specified time
 7: Enter a list of files to restore
 8: Enter a list of files to restore before a specified time
 9: Find the JobIds of the most recent backup for a client
10: Find the JobIds for a backup for a client before a specified time
11: Enter a list of directories to restore for found JobIds
12: Select full restore to a specified Job date
13: Cancel
Select item:  (1-13): 9
Automatically selected Client: sysbee-reartest.example
Automatically selected FileSet: SysbeeLinuxAll
+-------+-------+----------+---------------+---------------------+---------------------------------------+
| jobid | level | jobfiles | jobbytes      | starttime           | volumename                            |
+-------+-------+----------+---------------+---------------------+---------------------------------------+
| 28154 | F     |   66,378 | 1,610,109,258 | 2023-10-11 13:29:32 | sysbee-reartest.example-cons-1806     |
| 28263 | I     |   66,378 |             0 | 2023-10-12 22:13:50 | sysbee-reartest.example-inc-1814      |
+-------+-------+----------+---------------+---------------------+---------------------------------------+
To select the JobIds, you have the following choices:
 1: List last 20 Jobs run
 2: List Jobs where a given File is saved
 3: Enter list of comma separated JobIds to select
 4: Enter SQL list command
 5: Select the most recent backup for a client
 6: Select backup for a client before a specified time
 7: Enter a list of files to restore
 8: Enter a list of files to restore before a specified time
 9: Find the JobIds of the most recent backup for a client
10: Find the JobIds for a backup for a client before a specified time
11: Enter a list of directories to restore for found JobIds
12: Select full restore to a specified Job date
13: Cancel
Select item:  (1-13): 3
Enter JobId(s), comma separated, to restore: 28154
You have selected the following JobId: 28154

Building directory tree for JobId(s) 28154 ...  ++++++++++++++++++++++++++++++++++++++++++++
59,126 files inserted into the tree.

You are now entering file selection mode where you add (mark) and
remove (unmark) files to be restored. No files are initially added, unless
you used the "all" keyword on the command line.
Enter "done" to leave this mode.

cwd is: /
$ mark /*
66,378 files marked.

$ done
Bootstrap records written to /var/lib/bareos/bareos-dir.restore.96.bsr

The job will require the following
   Volume(s)                 Storage(s)                SD Device(s)
===========================================================================

    sysbee-reartest.example-cons-1806 File                      backup_dir

Volumes marked with "*" are online.

66,378 files selected to be restored.

Using Catalog "MyCatalog"
Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bareos/bareos-dir.restore.96.bsr
Where:           /tmp/bareos-restores
Replace:         Always
FileSet:         LinuxAll
Backup Client:   sysbee-reartest.example
Restore Client:  sysbee-reartest.example
Format:          Native
Storage:         File
When:            2023-10-13 08:55:55
Catalog:         MyCatalog
Priority:        5
Plugin Options:  *None*
OK to run? (yes/mod/no): mod
Parameters to modify:
 1: Level
 2: Storage
 3: Job
 4: FileSet
 5: Restore Client
 6: Backup Format
 7: When
 8: Priority
 9: Bootstrap
10: Where
11: File Relocation
12: Replace
13: JobId
14: Plugin Options
Select parameter to modify (1-14): 10
Please enter the full path prefix for restore (/ for none): /mnt/local
Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bareos/bareos-dir.restore.96.bsr
Where:           /mnt/local
Replace:         Always
FileSet:         LinuxAll
Backup Client:   sysbee-reartest.example
Restore Client:  sysbee-reartest.example
Format:          Native
Storage:         File
When:            2023-10-13 08:55:55
Catalog:         MyCatalog
Priority:        5
Plugin Options:  *None*
Job queued. JobId=28296:
*

Important: when specifying multiple JobIds, enter them in chronological order (from the lowest to the highest number), otherwise the merged directory tree will not be consistent!

Note: Before moving to the next step, wait for the backup restore job to finish. You can track restore progress on the backup server using the Bareos web UI.

Once the backup is restored, return to the first SSH session, where you are still in the “blind” bconsole session. Press Ctrl+D to leave the “blind” bconsole session, and then press Ctrl+D again to exit the Rear shell (recognizable by the rear> prompt). This is important, because additional restore tasks (installation of the GRUB 2 bootloader, etc.) run automatically afterwards.

Once Rear has completed the recovery, reboot the server and your system should be fully recovered. 🤞

Did you know…

All our managed cloud servers come with Bareos backup at no additional cost! The servers come with 14 daily and 8 bi-weekly restore points that allow you to restore all or just a portion of your data from up to two months back.

We are serious about data safety and disaster recovery. That’s why backups are encrypted in transit and at rest, and distributed to multiple EU-based regions, to ensure fast disaster recovery, even in the event of a major catastrophe.
