Bare metal disaster recovery with Bareos
Bare metal recovery, sometimes called bare metal restore, is a backup restoration method that allows sysadmins to restore a server to a previous state without having to install an operating system or certain software first. Even though the word metal is mentioned, this backup restoration method can be used on both physical and virtual servers.
Here at Sysbee, we prefer to use open source software whenever possible, and backup systems are no exception. Our backup solution of choice is Bareos. If the name sounds familiar, that’s probably because we’ve recently written about how to use Bareos with an S3 storage backend.
The bare metal restore method usually involves booting a server into some sort of rescue console or live OS environment, where a restore agent or similar bare metal restore software is readily available to perform the restoration process. Bareos doesn’t have its own bare metal restore agent. For this purpose, we use another open source tool – Relax-and-Recover (Rear). Rear isn’t designed specifically for Bareos. It’s a standalone utility that integrates nicely with many other backup solutions, such as Bacula, CommVault Galaxy, HP DataProtector, SEP Sesam, Symantec NetBackup, EMC NetWorker (Legato), FDR/Upstream, and IBM Tivoli Storage Manager.
Essentially, Rear collects various information about your storage layout, such as partition tables, Linux software RAID, LVM, encrypted volumes (LUKS), DRBD, multipath disks, HP SmartArray controllers, etc. With that information, Rear is able to recreate the complete filesystem layout prior to backup restoration. If you’re wondering how Rear is able to do that in a scenario where the server has experienced data loss and is unable to boot the operating system, the answer is – bootable recovery media.
Rear can create a variety of bootable rescue media types: ISO (ISO9660), raw disk image (.raw.gz), USB (using extlinux), and OBDR tape. The recovery media can be stored locally and, crucially, on a remote destination via SFTP, FTP(S), HTTP(S), HFTP, and rsync, to name a few.
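To illustrate the remote option, here is a rough sketch of a site.conf fragment that builds an ISO and uploads it over SFTP. OUTPUT and OUTPUT_URL are regular Rear settings, but the user, host name, and path below are placeholders rather than anything from our setup, and the exact URL syntax should be checked against the Rear version you run.

```shell
# /etc/rear/site.conf (sketch; user, host, and path are hypothetical)
# Build the rescue system as a bootable ISO image
OUTPUT=ISO
# Copy the resulting ISO to a remote server instead of keeping it only locally
OUTPUT_URL=sftp://rear@backup.example.com/srv/rear/
```

We don’t use this variant ourselves; it’s shown only to make the list of destinations above more concrete.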
Configuring Relax-and-Recover
Our customers use different infrastructure providers and server types, which means that we deal with both physical and virtual servers. Some are hosted on our infrastructure, some are on-premises, and others are on DigitalOcean, Vultr, AWS, etc. To make our life easier, we opted for a universal Rear setup that allows us to follow a near-identical disaster recovery procedure, regardless of the infrastructure provider and server type.
We decided to have Rear add a GRUB 2 menu entry and store the bootable ISO locally (in the default /var/lib/rear/output/ directory). The configuration is very simple.
/etc/rear/site.conf
BACKUP=BAREOS
BAREOS_FILESET=LinuxAll
GRUB_RESCUE=1
BAREOS_RECOVERY_MODE=manual
The BACKUP variable instructs Rear to integrate with the Bareos backup system. BAREOS_FILESET, as the name suggests, tells Rear to use the Bareos fileset named LinuxAll. GRUB_RESCUE tells Rear to add a GRUB 2 menu entry for easy access to the recovery console, and BAREOS_RECOVERY_MODE tells Rear that we’ll manually select the restore point during the bare metal restore process. If the last parameter is omitted, Rear automatically triggers restoration of the latest Bareos backup for that particular host, which in some cases is not ideal.
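With the configuration in place, the rescue image can be built with rear -v mkrescue. As a small sketch, the helper below prints the newest ISO in the output directory, so you can confirm that a build actually produced one. The function name is our own invention for illustration, and the default path is the one mentioned above.

```shell
# Print the newest ISO found in a Rear output directory.
# Returns non-zero (and prints nothing) if the directory holds no ISO.
newest_rescue_iso() {
    dir="${1:-/var/lib/rear/output}"
    # ls -t sorts by modification time, newest first
    iso=$(ls -t "$dir"/*.iso 2>/dev/null | head -n 1)
    [ -n "$iso" ] || return 1
    printf '%s\n' "$iso"
}
```

A typical use would be rear -v mkrescue && newest_rescue_iso, run as root on the server being protected.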
Sometimes partitions and filesystems change. For example, a partition or logical volume may be resized to ensure enough capacity for user data. It’s therefore important to regularly rebuild the recovery media, so that it contains the up-to-date information required for recreating filesystems. Thankfully, Rear offers a dead simple solution for that – a cron job that checks whether the storage configuration has changed. If it has, the recovery image is rebuilt. Here’s an example cron job that runs every day at 20:00:
0 20 * * * root rear checklayout || rear mkrescue
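The same logic can be spelled out as a small wrapper, which is handy if you want a log line explaining why an image was or wasn’t rebuilt. This is only a sketch: the function name and the REAR override variable are ours, and it relies on rear checklayout returning a non-zero exit code when the layout has changed, which is exactly what the cron one-liner depends on.

```shell
# Rebuild the rescue image only when "rear checklayout" reports a change.
# REAR may be overridden (e.g. for a dry run); it defaults to the real binary.
refresh_rescue_image() {
    rear_bin="${REAR:-rear}"
    if "$rear_bin" checklayout; then
        echo "layout unchanged, keeping existing rescue image"
    else
        echo "layout changed, rebuilding rescue image"
        "$rear_bin" mkrescue
    fi
}
```

Saved as a script that calls refresh_rescue_image, it could replace the rear checklayout || rear mkrescue part of the cron entry.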
OK, so we’ve enabled GRUB integration and stored the recovery media locally. If the server fails to boot, but we’re able to reach GRUB, we can still access the Rear recovery console by choosing the appropriate boot menu item.
But what if the server failure is more serious and we can’t reach GRUB at all? How are we going to get to the recovery ISO image? Well, we have Bareos for that. Since the recovery image is saved locally, it’s backed up with Bareos just like any other file on the system.
In case of a serious failure, we can get to the recovery ISO image by restoring it to an alternate location. We can then use the ISO to boot the server into recovery mode. Depending on the server and infrastructure provider, we can mount the image using a remote management utility such as IPMI or iDRAC, attach the image directly to the VM, or boot the recovery image via iPXE.
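As a sketch of the “alternate location” step, the helper below only prints a bconsole command that restores the ISO from the failed server’s backup onto a different Bareos client. The function name, both client names, and the ISO file name in the usage note are hypothetical; the restore keywords used (client, restoreclient, where, file, yes) are standard bconsole keywords, but it’s worth trying the command on a test client first.

```shell
# Print (not run) a bconsole command that restores one file from the failed
# server's backup onto another client. All values are supplied by the caller.
iso_restore_command() {
    failed_client="$1"                    # Bareos client name of the broken server
    helper_client="$2"                    # client that should receive the ISO
    iso_path="$3"                         # path of the ISO as it was backed up
    where="${4:-/tmp/bareos-restores}"    # restore prefix on the helper client
    printf 'restore client=%s restoreclient=%s where=%s file=%s yes\n' \
        "$failed_client" "$helper_client" "$where" "$iso_path"
}
```

For example, iso_restore_command sysbee-reartest.example helper.example /var/lib/rear/output/rear-sysbee-reartest.iso prints a command that can be pasted into bconsole on the backup server.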
Since our clients mostly use virtual machines, we tend to use the iPXE method the most. We found it to be flexible and applicable across different infrastructure providers.
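For readers who haven’t used iPXE, here is a minimal sketch of what booting the ISO can look like. The helper prints an iPXE script that fetches an address via DHCP and boots an ISO served over HTTP with sanboot. The function name and URL are placeholders, and whether sanboot can boot an ISO directly depends on the firmware (it’s generally more reliable on BIOS than on UEFI), so treat it as a starting point rather than our exact procedure.

```shell
# Print a minimal iPXE script that boots an ISO served over HTTP.
# The URL is supplied by the caller; nothing is downloaded or booted here.
ipxe_script_for_iso() {
    iso_url="$1"
    cat <<EOF
#!ipxe
dhcp
sanboot ${iso_url}
EOF
}
```

Running ipxe_script_for_iso http://boot.example/rear-sysbee-reartest.iso > rear.ipxe produces a script that can be chainloaded or pasted into a provider’s custom iPXE field.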
Performing the bare metal restore
Once you boot into the Rear console, the bare metal restore process is pretty straightforward. The restore process is initiated by executing the rear recover command. Here’s an example output:
RESCUE sysbee-reartest:~ # rear recover
Relax-and-Recover 2.6 / Git
Running rear recover (PID 508)
Using log file: /var/log/rear/rear-sysbee-reartest.log
Running workflow recover within the ReaR rescue/recovery system
Will do driver migration (recreating initramfs/initrd)
Comparing disks
Device sda has expected (same) size 81923145728 bytes (will be used for 'recover')
Disk configuration looks identical
Proceed with 'recover' (yes) otherwise manual disk layout configuration is enforced
(default 'yes' timeout 30 seconds)
User confirmed to proceed with 'recover'
Start system layout restoration.
Disk '/dev/sda': creating 'gpt' partition table
Disk '/dev/sda': creating partition number 1 with name ''sda1''
Disk '/dev/sda': creating dummy partition number 2 with name 'dummy2' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 3 with name 'dummy3' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 4 with name 'dummy4' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 5 with name 'dummy5' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 6 with name 'dummy6' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 7 with name 'dummy7' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 8 with name 'dummy8' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 9 with name 'dummy9' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 10 with name 'dummy10' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 11 with name 'dummy11' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 12 with name 'dummy12' (will be deleted later)
Disk '/dev/sda': creating dummy partition number 13 with name 'dummy13' (will be deleted later)
Disk '/dev/sda': creating partition number 14 with name ''sda14''
Disk '/dev/sda': creating partition number 15 with name ''sda15''
Disk '/dev/sda': deleting dummy partition number 2
Disk '/dev/sda': deleting dummy partition number 3
Disk '/dev/sda': deleting dummy partition number 4
Disk '/dev/sda': deleting dummy partition number 5
Disk '/dev/sda': deleting dummy partition number 6
Disk '/dev/sda': deleting dummy partition number 7
Disk '/dev/sda': deleting dummy partition number 8
Disk '/dev/sda': deleting dummy partition number 9
Disk '/dev/sda': deleting dummy partition number 10
Disk '/dev/sda': deleting dummy partition number 11
Disk '/dev/sda': deleting dummy partition number 12
Disk '/dev/sda': deleting dummy partition number 13
Disk '/dev/sda': resizing partition number 14 to original size
Creating filesystem of type ext4 with mount point / on /dev/sda1.
Mounting filesystem /
Creating filesystem of type vfat with mount point /boot/efi on /dev/sda15.
Mounting filesystem /boot/efi
Disk layout created.
Once the disk layout is created, Rear will present us with bconsole access because we opted for the manual Bareos restore method:
The system is now ready for a restore via Bareos. bconsole will be started for you to restore the required files.
It's assumed that you know what is necessary to restore - typically it will be a full backup.
Do not exit 'bconsole' until all files are restored
WARNING: The new root is mounted under '/mnt/local'.
Press ENTER to start bconsole
Note: when you press Enter to start bconsole, you may hit a bug where the shell output is completely redirected to the /var/log/rear/rear-$HOSTNAME.log log file. This means that you won’t be able to see either your input or the output.
To circumvent this bug, after you press the Enter key to start a “blind” bconsole session, open a second SSH session in another terminal window and start bconsole there. Alternatively, restore the backup using the Bareos web UI, because at this point bareos-fd is already running in the background.
It’s important to leave the first SSH session open the whole time.
Here’s an example of manual backup restore using bconsole:
RESCUE sysbee-reartest:~ # bconsole
Connecting to Director bareos-dir:9101
 Encryption: TLS_CHACHA20_POLY1305_SHA256 TLSv1.3
1000 OK: bareos-dir Version: 22.1.0 (13 June 2023)
Bareos subscription release.
Support available on https://www.bareos.com/support/
You are logged in as: sysbee-reartest.example

Enter a period (.) to cancel a command.
*restore
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"

First you select one or more JobIds that contain files
to be restored. You will be presented several methods
of specifying the JobIds. Then you will be allowed to
select which files from those JobIds are to be restored.

To select the JobIds, you have the following choices:
     1: List last 20 Jobs run
     2: List Jobs where a given File is saved
     3: Enter list of comma separated JobIds to select
     4: Enter SQL list command
     5: Select the most recent backup for a client
     6: Select backup for a client before a specified time
     7: Enter a list of files to restore
     8: Enter a list of files to restore before a specified time
     9: Find the JobIds of the most recent backup for a client
    10: Find the JobIds for a backup for a client before a specified time
    11: Enter a list of directories to restore for found JobIds
    12: Select full restore to a specified Job date
    13: Cancel
Select item:  (1-13): 9
Automatically selected Client: sysbee-reartest.example
Automatically selected FileSet: SysbeeLinuxAll
+-------+-------+----------+---------------+---------------------+-----------------------------------+
| jobid | level | jobfiles | jobbytes      | starttime           | volumename                        |
+-------+-------+----------+---------------+---------------------+-----------------------------------+
| 28154 | F     |   66,378 | 1,610,109,258 | 2023-10-11 13:29:32 | sysbee-reartest.example-cons-1806 |
| 28263 | I     |   66,378 |             0 | 2023-10-12 22:13:50 | sysbee-reartest.example-inc-1814  |
+-------+-------+----------+---------------+---------------------+-----------------------------------+
To select the JobIds, you have the following choices:
     1: List last 20 Jobs run
     2: List Jobs where a given File is saved
     3: Enter list of comma separated JobIds to select
     4: Enter SQL list command
     5: Select the most recent backup for a client
     6: Select backup for a client before a specified time
     7: Enter a list of files to restore
     8: Enter a list of files to restore before a specified time
     9: Find the JobIds of the most recent backup for a client
    10: Find the JobIds for a backup for a client before a specified time
    11: Enter a list of directories to restore for found JobIds
    12: Select full restore to a specified Job date
    13: Cancel
Select item:  (1-13): 3
Enter JobId(s), comma separated, to restore: 28154
You have selected the following JobId: 28154

Building directory tree for JobId(s) 28154 ...  ++++++++++++++++++++++++++++++++++++++++++++
59,126 files inserted into the tree.

You are now entering file selection mode where you add (mark) and
remove (unmark) files to be restored. No files are initially added, unless
you used the "all" keyword on the command line.
Enter "done" to leave this mode.

cwd is: /
$ mark /*
66,378 files marked.
$ done
Bootstrap records written to /var/lib/bareos/bareos-dir.restore.96.bsr

The job will require the following
   Volume(s)                 Storage(s)                SD Device(s)
===========================================================================

    sysbee-reartest.example-cons-1806    File    backup_dir

Volumes marked with "*" are online.

66,378 files selected to be restored.

Using Catalog "MyCatalog"
Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bareos/bareos-dir.restore.96.bsr
Where:           /tmp/bareos-restores
Replace:         Always
FileSet:         LinuxAll
Backup Client:   sysbee-reartest.example
Restore Client:  sysbee-reartest.example
Format:          Native
Storage:         File
When:            2023-10-13 08:55:55
Catalog:         MyCatalog
Priority:        5
Plugin Options:  *None*
OK to run? (yes/mod/no): mod
Parameters to modify:
     1: Level
     2: Storage
     3: Job
     4: FileSet
     5: Restore Client
     6: Backup Format
     7: When
     8: Priority
     9: Bootstrap
    10: Where
    11: File Relocation
    12: Replace
    13: JobId
    14: Plugin Options
Select parameter to modify (1-14): 10
Please enter the full path prefix for restore (/ for none): /mnt/local
Run Restore job
JobName:         RestoreFiles
Bootstrap:       /var/lib/bareos/bareos-dir.restore.96.bsr
Where:           /mnt/local
Replace:         Always
FileSet:         LinuxAll
Backup Client:   sysbee-reartest.example
Restore Client:  sysbee-reartest.example
Format:          Native
Storage:         File
When:            2023-10-13 08:55:55
Catalog:         MyCatalog
Priority:        5
Plugin Options:  *None*
OK to run? (yes/mod/no): yes
Job queued. JobId=28296
*
Important: when specifying multiple JobIds, enter them in chronological order (from the lowest to the highest number), otherwise the merged directory tree will not be consistent!
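To avoid ordering mistakes, the JobId list can be sorted before it’s handed to bconsole. The sketch below does that and prints a one-line, non-interactive equivalent of the session above. The function names are ours, and the command is only printed, never executed; jobid, where, all, done, and yes are standard keywords of the bconsole restore command.

```shell
# Turn JobIds given in any order into an ascending, comma-separated list.
sorted_jobids() {
    printf '%s\n' "$@" | sort -n | paste -sd, -
}

# Print (not run) a bconsole restore command for a client and its JobIds.
# /mnt/local is where Rear mounts the new root during recovery.
full_restore_command() {
    client="$1"; shift
    printf 'restore client=%s jobid=%s where=/mnt/local all done yes\n' \
        "$client" "$(sorted_jobids "$@")"
}
```

For instance, full_restore_command sysbee-reartest.example 28263 28154 prints a restore command with jobid=28154,28263, regardless of the order in which the JobIds were typed.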
Note: before moving to the next step, wait for the restore job to finish. You can track restore progress on the backup server using the Bareos web UI.
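If you’d rather stay in the terminal, bconsole’s wait command can block until the job is done. The wrapper below is a small sketch: the function name and the BCONSOLE override variable are our own, while wait jobid=… and messages are regular bconsole commands.

```shell
# Block until the given job finishes, then print any pending messages.
# BCONSOLE may be overridden (e.g. for a dry run); it defaults to bconsole.
wait_for_job() {
    printf 'wait jobid=%s\nmessages\n' "$1" | "${BCONSOLE:-bconsole}"
}
```

With the job from the example above, that would be wait_for_job 28296, run from the second SSH session or on the backup server.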
Once the backup is restored, return to the first SSH session where you are still in the “blind” bconsole session. Press Ctrl+D to leave the “blind” bconsole session, and then press Ctrl+D again to exit the Rear shell (recognizable by the rear> prompt). This is important, because additional restore tasks will automatically run afterwards (installation of the GRUB 2 bootloader, etc.).
Once Rear has completed the recovery, reboot the server and your system should be fully recovered. 🤞
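Before leaving the two shells, it doesn’t hurt to confirm from the second SSH session that the restored root actually contains data. The check below is a deliberately simple sketch; the function name and the two paths it looks for are our assumptions about what a restored Linux root should contain, not something Rear requires.

```shell
# Succeed only if the restored root looks populated.
# The checked paths (etc/fstab, boot) are illustrative assumptions.
restored_root_looks_sane() {
    root="${1:-/mnt/local}"
    for path in etc/fstab boot; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path" >&2
            return 1
        fi
    done
}
```

Running restored_root_looks_sane with no arguments checks /mnt/local, the mount point Rear reports for the new root.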
Did you know…
All our managed cloud servers come with Bareos backup at no additional cost! The servers come with 14 daily and 8 bi-weekly restore points that allow you to restore all or just a portion of your data from up to 2 months back.
We are serious about data safety and disaster recovery. That’s why backups are encrypted in transit and at rest, and distributed to multiple EU-based regions, to ensure fast disaster recovery, even in the event of a major catastrophe.