August 6th, 2024

Ask HN: Theory of Backups

The Tower of Hanoi and Incremental-Differential-Full methods enhance backup strategies, incorporating various backup types and emphasizing the importance of backup rotations, medium suitability, and hash verification for data integrity.

Discussion of backup strategies often centers on consumer-oriented solutions, but there is a deeper theoretical framework that can improve backup practice. The post introduces two key concepts: the Tower of Hanoi (TOH) scheduling scheme and the Incremental-Differential-Full (IDF) backup method. The combined IDF-TOH scheme uses three types of backup: Incremental backups, which capture changes since the last backup of any type; Differential backups, which record changes since the last Full backup; and Full backups, which capture the entire system. The schedule for these backups can be optimized, although the ideal frequency remains uncertain.

The IDF-TOH framework does not address backup rotation, which can lead to data corruption if not managed properly. Different storage media may suit different backup types: pressed CD-ROMs are suggested for Full backups because of their longevity, while solid-state drives may be more appropriate for Incremental backups. The IDF method itself may need further refinement, and hashing should be incorporated to verify backups. This theoretical approach may not be universally applicable, but people who use tools like rsync or borg may benefit from adopting a more robust strategy that minimizes data loss with little effort.
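To make the Tower of Hanoi scheduling idea concrete, here is a minimal sketch. In the classic rotation, the level used on a given day is the number of times the day number can be halved evenly, capped at the highest level. The function names `toh_level` and `backup_type` are hypothetical, and the mapping of levels to Incremental, Differential, and Full is an assumption made for illustration, not something the summary specifies.

```shell
# toh_level DAY LEVELS: print the rotation level (0..LEVELS-1) for a 1-based day.
# The level is the number of trailing zero bits in DAY, capped at LEVELS-1.
toh_level() (
  day=$1 levels=$2 level=0
  while [ $((day % 2)) -eq 0 ] && [ "$level" -lt $((levels - 1)) ]; do
    day=$((day / 2))
    level=$((level + 1))
  done
  echo "$level"
)

# backup_type LEVEL: hypothetical mapping of levels to backup types
# (an assumption for this sketch, not taken from the thread).
backup_type() {
  case "$1" in
    0) echo incremental ;;
    1) echo differential ;;
    *) echo full ;;
  esac
}

# Print the schedule for the first eight days with three levels.
for day in 1 2 3 4 5 6 7 8; do
  echo "day $day: $(backup_type "$(toh_level "$day" 3)")"
done
```

With three levels this yields incremental, differential, incremental, full, and then repeats, so the lowest level runs every second day and the highest every fourth day. Adding levels stretches the interval between Full backups by a factor of two per level.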

- The Tower of Hanoi and Incremental-Differential-Full methods enhance backup strategies.

- IDF-TOH includes Incremental, Differential, and Full backups for comprehensive data protection.

- Backup rotations and medium suitability are critical considerations in backup planning.

- Hash verification is essential for ensuring backup integrity.

- The proposed strategies may appeal to users of existing backup tools seeking improved reliability.
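One possible way to do the hash verification mentioned above is to store a checksum manifest next to each backup and re-check it later. This is only a sketch: it assumes GNU coreutils `sha256sum` is available, the function names `write_manifest` and `verify_manifest` are hypothetical, and the manifest path must be absolute and live outside the backup directory.

```shell
# write_manifest DIR MANIFEST: record a SHA-256 hash for every file under DIR.
# MANIFEST must be an absolute path outside DIR.
write_manifest() (
  cd "$1" || exit 1
  find . -type f -exec sha256sum {} + > "$2"
)

# verify_manifest DIR MANIFEST: succeed only if every recorded hash still matches.
verify_manifest() (
  cd "$1" || exit 1
  sha256sum -c "$2"
)

# Demo on a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/backup"
echo "important data" > "$demo/backup/notes.txt"
write_manifest "$demo/backup" "$demo/manifest.sha256"
verify_manifest "$demo/backup" "$demo/manifest.sha256" && echo "backup intact"
```

A manifest like this detects silent corruption of the backup copy, but it cannot tell whether the data was already damaged when the backup was taken.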

8 comments
By @mikewarot - 6 months
My practically minded friend has an interesting scheme for backups. He uses Clonezilla or similar products to clone a machine's existing drives onto new replacements, then puts the old ones in a safe place. He does this on a periodic basis.

The other normal backups are usually managed by someone else, he just does the hardware, most of the time.

His backups are tested by experience.

By @eschneider - 6 months
Rule 0 for backups: Whatever scheme you use, periodically verify your backups by restoring some files off them. Before you have a problem. You'll thank yourself eventually.
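A restore test along these lines can be scripted. The sketch below assumes the backup is a plain tar archive whose member names are relative to the source directory, and that the sampled files have not changed since the backup was taken. The function name `restore_check` is hypothetical.

```shell
# restore_check ARCHIVE SOURCE_DIR FILE...: extract the named files from a tar
# archive into a scratch directory and compare each one with the live copy.
restore_check() (
  archive=$1 source_dir=$2
  shift 2
  scratch=$(mktemp -d) || exit 1
  trap 'rm -rf "$scratch"' EXIT
  tar -xf "$archive" -C "$scratch" "$@" || exit 1
  for f in "$@"; do
    cmp -s "$scratch/$f" "$source_dir/$f" || { echo "MISMATCH: $f" >&2; exit 1; }
  done
  echo "restored and verified $# file(s)"
)

# Demo: back up one file, then prove it can be restored.
demo=$(mktemp -d)
mkdir "$demo/live"
echo "tax records" > "$demo/live/notes.txt"
tar -cf "$demo/backup.tar" -C "$demo/live" notes.txt
restore_check "$demo/backup.tar" "$demo/live" notes.txt
```

Running a check like this on a schedule, with a few files picked at random, turns the rule into a habit instead of something remembered only after a failure.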

By @sandreas - 6 months
I'm missing the threat of ransomware in your explanation. The best concept does not help if everything is always online and ransomware is able to encrypt your backups.

I personally use the following backup strategy:

- Set up encrypted ZFS storage on the network (e.g. TrueNAS; in my case it is Proxmox)

- Enable zfs-auto-snapshot for 15-minute snapshots with automatic rotation (keep 24 daily, etc.)

- NEVER (!) type the passwords of users permitted on the ZFS storage into any client that could be affected by ransomware

- Provide a user-authenticated Samba share to store all important data, and try to avoid storing data locally

- Sync the ZFS snapshots to an external USB drive every night (I use a Tasmota Shelly plug and an external USB case to power off the devices when they are not needed)

  # create current snapshot (recursive across all datasets in the pool)
  zfs snapshot -r "$NEW_POOL_SNAP"

  # first backup: full raw replication stream, so encrypted data stays encrypted
  zfs send --raw -R "$SRC_POOL@$NEW_SNAP_NAME" | pv | zfs recv -Fdu "$DST_POOL"

  # incremental backup: send every snapshot between the two given ones
  zfs send --raw -RI "$BACKUP_FROM_SNAPSHOT" "$BACKUP_UNTIL_SNAPSHOT" | pv | zfs recv -Fdu "$DST_POOL"

- On Windows and macOS, back up the OS to an external drive

- Use restic to keep an additional copy of the local files and folders somewhere else

- Use a Blu-ray burner to back up the most important stuff as a restic repository or encrypted archive (very important documents, the best photo collections of your family, your KeePass database, etc.) and store it at another location

- If cloud storage is affordable for the amount of data you have, consider using restic to store your stuff in the cloud

- From time to time, try to restore a specific file from the backup and check that it worked, and try to restore a full system (onto an additional hard disk).

This may sound overkill, but ransomware is a pretty bad thing these days, even if you think you are not one of its targets.

By @t_believ-er873 - 6 months
Everything depends on the security compliance needs.

Regarding the backup schedule: sometimes companies need frequent backups because of their RPOs and RTOs, for example if they operate in a highly regulated industry. If someone can tolerate losing two hours of data, they need a backup at least every two hours. If the tolerance is eight hours (a working day), then why not back up on a daily basis?
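The arithmetic from RPO to schedule can be sketched as a small helper that turns a tolerance in whole hours into a cron schedule. The function name `rpo_to_cron` is hypothetical, the output assumes standard five-field cron syntax, and the sketch ignores how long the backup itself takes to run.

```shell
# rpo_to_cron RPO_HOURS: print a cron schedule whose gaps between runs never
# exceed the given recovery point objective (whole hours, 1 to 24).
rpo_to_cron() (
  rpo=$1
  if [ "$rpo" -lt 1 ] || [ "$rpo" -gt 24 ]; then
    echo "RPO must be between 1 and 24 whole hours" >&2
    exit 1
  fi
  if [ "$rpo" -eq 24 ]; then
    echo "0 0 * * *"        # once a day at midnight
  else
    echo "0 */$rpo * * *"   # on the hour, every $rpo hours starting at 00:00
  fi
)

rpo_to_cron 2    # prints: 0 */2 * * *
rpo_to_cron 24   # prints: 0 0 * * *
```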

Regarding rotations: it depends on the backup solution. If it provides immutable backups, the data as a whole cannot be corrupted, and the sooner someone notices a mistake, the sooner they can restore a good copy. IDF mostly helps with the storage problem by keeping storage from being overloaded (deduplication and compression are also worth mentioning here).

By @scrapheap - 6 months
A few things that I find don't get considered when contemplating backup procedures are:

1. How long should you keep backups for? Is the content of your backup covered by privacy laws that require you not to keep copies of it after a certain period of time? Is there a point where the content of your backup is so old that it's the logical equivalent of not having made a backup in the first place?

2. How much does your backup process cost? If it costs more to back up a system than it would cost you to lose it, then you've got the backup process wrong (interestingly, this can be affected by economies of scale).

3. What do you need to restore a backup? Does your system require bespoke hardware that might have been lost in whatever disaster you're trying to recover from?
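For the first point, a retention limit can be enforced mechanically once the period is decided. This is a minimal sketch, assuming backups are plain files in one directory and that file modification time reflects backup age. The function name `prune_backups` is hypothetical, and the demo relies on GNU `touch -d` to fake an old file.

```shell
# prune_backups DIR DAYS: delete regular files under DIR whose age in whole
# days is greater than DAYS, printing each path as it is selected.
prune_backups() (
  dir=$1 days=$2
  find "$dir" -type f -mtime +"$days" -print -exec rm -- {} +
)

# Demo: one backup dated 40 days back, one fresh backup, 30-day retention.
demo=$(mktemp -d)
touch -d '40 days ago' "$demo/old-backup.tar"   # GNU touch syntax
touch "$demo/new-backup.tar"
prune_backups "$demo" 30
ls "$demo"
```

Printing each path before removal leaves a simple audit trail, which can matter when deletion is driven by a legal retention requirement.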

By @dakiol - 6 months
If I want to know about operating systems, I know which books to read. What's the equivalent for backups? It seems to me I merely rely on blog posts, and that's something I'm not comfortable with. There are some books out there that dedicate perhaps one chapter at most to backups, but those are usually outdated and do not contain much practical information.

By @brudgers - 6 months
YAGNI is among my theories for personal backups. Data is not precious. It is a burden. When something matters, I print it. Or put it on Facebook or YouTube.

…but I never delete, because the more copies of the same thing there are, the more likely it will survive. If in fact I need it, the time spent searching is far shorter than a tedious backup procedure.

In addition, if I have to recreate something, version 2 will be better because I keep getting better at the things I do.

But that is me not you. Good luck.

By @illuminant - 6 months
Backups are like prayers, only if you make backups you won't need prayers.