Hi,
I’m not sure if this is the right community for my question, but as my daily driver is Linux, it feels somewhat relevant.
I have a lot of data on my backup drives and recently added 50 GB to the 300 GB I already had (I can already hear the comments about how low/high/boring that is). It’s mostly family pictures, videos, and documents going back to 2004, much of which I’ve already compressed with self-made bash scripts (so it’s Linux-related ^^).
Much of this data I don’t need regular access to, and it won’t change anymore. I’m looking for a way to archive it securely, separate from my backup but still safe.
My initial thought was to burn it onto DVDs, but that’s quite outdated and DVDs don’t hold much data. Blu-ray discs can store more, but I’m unsure about their longevity. Is there a better option? I’m looking for something immutable, safe, easy to use, and that will stand the test of time.
I read about data crystals, but they seem to be still in the research phase and not available for consumers. What about using old hard drives? Don’t they need to be powered on every few months/years to maintain the magnetic charges?
What do you think? How do you archive data that won’t change and doesn’t need to be very accessible?
Cheers
I use https://duplicati.com/ with https://www.backblaze.com/ B2 cloud storage (pricing is usage-based: $6 a month for 1 TB, or less depending on how much you use) and run a scheduled backup of my photos every night. It’s compressed and encrypted. I save the config file to my Google account, so if my house and server burn down, I just pull the config from Google, reinstall Duplicati, and boom, pull my backup down. The whole setup is incremental, so after the first backup only changes are uploaded. I love it.
Edit: You can also restore just the files you need rather than the whole backup.
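The incremental part is what keeps nightly uploads small. Below is a minimal Python sketch of the idea at file level, assuming the previous run’s manifest (path → SHA-256) has been kept. Duplicati’s real implementation works on blocks inside files and also handles compression and encryption, so this is only an illustration; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its content hash."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_since(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Files that are new or whose content changed since the previous snapshot.
    Only these would need uploading in an incremental run."""
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)
```

Storing the dict from `snapshot()` between runs (for example as JSON) is all that’s needed to compare the next night’s state against it.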
Assume anything you can buy has a shelf life, and if those files are of significant value to you, set a yearly reminder on your calendar to copy forward anything more than five or so years old. Or, for the documents, print them out: paper has better longevity than any consumer-available electronic storage.
That being said, quality optical discs are probably the best option in terms of price-to-longevity ratio for the average person right now. Just keep in mind that they are not guaranteed to last forever and do need to be recopied from time to time.
(I have yet to have a DVD fail on me, but I keep them in hard plastic jewel cases in climate-controlled conditions, and I’ve probably just been lucky.)
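When copying forward, it’s worth verifying the new copy rather than trusting the copy command. A minimal Python sketch, assuming the old and new media are mounted as ordinary directories; `copy_forward` is a hypothetical helper, not an existing tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_forward(src_root: Path, dst_root: Path) -> list[str]:
    """Copy every file under src_root to dst_root, then compare SHA-256 digests
    of source and copy. Returns the relative paths that did not match."""
    mismatches = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over modification times
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

One caveat: right after writing, the read-back may be served from the OS page cache rather than the new medium, so repeating the digest comparison after unmounting and remounting the target gives a more honest check.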
I use LTO magnetic tape for archiving data, but unfortunately the tape drives are VERY expensive. The tape itself is relatively cheap, though: a 5-pack of LTO-8 cartridges at 12 TB uncompressed / 30 TB compressed per cartridge totals 60 TB uncompressed or 150 TB compressed. That’s a lot cheaper than hard drives, lasts much longer, offers large capacity, and has 30+ years of shelf life. Yes, I know LTO-9 has come out, but I won’t be upgrading, because LTO-8 works just fine for me and is much cheaper. The drives are backward compatible by one generation, though, e.g. you can use LTO-8 tape in an LTO-9 drive.
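The capacity figures above work out as follows in a small Python sketch using the quoted per-cartridge numbers. The “compressed” figure implies a 2.5:1 ratio, which already-compressed photos and videos won’t reach, so planning with the native figure is safer. `cartridges_needed` is a hypothetical helper.

```python
import math

# Per-cartridge figures quoted above for LTO-8; the "compressed" number is a
# vendor figure that assumes roughly 2.5:1 compressible data.
NATIVE_TB_PER_CARTRIDGE = 12
COMPRESSED_TB_PER_CARTRIDGE = 30


def pack_capacity_tb(cartridges: int, per_cartridge_tb: float) -> float:
    """Total capacity of a pack of cartridges."""
    return cartridges * per_cartridge_tb


def cartridges_needed(archive_tb: float, per_cartridge_tb: float = NATIVE_TB_PER_CARTRIDGE) -> int:
    """Cartridges needed for an archive, planning with native capacity by default
    because already-compressed media barely compresses further."""
    return math.ceil(archive_tb / per_cartridge_tb)


print(pack_capacity_tb(5, NATIVE_TB_PER_CARTRIDGE))  # 60
print(pack_capacity_tb(5, COMPRESSED_TB_PER_CARTRIDGE))  # 150
print(COMPRESSED_TB_PER_CARTRIDGE / NATIVE_TB_PER_CARTRIDGE)  # 2.5
print(cartridges_needed(0.35))  # ~350 GB from the original post -> 1
```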
There isn’t anything that meets all your criteria.
Optical discs suffer from layer separation, hard drives break down, SSDs lose their charge, and tape is fantastic but has a high cost of entry.
There are a lot of replies here, but if I were you, I’d get an LTO drive that’s one or two generations old from a surplus auction and use that.
People hate being told to use magnetic tape, but it’s very reliable, long-lived, pretty cost-effective once you have a drive, and surprisingly repairable.
What few replies are talking about is storage conditions. If your archive can be relatively small and disconnected, you can easily meet the basic requirements for long-term storage, such as stable temperature and humidity, with a cardboard box, styrofoam cut to shape, and desiccant packs (remember to rotate these!). Some kind of antifungal/antimicrobial agent would be good too.
This is my day job, so I’d like to weigh in.
First of all, there’s a whole community of GLAM institutions (galleries, libraries, archives, and museums) involved in what is called Digital Preservation (try googling that specifically). Here in Germany, many of them have founded the Nestor Group (www.langzeitarchivierung.de) to advance the cause and share knowledge. Recently, Nestor had a discussion group on Personal Digital Archiving, addressing exactly your use case, and they have set up a website with the results at https://meindigitalesarchiv.de/. Nestor publishes mostly in German, but online translators are a thing, so I think you’ll be fine.
Some things that I want to address from your original post:
- Keep in mind that file formats, just like hardware and software, become obsolete over time. Think about a strategy for migrating your files to a more recent format in case your current format falls out of style and isn’t widely supported anymore. I assume your photos are JPGs, which are widely considered unsafe for preservation, because they use lossy compression and lose quality with every re-encoding. A suitable replacement might be PNG, though I wouldn’t go ahead and convert my JPGs right away. For born-digital photo material, uncompressed TIFF is the preferred format.
- Compression in general is considered a risk, because a single damaged bit can affect a much larger block of compressed data. Saving a few bytes of storage isn’t worth losing your precious memories.
- Storage media have different retention times. It’s true that magnetic tape has the best chances of survival, and it’s what we use for long-term cold storage, but it’s prohibitively expensive for home use. It’s also VERY slow at random access, because the tape has to be wound to the position of your file before it can be read. If you insist on using it, format your tapes with LTFS to avoid needing a storage management system like IBM Spectrum Protect. The next best choice is NAS-grade HDDs, which will last you upwards of five years. Redundancy plus a self-correcting file system like ZFS (compression & dedup OFF!) will improve your odds. Keep your hands off optical media; according to studies on the subject, they tend to decay after as little as a year. Flash storage isn’t much better either; avoid thumb drives at all costs. Quality SSDs might last you a little longer. If you use ZFS or a comparable file system that provides snapshots, you can use those to implement immutability.
- Kudos for using standard Linux tooling; it will help other people understand your stack if anything happens to you. Digital Preservation is all about removing dependencies on specific formats, technologies, and (importantly) people.
- Backup is not Digital Preservation, though I will admit the two tend to get mixed up in personal contexts. Backups save the state of a system at a specific point in time; DigiPres tries to preserve only data that isn’t specific to a system and tends to change very little. Also, and this is important, DigiPres tries to save context along with the actual payload, so you might want to at least save some metadata with your photos and store everything in a structure made for preservation. I recommend BagIt: there’s a lot of existing tooling for creating bags, it’s self-contained, it’s secured by strong checksums, and it’s standardized as an RFC (RFC 8493).
- Keep complexity as low as possible!
- Last of all, good on you for doing SOMETHING. You don’t have to be perfect to improve your posture, and you’re on the right track, asking the right questions. Keep on going, you’re doing great.
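To make the compression point concrete, here is a small Python sketch using the standard `zlib` module, chosen only as a convenient stand-in for any compressor. It flips a single bit in a plain payload and, separately, in its compressed form; the sample payload is made up.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def damaged_bytes_uncompressed(payload: bytes, index: int) -> int:
    """Flip one bit in the raw payload and count how many bytes differ afterwards."""
    damaged = flip_bit(payload, index)
    return sum(a != b for a, b in zip(payload, damaged))


def recoverable_after_compressed_flip(payload: bytes, index: int) -> bool:
    """Flip one bit inside the zlib-compressed payload and report whether the
    original can still be recovered intact with a plain decompress."""
    damaged = flip_bit(zlib.compress(payload), index)
    try:
        return zlib.decompress(damaged) == payload
    except zlib.error:
        return False


payload = "".join(f"IMG_{i:05d}.jpg\n" for i in range(500)).encode()
print(damaged_bytes_uncompressed(payload, 100))  # 1

compressed_len = len(zlib.compress(payload))
lost = sum(not recoverable_after_compressed_flip(payload, i) for i in range(compressed_len))
print(f"{lost} of {compressed_len} single-bit flips made the file unrecoverable")
```

With the raw payload, one flipped bit damages exactly one byte; with the compressed copy, nearly every position makes a plain `zlib.decompress` fail outright. Careful tooling can sometimes salvage the data before the damaged spot, but what comes after it is usually lost.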
Come back at me if you have any further questions.
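For a feel of what a bag looks like, here’s a minimal Python sketch of the layout described in RFC 8493: the payload goes under `data/`, `bagit.txt` declares the version, and `manifest-sha256.txt` lists a checksum for every payload file. It’s a simplification (no percent-encoding of unusual filenames, no tag manifest), and `make_bag`/`verify_bag` are hypothetical names; for real archives, an established tool such as the Library of Congress’s bagit-python is the safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source: Path, bag_dir: Path, description: str) -> None:
    """Copy `source` into bag_dir/data and write the bag's tag files.
    bag_dir/data must not exist yet (copytree refuses to overwrite)."""
    data_dir = bag_dir / "data"
    shutil.copytree(source, data_dir)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = [
        f"{_sha256(p)}  {p.relative_to(bag_dir).as_posix()}\n"  # "<digest>  data/2004/img.jpg"
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    # bag-info.txt is optional; it carries human-readable context about the payload.
    (bag_dir / "bag-info.txt").write_text(f"External-Description: {description}\n", encoding="utf-8")


def verify_bag(bag_dir: Path) -> list[str]:
    """Return manifest entries whose file is missing or whose checksum changed."""
    bad = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        target = bag_dir / rel
        if not target.is_file() or _sha256(target) != digest:
            bad.append(rel)
    return bad
```

Running a check like `verify_bag` once a year catches silent corruption; a full validator would also flag payload files that are missing from the manifest.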
You might be interested in git-annex (see the Bob use case).
It has file tracking, so you can, for example, “ask” a repository on drive A where some file is, and git-annex can tell you it’s on drives C and D.
git-annex can also enforce rules like: “always have at least 3 copies of file X, and any drive will do”; “have one copy of every file at the drives in my house, and have another at the drives in my parents’ house”; or “if a file is really big, don’t store it on certain drives”.
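Rules like these boil down to checking each file’s known locations against a policy. The Python sketch below is only a model of that idea with made-up drive names and sites; it is not how git-annex is implemented, and in git-annex itself you’d use its own commands (such as `git annex whereis` and `git annex numcopies`) instead.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    site: str  # where the drive lives, e.g. "home" or "parents" (hypothetical labels)
    files: set[str] = field(default_factory=set)


def whereis(filename: str, drives: list[Drive]) -> list[str]:
    """Names of the drives that hold a copy of filename."""
    return [d.name for d in drives if filename in d.files]


def violations(
    drives: list[Drive], min_copies: int = 3, required_sites: tuple[str, ...] = ("home", "parents")
) -> dict[str, list[str]]:
    """For every known file, list the policy rules it currently breaks."""
    all_files = set().union(*(d.files for d in drives))
    problems: dict[str, list[str]] = {}
    for f in sorted(all_files):
        holders = [d for d in drives if f in d.files]
        issues = []
        if len(holders) < min_copies:
            issues.append(f"only {len(holders)} of {min_copies} copies")
        sites = {d.site for d in holders}
        issues += [f"no copy at {s}" for s in required_sites if s not in sites]
        if issues:
            problems[f] = issues
    return problems


drives = [
    Drive("A", "home", {"wedding.mp4", "taxes_2010.pdf"}),
    Drive("C", "home", {"wedding.mp4"}),
    Drive("D", "parents", {"wedding.mp4"}),
]
print(whereis("wedding.mp4", drives))  # ['A', 'C', 'D']
print(violations(drives))  # {'taxes_2010.pdf': ['only 1 of 3 copies', 'no copy at parents']}
```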
I use external hard drives. Two of them, and they get rsynced every time something changes, so there’s a copy if one drive should fail. Once a month, I encrypt the whole shebang with gpg and send it off into an AWS bucket.
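One thing worth checking with two rsynced drives: by default rsync decides what to copy by comparing file size and modification time, so a file that silently rots on one drive can go unnoticed (`rsync --checksum` compares contents instead, at the cost of reading everything). A minimal Python sketch of an independent check, assuming both drives are mounted as directories; `compare_mirrors` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map relative path -> SHA-256 for every file under root."""
    digests = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h = hashlib.sha256()
            with p.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[p.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_mirrors(a: Path, b: Path) -> dict[str, list[str]]:
    """Report files missing from either side and files whose contents differ."""
    da, db = tree_digests(a), tree_digests(b)
    return {
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
        "different": sorted(k for k in da.keys() & db.keys() if da[k] != db[k]),
    }
```

If `different` is non-empty, comparing each copy against the offsite copy is how you’d tell which drive holds the damaged file.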
3-2-1 rule (three copies of your data, on two different types of media, with one copy offsite) with restic. Check it out.
Checked it out, thanks. I’ll have to figure out how it compares to my rsync script.
Waaaaay better.
Restic lets you make deduplicated snapshots of your data. Everything is there and it’s damn hard to lose anything. I use Backblaze B2 as my long-term endpoint / offsite… some people use AWS Glacier. But you don’t have to use any cloud services: you can just keep a restic repository on some external drives. That’s what I use for my second copy of things. I also do an annual backup to a hard disk that I leave with a friend as a second offsite copy.
I’ve been backing up all of my stuff like this for years now. I used to use Borg, which is another great tool, but restic is more flexible: it allows multiple systems to use a single repository and has native support for things like B2 that Borg doesn’t.
We also use restic to back up control nodes for some of the supercomputing clusters I manage. It’s that rock solid, imho.
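The deduplication idea behind restic’s snapshots can be sketched in a few lines of Python: files are split into chunks, each chunk is stored once under its SHA-256, and a snapshot is just a list of chunk IDs per file. This is a simplified model, not restic’s code: restic uses content-defined chunking (so an insertion early in a file doesn’t shift every later chunk) and encrypts everything, whereas this sketch uses fixed-size chunks and keeps data in memory. `ChunkStore` is a hypothetical name.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self, chunk_size: int = 1 << 20):
        self.chunk_size = chunk_size
        self.blobs: dict[str, bytes] = {}
        self.snapshots: list[dict[str, list[str]]] = []

    def _store(self, data: bytes) -> list[str]:
        ids = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i : i + self.chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(cid, chunk)  # a chunk seen before costs no extra space
            ids.append(cid)
        return ids

    def snapshot(self, files: dict[str, bytes]) -> int:
        """Record a snapshot of {path: contents}; returns its index."""
        self.snapshots.append({name: self._store(data) for name, data in files.items()})
        return len(self.snapshots) - 1

    def restore(self, snap_id: int, name: str) -> bytes:
        """Reassemble a file from any snapshot, old or new."""
        return b"".join(self.blobs[cid] for cid in self.snapshots[snap_id][name])


store = ChunkStore(chunk_size=4)  # tiny chunks just for the demo
first = store.snapshot({"notes.txt": b"AAAABBBBCCCC"})
second = store.snapshot({"notes.txt": b"AAAABBBBDDDD", "copy.txt": b"AAAABBBB"})
print(len(store.blobs))  # 4: AAAA, BBBB, CCCC, DDDD
print(store.restore(first, "notes.txt"))  # b'AAAABBBBCCCC'
```

The second snapshot adds only one new chunk even though it contains two files, which is why keeping many snapshots stays cheap.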