New backups: Arq to Wasabi

Sunday 27 August 2017

This week CrashPlan announced they were ending consumer services, so I had to replace it with something else. Backups are one of those things at the unpleasant intersection of tedious, difficult, and important.

A quick spin around the latest alternatives showed the usual spectrum of possibilities, ranging from Perl hackers implementing rsync themselves to slick consumer tools. I need something that works well not just on my computer but on my family's computers too, so I went the consumerish route.

Arq backing up to Wasabi seems like a good choice for polish and price.

One thing I always struggle with: how to ensure my stuff is backed up without needlessly copying around all the crap that ends up in my home directory that I don't need backed up. On a Mac, the ~/Library directory has all sorts of stuff that I think I don't need to copy around. Do I need these?

  • Library/Application Support
  • Library/Caches
  • Library/Containers

I add these directories to the exclusions. Should my Dropbox folder get backed up? Isn't that what Dropbox is already doing?

Then as a developer, there's tons more to exclude. Running VirtualBox? You have a 10GB disk image somewhere under your home directory. I have something like 20,000 .pyc files. The .tox directory for coverage.py is 350MB.
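
If you'd rather measure than guess, here's a minimal Python sketch that totals how many bytes under a directory a set of exclusion patterns would skip. It is an approximation under assumptions: patterns are treated as shell-style globs compared against individual file and directory names using the standard fnmatch module, which may not be exactly how Arq matches its exclusions, and excluded_bytes and its helpers are hypothetical names made up for this example.

```python
import os
from fnmatch import fnmatchcase


def _matches(name, patterns):
    """True if a single file or directory name matches any glob pattern."""
    return any(fnmatchcase(name, pat) for pat in patterns)


def _tree_size(path):
    """Total size in bytes of the regular files under `path` (symlinks skipped)."""
    size = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                size += os.path.getsize(full)
    return size


def excluded_bytes(root, patterns):
    """Bytes under `root` that an exclusion list would skip (assumed matching rule).

    A directory whose name matches is counted in full and not descended into;
    a file is counted if its own name matches.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if _matches(name, patterns):
                full = os.path.join(dirpath, name)
                if not os.path.islink(full):
                    total += _tree_size(full)
                dirnames.remove(name)  # prune: this subtree is already counted
        for name in filenames:
            full = os.path.join(dirpath, name)
            if _matches(name, patterns) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

For example, `excluded_bytes(os.path.expanduser("~"), [".tox", "node_modules", "*.pyc"]) / 1e9` gives the answer in gigabytes.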

So I also exclude these:

  • .git
  • .hg
  • .svn
  • .tox
  • node_modules
  • .local
  • .npm
  • .vagrant.d
  • .vmdk
  • .bundle
  • .cache
  • .heroku
  • .rbenv
  • .gem
  • *.pyc
  • *.pyo
  • *$py.class
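
Since Arq keeps its exclusions in the GUI, it can help to check what a list like this actually catches before typing it in. This Python sketch assumes a simple rule that may differ from Arq's: a path is excluded if any one of its components (a directory name or the file name) matches one of the globs. It also assumes .vmdk is meant as a file extension, so it is written *.vmdk here. EXCLUDES and is_excluded are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The exclusion list above. Assumption: ".vmdk" is meant as an extension,
# so it is written "*.vmdk" to match disk-image files by name.
EXCLUDES = (
    ".git", ".hg", ".svn", ".tox", "node_modules", ".local", ".npm",
    ".vagrant.d", "*.vmdk", ".bundle", ".cache", ".heroku", ".rbenv",
    ".gem", "*.pyc", "*.pyo", "*$py.class",
)


def is_excluded(path, patterns=EXCLUDES):
    """True if any component of `path` (a directory or the file name) matches.

    fnmatchcase keeps matching case-sensitive on every platform.
    """
    return any(
        fnmatchcase(part, pat)
        for part in PurePosixPath(path).parts
        for pat in patterns
    )
```

With this rule, `is_excluded("projects/app/.git/config")` is true, while `is_excluded("projects/app/.gitignore")` is false, because patterns are compared against whole names rather than prefixes.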

Of course, as a native Mac app for consumers, Arq doesn't provide a way to supply all these at once; I have to fiddle with the GUI's + and - buttons and enter them one at a time...

Lastly, some files don't seem comfortable with backups. Thunderbird's storage files are large, and while Arq copies only certain byte ranges, they still amount to about 300MB each time. Should I even back up my email? Should I still be using Thunderbird? Too many uncertainties...
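
The reason an ever-changing 300MB file matters is simple multiplication. Here is a back-of-the-envelope sketch in Python, assuming hourly backups and that every one of them uploads the full 300MB. Both are assumptions (Arq's schedule is configurable), and upload_gb is a hypothetical helper name.

```python
def upload_gb(mb_per_backup, backups_per_day, days):
    """Decimal GB uploaded if every backup sends `mb_per_backup` MB of changed data."""
    return mb_per_backup * backups_per_day * days / 1000
```

Under those assumptions, `upload_gb(300, 24, 1)` is 7.2 and `upload_gb(300, 24, 30)` is 216.0, so one mail store could account for roughly 216GB of uploads in a 30-day month.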


Comments

Chris Warrick 7:44 PM on 27 Aug 2017

I think the list goes a bit too far. If you exclude .git, you'll lose some work (unpublished commits), and you'll have to do git gymnastics to "merge" your code files and the repo back, or — more likely — just delete what you got from the backups and clone again.

And in the event of failure, you'll want to get back to work quickly, and that includes all preferences. Case in point: Time Machine on my Mac backs up everything but disk images and downloads. I did something stupid, namely installing the first High Sierra public beta. But since I had those backups, I could just copy those excluded files away, erase the drive, and go back to Sierra. And everything just worked: all my repos, virtualenvs, VMs, and apps. As if the upgrade never happened. And it would be pretty similar if my machine failed (save for VMs, which I could just rebuild in a weekend).

(Also, if you're concerned about space on the cloud service: I've heard good things about Backblaze, they're $5/month for unlimited backups.)

Aaron Meurer 8:09 PM on 27 Aug 2017

There must be some way to read the Time Machine excludes. It already includes things that would fill up a backup but aren't worth saving. And I know several apps add things to the default exclude list (e.g., Docker, which stores its containers in ~/Library/Containers, excludes them by default, but you can change it in the settings).

Oliver Steele 8:48 PM on 27 Aug 2017

Arq (and CrashPlan) stores old versions of your files, so you can recover files that are overwritten even if they aren't under version control. If you have a personal Dropbox account and you aren't paying for Extended Version History (EVH) (formerly “Packrat”), you aren't getting this from Dropbox, and you may want to back up your Dropbox files with Arq as well.

Backblaze also doesn't store previous versions. Like Dropbox without the EVH option, it's good for recovery from lost computer / crashed disk / deleted file or folder, but not from corrupted files.

https://www.gitignore.io is a good source of info for what you might want to ignore given your operating system and development stacks. It's too bad you can't import a .gitignore directly into Arq, but you can generate one and then scan it for candidates to enter into the Arq GUI.
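
A rough sketch of the "generate one and then scan it" step, in Python. It is not a full gitignore parser (it ignores escapes, for example); it only drops blank lines, comments, and "!" negations, strips the trailing "/" that gitignore uses to mean "directories only", and removes duplicates. exclusion_candidates is a hypothetical name, and the results still have to be entered into Arq by hand.

```python
def exclusion_candidates(gitignore_text):
    """Rough list of patterns from .gitignore text worth entering into Arq.

    Drops blank lines, comments and "!" negations, strips whitespace and a
    trailing "/", and de-duplicates while keeping the original order.
    """
    candidates = []
    seen = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # nothing to type into an exclusion list for these
        line = line.rstrip("/")  # "dir/" means "directories only" in gitignore
        if line and line not in seen:
            seen.add(line)
            candidates.append(line)
    return candidates
```

For instance, `exclusion_candidates("# Python\n*.pyc\n__pycache__/\n!keep.pyc\n")` returns `["*.pyc", "__pycache__"]`.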

I also exclude files ending with .log.

Lastly, I set PYTHONDONTWRITEBYTECODE=1 so that I never have to deal with *.pyc files again.
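
PYTHONDONTWRITEBYTECODE only affects Python processes that inherit it, typically because it is exported from a shell profile. One way to confirm an interpreter picked it up is the `sys.dont_write_bytecode` flag. This Python sketch starts a child interpreter with the variable set (or removed) and reads that flag back; child_dont_write_bytecode is a hypothetical helper name.

```python
import os
import subprocess
import sys


def child_dont_write_bytecode(env_value=None):
    """Run a fresh interpreter and report its sys.dont_write_bytecode.

    If `env_value` is None, PYTHONDONTWRITEBYTECODE is removed from the
    child's environment; otherwise it is set to `env_value`.
    """
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"
```

`child_dont_write_bytecode("1")` should be true, and `child_dont_write_bytecode()` false, unless the interpreter is started some other way that disables bytecode writing.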

Oliver Steele 8:49 PM on 27 Aug 2017

> There must be some way to read the time machine excludes.

Arq has an “Exclude items skipped by Time Machine” option.

Ivan Sagalaev 2:52 AM on 28 Aug 2017

I know the feeling! :-) At some point I decided to do away with fine-grained tuning completely and back up the entire $HOME. Yes, it's going to back up a hell of a lot of unnecessary bytes. So what? It mostly happens in the background, it doesn't hurt much on incremental backups, and the initial one is going to take a few more hours, but it already runs overnight, so what do I care?

Nick 7:55 AM on 28 Aug 2017

So the obvious question is "why exclude anything?" The obvious answer is cost (to store the backup) and performance (to transfer the data). Your list seems a bit aggressive to me personally. Is excluding that stuff saving you more than 1GB? If not, why not just back it up?

On a couple of your points:
* It's better to think of Dropbox as a sync/access product and not a backup. If a file gets corrupted, Dropbox will dutifully sync it everywhere. Yes, you can restore older versions with Dropbox, but there are limits, and it might be more convenient to keep your Dropbox data versioned along with your other backups. If you're paying for Dropbox and you have a lot of data, then maybe it's worth the savings.

* Yes, back up Library/Application Support. There's a lot of junk in there, but also genuine data that you create when using applications.

Claudio A. Heckler 12:30 PM on 28 Aug 2017

Thanks for the suggestion! I didn't know about Arq, and I like what I've read so far.

Question: what is the rationale for picking Wasabi over Glacier? They are in the same price range, and Wasabi looks promising, but OTOH it is such a new player for something as important as backups...

Honestly curious for your thoughts :-)

Ned Batchelder 10:56 PM on 28 Aug 2017

@Nick: you are right, I am perhaps too concerned about the "unneeded" data being backed up. If the data is only copied once, then I shouldn't care. But if the data changes constantly, and every hour I move a new version to backup, it will add up.

@Claudio: I can't claim to have done an in-depth analysis. Wasabi seems affordable and capable.
