</bzexclusions><excludefname_rule plat="mac" osVers="*" ruleIsOptional="f" skipFirstCharThenStartsWith="*" contains_1="/users/username/dropbox/" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
That is the exact path to my Dropbox folder, and I presume if I move my Dropbox folder this xml file will be updated to point to the new location. The top of the xml file states "Mandatory Exclusions: editing this file DOES NOT DO ANYTHING".
.git files seem to still be backing up on my machine, although they are hidden by default in the web restore (you must open Filters and enable Show Hidden Files). I don't see an option to show hidden files/folders in the Backblaze Restore app.
That would be nice, they'd be able to get their history back!
Not backing up .git folders however is completely unacceptable.
I have hundreds of small projects where I use git to keep track of history locally with no remote at all. The intention is never to push it anywhere. I don't like to say these sorts of things, and I don't say it lightly when I say someone should be fired over this decision.
That aged well...
(as a side note, it's funny to see them promoting their native C app instead of using Java as a "shortcut". What I wouldn't give for more Java apps nowadays)
JottaCloud is "unlimited" for $11.99 a month (your upload speed is throttled after 5TB).
I've been using them for a few years for backing up important files from my NAS (timemachine backups, Immich library, digitised VHS's, Proxmox Backup Server backups) and am sitting at about 3.5TB.
I’ve added restic to my backup routine, pointed at cloud files and other critical data
Regarding the OP's issues:
- On macOS, all .git folders have been included in backups since 9.0.2.784, released in 2023.
- Cloud drives are problematic to back up because they all use extension plugins to hide the network, and your local disk only contains stubs instead of actual files. If Backblaze scans it fully, it'll download everything and exhaust your disk space. There's no easy solution here.
I don't buy for a minute that they were trying to be "sneaky" to save some $$. Instead, I feel that they thought backing up only stubs would be misleading for the majority of users, and they would rather not brick users' computers by downloading all the files. Remember, they can't access your cloud disk directly, so the only way they can get the file contents is by doing an fread and letting the cloud drive client sync the content on demand.
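To make the stub problem concrete, here is a rough Python sketch of how a scanner could tell "content is probably not on disk" from file attributes alone, without reading the file. The bit values are the Win32 file attribute constants as I remember them from the Windows SDK, and the function names are made up for illustration. I have no idea whether Backblaze's client does anything like this.

```python
import os

# Win32 file attribute bits (values assumed from the Windows SDK headers).
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000


def is_reparse_point(attrs: int) -> bool:
    """True if the entry is managed through a reparse point (may still be fully local)."""
    return bool(attrs & FILE_ATTRIBUTE_REPARSE_POINT)


def needs_download(attrs: int) -> bool:
    """True if the attributes suggest reading the data would trigger a fetch."""
    return bool(attrs & (FILE_ATTRIBUTE_OFFLINE | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS))


def path_needs_download(path: str) -> bool:
    """Inspect a path's attributes without opening its data stream.

    st_file_attributes only exists on Windows; elsewhere this returns False.
    """
    st = os.stat(path, follow_symlinks=False)
    attrs = getattr(st, "st_file_attributes", 0)
    return needs_download(attrs)
```

A reparse point on its own does not prove the file is a stub, which is why the two checks are separate. As far as I know, reading attributes this way does not trigger a download, but I would verify that against the sync client you actually use.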
Feel free to reach out to me if you have any questions about setting up duplicati.
His daughter-in-law had gifted him a really nice new system. His old system wasn't too bad, either. He'd mostly been relying on an external USB HDD for data. He used Thunderbird for e-mail, which I am quite unfamiliar with.
As we worked on the migration, I collected all the apps and software he had been using, which he would need on the new system, and it wasn't much. I also complimented him on his "online hygiene" insofar as never clicking on suspicious links, or downloading suspicious software; his system had no malware and no shovelware, no unwanted browser bars or spyware was found.
We were completing the migration when I noticed a large discrepancy between the "new data" HDD space and the old data, but I needed to delete the old partition to complete the upgrade, and I flagged this with him: I said, "look this makes me uneasy: do you still want to move forward?" and he nodded approval, so I deleted the partition. Then we discovered that we had just lost many gigabytes of important data, such as was in his Firefox profile and his Thunderbird data, like all his email which had been downloaded locally. I turned white as a sheet and I was ready for him to sue me or something.
He was surprisingly sanguine about this, and he says, "What about Backblaze?" and I gaped at him, "You had an online backup of all this???" and he goes "Sure, here's how to install it..." and we installed his little Backblaze systray widget, and all his data began streaming back in. Nothing at all was lost, because he'd also been meticulous about using this app!
So that was the day I learned about Backblaze and their services, and I was intensely grateful to them for saving my bacon for sure, and we remained friends, and we finished the migration in one day, and he was grateful to me and my expertise, and not at all worried about the crippling data loss which I had incurred with my cavalier ignorance.
If you have a folder shared with 10 people, most likely only a few files will be accessed by others and the rest is dormant on all but one machine. Downloading and storing all these files is an expense in transfer fees and to some extent a waste of local disk space.
For that reason, cloud sync tools no longer copy everything up front, but transfer on-demand. Most tools have an option where you can choose "Make available offline" that will make a specific folder always synced.
That said, silently excluding a folder is very problematic, even if there is a good reason for it.
I work on the open-source Duplicati backup tool (https://github.com/duplicati/duplicati) and we take special care to not silently skip things as this is likely to cause problems when you want to restore later. For instance, you will get a lot of warnings if you try to make a backup of a cloud-synced folder, as the cloud-sync cannot keep up with the speed of the backup.
If you like the pricing of B2 but not the backup tool, you can use a B2 bucket (pay per usage, not flat rate) and have Duplicati back up to the bucket.
I've also configured encrypted cloud backups to a different geographic region and off-site backups to a friend's NAS (following the 3-2-1 backup rule). It does help having 2.5Gb networking as well, but owning your data is more important in the coming age of sloppy/degrading infrastructure and ransomware attacks.
1. You have to check "show hidden files" in the web ui (or the app) when restoring and
2. If you restore a folder that has a '.git' folder inside of it (by checking it in the ui) but you DID NOT check "show hidden files", then the '.git' (or any other hidden file/folder) does not get restored.
Which is.. unexpected.. if I check a folder to restore, I expect *everything* inside of it to be restored.
But the dropbox folder is, in fact, not there. Which is a surprise to me as well. :(
Technically speaking, imagine you're iterating over a million files, and some of them are 1000x slower than the others, it's not Backblaze's fault that things have gone this way. Avoiding files that are well-known network mount points is likely necessary for them to be reliable at what they do for local files.
It's important to recognize that these new OS-level filesystem hooks are slow and inefficient - the use case is opening one file and not 10,000 - and this means that things you might want to do (like recursive grep) are now unworkably slow if they don't fit in some warmed-up cache on your device.
To fix it, Backblaze would need a "cloud to cloud" backup that is optimized for that access pattern, or a checkbox (or detection system) for people who manage to keep a full local mirror in a place where regular files are fast. This is rapidly becoming a less common situation. I do, however, think that they should have informed people about the change.
The technical and performance implications of backing-up cloud mount-points are real, but that's zero excuse for the way this change was communicated.
This is a royal screw-up in corporate communications and I would not be surprised if it makes a huge negative impact in their bottom line and results in a few terminations.
We discovered this change recently because my dad was looking for a file that Dropbox accidentally overwrote. At first we said, “No problem, this is why we pay for Backblaze.”
We then learned that this policy had changed a few months ago, and we were never notified. The file was unrecoverable.
If anyone at Backblaze is reading this, I pay for your product so I can install it on my parents' machine and never worry about it again. You decided saving on cloud storage was worth breaking this promise. Bad, bad call.
This is another example in disguise of two people disagreeing about what "unlimited" means in the context of backup, even if they do claim to have "no restrictions on file type or size" [2].
[1] https://www.reddit.com/r/backblaze/comments/jsrqoz/personal_... [2] https://www.backblaze.com/cloud-backup/personal
I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat.
At the same time Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third backup solution in a three layer strategy (backup directly connected, backup in home, backup external), and that third one is probably the single most important one of them all since it's the one you're going to be touching the least in an ideal scenario. They really can't be excluding any files whatsoever.
The cloud service exclusion is similarly bad, although much worse. Imagine getting hit by a cryptoworm. Your cloud storage tool is dutifully going to sync everything encrypted, junking up your entire storage across devices and because restoring old versions is both ass and near impossible at scale, you need an actual backup solution for that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be.
I contacted the support asking WTF, "oh the file got deleted at some point, sorry for that", and they offered me 3 months of credits.
I do not trust my Backblaze backups anymore.
However, backing up these kinds of directories has always been ill-defined. Dropbox/Google Drive/etc. files are not actually present locally, at least not until you access the file or the client decides to cache it. Should backup software force you to download all 1TB+ of your cloud storage? What if the local system is low on space? What if the network is too slow? What if the actual data is in an already excluded %AppData% location?
Similar issue with VCS, should you sync changes to .git every minute? Every hour? When is .git in a consistent state?
IMO .git and other VCS should just be synced X times per day, and the sync should wait for .git to be unchanged for Y minutes before running. Hell, I bet Claude could write a special Git-aware backup script.
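For what it's worth, the "unchanged for Y minutes" part is only a few lines. A minimal Python sketch, assuming that "unchanged" means no file or directory under .git has a modification time newer than the quiet period (the function names are mine, not from any real tool):

```python
import os
import time


def latest_mtime(root):
    """Newest modification time of root and everything beneath it."""
    newest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                continue  # entry vanished mid-scan, e.g. a temporary lock file
            newest = max(newest, entry_mtime)
    return newest


def is_quiet(root, quiet_seconds, now=None):
    """True if nothing under root has been modified in the last quiet_seconds."""
    now = time.time() if now is None else now
    return now - latest_mtime(root) >= quiet_seconds
```

A scheduler would call something like `is_quiet(".git", 10 * 60)` X times per day and skip the run when it returns False. This says nothing about whether the repository is internally consistent; it only avoids copying while git is actively writing.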
But Google Drive and Dropbox mount points are not real. It’s crazy to expect backup software to handle that unless explicitly advertised.
The one thing they have to do is backup everything and when you see it in their console you can rest assured they are going to continue to back it up.
They’ve let the desktop client linger, it’s difficult to add meaningful exceptions. It’s obvious they want everyone to use B2 now.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB harddrive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly a restic job runs, which backs up everything on the pi to an s3 compatible cloud[3], and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly)
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
This was somewhat of a pain to set up, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doesn't matter. Restic encrypts client side, you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
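In case it helps anyone, the nightly job described above boils down to two restic invocations. Below is a rough sketch in Python rather than the actual script. The repository URL and source path are placeholders, and "then yearly" is approximated with a large `--keep-yearly` value, which is my assumption. The `backup`, `forget`, `--prune` and `--keep-*` options are standard restic.

```python
import subprocess


def restic_commands(repo, source, daily=30, weekly=52, monthly=60, yearly=100):
    """Build the argv lists for one nightly run: back up, then thin out snapshots."""
    base = ["restic", "-r", repo]
    backup = base + ["backup", source]
    forget = base + [
        "forget",
        "--prune",                       # actually delete data no snapshot needs
        "--keep-daily", str(daily),      # 30 days
        "--keep-weekly", str(weekly),    # 52 weeks
        "--keep-monthly", str(monthly),  # 60 months
        "--keep-yearly", str(yearly),    # stand-in for "then yearly"
    ]
    return [backup, forget]


def run_nightly(repo, source):
    """Run both steps, stopping at the first failure."""
    for cmd in restic_commands(repo, source):
        subprocess.run(cmd, check=True)
```

Usage would be something like `run_nightly("s3:https://s3.example.invalid/my-bucket", "/srv/data")` from cron or a systemd timer, with `RESTIC_PASSWORD`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` set in the environment.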
I had no idea that it was such a good bargain. I used to be a Crashplan user back in the day, and I always thought Backblaze had tiered limits.
I've been using Duplicati to sync a lot of data to S3's cheapest tape-based long term storage tier. It's a serious pain in the ass because it takes hours to queue up and retrieve a file. It's a heavy enough process that I don't do anything nearly close to enough testing to make sure my backups are restorable, which is a self-inflicted future injury.
Here's the thing: I'm paying about $14/month for that S3 storage, which makes $99/year a total steal. I don't use Dropbox/Box/OneDrive/iCloud so the grievances mentioned by the author are not major hurdles for me. I do find the idea that it is silently ignoring .git folders troubling, primarily because they are indeed not listed in the exclusion list.
I am a bit miffed that we're actively prevented from backing up the various Program Files folders, because I have a large number of VSTi instruments that I'll need to ensure are rcloned or something for this to work.
I never trust them again with my data.
I know this is beside the point somewhat, but: Learn your tools, people. The commit history could probably have been easily restored without involving any backup. The commits are not just instantly gone.
I've never needed to restore anything, so can't say anything about this, but once, one of my devices deleted a file in Syncthing, and I went into Backblaze to see if they have any logs of deletions/file modifications (had it disabled in syncthing).
I don't remember the exact details, but I remember clearly that I felt like the entire thing was done by a junior engineer straight out of college. Trying to understand the names of some variables used there, I stumbled upon a reddit thread where the person who worked on the client was trying to explain why things were done the way they were - and I felt like it was me in my first 3 months of software engineering.
How did Backblaze gain this trust in the first place? Is it because nobody is offering "unlimited" storage at the same price point?
But if that's truly their stance, then they are being deceptive about their non-business offering at the point of sale.
EDIT - see my other comment where I found the actual email
You should try downloading one of your backed up git repos to see if it actually does contain the full history, I just checked several and everything looks good.
But .git? It does not mean you have it synced to GitHub or anything reliable?
If you do anything then only backup the .git folder and not the checkout.
But backing up the checkout and not the .git folder is crazy.
Complete lack of communication (outside of release notes, which nobody really reads, as the article too states) is incompetence and indeed worrying.
Just show a red status bar that says "these folders will not be backed up anymore", why not?
Maybe there's something newer/better now (and I bought lifetime licenses of it long ago), but it works for me.
That said, I use Arq + Backblaze storage and I think my monthly bill is very low, like under $5. Though I haven't backed-up much media there yet, but I do have control over what is being backed-up.
If you've got huge amounts of files in OneDrive and the backup client starts downloading every one of them (before it can reupload them again), you're going to run into problems.
But ideally, they'd give you a choice.
I know the post is talking about their personal backup product but it's the same company and so if they sneak in a reduction of service like this, as others have already commented, it erodes difficult-to-earn trust.
My understanding is that a modern, default onedrive setup will push all your onedrive folder contents to the cloud, but will not do the same in reverse -- it's totally possible to have files in your cloud onedrive, visible in your onedrive folder, but that do not exist locally. If you want to access such a file, it typically gets downloaded from onedrive for you to use.
If that's the case, what is Backblaze or another provider to do? Constantly download your onedrive files (that might have been modified on another device) and upload them to backblaze? Or just sync files that actually exist locally? That latter option certainly would not please a consumer, who would expect the files they can 'see' just get magically backed up.
It's a tricky situation and I'm not saying Backblaze handled it well here, but the whole transparent cloud storage situation thing is a bit of a mess for lots of people. If Dropbox works the same way (no guaranteed local file for something you can see), that's the same ugly situation.
However, there is a very good reason for not backing up what is in effect network attached storage. Particularly for OneDrive, as it often adds company SharePoint sites you open files from as mountpoints under your OneDrive folder (business OneDrive is basically a personal SharePoint site under the hood). Trying to back them up would result in downloading potentially hundreds of gigabytes of files to the desktop only to then reupload them to OneDrive. That would also likely trigger data exfiltration flags at your corporate IT.
A Dropbox/OneDrive/Drive/etc folder is a network mount point by another name. (Many of them are now implemented as FUSE mounts or an equivalent OS API, not folders on disk.) It's fundamentally reasonable for software that promises backing up the local disk not to back up whatever network drives you happen to have signed in/mounted.
Trying to audit—let alone change—the finer details is a pain even for power users, and there's a non-zero risk the GUI is simply lying to everybody while undocumented rules override what you specified.
When I finally switched my default boot to Linux, I found many of those offerings didn't support it, so I wrote some systemd services around Restic + Backblaze B2. It's been a real breath of fresh air: I can tell what's going on, I can set my own snapshot retention rules, and it's an order of magnitude cheaper. [2]
____
[1] Along the lines of "We have your My Documents. Oh, you didn't manually add My Videos or My Music for every user? Too bad." Or in some cases, certain big-file extensions are on the ignore list by default for no discernible reason.
[2] Currently a dollar or two a month for ~200gb. It doesn't change very much, and data verification jobs redownload the total amount once a month. I don't back up anything I could get from elsewhere, like Steam games. Family videos are in the care of different relatives, but I'm looking into changing that.
Preferably cheap and rclone compatible.
Hetzner storagebox sounds good, what about S3 or Glacier-like options?
Try checking bzexcluderules_editable.xml. A few years ago, Backblaze would back up .git folders for Mac but not Windows. Not sure if this is still the case.
It is true that we recently updated how Backblaze Computer Backup handles cloud-synced folders. This decision was driven by a consistent set of technical issues we were seeing at scale, most of them driven by updates created by third-party sync tools, including unreliable backups and incomplete restores when backing up files managed by third-party sync providers.
To give a bit more context on the “why”: these cloud storage providers now rely heavily on OS-level frameworks to manage sync state. On Windows, for example, files are often represented as reparse points via the Cloud Files API. While they can appear local, they are still system-managed placeholders, which makes it difficult to reliably back them up as standard on-disk files.
Moreover, we built our product in a way to not backup reparse points for two reasons:
1. We wanted the backup client to be light on the system and only back up needed user-generated files.
2. We wanted the service to be unlimited, so following reparse points would lead to us backing up tons of data in the cloud.
We’ve made targeted investments where we can, for example, adding support for iCloud Drive by working within Apple’s model and supporting Google Drive, but extending that same level of support to third-party providers like Dropbox or OneDrive is more complex and not included in the current version.
We are currently exploring building an add-on that either follows reparse points or backs up the tagged data in another way.
We also hear you clearly on the communication gap. Both the sync providers and Backblaze should have been more proactive in notifying customers about a change with this level of impact. Please don't hesitate to reach out to me or our support team directly if you have any questions. https://help.backblaze.com/hc/en-us/requests/new
We are here to help.
I still trust restic checksums will actually check whether restore is correct, but that way random part of storage gets tested every so often in case some old pack file gets damaged
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
A big difference here is that Backblaze only keeps deleted/changed files for 30 days. Deleted files can go unnoticed for some time, especially if done by a malicious app or ignorant AI.
I'd pay that extra few dollars for peace of mind.
As for testing recovery, you can validate file counts, sizes + checksums without performing recovery.
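A minimal Python sketch of that kind of check, assuming you can get at the backed-up files through something that looks like a directory (a mounted snapshot, for instance). The function names are made up, and real tools expose this differently:

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path relative to root to its (size, sha256 hex digest)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Hash in 1 MiB chunks so large files don't need to fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.path.getsize(full), digest.hexdigest())
    return manifest


def diff_manifests(expected, actual):
    """Return (missing, changed): paths absent from actual, and paths whose size or hash differ."""
    missing = sorted(set(expected) - set(actual))
    changed = sorted(
        path for path in expected
        if path in actual and expected[path] != actual[path]
    )
    return missing, changed
```

The file count is just `len(manifest)`. Anything in `missing` or `changed` is worth looking at before you need a restore, though files edited since the last backup run will naturally show up as changed.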
A few shell scripts give you the power of advanced enterprise backup, whereas backblaze only supports GUI restores.
There are actually a lot of cheaper S3-compatible services out there (like Backblaze B2 or Cloudflare R2). The pricing may work out if you just back up to these directly. It certainly gives you far more control than Backblaze Backup.
Indeed, the commits and blobs might even have still been available on the GitHub remote. I'm not sure if they clean them on some interval or something, but a bunch of stuff you "delete" from git still stays in the remote regardless of what you push.
Yes, the unlimited storage is one factor. Their detailed write-ups about hard drive reliability and transparency on how they build their racks (I think they essentially open sourced the design) established a lot of credibility as well. Plus in my experience they just worked.
I paid for Backblaze for years but finally cancelled when I junked my last desktop and never got around to installing it on my laptop. I did use their restore functionality a couple of times and it was slow and kind of clunky but it worked.
I’m sad to hear they’ve started dropping stuff from the backup like this. I’ve been contemplating signing back up but most of the stuff I care about is in iCloud or OneDrive so if they aren’t backing that up, it’s pretty useless to me.
After using their website and their app for a few hours I pretty much immediately decided to not proceed with them as the software was clearly not built by a team that has great competency in software development. This was a year ago so they've had plenty of time to polish it.
You sort of let that kind of stuff pass for a hardware company, but backblaze is not a hardware company. There's more to backup than just ensuring the disks at the data centre are replaced in a timely manner.
And from comments here I don't see any other user-friendly options being proposed, it's all just suggestions to glue together some open source software with object storage and be your own sysadmin.
According to support's reply just now, my backups are crippled just like every other customer. No git, no cloud synced folders, even if those folders are fully downloaded locally.
(This is also my personal backup strategy for iCloud Drive: one Mac is set to fully download complete iCloud contents, and that Mac backs up to Backblaze.)
I, on the other hand, as a private consumer, use git for all my hobby projects and note-taking. And my language learning. Of course I do, or I couldn't keep track of what I'm doing over the years, and I wouldn't be able to sort things out. There's nothing professional there. Are BB saying that if you try to do something in an orderly and controlled manner, then it's "professional" and shouldn't be backed up? If that's their stance then no wonder people are leaving BB. I for sure won't ever recommend them again.
> Bob (Backblaze Help)
> Aug 5, 2021, 11:33 PDT
> Hello there,
> Thank you for taking the time to write in,
> Unfortunately .git directories are excluded by Backblaze by default. File
> changes within .git directories occur far too often and over so many files
> that the Backblaze software simply would not be able to keep up. It's beyond
> the scope of our application.
> The Personal Backup Plan is a consumer grade backup product. Unfortunately we
> will not be able to meet your needs in this regard.
> Let me know if you have any other questions.
> Regards,
> Bob The Backblaze Team
There's no mention of .git being excluded in the Settings or on their support page (https://www.backblaze.com/computer-backup/docs/supported-bac...); they just silently decided to not back up a bunch of my files without telling me... wonderful.
It seems incredibly stupid for a BACKUP PROGRAM to not list the hidden files instead of indicating they're hidden (e.g. _(hidden)_.git)
No they are not. This is explicitly addressed in the article itself.
They don't need to be in my case, I'm only using them now because of existing shortcuts and VM shares and programs configured to source information from them. That doesn't mean I don't want them backed up.
Same for OneDrive: Microsoft configured my account for OneDrive when I set it up. Then I immediately uninstalled it (I don't want it). But I didn't notice that my desktop and documents folders live there. I hate it. But by the time I noticed it, it was already being used as a location for multiple programs that would need to be reconfigured, and it was easier to get used to it than to fix it. Several things I've forgotten about would likely break in ways I wouldn't notice for weeks/months. Multiple self-hosted servers for connecting to my android devices would need to reindex (Plex, voidtools everything, several remote systems that mount via sftp and connected programs would decide all my files were brand new and had never been seen before)
I wish lifetime licences were still sold.
Edit: on top of that I've built a custom one-page monitoring dashboard, so I see everything in one place (https://imgur.com/B3hppIW) - I'll opensource, it's decent architecture, I just need to cleanup some secrets from Git history...
I need it to capture local data, even though that local data is getting synced to Google Drive. Where we sync our data really has nothing to do with Backblaze backing up the endpoint. We don't wholly trust sync, that's why we have backup.
On my personal Mac I have iCloud Drive syncing my desktop, and a while back iCloud ate a file I was working on. Backblaze had it captured, thankfully. But if they are going to exclude iCloud Drive synced folders, and sounds like that is their intention, Backblaze is useless to me.
The deal was that Backblaze backs things up and I don't have to worry about it. Learning that it does not back things up is a punch to the gut. I am familiar with the exclusions and I have a look at that list to make sure I'm not missing anything from my backups. I had always thought the exclusions list was exhaustive.
Excluding other files and folders without telling me about it breaks the deal. Dropbox is important to several of the users I installed it for. Ignoring .git folders is another one that affects me and I had not known about. Ouch.
I will now have to look for alternatives. It has to be easy to install, run seamlessly on non-technical users' machines and be reliable.
I find it hard to think of a worse breach of trust for a backup service than not backing up files!
I also want to be clear, this wasn’t about saving on storage. It came from cases where backing up cloud-synced folders (like Dropbox) was leading to unreliable or incomplete restores because of how those files are managed under the hood.
When Dropbox began using reparse points for synced files, those files no longer behaved like standard local files. Because of that, Backblaze Computer Backup can’t reliably back them up or restore them. The current behavior is focused on ensuring we only back up data we can reliably restore, and we are actively exploring ways to better support Dropbox and data touched by other sync services.
While I agree Backblaze is overdue to exclude the sparse image, in fairness to them, no other online backup solution (other than Time Machine itself) handled it correctly either, at the time I was investigating this last year.
(I'm not even sure Apple itself handles it correctly in all cases. I had to migrate to a new macbook recently, and Migration Assistant hung while transferring my files. I deleted the sparse image, tried again, and then it worked. Possibly a coincidence, I admit, but Migration Assistant reliably disappoints me every few years.)
Your backup solution is not something you ever want to be the source of surprises!
Ironically, I believe you have it backwards: pack files, git's solution to the "too many tiny files" problem, are the issue here; not the tiny files themselves.
In my experience, incremental backup software works best with many small files that never change. Scanning is usually just a matter of checking modification times and moving on. This isn't fast, but it's fast enough for backups and can be optimized by monitoring for file changes in a long-running daemon.
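A toy Python version of that scan, to show why unchanged small files are cheap: each one costs a single stat call and a dictionary lookup. It assumes a file counts as changed when its (mtime, size) pair differs from the previous run, which is a simplification of what real backup tools track.

```python
import os


def scan_changes(root, index):
    """Compare files under root against index, a dict of relative path -> (mtime_ns, size).

    Returns (changed_paths, new_index). No file contents are read.
    """
    changed = []
    new_index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            signature = (st.st_mtime_ns, st.st_size)
            new_index[rel] = signature
            if index.get(rel) != signature:
                changed.append(rel)  # new file, or metadata differs from last run
    return sorted(changed), new_index
```

Deleted files are whatever is in the old index but not the new one, i.e. `set(index) - set(new_index)`.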
However, lots of mostly identical files ARE an issue for filesystems as they tend to waste a lot of space. Git solves this issue by packing these small objects into larger pack files, then compressing them.
Unfortunately, it's those pack files that cause issues for backup software: any time git "garbage collects" and creates new pack files, it ends up deleting and creating a bunch of large files filled with what looks like random data (due to compression). Constantly creating/deleting large files filled with random data wreaks havoc on incremental/deduplicating backup systems.
See Fossil (https://fossil-scm.org/)
P.S. There's also (https://www.sourcegear.com/vault/)
> SourceGear Vault Pro is a version control and bug tracking solution for professional development teams. Vault Standard is for those who only want version control. Vault is based on a client / server architecture using technologies such as Microsoft SQL Server and IIS Web Services for increased performance, scalability, and security.
https://utcc.utoronto.ca/~cks/space/blog/sysadmin/BackupTest...
You should naturally test your ordinary restore procedures (single-file, one directory, spot-checks) on the regular, and you should also form a viable disaster recovery plan, based on your projected risks. What if your house burns down? What if you're burglarized? What if your password manager loses all passwords? etc.
If you've never successfully run a disaster-recovery drill, then you don't have a plan.
"Those who fail to plan, plan to fail!"
> I contacted the support asking WTF, "oh the file got deleted at some point, sorry for that", and they offered me 3 months of credits.
This happened to me with CrashPlan for Windows many years ago, because of some Volume Shadow Copy Service thing. I noped out of there right after.
Borg backup is a good tool in my opinion and has everything that I need (deduplication, compression, mountable snapshots).
Hetzner Storage Box is nothing fancy but good enough for a backup and is noticeably cheaper than the alternatives (I pay about 10 eur/month for 5TB of storage).
Before that I was using s3cmd [3] to back up to an S3 bucket.
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have in unison, but that's two-way sync, and it always prompted to ask for help by design).
Anyway, my decade-old Synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / PulseAudio finding novel ways to ruin my day, and not really understanding how to restore my Synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get:
sanoid + syncoid -> zfs send -> zfs recv -> restic
In the absence of ZFS, I'd do:
rsync -> restic
Or:
unison <-> unison -> restic.
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source then restore from backup (/ on a linux VM is great for this). Run rsync in dry run / checksum mode on the two trees. Confirm the metadata + contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
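A rough sketch of that restore check in Python rather than rsync, assuming both trees are mounted locally. It compares content hashes, permission bits and whole-second mtimes only; symlinks, ownership and extended attributes are deliberately left out, and the function names are illustrative:

```python
import hashlib
import os
from typing import Dict, List, Tuple


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def describe_tree(root: str) -> Dict[str, Tuple[str, int, int]]:
    """Map each relative file path to (sha256, permission bits, mtime in whole seconds)."""
    out: Dict[str, Tuple[str, int, int]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            out[os.path.relpath(path, root)] = (
                file_digest(path),
                st.st_mode & 0o7777,
                int(st.st_mtime),
            )
    return out


def compare_trees(source: str, restored: str) -> List[str]:
    """Return human-readable differences; an empty list means the trees match."""
    a, b = describe_tree(source), describe_tree(restored)
    problems = []
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            problems.append(f"missing in restore: {path}")
        elif path not in a:
            problems.append(f"unexpected in restore: {path}")
        elif a[path] != b[path]:
            problems.append(f"differs: {path}")
    return problems
```

Calling `compare_trees("/srv/source", "/mnt/restore-test")` (both paths hypothetical) after a full restore should return an empty list if the backup preserved contents and basic metadata.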
I do notice in their GUI that they offer more than 30 days of version history for extra $:
30-Day Version History (Current) Included
1-Year Version History
Forever Version History $.006/GB/Month for versions changed or deleted more than 1 year ago
They are not giving much information here at all. The above is not a pasting artifact; their page literally doesn't give any indication of how they price the 1-year history. Presumably it's not just as simple as click to 12x your retention for free.
Meanwhile, it's even more unclear whether that $.006/GB is assessed for change deltas or for the total file size. Indeed, it's not clear if it's assessed against your entire fileset or just files that changed.
I'll have to email them, I guess.
I would ask you: what is the better alternative? That's not a rhetorical question; they don't have my credit card details for another two weeks.
I checked out what it would cost to store my data on rsync.net and it's ~1500/year.
There are a few folks who lost data to corrupted files on the BackBlaze side. That is much more worrisome, but still, the point is to go in eyes open. It's clear that it should not be used as an "only" backup! I already backup locally and rsync critical data offsite. I'm all about adding several "third option" lifelines that I hope will keep me covered in extreme situations.
I get that this is not a restorable image, but for $100 a year I'm not expecting that.
Well, "no problem" is an overstatement. Once you need a restore, you learn that their promise of end-to-end encryption is actually a lie. (As in, you have to break the end-to-end encryption to restore since everything has to be decrypted on their servers.)
I don’t really understand that. I’m using Windows File History, and while it’s limited to backing up changes only every 15 minutes, and is writing to a local network drive, it doesn’t seem to have any trouble with .git directories.
That's a crazy statement. The cloud backup system I use can be configured to how often it should bother even looking for new files, and for the section where I have my .git repos (they're actually "bare" git repos and I push to them, locally) I've set it to every two hours. Which is actually overkill because they absolutely do not change that quickly.
You are using it to mean "maintaining full version history", I believe? Another important consideration.
both services have internal backups to reduce the chance they lose data
both services allow some limited form of "going back to older version" (like the article states itself).
Just because the article says "sync is not backup" doesn't mean that is true, I mean it literally is backup by definition as it: makes a copy in another location and even has versioning.
It's just not a _good enough_ backup for their standards. Maybe even for the standards of most people on HN, but out there many people are happy with way worse backups, especially wrt. versioning: for a lot of (mostly static) media, the only reason you need version rollback is in case a corrupted version gets backed up. And a lot of people mostly back up personal photos/videos and important documents, all static by nature.
Though
1. it doesn't really fulfill the 3-2-1 rule; it's only 2-1-1 places (local, one backup on the MS/Dropbox cloud, one offsite). Before, when it was also backed up to Backblaze, it was 3-2-1 (kinda). So them silently stopping is still a huge issue.
2. newer versions of the 3-2-1 rule also say to treat the 2 not just as 2 backups, but also as 2 "vendors/access accounts". With the OneDrive folder pretty much being OneDrive-controlled, this is 1 vendor across local and all backups. Which is risky.
So my idea is that it's a competency problem (lack of communication), not malice. But it's just a theory, based on my own experience.
In any case, this is a bad situation, however you look at it.
For stuff I care about (mostly photos), I back them up on two different services. I don't have TBs of those, so it's not very expensive. My personal code I store on git repositories anyway (like SourceHut or Codeberg or sometimes GitHub).
I have no clue why people still use it and I'd cut my losses if I were you, either backup to the cloud or pull from it, not both at the same time like an absolute tictac.
If iCloud is set to keep full copies on disk, Backblaze will treat those like normal files and back them up.
More details here: https://www.backblaze.com/computer-backup/docs/en/back-up-ic...
Happy to answer any questions if anything’s unclear.
It is true that we recently updated how Backblaze Computer Backup handles cloud-synced folders. This decision was driven by a consistent set of technical issues we were seeing at scale, most of them driven by updates created by third-party sync tools, including unreliable backups and incomplete restores when backing up files managed by third-party sync providers.
To give a bit more context on the “why”: these cloud storage providers now rely heavily on OS-level frameworks to manage sync state. On Windows, for example, files are often represented as reparse points via the Cloud Files API. While they can appear local, they are still system-managed placeholders, which makes it difficult to reliably back them up as standard on-disk files.
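As an illustration only, here is how a scanner might spot such placeholders from metadata without opening the file. The attribute values are the documented Win32 constants, but treating any of these three bits as "not fully on disk" is an assumption: the reparse bit also matches symlinks and junctions, and which bits a given sync client exposes to ordinary processes varies. This is not a description of how Backblaze does it:

```python
import os

# Windows file attribute bits (values from the Win32 headers).
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

# Assumption for this sketch: any of these bits means the contents may not be local.
PLACEHOLDER_MASK = (
    FILE_ATTRIBUTE_REPARSE_POINT
    | FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def looks_like_placeholder(attrs: int) -> bool:
    """Classify a raw attribute bitmask; pure function, so it can be tested anywhere."""
    return bool(attrs & PLACEHOLDER_MASK)


def file_attributes(path: str) -> int:
    """Read attribute bits from metadata only (no open/read, so no download is triggered).

    st_file_attributes exists only on Windows; elsewhere this returns 0.
    """
    return getattr(os.lstat(path), "st_file_attributes", 0)
```

A scanner could call `looks_like_placeholder(file_attributes(path))` and then decide whether to skip the file, report it, or force it to download.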
Moreover, we built our product in a way to not back up reparse points for two reasons:
1. We wanted the backup client to be light on the system and only back up needed user-generated files.
2. We wanted the service to be unlimited, so following reparse points would lead to us backing up tons of data in the cloud.
We’ve made targeted investments where we can, for example, adding support for iCloud Drive by working within Apple’s model and supporting Google Drive, but extending that same level of support to third-party providers like Dropbox or OneDrive is more complex and not included in the current version.
We are currently exploring building an add-on that either follows reparse points or backs up the tagged data in another way.
We also hear you clearly on the communication gap. Both the sync providers and Backblaze should have been more proactive in notifying customers about a change with this level of impact. Please don't hesitate to reach out to me or our support team directly if you have any questions. https://help.backblaze.com/hc/en-us/requests/new
We are here to help.
Managing exclusions is something to keep vaguely on top of (I've accidentally had a few VM disk images get backed up when I don't need/want them) but the default exclusions are all very reasonable.
The new and very interesting problem with their business model is that drive prices have doubled - and in some cases, more than doubled - in the last 12 months.
Backblaze has a lot of debt and at some point the numbers don't make sense anymore.
Always prefer businesses who are upfront and honest about what they can offer their users, in a sustainable way.
If a company uses the word unlimited to describe their service, but then attempts to weasel out of it via their T&Cs, that doesn't constitute a disagreement over the meaning of the word unlimited. It just means the company is lying.
- report an error
- ignore
- materialize
Regardless, if you make backup software that doesn’t give this level of control to users, and you make a change about which files you’re going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
Making the change without making it clear though, that's just awful. A clear recipe for catastrophic loss & drip drip drip of news in the vein of "How Backblaze Lost my Stuff"
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.
I’m paying to copy files from a folder to their servers just do that. No matter what the file is. Stay at the filesystem level not the application level.
That's a really important fact that's getting buried so I'd like to highlight it here.
First thing I noticed is that if it can't download a file due to network or some other problem then it just skips it. But you can force it to retry by modifying its job file which is just an SQLite DB. Also it stores and downloads files by splitting them into small chunks. It stores checksums of these chunks, but it doesn't store the complete checksum of the file, so judging by how badly the client is written I can't be sure that restored files are not corrupted after the stitching.
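To make the concern concrete, here is a small sketch of why per-chunk checksums alone do not prove the stitched file is intact. The chunk layout and function names are invented for the example and say nothing about Backblaze's actual format:

```python
import hashlib
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def verify_chunks(chunks: List[bytes], digests: List[str]) -> bool:
    """Per-chunk check: each downloaded chunk matches its recorded digest."""
    return len(chunks) == len(digests) and all(
        sha256_hex(c) == d for c, d in zip(chunks, digests)
    )


def verify_file(stitched: bytes, file_digest: str) -> bool:
    """Whole-file check: only possible if a digest of the complete file was stored."""
    return sha256_hex(stitched) == file_digest


# Five distinct 100-byte chunks so that reordering is detectable.
original = b"".join(bytes([i]) * 100 for i in range(5))
chunks = split_chunks(original, 100)
chunk_digests = [sha256_hex(c) for c in chunks]
whole_digest = sha256_hex(original)

good = b"".join(chunks)
bad = b"".join(reversed(chunks))  # simulated stitching bug: wrong order
```

Here `verify_chunks(chunks, chunk_digests)` passes in both cases, while only `verify_file` distinguishes `good` from `bad`. Without a stored whole-file digest, the second check is not available.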
Then I found out that it can't download some files even after dozens of retries because it seems they are corrupted on Backblaze side.
But the most jarring issue for me is that it mangled all non-ascii filenames. They are stored as UTF-8 in the DB, but the client saves them as Windows-1252 or something. So I ended up with hundreds of gigabytes of files with names like фикац, and I can't just re-encode these names back, because some characters were dropped during the process.
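A sketch of why some of these names cannot be recovered, assuming the bug really is "UTF-8 bytes decoded as Windows-1252 with undecodable bytes dropped" (the comment only guesses at the encoding, so this is a hypothesis, not a diagnosis):

```python
from typing import Optional


def mangle(name: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes read as Windows-1252, undefined bytes dropped."""
    return name.encode("utf-8").decode("cp1252", errors="ignore")


def try_repair(mangled: str) -> Optional[str]:
    """Reverse the mangling if possible; return None when information was lost."""
    try:
        return mangled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
```

Windows-1252 leaves a handful of byte values (such as 0x81) undefined. A name whose UTF-8 bytes avoid them, like "фикац", survives the round trip, whereas "тест" contains the byte 0x81 (in "с"), loses it during mangling, and `try_repair` returns `None`.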
I wanted to write a script that forces Backblaze Client to redownload files, logs all files that can't be restored, fixes the broken names and splits restored files back into chunks to validate their checksums against the SQLite DB, but it was too big of a task for me, so I just procrastinated for 3 years, while keeping paying monthly Backblaze fees because it's sad to let go of my data.
I wonder if they fixed their client since then.
Eh, I don't agree. Case in point: Microsoft.
Or in other words: a sucker is born every minute.
On macOS.
Except that before they did and then they didn't without any proper notification (release notes don't count for significant changes like this).
They should have just added a pop up or at least email or both, given a heads-up and then again when the change actually kicked in.
The problem is not them not backing it up by default but:
* changing an existing setting to back up less by default
* essentially hiding the change from the user, as it is not shown on the directory exclude list
[0]: https://kopia.io/
TLDR: Despite claiming to backup all your data, Backblaze quietly stopped backing up OneDrive and Dropbox folders - along with potentially many other things.
For ten years I have been using Backblaze for my personal computer backup. Before 2015 I would backup files to one of two large external hard discs. I then rotated these drives between, first my father’s house, and after I moved to the UK, my office drawers.
In 2015 Backblaze seemed like a good bet. Unlike Crashplan their software wasn’t a bloated Java app, but they did have unlimited storage. If you could cram it into your PC they would back it up. With their yearly Hard Drive reviews making good press, a lot of personal recommendations from my friends and colleagues, their service sounded great. I installed the software, ran it for several weeks, and sure enough my data was safely stored in their cloud.
I had further reason to be impressed when several years later one of my hard drives failed. I made use of their “send me a hard drive with my stuff on it service”. A drive turned up filled with my precious data. That for me was proof that this system worked, and that it worked well.
And so I recommended Backblaze for years. What do you do for backup? I would extol the virtues of Backblaze, and they made many sales from such recommendations.
There were a few things I didn’t like. The app could use a lot of memory, especially after doing a large import of photographs. The website, which I often used to restore single files or folders, was slow and clunky to use. The Windows app in particular was clunky with an early 2000s aesthetic and cramped lists. There was the time they leaked all your filenames to Facebook, but they probably fixed that.
But no matter, small problems for the peace of mind of having all my files backed up.
Backup software is meant to back up your files. Which files? Well the files you need. Given everyone is different, with different workflows and filetypes, the ideal thing is to back up all your files. No backup provider knows what I will need in the future. The provider must plan accordingly.
My first troubling discovery was in 2025, when I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo. No data was lost, but the log of changes was. No problem I thought, I’ll just restore this from Backblaze. Sadly it was not to be. At some point Backblaze had started to ignore .git folders.
This annoyed me. Firstly I needed that folder and Backblaze had let me down. Secondly within the Backblaze preferences I could find no way to re-enable this. In fact looking at the list of exclusions I could find no mention of .git whatsoever.

This made me wonder - I had checked the exclusions list when I installed Backblaze 9 years before, had I missed it? Had I missed anything else?
Well lesson learned I guess, but then a week ago I came across this thread on reddit: “Doesn’t back up Dropbox folder??”. A user was surprised to find their Dropbox folder no longer being backed up. Alarmed I logged into Backblaze, and lo and behold, my OneDrive folder was missing.
I. Am. Fucking. Furious.
Backblaze has one job, and apparently they are unable to do that job. Back up my stuff. But they have decided not to.
Let’s take an aside.
A reasonable person might point out those files on OneDrive are already being backed up - by OneDrive! No. Dropbox and OneDrive are for file syncing - syncing your files to the cloud. They offer limited protection. OneDrive and Dropbox only retain deleted files for one month. Backblaze has one year file retention, or if you pay per GB, unlimited retention. While OneDrive retains version changes for longer, Dropbox only retains version changes for a month - again unless you pay for more. Your files are less secure and less backed up when you stick them in a cloud storage provider folder compared to just being on your desktop.
And that’s assuming your cloud provider is playing ball. If Microsoft or Dropbox bans your account you may find yourself with no backup whatsoever.
For me the larger issue is they never told us. My OneDrive folder sits at 383GB. You would think that having decided to no longer back this up I might get an email, an alert or some other notification. Of course not.
Nestled into their release notes under “Improvements” we see:
The Backup Client now excludes popular cloud storage providers from backup, including both mount points and cache directories. This prevents performance issues, excessive data usage, and unintended uploads from services like OneDrive, Google Drive, Dropbox, Box, iDrive, and others. This change aligns with Backblaze’s policy to back up only local and directly connected storage.
First, I would hardly call this change in policy an improvement; it’s hard to imagine anyone reading this as anything other than a downgrade in service. Secondly, does Backblaze believe most of its users are reading their release notes?
And if you joined today and looked at their list of file exclusions you would find no reference to Dropbox or OneDrive. No mention of Git either.
Here’s the thing: today they don’t back up Git or OneDrive. Who’s to say tomorrow they won’t add to the list? Maybe some obscure file format that’s critical to your workflow. Or they will ignore a file extension that just happens to be the same as one used by your DAW or 3D modelling software. And they won’t tell you this. They won’t even list it on their site.
By deciding not to back up everything, Backblaze has made it as if they are backing up nothing.
But really this feels like a promise broken. Back in 2015 their website proudly proclaimed:
All user data included by default
No restrictions on file type or size
Protect the digital memories and files that matter most to you.
File backup is a matter of trust. You are paying a monthly fee so that if and when things go wrong you can get your data back. By silently changing the rules, Backblaze has not simply eroded my trust, but swept it away.
I wrote this to warn you - Backblaze is no longer doing their part, they are no longer backing up your data. Some of your data sure, but not all of it.
Finally let me leave you with Backblaze’s own words from 2015:
Unlimited, Simplified, Secure Personal Online Backup Cloud Storage
They promised to simplify backup. They succeeded - they don’t even do the backup part anymore.
It's set it and forget it.
You will need to set it up for them, then you get an email (from borgbackup, not the client so it works when the client is not running) when a backup hasn't happened for a while.
As client there are more options now (like Vorta, from them), but I have had success with https://github.com/garethgeorge/backrest and the Restic backend.
But I have no idea where the company currently sits on the spectrum from good actor to fully enshittified.
Set up your config to exclude common non-file dirs, or say "only `/Applications` and `Home`" and that's about it. If it's a file then it's a file, and it will be synced up.
My frustration stems from paying hundreds of dollars over several years for backup and then silently learning Dropbox was no longer supported when we went to look for it in our backup. We could’ve made other choices about how to store/back up our own files with better communication.
You can't connect to their Computer Backup service through third-party software.
Is iCloud Drive backed up or not? Please be clear. Which versions support this?
https://www.backblaze.com/computer-backup/docs/back-up-iclou... still says that it no longer backs up iCloud Drive files. Then it gives directions on how to back them up. Which is it?
Also, why did you stop backing up .git folders, silently?
Appreciate the thoughtful response. I recognize the challenge that cloud-synced folders introduce into the file storage ecosystem, and that online/offline files plus storage loopholes could cause an engineering challenge.
That being said, we’ve been Backblaze users for 6+ years and we probably can’t rely on Backblaze if it can’t support these tools that are now pretty standard services to have installed. As I mentioned above, the promise for us was “pay for Backblaze and never worry about whether our files are backed up”. We’ll be looking for an announcement if you can bring back Dropbox support.
Julian
My dad had a file untouched in Dropbox for 2 years. He overwrote it 2 days prior to me trying to recover it from Dropbox/Backblaze. They said he couldn't access the version that was just overwritten because that was over 30 days old, which is not what the definition of 30-day history is....
Oh well, I guess this is why we're given two kidneys.
Or that they're targeting the mass retail market, where people are technically ignorant, and "unlimited" is required to compete.
And statistically-speaking, is viable as long as a company keeps its users to a normal distribution.
But we'd always have a few people at the end of the semester print 493 blank pages using up all of their print quota they'd "paid for". No sir, you didn't pay for 500 pages of printing a semester, we'd let you print as much as you needed, we just had to put a quota in place to prevent some joker from wallpapering the lecture hall.
It was hard to express what we meant and "unlimited" didn't cut it.
so it’s an even more frustrating misleading statement.
_Nothing_ is actually infinite. Everything has limits.
"But X terabytes is functionally infinite for 99.99% of users"
Cool, then advertise that you offer Xtb of storage. Infinite means infinite, and if you offer anything less than that - and you do - then you shouldn't be allowed to say otherwise.
Unlimited however, they can offer. I don’t see how people get into mental block of thinking something is nefarious when a company offers you unlimited hosting or data. Yes, they know it’s impossible if everyone took full advantage of that. They also know most people won’t and so they don’t have to spend time worrying about it. It’s a simple actuarial exercise to work out the pricing that covers the use of your users.
Back in the early 2000s I ran a web hosting service that was predominantly a LAMP stack shared hosting environment. It had several unlimited plans and they were easy to estimate/price. The only times I had an issue of supporting a heavy user, it would turn out they were doing something unrestricted. Back then, it was usually something pron or mp3 related. So the user would get kicked off for that. I didn’t have any issues with supporting the usage load if it was within TOS. The margins were so high it was almost impossible to find a user that could give me any trouble from an economic standpoint.
Hiding the network always ends in pain. But never goes out of style.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it over rclone or something and use "ls" and "lsd" functions to query filenames. Otherwise it'll download, and it's how it's expected to work.
It's that to back up a folder on a filesystem, you need to traverse that folder and check every file in that folder to see if it's changed. Most filesystem tools usually assume a fairly low file count for these operations.
Git, rather unusually, tends to produce a lot of files in regular use; before packing, every commit/object/branch is simply stored as a file on the filesystem (branches only as pointers). Packing fixes that by compressing commit and object files together, but it's not done by default (only after an initial clone or when the garbage collector runs). Iterating over a .git folder can take a lot of time in a place that's typically not very well optimized (since most "normal" people don't have thousands of tiny files in their folders that contain sprawled out application state.)
The correct solution here is either for git to change, or for Backblaze to implement better iteration logic (which will probably require special handling for git..., so it'd be more "correct" to fix up git, since Backblaze's tools aren't the only ones with this problem.)
This is a joke, but honestly anyone here shouldn't be directly backing up their filesystems and should instead be using the right tool for the job. You'll make the world a more efficient place, have more robust and quicker to recover backups, and save some money along the way.
It's the same reason why the postgres autovacuum daemon tends to be borderline useless unless you retune it[0]: the defaults are barmy. git gc only runs automatically once there are more than about 6700 loose (unpacked) objects[1]. Most typical filesystem tools tend to start balking at traversing ~1000 files in a structure (depends a bit on the filesystem/OS as well, Windows tends to get slower a good bit earlier than Linux).
To fix it, running
> git config --global gc.auto 1000
should retune it, and any subsequent commit to your repos will trigger garbage collection properly when there are around 1000 loose files. Pack file management seems to be properly tuned by default; at more than 50 packs, gc will repack into a larger pack.
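To see where a given repository stands relative to that threshold, a small sketch that counts loose objects by walking the two-hex-character fan-out directories under `.git/objects`. Git itself estimates this number rather than counting exactly, so treat the result as a rough guide; the function names are made up:

```python
import os
import re

_HEX2 = re.compile(r"^[0-9a-f]{2}$")


def count_loose_objects(git_dir: str) -> int:
    """Count files under <git_dir>/objects/<two hex chars>/, i.e. loose (unpacked) objects."""
    objects = os.path.join(git_dir, "objects")
    total = 0
    for entry in os.scandir(objects):
        # Skips objects/pack and objects/info, which are not fan-out directories.
        if entry.is_dir() and _HEX2.match(entry.name):
            total += sum(1 for f in os.scandir(entry.path) if f.is_file())
    return total


def would_auto_gc(loose: int, threshold: int = 6700) -> bool:
    """Approximate the gc.auto rule: pack once loose objects exceed the threshold (0 disables)."""
    return threshold > 0 and loose > threshold
```

For example, `would_auto_gc(count_loose_objects(".git"), threshold=1000)` mirrors the retuned setting above for the repository in the current directory.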
[0]: For anyone curious, the default postgres autovacuum setting runs only when roughly 20% of the table consists of dead tuples (autovacuum_vacuum_scale_factor defaults to 0.2; dead tuples are roughly: deleted+every revision of an updated row). If you're working with a beefy table, you're never hitting 20%. Either tune it down or create an external cronjob to run vacuum analyze more frequently on the tables you need to keep speedy. I'm pretty sure the defaults are tuned solely to ensure that Postgres' internal tables are fast, since those seem to only have active rows to a point where it'd warrant autovacuum.
> I wanted to write a script that forces Backblaze Client to redownload files, logs all files that can't be restored, fixes the broken names and splits restored files back into chunks to validate their checksums against the SQLite DB, but it was too big of a task for me, so I just procrastinated for 3 years, while keeping paying monthly Backblaze fees because it's sad to let go of my data.
Filenames are probably the most valuable of metadata for them to mangle. I value them as much as I do file creation/modification times. A backup program is dead to me if they mess up either of these.
I think it should be trivial for you to pipe your request into Claude now, and get them to write a quick script. Hope that'll free you from Backblaze for good!
> I wonder if they fixed their client since then.
They have not. I spent more than a week trying to restore a little less than 2 TB backup because the client would just freeze at the last few % every time. I ended up having to break the restore into 200GB chunks on the web client and download and restore manually which was extremely frustrating and made me despise their (required) Windows client.
I hate things like "email recall" in Outlook or deleting messages in Teams etc because it trains normies into thinking you can recover from a compromise.
We grew up compiling Linux kernels when Microsoft was busy spreading FUD about how dangerous it would be to unleash open source and use open source. That using Linux on something critical like servers would lead to absolute chaos because the kernel wasn't written by someone who knew how to move Mt. Fuji.
I imagine Backblaze will soon realize why good PR firms are so expensive.
Just this weekend, my backup tool went rogue and exhausted quota on rsync.net (some bad config by me on Borg). Emailed them, they promptly added 100 GB storage for a day so that I could recover the situation. Plus, their product has been rock solid for the few years I've been using them.
D'argh.
Just to clarify - there are discounted plans that don't have free ZFS snapshots but you can still have them ... they just count towards your quota.
If your files don't change much - you don't have much "churn" - they might not take up any real space anyway.
There is 100% a difference between "dead data" (eg: movie.mp4) and "live data" (eg: a git directory with `chmod` attributes). S3 and similar often don't preserve "attributes and metadata" without a special secondary pass, even though the `md5` might be the same.
One particular issue I've encountered is that syncthing 2.x does not work well on systems without an SSD: the storage backend switched to sqlite, which doesn't perform as well as leveldb on HDDs, and scans of the 6TB folder were taking an excessively long time to complete compared to 1.x using leveldb. I haven't encountered any issues with mixing the use of 1.x and 2.x in my setup. The only other issues I've encountered are usually related to filename incompatibilities between filesystems.
syncthing is not perfect, and can get into weird states if you add and remove devices from it for example, but for my case it is I think the best option.
https://www.reddit.com/r/backblaze/comments/175haik/is_upgra...
(Ironic that their ex-employee wrote: "In this case, it is always kind of scary for Backblaze to just quietly change a setting from one setting to a different setting in the product for the customer without the customer taking that action themselves." — when that's what happened in this case.)
I don't use Backblaze, but I found the Version History doc for you:
https://www.backblaze.com/computer-backup/docs/version-histo...
Seems like whatever version is not currently on your local disk would be chargeable.
About 2 years ago now, we started including the 1-Year Extended Version History for all accounts.
We did not automatically turn it on for everyone, as it is not something everyone wants.
But there is no longer an extra charge for the 1-Year Extended Version History option when enabled.
You can also enable the Forever Extended Version History option, which will keep any deleted/changed/updated file for a year as part of the base plan, and then after that year that file will begin accruing a storage cost at our current B2 per GB storage cost.
So if you have 1TB of files in the Forever Extended Version History plan, it would cost you an extra $6 per month on top of your current plan.
The storage charge is only charged on files that have not been seen on the customer's computer in over 365 days.
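As a rough worked example of the arithmetic described above (not an official calculator): the rate below is derived from the "1TB costs an extra $6 per month" figure in this comment, assuming 1 TB is counted as 1000 GB, and the `StoredVersion` type is purely hypothetical.

```python
from dataclasses import dataclass

# Assumption taken from the comment above: 1 TB billed at $6 per month,
# i.e. $0.006 per GB if 1 TB is counted as 1000 GB. Check current pricing.
RATE_PER_GB_MONTH = 6.0 / 1000
FREE_DAYS = 365  # the first year is covered by the base plan


@dataclass
class StoredVersion:
    size_gb: float
    days_since_last_seen: int  # days since the file was last on the computer


def monthly_extra_cost(versions):
    """Sum the size of versions unseen for over FREE_DAYS and price them per GB."""
    billable_gb = sum(
        v.size_gb for v in versions if v.days_since_last_seen > FREE_DAYS
    )
    return round(billable_gb * RATE_PER_GB_MONTH, 2)


versions = [
    StoredVersion(size_gb=1000, days_since_last_seen=400),  # billable
    StoredVersion(size_gb=500, days_since_last_seen=30),    # still within the year
]
extra = monthly_extra_cost(versions)
print(extra)
```

Only the 1000 GB that has been gone for more than a year counts, which reproduces the $6 figure from the example.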
I hope this helps, but please don't hesitate to email us still if you have questions. We are here to help.
I assume when asking such a question, you expect an honest answer like mine:
rclone is my favorite alternative. Supports encryption seamlessly, and loaded with features. Plus I can control exactly what gets synced/backed up, when it happens, and I pay for what I use (no unsustainable "unlimited" storage that always comes with annoying restrictions). There's never any surprises (which I experienced with nearly every backup solution). I use Backblaze B2 as the backend. I pay like $50 a month (which I know sounds high), but I have many terabytes of data up there that matters to me (it's a decade or more of my life and work, including long videos of holidays like Christmas with my kids throughout the years).
For super-important stuff I keep a tertiary backup on Glacier. I also have a full copy on an external harddrive, though those drives are not very reliable so I don't consider it part of the backup strategy, more a convenience for restoring large files quickly.
Now I need a new solution that will work for my parents
Just because files are in bespoke folders, does NOT mean they are being backed up.
Example: I'm 1,016% over my OneDrive limit because I canceled my Microsoft 365 account due to their price hike to cover for AI costs. My laptop still pushes files there upon save thanks to Microsoft defaults (my desktop was moved to CachyOS long ago).
If I had been using Backblaze for backup, those files would not have been backed up.
Luckily, I'm a nerd and I'm way ahead of this (I moved away from OneDrive long ago and never deleted the files). Most folks aren't.
Backblaze should be alerting users when stuff isn't backed up. I've strongly considered their B2 offering for a big project. The fact that they changed this without proper notification has made me decide NOT to move forward.
- - -
Hey, I tried restoring a file from my backup: downloading it directly didn't work, and creating a restore with it also failed. I got an email telling me to contact y'all about it.
Can you explain to me what happened here, and what can I do to get my file(s?) back?
- - -
Hi Jan,
Thanks for writing in!
I've reached out to our engineers regarding your restore, and I will get back to you as soon as I have an update. For now, I will keep the ticket open.
- - -
Hi Jan,
Regarding the file itself - it was deleted back in 2022, but unfortunately, the deletion never got recorded properly, which made it seem like the file still existed.
Thus, when you tried to restore it, the restoration failed, as the file doesn't actually exist anymore. In this case, it shouldn't have been shown in the first place.
For that, I do apologize. As compensation, we've granted you 3 monthly backup credits which will apply on your next renewal. Please let me know if you have any further questions.
- - -
That makes me even more confused to be honest - I’ve been paying for forever history since January 2022 according to my invoices?
Do you know how/when exactly it got deleted?
- - -
Hi Jan,
Unfortunately, we don't have that information available to us. Again, I do apologize.
- - -
I really don’t want to be rude, but that seems like a very serious issue to me and I’m not satisfied with that response.
If I’m paying for a forever backup, I expect it to be forever - and if some file got deleted even despite me paying for the “keep my file history forever” option, “oh whoops sorry our bad but we don’t have any more info” is really not a satisfactory answer.
I don’t hold it against _you_ personally, but I really need to know more about what happened here - if this file got randomly disappeared, how am I supposed to trust the reliability of anything else that’s supposed to be safely backed up?
- - -
Hi Jan,
I'll inquire with our engineers tomorrow when they're back in, and I'll update you as soon as I can. For now, I will keep the ticket open.
- - -
Appreciate that, thank you! It’s fine if the investigation takes longer, but I just want to get to the bottom of what happened here :)
- - -
Hi Jan,
Thanks for your patience.
According to our engineers and my management team:
With the way our program logs information, we don't have the specific information that explains exactly why the file was removed from the backup. Our more recent versions of the client, however, have vastly improved our consistency checks and introduced additional protections and audits to ensure complete reliability from an active backup.
Looking at your account, I do see that your backup is currently not active, so I recommend running the Backblaze installer over your current installation to repair it, and inherit your original backup state so that our updates can check your backup.
I do apologize, and I know it's not an ideal answer, but unfortunately, that is the extent of what we can tell you about what has happened.
- - -
I gave up escalating at this point and just decided these aren’t trusted anymore.
The files in question are four years old at this point, so it’s hard for me to state conclusively. I guess there might have been a perfect storm where that specific file was deleted because it was due to expire before I upgraded to “keep history forever”, but I don’t think it’s super likely, and I absolutely would expect them to have telemetry about that in any case.
If anyone from Backblaze stumbles upon it and wants to escalate/reinvestigate, the support ID is #1181161.
But the moment that hits normal users, yeah, mess
It is true that we recently updated how Backblaze Computer Backup handles cloud-synced folders. This decision was driven by a consistent set of technical issues we were seeing at scale, most of them driven by updates created by third-party sync tools, including unreliable backups and incomplete restores when backing up files managed by third-party sync providers.
To give a bit more context on the “why”: these cloud storage providers now rely heavily on OS-level frameworks to manage sync state. On Windows, for example, files are often represented as reparse points via the Cloud Files API. While they can appear local, they are still system-managed placeholders, which makes it difficult to reliably back them up as standard on-disk files.
Moreover, we built our product in a way to not backup reparse points for two reasons:
1. We wanted the backup client to be light on the system and only back up needed user-generated files.
2. We wanted the service to be unlimited, so following reparse points would lead to us backing up tons of data in the cloud.
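For readers who want to see what "placeholder" means at the attribute level, here is a hedged sketch of how a tool might classify a Windows file from its attribute bitmask. It is not how the Backblaze client works. The two `stat.FILE_ATTRIBUTE_*` constants come from Python's standard library, while the `RECALL_*` values are copied from the Windows SDK headers and should be verified before use.

```python
import os
import stat

# Values from the Windows SDK headers; treated as an assumption here.
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000


def classify_attributes(attrs):
    """Classify an attribute bitmask as 'placeholder', 'reparse-point' or 'local'."""
    recall = FILE_ATTRIBUTE_RECALL_ON_OPEN | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
    if attrs & (recall | stat.FILE_ATTRIBUTE_OFFLINE):
        return "placeholder"    # reading it would pull the contents from the cloud
    if attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
        return "reparse-point"  # system-managed, but possibly fully downloaded
    return "local"


def classify_path(path):
    """Classify a real file; only meaningful on Windows, where st_file_attributes exists."""
    st = os.lstat(path)
    return classify_attributes(getattr(st, "st_file_attributes", 0))
```

The middle case is the awkward one for a backup client: a reparse point can describe a file whose bytes are all on disk, so skipping every reparse point also skips data that could have been read without any download.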
We’ve made targeted investments where we can, for example, adding support for iCloud Drive by working within Apple’s model and supporting Google Drive, but extending that same level of support to third-party providers like Dropbox or OneDrive is more complex and not included in the current version.
We are currently exploring building an add-on that either follows reparse points or backs up the tagged data in another way.
We also hear you clearly on the communication gap. Both the sync providers and Backblaze should have been more proactive in notifying customers about a change with this level of impact. Please don't hesitate to reach out to me or our support team directly if you have any questions. https://help.backblaze.com/hc/en-us/requests/new
We are here to help.
Could you also provide an exhaustive list of items that are NOT being backed up, e.g. the .git folder? I can't find any reference to that anywhere on your website or in the app. What else is not being backed up? I know about the exclusion list in the app, which I have adjusted to suit my requirements, but you need to be clear, explicit and upfront about what you are not backing up. This is critical information.
My comment was pretty orthogonal to all the Backblaze stuff, which I realize now was confusing.
Doing a bait-and-switch on a percentage of your paying customers, no matter how small the percentage is, may be "viable" for the company, but it's a hostile experience for those users, and companies deserve to be called out for it.
And there speaks marketing.
So… Marketing has taken over, just as parent comment said. Got it.
I understand this, many others do too, the only difference seems to be that we're not willing to play those games. Others are, and that's OK, just giving my point of view which I know is shared by many others who are a bit stricter about where we host our backups. Instead of "statistical games" we prefer "upfront limitations", as one example.
It's a bit safer when you know your playbook - if there was unlimited (as it is now) and unlimited plus (where they backup "cloud storage cached files") and unlimited pro max premier (where they backup entire cloud storages) you'd at least know where you stand, and you'd change "holy shit my important file I thought was backed up isn't and now it's gone forever" to "I have to pay $10 more a month or take on this risk".
Nobody has turned the moon into a hard drive yet.
It's been cobbled together over the years to add things I want, like not backing up on battery, or sending a desktop message on success. When I set it up I couldn't figure out how to set up a timer, so it runs when I wake from suspend. I'd probably use a systemd timer in the future though.
I also should probably snapshot my file system before backing up since I'm running btrfs, but I never figured out how to do that either, and this works, lol.
https://forge.taf.codes/taf/snippets/src/branch/main/backups...
If it's not open-source, but the protocol is documented, see above.
If it's not open-source, and the protocol isn't documented, well... that makes the decision easy, doesn't it?
Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, in the website and the app, instead of burying it in an update note like weasels!
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors are another one, for similar reasons: they always end up being buggy and unpredictable.)
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
This is an instance of someone familiar with complex file access patterns not understanding the normal use case for these services.
The people using these bidirectional sync services want last writer wins behavior. The mild and moderately technical people I work with all get it and work with it. They know how to use the UI to look for old versions if someone accidentally overwrites their file.
Your characterization as complete chaos with constant problems does not mesh with the reality of the countless low-tech teams I've seen use Dropbox type services since they were launched.
You can build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
It works perfectly fine as long as you keep how it works in mind, and probably most importantly don't have multiple users working directly on the same file at once.
I've been using these systems for over a decade at this point and never had a problem. And if I ever do have one, my real backup solution has me covered.
> NOTE:
> iCloud's most recent update prevents Backblaze from backing up files that iCloud synced.
> To back up these files, download them to another local location where Backblaze can read them.
So which is it?
Isn't it obvious you need to let your users know there is a new constraint they don't expect?
Seems incredibly short-sighted to me.. The trust you build for years gets destroyed in seconds..
I'm not surprised that support was wrong, but I was somewhat surprised there was zero attempt at customer retention.
I'm using B2 as the backend, ironically, along with a Hetzner Storage Box. It just runs in the background, has decent defaults for "Don't backup useless crap" etc.
I'm still debating whether to get the single purchase version or pay $60 a year for 5 computers + 1TB of cloud storage.
[0] My home file server, migrating a four-disk mirrored-pairs ZFS array to RAID5 including replacing the smaller pair of disks with ones matching the larger pair, so the old ZFS filesystem had to be totally destroyed in the process and I needed somewhere to put the data for the like 15 minutes the logical disk wouldn't exist in any form. The alternative would have been to build an entire new four-disk array, doubling the disk cost of the project and requiring some kind of second host-machine. This approach saved me $400 or more, probably wouldn't have attempted it otherwise, cost would have been too high. Ended up costing somewhere in the tens of dollars as I recall.
Especially if they offer to restore all your data onto a drive and ship it to you, they pretty clearly should have enough information available to them to test restorations of data. And the number of times I've heard of that failure mode ("oh, we didn't track deletions well enough, so we only found out we deleted it when you tried restoring"), plus them saying they have made improvements to avoid this exact failure mode in newer client versions, makes me think they should have enough reports to investigate it.
...which makes me wonder if they did, and decided they would go bankrupt if they told people how much data they lost, so they decided to bet on people not trying restores on a lot of the lost data.
“Every file is only ever written to from a single client, and will be asynchronously made available to all other clients, and after some period of time has elapsed you can safely switch to always writing to the file from a different client”.
What do you use and how do you test / reconcile to make sure it’s not missing files? I find OneDrive extremely hard to deal with because the backup systems don’t seem to be 100% reliable.
I think there are a lot of solutions these days that err on the side of claiming success.
That being said, I understand how it works at a high level.
For context, this was driven by changes on the Dropbox side in how synced files are handled, which affects how reliably they can be backed up. But even with that, it should have been surfaced much more clearly.
Pricing tiers suck if your usage needs are at the bottom of a tier, or you need exactly one premium feature but not more. A la carte pricing is always at least a bit steep, since there's no minimum charge/bulk discount (consider a gym or museum's "day pass") so they have to charge you the full one-time costs every time in case that's your only time.
Base cost + extra per usage might be the best overall, but because nobody has solved micro transactions, the usage fees have to be pretty steep too. And frankly, everyone hates being metered - it means you have to think about pricing every time you go to use something.
Although I will say it's been nice to have them give more transparency around their actual soft cap numbers.
Once growth slows, churn eats much of the organic growth and you need to spend money on marketing.
In general this is a myth promoted by platforms with millions of users. The vast majority of such large platforms could easily afford that level of human support, they just actively choose not to give it. Backblaze - if they even have millions of users - belongs to the minority of such companies.
When a movie subscription says unlimited movies, we know they're not suggesting that they can break the laws of time, just that they won't turn you away from a screening. It's pretty normal language, used to communicate no additional limit, which is relevant when compared to cell phone data plans (which are actually, in my opinion, fraudulent) that shunt you to a lower tier after a certain amount of usage.
I do wish it was a word that had to be completely dropped from marketing/advertising.
For example there is not unlimited storage, hell the visible universe has a storage limit. There is not unlimited upload and download speed, and what if when you start using more space they started exponentially slowing the speed you could access the storage? Unlimited CPU time in processing your request? Unlimited execution slots to process your request? Unlimited queue size when processing your requests?
Hence everything turns into the mess of assumptions.
I doubt they have those pipes, at least if every one of their customers (or a sufficiently large number) actually made use of that.
Second question would be, how long they would allow you to utilize your broadband 24/7 at max capacity without canceling your subscription. Which leads back to the point the person I replied to was making: If you truly make use of what is promised, they cancel you. Hence it is not a faithful offer in the first place.
[1]: https://www.reddit.com/r/backblaze/comments/1cgy93n/i_did_a_...
No, they are using it to mean “backed up”. Like, “if this data gets deleted or is in any way lost locally, it’s still backed up remotely (even years later, when finally needed)”.
I’m astonished so many people here don’t know what a backup is! No wonder it’s easy for Backblaze to play them for fools.
Those $50 indeed sound high to me. I think I’d be fine depending on the Glacier backup, is that rclone compatible? What do you pay for it?
As for GUIs in general... Well, like I said, I just finished several years of bad experiences with some proprietary ones, and I wanted to see and choose what was really going on.
At this point, I don't think I'd ever want a GUI beyond a basic status-reporting widget. It's not like I need to regularly micromanage the folder-set, especially when nobody else is going to tweak it by surprise.
_____
[1] The downside to the dumb-store is a ransomware scenario, where the malware is smart enough to go delete my old snapshots using the same connection/credentials. Enforcing retention policies on the server side necessarily needs a smarter server. B2 might actually have something useful there, but I haven't dug into it.
I am not aware of any evidence supporting this.
[1]: https://www.reddit.com/r/backblaze/comments/1cgy93n/i_did_a_...
> git config --global gc.auto 1000
with the long option name, and no `=`.
Let's ride the lightning and see if it does anything.
Sorry to hear about your troubles. Hope your backup situation's sorted out?
Do you recall if you used a link like this to sign up?
https://www.rsync.net/signup/order.html?code=experts
If you don't, a good heuristic would be to see how much you pay per GB - if it's less than a cent, you probably did. The ones that come with support are typically a shade above a cent per GB.
In other words, a backup can be degraded into a sync-to-nothing situation if the client logic is untrustworthy.
Again, with regards to my other comments, I am not affiliated with rsync, just rclone is cool. You can use the same trick with any host with ssh and rclone installed.
I said that I want a solution that I don't have to think about. I'm happy to pay for not thinking about it. If that's not Backblaze, do you have any good suggestions?
Maintaining version history out to a set retention period is a backup...no?
:P
Storage was already a hairy beast with the original setup, and it would be much better if they had defined limits you could at least know about (and pay for).
...even nearly any frame of reference for anything storage related, much less gigabytes
Yes, indeed, most relevant in this case are probably "time" and "bandwidth" put together: even if you saturate the line for a month, they won't throttle you, so for all intents and purposes the "data cap" is unlimited (or, more precisely, there is no data cap).
I assume you don’t think that, so I’m curious, what would you propose positively?
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
When I backup my computer the .git folders are among the most important things on there. Most of my personal projects aren't pushed to github or anywhere else.
Fortunately I don't use Backblaze. I guess the moral is don't use a backup solution where the vendor has an incentive to exclude things.
It would be incredible if you started to look into S3 compatible object stores, unless you have made a business decision not to support it.
Thank You for providing an affordable option for self hosters.
> a copy of information held on a computer that is stored separately from the computer
there is nothing about _any_ versioning, or duration requirements or similar
To use your own words, I fear it's you who doesn't know what a backup is and assumes a lot of other additional (often preferable(1)) things are part of that term.
Which is a common problem, not just for the term backup.
There is a reason lawyers define technical terms in a precise, contract-specific way when making contracts.
Or just requirements engineering. Fail there and you might end up with a backup of all your company's important data that is susceptible to file-encrypting ransomware or similar.
---
(1): What often is preferable is also sometimes the thing you really don't want. Like sometimes keeping data around too long is outright illegal. Sometimes that also applies to older versions only. And sometimes just some short term backups are more than enough for your use case. The point here is the term backup can't mean what you imply it does because a lot of existing use cases are incompatible with it.
Not important here because backblaze only has to match the storage of your single device. Plus some extra versions but one year multiplied by upload speed is also a tractable amount.
The filesystem is a black box for this kind of software since it doesn't know where a file resides. If you want control, you need to talk with every party, incl. the cloud provider, a-la rclone style.
Well, for backups the workaround is a bit easier (as they strictly only ever read files), but still.
I mention it not to shill rsync.net, but to shill rclone, because when I discovered it I was even more impressed with it.
Obviously having to run a command and apply some amount of plumbing is different to a service just providing that API at the outset so the applicability for users will differ but still, rclone is very cool!
It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.
I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.
What can get things into a weird state is if both machines are editing the same file while only one of them is actively syncing. But for basic backup and sync, this is extremely rare.
No, I'm not joking. We used to allow arbitrary paths in a cloud API I owned. Within about a month someone had figured out that the cost to store a single byte file was effectively zero, and they could encode arbitrary files into the paths of those things. It wasn't too long before there was a library to do it on Github. We had to put limits on it because otherwise people would store their data in the path, not the file.
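To make the trick concrete, here is a small sketch of how data can be smuggled into the names of empty files. It is an illustration of the idea, not the library mentioned above: the `blob` prefix, the 200-character segment limit, and the zero-padded index layout are all assumptions made for this example.

```python
import base64

MAX_SEGMENT = 200  # assumed per-name length limit; real services differ


def encode_to_paths(data, prefix="blob"):
    """Return paths for empty files whose names carry the payload (prefix must not contain '/')."""
    text = base64.b32encode(data).decode("ascii").rstrip("=")
    chunks = [text[i:i + MAX_SEGMENT] for i in range(0, len(text), MAX_SEGMENT)] or [""]
    return [f"{prefix}/{index:08d}/{chunk}" for index, chunk in enumerate(chunks)]


def decode_from_paths(paths):
    """Rebuild the payload from the path names alone; file contents are never read."""
    ordered = sorted(paths)  # the zero-padded index keeps chunks in order
    text = "".join(p.split("/", 2)[2] for p in ordered)
    padding = "=" * (-len(text) % 8)  # restore the base32 padding stripped above
    return base64.b32decode(text + padding)


payload = b"hello backup" * 100
paths = encode_to_paths(payload)
restored = decode_from_paths(paths)
print(len(paths), restored == payload)
```

Every stored object here is zero bytes long, so billing by file size alone charges nothing for it, which is why a limit on path length ends up being necessary.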
If we remove the whole linux section and just ask "why not map a folder in Explorer" it's a reasonable question, probably even more reasonable in 2026 than in 2007. The network got faster and more reliable, and the dropbox access got slower.
I had to give up and delete plenty of data because of this. That data was important to me, but not important enough to pay their ransom.
The real issue is that everyone scrambles to make a sale, and nobody stops to determine if they should actually make that sale. Funny enough, I blame all of this on marketing and sales.
Residential network access is oversold like everything else.
The only difference with storage is there’s a theoretical maximum on how much a single person can use.
But you could just as well limit backup upload speed for similar effect. Having something about fair use in ToS is really not that different.
Of course, in countries where the internet isn't so developed as in other parts of the world, this might make sense, but modern countries don't tend to do that, at least in my experience.
I've used enough Claude coded applications that I wouldn't trust that with a backup, unless it had extensive tests along with it.
To have this thing you’re not supposed to need to worry about affect whether your files got backed up is exactly the problem here. The goal is to back up your files, whether they’re in the cloud or not.
I sympathize with Backblaze’s problem with their file change monitor, but then they should consider implementing connectors for OneDrive, Dropbox, etc. and back up files directly from the cloud.
Seems simple enough to do for Backblaze, no?
Yes, I didn't technically say that.
> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.
I'm not arguing either of those.
What I said is with "on demand file download", traditional backup software faces a hard problem. However, there are better ways to do that, primary candidate being rclone.
You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.
I'm currently backing up my cloud storages to a local TrueNAS installation. rclone automatically hash-checks everything and downloads the changed ones. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from cloud and push to Backblaze.
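The "hash-check everything and fetch only what changed" idea can be sketched in plain Python as below. This illustrates the general technique on two local directories, not rclone's actual implementation; the function names are invented for the example, and a real tool would compare against hashes reported by the remote instead of re-reading every file.

```python
import hashlib
import os
import shutil


def file_hash(path):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def manifest(root):
    """Map each relative file path under root to its content hash."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_hash(full)
    return result


def pull_changed(source, dest):
    """Copy files that are new or whose hash differs; return the relative paths copied."""
    src, dst = manifest(source), manifest(dest)
    changed = sorted(rel for rel, digest in src.items() if dst.get(rel) != digest)
    for rel in changed:
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(source, rel), target)
    return changed
```

Note that this sketch never deletes anything from `dest`. That is deliberate: propagating deletions is what turns a backup into a sync.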
Also, using RESTIC or Borg as a backup container is a good idea since they can deduplicate and/or only store the differences between the snapshots, saving tons of space in the process, plus encrypting things for good measure.
Interestingly, rclone supports that on many providers, but for Backblaze to support that, it would need to integrate rclone, connect to the providers via that channel and request checks, which is messy, complicated, and computationally expensive, even assuming you won't hit API rate limits on the cloud provider.
And no, my server isn't behind cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffic but let me handle everything else" (assuming that's even possible given that the usual flow is to put your entire domain behind them).
However, we do support interoperating with block storage, such as 's5cmd':
https://news.ycombinator.com/item?id=44248372
... and, of course, rclone, which you can invoke remotely, on our end to move data between cloud accounts, etc.
Feel free to use my reputation, instead: when I say a system is backed up, data cannot be lost by that system being destroyed, because an independent copy always exists. This satisfies those whom it concerns, who put their money where their mouth is, whereas your more generous but insufficient definition would absolutely not be good enough.
When you assure a client that a system is backed up, which definition do they expect from you?
Back in the late 1990s we could run a couple dozen 56k lines on a 1.544 Mbps backhaul. We could have those to the same extent today, but there’s still a ratio that works fine.
Not an issue in most languages, but I'm using bash, so it's more of a bother.
I think backing up the materialized files is appropriate. That’s what they (used to) promise.
Of course it is. So *you* don't have to know which cloud files are actually there. Doesn't mean backblaze can't know, and should work within that paradigm rather than not backup anything.
As for your (the user, not necessarily you personally) expectation that Backblaze would back up the stubs regardless of their stub status (I'm not sure that would really matter, as you said in your own comment), that's unreasonable: that Backblaze would traverse the stubs and... why? Temporarily download them and upload them to Backblaze? That's not what they ever stated would happen, and it is a big stretch to expect what amounts to the extra service of backing up cloud drives simply because a user decides to have what amounts to an 'ln <soft link>' to a network drive. They do explicitly exclude that.
What is not reasonable on their part is to change any service at all that had previously been happening, regardless of whether it was or was not within the ToS, and likely contract law wouldn't support a claim on their part that there was an implied contract through ambiguity which courts will typically resolve in favor of an injured party, especially one in a position of lesser power in the relationship. I'm not claiming that's what happened here, my reading of any ToS has the same legitimacy as this comment on that. I'm saying they do claim they'll back up whatever is on the computer and unexcluded, and it's wrong as a matter of basic provisioning of service to a customer of what was offered. That's the limit of my claim.
Use tools with straightforward, predictable semantics, like rclone, or Syncthing, or restic/Borg. (Deduplication rules, too.)
But generally speaking, I'd agree with your sentiment.
[0]: https://www.backblaze.com/computer-backup/docs/supported-bac...
[1]: https://www.backblaze.com/docs/cloud-storage-about-backblaze...
the one in the contract (and the various EU laws)
that is not a satisfying answer, I know
e.g. in some past projects the customers explicitly did _not_ want year long backups and outright forbade them; redundant storage systems + daily backups kept for ~1-2 weeks (I don't remember) were pretty close to the legal limit of what we were allowed to have for that project (1)
the point I'm making was never that a good general purpose backup solutions shouldn't have versioning and years of backups
it's that
1. the word backup just doesn't mean much, so you have to be very explicit about what is needed, and sometimes that is the opposite of the "generic best solution"
2. If data is explicitly handled by another backup solution, even if it's a very bad one, it's understandable that the default is not to handle it yourself. (Though only the default: you should always have an override option, be warned if defaults change, etc.).
Insisting that a word doesn't mean what most non-tech people use it to mean just isn't helpful at all. Telling them that this is a very bad form of backup which they probably shouldn't use is much more likely to be taken seriously.
---
(1): Side note: It's because all the data we had is backed up elsewhere, by a different solution, and can sometimes be a bit sensitive. So the customers preferred data loss (on our side, not on theirs) over any data being kept longer than needed (and as such there being more data exposed at any point in time if some hacker succeeds or similar). And from what I have heard that project is still around, working the same way.
But ironically that is similar to the case here, the data is owned/handled by a different system and as such we should not handle the backup.
The sync application itself can handle this using openat(2) or similar and should probably be using that regardless to avoid races.
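A minimal sketch of the openat(2) idea, assuming a Unix system. Python exposes openat through the `dir_fd` argument of `os.open`; the helper names `read_relative` and `demo` and the file layout are made up for illustration and not taken from any real sync client.

```python
import os
import tempfile


def read_relative(dir_fd: int, name: str) -> bytes:
    """Read `name` relative to an already-open directory descriptor.

    os.open(..., dir_fd=...) maps to openat(2), so the lookup is anchored to
    the directory we hold open, not to whatever its path points at right now.
    O_NOFOLLOW refuses a symlink swapped in as the final path component.
    """
    fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as root:
        sync_dir = os.path.join(root, "sync")
        os.mkdir(sync_dir)
        with open(os.path.join(sync_dir, "a.txt"), "wb") as f:
            f.write(b"hello")

        dir_fd = os.open(sync_dir, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Simulate a concurrent rename of the directory: a path-based
            # open of sync/a.txt would now fail, the descriptor still works.
            os.rename(sync_dir, os.path.join(root, "moved"))
            return read_relative(dir_fd, "a.txt")
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    print(demo())
```

The point is only that the directory is resolved once and every later lookup goes through that descriptor, which closes the window where a path component changes between check and use.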
Avoid arbitrary limits on the length or number of any data structure, including filenames, lines, files, and symbols, by allocating all data structures dynamically.
I assume they're relying on the OOM Killer and quotas to prevent DoSes all over the place. Hadn't even considered your obvious point, a good one!
Force businesses to only sell to qualified buyers and make it incredibly easy for businesses to qualify buyers at the lowest possible cost. The end result in my fantasy world is that a business ends up with some document, some self attestation, that they've educated the customer. The benefit that actually matters much more than the attestation no one should ever see would be the educated customer.
Clearly, this is way too much just for a little storage.
But I also would be curious to know if there's another customer qualification concept like this. (And I just thought of one. Scuba diving! Customer pays to learn, otherwise you're in deep (reputational) doodoo when they drown.)
Of course there are practical limits as you can't make your 100Mb/s connection into a gigabit one (ignoring that you can buy burstable in a datacenter, etc, etc).
Where unlimited falls down is when it refers to an endlessly consumable resource, like storage.
My parents have gotten hit by this. Dad was downloading huge video files at one point on his WiFi and his ISP silently throttled him.
A common term is "data cap": https://en.wikipedia.org/wiki/Data_cap
I can audit and verify Claude's output. Code running at BackBlaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.
Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.
If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.
It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.
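For a rough sense of scale, here is the arithmetic behind "a terabyte in a couple of weeks". It is a back-of-the-envelope sketch assuming a decimal terabyte (10^12 bytes), a constant rate, and no protocol overhead; `required_mbps` is a hypothetical helper name.

```python
def required_mbps(total_bytes: float, days: float) -> float:
    """Sustained upload rate in Mbit/s needed to move total_bytes in `days`."""
    bits = total_bytes * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6


if __name__ == "__main__":
    # 1 TB spread evenly over 14 days
    print(f"{required_mbps(1e12, 14):.1f} Mbit/s")
```

Under those assumptions it comes out to a bit under 7 Mbit/s sustained, which is why the upload link, not the cloud drive, tends to be the bottleneck.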
Windows has a much harsher approach to file locking than Linux and backup software like BackBlaze absolutely should be making use of it (lest they back up files that are being modified while they back them up), but that also means that the software effectively has to ask the OS each time to lock the file, then release the lock when the software is done with it. With a large number of files, that does stack up.
Linux file locking is, to put it mildly, deficient. Most software doesn't even bother acquiring locks in the first place. Piling further onto that, basically nobody actually uses POSIX locks because the API has some very heavy footguns (most notably, every lock a process holds on a file is released whenever any close() for that file is called, even if another component of the same process still holds a second lock through a different descriptor). Most Linux file locks instead work on the honor system; you create a file called filename.lock in the same directory as the file you're working on, and then any software that detects that the filename.lock file exists should stay away from the file.
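A minimal sketch of that honor-system convention, assuming a local Unix filesystem. The only real trick is creating the lock file with `O_CREAT | O_EXCL` so that "check and create" is one atomic step; `try_acquire`, `release` and the `data.db.lock` name are illustrative, not from any particular program.

```python
import os
import tempfile


def try_acquire(lock_path: str) -> bool:
    """Atomically create the lock file; False means someone else holds it."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner, purely for debugging
    return True


def release(lock_path: str) -> None:
    os.remove(lock_path)


def demo() -> list:
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "data.db.lock")
        first = try_acquire(lock)    # nobody holds it yet
        second = try_acquire(lock)   # already taken
        release(lock)
        third = try_acquire(lock)    # free again
        release(lock)
        return [first, second, third]


if __name__ == "__main__":
    print(demo())
```

Nothing stops a program that never looks for the lock file, and a crashed owner leaves a stale lock behind, which is exactly why it's an honor system.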
Nobody using file locks is probably the bigger reason why Linux chokes less on fast iteration than Windows, given that Windows is slow with loads of files even when you aren't running a virus scanner.
This. It's best to do this in an atomic operation, such as a VSS-style snapshot, which is consistent and is taken while operations on the files are either absent or paused. Something like a zip is generally better because it takes less time on the file system than the upload process typically takes.
> not object storage
Happy to email you, if that's better, but is this because of unsustainable competition in the space or the tremendous volatility in consumption that object storage customers bring to the table?
I ask because in this current market, I would imagine investing in storage infrastructure is painful, but then I wonder, you are still in the storage infrastructure space anyways, so it likely has to do with the user behavior or user expectations or both.
That sort of horrible abuse only happens in areas where some provider has strict monopoly, but that’s an aberration and with Starlink’s availability there’s an upper bound nowadays.
What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.
Sometimes modification time of a file which is not downloaded on computer A, but modified by computer B is not reflected immediately to computer A.
Hence, backup software running on computer A will think that the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or protect the mtime of the file for their own reasons. They are rare, but they're there.
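To make the failure mode concrete, here is a small sketch of a change detector that trusts size and mtime only. This is an assumed, simplified scheme for illustration (the `fingerprint` helper is hypothetical), not how Backblaze or any specific product is known to work.

```python
import os
import tempfile


def fingerprint(path: str) -> tuple:
    """Cheap change detector: (size, mtime in ns). No content hashing."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "note.txt")
        with open(path, "w") as f:
            f.write("aaaa")
        before = fingerprint(path)
        old = os.stat(path)

        # An application rewrites the file with same-length content and then
        # restores the previous timestamps.
        with open(path, "w") as f:
            f.write("bbbb")
        os.utime(path, ns=(old.st_atime_ns, old.st_mtime_ns))
        missed = fingerprint(path) == before   # change is invisible

        # If the size differs, the same trick is still caught.
        with open(path, "w") as f:
            f.write("ccccc")
        os.utime(path, ns=(old.st_atime_ns, old.st_mtime_ns))
        caught = fingerprint(path) != before

        return missed, caught


if __name__ == "__main__":
    print(demo())
```

Hashing contents would catch the first case too, but for an online-only file that means downloading it, which is the cost being argued about.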
Since the introduction of flock on Linux, how bad is it really though? I don't see why one would need kludges like filename.lock. Though of course flock is still an "honor system" as you put it.
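For comparison, a sketch of what flock gives you, assuming Linux (or a BSD/macOS) and a local filesystem; network filesystems may behave differently. The lock belongs to the open file description, so closing some unrelated descriptor for the same file does not drop it, unlike POSIX fcntl locks. `flock_demo` is just an illustrative name.

```python
import fcntl
import tempfile


def flock_demo() -> tuple:
    with tempfile.NamedTemporaryFile() as tmp:
        a = open(tmp.name, "rb")
        b = open(tmp.name, "rb")   # separate open() = separate lock owner
        try:
            fcntl.flock(a, fcntl.LOCK_EX | fcntl.LOCK_NB)

            # A second open of the same file cannot take the lock.
            try:
                fcntl.flock(b, fcntl.LOCK_EX | fcntl.LOCK_NB)
                b_blocked = False
            except BlockingIOError:
                b_blocked = True

            # Opening and closing yet another descriptor leaves a's lock alone.
            c = open(tmp.name, "rb")
            c.close()
            try:
                fcntl.flock(b, fcntl.LOCK_EX | fcntl.LOCK_NB)
                still_held = False
            except BlockingIOError:
                still_held = True

            # After an explicit unlock, b can take the lock.
            fcntl.flock(a, fcntl.LOCK_UN)
            fcntl.flock(b, fcntl.LOCK_EX | fcntl.LOCK_NB)
            b_after_unlock = True

            return b_blocked, still_held, b_after_unlock
        finally:
            a.close()
            b.close()


if __name__ == "__main__":
    print(flock_demo())
```

It is still advisory: a process that never calls flock can read or write the file freely.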
We want to live in a world of UNIX filesystems and we want those to be available in the modern "cloud" ecosystem.
Reason - to not overcomplicate or give appearance of nickel-and-diming
Point taken, although I still think it's better for cloud storage services to err on the side of compatibility, i.e. what's the lowest common denominator between Linux, macOS, Android, iOS from 10 years ago and Windows 7?
Wow, I knew that was generally true, didn't know it was true for internet access in the US too, how backwards...
> A common term is "data cap": https://en.wikipedia.org/wiki/Data_cap
I think most are familiar with throttling because most (all?) phone plans have some data cap at one point, but I don't think I've heard of any broadband connections here with data caps, that wouldn't make any sense.
What I want is restores. The ability to restore anything from, ideally, any point back in time.
How that is achieved is not my concern.
Obviously Backblaze does not achieve that, today.
You're forgetting the third option:
You can remain blissfully unaware of it.
> more ISPs need to improve upload.
I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.
Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.
I've seen it with my new fiber rollout - every single customer no matter their purchased speed had 1Gb up and down - as more customers came online and usage became higher, they're not limiting anyone, but you get closer to your advertised rate - but my upload is still faster than my download because most of my neighborhood is downloading, few are uploading.
You're dodging the question. Wanting to ignore the side effects does not mean they won't affect you.
And you can read many accounts of the outcome of that strategy in this very thread.
My parents have 5G wireless home as their primary connection, and that was only introduced in their area a couple of years ago. Before that, they could get dial-up, 512 kbps wireless with about a $1000 startup cost, ISDN (although the phone company really didn’t want to sell it to them), Starlink, or HughesNet. The folks across the asphalt road from them had 20 Mbps Ethernet over power lines years ago, and that’s now I think 250 Mbps. It’s a different power company, though, so they aren’t eligible.
Around 80% of the US population lives in large urban areas. The other 20% of the population range from smaller towns to living many kilometers from any town at all. There’s a lot of land in the US.
In the last panel, Charlie Brown tells him, "You have to move your feet, too."
Keeping recent files will work fine with a program that goes through them as fast as it can upload (which is not super fast).
> the technical limits of the technology that I had at home (PON for the time being)
Isn't that usually symmetrical? Is yours not?
> When you open an online-only file from the Dropbox folder on your computer, it will automatically download and become available offline. This means you’ll need to have enough hard drive space for the file to download before you can open it. You can change it back to online-only by following the instructions below.
https://help.dropbox.com/sync/make-files-online-only
Same exact behavior for OneDrive, though it apparently does have a Windows integration to eventually migrate unused files back to online-only if enabled.
> When you open an online-only file, it downloads to your device and becomes a locally available file. You can open a locally available file anytime, even without Internet access. If you need more space, you can change the file back to online only. Just right-click the file and select "Free up space."
https://support.microsoft.com/en-us/office/save-disk-space-w...
I'm pretty sure one landlord was cut in by his ISP, as he skipped town when I tried to ask about getting fiber, and his office locked their door and drew their shades when I went there with a technician on two occasions. The final time, we got there before they opened and the woman ran into the office and slammed the door on us.
https://featureassets.gocomics.com/assets/5ee4a050f894013014...
Depends on your device capacity and how much of it is in actual use. Wear leveling also causes wear of its own as it moves data around.
> For something you'll need to do once or twice ever?
I don't know about you, but my cloud storage is living, and even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. The theoretical limit is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
Of course I'm not modifying 4TB on a cloud drive, every day.
Would there be any engineering/management pushback on the customer side? "we have to write a tiny script", "this is non-standard" / "why are you the only ones who charge us for filenames?"
(have limited knowledge here)
But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them.
> if the software can't smartly ignore files, it'll
Backblaze checks the modification date.
> GPON (Gigabit PON) is asymmetric. The theoretical limit is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you?
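Spelling out the ratios from the quoted numbers, as plain arithmetic (the `down_up_ratio` helper is only for illustration):

```python
def down_up_ratio(down_mbps: float, up_mbps: float) -> float:
    """Download-to-upload ratio for a given plan or link."""
    return down_mbps / up_mbps


if __name__ == "__main__":
    # GPON theoretical limits quoted above: 2.4 Gbps down, 1.2 Gbps up
    print(f"GPON line:  {down_up_ratio(2400, 1200):.1f}:1")
    # The home plan quoted above: 1000 Mbit down, 75 Mbit up
    print(f"Home plan:  {down_up_ratio(1000, 75):.1f}:1")
```

So the line itself is 2:1, while the plan as sold is a bit over 13:1, i.e. past the 10:1 mark.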