That company had the power to destroy our businesses, cripple travel and medicine and our courts, and delay daily work that could include some timely and critical tasks.
Unless you have the ability and capacity to develop your own ISA/CPU architecture, firmware, OS, and every tool you use from the ground up, you will always be, at some point, “relying on others stuff” which can break on you at a moments notice.
That could be Intel, or Microsoft, or OpenSSH, or CrowdStrike^0. Very, very, very few organizations can exist in the modern computing world without relying on others code/hardware (with the main two that could that come to mind outside smaller embedded systems being IBM and Apple).
I do wish that consumers had held Microsoft more to account over the last few decades to properly use the Intel Protection Rings (if the CrowdStrike driver were able to run in Ring 1, then it’s possible the OS could have isolated it and prevented a BSOD, but instead it runs in Ring 0 with the kernel and has access to damage anything and everything) — but that horse appears to be long out of the gate (enough so that X86S proposes only having Ring 0 and Ring 3 for future processors).
But back to my basic thesis: saying “it’s your fault for relying on other peoples code” is unhelpful and overly reductive, as in the modern day it’s virtually impossible to do so. Even fully auditing your stacks is prohibitive. There is a good argument to be made about not living in a compute monoculture^1; and lots of good arguments against ever using Windows^2 (especially in the cloud) — but those aren’t the arguments you’re making. Saying “this is your fault for relying on other peoples stuff” is unhelpful — and I somehow doubt you designed your own ISA, CPU architecture, firmware, OS, network stack, and application code to post your comment.
——-
^0 — Indeed, all four of these organizations/projects have let us down like this; Intel with Spectre/Meltdown, Microsoft with the 28 day 32-bit Windows reboot bug, and OpenSSH just announced regreSSHion.
^1 — My organization was hit by the Falcon Sensor outage — our app tier layers running on Linux and developer machines running on macOS were unaffected, but our DBMS is still a legacy MS SQL box, so the outage hammered our stack pretty badly. We’ve fortunately been well funded to remove our dependency on MS SQL (and Windows in general), but that’s a multi-year effort that won’t pay off for some time yet.
^2 — my Windows hate is well documented elsewhere.
…until the CrowdStrike agent updated, and you wind up dead in the water again.
The whole point of CrowdStrike is to be able to detect and prevent security vulnerabilities, including zero-days. As such, they can release updates multiple times per day. Rebooting in a known-safe state is great, but unless you follow that up with disabling the agent from redownloading the sensor configuration update again, you’re just going to wing up in a BSOD loop.
A better architectural solution like would have been to have Windows drivers run in Ring 1, giving the kernel the ability to isolate those that are misbehaving. But that risks a small decrease in performance, and Microsoft didn’t want that, so we’re stuck with a Ring 0/Ring 3 only architecture in Windows that can cause issues like this.