Massive Windows/CrowdStrike Fail

This is the fault of...

  • Micro$oft, because they write software where you need stuff like this

    Votes: 3 13.0%
  • CrowdStrike, because their code broke the world

    Votes: 11 47.8%
  • Those who chose Windows over Linux, because that was the critical market decision

    Votes: 5 21.7%
  • Giant Death Robots, because they did it

    Votes: 4 17.4%

  • Total voters
    23

Samson

CrowdStrike code update bricking Windows machines around the world

The Register has found numerous accounts of Windows 10 PCs crashing, displaying the Blue Screen of Death, then being unable to reboot.

“We're seeing BSOD Org wide that are being caused by csagent.sys, and it's taking down critical services. I'll open a ticket, but this is a big deal,” wrote one user.

Forums report that CrowdStrike has issued an advisory with a URL that includes the text "Tech-Alert-Windows-crashes-related-to-Falcon-Sensor-2024-07-19" – but it's behind a regwall that only customers can access.

An apparent screenshot of that article reads "CrowdStrike is aware of reports of crashes on Windows hosts related to the Falcon Sensor. Symptoms include hosts experiencing a bugcheck\blue screen error related to the Falcon Sensor."

CrowdStrike's engineers are working on the issue.

Falcon Sensor is an agent that CrowdStrike claims "blocks attacks on your systems while capturing and recording activity as it happens to detect threats fast."

Right now, however, the sensor appears to be the threat.

If you are affected by this, especially if it is your job to sort out people's IT problems, you have my sympathy. I urge you to use this as a learning experience about computer security. Open source code is secure code. This is a comment:

There is supposedly a fix that involves booting affected computers in safe mode, and deleting/renaming a CrowdStrike file in System32. Which is great if all your workstations/servers are remote and the workstations all have BitLocker. And the BitLocker keys are all on a server that's affected....
 
While not confirmed as related, several major US airlines have put their planes on ground stop due to communications issues.

So they actually think there may be two major issues in one day? That is mad.

There are reports that some big organisations have been scammed by these people. It would be funny if it is the Murdoch empire that was the worst at IT, as a lot of his orgs are mentioned. He is quite anti Billy Gates, isn't he?

Banks, airlines and media outlets hit by global outage linked to Windows PCs

Businesses including banks, airlines, telecommunications companies, TV and radio broadcasters, and supermarkets have been taken offline

Australian broadcasters the Australian Broadcasting Corporation and Sky News confirmed they were having broadcast difficulties as a result.

Sky News in the United Kingdom reported being off air on Friday morning.
 
Yeah, given the amount of open source projects that have been hijacked and replaced with malware, I'm not entirely sure this is a truism.
It is not a truism, and there is plenty of bad OS code out there. But considering the amount of the web that is run on a Linux stack compared to Micro$oft, and how frequently these problems affect Linux compared to Micro$oft, I think it is a statistical fact.

8fzgIwX.png

Spoiler (unrelated, but interesting):
By accident is not on the list, but that seems to be the reason in this case.
bvTFB25.png
 
The Beeb are saying "The cause is not known". I wonder if that is them being cautious, or have El Reg jumped the gun?

I think it is pretty embarrassing for some of these to be relying on this:

Alaska State Troopers report 911 outage

Amsterdam's Schiphol, Stansted airport, Narita airport, Delhi airport, Christchurch International Airport, Sydney Airport all reporting problems. Just think who you are putting your faith in when you fly.
 
It is not a truism, and there is plenty of bad OS code out there. But considering the amount of the web that is run on a Linux stack compared to Micro$oft, and how frequently these problems affect Linux compared to Micro$oft, I think it is a statistical fact.
How familiar are you with problems that affect UNIX stacks? How many critical OpenSSL exploits have needed to be patched? More than you might realise. And that's just a single (albeit pretty critical) dependency.

We had a system brought down last month by a Linux OS upgrade doing something to the provided Perl version that broke our environment variables (pattern matching behaviour changed which meant variables weren't being populated properly).

I could go on, but this is why we have a DevOps team - and stuff still gets through to production and affects our customers.

Does the interconnected nature of SaaS make attack vectors more prevalent? Yes. But that affects the UNIX ecosystem as well (which is why I raised OpenSSL, and I didn't even start on the top-level cert issues in recent years).

And this is before I start on the tension between the dev teams at an institution whose job it is to be on top of these issues, and management which normally doesn't give two hoots about either of those teams. Or the resource (not) given to them. Unless there's a problem of course :D

There's plenty to criticise about MS, Google, all the big companies whose bottom line motivation is about their bottom line. Money money money. I could vent about that as much as you do!

For example, Google are apparently sunsetting their URL shortener! Dumb decision made to help somebody put numbers on a spreadsheet.

1721376450450.png


Should we therefore be aware that companies can do this? Yes, absolutely. But at the same time the fact that companies can do this is also staggering. That tangent gets political very fast though, hah.

I think we've had this convo before, but a lot of my work (when it isn't helping out DevOps or security because I'm a bit of an all-purpose developer where I work) involves using open source frameworks and products. They introduce so many issues. We're fundamentally reliant on OpenJDK. What happens if that bites the bullet? Something as bad as this current CrowdStrike outage, or worse.
 
Before I get to the details I will note none of them touch on my point. I said "I think it is a statistical fact". What I meant was that Linux has at least 76.6% (Nginx, Apache and Litespeed) of the web and Micro$oft has 4.5% (I know nothing about what Cloudflare hosts on). How many widespread system failures are the fault of the OSS supply chain and how many the fault of the closed source supply chain? I cannot find good numbers, but from the sample of the wiki examples there are loads of Windows ones that caused serious problems; the only obvious Linux one is the XZ Utils backdoor, which could have been bad but was caught. I think that shows that MS products are less secure in the real world than OS products.
How familiar are you with problems that affect UNIX stacks? How many critical OpenSSL exploits have needed to be patched? More than you might realise. And that's just a single (albeit pretty critical) dependency.
Years ago I was fairly familiar. I have never had to do it on an MS box, but I suspect that would be much harder.
Does the interconnected nature of SaaS make attack vectors more prevalent? Yes. But that affects the UNIX ecosystem as well (which is why I raised OpenSSL, and I didn't even start on the top-level cert issues in recent years).
The "cloud computing" versus self hosting question is quite different to the open or closed software question. The whole top level cert thing seems pretty broken, and I do not get why. Once I trust a web host, why should a hack on a third party degrade my security?
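What is being described here resembles "trust on first use" pinning, where the client remembers the certificate it first saw for a host instead of deferring to a third-party authority on every visit. A minimal sketch of that idea, assuming the certificate bytes are already in hand (for instance from Python's `ssl.SSLSocket.getpeercert(binary_form=True)`); the names `fingerprint`, `check_pin` and `pins` are hypothetical and only for illustration:

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate's raw (DER) bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def check_pin(pins: dict[str, str], host: str, cert_der: bytes) -> str:
    """Trust on first use: remember the first certificate seen for a host.

    `pins` is a hypothetical in-memory store mapping host -> fingerprint.
    """
    seen = fingerprint(cert_der)
    if host not in pins:
        pins[host] = seen  # first visit: record what we saw
        return "first-use"
    # later visits: compare against the remembered fingerprint
    return "match" if pins[host] == seen else "mismatch"
```

The trade-off is that a pin like this also reports "mismatch" when the host legitimately renews or rotates its certificate, which is part of why browsers lean on certificate authorities instead.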
For example, Google are apparently sunsetting their URL shortener! Dumb decision made to help somebody put numbers on a spreadsheet.
Yeah, I agree, but equally if you are a proper company and your systems were broken by this you have to ask why was the design decision made in the first place? What was the cost / benefit analysis there?
I think we've had this convo before, but a lot of my work (when it isn't helping out DevOps or security because I'm a bit of an all-purpose developer where I work) involves using open source frameworks and products. They introduce so many issues. We're fundamentally reliant on OpenJDK. What happens if that bites the bullet? Something as bad as this current CrowdStrike outage, or worse.
The whole point of OS is that OpenJDK cannot "bite the bullet" as long as there are some people who want it not to.

I am not sure how that would be like CrowdStrike, no one should be relying on a JDK in a production environment (I think) and no one would upgrade their JVM without testing. If you are doing that sort of thing you may have been affected by the Log4j thing? That was pretty bad, but did not cause any real world problems as far as I am aware.
 
Before I get to the details I will note none of them touch on my point. I said "I think it is a statistical fact". What I meant was that Linux has at least 76.6% (Nginx, Apache and Litespeed) of the web and Micro$oft has 4.5% (I know nothing about what Cloudflare hosts on). How many widespread system failures are the fault of the OSS supply chain and how many the fault of the closed source supply chain? I cannot find good numbers, but from the sample of the wiki examples there are loads of Windows ones that caused serious problems; the only obvious Linux one is the XZ Utils backdoor, which could have been bad but was caught. I think that shows that MS products are less secure in the real world than OS products.
Yeah that was something I meant to talk more about but completely forgot (was on my phone at the time). The choice of OS isn't the only source of vulnerability in a tech stack. And beyond that, the sheer number of UNIX systems in the cloud doesn't correlate with the criticality of those systems (for a related but different example: COBOL and how many banks and older institutions rely on it despite there being a handful of COBOL programmers left in existence).

In short, I don't think the statistics provide a cohesive view on the cause-and-effect, here.
Yeah, I agree, but equally if you are a proper company and your systems were broken by this you have to ask why was the design decision made in the first place? What was the cost / benefit analysis there?
Your options are: roll your own, or roll someone else's. When it comes to someone else's, your options boil down to cost (to be fair that's also the main reason for rolling / not rolling your own as well).

Everything comes down to profit. Software giants like Google exist not just because of their competence in (increasingly) specific regards, but also because of political deregulation, simply "being there first", etc, et al. If you're a one-person startup, you're going to use the goo.gl shortener. If you get bought out and your new company elects to not change what isn't broken (which is a common enough adage, right?), then you have a problem waiting to happen. And that's just one choice.

If you need to maintain your Java version, your Node.js version, your Linux version, your OpenSSL version, your Git version . . . and this is before we get onto software-for-the-business (HR, compliance, security, etc). How many things can anyone viably commit to rolling in-house? How many times must we re-invent the wheel, vs. being dependent on an outside dependency (whether that's OSS or SaaS)?
The whole point of OS is that OpenJDK cannot "bite the bullet" as long as there are some people who want it not to.

I am not sure how that would be like CrowdStrike, no one should be relying on a JDK in a production environment (I think) and no one would upgrade their JVM without testing. If you are doing that sort of thing you may have been affected by the Log4j thing? That was pretty bad, but did not cause any real world problems as far as I am aware.
Java is huge these days. You need more than "some" people. You need a well-rounded team with enough time on their hands to devote to the project, or it dies. Something I've seen happen time and time again, from small endeavours to bigger ones.

And yes, it won't be an immediate crash like CrowdStrike. CrowdStrike's influence over the world of IT is immediate, and will be rectified. If the OpenJDK project goes under and nobody can update their Java versions anymore, this will cause a much slower but much harder to fix set of problems. And to compound my point, these are the issues that are often ignored. Until it's too late.

(we avoided the Log4j issue, thankfully)
 
Yeah that was something I meant to talk more about but completely forgot (was on my phone at the time). The choice of OS isn't the only source of vulnerability in a tech stack. And beyond that, the sheer number of UNIX systems in the cloud doesn't correlate with the criticality of those systems (for a related but different example: COBOL and how many banks and older institutions rely on it despite there being a handful of COBOL programmers left in existence).

In short, I don't think the statistics provide a cohesive view on the cause-and-effect, here.
I am not sure how this argument goes. Is it that more of that 4.3% that IIS has is critical, so we notice it more? The difference would have to be so big to counter this effect that I am not really sure that is credible. I have not done the maths because I do not think the data collection justifies it, but we are talking about something like a risk ratio over 17 (76%/4.3%).
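For what it is worth, the back-of-envelope ratio can be checked in a couple of lines. This is only a sketch using the share figures quoted in this thread, and it assumes (purely for illustration) that a serious failure is equally likely per site on either stack:

```python
# Web-server shares quoted in this thread (percent of sites).
linux_stack_share = 76.6  # Nginx + Apache + LiteSpeed, as stated earlier
iis_share = 4.3           # IIS figure used in this post

# If serious failures were equally likely per site on both stacks, this is
# how many Linux-stack incidents we would expect for every IIS incident.
share_ratio = linux_stack_share / iis_share
print(f"share ratio: {share_ratio:.1f}")
```

So for the market shares alone to explain seeing more Windows incidents, the per-site failure rate (or the criticality of the affected sites) would have to differ by roughly that factor.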
Your options are: roll your own, or roll someone else's.
We are talking about a link shortener. The obvious answer is not to use one, I think they are a security risk anyway, you should never obfuscate the URI you are asking someone to visit.
Java is huge these days. You need more than "some" people. You need a well-rounded team with enough time on their hands to devote to the project, or it dies. Something I've seen happen time and time again, from small endeavours to bigger ones.

And yes, it won't be an immediate crash like CrowdStrike. CrowdStrike's influence over the world of IT is immediate, and will be rectified. If the OpenJDK project goes under and nobody can update their Java versions anymore, this will cause a much slower but much harder to fix set of problems.
There are many OS tools that are kept alive that have much less incentive than the OpenJDK team. I think that is one of the biggest pluses. If that did not exist there is always the closed source answer of Oracle, right? Is that better?
 
I am not sure how this argument goes. Is it that more of that 4.3% that IIS has is critical, so we notice it more? The difference would have to be so big to counter this effect that I am not really sure that is credible. I have not done the maths because I do not think the data collection justifies it, but we are talking about something like a risk ratio over 17 (76%/4.3%).
Based on the reportedly-affected institutions, I would argue that yes, there appears to be more of an impact on more critical services (aviation, banking, etc). If the (presumably) UNIX server powering CFC got hit by something, the scope is decidedly minute (in the context of the Internet as a whole). The same goes for the servers of a games developer or even a publisher. An entire games publisher losing its online services for a day still doesn't compare to flights being grounded at a single airport, imo.
We are talking about a link shortener. The obvious answer is not to use one, I think they are a security risk anyway, you should never obfuscate the URI you are asking someone to visit.
Oh, I was talking generally, sorry. The URL shortener is one in a long list of things Google have deprecated or abandoned on a whim. I'm not here to defend Google or Microsoft, is part of the point.
There are many OS tools that are kept alive that have much less incentive than the OpenJDK team. I think that is one of the biggest pluses. If that did not exist there is always the closed source answer of Oracle, right? Is that better?
And there are many that die, or are abandoned, or silently go without updates. This is not me arguing against OSS. Or saying we should rely on Oracle. I was one of the people pushing OpenJDK prior to the legal deadline for Oracle compliance internally! :p

I'm just trying to inject what I think is a measured view given our interconnected and at-times hopelessly-dependent-on-IT modern world that we live in. Open source isn't always a viable choice, even when the people making the decisions are doing so with good motivations and with all knowledge around a particular problem.

I think we should use OSS where possible, but we also need to make sure that developers are trained internally on how to maintain and patch or even replace parts of our stack that are OSS (closed source products generally being inaccessible and reliant on 3rd party help, usually at a cost). Not all businesses invest in that kind of thing. Where I work in particular it's intensely dependent on what kind of developer you are, for good and for bad. This is a management problem, but also a company culture problem, and these things can only be fixed from the top-down. Whether we use a Microsoft product or its OSS equivalent doesn't matter much because the culture problem exists regardless. Using SaaS just hides it to some extent (for a cost).

EDIT

But at the same time, there are things that we need to rely on SaaS for. Take the OAuth 2.0 (open) standard, for example. Everything I've encountered as an implementation of that has been a product. Owned by somebody. We have our own internal SAML (which is different to OAuth anyway), but it's purely for test purposes. It's not something our customers can use. Our customers are generally non-technical (education), and therefore they can't always have their own on-premise IdP - they'll buy externally. Everyone does (whether it's Okta, Microsoft Azure, Auth0 - also owned by Okta apparently - or whatever). The best OSS equivalent is Shibboleth, but you need the in-house expertise for that and that's not something universities necessarily have.
 
I'm just trying to inject what I think is a measured view given our interconnected and at-times hopelessly-dependent-on-IT modern world that we live in. Open source isn't always a viable choice, even when the people making the decisions are doing so with good motivations and with all knowledge around a particular problem.
This I guess is the biggest way where we differ. Certainly for an organisation providing services over the internet open source is always a viable choice, as long as it is made soon enough in the development cycle. I could be wrong, but I have not seen a use case where a Micro$oft stack is a better tool than a Linux stack.

A case in point, as I know a little about it, is the UK Biobank. This provides programmatic access to private medical data in a secure way. They do that with an Open Source stack and, I bet, less money than any airport spends on its IT. That is about the hardest problem I have seen solved, and clearly open source software is the right tool for the job. If you really believe there is a computational task for which a Micro$oft stack is a better tool than a Linux stack, I would like to see an example.
COBOL and how many banks and older institutions rely on it despite there being a handful of COBOL programmers left in existence).
I shall highlight that this means the banks, who are not short of the money if they wanted to build a proper system, are using both a 1950s programming language (so probably 1960s code) AND Windows in an environment where they are happy to give a third party read and write access to their systems. Does this sound like an organisation that has data security at the heart of their priorities? Remember what this is designed to do; they chose to have this running in the systems that control all our money:

> Falcon Sensor ... blocks attacks on your systems while capturing and recording activity as it happens to detect threats fast
 
This I guess is the biggest way where we differ. Certainly for an organisation providing services over the internet open source is always a viable choice, as long as it is made soon enough in the development cycle. I could be wrong, but I have not seen a use case where a Micro$oft stack is a better tool than a Linux stack.

A case in point, as I know a little about it, is the UK Biobank. This provides programmatic access to private medical data in a secure way. They do that with an Open Source stack and, I bet, less money than any airport spends on its IT. That is about the hardest problem I have seen solved, and clearly open source software is the right tool for the job. If you really believe there is a computational task for which a Micro$oft stack is a better tool than a Linux stack, I would like to see an example.
I don't know enough about the operating system fundamentals to claim that in any case Microsoft would be better (or worse). I know certain things have been a bit of a pain with the few Windows cloud machines I've had to deal with over the past 10 - 12 years or so, but generally-speaking these choices are made through a lack of available resource or expertise. Which is simply a matter of fact in the world. I've even dealt with Macs being used as cloud servers (generally not preferable either, even though they're UNIX-based).

I saw a statistic on social media where it seemed like someone was celebrating that Windows XP had a larger userbase than Vista, 7 or 8 (or maybe it was just Vista and 8). This to me isn't a good thing. Windows XP is horrifically ancient and compromised on pretty much every level you can think of. It's not like we're comparing Windows 10 (which is still in active support until late 2025) vs. Windows 11. We're talking about a 23-year-old operating system on functional life support only because companies that can't afford to upgrade their systems from it (ironically) pay Microsoft for support contracts.

I understand the criticisms of the productisation (if that's a word) of stuff like Windows. Being forced to upgrade because the old versions aren't being supported anymore is a pain (but is also a thing in OSS - sometimes new versions bring new features, and old tech has to be decommissioned). It can also be (and at times is) exploited for profit. But the fact is many of these vulnerable systems are stuck with stuff like WinXP, with whatever Microsoft cloud offering over the top for some kind of additional protection. This cloud service is the kind of thing that gets compromised when things like this CrowdStrike nonsense happen. It's not Windows XP, that's riddled with holes. It's the thing they have over the top to hide the holes so that companies can keep going without having to modernise however many hundred workstations they have in dire need of upgrading.

Is it ideal - no. I don't think any of it is ideal. In an ideal world everything would be OSS and everybody would be motivated to make the right decisions to empower communal development of open source IT infrastructure. But that's effectively a fantasy, as much as to me it's the ideal.

So I compromise. I personally push for what we can in-house. I settle for OSS if that falls through, and if we absolutely need to pay for something, then we need to pay for something (that last decision is pretty much out of my hands, though). But the business' choice to use O365, the Microsoft apps dashboard, Teams and so on? I have literally zero influence on that. Nobody in my business unit has any influence on that. Nobody at the (child) company I work at does. It's all on the parent company. Who own many companies, and for that - logically - a vendor like Microsoft makes sense. Not all the companies they hold are developer-focused, or even that technical. But there is no OSS solution that will work at scale across all of the various products, offices and business units that operate globally under the parent company's logo. None whatsoever.

EDIT

Remember what this is designed to do, they chose to have this running in the systems that control all our money
And on this, we agree 100%.

But it's also an interesting case in critical systems again - the effort and investment required to overhaul banking systems in a way that doesn't impact customers would be obscene. Which would also impact on the bank's bottom line, and I have no doubt that's the real motivation. But in an ideal world, the impact on the customers would be the thing that keeps sub-optimal solutions in place - even if the entire tech stack was OSS.
 
I've even dealt with Macs being used as cloud servers (generally not preferable either, even though they're UNIX-based).
I think it is twenty years ago today that I was first responsible for an off site computer, and that was some little Mac running in the docklands. It ran Linux fine. That was without SSL, or cookies, or GDPR. Simpler times.
So I compromise. I personally push for what we can in-house. I settle for OSS if that falls through, and if we absolutely need to pay for something, then we need to pay for something (that last decision is pretty much out of my hands, though).
If I was not clear, I am not blaming you but the suits. I know it is rare a techy would have any say in that sort of high level decision.
I saw a statistic on social media where it seemed like someone was celebrating that Windows XP had a larger userbase than Vista, 7 or 8 (or maybe it was just Vista and 8). This to me isn't a good thing. Windows XP is horrifically ancient and compromised on pretty much every level you can think of. It's not like we're comparing Windows 10 (which is still in active support until late 2025) vs. Windows 11. We're talking about a 23-year-old operating system on functional life support only because companies that can't afford to upgrade their systems from it (ironically) pay Microsoft for support contracts.
It is the decision that got them there I am criticising, and those who made that. We have all had to deal with technical debt, and you have to do what you have to do. But do not forget who accrued that debt. Remember when XP came out Linux was 10 years old and I think had dominated the web server market by then.

I shall make my point explicit: You should use this fail as evidence in your consideration as to who to trust with your life and livelihood. I will agree these are difficult questions that sometimes involve choosing the least bad.
 
I shall make my point explicit: You should use this fail as evidence in your consideration as to who to trust with your life and livelihood. I will agree these are difficult questions that sometimes involve choosing the least bad.
On a related tangent, how do we as developers / technical architects solve these problems going forwards. Can we?
 
On a related tangent, how do we as developers / technical architects solve these problems going forwards. Can we?
I do not think we have the power. If we are asked to produce shiny feature packed products as cheaply as possible, even the best open source tool stack will not stop people cutting corners, and if some people do, everyone kind of needs to, at least in some markets. But no one should choose a closed source product when an open source product does the job. That should start with Twitter/Mastodon and Windows/Linux.

I think the only entity that can really change things is the state/voters. I am not exactly sure what is the right thing to do; the Cyber Resilience Act is something, but I do not agree with a lot of it, particularly making it illegal to retire. Making companies tortiously responsible for data leaks would be another, but it is a bit radical and I have not thought very much about it.
 
They call anything "AI" now, but there is a good chance this company was training an AI in America on the companies' network data, which could include your personal information if someone you deal with is down. This is the CrowdStrike Falcon Platform page:

aaet8o1.png
 
Sarah Connor is not a giant death robot.
 
Confirmation on fix:

Brody Nisbet, CrowdStrike's chief threat hunter, has confirmed the issue and on X posted the following:

There is a faulty channel file, so not quite an update. There is a workaround...
1. Boot Windows into Safe Mode or WRE.
2. Go to C:\Windows\System32\drivers\CrowdStrike
3. Locate and delete file matching "C-00000291*.sys"
4. Boot normally.

In a later post he wrote "That workaround won't help everyone though and I've no further actionable help to provide at the minute".
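For anyone scripting the quoted workaround rather than clicking through it, the steps could be sketched roughly as below. This is an illustration only, not CrowdStrike's tooling: the directory and file pattern are taken from the post above, while the function name `remove_channel_files` and the `dry_run` flag are hypothetical. It also assumes a Python interpreter is available on the affected machine, which in Safe Mode or the recovery environment it often will not be.

```python
from pathlib import Path

# Directory and pattern as given in the quoted workaround.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_channel_files(driver_dir: Path = DRIVER_DIR, dry_run: bool = True) -> list[Path]:
    """Return files matching PATTERN in driver_dir; delete them unless dry_run."""
    if not driver_dir.is_dir():
        return []  # directory absent, nothing to do
    matches = sorted(driver_dir.glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()  # step 3: delete the matching channel file
    return matches


if __name__ == "__main__":
    # Defaults to a dry run so nothing is deleted by accident.
    for path in remove_channel_files(dry_run=True):
        print(f"would delete: {path}")
```

Defaulting to a dry run is deliberate: it lists what would be removed so it can be checked against the advisory before anything is actually deleted.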

 
Is there something Joe Schmo user should be doing, not doing, to prevent this from impacting his personal computer?
 