We have seen social media frenzy this morning following a triple whammy of issues impacting Azure Virtual Machines (running Windows 10 and Server 2016) and Windows devices across hundreds of organisations where devices are rebooting to the Windows Recovery Screen issue on Windows 10 devices and Server running older versions.
19/7/24 11:00am: The impacts of the issue are still on-going although the root cause is known and CrowdStrike and working with Microsoft on getting a patch out…
19/7/24: 15:00: CrowdStrike have updated their sites to take accountability of the issue (Microsoft still helping) that has impacted devices due to a “bug” in their software update which caused the BSOD. They have pulled and fixed the update and are working with their customers to remediate the impact. Microsoft have also offered guidance on what can be done to reverse the issue – links to this below.
29/7/2024: 18.00: this is not a Microsoft problem (yet I imagine they will be blamed) but it affected millions of Windows systems… Read to the bottom to see why.
Summary
Since the early hours of the morning, several media companies, airlines, transport companies, tech companies, and schools / universities are reporting a Blue Screen (actually a safety recovery screen) issue Windows 10.
The issue is impacting Windows 10 devices that are using CrowdStrike Falcon agent – their flagship Extended Detect and Response (XDR) Security platform.
Impacted devices are crashing following this Falcon Client update and then getting stuck at the “Recovery” screen due to a critical system driver failure that is preventing the device from starting back up.
CrowdStrike and Microsoft are actively working on this to drive a permanent fix, workarounds are available which require manually preventing this service from starting on affected devices.
The issue is not known to be affecting devices running Windows 11 and Server 2019 and beyond.
What is CrowdStrike?
CrowdStrike, a cybersecurity firm based in the US, assists organisations in securing their IT environments, which encompasses all internet-connected resources.
Their mission is to “safeguard businesses from data breaches, ransomware, and cyberattacks” and they position themselves as having leading offerings that compete with other vendors including Microsoft themselves, SentinelOne, and Palo Alto Networks. Their client base is extensive and includes legal, banking, finance, travel firms, airlines, educational institutions, and retail customers.
A key offering from CrowdStrike is their Falcon XDR tool, touted on their website for delivering “real-time indicators of attack, hyper-accurate detection, and automated protection” against cybersecurity threats.
Root Cause
Information available from CrowdStrike and Microsoft state that the issue is caused by a “faulty” version of the csagent.sys file which is key system start-up file needed by CrowdStrike’s new sensors update for their Falcon Sensor agent. It is this file that has been responsible for the BSOD errors on Windows 11 and many servers running older Windows Server OS running in private and public data centres such as Microsoft Azure. .
George Kurtz, the CEO of the global cybersecurity firm CrowdStrike, stated that the issues were due to a “defect” in a “content update” for Microsoft Windows devices.
“The issue has been identified, isolated, and a fix has been deployed.” he said as he clarified that the problems did not impact operating systems other than Windows 10 and WIndows Server 2016 and older and also emphasized, “This is not a security incident or cyber-attack.”
Impact
- Windows 10 devices are primarily affected.
- Devices running Windows Server 2016 and older in Azure are also impacted if they run the CrowdStrike Falcon agent.
- Limited/less impact on devices running Windows 11 or Windows 2019 and later.
Note: Windows 10 enters end of support in October 2025.
Is there a fix?
Updated: 21/7/2024: Microsoft have updated their guidance and provided additional support for fixing these issues using managed devices via Intune. This can be found here.
The formal advice if this issue is affecting your organisation is to contact your CrowdStrike Support representative – CrowdStrike and Microsoft are actively working to address the issue both as a response to the issue and preventative to ensure more devices are not impacted.
Since the issue is known to be caused by the csagent.sys file, there are ways to manually prevent this file being loaded, allowing the device to load. There are a couple of ways to do this.
- Use Safe Mode and delete the affected file:
- Boot the device to Safe Mode
- Open Command Prompt and navigate to the CrowdStrike directory which should be C:\Windows\System32\drivers\CrowdStrike
- Locate and delete the file matching the pattern C-00000291.sys* – you can do this using the by using a wildcard dir C-00000291*.sys.
- Remove or rename the file.
- Use Registry Editor to block the CrowdStrike CSAgent service:
- Boot to Safe Mode
- Open Windows Registry Editor.
- Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CSAgent
- Change the Start value to 4 to disable the service.
Dan Card, of BCS, The Chartered Institute for IT and a cyber security expert said: “People should remain calm whilst organisations respond to this global issue. It’s affecting a very wide range of services from banks to stores to air travel.“
He also said that whilst the cause is now known, it is still causing worldwide issues and impacts on consumer services, banking, healthcare and travel and will take some time to remediate.
“Companies should make sure their IT teams are well supported as it will be a difficult and highly stressful weekend for them as they help customers of all kinds. People often forget the people that are running around fixing things.”
Updated: 21/7/2024: Microsoft have updated their guidance and provided additional support for fixing these issues using managed devices via Intune. This can be found here.
Conclusion
CrowdStrike has acknowledged the issue and is investigating the cause. Users can follow the above steps to resolve the recovery screen issues and boot their PCs normally.
Crowdstrike and Microsoft worked tirelessly to resolve this issue and prevent further widespread impact.
“The issue has been identified, isolated, and a fix has been deployed.” he said as he clarified that the problems did not impact operating systems other than Windows 10 and WIndows Server 2016 and older and also emphasized, “This is not a security incident or cyber-attack.”
Devices running Microsoft’s latest Operating Systems seem to be less impacted (though information still being collated).
How did Microsoft allow this to this happen?
How did this happen? Many people are asking why Microsoft are shifting blame to Crowdstrike (who have admitted fault) asking why and how did Microsoft allow this?
In short, it’s not their fault and there really wasn’t anything they could have done to prevent it…. Here’s why..
Many Security products such as XDR products made by Crowdstrike, Palo Alto, and even Microsoft’s own XDR product defender, are what is known as “kernel mode products” . Whilst this issue affected Windows the same “hiccup error with the update” could have equally of affected other OS such as MacOS and Linux since they are kernal extensions.. This means is they had made the same mistake on the updates for these OS’s the same product mess up would have occurred.
In an ideal world all applications and services would run in user mode rather than Kernel Mode, but with many security and AV products, these have a need (a legitimately one) to monitor at the lowest levels of the OS in order to detect attacks… This is not possible if running in user mode as the kernel is protected.
The Blue Recovery Screen (which was mistaken by most as the Blue Screen of Death (BSoD) which it actually was not is actually the Windows OS safety net.
As such, there is not much more Microsoft can do here. These are third party applications not managed or developed or controlled/updated by Microsoft. If Microsoft were to manually vet every update and change to an application, Microsoft would be classed as control hogs and the world will crucify them for it!
Microsoft cannot legally wall off its operating system in the same way Apple does because of an understanding it reached with the European Commission following a complaint. In 2009, Microsoft agreed it would give makers of security software the same level of access to Windows that Microsoft gets.
The outage is awful and has impacted so many organisation including crutiic services, but it’s also not fair IMO that Microsoft and Windows have been dragged through the dirt simply because it’s their OS that was impacted by the poor updates and issues another third party application caused.
It’s not the first time this had happened…to other OS’s
According report by Neowin, ” similar problems have been occurring for months without much awareness, despite the fact that many may view this as an isolated incident. Users of Debian and Rocky Linux also experienced significant disruptions as a result of CrowdStrike updates, raising serious concerns about the company”s software update and testing procedures. These occurrences highlight potential risks for customers who rely on their products daily.
In April, a CrowdStrike update caused all Debian Linux servers in a civic tech lab to crash simultaneously and refuse to boot. The update proved incompatible with the latest stable version of Debian, despite the specific Linux configuration being supposedly supported. The lab”s IT team discovered that removing CrowdStrike allowed the machines to boot and reported the incident. “
What this shows it the vital importance on update testing and deployment rings.