Microsoft is reviewing its options and looking to push for significant changes to its Windows security architecture in the aftermath of the major outage caused by a “faulty” CrowdStrike update a couple of weeks ago. The faulty update is thought to have affected around 8.5 million Windows devices, causing them to crash, reboot and enter recovery mode.
Microsoft has acknowledged the inherent ‘tradeoff’ that kernel-level cybersecurity solutions pose and confirmed the root cause of the global outage.
This has prompted Microsoft to reassess the level of control that third-party security vendors have over the deepest parts of its operating system, and it is considering limiting kernel-level access for these vendors.
“This incident shows clearly that Windows must prioritize change and innovation in the area of end-to-end resilience.” | John Cable, Microsoft
Time to bring control back?
John Cable, Microsoft’s VP of program management for Windows servicing and delivery, set out Microsoft’s viewpoint in a blog post titled “Windows resiliency: Best practices and the path forward.” In this post, he emphasised the need for “end-to-end resilience” and discussed potential changes Microsoft is reviewing that could mean restricting kernel access for third-party security vendors such as CrowdStrike.
The CrowdStrike update bug, which resulted in widespread system crashes, has clearly highlighted the risks associated with allowing third-party security apps and services to operate at the kernel level – a new approach is needed.
Privileged access, though advantageous for detecting threats, can result in disastrous failures if mishandled. Microsoft is investigating alternatives that reduce the need for kernel access, including VBS enclaves and the Azure Attestation service. Employing Zero Trust methodologies, these solutions aim to bolster security without incurring the dangers inherent in kernel-level operations.
Why do Microsoft let third parties access the kernel?
In short, they don’t have much choice (see below).
While Microsoft may be looking to further restrict access to its Windows kernel going forward, it has used this event to explain why it allows third-party antivirus and security vendors to access the “core of Windows” in the first place.
The kernel is the deepest layer of the Windows operating system. Kernel-level cybersecurity lets developers do more to protect machines, can perform better, and can be harder for threat actors to alter or disable.
Because a kernel-level cybersecurity solution loads as early as possible, it gives users (and companies) the most data and context when threats arise. It also ensures protection can kick in at the earliest stage of the operating system’s boot process, rather than waiting for the OS to load and then running as a normal system process.
The EU may prevent changes over anti-trust claims
Whilst this may seem like common sense (after all, why shouldn’t Microsoft be able to restrict access to ensure the stability of an operating system used by more than a billion users?), its push for change is likely to face resistance from both cybersecurity vendors and regulators.
Back in 2006, Microsoft tried to restrict kernel access around the release of Windows Vista, but was met with opposition from security vendors and regulators citing anti-competitive concerns. In contrast, Apple successfully managed to lock down kernel-level access in macOS in 2020. The market for Windows software is, of course, far larger than that for Apple’s macOS, and Windows is an open platform for developers to build upon, so any changes will need to be made in a way that keeps this possible without preventing developers’ software from doing what it is supposed to do!
Microsoft has attributed part of the CrowdStrike outage to the 2009 European Union antitrust agreement, which requires Microsoft to provide kernel-level access to third-party software vendors. Conversely, Apple started to phase out kernel extensions in macOS in 2020, encouraging software vendors to adopt the “system extension framework” due to its reliability and security advantages.
It is not the first time, and won’t be the last, that the EU has played the anti-trust card. Microsoft recently had to decouple Teams from Microsoft 365 in response to competitors such as Slack claiming that Microsoft had an unfair advantage. It has also faced claims over Internet Explorer and Edge.
Zero Trust Kernel Protection may be the way forward
The blog post indicates that Microsoft is not proposing a complete shutdown of access to the Windows kernel. Rather, it highlights alternatives like the newly introduced VBS enclaves, which offer an isolated computing environment that doesn’t necessitate kernel mode drivers for tamper resistance.
“These examples use modern Zero Trust approaches and show what can be done to encourage development practices that do not rely on kernel access…We will continue to develop these capabilities, harden our platform, and do even more to improve the resiliency of the Windows ecosystem, working openly and collaboratively with the broad security community vendors”.
John Cable | Microsoft Windows VP
Trade-off between “anti-compete” and stability
Microsoft acknowledges that the tradeoff with kernel-level cybersecurity products is that if one glitches, it can’t be easily fixed, noting in its blog that “all code operating at kernel level requires extensive validation because it cannot fail and restart like a normal user application.”
As such, vendors have to demonstrate strict quality and testing controls over their software. The CrowdStrike issue arose not from a new product but from “simply” a software patch by CrowdStrike that… well, went wrong.
Microsoft can’t vet every patch and every update released by its “trusted” ISVs/third parties, especially when it comes to security updates, which these security vendors need to roll out frequently.
“There is a tradeoff that security vendors must rationalise when it comes to kernel drivers. Since kernel drivers run at the most trusted level of Windows, where containment and recovery capabilities are by nature constrained, security vendors must carefully balance needs like visibility and tamper resistance with the risk of operating within kernel mode.” | Microsoft
Whatever happens, businesses still need to have backup and remediation processes in place
In response to the CrowdStrike incident, Microsoft deployed over 5,000 support engineers to aid affected organizations and provided continuous updates via the Windows release health dashboard. They rapidly developed recovery tools to assist companies in their recovery efforts, while emphasising the significance of business continuity planning, secure data backups, and the adoption of cloud-native strategies for managing Windows devices to bolster resilience against future incidents.
Further whitepapers and guidance will be released in the coming months, and I expect this will lead to Microsoft and its third-party vendors releasing more recovery tools and guidance.
Summary
In a technical paper published last week analysing the crash and why its impact was so huge, Microsoft confirmed “CrowdStrike’s analysis that this was a read-out-of-bounds memory safety error in the CrowdStrike developed CSagent.sys driver.”
Reviewing the security architecture and access to the kernel is definitely needed, but Microsoft’s approach and desire to prevent future issues with third-party glitches will likely bear the brunt of complaints from third-party security vendors and EU competition regulators.
Apple “seem” to have a much easier ride when it comes to doing what they want – they say “jump” and developers say “how high”. Microsoft repeatedly have to “please” regulators far more. However, the huge global impact of this incident may work in Microsoft’s favour, helping it bring in some control and governance in the name of system and business stability, which I am sure will get the backing of every person and organisation impacted.
One thing is certain: Microsoft won’t take this sitting down. It will work hard to continue protecting its OS, which runs on more than a billion devices and is used by almost all corporations, educational institutions and critical infrastructure. Change will happen!
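To make “read-out-of-bounds” a little more concrete: it means code reads past the end of a buffer, into memory it does not own. The sketch below is a minimal, hypothetical illustration in Python, not CrowdStrike’s or Microsoft’s code; the function names and the 20-field record size are assumptions chosen purely for the example. Python itself would raise an IndexError on a bad index, but kernel drivers are typically written in C, where nothing stops an unchecked read, so the explicit checks are the point of the example.

```python
from typing import Optional, Sequence

# Hypothetical: the number of fields the consuming code was written to expect.
EXPECTED_FIELD_COUNT = 20


def validate_record(fields: Sequence[int], expected: int = EXPECTED_FIELD_COUNT) -> bool:
    """Accept an update record only if it has exactly the number of fields the code expects."""
    return len(fields) == expected


def read_field(fields: Sequence[int], index: int) -> Optional[int]:
    """Return fields[index] only when index lies inside the buffer, else None.

    In C there is no automatic check: reading index 20 of a 20-element
    array reads past its end (a read-out-of-bounds). In kernel mode such
    a read can crash the whole machine rather than just one application.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None


if __name__ == "__main__":
    record = list(range(20))                 # a well-formed, 20-field record
    print(read_field(record, 19))            # last valid index -> 19
    print(read_field(record, 20))            # one past the end -> None (rejected)
    print(validate_record(list(range(21))))  # wrong field count -> False
```

The design choice here is simply “check before you read”: validating an update’s shape before it is used, and bounds-checking every access, turns a potential system crash into a rejected input.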