I was recently sent a pretty neat kernel-dump by my good friend Jared. I've always wanted to go into double faults, so let's get started! Thanks, Jared : )
Code:
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).
Arguments:
Arg1: [COLOR=#ff0000]0000000000000008[/COLOR], EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050033
Arg3: 00000000000406f8
Arg4: fffff800032aa875
In our case, the 1st argument is 8, which indicates that a double fault occurred. So, what is a double fault, and when and why does one occur?
A double fault occurs when an exception cannot be handled by its handler, or more precisely, when a second exception occurs while the CPU is still trying to call the handler for a previous one. In most cases, the processor simply handles the two exceptions one after the other. In some cases it cannot: for example, if a page fault occurs but the page fault handler itself is located in a not-present page, a second page fault is raised while delivering the first, and neither of them can be handled. This is known as a double fault! Double faults can also occur (as in this scenario) when the processor cannot properly service a pending interrupt.
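To make the "handled one after the other vs. double fault" distinction a bit more concrete, here is a minimal Python sketch of the pairing rules as I understand them from the double fault conditions table in the Intel SDM. The names `exception_class` and `outcome` are my own, made up for illustration, and the sketch deliberately ignores the case where the first exception is itself a double fault.

```python
# Exception classes by x86 vector number. The grouping is an assumption
# based on the Intel SDM's double fault conditions table; the names are
# illustrative only and do not come from any library.
CONTRIBUTORY = {0, 10, 11, 12, 13}   # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = {14}                    # #PF
DOUBLE_FAULT = 8                     # #DF


def exception_class(vector: int) -> str:
    """Classify a vector as page_fault, contributory, or benign."""
    if vector in PAGE_FAULT:
        return "page_fault"
    if vector in CONTRIBUTORY:
        return "contributory"
    return "benign"


def outcome(first: int, second: int) -> str:
    """What happens when `second` is raised while delivering `first`."""
    if first == DOUBLE_FAULT:
        raise ValueError("a fault while delivering #DF is out of scope here")
    a, b = exception_class(first), exception_class(second)
    if a == "contributory" and b == "contributory":
        return "double fault"
    if a == "page_fault" and b in ("contributory", "page_fault"):
        return "double fault"
    return "handled serially"


if __name__ == "__main__":
    # The page fault inside the page fault handler case from the text.
    print(outcome(14, 14))
    # A breakpoint (#BP, vector 3) followed by a page fault.
    print(outcome(3, 14))
```

The first call models the not-present handler page example above; the second is the ordinary case where both exceptions get handled in turn.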
Code:
4: kd> k
Child-SP RetAddr Call Site
fffff880`009b9de8 fffff800`0328b169 [COLOR=#ff0000]nt!KeBugCheckEx[/COLOR]
fffff880`009b9df0 fffff800`03289632 [COLOR=#0000cd]nt!KiBugCheckDispatch+0x69[/COLOR]
fffff880`009b9f30 fffff800`032aa875 [COLOR=#0000cd]nt!KiDoubleFaultAbort+0xb2[/COLOR] <- Uh oh, double fault!
fffff880`03dccfd0 fffff800`032909ba [COLOR=#4b0082]nt!KiIpiSendRequest+0x305[/COLOR] <- Processor #4 sent an inter-processor interrupt to interrupt another processor saying "Hey, we need to flush the TLB."
fffff880`03dcd090 fffff800`032ec198 [COLOR=#0000cd]nt!KeFlushMultipleRangeTb+0x22a[/COLOR] <- Flushing translation lookaside buffer, this is a multiprocessor job.
fffff880`03dcd160 fffff800`033935ea [COLOR=#ff8c00]nt! ?? ::FNODOBFM::`string'+0x204ce[/COLOR]
fffff880`03dcd350 fffff800`03394be7 [COLOR=#006400]nt!MiEmptyWorkingSet+0x24a[/COLOR] <- Removing as many pages as possible from the working set.
fffff880`03dcd400 fffff800`0372f371 [COLOR=#0000cd]nt!MiTrimAllSystemPagableMemory+0x218[/COLOR] <- Unmapping all pageable system memory.
fffff880`03dcd460 fffff800`0372f4cf [COLOR=#4b0082]nt!MmVerifierTrimMemory+0xf1[/COLOR]
fffff880`03dcd490 fffff800`0372fc24 [COLOR=#4b0082]nt!ViKeRaiseIrqlSanityChecks+0xcf[/COLOR] <- A sanity check is essentially verifier saying "Okay, what IRQL are we on and are we supposed to be here?"
fffff880`03dcd4d0 fffff880`018443f5 [COLOR=#4b0082]nt!VerifierKeAcquireSpinLockRaiseToDpc+0x54[/COLOR] <- IRST raising the IRQL to DISPATCH_LEVEL (2) and then acquiring a spin lock.
fffff880`03dcd530 fffff880`018222a2 [COLOR=#ff0000]iaStor+0x253f5[/COLOR] <- Intel Rapid Storage Technology
fffff880`03dcd560 fffff880`01871489 [COLOR=#ff0000]iaStor+0x32a2[/COLOR] <- Intel Rapid Storage Technology
Code:
4: kd> ub [COLOR=#4b0082]nt!KiIpiSendRequest+0x305[/COLOR]
nt!KiIpiSendRequest+0x2eb:
fffff800`032aa85b 5e pop rsi
fffff800`032aa85c 5d pop rbp
fffff800`032aa85d c3 ret
fffff800`032aa85e 8bc6 mov eax,esi
fffff800`032aa860 e9e2feffff jmp [COLOR=#ff0000]nt!KiIpiSendRequest+0x1d7 (fffff800`032aa747)[/COLOR]
fffff800`032aa865 0fb70db4892100 movzx ecx,word ptr [[COLOR=#0000cd]nt!KeActiveProcessors (fffff800`034c3220)[/COLOR]]
fffff800`032aa86c 0fb705af892100 movzx eax,word ptr [[COLOR=#0000cd]nt!KeActiveProcessors+0x2 (fffff800`034c3222)[/COLOR]]
fffff800`032aa873 8bfa mov edi,edx
By unassembling backwards from nt!KiIpiSendRequest+0x305, it looks like there's a check of the active processors, followed by the attempt to send the IPI.
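As a quick sanity check on the addresses in this disassembly, the arithmetic can be redone by hand. This is only a sketch in Python using values copied from the output above; the helper `rip_relative_target` is a name I made up. It shows that Arg4 of the bugcheck lands exactly at nt!KiIpiSendRequest+0x305, and that the first movzx really does read from nt!KeActiveProcessors.

```python
# Addresses copied from the `ub` output and the bugcheck arguments.
LABEL_ADDR = 0xFFFFF800032AA85B    # shown as nt!KiIpiSendRequest+0x2eb
LABEL_OFFSET = 0x2EB
ARG4 = 0xFFFFF800032AA875          # Arg4 of the bugcheck

# Start of the function = labelled address minus its offset.
func_start = LABEL_ADDR - LABEL_OFFSET
fault_offset = ARG4 - func_start


def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Target of a RIP-relative operand: next instruction + displacement."""
    return instr_addr + instr_len + disp32


# movzx ecx, word ptr [rip+disp32] is encoded as 0f b7 0d b4 89 21 00.
# The last four bytes are the little-endian signed displacement.
disp = int.from_bytes(bytes.fromhex("b4892100"), "little", signed=True)
target = rip_relative_target(0xFFFFF800032AA865, 7, disp)

print(hex(func_start), hex(fault_offset), hex(target))
```

If the offset comes out as 0x305 and the target matches the address WinDbg printed next to nt!KeActiveProcessors, the stack frame and the disassembly are describing the same spot.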
Code:
4: kd> !ipi
IPI State for Processor 0
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 1
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 2
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 3
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 4
TargetCount 0 PacketBarrier 0 IpiFrozen 0 [COLOR=#0000cd][Running][/COLOR]
IPI State for Processor 5
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 6
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
IPI State for Processor 7
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [COLOR=#ff0000][Frozen][/COLOR]
By running !ipi, we can check the inter-processor interrupt state of every processor on the box. Here, every processor except #4 is in a frozen state (idle), so our IPI is never going to be serviced. It will remain pending, and we're going to double fault.
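On a box with many more cores, eyeballing this output gets tedious. Below is a rough Python sketch that pulls the per-processor state out of saved !ipi text. It is just text parsing with a regex I wrote against the output above (with the forum colour tags stripped), not a debugger API, so treat the pattern as an assumption about the format.

```python
import re

# !ipi output as shown above, without the BBCode colour tags.
IPI_OUTPUT = """\
IPI State for Processor 0
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 1
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 2
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 3
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 4
TargetCount 0 PacketBarrier 0 IpiFrozen 0 [Running]
IPI State for Processor 5
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 6
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
IPI State for Processor 7
TargetCount 0 PacketBarrier 0 IpiFrozen 2 [Frozen]
"""

# One match per processor: its number and the bracketed state label.
PATTERN = re.compile(
    r"IPI State for Processor (\d+)\s+"
    r"TargetCount \d+\s+PacketBarrier \d+\s+IpiFrozen \d+\s+\[(\w+)\]"
)

states = {int(cpu): state for cpu, state in PATTERN.findall(IPI_OUTPUT)}
frozen = sorted(cpu for cpu, state in states.items() if state == "Frozen")
running = sorted(cpu for cpu, state in states.items() if state == "Running")

print("frozen:", frozen)
print("running:", running)
```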
Code:
4: kd> lmvm iastor
start end module name
fffff880`0181f000 fffff880`01bc3000 iaStor (no symbols)
Loaded symbol image file: iaStor.sys
Image path: \SystemRoot\system32\DRIVERS\iaStor.sys
Image name: iaStor.sys
Timestamp: Wed Feb 01 19:15:24 [COLOR=#ff0000]2012[/COLOR]
The IRST driver dates from early 2012, which is likely the problem: it is a notoriously problematic driver, and the older the version, the worse it tends to be. A newer version would likely solve it, but honestly, I usually recommend that a user safely removes this driver and replaces it with the standard MSFT driver if they aren't running a RAID setup. Kaspersky was also present on this system, and antivirus suites don't tend to play nice with IRST either.
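For anyone who wants to double-check that the iaStor frames on the stack really belong to this module, the ranges are easy to verify. The following Python sketch uses only numbers copied from the lmvm and k output above; the variable names are mine. It confirms that the return addresses fall inside the module's start/end range and parses the timestamp.

```python
from datetime import datetime

# Module range from `lmvm iastor`.
MODULE_START = 0xFFFFF8800181F000
MODULE_END = 0xFFFFF88001BC3000

# Return addresses from the iaStor-related frames in the `k` output.
RETURN_ADDRS = [0xFFFFF880018443F5, 0xFFFFF880018222A2, 0xFFFFF88001871489]

# Offset of each return address from the module base, if it is in range.
offsets = {
    addr: addr - MODULE_START
    for addr in RETURN_ADDRS
    if MODULE_START <= addr < MODULE_END
}

# Driver build time as printed by the debugger.
built = datetime.strptime("Wed Feb 01 19:15:24 2012", "%a %b %d %H:%M:%S %Y")

for addr, off in offsets.items():
    print(f"{addr:#x} -> iaStor+{off:#x}")
print("module size:", hex(MODULE_END - MODULE_START))
print("built:", built.date())
```

The first two offsets should match the iaStor+0x253f5 and iaStor+0x32a2 call sites shown in the stack.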
This post also shows how helpful Driver Verifier is. Without it in this specific scenario, we likely would have had no idea what was causing this, and might have interpreted it as a hardware problem.
Thanks for reading!