Double Fault

Patrick

I was recently sent a pretty neat kernel-dump by my good friend Jared. I've always wanted to go into double faults, so let's get started! Thanks, Jared : )

Code:
UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050033
Arg3: 00000000000406f8
Arg4: fffff800032aa875

In our case, the 1st argument is 8, which indicates a double fault. So what is a double fault, and when and why does one occur?
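
As a quick illustration of reading Arg1, here is a minimal Python sketch that maps the trap number to its x86 exception name. The vector numbers follow the standard x86 exception table; the `describe_trap` helper and the table name are made up for this example and are not part of any debugger.

```python
# Hypothetical helper: translate the trap number in Arg1 of bugcheck 0x7F
# into the name of the corresponding x86 exception vector.
X86_TRAPS = {
    0x00: "Divide Error (#DE)",
    0x01: "Debug (#DB)",
    0x02: "NMI",
    0x03: "Breakpoint (#BP)",
    0x04: "Overflow (#OF)",
    0x05: "BOUND Range Exceeded (#BR)",
    0x06: "Invalid Opcode (#UD)",
    0x07: "Device Not Available (#NM)",
    0x08: "Double Fault (#DF)",
    0x0A: "Invalid TSS (#TS)",
    0x0B: "Segment Not Present (#NP)",
    0x0C: "Stack-Segment Fault (#SS)",
    0x0D: "General Protection (#GP)",
    0x0E: "Page Fault (#PF)",
}


def describe_trap(arg1: int) -> str:
    """Return a readable name for the trap number, or a fallback string."""
    return X86_TRAPS.get(arg1, f"unknown or reserved vector {arg1:#x}")


# Arg1 from the dump above.
print(describe_trap(0x0000000000000008))
```

The bound trap mentioned in the bugcheck description is vector 5 in the same table.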

A double fault occurs when the CPU hits a second exception while it is still trying to call the handler for an earlier one, and the two cannot be dealt with one after the other. In most cases the processor can handle two such exceptions serially. In some cases it can't: for example, if a page fault occurs but the page fault handler itself sits in a not-present page, a second page fault is raised while delivering the first, and neither can be handled. This is known as a double fault! On Windows, Microsoft's documentation for this bugcheck lists a kernel stack overflow and faulty hardware as the two common causes. Double faults can also occur (like in this scenario) when the processor cannot properly service an interrupt that is pending.
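
To make the "serially or not" rule concrete, here is a rough Python sketch of how the processor decides, as I understand it from the Intel manuals: exceptions fall into benign, contributory and page fault classes, and only certain pairs escalate to a double fault. The function names are invented for the example, and newer vectors are left out to keep it short.

```python
# Exception classes, simplified from the Intel SDM's double-fault table.
CONTRIBUTORY = {0, 10, 11, 12, 13}  # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = {14}                   # #PF


def classify(vector: int) -> str:
    """Put an exception vector into one of the three classes."""
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page fault"
    return "benign"


def second_exception_outcome(first: int, second: int) -> str:
    """What happens when `second` is raised while delivering `first`."""
    a, b = classify(first), classify(second)
    if a == "contributory" and b == "contributory":
        return "double fault"
    if a == "page fault" and b in ("contributory", "page fault"):
        return "double fault"
    return "handled serially"


# The example from the text: a page fault while delivering a page fault.
print(second_exception_outcome(14, 14))
```

Note the asymmetry: a page fault raised while delivering a general protection fault is still handled serially, but the reverse order is a double fault.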

Code:
4: kd> k
Child-SP          RetAddr           Call Site
fffff880`009b9de8 fffff800`0328b169 nt!KeBugCheckEx
fffff880`009b9df0 fffff800`03289632 nt!KiBugCheckDispatch+0x69
fffff880`009b9f30 fffff800`032aa875 nt!KiDoubleFaultAbort+0xb2 <- Uh oh, double fault!
fffff880`03dccfd0 fffff800`032909ba nt!KiIpiSendRequest+0x305 <- Processor #4 sent an inter-processor interrupt to another processor saying "Hey, we need to flush the TLB."
fffff880`03dcd090 fffff800`032ec198 nt!KeFlushMultipleRangeTb+0x22a <- Flushing the translation lookaside buffer, this is a multiprocessor job.
fffff880`03dcd160 fffff800`033935ea nt! ?? ::FNODOBFM::`string'+0x204ce
fffff880`03dcd350 fffff800`03394be7 nt!MiEmptyWorkingSet+0x24a <- Removing as many pages as possible from the working set.
fffff880`03dcd400 fffff800`0372f371 nt!MiTrimAllSystemPagableMemory+0x218 <- Trimming all pageable system memory.
fffff880`03dcd460 fffff800`0372f4cf nt!MmVerifierTrimMemory+0xf1
fffff880`03dcd490 fffff800`0372fc24 nt!ViKeRaiseIrqlSanityChecks+0xcf  <- A sanity check is essentially verifier saying "Okay, what IRQL are we on and are we supposed to be here?"
fffff880`03dcd4d0 fffff880`018443f5 nt!VerifierKeAcquireSpinLockRaiseToDpc+0x54 <- IRST raising the IRQL to DISPATCH_LEVEL (2) and then acquiring a spin lock.
fffff880`03dcd530 fffff880`018222a2 iaStor+0x253f5 <- Intel Rapid Storage Technology
fffff880`03dcd560 fffff880`01871489 iaStor+0x32a2 <- Intel Rapid Storage Technology
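
One thing worth noticing in the stack above is the jump in Child-SP between nt!KiDoubleFaultAbort and nt!KiIpiSendRequest: to my knowledge the double fault handler runs on its own dedicated stack, so the trace really spans two stacks. Here is a small Python sketch that spots the switch from the Child-SP column. The helper names and the 0x10000 threshold are arbitrary choices for this example.

```python
# A few (Child-SP, Call Site) pairs copied from the `k` output above.
FRAMES = [
    ("fffff880`009b9f30", "nt!KiDoubleFaultAbort+0xb2"),
    ("fffff880`03dccfd0", "nt!KiIpiSendRequest+0x305"),
    ("fffff880`03dcd090", "nt!KeFlushMultipleRangeTb+0x22a"),
    ("fffff880`03dcd160", "nt! ?? ::FNODOBFM::`string'+0x204ce"),
]


def parse_sp(text: str) -> int:
    """Convert WinDbg's 64-bit notation (with a backtick) to an int."""
    return int(text.replace("`", ""), 16)


def frame_gaps(frames):
    """Distance in bytes between each frame's Child-SP and its caller's."""
    sps = [parse_sp(sp) for sp, _ in frames]
    return [
        (frames[i][1], frames[i + 1][1], sps[i + 1] - sps[i])
        for i in range(len(frames) - 1)
    ]


def stack_switches(frames, threshold=0x10000):
    """Gaps far larger than a normal frame suggest a different stack."""
    return [gap for gap in frame_gaps(frames) if abs(gap[2]) > threshold]


for callee, caller, gap in frame_gaps(FRAMES):
    print(f"{callee} -> {caller}: {gap:#x} bytes")
```

The ordinary frames are a couple of hundred bytes apart, while the gap under nt!KiDoubleFaultAbort is tens of megabytes.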

Code:
4: kd> ub nt!KiIpiSendRequest+0x305
nt!KiIpiSendRequest+0x2eb:
fffff800`032aa85b 5e              pop     rsi
fffff800`032aa85c 5d              pop     rbp
fffff800`032aa85d c3              ret
fffff800`032aa85e 8bc6            mov     eax,esi
fffff800`032aa860 e9e2feffff      jmp     nt!KiIpiSendRequest+0x1d7 (fffff800`032aa747)
fffff800`032aa865 0fb70db4892100  movzx   ecx,word ptr [nt!KeActiveProcessors (fffff800`034c3220)]
fffff800`032aa86c 0fb705af892100  movzx   eax,word ptr [nt!KeActiveProcessors+0x2 (fffff800`034c3222)]
fffff800`032aa873 8bfa            mov     edi,edx

By unassembling backwards from nt!KiIpiSendRequest+0x305, it looks like there's a check for active processors, and then the attempt to send the IPI.
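
If you want to double-check how the debugger arrived at nt!KeActiveProcessors, the two movzx instructions use RIP-relative addressing: the target is the address of the next instruction plus the 32-bit displacement stored in the instruction bytes. A minimal Python sketch, with a made-up helper name and covering only this one 7-byte encoding:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rip_relative_target(address: int, raw: bytes) -> int:
    """Resolve the memory operand of `movzx r32, word ptr [rip+disp32]`.

    Encoding: 0F B7 /r, where the ModRM byte has mod=00 and rm=101
    (RIP-relative), followed by a little-endian signed disp32.
    """
    if len(raw) != 7 or raw[:2] != b"\x0f\xb7":
        raise ValueError("expected a 7-byte movzx r32, r/m16 encoding")
    modrm = raw[2]
    if (modrm >> 6) != 0 or (modrm & 0b111) != 0b101:
        raise ValueError("operand is not RIP-relative")
    disp32 = int.from_bytes(raw[3:7], "little", signed=True)
    # The displacement is relative to the address of the NEXT instruction.
    return (address + len(raw) + disp32) & MASK64


# Addresses and bytes copied from the `ub` output above.
first = rip_relative_target(0xFFFFF800032AA865, bytes.fromhex("0fb70db4892100"))
second = rip_relative_target(0xFFFFF800032AA86C, bytes.fromhex("0fb705af892100"))
print(f"{first:#x} {second:#x}")
```

The same arithmetic confirms where the listing stops: the last instruction shown sits at fffff800`032aa873 and is 2 bytes long, so the next instruction is at fffff800`032aa875, which is nt!KiIpiSendRequest+0x305.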

Code:
4: kd> !ipi
IPI State for Processor 0
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 1
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 2
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 3
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 4
    TargetCount          0  PacketBarrier        0  IpiFrozen     0 [Running]

IPI State for Processor 5
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 6
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 7
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

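Running !ipi shows the inter-processor interrupt state for every processor on the box. Every processor except #4 is in a frozen state, so on this reading our IPI is never going to be serviced; it will remain pending, and we double fault. One caveat: frozen is not the same as idle. To my knowledge, the processor that bugchecks also freezes the others while the dump is written, so treat this output as supporting evidence rather than proof.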
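
On a box with many cores it can be easier to let a script pick out the odd one. Here is a minimal Python sketch that parses text in the !ipi layout and lists the processors that are not frozen. The regular expression is based only on the output shown in this post (without any colour markup), and the sample is trimmed to three processors.

```python
import re

# Pattern derived from the !ipi output in this post; other debugger
# versions may format the text differently.
IPI_RE = re.compile(
    r"IPI State for Processor (\d+)\s+"
    r"TargetCount\s+(\d+)\s+PacketBarrier\s+(\d+)\s+"
    r"IpiFrozen\s+(\d+)\s+\[(\w+)\]"
)

SAMPLE = """\
IPI State for Processor 3
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]

IPI State for Processor 4
    TargetCount          0  PacketBarrier        0  IpiFrozen     0 [Running]

IPI State for Processor 5
    TargetCount          0  PacketBarrier        0  IpiFrozen     2 [Frozen]
"""


def not_frozen(text: str) -> list[int]:
    """Return the processor numbers whose state is anything but Frozen."""
    return [
        int(match.group(1))
        for match in IPI_RE.finditer(text)
        if match.group(5) != "Frozen"
    ]


print(not_frozen(SAMPLE))
```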

Code:
4: kd> lmvm iastor
start             end                 module name
fffff880`0181f000 fffff880`01bc3000   iaStor     (no symbols)           
    Loaded symbol image file: iaStor.sys
    Image path: \SystemRoot\system32\DRIVERS\iaStor.sys
    Image name: iaStor.sys
    Timestamp:        Wed Feb 01 19:15:24 2012

The IRST driver dates from early 2012, which is likely the problem: it is a notoriously problematic driver, and older versions tend to be worse. A newer version would likely solve it, but honestly, I usually recommend that users who aren't running a RAID setup safely remove this driver and replace it with the standard MSFT driver. Kaspersky was also present on this system, and antivirus suites don't tend to play nicely with this software either.
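
As a sanity check on the lmvm output, the start and end addresses let you confirm that the raw return addresses on the stack really do belong to iaStor, and the timestamp can be parsed to reason about the driver's age. A minimal Python sketch, with illustrative constant and function names:

```python
from datetime import datetime

# Module range copied from the lmvm output above.
IASTOR_START = 0xFFFFF8800181F000
IASTOR_END = 0xFFFFF88001BC3000


def module_offset(address: int, start: int, end: int):
    """Return the offset of `address` inside [start, end), or None."""
    if not start <= address < end:
        return None
    return address - start


# Return addresses taken from the RetAddr column of the stack trace.
for ret in (0xFFFFF880018443F5, 0xFFFFF880018222A2):
    print(f"iaStor+{module_offset(ret, IASTOR_START, IASTOR_END):#x}")

# Timestamp string exactly as lmvm printed it.
build = datetime.strptime("Wed Feb 01 19:15:24 2012", "%a %b %d %H:%M:%S %Y")
print(build.date())
```

The two offsets come out as 0x253f5 and 0x32a2, matching the iaStor frames in the call stack.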

This post also shows how helpful Driver Verifier is: without it in this specific scenario, we likely would have had no idea what was causing the crash, and might have interpreted it as a hardware problem.

Thanks for reading!
 
Just to add, there are also Triple Faults, which occur when an exception is raised whilst a Double Fault is being handled by the exception handler. Triple Faults result in a CPU reset and a reboot of the entire computer.
 
Yep!

Triple Faults are mainly caused by buffer overflows (or underflows) in 3rd party drivers which lead to writing over the Interrupt Descriptor Table (IDT). The Triple Fault itself actually occurs when the next interrupt fires and the CPU cannot call the interrupt handler or the double fault handler because the IDT descriptors are now corrupted.
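
Here is a toy Python model of that escalation, just to show the order in which things fall over. It is heavily simplified: the set of "usable" descriptors stands in for whatever state the IDT is in after the overwrite, and the function name is made up.

```python
DOUBLE_FAULT_VECTOR = 8


def deliver(vector: int, usable_descriptors: set[int]) -> str:
    """Toy model: what happens when an interrupt or exception fires.

    `usable_descriptors` is the set of vectors whose IDT entries are
    still intact (an assumption of this model, not a real CPU structure).
    """
    if vector in usable_descriptors:
        return f"handler for vector {vector} runs"
    # The handler could not be called, so the CPU raises a double fault.
    if DOUBLE_FAULT_VECTOR in usable_descriptors:
        return "double fault handler runs"
    # The double fault handler could not be called either.
    return "triple fault: CPU shuts down and the machine resets"


healthy = set(range(32))
print(deliver(14, healthy))        # normal case
print(deliver(14, healthy - {14})) # one bad descriptor
print(deliver(14, set()))          # IDT wiped out
```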

Do you know if the shutdown cycle occurs on x64 as well? I know it definitely does (the CPU reset) on x86.
 
Good post, I was thinking of doing something similar when I had the time, I might write it on my blog.

One question, what exactly is the nt! ?? ::FNODOBFM::`string'+0x204ce function?
I'm guessing it's a user mode function, but I can't say for sure.
I've seen those strings on call stacks quite a lot. I remember reading what it was, but I can't recall the answer.

AFAIK it does still initiate a shutdown cycle on x64 systems.
 
One question, what exactly is the nt! ?? ::FNODOBFM::`string'+0x204ce function?

Wow, I cannot believe I actually found my post from nearly five months ago where I talked about this.

nt! ?? ::FNODOBFM::`string' - TO MY KNOWLEDGE, the debugger (WinDbg) is slightly confused about symbol names in the kernel image (nt) due to the binary being reorganized into function chunks. The functions are no longer contiguous in memory. Hot code paths are clustered together with hot code paths of other functions. "Cold" code paths are moved elsewhere. That way you save on paging I/O by maximizing the amount of relevant code on each code page.

Essentially, to my understanding, when a function is compiled, it normally occupies a single contiguous chunk of memory. The optimizer, however, can spread the executable code all over the place, replacing the inline code with a jump to some other memory location.

When this happens, to my knowledge, an address inside a relocated chunk can no longer be expressed as FunctionName+Offset, so the name the debugger prints isn't correct. In these specific cases, the code has been moved to a location (seemingly arbitrary, to my knowledge) where the closest preceding symbolic name is a string in the image. The debugger (WinDbg) then uses that string as its best guess for the return address on the stack.
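
The "best guess" part is easy to picture as a nearest-preceding-symbol lookup. Below is a minimal Python sketch of that idea. The symbol table is entirely hypothetical (invented names and addresses), and this is only my understanding of the general approach, not WinDbg's actual implementation.

```python
import bisect

# Hypothetical public symbols: (start address, name), sorted by address.
SYMBOLS = sorted([
    (0x1000, "HypotheticalFunctionA"),
    (0x1400, "HypotheticalFunctionB"),
    (0x9000, "`string'"),
])
STARTS = [start for start, _ in SYMBOLS]


def nearest_symbol(address: int):
    """Name an address as 'closest preceding symbol + offset'."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address lies before the first known symbol
    start, name = SYMBOLS[index]
    return f"{name}+{address - start:#x}"


# Code inside a function resolves as expected...
print(nearest_symbol(0x1010))
# ...but a cold chunk moved far away lands after an unrelated string symbol.
print(nearest_symbol(0x9000 + 0x204CE))
```

A large offset such as +0x204ce is itself a hint that the name is only the nearest label and not the real function.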
 
Great thread! :thumbsup2:

One question, what exactly is the nt! ?? ::FNODOBFM::`string'+0x204ce function?
I was told by a crash expert from Microsoft years ago that it was a CPU instruction.



A few years ago under Vista and early Windows 7, almost always when a BSOD had the double-fault bugcheck 0x7f (0x8,,,), the first thing we looked for was to see if COMODO was installed. All it took was a quick check for inspect.sys.

No idea why, but COMODO was responsible for the majority of double-fault BSODs that we saw during that time period.
 
Interesting, I wonder if there's a way we could find out what the instruction is... :huh:
 
Probably with internal Microsoft symbols. If it's a function, it looks like it does to us because we don't have the private symbols to resolve the function name.
 