0x000000D1 Debugging - NotMyFault exploration (x64)

Patrick · Jul 5, 2014

I've discussed some 0xD1 debugging here, but I figured I'd also go into a different 0xD1 scenario and show it from different angles by using NotMyFault to force a bug check.

Download NotMyFault here.

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.

We're all familiar with this bug check, so let's move on to what I wanted to talk about.

Let's go ahead and run !analyze -v:

Code:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: [COLOR=#ff0000]fffff8a0066eb800[/COLOR], memory referenced
Arg2: 000000000000000[COLOR=#0000ff]2[/COLOR], IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff88002af7385, address which referenced memory

fffff8a0066eb800 was the memory that was referenced. Either the address is invalid, or it is pageable and was accessed at an IRQL that was too high.
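To keep the four arguments straight, here is a minimal Python sketch that labels them. The function name and dictionary keys are my own for illustration (this is not a debugger API), and the read/write mapping is taken straight from the !analyze -v text above.

```python
# Minimal sketch: label the four 0xD1 bug check arguments.
# describe_d1 and its keys are hypothetical names, not part of any debugger API.

def describe_d1(arg1, arg2, arg3, arg4):
    """Return a readable summary of DRIVER_IRQL_NOT_LESS_OR_EQUAL arguments."""
    # Mapping as printed by !analyze -v above: 0 = read, 1 = write.
    operations = {0: "read", 1: "write"}
    return {
        "memory_referenced": f"{arg1:016x}",
        "irql": arg2,
        "operation": operations.get(arg3, "other"),
        "faulting_instruction": f"{arg4:016x}",
    }

# Values copied from the dump in this post.
summary = describe_d1(0xfffff8a0066eb800, 0x2, 0x0, 0xfffff88002af7385)
for key, value in summary.items():
    print(f"{key:22} {value}")
```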

Code:

kd> !pte fffff8a0066eb800
                                           VA fffff8a0066eb800
PXE at FFFFF6FB7DBEDF88    PPE at FFFFF6FB7DBF1400    PDE at FFFFF6FB7E280198    PTE at FFFFF6FC50033758
contains 000000007AC84863  contains 000000000367B863  contains 000000006B4C6863  contains 00003B5000000000
pfn 7ac84     ---DA--KWEV  pfn 367b      ---DA--KWEV  pfn 6b4c6     ---DA--KWEV  [COLOR=#ff0000]not valid[/COLOR]
                                                                                  [COLOR=#4b0082]PageFile:  0[/COLOR]
                                                                                  Offset: 3b50
                                                                                  Protect: 0

Using our handy !pte command, which shows the page table and page directory entries for an address, we can see that it is not a valid address despite appearing to be one at first glance. Why is it not valid? As we can see above, and as I highlighted in purple, it's because the page behind this address is currently paged out to the pagefile.
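If you're curious how the numbers in that !pte output fit together, here is a rough Python sketch that reproduces them. Big assumptions: the fixed self-map base addresses and the software PTE bit layout below match the Windows 7-era x64 system this dump came from, and they are not guaranteed on other versions. The function names are mine, and only the plain pagefile form of an invalid PTE is decoded.

```python
# Sketch only. The bases and bit layout are assumptions that fit the
# Windows 7-era x64 output above; other Windows versions differ.
PXE_BASE = 0xFFFFF6FB7DBED000
PPE_BASE = 0xFFFFF6FB7DA00000
PDE_BASE = 0xFFFFF6FB40000000
PTE_BASE = 0xFFFFF68000000000

def table_entry_addresses(va):
    """Return the virtual addresses of the PXE/PPE/PDE/PTE that map va."""
    va &= 0xFFFFFFFFFFFF  # keep the 48 bits that take part in translation
    return {
        "PXE": PXE_BASE + (va >> 39) * 8,
        "PPE": PPE_BASE + (va >> 30) * 8,
        "PDE": PDE_BASE + (va >> 21) * 8,
        "PTE": PTE_BASE + (va >> 12) * 8,
    }

def decode_pte(value):
    """Decode a valid PTE (pfn) or a pagefile-format invalid PTE."""
    if value & 1:  # bit 0 is the Valid bit
        return {"valid": True, "pfn": (value >> 12) & ((1 << 36) - 1)}
    if value & (1 << 10) or value & (1 << 11):
        # Prototype or transition PTEs are not handled in this sketch.
        return {"valid": False, "kind": "prototype/transition"}
    return {
        "valid": False,
        "kind": "pagefile",
        "pagefile": (value >> 1) & 0xF,   # which pagefile
        "protect": (value >> 5) & 0x1F,   # protection bits
        "offset": value >> 32,            # page offset within the pagefile
    }

addresses = table_entry_addresses(0xfffff8a0066eb800)
for name, addr in addresses.items():
    print(f"{name} at {addr:016X}")
print(decode_pte(0x000000007AC84863))
print(decode_pte(0x00003B5000000000))
```

The last call is the interesting one: the Valid bit is clear, so the hardware can't translate the address without help from the memory manager.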

Why can't we just page it in? As we know, this is not how the Windows memory manager works regarding kernel-mode and its rules. If we're at IRQL 2 (DISPATCH_LEVEL) or higher (which we are, see argument 2), page faults cannot be serviced, so nothing can be paged in and we bug check.
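The rule boils down to a very small decision, sketched below in Python. The function name and the returned strings are made up for illustration; the IRQL constants are the standard PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL values.

```python
# Sketch of the rule described above; fault_outcome is a hypothetical helper.
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2

def fault_outcome(pte_valid, irql):
    """Say what happens when kernel code touches an address at a given IRQL."""
    if pte_valid:
        return "access succeeds"
    if irql < DISPATCH_LEVEL:
        # Below DISPATCH_LEVEL the memory manager may try to resolve the fault.
        return "page fault is handled by the memory manager"
    # At DISPATCH_LEVEL or above, the fault cannot be serviced.
    return "bug check 0xD1"

# This dump: PTE not valid, IRQL 2 (Arg2).
print(fault_outcome(pte_valid=False, irql=2))
```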

Great, so we know why the system crashed. However, what caused it?

Let's go ahead and dump the stack:

Code:

kd> k
Child-SP          RetAddr           Call Site
fffff880`032f4448 fffff800`02a912a9 nt!KeBugCheckEx
fffff880`032f4450 fffff800`02a8ff20 nt!KiBugCheckDispatch+0x69
fffff880`032f4590 fffff880`02af7385 [COLOR=#4b0082]nt!KiPageFault+0x260[/COLOR] [COLOR=#008000]<-- Calling into a pagefault.[/COLOR]
fffff880`032f4720 fffff880`02af7727 [COLOR=#ff0000]myfault+0x1385[/COLOR] [COLOR=#008000]<-- Same as before.[/COLOR]
fffff880`032f4870 fffff800`02dac127 [COLOR=#ff0000]myfault+0x1727[/COLOR] [COLOR=#008000]<-- Ending up in myfault.[/COLOR]
fffff880`032f48d0 fffff800`02dac986 nt!IopXxxControlFile+0x607 [COLOR=#008000]<--- Same as before.[/COLOR]
fffff880`032f4a00 fffff800`02a90f93 nt!NtDeviceIoControlFile+0x56 [COLOR=#008000]<--- Going through this function in kernel-mode.[/COLOR]
fffff880`032f4a70 00000000`76df138a nt!KiSystemServiceCopyEnd+0x13 [COLOR=#008000]<--- Calling down into kernel-mode.[/COLOR]
00000000`0023edc8 00000000`00000000 [COLOR=#0000ff]0x[/COLOR][COLOR=#ff0000]7[/COLOR][COLOR=#0000ff]6df138a[/COLOR] [COLOR=#008000]<-- Something in user-mode.[/COLOR]

We start out with something in user-mode that we don't have symbols for, which is why it shows up as 0x76df138a rather than a resolved name that we can understand. Why did I make the 7 in the address red, and how did I know we started out in user-mode? Good question! The upper half of the address is all zeros (00000000`76df138a) and the first digit of the lower half is 7 or lower, which is the classic sign of a user-mode address. On x64 the more reliable test is the upper bits: user-mode addresses have them clear, while kernel-mode addresses begin with ffff.

The lack of user-mode detail is also due to the fact that this is a kernel dump, which we can see towards the top of our crash dump within WinDbg:

Code:

Kernel Summary Dump File: [COLOR=#ff0000]Only kernel address space is available[/COLOR]

With that said, we cannot see what the application was doing outside of when it went down into kernel-mode.

So we know that some application (0x76df138a) did something, and called down into kernel-mode. Everything above 0x76df138a is now kernel-mode. On x64, you can tell because the addresses under Child-SP, such as fffff880`032f4a00, begin with fffff, which implies kernel-mode.
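As a quick sanity check, the user/kernel split can be sketched in Python by looking at the upper bits of the address. This assumes 48-bit canonical addressing, and the function name is my own. Keep in mind the usable user-mode range on a real system is smaller than the whole lower half.

```python
# Sketch assuming 48-bit canonical x64 addresses; address_space is a
# hypothetical helper, not a WinDbg command.

def address_space(addr):
    """Classify a 64-bit address by its upper 17 bits (bits 63..47)."""
    high = addr >> 47
    if high == 0:
        return "user"          # lower canonical half
    if high == 0x1FFFF:
        return "kernel"        # upper canonical half
    return "non-canonical"

for addr in (0x0000000076df138a, 0xfffff880032f4a00, 0xfffff88002af7385):
    print(f"{addr:016x} -> {address_space(addr)}")
```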

We can see it goes through a few functions, and then ends up in myfault. Shortly afterwards, we hit a page fault (trying to page in memory from the pagefile at IRQL 2 -- big no-no).

If we take a look at the trap frame:

Code:

kd> .trap 0xfffff880032f4590
[COLOR=#ff0000]NOTE: The trap frame does not contain all registers.[/COLOR]
[COLOR=#ff0000]Some register values may be zeroed or incorrect.[/COLOR]
rax=0000000005000000 [COLOR=#4b0082]rbx=0000000000000000[/COLOR] rcx=0000000000002481
rdx=fffffa8001810000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88002af7385 rsp=fffff880032f4720 rbp=fffff880032f4b60
 r8=0000000000012408  r9=0000000000000810 r10=fffff80002a12000
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
myfault+0x1385:
fffff880`02af7385 8b03            mov     eax,dword ptr [rbx] ds:00000000`00000000=????????

The first very important thing to note is the warning that the trap frame does not contain all registers, and that some may be zeroed out or incorrect. The big question is why? Well, the trap frame generation code on x64 versions of Windows does not save the contents of non-volatile registers.

With that said, registers such as rbx, rdi, rsi, etc., are either zeroed out or incorrect. This is because on x64, any code that runs after the trap frame is generated will properly handle a non-volatile register itself, saving it to its own frame and restoring it before returning. Saving them again in the trap frame is seen as an unnecessary step in a hot path within the kernel.
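To make that concrete, here is a small Python sketch that parses the register lines from the .trap output above and separates the values worth trusting from the suspect ones. The set of "not captured" registers is an assumption based on which registers show up zeroed in this particular output, and the helper name is mine.

```python
import re

# Register lines copied from the .trap output in this post.
TRAP_TEXT = """
rax=0000000005000000 rbx=0000000000000000 rcx=0000000000002481
rdx=fffffa8001810000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88002af7385 rsp=fffff880032f4720 rbp=fffff880032f4b60
 r8=0000000000012408  r9=0000000000000810 r10=fffff80002a12000
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
"""

# Assumption: non-volatile registers that this trap frame did not capture,
# judged from the zeroed values above (rbp and rsp are populated here).
NOT_CAPTURED = {"rbx", "rsi", "rdi", "r12", "r13", "r14", "r15"}

def split_registers(text):
    """Return (trusted, suspect) register dictionaries parsed from text."""
    pairs = re.findall(r"\b(r\w+)=([0-9a-f]{16})", text)
    regs = {name: int(value, 16) for name, value in pairs}
    trusted = {n: v for n, v in regs.items() if n not in NOT_CAPTURED}
    suspect = {n: v for n, v in regs.items() if n in NOT_CAPTURED}
    return trusted, suspect

trusted, suspect = split_registers(TRAP_TEXT)
print("trusted:", ", ".join(sorted(trusted)))
print("suspect:", ", ".join(sorted(suspect)))
```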

Extremely detailed article with much more info here.

Moving on to the instruction we failed on: we were setting the eax register to the value stored at the address held in rbx:

Code:

[COLOR=red]mov[/COLOR]     eax,dword [COLOR=blue]ptr[/COLOR] [[COLOR=purple]rbx[/COLOR]]

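Uh oh, rbx is zeroed out. With that said, we can't !pte the register's value to double check it. However, Arg4 of the bug check matches this instruction's address, so the address it dereferenced should be the one in Arg1, which we already ran !pte against. We can safely conclude that this all occurred because myfault attempted to access memory that was either paged out or invalid (which it did).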
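Even with rbx missing, the addresses in the dump can be cross-checked against each other with simple arithmetic. The Python sketch below does that. One assumption: since myfault has no symbols loaded, I'm treating myfault+0x1385 as an offset from the start of the image, so the computed base is only presumed to be the module base.

```python
# Values copied from the !analyze -v, k and .trap output in this post.
ARG1 = 0xfffff8a0066eb800      # memory referenced
ARG4 = 0xfffff88002af7385      # address which referenced memory
TRAP_RIP = 0xfffff88002af7385  # rip shown by .trap
CALLER_RET = 0xfffff88002af7727  # return address shown as myfault+0x1727

# Assumption: myfault+0x1385 is relative to the image base (no symbols).
module_base = TRAP_RIP - 0x1385

# Arg4 is the mov eax,[rbx] instruction, so rbx presumably held Arg1.
print(f"Arg4 == trap rip:   {ARG4 == TRAP_RIP}")
print(f"presumed base:      {module_base:016x}")
print(f"base + 0x1727:      {module_base + 0x1727:016x}")
print(f"likely rbx at fault: {ARG1:016x}")
```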

If you wanted extra proof that NotMyFault was behind the crash, you could dump all of the processes at the time of the crash to see if there was any correlation. In this case, you'd use !process 0 0. The flags matter here, and as always you can check the WinDbg help file or MSDN for details.

Code:

PROCESS fffffa80040a7060
    SessionId: 1  Cid: 0654    Peb: 7fffffd4000  ParentCid: 0708
    DirBase: 670ea000  ObjectTable: fffff8a00666c330  HandleCount:  68.
    Image: [COLOR=#ff0000]NotMyfault.exe[/COLOR]

We can see we did indeed have a NotMyFault process running at the time of the crash, so at this point we can assume that it is very likely the cause of the crash.

Hope you enjoyed reading!

Gator · Mar 26, 2015

I hope this is ok to comment here. What if I use !pte 'address' and I get the error message "Unable to get PXE 'address'"?

Patrick · Mar 26, 2015

You need a kernel dump or greater, and cannot use small dumps for !pte most of the time.

Gator · Mar 26, 2015

Thanks for the quick response Patrick!

Patrick · Mar 26, 2015

You're welcome.

It's due to the contents of the virtual addresses not being stored in small dumps as small dumps are merely a snapshot of the call stack, kernel context, and other small things.

x BlueRobot · Mar 27, 2015

You're guaranteed a call stack and some context at the point of the crash, which is another reason why minidumps contain so little information.

Gator · Mar 27, 2015

So when doing an analysis, when is it pertinent to get a user to try to get a larger dump file? I've been reading a lot of the threads around here (great, great work by the way) and I saw that Watch_Dog BSODs require a bigger, more complete dump. Are the situations where it's needed few and far between, or are there specific bug checks where it's useless to analyze only the minidump?

Thanks in advance.

blueelvis · Mar 27, 2015

The DPC_WATCHDOG_VIOLATION (0x133) bug checks can be solved with only minidumps in most cases. For a proper analysis of CLOCK_WATCHDOG_TIMEOUT (0x101), you would need a larger dump file since most of the addresses are not stored in a minidump.

That is not a hard and fast rule, because if the operating system did not have the chance to save the information, you might not get a dump file at all. Once you gain more experience, you will figure out when you require a complete/kernel dump and when you can solve the problem with only a minidump.

I hope this helps :)

-Pranav

Patrick · Mar 27, 2015

0x133 can be solved with a minidump if you get 0x0 for the first parameter and are lucky enough to get a decent call stack containing the guilty driver. If you get 0x1 for the first parameter, you need a kernel dump to dig through unassembled DPC routines, PCRs, etc., or an ETW trace.

0x101, on the other hand, cannot be solved with a minidump, as everything you need to access (different CPUs, etc.) is only available in a kernel dump or larger.
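Purely as a summary of the rule of thumb in the last few posts, here it is as a tiny Python lookup. The function name and return strings are invented for this sketch, and it only covers the two bug checks discussed here.

```python
# Rule-of-thumb sketch based on the posts above; minimum_dump is a
# hypothetical helper and covers only 0x133 and 0x101.

def minimum_dump(bugcheck, arg1=None):
    """Suggest the smallest dump type likely to be useful."""
    if bugcheck == 0x101:  # CLOCK_WATCHDOG_TIMEOUT
        return "kernel dump"
    if bugcheck == 0x133:  # DPC_WATCHDOG_VIOLATION
        if arg1 == 0:
            return "minidump may be enough"
        return "kernel dump (or an ETW trace)"
    return "not covered here"

print(minimum_dump(0x133, 0))
print(minimum_dump(0x133, 1))
print(minimum_dump(0x101))
```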

Gator · Mar 27, 2015

Got it, thanks for the clarification.

x BlueRobot · Mar 30, 2015

In an ideal world, I would always debug with a Kernel Memory Dump, but due to time and size limitations, that isn't always feasible, so most of us would make do with a Minidump unless a Kernel Memory Dump is the only way forward.

Patrick · Mar 31, 2015

Hopefully in my lifetime the actual standard (outside of antivirus vendors and misc. archiving, such as for malware analysis) will become kernel/complete dumps, once we reach multiple computational breakthroughs.

A man can dream.

x BlueRobot · Mar 31, 2015

I think we have already reached the limits of Classical Computation, maybe not in the consumer market, where it will likely never exist, but certainly in academia. I'm still waiting for my Quantum Computer to arrive.

Jared · Apr 1, 2015

As Harry said, it already exists, and is very common in specific areas, but not the common market.
It is very unlikely it will become the standard, simply because it isn't necessary for the standard user.
