The Complete Debugging Guide to Stop 0x50

x BlueRobot

A Stop 0x50 is one of the most common bugchecks you'll encounter. You'll usually be able to use the same techniques learned here to understand the most common cause of a Stop 0x3B, which is typically an invalid page fault caused by a null pointer. Before we begin, I'll assume you have a general understanding of address translation and the purpose of a page fault. Otherwise, I would recommend reading the following article.

There are three common causes of a Stop 0x50: a large page was referenced, a null pointer was used, or a valid but non-paged pool address was used. We'll explore and explain these concepts in greater detail later in this tutorial. We'll also explore why x64 trap frames can't be relied upon when debugging a Stop 0x50.

Rich (BB code):
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffcc0150939000, memory referenced.
Arg2: 0000000000000002, value 0 = read operation, 1 = write operation.
Arg3: fffff80421d7fc6f, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000000, (reserved)

Let's examine the bugcheck description, as it contains a couple of clues. We know that an invalid memory address was referenced, and the description suggests two likely reasons: we've referenced a corrupt memory address, or we've referenced a dangling pointer, i.e. one which points to memory that has already been freed. Note also that this kind of page fault can't be handled using SEH (Structured Exception Handling), which is what the try-except remark refers to. A page fault is a system exception which is handled by the system via the interrupt dispatch table. We won't delve into the interrupt dispatch table here, but those interested can refer to this article.

Now, let's dump the page table entry for the memory address which caused the page fault by using the !pte extension command.

Rich (BB code):
7: kd> !pte ffffcc0150939000
                                           VA ffffcc0150939000
PXE at FFFFB158AC562CC0    PPE at FFFFB158AC598028    PDE at FFFFB158B3005420    PTE at FFFFB16600A849C8
contains 0A000000058BE863  contains 0A000000058BF863  contains 1A000007C4B41863  contains 0000000000000000
pfn 58be      ---DA--KWEV  pfn 58bf      ---DA--KWEV  pfn 7c4b41    ---DA--KWEV  not valid

As we can see, the PTE isn't valid, which means the address has no corresponding mapping to a page in physical memory. This is why a page fault was generated for that address. However, why couldn't the page fault handler resolve this fault for us?
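To make the flag letters in the !pte output a little less opaque, here is a rough Python sketch that decodes the raw "contains" values by hand. The bit positions follow the architectural x64 page-table entry layout; the helper name decode_entry is made up for this example and is not part of any debugger API.

```python
# Decode the hardware bits of an x64 page-table entry (PXE/PPE/PDE/PTE).
# decode_entry is a hypothetical helper written for this illustration.

FLAG_BITS = {
    "valid": 0,        # V: the entry is present
    "writable": 1,     # shown as W when set, R when clear
    "user": 2,         # shown as U when set, K when clear
    "accessed": 5,     # A
    "dirty": 6,        # D
    "large_page": 7,   # L: only meaningful in a PPE or PDE
    "global": 8,       # G
    "no_execute": 63,  # E is shown when this bit is clear
}

def decode_entry(value: int) -> dict:
    """Return the PFN and flag bits of a raw 64-bit page-table entry."""
    decoded = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    # The page frame number occupies bits 12 to 51.
    decoded["pfn"] = (value >> 12) & ((1 << 40) - 1)
    return decoded

# Values copied from the "contains" row of the !pte output above.
entries = {
    "PXE": 0x0A000000058BE863,
    "PPE": 0x0A000000058BF863,
    "PDE": 0x1A000007C4B41863,
    "PTE": 0x0000000000000000,
}

for name, raw in entries.items():
    info = decode_entry(raw)
    state = "valid" if info["valid"] else "not valid"
    print(f"{name}: pfn {info['pfn']:x}, {state}")
```

The PFNs printed for the first three levels should line up with the pfn values in the debugger output, and the all-zero PTE has its valid bit clear, which is what !pte reports as "not valid".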

We can dump the memory at this address using the dd command; there are other variations of this command, but dd will suffice. The dd command displays memory as DWORDs, which are 32-bit unsigned integers, i.e. it dumps the contents at this address in chunks of 4 bytes.

Rich (BB code):
7: kd> dd ffffcc0150939000 L2
ffffcc01`50939000  ???????? ????????

Notice the question marks? They mean the debugger couldn't read the memory at that address, which is consistent with it pointing to a freed region of memory. This is the reason why the page fault could not be resolved. How do we find out who caused it? We can use the ln command with the address provided in the third parameter. The third parameter contains the address of the instruction which referenced the invalid memory address.

Rich (BB code):
7: kd> ln fffff80421d7fc6f
Browse module
Set bu breakpoint

Wait, what happened? Since the address belongs to a third-party driver which we do not have symbols for, the ln command isn't able to resolve the address for us. Nonetheless, we can dump the call stack using the kV command instead. You can use the other variants if you wish; I just prefer to use kV.

Rich (BB code):
7: kd> kV
# Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffdc09`b4f37de8 fffff804`06be3463 : 00000000`00000050 ffffcc01`50939000 00000000`00000002 ffffdc09`b4f38090 : nt!KeBugCheckEx
01 ffffdc09`b4f37df0 fffff804`06a730bf : 00000000`00000102 00000000`00000002 00000000`00000000 ffffcc01`50939000 : nt!MiSystemFault+0x1d6733
02 ffffdc09`b4f37ef0 fffff804`06bcf120 : fffff804`22044202 fffff804`2169fc90 00000000`00000000 fffff804`21fc7b57 : nt!MmAccessFault+0x34f
03 ffffdc09`b4f38090 fffff804`21d7fc6f : ffff858c`1a425b78 00000000`0007c3c3 ffffc9e8`956bf85f ffff858c`1a418000 : nt!KiPageFault+0x360 (TrapFrame @ ffffdc09`b4f38090)
04 ffffdc09`b4f38220 ffff858c`1a425b78 : 00000000`0007c3c3 ffffc9e8`956bf85f ffff858c`1a418000 00000000`00000007 : nvlddmkm+0x6efc6f
05 ffffdc09`b4f38228 00000000`0007c3c3 : ffffc9e8`956bf85f ffff858c`1a418000 00000000`00000007 fffff804`2204429d : 0xffff858c`1a425b78
06 ffffdc09`b4f38230 ffffc9e8`956bf85f : ffff858c`1a418000 00000000`00000007 fffff804`2204429d 00000000`00000003 : 0x7c3c3
07 ffffdc09`b4f38238 ffff858c`1a418000 : 00000000`00000007 fffff804`2204429d 00000000`00000003 00000000`00000000 : 0xffffc9e8`956bf85f
08 ffffdc09`b4f38240 00000000`00000007 : fffff804`2204429d 00000000`00000003 00000000`00000000 00000000`00000003 : 0xffff858c`1a418000
09 ffffdc09`b4f38248 fffff804`2204429d : 00000000`00000003 00000000`00000000 00000000`00000003 00000000`0000000f : 0x7
0a ffffdc09`b4f38250 00000000`00000003 : 00000000`00000000 00000000`00000003 00000000`0000000f 00000000`00000000 : nvlddmkm+0x9b429d
0b ffffdc09`b4f38258 00000000`00000000 : 00000000`00000003 00000000`0000000f 00000000`00000000 00000000`00000001 : 0x3

We can see two third-party driver calls on the stack along with the page fault. How do we determine which one caused the page fault? As you may have noticed, there is a trap frame stored on the stack. Since page faults are technically exceptions, the operating system will automatically generate a trap frame for us in order to transfer control to the exception handler. If the fault were to be handled, then control would be transferred back to the instruction which caused the page fault.

Let's dump the trap frame using the .trap command.

Rich (BB code):
7: kd> .trap ffffdc09`b4f38090
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffcc0150938000 rbx=0000000000000000 rcx=ffff858c1a425b78
rdx=000000000001f3fd rsi=0000000000000000 rdi=0000000000000000
rip=fffff80421d7fc6f rsp=ffffdc09b4f38220 rbp=0000000000000007
r8=0000000000000400  r9=000000000001e033 r10=0000000000000000
r11=ffffdc09b4f38240 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
nvlddmkm+0x6efc6f:
fffff804`21d7fc6f 42891480        mov     dword ptr [rax+r8*4],edx ds:ffffcc01`50939000=????????

Please ignore the "The trap frame does not contain all registers" note for now; we'll explore it together in a moment. For now, let's focus on two key pieces of information available in the trap frame: the instruction address (the rip value, shown in blue) and the address being referenced (the value after ds:, shown in red). The address being referenced is the result of pointer arithmetic on several registers, in this case rax + r8*4.
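As a quick sanity check, the pointer arithmetic can be redone by hand. The following Python sketch recomputes the effective address of the faulting mov from the register values in the .trap output; effective_address is just a helper name chosen for this example.

```python
# Recompute the effective address of: mov dword ptr [rax+r8*4],edx
# Register values are copied from the .trap output above.
rax = 0xFFFFCC0150938000  # base register
r8 = 0x400                # index register
scale = 4                 # scale factor encoded in the instruction

MASK_64 = (1 << 64) - 1   # keep the result within 64 bits

def effective_address(base: int, index: int, scale: int, disp: int = 0) -> int:
    """base + index * scale + displacement, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK_64

address = effective_address(rax, r8, scale)
print(f"{address:016x}")  # ffffcc0150939000
```

The result is the same address shown after ds: in the trap frame and in the first parameter of the bugcheck.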

Notice how the instruction address matches the address shown in the third parameter? We've found the instruction which caused the page fault. If you look closely, you can see that it sits at nvlddmkm+0x6efc6f, i.e. at an offset inside the nvlddmkm module, since there are no symbols available to resolve a function name. In this case, the issue appears to be with the graphics card driver.

Rich (BB code):
7: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff804`21690000 fffff804`22c49000   nvlddmkm T (no symbols) 
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nvcvi.inf_amd64_11bdd2121036771e\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Thu Sep  5 20:44:22 2019 (5D716596)
    CheckSum:         0155B6AC
    ImageSize:        015B9000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
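
The module+offset notation can be reproduced from the lmvm output. This is only an illustrative Python sketch (offset_in_module is a made-up helper), but it shows why the debugger attributes the faulting instruction to nvlddmkm: the address in the third parameter falls between the module's start and end addresses.

```python
# Values copied from the lmvm output and from Arg3 of the bugcheck.
module_start = 0xFFFFF80421690000
module_end = 0xFFFFF80422C49000
image_size = 0x015B9000
faulting_rip = 0xFFFFF80421D7FC6F

def offset_in_module(address: int, start: int, end: int):
    """Return the offset of address within [start, end), or None if outside."""
    if start <= address < end:
        return address - start
    return None

offset = offset_in_module(faulting_rip, module_start, module_end)
print(f"nvlddmkm+0x{offset:x}")  # nvlddmkm+0x6efc6f
```

Note that start plus ImageSize also gives the end address, so the range shown by lmvm is consistent with the image header.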

Next, let's look at another variation of this bugcheck; this time we'll look at a large page and the CR2 register.

Rich (BB code):
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: fffff8031d80c8a3, memory referenced.
Arg2: 0000000000000003, value 0 = read operation, 1 = write operation.
Arg3: fffff8031d847016, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000002, (reserved)

As we can see, the first parameter again contains the address being referenced. We can use the ln command with the third parameter to find the faulting instruction; however, let's dump the trap frame to begin with, as there is an important point which I would like to make. This technique is also useful for a wide range of other bugchecks.

Rich (BB code):
8: kd> .trap 0xfffff10a73e28670
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000004 rbx=0000000000000000 rcx=ffffce8581549280
rdx=0000021a200dedd0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8031d847016 rsp=fffff10a73e28800 rbp=0000000000000000
r8=0000000000000000  r9=0000000000000000 r10=0000000000000008
r11=fffff10a73e28848 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
nt!IoSetIoCompletionEx+0x46:
fffff803`1d847016 48894730        mov     qword ptr [rdi+30h],rax ds:00000000`00000030=????????????????

Wait, what has happened? The address being referenced by the instruction mentioned in the third parameter is vastly different to the address stated in the first parameter. Why is this the case? The reason is due to how trap frames are handled on x64 Windows. When a trap frame is generated, only the volatile registers are guaranteed to be saved, and thus you'll see registers which are zeroed out or contain garbage values which can't be trusted.

The volatile registers are the following: RAX; RCX; RDX; R8 through to R11; XMM0 to XMM5. Since the rdi register is a non-volatile register, its value hasn't been saved, which is why the trap frame shows it as zero and the referenced address as 0x30.
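If you'd like a quick way to check which registers in a trap frame deserve suspicion, a small lookup is enough. The sketch below is only an illustration in Python; the register sets are taken from the Microsoft x64 calling convention (RAX, RCX, RDX and R8 to R11 are volatile), and is_volatile is a name invented for this example.

```python
# General-purpose registers grouped by the Microsoft x64 calling convention.
VOLATILE = frozenset({"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"})
NONVOLATILE = frozenset(
    {"rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"}
)

def is_volatile(register: str) -> bool:
    """True if the calling convention treats the register as volatile."""
    name = register.lower()
    if name not in VOLATILE and name not in NONVOLATILE:
        raise ValueError(f"unknown general-purpose register: {register}")
    return name in VOLATILE

# rdi is used by mov qword ptr [rdi+30h],rax in the trap frame above.
for reg in ("rdi", "rax"):
    kind = "volatile" if is_volatile(reg) else "non-volatile"
    print(f"{reg}: {kind}")
```

Any memory operand built from a non-volatile register, such as [rdi+30h] here, should be cross-checked against the bugcheck parameters rather than taken at face value.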

However, we can still find the address which was referenced by dumping the CR2 register.

Rich (BB code):
8: kd> r @cr2
Last set context:
cr2=fffff8031d80c8a3

As we can see, it matches the address shown in the first parameter of the bugcheck. This is because the CR2 register, which holds the page fault linear address, is updated with the address which caused the page fault. Therefore, if you ever need to find the memory address which generated the page fault, you can dump the CR2 register.

Let's examine the page table entry associated with the address and see why we couldn't handle the page fault.

Rich (BB code):
8: kd> !pte fffff8031d80c8a3
                                           VA fffff8031d80c8a3
PXE at FFFFFC7E3F1F8F80    PPE at FFFFFC7E3F1F0060    PDE at FFFFFC7E3E00C760    PTE at FFFFFC7C018EC060
contains 000000000560A063  contains 0000000004C0B063  contains 0A000000032001A1  contains 0000000000000000
pfn 560a      ---DA--KWEV  pfn 4c0b      ---DA--KWEV  pfn 3200      -GL-A--KREV  LARGE PAGE pfn 320c

That's odd; it looks like it's mapped to a valid region of physical memory. Why did we crash then? The reason is that large pages are always non-paged, and as the title of the bugcheck mentions, a page fault has occurred in a non-paged area.
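For those wondering where the "LARGE PAGE pfn 320c" value comes from, it can be derived from the PDE and the virtual address. The Python sketch below assumes a 2 MB large page mapped at the PDE level and uses the architectural x64 bit layout; large_page_pfn is a hypothetical helper, not a debugger command.

```python
# Values copied from the !pte output above.
va = 0xFFFFF8031D80C8A3
pde = 0x0A000000032001A1

PAGE_SHIFT = 12        # 4 KB pages
LARGE_PAGE_SHIFT = 21  # a large page mapped by a PDE covers 2 MB

def large_page_pfn(va: int, pde: int) -> int:
    """PFN of the 4 KB frame that va falls on inside a 2 MB large page."""
    if not (pde >> 7) & 1:
        raise ValueError("PDE does not have the large page bit set")
    base_pfn = (pde >> PAGE_SHIFT) & ((1 << 40) - 1)
    offset_in_large_page = va & ((1 << LARGE_PAGE_SHIFT) - 1)
    return base_pfn + (offset_in_large_page >> PAGE_SHIFT)

print(f"{large_page_pfn(va, pde):x}")  # 320c
```

The PDE supplies the base PFN (3200), and the low 21 bits of the virtual address select the 4 KB frame within the large page, which is why no PTE is consulted for this address.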

There is another possible case which you may see: non-canonical addresses.

Rich (BB code):
1: kd> !pte fffdb2073c832cc0
                                           VA fffdb2073c832cc0
PXE at FFFFF5FAFD7EBB20    PPE at FFFFF5FAFD7640E0    PDE at FFFFF5FAEC81CF20    PTE at FFFFF5D9039E4190
Unable to get PXE FFFFF5FAFD7EBB20
WARNING: noncanonical VA, accesses will fault !

On x64 systems, there is a large region of the address space which is unavailable to the operating system, and referencing any address within this range will result in a page fault. However, since the address is non-canonical, the fault can't be resolved by the page fault handler, which leads to a crash as shown above. All canonical addresses have bits 47 to 63 either all set or all cleared.
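The canonical rule is easy to check programmatically. Here is a minimal Python sketch, assuming 48-bit virtual addresses as described above; is_canonical is simply a name chosen for this example.

```python
def is_canonical(address: int) -> bool:
    """True when bits 47 to 63 of a 64-bit address are all 0 or all 1."""
    upper = (address & 0xFFFFFFFFFFFFFFFF) >> 47  # the top 17 bits
    return upper == 0 or upper == 0x1FFFF

for value in (0xFFFFE000502E37C0, 0xFFFDB2073C832CC0):
    print(f"{value:016x} canonical: {is_canonical(value)}")
```

The first address passes the check, while the second is the one !pte flagged as a noncanonical VA.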

Here are some examples to illustrate this point.

Canonical Address:

Rich (BB code):
0: kd> .formats ffffe000502e37c0
Evaluate expression:
  Hex:     ffffe000`502e37c0
  Decimal: -35183026882624
  Octal:   1777777000012013433700
  Binary:  11111111 11111111 11100000 00000000 01010000 00101110 00110111 11000000
  Chars:   ....P.7.
  Time:    ***** Invalid FILETIME
  Float:   low 1.16916e+010 high -1.#QNAN
  Double:  -1.#QNAN

Non-Canonical Address:

Rich (BB code):
0: kd> .formats fffdb2073c832cc0
Evaluate expression:
  Hex:     fffdb207`3c832cc0
  Decimal: -648680780387136
  Octal:   1777755440347440626300
  Binary:  11111111 11111101 10110010 00000111 00111100 10000011 00101100 11000000
  Chars:   ....<.,.
  Time:    ***** Invalid FILETIME
  Float:   low 0.0160125 high -1.#QNAN
  Double:  -1.#QNAN

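Notice how bit 49 is cleared, i.e. 0, while the other bits between 47 and 63 are set? Since those bits aren't all identical, the address is non-canonical.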

As a closing point, you may have noticed that the second parameter refers to the operation being performed on the address. The value of this parameter can vary between architectures and Windows builds, so please ensure that you consult the documentation beforehand. For convenience, please refer to the following table:

Build Number       Architecture    Parameter Value
1507+              x64             0 = Read; 2 = Write; 10 = Execute
1507+              x86             0 = Read; 2 = Write; 10 = Execute
1507+              ARM             0 = Read; 1 = Write; 8 = Execute
1507 and before    x86 & x64       0 = Read; 1 = Write
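
If you look these values up often, the table can be turned into a small lookup. The Python sketch below is only an illustration: describe_arg2 and the dictionary layout are made up for this example, the keys are the parameter digits exactly as the debugger prints them (leading zeros removed), and which build a given dump came from is something you need to confirm yourself.

```python
# Lookup built from the table above. Keys are the parameter digits as the
# debugger prints them, with leading zeros removed.
ARG2_MEANINGS = {
    ("1507+", "x64"): {"0": "Read", "2": "Write", "10": "Execute"},
    ("1507+", "x86"): {"0": "Read", "2": "Write", "10": "Execute"},
    ("1507+", "ARM"): {"0": "Read", "1": "Write", "8": "Execute"},
    ("1507 and before", "x86"): {"0": "Read", "1": "Write"},
    ("1507 and before", "x64"): {"0": "Read", "1": "Write"},
}

def describe_arg2(arg2: str, build: str, arch: str) -> str:
    """Map the printed Arg2 value to the operation listed in the table."""
    digits = arg2.lstrip("0") or "0"
    return ARG2_MEANINGS[(build, arch)].get(digits, "Not listed in the table")

# Arg2 from the first dump in this tutorial, assuming a 1507+ x64 system.
print(describe_arg2("0000000000000002", "1507+", "x64"))  # Write
```

A result of Write for the first dump agrees with the faulting instruction, which was a mov that stores edx to memory.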

That concludes this tutorial. If you have any questions, then please let me know.
 
Code:
7: kd> .trap ffffdc09`b4f38090
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffcc0150938000 rbx=0000000000000000 rcx=ffff858c1a425b78
rdx=000000000001f3fd rsi=0000000000000000 rdi=0000000000000000
rip=fffff80421d7fc6f rsp=ffffdc09b4f38220 rbp=0000000000000007
r8=0000000000000400  r9=000000000001e033 r10=0000000000000000
r11=ffffdc09b4f38240 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
nvlddmkm+0x6efc6f:
[HI]fffff804`21d7fc6f 42891480[/HI]        mov     dword ptr [rax+r8*4],edx ds:ffffcc01`50939000=????????

Hi Blue,

You mention in this part of the article that the highlighted instruction address matches the address in the first parameter. I'm trying to figure out where that first parameter is, can you point it out to me please?
 
You mention in this part of the article that the highlighted instruction address matches the address in the first parameter.
You mean the third parameter i.e. Arg 3.

Wait, what has happened? The address being referenced by the instruction mentioned in the third parameter is vastly different to the address stated in the first parameter? Why is this case? The reason is due to how trap frames are handled on x64 Windows.
Note, I mentioned the address being referenced by the instruction in the third parameter. Have a look at the address shown in the rdi register and then compare that to the first parameter.

I'm trying to figure out where that first parameter is, can you point it out to me please?
It's in the bugcheck parameters, you can use the .bugcheck command to display them. Alternatively, the bugcheck parameters are loaded as part of the !analyze -v output.
 
[Attached image: instruction.png]


Not sure if I'm just not understanding correctly but here's the snippet of your write up I was referencing which starts off saying:

Code:
the instruction address (shown in blue) and the address being referenced (shown in red).

And then:

Code:
Notice how the instruction address matches the address shown in the first parameter?

So the instruction address, ie fffff803`1d847016 (since it's shown in blue) is said to match the address shown in the first parameter. I'm not seeing a match there.

Isn't this the first param (highlighted below)?

Code:
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: [HI]fffff8031d80c8a3[/hi], memory referenced.
Arg2: 0000000000000003, value 0 = read operation, 1 = write operation.
Arg3: fffff8031d847016, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000002, (reserved)
 
So the instruction address, ie fffff803`1d847016 (since it's shown in blue) is said to match the address shown in the first parameter. I'm not seeing a match there.
Thank you for spotting that. It was a typo, and I've corrected it now. I apologise for the confusion it caused. The instruction address will match the third parameter and the address being referenced will match the first parameter (in most circumstances).
 
I've just noticed that MEX has a nice debugger extension which is similar to the r command but will highlight which registers are volatile for you.

Rich (BB code):
0: kd> !dr

Asterisk ("*") indicates a volatile register

Frame: 0x0
*rax=fffff800390fb320 rbx=fffff80033c8b180 *rcx=0000000000000133
*rdx=0000000000000000 rsi=0000000000000001  rdi=0000000000000000
 rip=fffff800387f5a80 rsp=fffff8003befce18  rbp=fffff80039127600
* r8=0000000000000501 *r9=0000000000000500 *r10=0000fffff80034f9
*r11=ffff94feab800000 r12=0000000000000000  r13=0000000001570637
 r14=0000000000000002 r15=0000000000000201
 
