A Stop 0x50 is one of the most common bugchecks you'll encounter, and you'll usually be able to use the same techniques learned here to understand the most common cause of a Stop 0x3B, which is typically an invalid page fault caused by a null pointer. Before we begin, I'll assume you have a general understanding of address translation and the purpose of a page fault. Otherwise, I would recommend reading the following article.
There are three common causes of a Stop 0x50: a large page was referenced, a null pointer was used, or a valid but non-paged pool address was used. We'll explore and explain these concepts in greater detail later in this tutorial. We'll also explore why x64 trap frames can't be relied upon when debugging a Stop 0x50.
Rich (BB code):
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffcc0150939000, memory referenced.
Arg2: 0000000000000002, value 0 = read operation, 1 = write operation.
Arg3: fffff80421d7fc6f, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000000, (reserved)
Let's examine the bugcheck description, as it contains a couple of clues. We know that an invalid memory address was referenced, and the description suggests two likely reasons: the address is corrupt, or it is a dangling pointer, i.e. one which points to freed memory. Note too that a page fault can't be handled using SEH (Structured Exception Handling). A page fault is a system exception which is handled by the system via the interrupt dispatch table. We won't delve into the interrupt dispatch table too much, but those interested can refer to this article.
Now, let's dump the page table entry for the memory address which caused the page fault by using the !pte extension command.
Rich (BB code):
7: kd> !pte ffffcc0150939000
VA ffffcc0150939000
PXE at FFFFB158AC562CC0 PPE at FFFFB158AC598028 PDE at FFFFB158B3005420 PTE at FFFFB16600A849C8
contains 0A000000058BE863 contains 0A000000058BF863 contains 1A000007C4B41863 contains 0000000000000000
pfn 58be ---DA--KWEV pfn 58bf ---DA--KWEV pfn 7c4b41 ---DA--KWEV not valid
As we can see, the PTE isn't valid: the address has no corresponding mapping to a page in physical memory, which is why a page fault was generated for it. However, why couldn't the page fault handler resolve this fault for us?
We can dump the memory at that address using the dd command; there are other variations of this, but dd will suffice. The dd command displays memory as DWORDs, which are 32-bit unsigned integers, i.e. it dumps the values stored at this address in chunks of 4 bytes.
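As a rough illustration of what !pte is walking, the sketch below splits a virtual address into the four 9-bit table indices used by 4-level x64 paging. The function name is hypothetical and 48-bit addressing is assumed. Since each table entry is 8 bytes wide, each index multiplied by 8 should match the low 12 bits of the corresponding "PXE at", "PPE at", "PDE at" and "PTE at" addresses shown above.

```python
def page_table_indices(va: int) -> dict:
    """Split a 48-bit x64 virtual address into its four 9-bit table indices."""
    return {
        "pxe": (va >> 39) & 0x1FF,  # index into the top-level (PML4) table
        "ppe": (va >> 30) & 0x1FF,  # index into the page directory pointer table
        "pde": (va >> 21) & 0x1FF,  # index into the page directory
        "pte": (va >> 12) & 0x1FF,  # index into the page table
        "offset": va & 0xFFF,       # byte offset within a 4 KB page
    }


# Address taken from Arg1 of the first bugcheck in this tutorial.
indices = page_table_indices(0xFFFFCC0150939000)
for name, value in indices.items():
    print(f"{name}: {value:#x}")
```

For example, the PTE index of 0x139 multiplied by 8 gives 0x9C8, which is the low 12 bits of the PTE address FFFFB16600A849C8 reported by the debugger.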
Rich (BB code):
7: kd> dd ffffcc0150939000 L2
ffffcc01`50939000 ???????? ????????
Notice the question marks? The debugger couldn't read the memory at this address, which is consistent with the pointer referring to a freed region of memory. This is the reason why the page fault could not be resolved. How do we find out who caused it? We can use the ln command with the address provided in the third parameter, which contains the address of the instruction which referenced the invalid memory address.
Rich (BB code):
7: kd> ln fffff80421d7fc6f
Browse module
Set bu breakpoint
Wait, what happened? Since the address belongs to a third-party driver which we do not have symbols for, the ln command isn't able to resolve the address for us. Nonetheless, we can dump the call stack instead using the kV command. You can use the other variants if you wish; I just prefer to use kV.
Rich (BB code):
7: kd> kV
# Child-SP RetAddr : Args to Child : Call Site
00 ffffdc09`b4f37de8 fffff804`06be3463 : 00000000`00000050 ffffcc01`50939000 00000000`00000002 ffffdc09`b4f38090 : nt!KeBugCheckEx
01 ffffdc09`b4f37df0 fffff804`06a730bf : 00000000`00000102 00000000`00000002 00000000`00000000 ffffcc01`50939000 : nt!MiSystemFault+0x1d6733
02 ffffdc09`b4f37ef0 fffff804`06bcf120 : fffff804`22044202 fffff804`2169fc90 00000000`00000000 fffff804`21fc7b57 : nt!MmAccessFault+0x34f
03 ffffdc09`b4f38090 fffff804`21d7fc6f : ffff858c`1a425b78 00000000`0007c3c3 ffffc9e8`956bf85f ffff858c`1a418000 : nt!KiPageFault+0x360 (TrapFrame @ ffffdc09`b4f38090)
04 ffffdc09`b4f38220 ffff858c`1a425b78 : 00000000`0007c3c3 ffffc9e8`956bf85f ffff858c`1a418000 00000000`00000007 : nvlddmkm+0x6efc6f
05 ffffdc09`b4f38228 00000000`0007c3c3 : ffffc9e8`956bf85f ffff858c`1a418000 00000000`00000007 fffff804`2204429d : 0xffff858c`1a425b78
06 ffffdc09`b4f38230 ffffc9e8`956bf85f : ffff858c`1a418000 00000000`00000007 fffff804`2204429d 00000000`00000003 : 0x7c3c3
07 ffffdc09`b4f38238 ffff858c`1a418000 : 00000000`00000007 fffff804`2204429d 00000000`00000003 00000000`00000000 : 0xffffc9e8`956bf85f
08 ffffdc09`b4f38240 00000000`00000007 : fffff804`2204429d 00000000`00000003 00000000`00000000 00000000`00000003 : 0xffff858c`1a418000
09 ffffdc09`b4f38248 fffff804`2204429d : 00000000`00000003 00000000`00000000 00000000`00000003 00000000`0000000f : 0x7
0a ffffdc09`b4f38250 00000000`00000003 : 00000000`00000000 00000000`00000003 00000000`0000000f 00000000`00000000 : nvlddmkm+0x9b429d
0b ffffdc09`b4f38258 00000000`00000000 : 00000000`00000003 00000000`0000000f 00000000`00000000 00000000`00000001 : 0x3
We can see two third-party driver calls on the stack along with the page fault. How do we determine which one caused the page fault? As you may have noticed, there is a trap frame stored on the stack. Since page faults are technically exceptions, the operating system will automatically generate a trap frame for us in order to transfer control to the exception handler. If the fault were to be handled, then control would be transferred back to the instruction which caused the page fault.
Let's dump the trap frame using the .trap command.
Rich (BB code):
7: kd> .trap ffffdc09`b4f38090
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffcc0150938000 rbx=0000000000000000 rcx=ffff858c1a425b78
rdx=000000000001f3fd rsi=0000000000000000 rdi=0000000000000000
rip=fffff80421d7fc6f rsp=ffffdc09b4f38220 rbp=0000000000000007
r8=0000000000000400 r9=000000000001e033 r10=0000000000000000
r11=ffffdc09b4f38240 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
nvlddmkm+0x6efc6f:
fffff804`21d7fc6f 42891480 mov dword ptr [rax+r8*4],edx ds:ffffcc01`50939000=????????
Please ignore the "The trap frame does not contain all registers" note for now; we'll explore it together in a moment. For now, let's focus on two key pieces of information available in the trap frame: the instruction address (the rip value, which is repeated at the start of the disassembly line) and the address being referenced (the value after ds: at the end of that line). The address being referenced is computed from several registers through pointer arithmetic, in this case rax + r8*4.
Notice how the instruction address matches the address shown in the third parameter? We've found the instruction which caused the page fault. If you look closely, you can see that it sits at nvlddmkm+0x6efc6f, i.e. at offset 0x6efc6f within the nvlddmkm module; without symbols, the debugger can't resolve it to a function name. In this case, the issue appears to be with the graphics card driver.
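To make the pointer arithmetic concrete, here is a minimal sketch which recomputes the referenced address from the rax and r8 values in the trap frame above. The helper name is hypothetical; it simply evaluates the base + index*scale + displacement form used by x64 memory operands.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & MASK64


# Register values copied from the .trap output for "mov dword ptr [rax+r8*4],edx".
rax = 0xFFFFCC0150938000
r8 = 0x400
address = effective_address(rax, r8, 4)
print(f"{address:016x}")
```

The result is ffffcc0150939000, which is the address shown after ds: and in the first parameter of the bugcheck.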
Rich (BB code):
7: kd> lmvm nvlddmkm
Browse full module list
start end module name
fffff804`21690000 fffff804`22c49000 nvlddmkm T (no symbols)
Loaded symbol image file: nvlddmkm.sys
Image path: \SystemRoot\System32\DriverStore\FileRepository\nvcvi.inf_amd64_11bdd2121036771e\nvlddmkm.sys
Image name: nvlddmkm.sys
Browse all global symbols functions data
Timestamp: Thu Sep 5 20:44:22 2019 (5D716596)
CheckSum: 0155B6AC
ImageSize: 015B9000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Information from resource tables:
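As a quick sanity check, the sketch below confirms that the faulting instruction address falls inside the module range reported by lmvm and recovers the offset shown on the call stack. The helper name is hypothetical; the addresses are copied from the output above.

```python
def module_offset(address: int, start: int, end: int) -> int:
    """Return the offset of an address within a module, or raise if it lies outside."""
    if not (start <= address < end):
        raise ValueError("address is outside the module")
    return address - start


# Range reported by lmvm for nvlddmkm.
start = 0xFFFFF80421690000
end = 0xFFFFF80422C49000

# Arg3 of the bugcheck, i.e. the faulting instruction address.
offset = module_offset(0xFFFFF80421D7FC6F, start, end)
print(f"nvlddmkm+{offset:#x}")
print(f"image size: {end - start:#x}")
```

The offset matches the nvlddmkm+0x6efc6f frame on the stack, and the size of the range matches the ImageSize field.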
Next, let's look at another variation of this bugcheck and this time we'll look at a large page and the CR2 register.
Rich (BB code):
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: fffff8031d80c8a3, memory referenced.
Arg2: 0000000000000003, value 0 = read operation, 1 = write operation.
Arg3: fffff8031d847016, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000002, (reserved)
As we can see, the first parameter again contains the address being referenced. We can use the ln command with the third parameter to find the faulting instruction; however, let's dump the trap frame to begin with, as there is an important point which I would like to make. This is also beneficial for a wide range of different bugchecks.
Rich (BB code):
8: kd> .trap 0xfffff10a73e28670
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000004 rbx=0000000000000000 rcx=ffffce8581549280
rdx=0000021a200dedd0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8031d847016 rsp=fffff10a73e28800 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000008
r11=fffff10a73e28848 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
nt!IoSetIoCompletionEx+0x46:
fffff803`1d847016 48894730 mov qword ptr [rdi+30h],rax ds:00000000`00000030=????????????????
Wait, what has happened? The address being referenced by the instruction mentioned in the third parameter is vastly different to the address stated in the first parameter. Why is this the case? The reason is due to how trap frames are handled on x64 Windows. When the trap handler builds a trap frame, only the volatile registers are guaranteed to be saved, and thus you'll see registers which are zeroed out or contain garbage values which can't be trusted.
The volatile registers are the following: RAX; RCX; RDX; R8 through to R11; XMM0 to XMM5. Since the rdi register is a non-volatile register, its value hasn't been saved.
However, we can still find the address which was referenced by dumping the CR2 register.
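The rule above can be sketched as a small check which flags any general-purpose register in a memory operand that is not volatile under the x64 calling convention. The function name and regular expression are my own, and the check is deliberately conservative: it reports every register outside the volatile list, including rsp and rbp, even though some of those may be recorded in practice.

```python
import re

# Volatile general-purpose registers in the x64 calling convention.
VOLATILE_GPRS = {"rax", "rcx", "rdx", "r8", "r9", "r10", "r11"}


def untrusted_registers(operand: str) -> set:
    """Return the 64-bit registers in a memory operand that are not volatile."""
    # Matches names such as rax, rdi, r8 or r10 as whole words.
    regs = set(re.findall(r"\br(?:[a-z]{2}|\d{1,2})\b", operand.lower()))
    return regs - VOLATILE_GPRS


# Operands taken from the two trap frames in this tutorial.
print(untrusted_registers("dword ptr [rax+r8*4]"))  # nothing to distrust here
print(untrusted_registers("qword ptr [rdi+30h]"))   # rdi is non-volatile
```

This reflects why the ds: address could be trusted in the first dump but not in the second, where the operand depends on rdi.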
Rich (BB code):
8: kd> r @cr2
Last set context:
cr2=fffff8031d80c8a3
As we can see, it matches the address shown in the first parameter of the bugcheck. This is because the CR2 register, known as the page fault linear address, is updated by the processor with the address which caused the page fault. Therefore, if you ever need to find the memory address which generated the page fault, you can dump the CR2 register.
Let's examine the page table entry associated with the address and see why we couldn't handle the page fault.
Rich (BB code):
8: kd> !pte fffff8031d80c8a3
VA fffff8031d80c8a3
PXE at FFFFFC7E3F1F8F80 PPE at FFFFFC7E3F1F0060 PDE at FFFFFC7E3E00C760 PTE at FFFFFC7C018EC060
contains 000000000560A063 contains 0000000004C0B063 contains 0A000000032001A1 contains 0000000000000000
pfn 560a ---DA--KWEV pfn 4c0b ---DA--KWEV pfn 3200 -GL-A--KREV LARGE PAGE pfn 320c
That's odd; it looks like it's mapped to a valid region of physical memory. Why did we crash then? The reason is that large pages are always non-pageable, and as the title of the bugcheck mentions, a page fault has occurred in a non-paged region. It may also be relevant that the PDE flags read KREV rather than KWEV, which marks the large page as read-only, while the faulting instruction is a write.
There is another possible case which you may see: non-canonical addresses.
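To see where the flags in the !pte output come from, here is a minimal sketch which decodes the architectural bits of a raw page-table entry. The bit positions follow the standard x64 paging layout; the function name is hypothetical, and the mapping to the letters printed by !pte is my own reading of the output rather than something documented here.

```python
def decode_entry(value: int) -> dict:
    """Decode the architectural bits of a raw x64 page-table entry."""
    return {
        "valid": bool(value & (1 << 0)),
        "writable": bool(value & (1 << 1)),
        "user": bool(value & (1 << 2)),
        "accessed": bool(value & (1 << 5)),
        "dirty": bool(value & (1 << 6)),
        "large_page": bool(value & (1 << 7)),  # only meaningful in a PDE or PPE
        "global": bool(value & (1 << 8)),
        "no_execute": bool(value & (1 << 63)),
        "pfn": (value >> 12) & ((1 << 40) - 1),  # bits 12..51
    }


# PDE value and virtual address copied from the !pte output above.
pde = decode_entry(0x0A000000032001A1)
va = 0xFFFFF8031D80C8A3

# For a 2 MB large page, the 4 KB frame is the base frame plus the VA's page-table index.
frame = pde["pfn"] + ((va >> 12) & 0x1FF)
print(pde)
print(f"frame: {frame:#x}")
```

The decoded entry is valid, large, global and not writable, with a base frame of 0x3200, and the computed frame of 0x320c matches the "LARGE PAGE pfn 320c" shown by the debugger.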
Rich (BB code):
1: kd> !pte fffdb2073c832cc0
VA fffdb2073c832cc0
PXE at FFFFF5FAFD7EBB20 PPE at FFFFF5FAFD7640E0 PDE at FFFFF5FAEC81CF20 PTE at FFFFF5D9039E4190
Unable to get PXE FFFFF5FAFD7EBB20
WARNING: noncanonical VA, accesses will fault !
On x64 systems, there is a large region of the address space which is unavailable to the operating system, and referencing any address within this range will fault, as the !pte output above warns. Since the address is non-canonical, the fault can't be resolved by the page fault handler and therefore leads to a crash. In a canonical address, bits 47 to 63 are either all set or all cleared.
Here are some examples to illustrate this point.
Canonical Address:
Rich (BB code):
0: kd> .formats ffffe000502e37c0
Evaluate expression:
Hex: ffffe000`502e37c0
Decimal: -35183026882624
Octal: 1777777000012013433700
Binary: 11111111 11111111 11100000 00000000 01010000 00101110 00110111 11000000
Chars: ....P.7.
Time: ***** Invalid FILETIME
Float: low 1.16916e+010 high -1.#QNAN
Double: -1.#QNAN
Non-Canonical Address:
Rich (BB code):
0: kd> .formats fffdb2073c832cc0
Evaluate expression:
Hex: fffdb207`3c832cc0
Decimal: -648680780387136
Octal: 1777755440347440626300
Binary: 11111111 11111101 10110010 00000111 00111100 10000011 00101100 11000000
Chars: ....<.,.
Time: ***** Invalid FILETIME
Float: low 0.0160125 high -1.#QNAN
Double: -1.#QNAN
Notice how bit 49 is cleared, i.e. 0, while the rest of bits 47 to 63 are set? Because those bits don't all match, the address is non-canonical.
As a closing point, you may have noticed that the second parameter refers to the operation being performed on the address. The value of this parameter can vary between architectures and Windows builds, so please ensure that you consult the documentation beforehand. For convenience, please refer to the following table:
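The canonical rule can be checked mechanically. The sketch below assumes 48-bit virtual addressing, as used throughout this tutorial, and the function name is hypothetical; it tests whether bits 47 to 63 are all set or all cleared.

```python
def is_canonical(va: int) -> bool:
    """True when bits 47..63 of a 64-bit address are all 0 or all 1."""
    top = va >> 47  # the 17 most significant bits
    return top == 0 or top == 0x1FFFF


# The two addresses passed to .formats above.
print(is_canonical(0xFFFFE000502E37C0))
print(is_canonical(0xFFFDB2073C832CC0))
```

The first address passes the check and the second fails it, which agrees with the noncanonical VA warning printed by !pte.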
Build Number | Architecture | Parameter Value |
1507+ | x64 | 0 = Read; 2 = Write; 10 = Execute |
1507+ | x86 | 0 = Read; 2 = Write; 10 = Execute |
1507+ | ARM | 0 = Read; 1 = Write; 8 = Execute |
1507 and before | x86 & x64 | 0 = Read; 1 = Write |
That concludes this tutorial. If you have any questions, please let me know.