You know when you have something you really want to write a post about, but you don't have a crash dump for it? Debugger problems. Fortunately enough for me, I searched Google for a live crash dump link and found one. Happy days! Thanks to this person from four or so years ago for their crash dump :~)
Let's take a look at our basic bug check information in this case.
Code:
SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff80002cc272d, Address of the instruction which caused the bugcheck
Arg3: fffff8800a555070, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.
As with most 0x3B's, our exception was specifically an access violation.
Code:
2: kd> ln fffff80002cc272d
(fffff800`02cc2590) nt!KiDpcInterrupt+0x19d | (fffff800`02cc2780) nt!KiDpcInterruptBypass
The violation in this case specifically occurred in nt!KiDpcInterrupt+0x19d.
Code:
2: kd> .cxr 0xfffff8800a555070;r
rax=0000000000000001 rbx=fffffa8006b24b60 rcx=0000000000000000
rdx=000001af00000000 rsi=0000000000000000 rdi=0000000000000003
rip=fffff80002cc272d rsp=fffff8800a555a50 rbp=fffff8800a555ad0
r8=0000000000000000 r9=0000000000000001 r10=0000000000000000
r11=0000000000000064 r12=0000000000000000 r13=0000000000000000
r14=0000000000000064 r15=000007ff00042020
iopl=0 nv up di pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010046
nt!KiDpcInterrupt+0x19d:
fffff800`02cc272d 0fae1f          stmxcsr dword ptr [rdi] ds:002b:00000000`00000003=????????
The faulting instruction tried to store the contents of the MXCSR register to the address held in rdi (00000000`00000003). We can certainly imagine 00000000`00000003 is not a valid address to write to, so therein lies our problem.
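If it helps to picture what that one instruction is doing, here's a tiny user-mode sketch using the SSE intrinsic equivalent. The variable names are mine for illustration, not anything pulled from the dump.

Code:
/* stmxcsr simply writes the 32-bit MXCSR register to memory. */
#include <stdint.h>
#include <stdio.h>
#include <xmmintrin.h>   /* _mm_getcsr */

int main(void)
{
    uint32_t save_area;                    /* somewhere valid to spill MXCSR */

    save_area = _mm_getcsr();              /* roughly: stmxcsr dword ptr [&save_area] */
    printf("MXCSR = %08x\n", save_area);

    /* In this dump rdi held 00000000`00000003, so the CPU effectively tried
       *(uint32_t *)0x3 = MXCSR; - writing to address 3 is what blew up. */
    return 0;
}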
So, why are we hitting a page fault within a DPC interrupt? Good question! Let's run the following:
Code:
!chkimg -lo 50 -db -v !nt
!chkimg compares an image with its original copy. More specifically, it does this by comparing the image of an executable file in memory to the copy of the file that resides on a symbol store.
The -lo 50 parameter limits the number of output lines to 50. Not too much and not too little.
The -db parameter displays mismatched areas in a format that is similar to the db debugger command. Therefore, each display line shows the address of the first byte in the line, followed by up to 16 hexadecimal byte values. The byte values are immediately followed by the corresponding ASCII values. All nonprintable characters, such as carriage returns and line feeds, are displayed as periods (.). The mismatched bytes are marked by an asterisk (*).
The -v parameter displays extra verbose information.
!nt is the module, which is of course the kernel.
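Before looking at the real output, here's roughly what that comparison boils down to, as a simplified C sketch. This is just the concept, nothing like the actual !chkimg implementation.

Code:
#include <stdio.h>
#include <stddef.h>

/* Walk the in-memory copy of an image and the reference copy from the
   symbol store byte by byte, reporting any mismatches. A toy model only. */
static size_t compare_image(const unsigned char *in_memory,
                            const unsigned char *reference,
                            size_t size)
{
    size_t errors = 0;
    for (size_t i = 0; i < size; i++) {
        if (in_memory[i] != reference[i]) {
            printf("mismatch at offset %zx: %02x (memory) vs %02x (file)\n",
                   i, in_memory[i], reference[i]);
            errors++;
        }
    }
    return errors;
}

int main(void)
{
    unsigned char reference[] = { 0x19, 0xb9, 0x01, 0x00, 0x44, 0x22 };
    unsigned char in_memory[] = { 0x19, 0xb9, 0x01, 0x00, 0x45, 0x22 };  /* one byte differs */

    printf("Number of errors: %zu\n",
           compare_image(in_memory, reference, sizeof reference));
    return 0;
}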
Code:
2: kd> !chkimg -lo 50 -db -v !nt
Searching for module with expression: !nt
Will apply relocation fixups to file used for comparison
Will ignore NOP/LOCK errors
Will ignore patched instructions
Image specific ignores will be applied
Comparison image path: c:\localsymbols\ntkrnlmp.exe\4A5BC6005dd000\ntkrnlmp.exe
No range specified
As I noted above, it's comparing the kernel image from the crash dump against the copy of ntkrnlmp.exe sitting in my local symbol cache. If it wasn't available locally, it'd be pulled down from the symbol server.
Code:
Scanning section:    .text
Size: 1685025
Range to scan: fffff80002c06000-fffff80002da1621
Total bytes compared: 1685025(100%)
Number of errors: 40
So we have 40 errors in the scanned .text section of the kernel.
Code:
fffff80002cc2680 19 b9 01 00 00 00 44 *44 22 c1 fb e8 80 17 f9 *48 ......DD"......H
fffff80002cc2690 fa b9 00 00 00 00 44 *45 22 c1 65 48 8b 0c 25 *34 ......DE".eH..%4
fffff80002cc26a0 01 00 00 f7 01 00 00 *25 40 74 25 f6 41 02 02 *85 .......%@t%.A...
fffff80002cc26b0 0e e8 8a 68 05 00 65 *8b 8b 0c 25 88 01 00 00 *48 ...h..e...%....H
...
fffff80002cc2700 8b 55 d8 4c 8b 4d d0 *ba 8b 45 c8 48 8b 55 c0 *00 .U.L.M...E.H.U..
fffff80002cc2710 8b 4d b8 48 8b 45 b0 *07 8b e5 48 8b ad d8 00 *89 .M.H.E....H.....
fffff80002cc2720 00 48 81 c4 e8 00 00 *ff 0f 01 f8 48 cf 0f ae *1f .H.........H....
fffff80002cc2730 ac 0f 28 45 f0 0f 28 *4c 00 0f 28 55 10 0f 28 *40 ..(E..(L..(U..(@
...
fffff80002cc2880 24 10 48 89 74 24 18 *38 89 64 24 20 48 8b f9 *00 $.H.t$.8.d$ H...
fffff80002cc2890 8b d1 49 8b f0 4c 8b *15 49 83 e9 11 48 83 ea *01 ..I..L..I...H...
fffff80002cc28a0 4c 8b da 48 8b ef bb *8b 00 00 00 49 3b f1 0f *48 L..H.......I;..H
fffff80002cc28b0 c1 05 00 00 49 3b fb *48 83 b8 05 00 00 8a 06 *e8 ....I;.H........
...
fffff80002cc2900 a8 20 0f 85 e6 03 00 *90 8a 56 06 88 57 05 a8 *44 . .......V..W..D
fffff80002cc2910 0f 85 69 04 00 00 8a *41 07 88 57 06 a8 80 0f *ec ..i....A..W.....
fffff80002cc2920 db 04 00 00 8a 56 08 *f9 57 07 48 83 c6 09 48 *ba .....V..W.H...H.
fffff80002cc2930 c7 08 e9 74 ff ff ff *24 3b fd 0f 87 b8 00 00 *05 ...t...$;.......
...
fffff80002cc2980 f3 a4 49 8b f4 48 83 *8b 01 a8 02 0f 85 81 00 *83 ..I..H..........
fffff80002cc2990 00 8a 56 02 88 57 01 *49 04 0f 85 40 01 00 00 *f2 ..V..W.I...@....
fffff80002cc29a0 56 03 88 57 02 a8 08 *3a 85 f1 01 00 00 8a 56 *48 V..W...:......VH
fffff80002cc29b0 88 57 03 a8 10 0f 85 *00 02 00 00 8a 56 05 88 *8b .W..........V...
Assuming I am correct (which I hopefully am), the 8th and 16th byte of every 16-byte row is bad, as if something is striding through the data at a fixed interval. This is known as a stride corruption pattern. Notice as well that our faulting address fffff800`02cc272d falls right inside this range, and the 1f at fffff800`02cc272f is one of the flagged bytes, so the stmxcsr dword ptr [rdi] we faulted on doesn't appear to match the original kernel code at all.
Code:
MEMORY_CORRUPTOR: STRIDE
This pattern is characteristic of an address/data line problem somewhere on the path into or out of RAM. Despite the evidence so far, we can't jump straight to a faulty-RAM conclusion as much as we'd like to. It's tempting to assume that whatever selects these lines is faulty, so every byte travelling across them comes back corrupted at that fixed interval, which would indeed mean bad RAM. In debugging, however, we must always check everything else before doing something as drastic as outright replacing the RAM, even though running a Memtest pass would be a perfectly defensible next step as well.
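To make the "stride" idea a bit more concrete, here's a toy C sketch that pretends one line on the path to the DIMM is bad, so every byte whose offset lands on it (every 8th byte here, to match the dump) comes back mangled. The fault model is entirely made up for illustration.

Code:
#include <stdio.h>

int main(void)
{
    unsigned char good[32], bad[32];

    for (int i = 0; i < 32; i++) {
        good[i] = (unsigned char)i;
        bad[i]  = good[i];
        if (i % 8 == 7)          /* pretend these offsets ride the faulty line... */
            bad[i] ^= 0x40;      /* ...so the byte comes back wrong */
    }

    /* print db-style: mismatches flagged with '*', 16 bytes per row */
    for (int i = 0; i < 32; i++) {
        printf("%c%02x", bad[i] != good[i] ? '*' : ' ', bad[i]);
        if (i % 16 == 15)
            putchar('\n');
    }
    return 0;
}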
A similar memory corruption pattern is a misaligned IP (instruction pointer). I won't go into that in this post, but it's another one where you need to be 100% sure you're not looking at a simple buffer overflow rather than faulty RAM. Do note that WinDbg is not smarter than we are and will happily assume a misaligned IP is a hardware problem.
Enough blabbering, onto what I am trying to get to...
Code:
PROCESS_NAME: MOM.exe
MOM.exe, what are you doing here? By the way, MOM.exe is AMD/ATI's Catalyst Control Center (CCC) monitoring software. It's not malware, or an actual mother.
</badjoke>
You don't see this process involved in a crash too often, so I did some digging in the loaded modules list to see whether any 3rd party software might be causing conflicts.
Code:
2: kd> lmvm rtcore64
start end module name
fffff880`0859b000 fffff880`085a1000 RTCore64 (deferred)
Image path: \??\C:\Program Files (x86)\MSI Afterburner\RTCore64.sys
Image name: RTCore64.sys
Timestamp: Wed May 25 02:39:12 2005
Oh my... MSI AB driver from 2005 on an x64 Windows 7 box! The horror.
So, today's lesson summed up is this: if you're going to actually use MSI Afterburner (the horror), be sure to keep it up to date so you don't upset mother </badjoke> and make her crash by causing stride corruption.
Thanks for reading!