[Question] Debugging an NTFS_FILE_SYSTEM dump

Patrick

Okay, so for 124's we can !errrec and for 9F's we can !irp, etc. On 24's, I've noticed it mentions the possibility to do a .cxr and then kb to obtain more info. First of all, what the hell does this mean? Second of all, how do I go about figuring this out?

For example, here's a dump I am talking about - View attachment 061713-13884-01.rar

Now, it says this:

NTFS_FILE_SYSTEM (24)
If you see NtfsExceptionFilter on the stack then the 2nd and 3rd
parameters are the exception record and context record. Do a .cxr
on the 3rd parameter and then kb to obtain a more informative stack
trace.

Okay, now (and this may seem like a stupid question, but just making sure), the stack is everything below STACK_TEXT:.. correct? And you read it from BOTTOM to TOP?

Well, the stack text in the attached dump is:

fffff880`03c5a120 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!MiObtainSystemCacheView+0x22e

So since there is no NtfsExceptionFilter in the stack, I would assume this is not a dump in which we can perform the following commands noted in the dump? If so, regardless, can anyone show me in a dump with an exception filter, what you would do and what would be the outcome?

Thanks.
 
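The .cxr command displays the registers saved in a context record, I believe (the CPU state captured when the exception happened), and switches the debugger to that register context. You can use the !thread extension with the dps command to obtain a raw stack for the entire thread.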

You are correct about reading the stack from bottom to top.

.cxr Remarks from Windows Debugger -

The information from a context record can be used to assist in debugging a system halt where an unhandled exception has occurred and an exact stack trace is not available. The .cxr command displays the important registers for the specified context record.
This command also instructs the debugger to use the specified context record as the register context. After this command is executed, the debugger will have access to the most important registers and the stack trace for this thread. This register context persists until you allow the target to execute or use another register context command (.thread, .ecxr, .trap, or .cxr again). In user mode, it will also be reset if you change the current process or thread. See Register Context for details.
The .cxr command is often used to debug bug check 0x1E. For more information and an example, see Bug Check 0x1E (KMODE_EXCEPTION_NOT_HANDLED).

I managed to find this from the raw stack:

Code:
3: kd> lmvm em018_64
start             end                 module name
fffffa80`03f5b000 fffffa80`03fbe000   em018_64 T (no symbols)           
    Loaded symbol image file: em018_64.dat
    Image path: C:\Program Files\ESET\ESET Smart Security\em018_64.dat
    Image name: em018_64.dat
    Timestamp:        Tue Jun 04 15:24:38 2013 (51ADF8A6)
    CheckSum:         00000000
    ImageSize:        00063000
Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

It seems to be part of ESET Smart Security.
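
As a quick sanity check on that timestamp, the hex value in parentheses can be decoded by hand, since it is a count of seconds since 1970 (UTC). A minimal Python sketch; the function name is my own and not a debugger feature:

```python
from datetime import datetime, timezone

def decode_image_timestamp(hex_value):
    """Convert the hex timestamp printed by lmvm into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)

stamp = decode_image_timestamp("51ADF8A6")
print(stamp.isoformat())
```

This gives 4 June 2013 in UTC. The hour differs from the 15:24:38 in the lmvm output, presumably because the debugger prints the time in the local time zone of the machine it runs on.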
 
It's not really any different from any other typical bugcheck, in that an exception occurs, the context and exception records of the occurrence are saved (shown with .cxr and .exr, respectively), and then the bugcheck is triggered. The context record saves the registers at the moment of the exception, which is what lets the debugger rebuild the stack trace of the event that triggered it (the place where it went wrong). The stack trace !analyze -v starts you off with is just what happened in response to that event (the event, btw, being an illegal instruction in this case). So you were going in the right direction with the context record, but of course that's only the first step in the journey really.

As for this particular instance, in case you wish to know, I personally can't see anything problematic with the cmp instruction it faulted on. I would venture to guess that the CPU actually executed something other than what the code says, which would point to a CPU/motherboard/PSU issue.
 
Just posting again as I finally discovered a *24 in which I was able to do a .cxr on the 3rd parameter. The 2nd parameter was the exception record.

From running a .cxr on the 3rd parameter, we get:

Code:
2: kd> .cxr 0xfffff88003192f70
rax=ffff78a004d0a728 rbx=0000000000000001 rcx=fffff8a004d0a728
rdx=fffff8a004d0a5f0 rsi=fffff8a004d0a6f0 rdi=0000000000000000
rip=fffff800039345b5 rsp=fffff88003193950 rbp=0000000000000130
 r8=fffff8a004d0a600  r9=0000000000000000 r10=fffff8a004d0a5c0
r11=fffff8a004d0a6f0 r12=0000000000000705 r13=0000000000000000
r14=fffffa80041bccf8 r15=fffff8a004d0a958
iopl=0         nv up ei pl nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
nt!FsRtlTeardownPerStreamContexts+0x69:
fffff800`039345b5 48894808        mov     qword ptr [rax+8],rcx ds:002b:ffff78a0`04d0a730=????????????????

From running an .exr on the 2nd parameter (exception record) we get:

Code:
2: kd> .exr 0xfffff88003193718
ExceptionAddress: fffff800039345b5 (nt!FsRtlTeardownPerStreamContexts+0x0000000000000069)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff

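And from running a .cxr on the 2nd parameter (the exception record), we get the output below. The registers are garbage (rip is 0, cs is 0000) because that address holds an exception record, not a context record: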

Code:
2: kd> .cxr fffff88003193718
rax=0000000000000003 rbx=fffff8a00a92cd80 rcx=fffff800037ac4f1
rdx=fffff880031937e8 rsi=0000000000000008 rdi=00000000000007ff
rip=0000000000000000 rsp=fffff8a00b26a000 rbp=fffff80003676b8a
 r8=fffff8a00a92cdb0  r9=0000000000000030 r10=0000000000000100
r11=00001f80010007ff r12=ffff78a004d0a728 r13=fffff8a004d0a728
r14=fffff8a004d0a5f0 r15=fffff8a004d0a600
iopl=3 vip vif ov up di ng nz na pe nc
cs=0000  ss=0113  ds=0000  es=0000  fs=0000  gs=8692             efl=fffff880
0000:0000 ??              ???

In the dump, it mentions ~
Do a .cxr
on the 3rd parameter and then kb to obtain a more informative stack
trace.

Okay, so let's do that:

Code:
2: kd> .cxr fffff88003192f70
rax=ffff78a004d0a728 rbx=0000000000000001 rcx=fffff8a004d0a728
rdx=fffff8a004d0a5f0 rsi=fffff8a004d0a6f0 rdi=0000000000000000
rip=fffff800039345b5 rsp=fffff88003193950 rbp=0000000000000130
 r8=fffff8a004d0a600  r9=0000000000000000 r10=fffff8a004d0a5c0
r11=fffff8a004d0a6f0 r12=0000000000000705 r13=0000000000000000
r14=fffffa80041bccf8 r15=fffff8a004d0a958
iopl=0         nv up ei pl nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
nt!FsRtlTeardownPerStreamContexts+0x69:
fffff800`039345b5 48894808        mov     qword ptr [rax+8],rcx ds:002b:ffff78a0`04d0a730=????????????????

--------------------------------------------------------------------------------

2: kd> kb
  *** Stack trace for last set context - .thread/.cxr resets it
RetAddr           : Args to Child                                                           : Call Site
fffff880`012e5b8c : fffff8a0`04d0a6f0 fffff880`03193a28 fffff880`03193a28 00000000`00000706 : nt!FsRtlTeardownPerStreamContexts+0x69
fffff880`012eabb1 : 00000000`00000000 00000000`00000000 fffff800`0381e200 00000000`00000001 : Ntfs!NtfsDeleteScb+0x108
fffff880`01264620 : fffff8a0`04d0a5f0 fffff8a0`04d0a6f0 fffff800`0381e200 fffff880`03193b52 : Ntfs!NtfsRemoveScb+0x61
fffff880`012e862c : fffff8a0`04d0a5c0 fffff800`0381e280 fffff880`03193b52 fffffa80`04543460 : Ntfs!NtfsPrepareFcbForRemoval+0x50
fffff880`0126aab2 : fffffa80`04543460 fffffa80`04543460 fffff8a0`04d0a5c0 00000000`00000000 : Ntfs!NtfsTeardownStructures+0xdc
fffff880`012f7f93 : fffffa80`04543460 fffff800`0381e280 fffff8a0`04d0a5c0 00000000`00000009 : Ntfs!NtfsDecrementCloseCounts+0xa2
fffff880`012e732b : fffffa80`04543460 fffff8a0`04d0a6f0 fffff8a0`04d0a5c0 fffffa80`049ac180 : Ntfs!NtfsCommonClose+0x353
fffff800`03682251 : 00000000`00000000 fffff800`03970d00 fffff800`0381e201 fffff800`00000002 : Ntfs!NtfsFspClose+0x15f
fffff800`03916ede : 00000000`00000000 fffffa80`03a88040 00000000`00000080 fffffa80`039739e0 : nt!ExpWorkerThread+0x111
fffff800`03669906 : fffff880`02f63180 fffffa80`03a88040 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
00000000`00000000 : fffff880`03194000 fffff880`0318e000 fffff880`031939e0 00000000`00000000 : nt!KiStartSystemThread+0x16

Taking me back to my original question, where would I go from here? Did I do this correctly?
 
That call stack is now shown in the context of the saved context record. Run the .thread command to return to the thread's own context and then do a kb; the call stack should be slightly different. That's how I tell if I've done something right.

With the call stack, I usually look up what each routine does, and then try to work out what went wrong within the thread.
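
To make the bottom-to-top reading order concrete, here is a rough Python sketch that pulls the Call Site column out of kb output and lists it oldest call first. It only uses an excerpt of the kb output posted above, and the regular expression is my own assumption about the line layout, not something the debugger provides:

```python
import re

# Excerpt of the kb output posted above (first two and last two frames only).
KB_OUTPUT = """\
RetAddr           : Args to Child                                                           : Call Site
fffff880`012e5b8c : fffff8a0`04d0a6f0 fffff880`03193a28 fffff880`03193a28 00000000`00000706 : nt!FsRtlTeardownPerStreamContexts+0x69
fffff880`012eabb1 : 00000000`00000000 00000000`00000000 fffff800`0381e200 00000000`00000001 : Ntfs!NtfsDeleteScb+0x108
fffff800`03669906 : fffff880`02f63180 fffffa80`03a88040 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
00000000`00000000 : fffff880`03194000 fffff880`0318e000 fffff880`031939e0 00000000`00000000 : nt!KiStartSystemThread+0x16
"""

# A frame line starts with a 64-bit return address written as xxxxxxxx`xxxxxxxx,
# which is what separates real frames from the column header.
FRAME = re.compile(r"^[0-9a-f]{8}`[0-9a-f]{8} : .* : (?P<site>\S+)$")

def call_sites(kb_text):
    """Return the Call Site column in the order kb printed it (newest first)."""
    sites = []
    for line in kb_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            sites.append(match.group("site"))
    return sites

newest_first = call_sites(KB_OUTPUT)
oldest_first = list(reversed(newest_first))

for site in oldest_first:
    print(site)
```

Read this way, the thread starts in nt!KiStartSystemThread and the last thing it did before the exception was nt!FsRtlTeardownPerStreamContexts, called from Ntfs!NtfsDeleteScb.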
 
Thanks, here's what I got now:

Code:
2: kd> .thread
Implicit thread is now fffffa80`03a88040

2: kd> kb
RetAddr           : Args to Child                                                           : Call Site
fffff880`01260688 : 00000000`00000024 00000000`001904fb fffff880`03193718 fffff880`03192f70 : nt!KeBugCheckEx
fffff880`0134d9e4 : fffff880`012aa394 fffff880`03193be0 fffff880`03193be0 00000000`00000000 : Ntfs! ?? ::FNODOBFM::`string'+0x2899
fffff800`036a3cdc : fffffa80`044f4000 fffff800`037a9572 00000000`00000000 00000000`00000000 : Ntfs! ?? ::NNGAKEGL::`string'+0x5f94
fffff800`036a375d : fffff880`012aa388 fffff880`03193be0 00000000`00000000 fffff880`0125c000 : nt!_C_specific_handler+0x8c
fffff800`036a2535 : fffff880`012aa388 fffff880`031928a8 fffff880`03193718 fffff880`0125c000 : nt!RtlpExecuteHandlerForException+0xd
fffff800`036b34d1 : fffff880`03193718 fffff880`03192f70 fffff880`00000000 00000000`00000000 : nt!RtlDispatchException+0x415
fffff800`03678282 : fffff880`03193718 00000000`00000001 fffff880`031937c0 fffff8a0`04d0a6f0 : nt!KiDispatchException+0x135
fffff800`03676b8a : 00000000`00000008 00000000`000007ff fffff8a0`0a92cdb0 00000000`00000030 : nt!KiExceptionDispatch+0xc2
fffff800`039345b5 : fffff880`031939f0 fffff880`012e61e6 fffff8a0`0a3b9360 00000000`00000000 : nt!KiGeneralProtectionFault+0x10a
fffff880`012e5b8c : fffff8a0`04d0a6f0 fffff880`03193a28 fffff880`03193a28 00000000`00000706 : nt!FsRtlTeardownPerStreamContexts+0x69
fffff880`012eabb1 : 00000000`00000000 00000000`00000000 fffff800`0381e200 00000000`00000001 : Ntfs!NtfsDeleteScb+0x108
fffff880`01264620 : fffff8a0`04d0a5f0 fffff8a0`04d0a6f0 fffff800`0381e200 fffff880`03193b52 : Ntfs!NtfsRemoveScb+0x61
fffff880`012e862c : fffff8a0`04d0a5c0 fffff800`0381e280 fffff880`03193b52 fffffa80`04543460 : Ntfs!NtfsPrepareFcbForRemoval+0x50
fffff880`0126aab2 : fffffa80`04543460 fffffa80`04543460 fffff8a0`04d0a5c0 00000000`00000000 : Ntfs!NtfsTeardownStructures+0xdc
fffff880`012f7f93 : fffffa80`04543460 fffff800`0381e280 fffff8a0`04d0a5c0 00000000`00000009 : Ntfs!NtfsDecrementCloseCounts+0xa2
fffff880`012e732b : fffffa80`04543460 fffff8a0`04d0a6f0 fffff8a0`04d0a5c0 fffffa80`049ac180 : Ntfs!NtfsCommonClose+0x353
fffff800`03682251 : 00000000`00000000 fffff800`03970d00 fffff800`0381e201 fffff800`00000002 : Ntfs!NtfsFspClose+0x15f
fffff800`03916ede : 00000000`00000000 fffffa80`03a88040 00000000`00000080 fffffa80`039739e0 : nt!ExpWorkerThread+0x111
fffff800`03669906 : fffff880`02f63180 fffffa80`03a88040 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
00000000`00000000 : fffff880`03194000 fffff880`0318e000 fffff880`031939e0 00000000`00000000 : nt!KiStartSystemThread+0x16

At least we see the call stack is now different, however, I am not sure how to interpret what I'm seeing here.
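
One thing that can be read straight off the top frame: the arguments to nt!KeBugCheckEx are the bugcheck code followed by its parameters, and the last two match the addresses used with .exr and .cxr earlier. For bugcheck 0x24 the first parameter is documented as source file and line information (high 16 bits are a file identifier, low 16 bits the line number). A small Python sketch of that decoding, with the caveat that on x64 the "Args to Child" column is not always trustworthy and the helper name is made up for illustration:

```python
def parse_kb_args(frame_line):
    """Return the four 'Args to Child' values of one kb frame line as integers."""
    args_field = frame_line.split(" : ")[1]
    return [int(token.replace("`", ""), 16) for token in args_field.split()]

# Top frame of the kb output after running .thread (copied from the post above).
frame = ("fffff880`01260688 : 00000000`00000024 00000000`001904fb "
         "fffff880`03193718 fffff880`03192f70 : nt!KeBugCheckEx")

code, param1, exception_record, context_record = parse_kb_args(frame)

source_file_id = param1 >> 16    # high 16 bits: internal source file identifier
source_line = param1 & 0xFFFF    # low 16 bits: line number in that file

print(f"bugcheck code    : {code:#x}")
print(f"source file/line : id {source_file_id:#x}, line {source_line}")
print(f"exception record : .exr {exception_record:#x}")
print(f"context record   : .cxr {context_record:#x}")
```

The file identifier and line number only mean something to someone with the Ntfs source, so for us the useful parts are the two addresses.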
 
Code:
2: kd> .thread
Implicit thread is now fffffa80`03a88040

2: kd> kb
RetAddr           : Args to Child                                                           : Call Site
fffff880`01260688 : 00000000`00000024 00000000`001904fb fffff880`03193718 fffff880`03192f70 : nt!KeBugCheckEx
fffff880`0134d9e4 : fffff880`012aa394 fffff880`03193be0 fffff880`03193be0 00000000`00000000 : Ntfs! ?? ::FNODOBFM::`string'+0x2899
fffff800`036a3cdc : fffffa80`044f4000 fffff800`037a9572 00000000`00000000 00000000`00000000 : Ntfs! ?? ::NNGAKEGL::`string'+0x5f94
fffff800`036a375d : fffff880`012aa388 fffff880`03193be0 00000000`00000000 fffff880`0125c000 : nt!_C_specific_handler+0x8c
fffff800`036a2535 : fffff880`012aa388 fffff880`031928a8 fffff880`03193718 fffff880`0125c000 : nt!RtlpExecuteHandlerForException+0xd
fffff800`036b34d1 : fffff880`03193718 fffff880`03192f70 fffff880`00000000 00000000`00000000 : nt!RtlDispatchException+0x415
fffff800`03678282 : fffff880`03193718 00000000`00000001 fffff880`031937c0 fffff8a0`04d0a6f0 : nt!KiDispatchException+0x135
fffff800`03676b8a : 00000000`00000008 00000000`000007ff fffff8a0`0a92cdb0 00000000`00000030 : nt!KiExceptionDispatch+0xc2
fffff800`039345b5 : fffff880`031939f0 fffff880`012e61e6 fffff8a0`0a3b9360 00000000`00000000 : nt!KiGeneralProtectionFault+0x10a <-- Interrupt
fffff880`012e5b8c : fffff8a0`04d0a6f0 fffff880`03193a28 fffff880`03193a28 00000000`00000706 : nt!FsRtlTeardownPerStreamContexts+0x69 <-- Exception
fffff880`012eabb1 : 00000000`00000000 00000000`00000000 fffff800`0381e200 00000000`00000001 : Ntfs!NtfsDeleteScb+0x108
fffff880`01264620 : fffff8a0`04d0a5f0 fffff8a0`04d0a6f0 fffff800`0381e200 fffff880`03193b52 : Ntfs!NtfsRemoveScb+0x61
fffff880`012e862c : fffff8a0`04d0a5c0 fffff800`0381e280 fffff880`03193b52 fffffa80`04543460 : Ntfs!NtfsPrepareFcbForRemoval+0x50
fffff880`0126aab2 : fffffa80`04543460 fffffa80`04543460 fffff8a0`04d0a5c0 00000000`00000000 : Ntfs!NtfsTeardownStructures+0xdc
fffff880`012f7f93 : fffffa80`04543460 fffff800`0381e280 fffff8a0`04d0a5c0 00000000`00000009 : Ntfs!NtfsDecrementCloseCounts+0xa2
fffff880`012e732b : fffffa80`04543460 fffff8a0`04d0a6f0 fffff8a0`04d0a5c0 fffffa80`049ac180 : Ntfs!NtfsCommonClose+0x353
fffff800`03682251 : 00000000`00000000 fffff800`03970d00 fffff800`0381e201 fffff800`00000002 : Ntfs!NtfsFspClose+0x15f
fffff800`03916ede : 00000000`00000000 fffffa80`03a88040 00000000`00000080 fffffa80`039739e0 : nt!ExpWorkerThread+0x111
fffff800`03669906 : fffff880`02f63180 fffffa80`03a88040 fffff880`02f6dfc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
00000000`00000000 : fffff880`03194000 fffff880`0318e000 fffff880`031939e0 00000000`00000000 : nt!KiStartSystemThread+0x16

Ntfs!NtfsPrepareFcbForRemoval refers to the File Control Block (FCB) structure, therefore I'm guessing this structure is being removed by the filesystem. The FCB is what the file system uses to keep track of an open file. As far as I know, on Windows it is allocated and maintained by the file system driver in kernel memory, not in the address space of the program that opened the file.

Ntfs!NtfsRemoveScb and Ntfs!NtfsDeleteScb refer to the Stream Control Block (SCB), which I believe is the structure NTFS keeps for each open stream of a file; a stream holds the data of a file, and one file can have more than one stream. I'm guessing that the file stream was closed, and therefore the structure wasn't needed anymore.

Going back to the other information you posted earlier, an access violation occurred, so I'm wondering whether something referenced memory belonging to the file that had already been closed?

Code:
2: kd> .exr 0xfffff88003193718
ExceptionAddress: fffff800039345b5 (nt!FsRtlTeardownPerStreamContexts+0x0000000000000069)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff
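
One detail in the earlier .cxr output may be worth a closer look. The faulting instruction is mov qword ptr [rax+8],rcx, and rax (ffff78a004d0a728) looks almost identical to rcx (fffff8a004d0a728). An address that is not canonical on x64 raises a general protection fault, which would fit the nt!KiGeneralProtectionFault frame and the ffffffffffffffff address reported above instead of a real address. A small Python sketch to compare the two values, assuming 48-bit virtual addresses and assuming rax was meant to hold a pointer like the one in rcx (neither assumption is confirmed by the dump):

```python
def is_canonical(addr):
    """True if bits 63..47 are all equal, as x64 with 48-bit virtual addresses requires."""
    top = addr >> 47               # the 17 bits from bit 63 down to bit 47
    return top == 0 or top == 0x1FFFF

rax = 0xffff78a004d0a728   # base register of the faulting write
rcx = 0xfffff8a004d0a728   # value being written

diff = rax ^ rcx
flipped_bits = [bit for bit in range(64) if (diff >> bit) & 1]

print(f"rax canonical: {is_canonical(rax)}")
print(f"rcx canonical: {is_canonical(rcx)}")
print(f"bits that differ: {flipped_bits}")
```

The two values differ in exactly one bit (bit 47), and that one bit is enough to make rax non-canonical. A single flipped bit is only suggestive, but it is the sort of thing that points at memory or other hardware as much as at a driver.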

Personally, I would check the drivers with Driver Verifier, but I may be wrong, and would welcome any further information and advice :)

I hope this helps!


References:

File Control Block:

File control block - Wikipedia, the free encyclopedia

The FCB Structure (Windows Drivers)

Stream Control Block:

File Streams, Stream Contexts, and Per-Stream Contexts (Windows Drivers)
 
It did, very much so. Thank you.

I've had the user enable Verifier and there's nothing really of value. At this point in the analysis, I am assuming it's an issue with the hard drive itself, even though it passed WD diagnostics, sfc, chkdsk, etc. The user has also done a clean install.
 
Have you checked the data cables? Swapped the cables around?

I would check the RAM too; could the mappings between virtual and physical memory be corrupt? I'm trying to think of things which might have some relevance :huh:
 
Ah, good call! OP ran Memtest, but only for 1 hour. Advised him to re-run for ~8 passes and check physical connections.
 
That's good, and bad data cables can be a problem; I've seen it occur in a few threads over at SevenForums.
 
