DPC_WATCHDOG_VIOLATION (0x133)

Patrick

Sysnative Staff
Joined
Jun 7, 2012
Posts
4,618
Hello everyone! In this post, we're going to be discussing DPC_WATCHDOG_VIOLATION (0x133) bug check debugging. I'll go as in detail as I can regarding the different parameters, why the bug check itself occurs, why it came to be, etc, as my personal knowledge permits. As always, if you have any information to add that you see I may have missed, or a correction, please do not hesitate to tell me!

Disclaimer: This is not a 'live' debugging tutorial, as in, it's a case I worked on weeks ago, and I no longer have access to the kernel-dump. The information I have here is purely what I managed to save for reference.



Before we get into the actual debugging of the bug check itself, let's talk about why this bug check 'came to be', etc, so you have a little more than a basic understanding. I guess you could call it a little history lesson! This is something I believe I am going to try and do with all of my future debugging posts. Not only is it neat & fun for me to get to talk about it, but it's always great to learn not just how to solve the bugcheck, but why it came to be an actual bugcheck.

Starting with Windows 8 (Server 2012, etc) we had the famous (or infamous depending on your stance with Windows 8!) touch-optimized user interface (GUI) that was based off of Microsoft's Metro. We all know the good & bad that has been said about Metro, so let's not get into that. Regardless, that's not what we're really here to discuss anyway. What we can say regarding the Metro GUI is that is was developed primarily in mind to improve the tablet-based computing experience as Microsoft was competing with mobile OS' such as Android and iOS.

Now, with that said (we'll get back to it) let's move on to the Interrupt handler, or otherwise known as ISR (Interrupt Service Routine). In its simplest definition, this is a callback subroutine in microcontroller firmware, operating system or device driver whose execution is triggered by the reception of an interrupt. Interrupt handlers have a multitude of functions, which vary based on the reason the interrupt was generated and the speed at which the interrupt handler completes its task.

ISR runs at a very high level (15 levels on x64-based processor, and 32 levels on x86-based processor). Programs/Applications run at 0, DPCs run at 1, and interrupts will go ahead and fire at a higher level than that. Let's say for example that you have IRQ 4 and IRQ 5 waiting to be serviced. IRQ 5 holds a higher priority than IRQ 4 does, therefore it will shove its own function to the top of the stack and continue running. Afterwards, it will come back, unwind, and then IRQ 4 will continue having no idea that it was stopped entirely. With this said, interrupts will interrupt each other (where the word interrupt comes from).

ISRs will queue work, because hogging of the CPU is not good. Obviously the CPU needs to address all of the interrupts, work its assigned, etc, however, it needs to be 'fair'. This queued work is turned into what is known as a DPC (Deferred Procedure Call). This mechanism allows high-priority tasks to defer required but lower-priority tasks for later execution from the ISR. This permits device drivers and other low-level event consumers to perform the high-priority part of their processing quickly, and schedule non-critical additional processing for execution at a lower priority.

DPCs block all applications from running (or being serviced by the CPU) as they take the absolute highest priority in being serviced. Absolutely no applications whatsoever (not even explorer.exe) can run when a DPC is PENDING and waiting to be serviced. Now, let's use a neat example for fun! I would go as far to say that if you've used a computer in your life for an extended period of time, you've at least once had a system hang, but you were still able to move your mouse cursor. However, even though you were able to use your mouse cursor, NO applications worked. You weren't able to Ctrl+Shift+Esc to bring up Task Manager, you weren't able to do anything at all regarding any applications. I always laugh when I think of this, because I just imagine tons and tons of frustration and attempting to open Task Manager over and over again, or trying to open the Start Menu to Restart the system.

Anyway, the reason no applications work, but the mouse cursor still does and is not frozen with the applications, is because the mouse cursor is actually an interrupt that's firing a scheduled DPC. With this said, we still have the drawing of the mouse cursor. So, with that said, if you've ever wondered why your system froze but you still had a moving mouse cursor, now you know why. Some pretty fun food for thought, I think!

Now that we're (hopefully) smiling about mouse cursors still moving during DPC hangs, we can return back to why we discussed Metro as a GUI. Remember, as I said, Metro was developed primarily in mind to improve the tablet-based computing experience. Now, on a tablet-based device, you don't have a keyboard. Okay, we don't have a keyboard on a tablet, big deal. Well, because we don't have a keyboard, if a DPC hang occurs, we cannot force a system crash (and/or break-out) to call a bugcheck and ultimately safely bring down the system. You may be asking "Couldn't I just power down the tablet"? Yes, absolutely. However, if you power down the tablet, you risk data corruption.

-- If you're interested in reading about forcing a bugcheck with the keyboard, take a look at this article - Forcing a System Crash from the Keyboard (Windows Debuggers)

Due to the inability to force a bugcheck if a DPC hang occurs on a tablet-based device due to the lack of a keyboard, the wonderful DPC_WATCHDOG_VIOLATION bug check was created. If we didn't have this bug check, if a DPC hang occurred on your tablet-based device, you'd either inevitably wait for the device to lose battery and simply turn off because it'd hang forever, or the DPC(s) MIGHT (depending - and I use might and depending very loosely) be serviced and Windows will continue.

To my knowledge, the rules that apply to calling the bugcheck in a DPC hang-based scenario are as follows:

1. If one specific DPC has been running for ~10 seconds, the system will bugcheck.

2. If an accumulation of DPCs have been running for ~15 seconds, the system will bugcheck.

(Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds, however the actual timeout values on the system are set much higher).

-- I'd also like to go and make note that depending on your tablet device, there are sometimes alternative methods of performing an ACPI 'reset' by usually hitting two buttons simultaneously.



Fantastic! Now that we know all of the cool information above, let's proceed with the debugging. Remember, as with all bug checks, there is no same culprit. You will see 0x133's being caused by all sorts of things! With this said, just because you see what you see here in my debugging example, doesn't mean you'll see it in another! This is not a 'how to solve' tutorial, but more-so a look into how to generally debug this type of bug check.

Code:
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).

^^ Here we have the basic bug check information. First off, the DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways:

1. If a single DPC exceeds a specified number of ticks, the system will bugcheck with 0x133, and the first parameter of the bug check will be set to 0. If this is the case, the system's time limit for single DPC will be in the third parameter of the bug check, with the number of ticks taken by this DPC in the 2nd parameter of the bug check.

2. If the system exceeds a larger timeout of time spent cumulatively in all DPCs since the IRQL was raised to DPC level, the system will bugcheck with a 0x133, and the first parameter will be set to 1.

BugCheck 133, {0, 501, 500, 0}

^^ In this case, the first parameter = 0. We'd refer to #1. This specific DPC has run for 0x501 ticks, when the limit was 0x500.

In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. Rather than running a kv as we usually do, let's go ahead and run knL:

Code:
     0: kd> knL
     # Child-SP          RetAddr           Call Site
    00 fffff803`a2459408 fffff803`a044bf4b nt!KeBugCheckEx
    01 fffff803`a2459410 fffff803`a0310774 nt! ?? ::FNODOBFM::`string'+0x145a4
    02 fffff803`a2459490 fffff803`a0228eca nt!KeUpdateTime+0x2ec
    03 fffff803`a2459670 fffff803`a02c501e hal!HalpTimerClockInterrupt+0x86
    04 fffff803`a24596a0 fffff880`18564dd2 nt!KiInterruptDispatchLBControl+0x1ce
    05 fffff803`a2459830 fffff803`a02f51ea [COLOR=#ff0000]usb80236!CancelSendsTimerDpc+0xa6[/COLOR]
    06 fffff803`a2459870 fffff803`a02f3655 nt!KiProcessExpiredTimerList+0x22a
    07 fffff803`a24599a0 fffff803`a02f5668 nt!KiExpireTimerTable+0xa9
    08 fffff803`a2459a40 fffff803`a02f4a06 nt!KiTimerExpiration+0xc8
    09 fffff803`a2459af0 fffff803`a02f59ba nt!KiRetireDpcList+0x1f6
    0a fffff803`a2459c60 00000000`00000000 nt!KiIdleLoop+0x5a

k = Displays the call stack of the given thread.
n = Displays frame numbers.
L (important you capitalize it) = Hides source lines in the display.

This makes a nice, neat, and informative call stack for 0x133 debugging.

As we can see above in the call stack, the offending driver is usb80236.sys, which is the Remote NDIS USB Driver. It calls into KiInterruptDispatchLBControl.

Let’s view the driver’s unassembled DPC routine:
Code:
     0: kd> ub fffff880`18564dd2
    usb80236!CancelSendsTimerDpc+0x81:
    fffff880`18564dad 488bcd          mov     rcx,rbp
    fffff880`18564db0 c6465001        mov     byte ptr [rsi+50h],1
    fffff880`18564db4 ff1556230000    call    qword ptr [usb80236!_imp_[COLOR=#ff0000]KeReleaseSpinLockFromDpcLevel[/COLOR] (fffff880`18567110)]
    fffff880`18564dba 488bcb          mov     rcx,rbx
    fffff880`18564dbd ff15a5220000    call    qword ptr [usb80236!_imp_[COLOR=#006400]IoCancelIrp[/COLOR] (fffff880`18567068)]
    fffff880`18564dc3 488bcd          mov     rcx,rbp
    fffff880`18564dc6 ff15e4220000    call    qword ptr [usb80236!_imp_[COLOR=#4b0082]KeAcquireSpinLockAtDpcLevel[/COLOR] (fffff880`185670b0)]
    fffff880`18564dcc ff8708030000    inc     dword ptr [rdi+308h]

1. The driver started by calling the KeAcquireSpinLockAtDpcLevel routine which acquires a spin lock when the caller is already running at IRQL >= DISPATCH_LEVEL.

The caller should release the spin lock with KeReleaseSpinLockFromDpcLevel as quickly as possible.

2. The driver then called the IoCancelIrp routine which sets the cancel bit in a given IRP and calls the cancel routine for the IRP if there is one.

-- If the IRP has a cancel routine, IoCancelIrp sets the cancel bit and calls the cancel routine.

3. The driver then called the KeReleaseSpinLockFromDpcLevel routine (as mentioned above) which releases an executive spin lock without changing the IRQL.

Going a bit deeper...

Code:
     0: kd> u fffff880`18564dd2
    usb80236!CancelSendsTimerDpc+0xa6:
    fffff880`18564dd2 488b36          mov     rsi,qword ptr [rsi]
    fffff880`18564dd5 493bf6          cmp     rsi,r14
    fffff880`18564dd8 75a9            jne     [COLOR=#ff0000]usb80236!CancelSendsTimerDpc+0x57[/COLOR] (fffff880`18564d83)
    fffff880`18564dda 4c8d87b0020000  lea     r8,[rdi+2B0h]
    fffff880`18564de1 488d8f70020000  lea     rcx,[rdi+270h]
    fffff880`18564de8 48c7c2001f0afa  mov     rdx,0FFFFFFFFFA0A1F00h
    fffff880`18564def ff1553220000    call    qword ptr [usb80236!_imp_[COLOR=#006400]KeSetTimer[/COLOR] (fffff880`18567048)]
    fffff880`18564df5 eb12            jmp     [COLOR=#4b0082]usb80236!CancelSendsTimerDpc+0xdd[/COLOR] (fffff880`18564e09)

1. The driver called the CancelSendsTimerDpc routine. I do not know exactly what this routine does, however, it's certainly something in regards to a timer on and/or for a DPC. According to Harry (x BlueRobot), he believes that the driver may use a Custom DPC associated with a Timer object.

2. The driver then calls the KeSetTimer routine which sets the absolute or relative interval at which a timer object is to be set to a signaled state and, optionally, supplies a CustomTimerDpc routine to be executed when that interval expires.

3. The driver then calls the CancelSendsTimerDpc routine again. As far as I know, what should be going on here is the CustomTimerDpc routine should be called, but CancelSendsTimerDpc may be in a loop.

Overall, what seems to be occurring is the DPC may be looping itself by gathering a Spinlock at DPC Level, cancelling the Timer, and then finally releasing the Spinlock again. This is happening over and over again, therefore we have a loop.

If we run a !pcr to show us the queued DPCs for the processor:

-- Essentially, !pcr will display the current status of the Processor Control Region (PCR) on a specific processor.

Code:
0: kd> !pcr
    KPCR for Processor 0 at fffff803a056a000:
        Major 1 Minor 1
        NtTib.ExceptionList: fffff803a2452000
            NtTib.StackBase: fffff803a2453080
           NtTib.StackLimit: 0000000004ccee48
         NtTib.SubSystemTib: fffff803a056a000
              NtTib.Version: 00000000a056a180
          NtTib.UserPointer: fffff803a056a7f0
              NtTib.SelfTib: 000000007ef44000

                    SelfPcr: 0000000000000000
                       Prcb: fffff803a056a180
                       Irql: 0000000000000000
                        IRR: 0000000000000000
                        IDR: 0000000000000000
              InterruptMode: 0000000000000000
                        IDT: 0000000000000000
                        GDT: 0000000000000000
                        TSS: 0000000000000000

              CurrentThread: fffff803a05c4880
                 NextThread: fffffa80036e8040
                 IdleThread: fffff803a05c4880

                  DpcQueue:  0xfffffa80073595e0 0xfffff8800400c960 [Normal] [COLOR=#ff0000]dxgkrnl!DpiFdoDpcForIsr[/COLOR]
                             0xfffffa8007be9b68 0xfffff88001efb380 [Normal] [COLOR=#4b0082]ndis!ndisInterruptDpc[/COLOR]
                             0xfffff803a04fcfe0 0xfffff803a027c71c [Normal] nt!PpmCheckPeriodicStart
                             0xfffff803a0545d60 0xfffff803a031e45c [Normal] nt!KiBalanceSetManagerDeferredRoutine

1. ndis.sys:

(Network Driver Interface Specification driver) interrupt routine. The Network Driver Interface Specification (NDIS) is an application programming interface (API) for network interface cards (NICs). The NDIS forms the Logical Link Control (LLC) sublayer, which is the upper sublayer of the OSI data link layer (layer 2). Therefore, the NDIS acts as the interface between the Media Access Control (MAC) sublayer, which is the lower sublayer of the data link layer, and the network layer (layer 3).

The NDIS is a library of functions often referred to as a "wrapper" that hides the underlying complexity of the NIC hardware and serves as a standard interface for level 3 network protocol drivers and hardware level MAC drivers. Another common LLC is the Open Data-Link Interface (ODI).

2. dxgkrnl.sys - Direct X Kernel.

So, with all of this said, we know that something is causing usb80236.sys to call into a loop, and it may be anything that's working with and/or possibility interfering with Windows' networking, or Direct X. We'll need to do some detective work to determine what is causing this, as it's a system driver and is being faulted by something else. At this point, since we're at quite the wall, I recommend enabling Driver Verifier so we can see what's going on. The user enabled DV, and sure enough, they had an 0xC4 crash! Let's have a look below!



Code:
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000001011, Invariant MDL buffer contents for Read Irp were modified during dispatch or buffer backed by dummy pages.
Arg2: [COLOR=#ff0000]fffffa8006219060[/COLOR], Device object to which the Read IRP was issued.
Arg3: [COLOR=#4b0082]fffff980098a8c60[/COLOR], The address of the IRP.
Arg4: fffff8801a5b3000, System-Space Virtual Address for the buffer that the MDL describes.

Here we have the basic bug check info, with the second/third parameter highlighted as they will be useful later on. Let's go ahead and take a look at the call stack first:

Code:
0: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`1af96888 fffff803`07a31fcc : 00000000`000000c4 00000000`00001011 fffffa80`06219060 fffff980`098a8c60 : nt!KeBugCheckEx
fffff880`1af96890 fffff803`07ebfa51 : 00000000`00000000 fffff880`1af96a00 fffff980`098a8c60 fffff803`07cba000 : nt!MdlInvariantPreProcessing1+0x200
fffff880`1af96900 fffff803`07ebdc51 : 00000000`00000000 fffffa80`0dd877d0 fffffa80`06218060 00000000`00000001 : nt!IovpCallDriver1+0x1cd
fffff880`1af96a60 fffff803`07eb4cde : fffff980`098a8c60 00000000`00000002 fffff880`1af96c40 00000000`c0000089 : nt!VfBeforeCallDriver+0x141
fffff880`1af96a90 fffff880`012ccdab : fffff980`098a8c60 fffffa80`06219060 fffffa80`062191b0 fffffa80`0dd877d0 : nt!IovCallDriver+0x35e
fffff880`1af96ae0 fffff880`012ccb7e : fffff980`098a8c60 fffff880`1af96ba0 fffff980`098a8e50 fffffa80`06219060 : [COLOR=#ff0000]intmsd+0x3dab[/COLOR]
fffff880`1af96b20 fffff880`012d4402 : 00000000`00000000 fffff880`1af96c40 00000000`00000000 fffff803`07928d44 : [COLOR=#ff0000]intmsd+0x3b7e[/COLOR]
fffff880`1af96b60 fffff880`012ccaf3 : fffff980`098a8c60 fffffa80`062191b0 fffffa80`06219060 00000000`00000000 : [COLOR=#ff0000]intmsd+0xb402[/COLOR]
fffff880`1af96bf0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0bf1b790 fffffa80`06219b10 : [COLOR=#ff0000]intmsd+0x3af3[/COLOR]
fffff880`1af96cc0 fffff803`07eb4d66 : fffff980`098a8c60 fffffa80`06219b10 00000000`00000002 fffffa80`0bf1b790 : nt!IovCallDriver+0x3e6
fffff880`1af96d10 fffff880`0127b14e : fffffa80`05356de0 fffffa80`05356c90 00000000`00000002 fffffa80`0d745ba0 : nt!IovCallDriver+0x3e6
fffff880`1af96d60 fffff803`07eb4d66 : 00000000`00000002 fffff980`098a8c60 fffffa80`05356c90 fffffa80`0dd66e00 : volmgr!VmReadWrite+0x13e
fffff880`1af96da0 fffff880`01f63faa : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffffa80`0dd66e00 : nt!IovCallDriver+0x3e6
fffff880`1af96df0 fffff880`01f64236 : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffff803`07ebdc51 : fvevol!FveReadWrite+0x3e
fffff880`1af96e30 fffff803`07eb4d66 : fffffa80`06227040 00000000`00000002 fffffa80`0d9eee40 fffff880`01c18329 : fvevol!FveFilterRundownReadWrite+0x1b6
fffff880`1af96e80 fffff880`01c01af2 : fffffa80`06228190 fffffa80`06228040 00000000`00000002 fffffa80`0d9eee40 : nt!IovCallDriver+0x3e6
fffff880`1af96ed0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 00000000`00000030 00000000`00000000 : volsnap!VolSnapReadFilter+0x112
fffff880`1af96f00 fffff880`01817b69 : fffff880`1af79828 fffff880`1af798f0 fffffa80`0c543080 fffffa80`045faf40 : nt!IovCallDriver+0x3e6
fffff880`1af96f50 fffff803`078c8b67 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs!NtfsStorageDriverCallout+0x16
fffff880`1af96f80 fffff803`078c8b2d : 00000000`00000000 00000000`00000000 00000000`00000002 fffff803`07923ad8 : nt!KxSwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`1af96e40)
fffff880`1af79690 fffff803`07923ad8 : 00000000`00000006 00000000`00000000 00000000`00000006 00000000`00000000 : nt!KiSwitchKernelStackContinue
fffff880`1af796b0 fffff803`07926405 : fffff880`01817b54 fffff880`1af79828 00000000`00055000 00000000`00000001 : nt!KeExpandKernelStackAndCalloutInternal+0x218
fffff880`1af797b0 fffff880`01817aac : fffff880`1af79c60 fffff980`098a8c60 fffff8a0`01299b90 00000000`0005d000 : nt!KeExpandKernelStackAndCalloutEx+0x25
fffff880`1af797f0 fffff880`0181646a : fffff880`1af798a0 fffff880`1af79c60 fffff8a0`01299b90 fffff880`018770b0 : Ntfs!NtfsMultipleAsync+0xac
fffff880`1af79860 fffff880`01825b26 : 00000000`00000000 fffff880`1af79df0 fffff8a0`01299be8 fffff8a0`01299b90 : Ntfs!NtfsNonCachedIo+0x26a
fffff880`1af79a70 fffff880`0182742b : fffff880`1af79c60 fffff980`098a8c60 fffffa80`0c169701 00000000`00000001 : Ntfs!NtfsCommonRead+0x896
fffff880`1af79c30 fffff803`07eb4d66 : fffff980`098a8c60 fffff980`098a8c60 00000000`00000002 fffffa80`0dd800c0 : Ntfs!NtfsFsdRead+0x1db
fffff880`1af79e70 fffff880`013924ee : fffffa80`0fb303f0 fffff880`1af79f00 fffff980`098a8c60 fffffa80`0dd800c0 : nt!IovCallDriver+0x3e6
fffff880`1af79ec0 fffff880`013900b6 : fffffa80`0538bb00 00000000`00000002 fffff980`098a8c60 fffffa80`0d579e28 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e
fffff880`1af79f60 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0d5f47b0 fffffa80`0398e598 : FLTMGR!FltpDispatch+0xb6
fffff880`1af79fc0 fffff803`0794b43e : fffff980`098a8c60 00000000`00000000 fffffa80`0d5f4878 fffffa80`0d579d80 : nt!IovCallDriver+0x3e6
fffff880`1af7a010 fffff803`0794fa61 : fffffa80`0d5f47b0 fffffa80`0d5f47d8 fffffa80`0d5f4818 fffffa80`0d5f4808 : nt!IoPageRead+0x21e
fffff880`1af7a060 fffff803`0794ab20 : 00000000`00000002 fffff880`1af7a0d0 fffffa80`0c543080 fffffa80`0d5f47b0 : nt!MiIssueHardFaultIO+0xc9
fffff880`1af7a0a0 fffff803`07908d8f : fffffa80`0c543080 fffff803`07bc5f40 00000000`c0033333 00000000`00000000 : nt!MiIssueHardFault+0x170
fffff880`1af7a130 fffff803`0792369b : 00000000`00000000 fffffa80`0c543080 8de00001`15efb900 00000000`00000000 : nt!MmAccessFault+0x81f
fffff880`1af7a270 fffff803`07939a37 : 00000000`00000000 00000000`00051000 00000000`00055000 fffff880`1af7a438 : nt!MmCheckCachedPageStates+0x8db
fffff880`1af7a400 fffff803`07938e62 : fffffa80`0c16b010 0000003c`a4737c90 fffff880`1af7a550 fffff880`00000000 : nt!CcMapAndCopyInToCache+0x397
fffff880`1af7a4f0 fffff880`018d4070 : 00000000`000551b8 fffffa80`0538bb00 fffffa80`0c5acbe8 fffffa80`0c1697b0 : nt!CcCopyWriteEx+0x1b2
fffff880`1af7a590 fffff880`01394415 : 00000000`00000000 00000000`000002b0 fffff880`1af7a9b0 00000000`00000000 : Ntfs!NtfsCopyWriteA+0x290
fffff880`1af7a7f0 fffff880`01394b53 : fffff880`1af7a8e0 00000000`00000000 fffffa80`0c169700 fffffa80`0c5aca90 : FLTMGR!FltpPerformFastIoCall+0x155
fffff880`1af7a850 fffff880`013bcabd : 00000000`000000d8 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpPassThroughFastIo+0xc3
fffff880`1af7a8b0 fffff803`07caa249 : fffffa80`0c1697b0 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpFastIoWrite+0x19d
fffff880`1af7a960 fffff803`078cd453 : fffffa80`0c543001 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWriteFile+0x5b8
fffff880`1af7aa90 000007f8`a88c2c6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`1af7ab00)
0000003c`a744fa28 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007f8`a88c2c6a

As we can see, we have many different file system related routines (Ntfs, FLTMGR, etc). Why? Well, as we move up the stack, we eventually see four intmsd.sys calls. This is the IntelliMemory Storage Filter Driver from Condusiv Technologies.

Now, let's go ahead and refer to the 2nd/3rd parameters as I've highlighted above!

If we run an !irp on the 3rd parameter:

Code:
      [  3, 0]  10 e0 fffffa8006219060 00000000 fffff880012ccbc8-fffffa8006219060 Success Error Cancel
               \Driver\[COLOR=#ff0000]intmsd[/COLOR]    [COLOR=#ff0000]intmsd[/COLOR]
                Args: 00008000 00000000 1cec3f000 00000000
    >[  3, 0]  10 e1 fffffa8006219060 00000000 fffff880012183c0-00000000 Success Error Cancel pending
               \Driver\[COLOR=#ff0000]intmsd[/COLOR]    partmgr!PmIoCompletion

If we run a !devobj on the 2nd parameter:

Code:
    0: kd> !devobj fffffa8006219060
    Device object (fffffa8006219060) is for:
     intmsd0 \Driver\[COLOR=#ff0000]intmsd[/COLOR] DriverObject fffffa80049241b0
    Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000850
    Vpb fffffa800534d8f0 Dacl fffff9a10052d360 DevExt fffffa80062191b0 DevObjExt fffffa8006219930 Dope fffffa800534d880
    ExtensionFlags (0x80000800)  DOE_DEFAULT_SD_PRESENT, DOE_DESIGNATED_FDO
    Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
    [COLOR=#4b0082]AttachedDevice (Upper) fffffa8006219b10 \Driver\partmgr[/COLOR]
    [COLOR=#4b0082]AttachedTo (Lower) fffffa8006218060 \Driver\disk[/COLOR]
    Device queue is not busy.

That's why we're seeing so many file system and storage related routines being called. After this was found, I recommend disabling and/or preferably uninstalling IntelliMemory. After uninstalling IntelliMemory, the crashes ceased. Why?

First off, IntelliMemory is an intelligent data caching technology that provides faster access to frequently used files. IntelliMemory is supposed to improve latency and throughput by reducing disk I/O requests as active files are predicatively cached within the server to preempt round trips between VMs and network storage.

Remember how we saw various network related routines, etc, during the 0x133 debugging? Well, it's because IntelliMemory was the driver that was causing the loop.

I hope this helps!



-- I will also discuss scenario #2 (when the first parameter is set to 1) when I have debugged such a situation myself!
 
Last edited:
Excellent article Patrick! Really you always manage to fascinate me with your debugging tutorials and write ups. Keep up the good work mate ^_^
 
While this is an old tutorial, the following debugging extensions may prove to be useful with a Stop 0x133:

The !dpcs debugging extension would have been a better alternative:

Code:
6: kd>[COLOR=#008000] !dpcs[/COLOR]
CPU Type      KDPC       Function

dpcs: no pending DPCs found

We can also find some interesting fields within the _KPRCB structure:

Code:
0: kd> [COLOR=#008000]dt nt!_KPRCB -y DpcWatchdogPeriod[/COLOR]
   +0x58e8 DpcWatchdogPeriod : Int4B

Code:
0: kd> [COLOR=#008000]dt nt!_KPRCB -y DpcWatchdogCount[/COLOR]
   +0x58ec DpcWatchdogCount : Int4B

Code:
0: kd> [COLOR=#008000]dt nt!_KPRCB -y DpcTimeCount[/COLOR]
   +0x5ba0 DpcTimeCount : Uint4B

Code:
0: kd>[COLOR=#008000] dt nt!_KPRCB -y DpcTimeLimit[/COLOR]
   +0x5ba4 DpcTimeLimit : Uint4B
 
Sorry to bump an old thread, but I've found some interesting debugging extensions from an OSR Online thread:

Rich (BB code):
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: fffff80060973358, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
    additional information regarding this single DPC timeout

You may have noticed the fourth parameter contains a pointer to the DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which WinDbg states contains additional debugging information regarding the DPC timeout. You can actually obtain this information from the dump file by using the following command.

Rich (BB code):
1: kd> !dpcwatchdog

All durations are in seconds (1 System tick = 0.000000 milliseconds)

Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats: !intstats /d

--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: No Active DPC

Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
Failed to read DPC at 0xfffff8006083b520

dpcs: no pending DPCs found

--------------------------------------------------
CPU#1
--------------------------------------------------
Current DPC: nt!IopLiveDumpCorralDpc (Normal DPC)
Debugger Saved IRQL: 13
Cumulative DPC Time Limit: 0.000 seconds
Current Cumulative DPC Time: 0.000 seconds
Single DPC Time Limit: 0.000 seconds
Current Single DPC Time: 0.000 seconds
dt nt!_ISRDPCSTATS ffff8e8b5a0a04b0
IsrActive: TRUE

Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
1: Normal  : 0xffff8e8b5f669da8 0xfffff80069f71010 HDAudBus!HdaController::NotificationDpc
1: Normal  : 0xffff8e8b5f62ec30 0xfffff80064522610 Wdf01000!FxInterrupt::_InterruptDpcThunk
1: Normal  : 0xffff8e8b5f630d80 0xfffff80064522610 Wdf01000!FxInterrupt::_InterruptDpcThunk


c0000094 Exception in kdexts.dpcwatchdog debugger extension.
      PC: 00007ff8`45854b30  VA: 00000000`00000000  R/W: 0  Parameter: 00000000`00000000

Looking at the call stack, we can see the DPC which took too long to execute:

Rich (BB code):
1: kd> knL
# Child-SP          RetAddr           Call Site
00 ffffa880`4739fb08 fffff800`605eea3d nt!KeBugCheckEx
01 ffffa880`4739fb10 fffff800`6041f897 nt!KeAccumulateTicks+0x1cbfdd
02 ffffa880`4739fb70 fffff800`603601e1 nt!KeClockInterruptNotify+0xc07
03 ffffa880`4739ff30 fffff800`60402a25 hal!HalpTimerClockIpiRoutine+0x21
04 ffffa880`4739ff60 fffff800`605c2f7a nt!KiCallInterruptServiceRoutine+0xa5
05 ffffa880`4739ffb0 fffff800`605c34e7 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
06 ffffde08`c2a30a10 fffff800`6041ba1d nt!KiInterruptDispatchNoLockNoEtw+0x37
07 ffffde08`c2a30ba0 fffff800`609ab3f5 nt!KeYieldProcessorEx+0xd
08 ffffde08`c2a30bd0 fffff800`609aa555 nt!IopLiveDumpProcessCorralStateChange+0x2d
09 ffffde08`c2a30c00 fffff800`6046ae95 nt!IopLiveDumpCorralDpc+0x55
0a ffffde08`c2a30c40 fffff800`6046a4ef nt!KiExecuteAllDpcs+0x305
0b ffffde08`c2a30d80 fffff800`605c8525 nt!KiRetireDpcList+0x1ef
0c ffffde08`c2a30fb0 fffff800`605c8310 nt!KxRetireDpcList+0x5
0d ffffde08`c38a06b0 fffff800`605c79e0 nt!KiDispatchInterruptContinue
0e ffffde08`c38a06e0 00000000`00000000 nt!KiDpcInterrupt+0x2f0

Source: Please help. PCI driver DMA DPC cause BSOD on Win10 1809 version
 
Back
Top