- May 7, 2013
- 10,400
Purpose: A collective summary of on the workings of a Stop 0x9F, and how to debug the two most common Stop 0x9F bugcheck types. I originally wrote this article in a Word document which spans 10 pages, so there may be most grammar mistakes or spelling mistakes.
Note: Since we're dealing with Stop 0x9F's, I switch interchangeably Watchdog Timer and Timer when speaking about the Timer types. Thanks to Jared for providing me with the Kernel Mode Dump files.
How it Works?
The Stop 0x9F common refers to the system being in a inconsistent or unsupported power state. This bugcheck is almost always exclusively related to device drivers not handling Power IRPs correctly. It's important to remember that filter and bus drivers are commonly blamed with these bugchecks, because these drivers tend to be the only drivers which will handle the actual Power IRP.
Stop 0x9Fs are largely caused by a device driver which isn't passing the Power IRP through it's driver stack to appropriate drivers which are able to process the Power IRP. Windows monitors this activity with the use of timer dispatcher objects. If we examine the stack of a Stop 0x9F, we can notice a few interesting functions.
The most two most important functions are PopCheckIrpWatchdog and PopCheckForIdleness. The Pop prefix indicates we're dealing with the Power Manager, an important subsystem of the Windows kernel. The PopCheckForIdleness function periodically checks for inactivity, and is built upon a DPC object. If examine the lower stack frames then we can see a timer object has expired, the timer object is associated with the previously stated function, so when the timer object expires, the PopCheckForIdleness is called and calls it's own code. The function will run in intervals of 100 seconds, each time it will call another function called PopCheckIrpWatchdog. The Watchdog will check a linked list where the IRPs with the pending status are stored (PopIrpList), and then increment a counter. Once the counter threshold has been breached, the KeBugCheckEx function will be called and a crash with the Stop 0x9F stop code will be produced. The counter threshold is stored within a variable called PopWatchdogMaxTicks.
We can view all pending timers with either the !timer extension or !ms_timers (requires SwishDbgExt.dll). All timers are created using a structure called _KTIMER or IO_TIMER.
Before Windows 7, the timers were stored inside a doubly linked list which has a different expiration time associated with each list head. There were 512 list heads, and each list head was stored within an array called KiTimerListHead. However, from Windows 7 onwards, the timer lists are now stored within a structure called KTIMER_TABLE.
The TimerExpiry field corresponds to the list of timers which are set to be wakened at a given time interval. The TimerEntries field indicates the list heads for the given time interval. The KTIMER_TABLE_ENTRY has the following structure:
The Entry field is the position of a timer object within the linked list as discussed above, and the Time entry is the time where the timer object is set to expire.
The _KTIMER data structure has the following entries:
The DueTime field is self-explanatory, it's the time which the timer has been set to expire. The Processor field is which processor number the timer object has been associated with, and the Dpc field is which DPC object is being used to schedule when the appropriate driver should be notified of the timer expiration, this for synchronization purposes. However, threads attached to timer objects can be awoken immediately after the timer expiration. The Period field is the same as the DueTime field. It's also important to remember that timer objects can be episodic (repeatedly) or periodic (only once).
There is another timer structure called _IO_TIMER, and is used with a associated device object. These timer objects can be used by driver developers to check specific operations within their device.
The Timer field within the DEVICE_OBJECT structure contains a 64-bit pointer to the _IO_TIMER structure, which is defined as follows:
The timer object described is used to monitor any important operations, and ensure that these operations are successful, otherwise the operating system is informed to crash and produce a bugcheck. The timer is created with the IoIntializeTimer function, taking parameters which include the address of the specified device object and callback routine which is to be called every second (see TimerRoutine field), once the timer object has been started with the IoStartTimer. If the operation is successful or unsuccessful, then the timer can be stopped with IoStopTimer, and the appropriate code will be executed to handle the exception. The same timer can be restarted (called 'kicking the dog') by calling IoStartTimer again.
To summarise quickly, the timer object is used when the driver notices an operation hasn't been completed for a given timer interval, and this operation is vital for the stability of the device, thereby the Watchdog Timer is called once every second until that Timer has timed out - or the operation completes successfully - leading to an exception and an IRP with status code STATUS_IO_TIMEOUT being used to implement any exception handling for the device object. The IoCompleteRequest function will check the stated status, by looking into the _IO_STATUS_BLOCK structure.
We have now discussed how the Stop 0x9F is created and have briefly examined how timer objects work. We'll now discuss a little about Power IRPs and particularly how they handled within a Stop 0x9F. This last section will wrap up about the internals of a Stop 0x9F. In preceding sections, we will investigate the common causes of a Stop 0x9F and how to debug these crashes accordingly.
A Stop 0x9F is generally caused by a Power IRP being blocked somewhere within the device stack. Power IRPs are sent by the Power Manager or the Power Policy Device Manager. This is achieved by PoRequestPowerIrp. Power IRPs are used to achieve several things, but to say within the topic of discussion, we'll let Power IRPs main purpose being handling power transitions, wherever from sleep or to sleep. We can check power transition states with the !poaction.
Power IRPs are filtered through the device stack until they reach a bus driver, which is typically the only type of driver which can handle such IRPs. We can view a device stack for a specified physical device object by using the !devstack.
The Power IRP is passed down the device stack until it reaches the bus driver, which as stated before, tends to be only driver that is able to process the Power IRP, and therefore tends to be the driver which is blamed for the crash. With some IRPs, several different device stacks may be used to process an IRP, and therefore it can become more difficult in tracking where the IRP has become blocked, or if a preceding driver has caused the IRP to become blocked in some other location.
The above device stack belongs to the device node stated below it. Each device node (represents a physical device) will have a device stack associated with it.
The drivers used by the device objects to process the IRP form another called the driver stack. Every IRP has a Major Function Code and a Minor Function Code, in the case of a Stop 0x9F, the very common IRP which is seen is the [16,2], where 0x16 is the Major Function Code (IRP_MJ_POWER) and 0x2 is the Minor Function Code (IRP_MN_SET_POWER). The 0x16 Major Function Code shows us that the IRP is in fact a Power IRP, with the 0x2 indicating that the specific purpose of the Power IRP was to notify a driver of a power state change in the system.
0x4 – How it Works?
Let's have a quick look at the internal works of a Stop 0x9F in relation to the PnP Manager. I've explained most of this Stop 0x9F in the debugging example, but this will provide a small generic description of how PnPs work.
There are two types of PnP IRPs: State Changing and Non-State Changing. With this particular type of bugcheck, we're most interested in the State Changing PnP IRPs.
The main difference between State Changing and Non-State Changing IRPs, is how these IRPs are handled by the I/O Manager. State Changing IRPs must be synchronized against each other, this prevents two state changing IRPs from existing in the same stack. The most obvious problem this helps to prevent, this when an removal request and a start request are being processed at the same time. The State Changing IRPs have to be synchronized with System Power IRPs, a power transition must be processed before a PnP IRP. The State Changing PnP IRPs are only reserved for the use of system drivers.
On the other hand, there is a few Power IRPs which the PnPs aren't synchronized with, and these IRPs are the following:
There is a number of causes for a Stop 0x9F, but the main two causes are a device driver blocking a IRP within the device stack, or a timer object has timed out and failed to notify a thread of a power transition state change and synchronize with the PnP Manager.
A complete list of parameters can be found on the MSDN page and in the BSOD Index.
Stop 0x9F – 0x3:
The two most important arguments are the second and the fourth argument. The fourth argument contains the address of the blocked IRP, and the second argument contains the address of the PDO (Physical Device Object) of the stack location where the IRP has become blocked.
Using the !irp extension on the fourth parameter, we can view the current stack where the IRP has become stuck.
The output doesn't give us much indication, in most cases, a potential third party driver can be found, however, we'll need to investigate further using the !stacks extension to display all the kernel stacks in the system. I don't regularly use this extension simply because of the sheer volume of output it produces, there is hundreds of threads to check through.
As a side note, the fffff9801c458e90 address seen with the IRP is the current I/O Stack Location, which is stored with the IO_STACK_LOCATION structure:
There is one IO_STACK_LOCATION for each driver within the driver stack for the associated IRP.
This is a small subset of the stacks extension, the first column contains the Process ID and the Thread ID, in this case it is a System process. The second column contains the address of the _ETHREAD block which we can pass to the !thread extension. The threads which are of a interest to us are the ones which have the Blocked status.
I haven't been able to find a direct cause, but based on the device stack for the blocked IRP, I'm assuming that AVG may a culprit driver.
Stop 0x9F – 0x4:
This bugcheck indicates that we have failed to synchronize our power transition with the PnP Manager. This is common for the removal PnP devices, since after the removal of the device we need to update the device tree to show that the device object has been removed from the system.
The timeout is specified in the second argument, and the thread which is holding onto the lock required to access the PnP is shown in the third argument. We'll begin by using the !thread extension with the thread address, and then examining the stack.
As we can see in the call stack for the thread, there is two device removal functions, which in turn supports our argument from earlier on. More importantly, if we check the pending IRP for the current thread, we'll find some more evidence relating to PnP and the third party driver shown in our call stack.
The Major Function code corresponds to IRP_MJ_PNP, and the Minor Function Code corresponds to IRP_MN_SURPRISE_REMOVAL. These IRP function codes can be found within the WinDbg documentation, and an explanation can be found on the appropriate MSDN pages. I will briefly explain these function codes, and how they link in with the nature of the bugcheck. PnP IRPs are handled by DispatchPnP driver dispatch rountine.
The Minor Function Code is used to notify important Kernel components and User Mode applications that the device is being removed from the system, and no other IRPs will be able to be sent to that device. The IRP is exclusively reserved for the system, and thus no other drivers should be sending the IRP. The IRP is processed from the top layer of the device stack down to the bottom layer of the device stack.
By checking the IRP, the call stack makes much more sense to us, at the moment the thread is currently waiting on a lock. We need to investigate what this lock is, and why these locks are required. Using the !locks extension, we can list all the currently acquired locks on executive resources (_ERESOURCE).
The current thread holding both locks is our PnP thread. The most interesting lock is the IopDeviceTreeLock. The other thread waiting on the PiEngineLock is a kernel related thread which is handling other device removal operations and power transitions.
Instead of going in depth about the various locks and synchronization mechanisms used in Windows, it would be better to explain why the thread is using a lock to begin with. The answer with our PnP device.
The main reason of acquiring the lock is to ensure that the driver will not use DispatchPnP to handle a IRP_MN_REMOVE_DEVICE IRP while another driver is performing an I/O operation at the same time. Furthermore, by using these lock routines, the driver can determine when all I/O operations have been completed for that device, and therefore when it is safe to remove that device.
The IO_REMOVE_LOCK illustrates a counter which is used to check when all the pending I/O operations have been completed for that device. Once the I/O operations have been completed, then the device can be safely removed from the system. A safe removal of the device is indicated with the IoRemoveLockAndWait. The IO_REMOVE_LOCK is destroyed when the device is removed from the system.
In conclusion, the ZTEusbnet.sys driver seems like the most likely culprit, it is a USB NDIS Miniport driver developed by a company called ZTE.
Commands and Extensions
This a quick overview of the extensions and commands which can be used for debugging Stop 0x9F. Some of the extensions are added for informational purposes i.e. you wish to learn a little about the nature of the device etc.
!irp, !irpfind, !devext, !devobj, !drvobj, !popolicy, !timer, !ms_timer, !thread, !stacks, !devnode,
!devstack, !podev, !poaction, !popolicy, !poreqlist, !locks and !pnptriage.
Data Structures
*Discussed in article
_IO_TIMER*, _DEVICE_OBJECT*, _DRIVER_OBJECT, _KTIMER*, _KTIMER_TABLE_ENTRY*, _KTIMER_TABLE*, _SYSTEM_POWER_STATE, _DEVICE_POWER_STATE, _SYSTEM_POWER_POLICY, _IO_STATUS_BLOCK*, _TRIAGE_9F_POWER
References:
Managing PnP State Transistions
Using Remove Locks
Debugging Stop 0x9F - CodeMachine
Which PnP IRPs are State Changing? - Hole In My Head
A Performance Issue in Windows Timer Management?
Ain't Nuthin But A K(Timer) Thing, Baby
Power IRPs for Individual Devices
I/O Stack Locations
Note: Since we're dealing with Stop 0x9F's, I switch interchangeably Watchdog Timer and Timer when speaking about the Timer types. Thanks to Jared for providing me with the Kernel Mode Dump files.
How it Works?
The Stop 0x9F common refers to the system being in a inconsistent or unsupported power state. This bugcheck is almost always exclusively related to device drivers not handling Power IRPs correctly. It's important to remember that filter and bus drivers are commonly blamed with these bugchecks, because these drivers tend to be the only drivers which will handle the actual Power IRP.
Stop 0x9Fs are largely caused by a device driver which isn't passing the Power IRP through it's driver stack to appropriate drivers which are able to process the Power IRP. Windows monitors this activity with the use of timer dispatcher objects. If we examine the stack of a Stop 0x9F, we can notice a few interesting functions.
The most two most important functions are PopCheckIrpWatchdog and PopCheckForIdleness. The Pop prefix indicates we're dealing with the Power Manager, an important subsystem of the Windows kernel. The PopCheckForIdleness function periodically checks for inactivity, and is built upon a DPC object. If examine the lower stack frames then we can see a timer object has expired, the timer object is associated with the previously stated function, so when the timer object expires, the PopCheckForIdleness is called and calls it's own code. The function will run in intervals of 100 seconds, each time it will call another function called PopCheckIrpWatchdog. The Watchdog will check a linked list where the IRPs with the pending status are stored (PopIrpList), and then increment a counter. Once the counter threshold has been breached, the KeBugCheckEx function will be called and a crash with the Stop 0x9F stop code will be produced. The counter threshold is stored within a variable called PopWatchdogMaxTicks.
Code:
0: kd> [COLOR=#008000]? poi(nt!PopWatchdogMaxTicks)[/COLOR]
Evaluate expression: [COLOR=#ff0000]41[/COLOR] = 00000029
We can view all pending timers with either the !timer extension or !ms_timers (requires SwishDbgExt.dll). All timers are created using a structure called _KTIMER or IO_TIMER.
Before Windows 7, the timers were stored inside a doubly linked list which has a different expiration time associated with each list head. There were 512 list heads, and each list head was stored within an array called KiTimerListHead. However, from Windows 7 onwards, the timer lists are now stored within a structure called KTIMER_TABLE.
Code:
0: kd> [COLOR=#008000]dt nt!_KTIMER_TABLE[/COLOR]
+0x000 TimerExpiry : [64] Ptr64 _KTIMER
+0x200 TimerEntries : [256] _KTIMER_TABLE_ENTRY
The TimerExpiry field corresponds to the list of timers which are set to be wakened at a given time interval. The TimerEntries field indicates the list heads for the given time interval. The KTIMER_TABLE_ENTRY has the following structure:
Code:
0: kd> [COLOR=#008000]dt nt!_KTIMER_TABLE_ENTRY[/COLOR]
+0x000 Lock : Uint8B
+0x008 Entry : _LIST_ENTRY
+0x018 Time : _ULARGE_INTEGER
The Entry field is the position of a timer object within the linked list as discussed above, and the Time entry is the time where the timer object is set to expire.
The _KTIMER data structure has the following entries:
Code:
0: kd> [COLOR=#008000]dt nt!_KTIMER[/COLOR]
+0x000 Header : _DISPATCHER_HEADER
[COLOR=#ff0000]+0x018 DueTime : _ULARGE_INTEGER[/COLOR]
+0x020 TimerListEntry : _LIST_ENTRY
[COLOR=#ff0000]+0x030 Dpc : Ptr64 _KDPC[/COLOR]
[COLOR=#ff0000]+0x038 Processor : Uint4B[/COLOR]
+0x03c Period : Uint4B
The DueTime field is self-explanatory, it's the time which the timer has been set to expire. The Processor field is which processor number the timer object has been associated with, and the Dpc field is which DPC object is being used to schedule when the appropriate driver should be notified of the timer expiration, this for synchronization purposes. However, threads attached to timer objects can be awoken immediately after the timer expiration. The Period field is the same as the DueTime field. It's also important to remember that timer objects can be episodic (repeatedly) or periodic (only once).
There is another timer structure called _IO_TIMER, and is used with a associated device object. These timer objects can be used by driver developers to check specific operations within their device.
Code:
0: kd> [COLOR=#008000]dt nt!_DEVICE_OBJECT[/COLOR]
+0x000 Type : Int2B
+0x002 Size : Uint2B
+0x004 ReferenceCount : Int4B
+0x008 DriverObject : Ptr64 _DRIVER_OBJECT
+0x010 NextDevice : Ptr64 _DEVICE_OBJECT
+0x018 AttachedDevice : Ptr64 _DEVICE_OBJECT
+0x020 CurrentIrp : Ptr64 _IRP
[COLOR=#ff0000]+0x028 Timer : Ptr64 _IO_TIMER[/COLOR]
+0x030 Flags : Uint4B
+0x034 Characteristics : Uint4B
+0x038 Vpb : Ptr64 _VPB
+0x040 DeviceExtension : Ptr64 Void
+0x048 DeviceType : Uint4B
+0x04c StackSize : Char
+0x050 Queue : <unnamed-tag>
+0x098 AlignmentRequirement : Uint4B
+0x0a0 DeviceQueue : _KDEVICE_QUEUE
+0x0c8 Dpc : _KDPC
+0x108 ActiveThreadCount : Uint4B
+0x110 SecurityDescriptor : Ptr64 Void
+0x118 DeviceLock : _KEVENT
+0x130 SectorSize : Uint2B
+0x132 Spare1 : Uint2B
+0x138 DeviceObjectExtension : Ptr64 _DEVOBJ_EXTENSION
+0x140 Reserved : Ptr64 Void
The Timer field within the DEVICE_OBJECT structure contains a 64-bit pointer to the _IO_TIMER structure, which is defined as follows:
Code:
0: kd> [COLOR=#008000]dt nt!_IO_TIMER[/COLOR]
+0x000 Type : Int2B
+0x002 TimerFlag : Int2B
+0x008 TimerList : _LIST_ENTRY
[COLOR=#ff0000]+0x018 TimerRoutine : Ptr64 void [/COLOR]
+0x020 Context : Ptr64 Void
[COLOR=#ff0000]+0x028 DeviceObject : Ptr64 _DEVICE_OBJECT[/COLOR]
The timer object described is used to monitor any important operations, and ensure that these operations are successful, otherwise the operating system is informed to crash and produce a bugcheck. The timer is created with the IoIntializeTimer function, taking parameters which include the address of the specified device object and callback routine which is to be called every second (see TimerRoutine field), once the timer object has been started with the IoStartTimer. If the operation is successful or unsuccessful, then the timer can be stopped with IoStopTimer, and the appropriate code will be executed to handle the exception. The same timer can be restarted (called 'kicking the dog') by calling IoStartTimer again.
To summarise quickly, the timer object is used when the driver notices an operation hasn't been completed for a given timer interval, and this operation is vital for the stability of the device, thereby the Watchdog Timer is called once every second until that Timer has timed out - or the operation completes successfully - leading to an exception and an IRP with status code STATUS_IO_TIMEOUT being used to implement any exception handling for the device object. The IoCompleteRequest function will check the stated status, by looking into the _IO_STATUS_BLOCK structure.
Code:
0: kd> [COLOR=#008000]dt nt!_IO_STATUS_BLOCK[/COLOR]
[COLOR=#ff0000]+0x000 Status : Int4B[/COLOR]
+0x000 Pointer : Ptr64 Void
+0x008 Information : Uint8B
We have now discussed how the Stop 0x9F is created and have briefly examined how timer objects work. We'll now discuss a little about Power IRPs and particularly how they handled within a Stop 0x9F. This last section will wrap up about the internals of a Stop 0x9F. In preceding sections, we will investigate the common causes of a Stop 0x9F and how to debug these crashes accordingly.
A Stop 0x9F is generally caused by a Power IRP being blocked somewhere within the device stack. Power IRPs are sent by the Power Manager or the Power Policy Device Manager. This is achieved by PoRequestPowerIrp. Power IRPs are used to achieve several things, but to say within the topic of discussion, we'll let Power IRPs main purpose being handling power transitions, wherever from sleep or to sleep. We can check power transition states with the !poaction.
Power IRPs are filtered through the device stack until they reach a bus driver, which is typically the only type of driver which can handle such IRPs. We can view a device stack for a specified physical device object by using the !devstack.
Code:
0: kd> [COLOR=#008000]!devstack fffffa80`05823060[/COLOR]
!DevObj !DrvObj !DevExt ObjectName
fffffa80058882c0 \Driver\partmgr fffffa8005888410
fffffa8005888790 \Driver\Disk fffffa80058888e0 DR0
fffffa80057399b0 \Driver\ACPI fffffa80039c5d50
[COLOR=#ff0000]> fffffa8005823060 \Driver\atapi[/COLOR] fffffa80058231b0 IdeDeviceP2T0L0-2
!DevNode [COLOR=#0000cd]fffffa800573d900[/COLOR] :
DeviceInst is "IDE\DiskST3250824A______________________________3.AAH___\5&3731d328&0&0.0.0"
ServiceName is "[COLOR=#ff0000]disk[/COLOR]"
The Power IRP is passed down the device stack until it reaches the bus driver, which as stated before, tends to be only driver that is able to process the Power IRP, and therefore tends to be the driver which is blamed for the crash. With some IRPs, several different device stacks may be used to process an IRP, and therefore it can become more difficult in tracking where the IRP has become blocked, or if a preceding driver has caused the IRP to become blocked in some other location.
The above device stack belongs to the device node stated below it. Each device node (represents a physical device) will have a device stack associated with it.
Code:
0: kd> [COLOR=#008000]!devnode fffffa800573d900[/COLOR]
DevNode [COLOR=#0000cd]0xfffffa800573d900[/COLOR] for PDO [COLOR=#ff0000]0xfffffa8005823060[/COLOR]
Parent 0xfffffa8005726900 Sibling 0xfffffa800573e900 Child 0000000000
InstancePath is "IDE\DiskST3250824A______________________________3.AAH___\5&3731d328&0&0.0.0"
ServiceName is "disk"
State = DeviceNodeStarted (0x308)
Previous State = DeviceNodeEnumerateCompletion (0x30d)
StateHistory[08] = DeviceNodeEnumerateCompletion (0x30d)
StateHistory[07] = DeviceNodeEnumeratePending (0x30c)
StateHistory[06] = DeviceNodeStarted (0x308)
StateHistory[05] = DeviceNodeStartPostWork (0x307)
StateHistory[04] = DeviceNodeStartCompletion (0x306)
StateHistory[03] = DeviceNodeResourcesAssigned (0x304)
StateHistory[02] = DeviceNodeDriversAdded (0x303)
StateHistory[01] = DeviceNodeInitialized (0x302)
StateHistory[00] = DeviceNodeUninitialized (0x301)
StateHistory[19] = Unknown State (0x0)
StateHistory[18] = Unknown State (0x0)
StateHistory[17] = Unknown State (0x0)
StateHistory[16] = Unknown State (0x0)
StateHistory[15] = Unknown State (0x0)
StateHistory[14] = Unknown State (0x0)
StateHistory[13] = Unknown State (0x0)
StateHistory[12] = Unknown State (0x0)
StateHistory[11] = Unknown State (0x0)
StateHistory[10] = Unknown State (0x0)
StateHistory[09] = Unknown State (0x0)
Flags (0x2c000130) DNF_ENUMERATED, DNF_IDS_QUERIED,
DNF_NO_RESOURCE_REQUIRED, DNF_NO_LOWER_DEVICE_FILTERS,
DNF_NO_LOWER_CLASS_FILTERS, DNF_NO_UPPER_DEVICE_FILTERS
UserFlags (0x00000008) DNUF_NOT_DISABLEABLE
DisableableDepends = 1 (including self)
The drivers used by the device objects to process the IRP form another called the driver stack. Every IRP has a Major Function Code and a Minor Function Code, in the case of a Stop 0x9F, the very common IRP which is seen is the [16,2], where 0x16 is the Major Function Code (IRP_MJ_POWER) and 0x2 is the Minor Function Code (IRP_MN_SET_POWER). The 0x16 Major Function Code shows us that the IRP is in fact a Power IRP, with the 0x2 indicating that the specific purpose of the Power IRP was to notify a driver of a power state change in the system.
0x4 – How it Works?
Let's have a quick look at the internal works of a Stop 0x9F in relation to the PnP Manager. I've explained most of this Stop 0x9F in the debugging example, but this will provide a small generic description of how PnPs work.
There are two types of PnP IRPs: State Changing and Non-State Changing. With this particular type of bugcheck, we're most interested in the State Changing PnP IRPs.
The main difference between State Changing and Non-State Changing IRPs, is how these IRPs are handled by the I/O Manager. State Changing IRPs must be synchronized against each other, this prevents two state changing IRPs from existing in the same stack. The most obvious problem this helps to prevent, this when an removal request and a start request are being processed at the same time. The State Changing IRPs have to be synchronized with System Power IRPs, a power transition must be processed before a PnP IRP. The State Changing PnP IRPs are only reserved for the use of system drivers.
On the other hand, there is a few Power IRPs which the PnPs aren't synchronized with, and these IRPs are the following:
- IRP_MN_SET_POWER
- IRP_MN_WAIT_WAKE
- IRP_MN_QUERY_POWER
- IRP_MN_POWER_SEQUENCE
There is a number of causes for a Stop 0x9F, but the main two causes are a device driver blocking a IRP within the device stack, or a timer object has timed out and failed to notify a thread of a power transition state change and synchronize with the PnP Manager.
A complete list of parameters can be found on the MSDN page and in the BSOD Index.
Stop 0x9F – 0x3:
Code:
DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: [COLOR=#0000cd]fffffa8005823060[/COLOR], Physical Device Object of the stack
Arg3: fffff80000b9c518, nt!TRIAGE_9F_POWER on Win7, otherwise the Functional Device Object of the stack
Arg4: [COLOR=#ff0000]fffff9801c458dc0[/COLOR], The blocked IRP
The two most important arguments are the second and the fourth argument. The fourth argument contains the address of the blocked IRP, and the second argument contains the address of the PDO (Physical Device Object) of the stack location where the IRP has become blocked.
Using the !irp extension on the fourth parameter, we can view the current stack where the IRP has become stuck.
Code:
0: kd> [COLOR=#008000]!irp fffff9801c458dc0[/COLOR]
Irp is active with 5 stacks 1 is current (= [COLOR=#ff8c00]0xfffff9801c458e90[/COLOR])
No Mdl: No System Buffer: Thread 00000000: Irp stack trace.
cmd flg cl Device File Completion-Context
>[ [COLOR=#ff0000]16[/COLOR], [COLOR=#0000cd]2[/COLOR]] 0 e1 fffffa8005823060 00000000 fffff88000e0957c-fffff88000e093c4 Success Error Cancel [COLOR=#ff0000]pending[/COLOR]
[COLOR=#ff0000]\Driver\atapi [/COLOR] ACPI!ACPIDeviceIrpDeviceFilterRequest
Args: 00000000 00000001 00000004 00000000
[ 16, 2] 0 e1 fffffa80057399b0 00000000 fffff880018f27d0-fffffa8005888cd0 Success Error Cancel pending
\Driver\ACPI CLASSPNP!ClasspStartNextPowerIrpCompletion
Args: 00000000 00000001 00000004 00000000
[ 16, 2] 0 e1 fffffa8005888790 00000000 fffff88000eaff50-00000000 Success Error Cancel pending
\Driver\Disk partmgr!PmPowerCompletion
Args: 00000000 00000001 00000004 00000000
[ 16, 2] 0 e1 fffffa80058882c0 00000000 00000000-00000000 pending
\Driver\partmgr
Args: 00000000 00000001 00000004 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-fffffa8006c71c50
Args: 00000000 00000000 00000000 00000000
The output doesn't give us much indication, in most cases, a potential third party driver can be found, however, we'll need to investigate further using the !stacks extension to display all the kernel stacks in the system. I don't regularly use this extension simply because of the sheer volume of output it produces, there is hundreds of threads to check through.
As a side note, the fffff9801c458e90 address seen with the IRP is the current I/O Stack Location, which is stored with the IO_STACK_LOCATION structure:
Code:
0: kd> [COLOR=#008000]dt nt!_IO_STACK_LOCATION[/COLOR]
+0x000 MajorFunction : UChar
+0x001 MinorFunction : UChar
+0x002 Flags : UChar
+0x003 Control : UChar
+0x008 Parameters : <unnamed-tag>
+0x028 DeviceObject : Ptr64 _DEVICE_OBJECT
+0x030 FileObject : Ptr64 _FILE_OBJECT
+0x038 CompletionRoutine : Ptr64 long
+0x040 Context : Ptr64 Void
There is one IO_STACK_LOCATION for each driver within the driver stack for the associated IRP.
Code:
[COLOR=#008000]!stacks[/COLOR]
Proc.Thread .Thread Ticks ThreadState Blocker
[fffff80002c0d180 Idle]
0.000000 fffff80002c0ccc0 ff1674ec RUNNING nt!KeBugCheckEx
0.000000 fffff880009f3fc0 ff187841 RUNNING amdppm!C1Halt+0x2
0.000000 fffff88002f6efc0 ff186575 RUNNING amdppm!C1Halt+0x2
0.000000 fffff88002fdffc0 ff184b88 RUNNING nt!KiIdleLoop+0x10d
[fffffa8003989b30 System]
*** ERROR: Module load completed but symbols could not be loaded for avgidsfiltera.sys
4.000018 fffffa80039fbb50 ff1849a6 Blocked Ntfs!NtfsWaitOnIo+0x28
4.00001c fffffa80039fb660 ff1848b8 Blocked Ntfs!NtfsWaitOnIo+0x28
4.000020 fffffa80039fb170 ff1845b7 Blocked Ntfs!NtfsWaitOnIo+0x28
4.000024 fffffa80039fd040 ff184537 Blocked Ntfs!NtfsWaitOnIo+0x28
4.000028 fffffa80039fdb50 ff167729 Blocked tdtcp!TdiSubmitRequest+0xca
4.00002c fffffa80039fd660 ff1681d2 Blocked +0xfffffa80075dfcda
4.000038 fffffa80039fe660 ff184b88 Blocked nt!IopCloseFile+0x260
4.00003c fffffa80039ff040 ff184a83 Blocked nt!CmpDoFileWrite+0x281
4.000040 fffffa80039ffb50 ff1849b5 STANDBY nt!KeSetSystemGroupAffinityThread+0x18a
4.000044 fffffa80039ff660 ff1797f0 Blocked disk!DiskPerformSmartCommand+0x1f0
4.000060 fffffa800460eb50 ff167536 Blocked nt!ViKeTrimWorkerThreadRoutine+0x1e
4.000064 fffffa800460e660 ff184b89 Blocked nt!ViPoolDelayFreeTrimThreadRoutine+0x29
4.000068 fffffa800460f040 ff167ff5 Blocked nt!ViPoolDelayFreeTrimThreadRoutine+0x29
4.00006c fffffa800460fb50 ff168ad9 Blocked nt!ViPendingWorkerThread+0x20
4.000070 fffffa800460f660 ff168ad9 Blocked nt!ViPendingWorkerThread+0x20
4.000074 fffffa8004610040 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.000078 fffffa8004610b50 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.00007c fffffa8004610660 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.000080 fffffa8004611040 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.000084 fffffa8004611b50 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.000088 fffffa8004611660 ff167ff5 Blocked nt!ViPendingWorkerThread+0x20
4.000090 fffffa8004612660 ff18486c Blocked nt!MiModifiedPageWriter+0xcf
4.000094 fffffa8004613b50 ff183446 Blocked Ntfs!NtfsWaitOnIo+0x28
4.0000b0 fffffa8004617610 ff1843b6 Blocked nt!CcQueueLazyWriteScanThread+0x85
4.0000bc fffffa800461e940 ff167ff5 Blocked nt!AlpcpReceiveMessagePort+0x189
4.0000c0 fffffa800466f040 ff184bfa Blocked nt!EtwpLogger+0xf2
4.0000c4 fffffa800466f790 ff16753a Blocked nt!EtwpLogger+0xf2
4.0000c8 fffffa800467c5d0 ff16753a Blocked nt!EtwpLogger+0xf2
4.0000cc fffffa80046bd040 ff16753a Blocked nt!EtwpLogger+0xf2
4.0000d4 fffffa80047c9b50 ff1674fa Blocked nt!EtwpLogger+0xf2
4.0000d8 fffffa80047ec040 ffffe2ed Blocked nt!EtwpLogger+0xf2
4.0000dc fffffa80047f1b50 ff167d9a Blocked nt!WdipSemCheckTimeout+0x12c
4.0000e0 fffffa80039a65c0 fffffb53 Blocked ACPI!ACPIWorkerThread+0x74
4.0000e8 fffffa8005636b50 fffff583 Blocked ndis!ndisThreadPoolTimerHandler+0x10c
4.0000f0 fffffa8005639740 ff167644 Blocked ndis!ndisCmWaitThread+0x6e
4.0000f4 fffffa80057fc110 ffffff8d Blocked ACPI!PciRootBusBiosMethodDispatcherOnResume+0x51
4.000104 fffffa8005766b50 ffcd71dd Blocked volsnap!VspWorkerThread+0x80
4.000108 fffffa8005769b50 ff197630 Blocked volsnap!VspWorkerThread+0x80
4.00010c fffffa800576ab50 ff1880a0 Blocked volsnap!VspWorkerThread+0x80
4.000110 fffffa800576bb50 fffffdf4 Blocked volsnap!VspWorkerThread+0x80
4.000114 fffffa800576cb50 fffffea1 Blocked volsnap!VspWorkerThread+0x80
4.000118 fffffa800576db50 fffffea1 Blocked volsnap!VspWorkerThread+0x80
4.00011c fffffa800576eb50 fffffea1 Blocked volsnap!VspWorkerThread+0x80
4.000120 fffffa800576fb50 fffffea1 Blocked volsnap!VspWorkerThread+0x80
4.000124 fffffa8005770b50 fffffea1 Blocked volsnap!VspWorkerThread+0x80
4.00012c fffffa8005c0a930 fffffde0 Blocked watchdog!SMgrGdiCalloutThread+0x4f
*** ERROR: Module load completed but symbols could not be loaded for avgtdia.sys
4.000130 fffffa8005c3d040 fffffddc Blocked nt!VerifierKeWaitForMultipleObjects+0x1dc
4.000134 fffffa8005c2cb50 ff16759e Blocked avgtdia+0x31676
[...]
Threads Processed: 939
This is a small subset of the stacks extension, the first column contains the Process ID and the Thread ID, in this case it is a System process. The second column contains the address of the _ETHREAD block which we can pass to the !thread extension. The threads which are of a interest to us are the ones which have the Blocked status.
I haven't been able to find a direct cause, but based on the device stack for the blocked IRP, I'm assuming that AVG may a culprit driver.
Code:
0: kd> [COLOR=#008000]lmvm avgrkx64[/COLOR]
start end module name
fffff880`01911000 fffff880`0191d000 avgrkx64 (deferred)
Image path: \SystemRoot\system32\DRIVERS\avgrkx64.sys
Image name: avgrkx64.sys
Timestamp: [COLOR=#ff0000]Tue Jan 31 03:11:41 2012[/COLOR] (4F275BED)
CheckSum: 0000D438
ImageSize: 0000C000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Stop 0x9F – 0x4:
This bugcheck indicates that we have failed to synchronize our power transition with the PnP Manager. This is common for the removal PnP devices, since after the removal of the device we need to update the device tree to show that the device object has been removed from the system.
Code:
DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time (usually 10 minutes).
Arguments:
Arg1: 0000000000000004, The power transition timed out waiting to synchronize with the Pnp
subsystem.
Arg2: 0000000000000[COLOR=#ff8c00]258[/COLOR], Timeout in seconds.
Arg3: [COLOR=#ff0000]fffffa8007005660[/COLOR], The thread currently holding on to the Pnp lock.
Arg4: fffff800053e83d0, nt!TRIAGE_9F_PNP on Win7
The timeout is specified in the second argument, and the thread which is holding onto the lock required to access the PnP is shown in the third argument. We'll begin by using the !thread extension with the thread address, and then examining the stack.
Code:
0: kd> [COLOR=#008000]!thread fffffa8007005660[/COLOR]
THREAD fffffa8007005660 Cid 0004.0048 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
fffffa800d035ee8 NotificationEvent
IRP List:
[COLOR=#0000cd]fffffa8008f5cc10[/COLOR]: (0006,03e8) Flags: 00000000 Mdl: 00000000
Not impersonating
DeviceMap fffff8a000008c10
Owning Process fffffa8006f8d890 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 396427 Ticks: 38463 ([COLOR=#ff0000]0:00:10:00.026[/COLOR])
Context Switch Count 44059 IdealProcessor: 0 NoStackSwap
UserTime 00:00:00.000
KernelTime 00:00:00.343
Win32 Start Address nt!ExpWorkerThread (0xfffff80003298150)
Stack Init fffff88003bd2c70 Current fffff88003bd2280
Base fffff88003bd3000 Limit fffff88003bcd000 Call 0
Priority 15 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`03bd22c0 fffff800`032845f2 : fffffa80`07005660 fffffa80`07005660 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`03bd2400 fffff800`0329599f : fffffa80`0d0df208 fffff880`0ae9e10b fffffa80`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff880`03bd2490 fffff880`0ae915dd : fffffa80`0d035000 00000000`00000000 fffffa80`0dd8ca00 00000000`00000000 : nt!KeWaitForSingleObject+0x19f
fffff880`03bd2530 fffff880`0ae92627 : fffffa80`0d035000 00000000`00000000 fffffa80`0c0891a0 fffff880`03bd2670 : ZTEusbnet+0x35dd
fffff880`03bd2580 fffff880`0215d809 : fffffa80`0c0891a0 fffff880`020f0ecd fffff880`03bd2670 fffffa80`091c5550 : ZTEusbnet+0x4627
fffff880`03bd25b0 fffff880`0215d7d0 : fffffa80`091c54a0 fffffa80`0c0891a0 fffff880`03bd2670 fffffa80`08fc2ac0 : ndis!NdisFDevicePnPEventNotify+0x89
fffff880`03bd25e0 fffff880`0215d7d0 : fffffa80`08fc2a10 fffffa80`0c0891a0 fffffa80`091f9010 fffffa80`091f90c0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2610 fffff880`0219070c : fffffa80`0c0891a0 00000000`00000000 00000000`00000000 fffffa80`0c0891a0 : ndis!NdisFDevicePnPEventNotify+0x50
fffff880`03bd2640 fffff880`021a1da2 : 00000000`00000000 fffffa80`08f5cc10 00000000`00000000 fffffa80`0c0891a0 : ndis! ?? ::LNCPHCLB::`string'+0xddf
fffff880`03bd26f0 fffff800`034fb121 : fffffa80`091c7060 fffffa80`0c089050 fffff880`03bd2848 fffffa80`070bfa00 : ndis!ndisPnPDispatch+0x843
fffff880`03bd2790 fffff800`0367b3a1 : fffffa80`070bfa00 00000000`00000000 fffffa80`0dc19990 fffff880`03bd2828 : nt!IopSynchronousCall+0xe1
fffff880`03bd2800 fffff800`03675d78 : fffffa80`09196e00 fffffa80`070bfa00 00000000`0000030a 00000000`00000308 : [COLOR=#ff0000]nt!IopRemoveDevice+0x101[/COLOR]
fffff880`03bd28c0 fffff800`0367aee7 : fffffa80`0dc19990 00000000`00000000 00000000`00000003 00000000`00000136 : [COLOR=#ff0000]nt!PnpSurpriseRemoveLockedDeviceNode+0x128[/COLOR]
fffff880`03bd2900 fffff800`0367b000 : 00000000`00000000 fffff8a0`11d1c000 fffff8a0`049330d0 fffff880`03bd2a58 : nt!PnpDeleteLockedDeviceNode+0x37
fffff880`03bd2930 fffff800`0370b97f : 00000000`00000002 00000000`00000000 fffffa80`09122010 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xa0
fffff880`03bd29a0 fffff800`0370c53c : fffff880`03bd2b78 fffffa80`114ab700 fffffa80`07005600 fffffa80`00000000 : [COLOR=#ff0000]nt!PnpProcessQueryRemoveAndEject+0x6cf[/COLOR]
fffff880`03bd2ae0 fffff800`035f573e : 00000000`00000000 fffffa80`114ab7d0 fffff8a0`123a25b0 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x4c
fffff880`03bd2b10 fffff800`03298261 : fffff800`034f9f88 fffff8a0`11d1c010 fffff800`034342d8 fffff800`034342d8 : nt! ?? ::NNGAKEGL::`string'+0x54d9b
fffff880`03bd2b70 fffff800`0352b2ea : 00000000`00000000 fffffa80`07005660 00000000`00000080 fffffa80`06f8d890 : nt!ExpWorkerThread+0x111
fffff880`03bd2c00 fffff800`0327f8e6 : fffff880`03965180 fffffa80`07005660 fffff880`0396ffc0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`03bd2c40 00000000`00000000 : fffff880`03bd3000 fffff880`03bcd000 fffff880`03bd2410 00000000`00000000 : nt!KxStartSystemThread+0x16
As we can see in the call stack for the thread, there is two device removal functions, which in turn supports our argument from earlier on. More importantly, if we check the pending IRP for the current thread, we'll find some more evidence relating to PnP and the third party driver shown in our call stack.
Code:
0: kd> [COLOR=#008000]!irp fffffa8008f5cc10[/COLOR]
Irp is active with 10 stacks 10 is current (= 0xfffffa8008f5cf68)
No Mdl: No System Buffer: Thread fffffa8007005660: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ [COLOR=#008000]1b[/COLOR],[COLOR=#ff0000]17[/COLOR]] 0 0 fffffa800c089050 00000000 00000000-00000000
[COLOR=#0000cd]\Driver\ZTEusbnet[/COLOR]
Args: 00000000 00000000 00000000 00000000
The Major Function code corresponds to IRP_MJ_PNP, and the Minor Function Code corresponds to IRP_MN_SURPRISE_REMOVAL. These IRP function codes can be found within the WinDbg documentation, and an explanation can be found on the appropriate MSDN pages. I will briefly explain these function codes, and how they link in with the nature of the bugcheck. PnP IRPs are handled by DispatchPnP driver dispatch rountine.
The Minor Function Code is used to notify important Kernel components and User Mode applications that the device is being removed from the system, and no other IRPs will be able to be sent to that device. The IRP is exclusively reserved for the system, and thus no other drivers should be sending the IRP. The IRP is processed from the top layer of the device stack down to the bottom layer of the device stack.
By checking the IRP, the call stack makes much more sense to us, at the moment the thread is currently waiting on a lock. We need to investigate what this lock is, and why these locks are required. Using the !locks extension, we can list all the currently acquired locks on executive resources (_ERESOURCE).
Code:
0: kd> [COLOR=#008000]!locks[/COLOR]
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks..
Resource @ nt!IopDeviceTreeLock (0xfffff80003492ce0) Shared 1 owning threads
Contention Count = [COLOR=#0000cd]1[/COLOR]
Threads: [COLOR=#ff0000]fffffa8007005660-01[/COLOR]<*>
KD: Scanning for held locks.
Resource @ nt!PiEngineLock (0xfffff80003492be0) Exclusively owned
Contention Count = 21
NumberOfExclusiveWaiters = 1
Threads: [COLOR=#ff0000]fffffa8007005660-01[/COLOR]<*>
Threads Waiting On Exclusive Access:
fffffa800f308b50
KD: Scanning for held locks.....
The current thread holding both locks is our PnP thread. The most interesting lock is the IopDeviceTreeLock. The other thread waiting on the PiEngineLock is a kernel related thread which is handling other device removal operations and power transitions.
Instead of going in depth about the various locks and synchronization mechanisms used in Windows, it would be better to explain why the thread is using a lock to begin with. The answer with our PnP device.
The main reason of acquiring the lock is to ensure that the driver will not use DispatchPnP to handle a IRP_MN_REMOVE_DEVICE IRP while another driver is performing an I/O operation at the same time. Furthermore, by using these lock routines, the driver can determine when all I/O operations have been completed for that device, and therefore when it is safe to remove that device.
The IO_REMOVE_LOCK illustrates a counter which is used to check when all the pending I/O operations have been completed for that device. Once the I/O operations have been completed, then the device can be safely removed from the system. A safe removal of the device is indicated with the IoRemoveLockAndWait. The IO_REMOVE_LOCK is destroyed when the device is removed from the system.
In conclusion, the ZTEusbnet.sys driver seems like the most likely culprit, it is a USB NDIS Miniport driver developed by a company called ZTE.
Code:
0: kd> [COLOR=#008000]lmvm ZTEusbnet[/COLOR]
start end module name
fffff880`0ae8e000 fffff880`0aebc000 ZTEusbnet (no symbols)
Loaded symbol image file: ZTEusbnet.sys
Image path: \SystemRoot\system32\DRIVERS\ZTEusbnet.sys
Image name: ZTEusbnet.sys
Timestamp: [COLOR=#ff0000]Mon Oct 13 06:50:10 2008[/COLOR] (48F2E192)
CheckSum: 000329ED
ImageSize: 0002E000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Commands and Extensions
This a quick overview of the extensions and commands which can be used for debugging Stop 0x9F. Some of the extensions are added for informational purposes i.e. you wish to learn a little about the nature of the device etc.
!irp, !irpfind, !devext, !devobj, !drvobj, !popolicy, !timer, !ms_timer, !thread, !stacks, !devnode,
!devstack, !podev, !poaction, !popolicy, !poreqlist, !locks and !pnptriage.
Data Structures
*Discussed in article
_IO_TIMER*, _DEVICE_OBJECT*, _DRIVER_OBJECT, _KTIMER*, _KTIMER_TABLE_ENTRY*, _KTIMER_TABLE*, _SYSTEM_POWER_STATE, _DEVICE_POWER_STATE, _SYSTEM_POWER_POLICY, _IO_STATUS_BLOCK*, _TRIAGE_9F_POWER
References:
Managing PnP State Transistions
Using Remove Locks
Debugging Stop 0x9F - CodeMachine
Which PnP IRPs are State Changing? - Hole In My Head
A Performance Issue in Windows Timer Management?
Ain't Nuthin But A K(Timer) Thing, Baby
Power IRPs for Individual Devices
I/O Stack Locations
Last edited: