0x19 BAD_POOL_HEADER Pool Party!

Vir Gnarus

BSOD Kernel Dump Expert
Joined
Mar 2, 2012
Posts
474
Hi guys,

I'm currently working with an individual in one of the other forums on a particular case and figured I'd share. At the time of writing this I'm not complete with my analysis but I have scrounged up enough to deem it worth posting here. I'll update with any new discoveries. Understand that this article covers a number of Windbg extensions and commands that I've covered one way or another in previous articles so I decided not to repeat myself on describing their purpose again here. If you still have questions, ask away. As always, dump file is attached below.

To continue the introduction by providing some context, the client started by mentioning that after a month of purchasing a system he was getting BSODs. I'm sure all of you - as well as I - will immediately suspect hardware, which is the current speculation. The fellow also went ahead and tried Driver Verifier, but it wasn't showing up any hits, which further implicates hardware. Without is is also showing up some relatively scattered crashes with no strong patterns. Hardware tests however were not very successful either so far, so the culprit is becoming rather elusive. Rather than try to continue with the usual process of elimination, I decided to get adventurous and trek through one of the fella's crashdumps, which was a 0x19 bugcheck, or BAD_POOL_HEADER.

Now, usually these minidumps - like all minidumps - typically don't harbor the data we seek, as they are very fickle creatures. Yet it was our fortunate luck that we stumbled upon one that was willing to cooperate and provide us info on the pool actually causing the problem. So therefore our journey begins.

The following is a copy-paste of my original post to the client to describe the situation to fellow analysts:

_____

[/B]Strangely, and with considerable luck, the minidump for the 0x19 crash did indeed retain the pool information we need to look into this further. Start with !analyze -v:

Code:
6: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: [COLOR=Sienna]fffff8a00feb0390[/COLOR], The pool entry we were looking for within the page.
Arg3: fffff8a00feb03c0, The next pool entry.
Arg4: 0000000005030115, (reserved)

Debugging Details:
------------------

TRIAGER: Could not open triage file : C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\triage\modclass.ini, error 2

BUGCHECK_STR:  0x19_20

POOL_ADDRESS:  fffff8a00feb0390 Paged pool

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

PROCESS_NAME:  svchost.exe

CURRENT_IRQL:  0

LOCK_ADDRESS:  fffff80002c8db80 -- (!locks fffff80002c8db80)

Resource @ nt!PiEngineLock (0xfffff80002c8db80)    Available

WARNING: SystemResourcesList->Flink chain invalid. Resource may be corrupted, or already deleted.


WARNING: SystemResourcesList->Blink chain invalid. Resource may be corrupted, or already deleted.

1 total locks

PNP_TRIAGE: 
    Lock address  : 0xfffff80002c8db80
    Thread Count  : 0
    Thread address: 0x0000000000000000
    Thread wait   : 0x0

LAST_CONTROL_TRANSFER:  from fffff80002bbccae to fffff80002a91fc0

STACK_TEXT:  
fffff880`09521d28 fffff800`02bbccae : 00000000`00000019 00000000`00000020 fffff8a0`0feb0390 fffff8a0`0feb03c0 : nt!KeBugCheckEx
fffff880`09521d30 fffff800`02d67663 : fffff8a0`019364d8 fffff8a0`019364a8 fffff8a0`624e4d43 fffff880`09522440 : nt!ExDeferredFreePool+0x12da
fffff880`09521de0 fffff800`02d583a8 : fffff8a0`00024010 fffff8a0`00024010 fffff880`09521f01 fffff880`09522440 : nt!CmpGetSymbolicLink+0x2b3
fffff880`09521ee0 fffff800`02d883d8 : fffffa80`07883cc8 fffff800`02a13000 fffffa80`07883b10 fffff800`00000000 : nt!CmpParseKey+0x948
fffff880`095221e0 fffff800`02d895f6 : 00000000`00000000 fffffa80`07883b10 fffff8a0`0ff09060 fffffa80`067047a0 : nt!ObpLookupObjectName+0x588
fffff880`095222d0 fffff800`02d3f088 : fffff880`030fb180 00000000`00000000 00000000`00000000 fffff880`095224a8 : nt!ObOpenObjectByName+0x306
fffff880`095223a0 fffff800`02d3f926 : fffff880`095227b8 fffff880`000f003f fffff880`095227d0 fffff800`02cc4600 : nt!CmCreateKey+0x2e1
fffff880`09522510 fffff800`02a91253 : ffffffff`ffffffff 00000000`00000000 fffff6fc`50053698 fffff800`02ab4038 : nt!NtCreateKey+0x2e
fffff880`09522560 fffff800`02a8d810 : fffff800`02cfca66 00000000`00000001 20207050`02a4b5d8 00000000`000000b5 : nt!KiSystemServiceCopyEnd+0x13
fffff880`09522768 fffff800`02cfca66 : 00000000`00000001 20207050`02a4b5d8 00000000`000000b5 fffff880`095229e0 : nt!KiServiceLinkage
fffff880`09522770 fffff800`02cf96b2 : 00000000`00000001 00000000`00000002 00000000`00000002 00000000`000f003f : nt!IopCreateRegistryKeyEx+0x7e
fffff880`09522840 fffff800`02cfa081 : 00000000`00000002 00000000`00000000 fffff880`09522a40 00000000`00000001 : nt!IopGetDeviceInterfaces+0x146
fffff880`095229b0 fffff800`02cf9351 : 00000000`00000002 00000000`00000000 00000000`00000000 fffff8a0`09334bd0 : nt!PiGetInterfaceDeviceList+0x41
fffff880`09522a10 fffff800`02d521a0 : fffff8a0`09334bd0 fffff800`00008000 fffff880`09522b01 fffff800`02f59da0 : nt!PiControlGetInterfaceDeviceList+0x111
fffff880`09522a90 fffff800`02a91253 : fffffa80`096fc060 00000000`013ee880 fffff880`09522b60 00000000`013ee908 : nt!NtPlugPlayControl+0x100
fffff880`09522ae0 00000000`77c7236a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`013ee848 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77c7236a


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!ExDeferredFreePool+12da
fffff800`02bbccae cc              int     3

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!ExDeferredFreePool+12da

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  503f82be

FAILURE_BUCKET_ID:  X64_0x19_20_nt!ExDeferredFreePool+12da

BUCKET_ID:  X64_0x19_20_nt!ExDeferredFreePool+12da

Followup: MachineOwner
---------
It's stating the header for a pool allocation has got corrupted. Each allocation of pool memory (driver/kernel reserved memory) has a header that has the pool tag which describes the purpose of the allocation - given by the driver allocating it - and some other details like the size of it. The subcode tells us that the header size defined in a pool header is incorrect and has been corrupted. Often this is caused by a driver mishandling its pool allocation and overflowing its allocated pool, which ends up spilling into the nearby pool allocation with its contents. We'll want to look at the pool table for this to see what we got, using the pool address mentioned:

Code:
6: kd> [COLOR=Blue]!pool[/COLOR] [COLOR=Sienna]fffff8a00feb0390[/COLOR]
Pool page fffff8a00feb0390 region is Paged pool
 fffff8a00feb0000 size:  110 previous size:    0  (Allocated)  IoNm
 fffff8a00feb0110 size:   40 previous size:  110  (Allocated)  MmSm
 fffff8a00feb0150 size:   40 previous size:   40  (Allocated)  MmSm
[COLOR=DarkSlateBlue] fffff8a00feb0190 size:   70 previous size:   40  (Free )  CMNb (Protected)[/COLOR]
 fffff8a00feb0200 size:   40 previous size:   70  (Allocated)  MmSm
 fffff8a00feb0240 size:  150 previous size:   40  (Allocated)  NtFs
[COLOR=Purple]*fffff8a00feb0390 size:   30 previous size:  150  (Free ) *CMNb (Protected)
        Pooltag CMNb : Configuration Manager Name Tag, Binary : nt!cm[/COLOR]
[COLOR=Red]
fffff8a00feb03c0 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...

GetUlongFromAddress: unable to read from fffff80002c2fa38
fffff8a00feb03c0 is not a valid small pool allocation, checking large pool...
unable to get pool big page table - either wrong symbols or pool tagging is disabled
fffff8a00feb03c0 is freed (or corrupt) pool
Bad previous allocation size @fffff8a00feb03c0, last size was 3

***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval fffff8a00feb0000 for more details.


Pool page [ fffff8a00feb0000 ] is __inVALID.

Analyzing linked list...
[ fffff8a00feb0390 --> fffff8a00feb0400 (size = 0x70 bytes)]: Corrupt region
[/COLOR]

Scanning for single bit errors...

None found
The highlighted pool region is the allocation made that has the fault. It will tell you the size (which is corrupted) and the size of the allocation made before it. This is rather misleading so make sure you're aware that previous size doesn't mean the size of the region we're looking at in its 'previous state', but it's only referring to the size of the allocation before it, which is tagged NtFs and is for an NTFS allocation. For this one, the tag is CMNb, which it explains is the Configuration Manager Name Tag (we wouldn't get this info if it wasn't for pooltag.txt in Windbg directory). It also gives you the driver that allocated it, which is nt!cm, as in the Conf Manager for the NT module. Consider these elements when looking at future pool regions, but for now the only element we're interested in is the size, which says it's 0x30.

Hmm, 0x30 seems rather odd given that the previous region tagged CMNb in the list was valued as 70 and was also recently freed, probably along with this one. Curious. We'll compare the two regions by dumping their contents based on their specified sizes. I usually use dc instead of dps first on pool allocation since strings are often the best clues to gain from em. Here they are:

Code:
6: kd> [COLOR=Blue]dc[/COLOR] [COLOR=DarkSlateBlue]fffff8a00feb0190 fffff8a00feb0190+70[/COLOR]
fffff8a0`0feb0190  05070104 e24e4d43 09d68b30 fffffa80  ....CMN.0.......
fffff8a0`0feb01a0  00000000 00000000 0feb01b0 fffff8a0  ................
fffff8a0`0feb01b0  0052005c 00470045 00530049 00520054  \.R.E.G.I.S.T.R.
fffff8a0`0feb01c0  005c0059 0041004d 00480043 004e0049  Y.\.M.A.C.H.I.N.
fffff8a0`0feb01d0  005c0045 00590053 00540053 004d0045  E.\.S.Y.S.T.E.M.
fffff8a0`0feb01e0  0043005c 006e006f 00720074 006c006f  \.C.o.n.t.r.o.l.
fffff8a0`0feb01f0  00650053 00300074 00310030 00000000  S.e.t.0.0.1.....
fffff8a0`0feb0200  03040107                             ....
6: kd> [COLOR=Blue]dc[/COLOR] [COLOR=Purple]fffff8a00feb0390 fffff8a00feb0390+30[/COLOR]
fffff8a0`0feb0390  05030115 e24e4d43 00000000 00000000  ....CMN.........
fffff8a0`0feb03a0  004c004c 00000000 0feb03b0 fffff8a0  L.L.............
fffff8a0`0feb03b0  0052005c 00470045 00530049 00520054  \.R.E.G.I.S.T.R.
fffff8a0`0feb03c0  005c0059                             Y.\.
Hmm, without a doubt the corrupt region looks to be cut off midway. Let's see what it looks like if we give it the 'normal' size of 0x70:

Code:
6: kd> [COLOR=Blue]dc[/COLOR] [COLOR=Purple]fffff8a00feb0390 fffff8a00feb0390+70[/COLOR]
fffff8a0`0feb0390  05030115 e24e4d43 00000000 00000000  ....CMN.........
fffff8a0`0feb03a0  004c004c 00000000 0feb03b0 fffff8a0  L.L.............
fffff8a0`0feb03b0  0052005c 00470045 00530049 00520054  \.R.E.G.I.S.T.R.
fffff8a0`0feb03c0  005c0059 0041004d 00480043 004e0049  Y.\.M.A.C.H.I.N.
fffff8a0`0feb03d0  005c0045 00590053 00540053 004d0045  E.\.S.Y.S.T.E.M.
fffff8a0`0feb03e0  0043005c 006e006f 00720074 006c006f  \.C.o.n.t.r.o.l.
fffff8a0`0feb03f0  00650053 00300074 00310030 fffff880  S.e.t.0.0.1.....
fffff8a0`0feb0400  03040104                             ....
Lookin pretty. Now let's look at each header by dumping the headers as data structures. I assumed the name was going to be _POOL_HEADER and sure enough, I was correct:

Code:
6: kd> [COLOR=Blue]dt !_POOL_HEADER[/COLOR]
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x000 BlockSize        : Pos 16, 8 Bits
   +0x000 PoolType         : Pos 24, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B
6: kd>[COLOR=Blue] dt !_POOL_HEADER[/COLOR] [COLOR=DarkSlateBlue]fffff8a00feb0190[/COLOR]
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000100 (0x4)
   +0x000 PoolIndex        : 0y00000001 (0x1)
   +0x000 BlockSize        : 0y00000111 (0x7)
   +0x000 PoolType         : 0y00000101 (0x5)
   +0x000 Ulong1           : 0x5070104
   +0x004 PoolTag          : 0xe24e4d43
   +0x008 ProcessBilled    : 0xfffffa80`09d68b30 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x8b30
   +0x00a PoolTagHash      : 0x9d6
6: kd> [COLOR=Blue]dt !_POOL_HEADER[/COLOR] [COLOR=Purple]fffff8a00feb0390[/COLOR]
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00010101 (0x15)
   +0x000 PoolIndex        : 0y00000001 (0x1)
   +0x000 BlockSize        : 0y00000011 (0x3)
   +0x000 PoolType         : 0y00000101 (0x5)
   +0x000 Ulong1           : 0x5030115
   +0x004 PoolTag          : 0xe24e4d43
   +0x008 ProcessBilled    : (null) 
   +0x008 AllocatorBackTraceIndex : 0
   +0x00a PoolTagHash      : 0
Aside from the manipulated BlockSize value which is the corrupted value, I'm initially suspicious about the lack of values in the other ones like ProcessBilled, AllocatorBackTraceIndex & PoolTagHash. However research hasn't really come up with anything solid on the last two, and ProcessBilled appears to be only necessary under the condition the driver requesting the pool allocation didn't provide a pool tag. So I'm not confident we're dealing with any issues with those items there, to my current knowledge. So at the moment, unless I'm completely unaware, we don't appear to see any other problems in the header besides the pool size getting messed up.

So far we're looking at just the 0x30 value being erroneous. Now, when you look at it in binary and compare it to 0x70, you'll see that there's only a solitary bit difference between the two:

Code:
6: kd> [COLOR=Blue].formats 30[/COLOR]
Evaluate expression:
  Hex:     00000000`00000030
  Decimal: 48
  Octal:   0000000000000000000060
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 0[COLOR=Red]0[/COLOR]110000
  Chars:   .......0
  Time:    Wed Dec 31 19:00:48 1969
  Float:   low 6.72623e-044 high 0
  Double:  2.37152e-322
6: kd> [COLOR=Blue].formats 70[/COLOR]
Evaluate expression:
  Hex:     00000000`00000070
  Decimal: 112
  Octal:   0000000000000000000160
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 0[COLOR=Red]1[/COLOR]110000
  Chars:   .......p
  Time:    Wed Dec 31 19:01:52 1969
  Float:   low 1.56945e-043 high 0
  Double:  5.53354e-322
So my suspicion weighs rather heavy on there being a bit flip here. Everything else just looks too clean to have it be otherwise. Now just to make sure, we're going to dump a little further more than the end of the corrupt pool allocation to see if there's anything that matches up. I would suspect the pool allocation after this should actually have the valid PreviousSize, which should be 0x70 instead of 0x30 (actually, 0x7 instead of 0x3, since block sizes are counted by 0x10's). Let's give a look-see:

Code:
6: kd> [COLOR=#0000cd]dc[/COLOR] [COLOR=#800080]fffff8a00feb0390 fffff8a00feb0390+100[/COLOR]
fffff8a0`0feb0390  05030115 e24e4d43 00000000 00000000  ....CMN.........
fffff8a0`0feb03a0  004c004c 00000000 0feb03b0 fffff8a0  L.L.............
fffff8a0`0feb03b0  0052005c 00470045 00530049 00520054  \.R.E.G.I.S.T.R.
fffff8a0`0feb03c0  005c0059 0041004d 00480043 004e0049  Y.\.M.A.C.H.I.N.
fffff8a0`0feb03d0  005c0045 00590053 00540053 004d0045  E.\.S.Y.S.T.E.M.
fffff8a0`0feb03e0  0043005c 006e006f 00720074 006c006f  \.C.o.n.t.r.o.l.
fffff8a0`0feb03f0  00650053 00300074 00310030 fffff880  S.e.t.0.0.1.....
[COLOR=RoyalBlue]fffff8a0`0feb0400 [/COLOR] 03040104 6d536d4d 00000000 00000000  ....MmSm........  [COLOR=DarkGreen]//Notice the 'MmSm' pooltag? This is where this allocation starts.[/COLOR]
fffff8a0`0feb0410  068882d0 fffffa80 00000040 000c0000  ........@.......
fffff8a0`0feb0420  00000000 00000000 00040000 00000000  ................
fffff8a0`0feb0430  00000000 00000000 00000000 00000000  ................
fffff8a0`0feb0440  03140104 7346744e 00000000 00000000  ....NtFs........
fffff8a0`0feb0450  000009df 00010000 00000000 00000000  ................
fffff8a0`0feb0460  00000000 00000000 00000000 00000000  ................
fffff8a0`0feb0470  00000000 00000000 00000000 00000000  ................
fffff8a0`0feb0480  00000000 00000000 00000000 00000000  ................
fffff8a0`0feb0490  0050013b                             ;.P.

6: kd> [COLOR=Blue]dt !_POOL_HEADER[/COLOR] [COLOR=RoyalBlue]fffff8a0`0feb0400[/COLOR]
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000100 (0x4)
   +0x000 PoolIndex        : 0y00000001 (0x1)
   +0x000 BlockSize        : 0y00000100 (0x4)
   +0x000 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3040104
   +0x004 PoolTag          : 0x6d536d4d
   +0x008 ProcessBilled    : (null) 
   +0x008 AllocatorBackTraceIndex : 0
   +0x00a PoolTagHash      : 0
What?! This is where the monkey wrench is thrown in. It doesn't say 0x3, and it doesn't say 0x7, but instead it says previous size is 0x4, or 0x40. This isn't right! I went ahead and checked the next allocation after this one and that one correctly gave PreviousSize of 0x4, which is the size of the allocation we're looking at now. So it wasn't that the whole pool table was messed up all the way down. Something's odd here.

TBH, this is the item here that had me pretty stumped. Part of me thinks that a little bit of math was involved in calculating the previous size, since 0x3 and 0x4 add up to 0x7 which is the correct amount, it might have thought that there was another allocation sitting here. In fact, upon pondering on it further, there probably was!

See, what it looks like now, is that this wasn't because the pool allocation for the Config. Manager happened to have been originally 0x70 and all of a sudden was reduced to 0x30. Rather, the original value was probably in fact 0x30! So the Configuration Manager erroneously allocated 30 bytes for this region when it was supposed to be 70, and then it indeed overflowed by shoving that long string of text into it, which spilled over and completely obliterated the pool allocation made after it.

This is where the plot thickens. Now the problem is, we need to know where on earth the Config. Manager decided to allocate 30 bytes instead of 70 like its previous allocations. When dumping the raw stack of this faulting thread I do see that before the attempt to free the pool (which, btw, is the only time when Windows - without Driver Verifier - checks for pool corruption) allocation that has the erroneous size, that this thread did previously allocate some pool (e.g. nt!MiAllocatePagedPoolPages) so now I have to figure out if this is the same pool allocation we're dealing with, and if so when it gave the request to allocate 30 bytes instead of 70. This is where things get pretty heavy, and where I seriously doubt I'll be able to pull much from this minidump, but I'm willing to give it a shot.

That's all I have for this right now, but my suspicion still rests on hardware problems. The fact it's only a bit away from being correct is just too close to be a suspected driver bug, and TBH I'm not sure any third party driver was bugging here, since this looks like it was all on the Config. Manager (which, btw, is the subsystem in Windows that implements the registry, hence the registry keys in the pool allocations). It just looks too much like a hardware fluke right now, so I still recommend approaching it with that hypothesis in mind.

To be continued...
 

Attachments

Hi fellas,

Continuing on my venture into this, I've come to the conclusion that unfortunately we won't be able to gain much more from this. From what is displayed in the raw thread stack I haven't been able to discover the erroneous allocation made, and I don't believe it will actually show up on this thread's context, in that the pool was allocated too far back in time and/or from another thread. This is one of those moments where a Driver Verifier crashdump followed by looking at the pool allocation logs generated by DV using !verifier 82 (that's the 0x2 and 0x80 switches). However, my suspicion of hardware problems has so far indeed shown accurate, in that replacing the RAM has so far stabilized the system. While this was an unfortunate case where we were too late (crash happened well after the bad pool allocation) and without the right tools (Driver Verifier), I hope people are willing to at least venture this far and use it as a means to solidify their understanding of how this all operates.

Again, any questions are welcome. Sorry for the cliffhanger guys, I wish I had more to work with than a minidump!
 
Back
Top