Bit Flips

Vir Gnarus

BSOD Kernel Dump Expert
Joined
Mar 2, 2012
Posts
474
Hi fellas, a bit of a short one this time, but worth mentioning. It's pretty much a copy-and-paste from this thread, with some extra explanations and modifications. BSODs are attached.

Sometimes you struggle to find a pattern, and whatever you do find only confuses you further, or you find no pattern at all and are left without clues. This is one instance where scrutinizing the registers pays off: a simple pattern like this one can be found where all other clues are missing.

In this case, the crashes at first glance seemed to exhibit wild behavior. The faulting stacks showed the error occurring in nearly any module and at any time, so I initially suspected either a driver corrupting memory that innocent drivers later tripped over, or hardware.

Digging deeper, I noticed that practically all the crashes reported an unhandled exception. The exception was typically an access violation (c0000005), and it was always reported as an attempt to read address 0xffffffffffffffff. As an example (which I'll use throughout this post), here is a snippet of the !analyze -v output for one of the crashdumps:

Code:
EXCEPTION_RECORD:  fffff800043df638 -- (.exr 0xfffff800043df638)
ExceptionAddress: fffff88004152b3d (dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x000000000000007d)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: ffffffffffffffff
Attempt to read from address [COLOR=#ff0000]ffffffffffffffff[/COLOR]

TRAP_FRAME:  fffff800043df6e0 -- (.trap 0xfffff800043df6e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=[COLOR=#ff0000]ff7ffa800a0a5ed0 [/COLOR]rbx=0000000000000000 rcx=fffffa80077b4470
rdx=[COLOR=#006400]fffffa800a3cd820 [/COLOR]rsi=0000000000000000 rdi=0000000000000000
rip=fffff88004152b3d rsp=fffff800043df870 rbp=fffffa8007270d50
 r8=000000000000008e  r9=fffffa80077b6d58 r10=fffffa80092707d0
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
dxgmms1!VIDMM_GLOBAL::UnreferenceDmaBuffer+0x7d:
fffff880`04152b3d f0834018ff      lock add dword ptr [rax+18h],0FFFFFFFFh ds:[COLOR=#ff0000][I]ff7ffa80`0a0a5ee8[/I][/COLOR]=????????

Look at the bad address it's trying to read in the exception record, then look at the very bottom under the register list where it shows you the actual instruction trying to read that address. As you notice, the actual address it was trying to read (italicized for emphasis) doesn't look anywhere near the obviously bad address that the exception record shows. In fact this one looks rather legit.

At first I was completely baffled by this, because the exception record always reported that bad address even though the instruction wasn't actually pointing to it. However, upon careful scrutiny, I found a discrepancy. Look at the instruction, which is lock add dword ptr [rax+18h]. It takes the value stored in the rax register, adds 18h to it, and uses the result as a pointer to the data it wants to deal with. Now, look at the rax register, which is ff7ffa800a0a5ed0. Compare it to the other registers that hold similar-looking addresses, like rdx and rcx. Notice anything odd? That's right, for some reason a 7 managed to show up in the leading portion of the address, making it FF7FF as opposed to the others, which are FFFFF. Why is it there? Well, let's take a gander using the .formats command to evaluate these numbers in other formats and compare again (note the bold digits in Binary):
Code:
2: kd> 0: kd> [COLOR=Blue].formats [/COLOR][COLOR=Red]ff7ffa800a0a5ed0[/COLOR][COLOR=Blue];.formats [/COLOR][COLOR=Green]fffffa800a3cd820[/COLOR]
Evaluate expression:
  Hex:     ff7ffa80`0a0a5ed0
  Decimal: -36034844164464944
  Octal:   1775777650001202457320
  Binary:  11111111 [COLOR=Red][B]0[/B][/COLOR]1111111 11111010 10000000 00001010 00001010 01011110 11010000
  Chars:   .....^.
  Time:    ***** Invalid FILETIME
  Float:   low 6.66229e-033 high -3.40254e+038
  Double:  -1.4035e+306
Evaluate expression:
  Hex:     fffffa80`0a3cd820
  Decimal: -6047142193120
  Octal:   1777777650001217154040
  Binary:  11111111 [COLOR=Green][B]1[/B][/COLOR]1111111 11111010 10000000 00001010 00111100 11011000 00100000
  Chars:   .....<. 
  Time:    ***** Invalid FILETIME
  Float:   low 9.09252e-033 high -1.#QNAN
  Double:  -1.#QNAN

This is a classic case of a bit flip, a situation in which a single bit has been inadvertently flipped (here from 1 to 0) for what seems to be no reason. If you examine any of the other crashes, this shows up pretty much all the time (though not always in the same register). Now, if a long string of bits had been changed, we could possibly attribute that to a driver passing a bad address into the register, or to other hardware malfunctioning, like the HD or RAM (most likely). But in cases like this, where only 1 bit is flipped, I've only found it being caused by PSU, mobo or CPU problems, with the CPU being the most likely cause.
 

It seems that the general protection fault handler always raises a c0000005 exception with "Attempt to read from address ffffffffffffffff". Try "int 8" in user mode for an example. But why does an invalid memory access raise a GPF instead of page fault in the flat paged memory model of Windows?
 
This is one of those questions where I remember getting the answer somewhere, but I don't remember the answer itself. I believe it has a lot to do with the actual address: Windows embeds safeties to detect attempts to access memory through a null reference, so it may hit a different exception routine for this than the one hit for an address that is addressable but isn't legit (or is accessed at the wrong IRQL). It just assumes that there's no legitimate reason to access address 0000000000000000 or ffffffffffffffff, and that it only occurred because of a null pointer.
 
Sorry to bring up an old thread, but which two addresses did you use with the formats command? I couldn't see them anywhere in the above examples you provided, or maybe I just missed it completely :huh:
 
Doh! Thanks for catching that. Actually what happened was I incidentally demonstrated the .formats command from another crashdump the client provided that displayed identical symptoms.

In fact, I also noticed that the crashdumps aren't attached after all! I'll see if I can scrounge them up and make the corrections/additions.
 
Thanks a mil for that, mate. Though, I'm not sure that applies to the specific case of a null pointer dereference, which involves either an all-zero or all-F's address, because both of those are canonical addresses. Then again, in this case neither of those addresses is the one that was really accessed; rather, the access went through ff7ffa800a0a5ed0, which is indeed noncanonical.
 
Code:
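int main(void)
{
    /* Noncanonical address taken from rax in the crash dump. */
    volatile int* a = (volatile int*)0xff7ffa800a0a5ed0;
    *a = 5; /* Deliberately invalid write; expected to raise an access violation. */
    return 0;
}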

av.PNG
 
Thanks. Have you checked to see if the same occurs during access to a canonical address?
 