I am in despair! Persistent authenticamd.sys BSODs

CheekyKid · Jun 28, 2024

Hello everyone!

I am hoping someone can help me find out what is causing all the BSODs. They happen randomly, while my PC is under no stress. Just being on the desktop, on YouTube, or in any other app can result in a blue screen. This happens once a day, or it may take longer, but it will happen! My CPU temps are fine, so no issues there, and I have run the Windows Memory Diagnostic (extended) overnight and it finished without any errors. All my drivers are up to date, including the motherboard's. All drivers were taken from the manufacturer, not from random websites. Even with a clean installation of Windows 11 I still get this blue screen. I simply cannot tell what it could be and I don't want to start buying new hardware only for tests. File attached.

Thanking you a lot in advance!

ubuysa · Jun 29, 2024

Hello, and welcome to the forum!

One BSOD is a 0x133 DPC_WATCHDOG_VIOLATION that may have been caused by your Nvidia graphics driver (or the graphics card itself). The version of nvlddmkm.sys (the Nvidia graphics driver) that you have installed is old, dating from 2020...

Code:

0: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff806`4c8a0000 fffff806`4e813000   nvlddmkm T (no symbols)          
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_1c83a5d7cffd7bff\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Thu Oct  1 07:08:42 2020 (5F75564A)
    CheckSum:         01F01EC5
    ImageSize:        01F73000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

That may well be the driver version available from your motherboard vendor's website but for a desktop card you should be using the Nvidia driver download site and install the latest driver from there (it's dated 27th June 2024). It's possible that this may be the source of all your problems, so do this first and then see how things go.
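As a cross-check on the driver's age, the hex value WinDbg prints in brackets after the timestamp can be decoded by hand. This is a minimal sketch, assuming the field holds a real build time in seconds since 1970 (some newer binaries store a hash there instead); the function name `decode_pe_timestamp` is made up for illustration.

```python
from datetime import datetime, timezone


def decode_pe_timestamp(hex_stamp: str) -> datetime:
    """Convert the hex value shown in brackets by 'lmvm' to a UTC datetime.

    Assumption: the value is a Unix timestamp (seconds since 1970-01-01 UTC).
    """
    seconds = int(hex_stamp, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Value taken from the nvlddmkm.sys output above.
    print(decode_pe_timestamp("5F75564A").isoformat())
    # prints 2020-10-01T04:08:42+00:00
```

WinDbg shows the same instant in the analyst's local time zone, which is why the output above reads 07:08:42 rather than 04:08:42.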

The other four dumps are all 0x124 WHEA_UNCORRECTABLE_ERROR bugchecks. As you might imagine, these are typically caused by a hardware failure, although they can sometimes be caused by a bad driver. All of the dumps show the same failure bucket...

Code:

FAILURE_BUCKET_ID:  0x124_0_AuthenticAMD_MEMORY__UNKNOWN_FATAL_IMAGE_AuthenticAMD.sys

These suggest that the failing hardware may well be either RAM or possibly the CPU. Evidence from your System and Application logs makes me think that bad RAM is more likely, I can see a couple of WHEA errors in your System log...

Code:

Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          18/06/2024 22:34:34
Event ID:      18
Task Category: None
Level:         Error
Keywords:    
User:          LOCAL SERVICE
Computer:      AMD
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 8

The details view of this entry contains further information.

The Cache Hierarchy Error type reported there could be bad RAM or the CPU cache, but in your Application log there are a fair number of Application errors with 0xC0000005 exception codes; this is an access violation, i.e. an invalid memory access. These could also be CPU of course, but bad RAM is more likely and it's easier to test.
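If you want to pull the key fields out of a copied event like the one above, something along these lines would do. It is only a sketch that assumes the event has been pasted as plain "Name: value" text, as shown in this thread; `parse_event_text` is a hypothetical helper, not part of any Windows tool.

```python
SAMPLE_EVENT = """\
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          18/06/2024 22:34:34
Event ID:      18
Level:         Error
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 8
"""


def parse_event_text(text: str) -> dict[str, str]:
    """Collect 'Name: value' lines from pasted Event Viewer text into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 22:34:34 stay intact.
        name, colon, value = line.partition(":")
        if colon and name.strip():
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    event = parse_event_text(SAMPLE_EVENT)
    print(event["Source"], event["Event ID"], event["Error Type"])
```

Lines without a colon (such as the free-text description) are simply skipped, which is good enough for comparing the error type and APIC ID across several events.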

I can see that you have 32GB of RAM in two 16GB sticks. You could run Memtest86 over that RAM - the Windows memory tester is not very thorough at all - but that will take a long time and you won't be able to use the PC whilst Memtest86 is running. A better, and far more reliable, RAM test is to remove one stick of RAM and run on just the one 16GB stick for a few days (or until you get a BSOD). Check with your motherboard manual that the one stick is in the correct slot (typically A2). After a few days (or a BSOD) swap the RAM sticks over and run on just the other 16GB stick for a few days (or until you get a BSOD).

If it BSODs on one stick but not the other then you've located the problem. If it BSODs on both sticks then it's unlikely to be a RAM problem and we'll then talk about testing the CPU.

CheekyKid · Jun 29, 2024

Thanks for your reply!

Oh, updating my Nvidia drivers just escaped me with my newest installation. I never really get the DPC_WATCHDOG_VIOLATION; this was a one-off. All the previous minidumps, before I started saving them, only had AuthenticAMD.sys errors.

For how long should I run memtest86? 1 pass was completed successfully without errors.

CheekyKid · Jun 29, 2024

I am using PassMark's edition of Memtest86 for my tests. Is that OK?

CheekyKid · Jun 29, 2024

This is the result of the RAM test so far: no errors. Can I trust this test completely, or could my RAM still be the problem? The next step is to run my machine with one stick. I usually get a BSOD within 2-4 days.

ubuysa · Jun 30, 2024

The Passmark version is the correct one and we recommend running it twice (for 8 iterations of the 13 different tests). I'd have preferred that you remove one RAM stick for a few days.

Have you updated the Nvidia graphics driver? It's important that you do that first.

CheekyKid · Jun 30, 2024

ubuysa said:
The Passmark version is the correct one and we recommend running it twice (for 8 iterations of the 13 different tests). I'd have preferred that you remove one RAM stick for a few days.

Have you updated the Nvidia graphics driver? It's important that you do that first.

Yes, the graphics driver is updated and I removed 1 stick yesterday evening. I need to give it some time now to see if this is going to be resolved.

ubuysa · Jul 1, 2024

Please do. Bad RAM is such a common cause of BSODs that you need to be 100% certain that it's not causing yours.

CheekyKid · Jul 4, 2024

Even with 1 stick I just had another crash. No BSOD, just straight shutdown and no minidump generated. Is it now time to try the other Ram stick? I had a good few days but back to crashes now. At least they don't happen too often but still quite irritating.

Event viewer shows this error:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 8

The details view of this entry contains further information.

CheekyKid · Jul 4, 2024

I also had 2 lock-ups where my PC froze on the Windows desktop and only a hard shutdown worked. When these issues occur my system is not under stress, as I don't play games and my CPU temps are fine. Also, nothing is overclocked. I hope it's not my CPU as I have no warranty on it.

CheekyKid · Jul 4, 2024

I have switched to my other stick and am waiting to see if I experience further crashes. If I do, we can then rule out a RAM problem, right?

ubuysa · Jul 5, 2024

That 'cache hierarchy error' can be caused by bad RAM. You really need to be certain that your RAM is good before we look elsewhere. It would appear already that this is probably a hardware failure.

CheekyKid · Jul 9, 2024

The culprit for the freezes was the Evernote program that I had. Whenever I left it running for a bit it would leak memory and kill my system. I have since removed it and the freezes have stopped. So far so good on 1 stick with regard to BSODs. I wonder if taking the DIMMs out and reseating them may have fixed the issue, though I don't understand why this would help. Could anyone provide the technical explanation for it?

Still a bit early to judge but so far so good with this stick, fingers crossed.

ubuysa · Jul 10, 2024

With modern RAM and the speeds at which it operates the tiniest microscopic bit of dust trapped between a pin on the RAM stick and the slot in the motherboard can cause havoc. We often ask people to remove and reseat RAM to help clear any potential bits of dust out. In addition, if that RAM isn't quite fully seated that can cause strange issues, so we do a reseat to ensure it's fully home. The same technique is useful for M.2 storage drives too, so if you ever suspect an M.2 SSD may be a bit flaky then remove it and reseat it.

CheekyKid · Sep 12, 2024

I am back and haven't made any progress. My latest minidump file is 091224-9515-01.dmp and I still experience the same error. Can you tell whether this dump file shows any new information?

ubuysa · Sep 13, 2024

We'd always prefer that you run the Sysnative file collection app again and upload the new output file. We often need more data than just the dump....

That latest dump however indicates a hardware error. It's a 0x124 WHEA_UNCORRECTABLE_ERROR (WHEA is the Windows Hardware Error Architecture), the failure bucket suggests a memory error...

Code:

FAILURE_BUCKET_ID:  0x124_0_AuthenticAMD_MEMORY__UNKNOWN_FATAL_IMAGE_AuthenticAMD.sys

Though we can't eliminate the CPU as the cause here yet.

Looking back at your earlier Sysnative output there are several WHEA information log entries reporting hardware errors...

Code:

Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          20/08/2024 12:58:18
Event ID:      3
Task Category: None
Level:         Information
Keywords:      WHEA Error Event Logs
User:          LOCAL SERVICE
Computer:      Minotaur
Description:
A hardware event has occurred. An informational record describing the condition is contained in the data section of this event.

Unfortunately this message doesn't give any indication of the device that may be at fault.

Can you please navigate to C:\Windows\LiveKernelReports, there may be several sub-folders under there. Check in each sub-folder and upload all dumps (.dmp) that you find in there. These are dumps written when Windows encounters a problem from which it's able to recover. These dumps may help us identify what hardware may be failing.
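If clicking through every sub-folder is tedious, a short script can list any dumps for you. This is a rough sketch rather than an official tool: `find_dumps` is a made-up helper, and on some systems you may need to run it from an elevated prompt to read those folders.

```python
from pathlib import Path


def find_dumps(root: Path) -> list[Path]:
    """Return every .dmp file under root, searching all sub-folders."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.dmp") if p.is_file())


if __name__ == "__main__":
    dumps = find_dumps(Path(r"C:\Windows\LiveKernelReports"))
    if not dumps:
        print("No live kernel dumps found.")
    for dump in dumps:
        print(f"{dump}  ({dump.stat().st_size} bytes)")
```

An empty result just means Windows has not written any live kernel reports, which is not unusual.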

I think it's also well worth putting your CPU under stress to see whether that may be at fault. Before you do this please give the PC a good clean inside, ensure that all dust filters are clean, and position the PC so that it gets a good airflow around and through the case. This CPU stress test will make the CPU run hot so you need to ensure your cooling is at its best...

Download Prime95 and a CPU temperature monitor (CoreTemp will do).
Keep the temperature monitor running all the time you run Prime95. Your CPU will get hot!
Run each of the three Prime95 tests (smallFFTs, largeFFTs, and Blend) one after the other for a minimum of 1 hour per test, 2 hours per test would be better.
If Prime95 generates error messages, if the system crashes/freezes/BSODs, or if your CPU temp approaches 90°C (Tmax for your CPU), then stop Prime95 and let us know what happened.

Note that a properly cooled and stable CPU should be able to run all Prime95 tests pretty much indefinitely.

FYI: The small FFT test stresses the CPU more than RAM. The large FFT test stresses RAM more than the CPU. The Blend test is a mixture of the two.

CheekyKid · Sep 13, 2024

Thanks for that.

C:\Windows\LiveKernelReports has some folders but they are all empty.

I will test the CPU and get back to you so that we can at least rule out the cpu.

CheekyKid · Saturday at 11:04 PM

Ok I have an update.

The WHEA_UNCORRECTABLE_ERROR 0x124(0x0, 0xFFFFBB8897A37028, 0xBC800800, 0x60C0859) errors seem to stop when I let the MSI app control my fans on the "Balanced" option without my intervention. I've been running on this setting for a while with no errors whatsoever. However, if I change it to "Silent" or to a custom setting and lower the CPU fan threshold a bit, I start getting the same errors again. It seems to be a heat-related issue. I cannot stand my CPU fans as they are noisy when running quickly, so at this point I think I need to consider a water cooling solution for the CPU.
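For anyone curious about those four bugcheck parameters: as far as I understand Microsoft's description of bugcheck 0x124, a first parameter of 0x0 means a machine check exception, the second is the address of the WHEA error record, and the third and fourth are the high and low halves of the CPU's MCi_STATUS register. The sketch below joins the halves and reads the commonly documented status flags. The bit positions are an assumption taken from the generic x86 machine-check layout, so check your CPU vendor's manual before reading much into them; `decode_mci_status` is a made-up name.

```python
# Bugcheck parameters copied from the post above.
BUGCHECK_PARAMS = (0x0, 0xFFFFBB8897A37028, 0xBC800800, 0x060C0859)

# Assumed flag positions in the 64-bit MCi_STATUS value (generic x86 layout).
FLAG_BITS = {
    "VAL": 63,    # register contains valid error information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC register holds extra data
    "ADDRV": 58,  # MCi_ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Join the two 32-bit halves and pull out the flags and the error code."""
    status = (high32 << 32) | low32
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {"status": status, "flags": flags, "mca_error_code": status & 0xFFFF}


if __name__ == "__main__":
    _, _, high, low = BUGCHECK_PARAMS
    result = decode_mci_status(high, low)
    print(f"MCi_STATUS = {result['status']:#018x}")
    print("flags set:", [name for name, on in result["flags"].items() if on])
    print(f"MCA error code = {result['mca_error_code']:#06x}")
```

With these parameters the UC flag comes out set, which at least agrees with the bugcheck name. It does not say which component is at fault.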

xilolee · Sunday at 12:26 PM

£60.82: ARCTIC Liquid Freezer III 280

CheekyKid · Sunday at 5:32 PM

xilolee said:
£60.82: ARCTIC Liquid Freezer III 280

Is this quiet? What about the be quiet! Silent Loop 2 280mm for £88? I care more about silent operation than performance.
