Freezing/BSOD DPC WATCHDOG VIOLATION

Lightning13 · Sep 12, 2024

Honestly, I don't really know how I got it to work. I just removed the first file that it was trying to grab, maybe I messed up that file.

I have unplugged all connections to my PC besides displays, keyboard, mouse (BT), and ethernet. I still got some crashing.

I tried to use your command, but it didn't work. I'm getting some error, thought I was copying too much of the line but I guess not. I attached an image of the window.

Lightning13 · Sep 12, 2024

Also I just crashed during idle, which I don't think has really happened before. It seems to have the same error though. Here's an updated Sysnative Collection file.

Lightning13 · Sep 12, 2024

I was able to track down the path manually in RegEdit, here's a snapshot of what I found. If you need more just let me know.

Do I now try to change this value from 1 to 0, and see if that helps?

Lightning13 · Sep 12, 2024

Changed it and still crashing. Should I change it back now?

ubuysa · Sep 13, 2024

Lightning13 · Sep 13, 2024

I just built this PC a couple months ago, my settings say that I'm up to date.

Should I just go ahead and update to Windows 11?

ubuysa · Sep 14, 2024

Personally I would strongly advise against updating a system known to have issues to Windows 11. You just create a whole new set of problems. We need to find out why the system is crashing before it's wise to consider updating to Windows 11.

The most recent dump (Fri Sept 13) is a 0x116 VIDEO_TDR_FAILURE, and the other dumps also highlight nvlddmkm.sys (the Nvidia graphics driver). This does look like a graphics problem. I'm still concerned that the PSU might not be up to that RTX 4090 with power spikes on load. Did you ever upgrade to the 1000W PSU?

Lightning13 · Sep 14, 2024

ubuysa said:
Personally I would strongly advise against updating a system known to have issues to Windows 11. You just create a whole new set of problems. We need to find out why the system is crashing before it's wise to consider updating to Windows 11.

The most recent dump (Fri Sept 13) is a 0x116 VIDEO_TDR_FAILURE, and the other dumps also highlight nvlddmkm.sys (the Nvidia graphics driver). This does look like a graphics problem. I'm still concerned that the PSU might not be up to that RTX 4090 with power spikes on load. Did you ever upgrade to the 1000W PSU?

I haven’t yet. Just waiting on shipment and also will be gone this week for work. So it will be around a week or so before I can do anything else.

I’ll swap the PSU ASAP and update.

x BlueRobot · Sep 16, 2024

Lightning13 said:
Changed it and still crashing. Should I change it back now?

No, just leave it as it is.

ET_Explorer · Sep 19, 2024

msinfo32 > Problem Devices shows >
Microsoft Basic Display Adapter PCI\VEN_1002&DEV_164E&SUBSYS_88771043&REV_CB\4&1EBE6A9C&0&0041 This device is disabled.
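
For anyone decoding the hardware ID above, here is a minimal Python sketch. It assumes the usual PCI\VEN_...&DEV_...&SUBSYS_...&REV_... layout; the function name and the small vendor table are illustrative only and do not come from msinfo32 or any other tool.

```python
import re

# Hand-picked PCI vendor IDs for illustration only; this is not a complete database.
KNOWN_VENDORS = {
    "1002": "AMD/ATI",
    "10DE": "NVIDIA",
    "8086": "Intel",
    "1043": "ASUSTeK",
}


def parse_pci_instance_id(instance_id):
    """Split a PCI device instance path into vendor, device, subsystem and revision."""
    parts = instance_id.split("\\")
    if len(parts) < 2 or parts[0].upper() != "PCI":
        raise ValueError("expected a path starting with PCI\\")
    # The second segment looks like VEN_xxxx&DEV_xxxx&SUBSYS_xxxxxxxx&REV_xx.
    fields = {k: v.upper() for k, v in re.findall(r"([A-Z]+)_([0-9A-Fa-f]+)", parts[1])}
    subsys = fields.get("SUBSYS", "")
    # In SUBSYS the last four hex digits are the subsystem (board) vendor.
    subsys_vendor = subsys[4:] if len(subsys) == 8 else ""
    return {
        "vendor_id": fields.get("VEN", ""),
        "vendor": KNOWN_VENDORS.get(fields.get("VEN", ""), "unknown"),
        "device_id": fields.get("DEV", ""),
        "subsystem_vendor_id": subsys_vendor,
        "subsystem_vendor": KNOWN_VENDORS.get(subsys_vendor, "unknown"),
        "revision": fields.get("REV", ""),
    }
```

Run against the ID above, this reports vendor 1002 (AMD/ATI) with subsystem vendor 1043 (ASUSTeK), which suggests the disabled device is an AMD display adapter on an ASUS board rather than the Nvidia card.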

ET_Explorer · Sep 19, 2024

DxDiagx86 Windows Error Reporting shows app crashes involving AWGameLibrary.UCSubAgent.exe

Lightning13 · Sep 29, 2024

Okay so I installed the new 1000W PSU, still crashing. I then uninstalled AlienWare Control Center app, still crashing. Here are the most recent files.

Please let me know what I can try next.

ET_Explorer · Sep 30, 2024

Windows Error Reporting:

Fault bucket 0x116_TdrBCR:4:C000009A_Tdr:9_IMAGE_nvlddmkm.sys_Ada_SCG3D-AMD#0, type 0
(nvlddmkm.sys is the NVIDIA kernel-mode driver that allows your computer to communicate with NVIDIA devices like the GPU.)
Event Name: BlueScreen

+++ WER5 +++:
Fault bucket 1861905259368572200, type 1
Event Name: APPCRASH
Problem signature:
P1: FMSIScan.exe
P4: atiadlxx.dll

+++ WER6 +++:
Fault bucket , type 0
Event Name: APPCRASH
Problem signature:
P1: FMSIScan.exe
P4: atiadlxx.dll

+++ WER7 +++:
Fault bucket 1448329291139938386, type 5
Event Name: RADAR_PRE_LEAK_WOW64
Problem signature:
P1: asus_framework.exe
^^^ RADAR_PRE_LEAK_WOW64 can indicate issues with your system's hardware configuration.

Dump Files: Shows problems with CPU and GPU

ubuysa · Sep 30, 2024

OK, so we know that the lack of power during GPU spikes was not the issue, but it's still wise to have upgraded the PSU.

The two dumps both point very clearly at either the graphics driver or the graphics card...

One is a 0x116 VIDEO_TDR_FAILURE bugcheck. This happens when the Windows Timeout Detection and Recovery (TDR) feature, which detects a graphics hang and resets the graphics driver and graphics card, fails to recover from the hang. The cause here is almost certainly either the driver or the card.

The other dump is a 0x133 DPC_WATCHDOG_VIOLATION. A DPC is a Deferred Procedure Call; DPCs are typically used in the back-end of device interrupt processing, and the DPC code is part of the device driver. In this dump the graphics driver (nvlddmkm.sys) is where the failure happens...

Code:

FAILURE_BUCKET_ID: 0x133_ISR_nvlddmkm!unknown_function

Note that this failure bucket blames the ISR, the Interrupt Service Routine, which is the front-end of device interrupt processing; the ISR code is also part of the device driver. If either the ISR or the DPC runs for too long, it will cause this 0x133 bugcheck.
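
To make the pieces of that bucket string explicit, here is a rough Python sketch that splits it into bugcheck code, interrupt stage, module and function. It assumes the simple `code_stage_module!function` shape seen in this dump; other failure buckets use different layouts, and the function name and lookup table are hypothetical.

```python
import re

# Only the two bugchecks discussed in this thread; not a full list.
BUGCHECK_NAMES = {
    0x116: "VIDEO_TDR_FAILURE",
    0x133: "DPC_WATCHDOG_VIOLATION",
}


def parse_failure_bucket(bucket_id):
    """Split a bucket such as 0x133_ISR_nvlddmkm!unknown_function into its parts."""
    match = re.match(r"(0x[0-9A-Fa-f]+)_(?:(ISR|DPC)_)?(.+)", bucket_id)
    if match is None:
        raise ValueError("unrecognised failure bucket: " + bucket_id)
    code = int(match.group(1), 16)
    # Whatever follows the optional stage is treated as module!function.
    module, _, function = match.group(3).partition("!")
    return {
        "bugcheck": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "stage": match.group(2),  # "ISR", "DPC", or None when absent
        "module": module,
        "function": function,
    }
```

For the bucket above this yields bugcheck 0x133, stage ISR and module nvlddmkm, which is the same reading given in the text.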

You thus have two dumps here, both pointing very clearly at a graphics problem, just as the earlier dump did. We now know it's not a power problem, although be sure that the additional power cable is securely plugged into the 4090 card and the PSU.

The first thing I'd suggest is that you remove the 4090 card and then re-seat it firmly. You'd be surprised how many times this simple action solves the problem. The slightest bit of dust or dirt between a card pin and the slot can cause all sorts of issues.

If that doesn't help then your next best option is to remove the 4090 and plug the monitor into the motherboard port and use the Radeon graphics iGPU on the CPU and see whether it crashes or BSODs then. If it's stable without the 4090 installed then you know for certain that the problem does lie with the 4090 or the driver. You might combine this test with removing and re-seating the 4090.

The best way to check whether the 4090 or the driver is at fault is to download the four most recent driver versions for that card from the Nvidia website. Also download DDU. Use DDU to uninstall the existing driver and then manually install the most recent driver. If it crashes or BSODs use DDU again to remove that driver and then manually install the next most recent driver. Keep doing this until you either find a driver where it's stable or it BSODs/crashes on every driver version. If it fails on the four most recent driver versions then the problem is likely the 4090 card.
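
The driver rollback procedure above can be written down as a small decision helper. This is only a sketch of the reasoning, with a hypothetical function name; it does not drive DDU or the Nvidia installer, and the results have to be recorded by hand.

```python
def next_step(results, versions_to_try=4):
    """Suggest the next action in the driver rollback test.

    results holds one boolean per driver tested so far, newest driver first:
    True means the system was stable on it, False means it crashed or BSODed.
    """
    if any(results):
        # A stable driver was found, so newer driver versions are the likely culprit.
        position = results.index(True) + 1
        return "stable on driver #%d: suspect the newer driver version(s)" % position
    if len(results) >= versions_to_try:
        # Every driver version failed, which points away from the driver.
        return "crashed on every driver tried: the card is the likely problem"
    return "remove the driver with DDU, install the next most recent one, and retest"
```

For example, `next_step([False, False])` says to keep going, while four failures in a row point at the card.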

xilolee · Sep 30, 2024

Check if the temperatures are within the limits.
Using SpeedFan, log the GPU/CPU temperatures and fan speeds to a logfile.

There are also a new BIOS version and new chipset drivers available.
Also, you may want to disable the RGB lights on your graphics card as they may cause problems.
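
Once a temperature log exists, something like the following Python sketch can pull out the peak and final readings. It assumes a tab-separated text log with a header row and one numeric column per sensor; the exact SpeedFan log layout and the column names are assumptions here, so adjust the delimiter and names to match the real file.

```python
import csv
import io


def summarize_temperature_log(text, delimiter="\t"):
    """Return (peaks, last) dicts of the highest and final reading per column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    peaks = {}
    last = {}
    for row in reader:
        for name, value in row.items():
            try:
                reading = float(value)
            except (TypeError, ValueError):
                # Skip blank, missing or non-numeric cells.
                continue
            peaks[name] = max(peaks.get(name, reading), reading)
            last[name] = reading
    return peaks, last


# Usage with a log file saved to disk (the path is a placeholder):
# with open("temps.log", encoding="utf-8") as handle:
#     peaks, last = summarize_temperature_log(handle.read())
```

The `last` values are the readings from the final row written to the log, and `peaks` shows whether any sensor spiked earlier in the run.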

Lightning13 · Sep 30, 2024

ubuysa said:
OK, so we know that the lack of power during GPU spikes was not the issue, but it's still wise to have upgraded the PSU.

The two dumps both point very clearly at either the graphics driver or the graphics card...

One is a 0x116 VIDEO_TDR_FAILURE bugcheck. This happens when the Windows Timeout Detection and Recovery (TDR) feature, which detects a graphics hang and resets the graphics driver and graphics card, fails to recover from the hang. The cause here is almost certainly either the driver or the card.

The other dump is a 0x133 DPC_WATCHDOG_VIOLATION. A DPC is a Deferred Procedure Call; DPCs are typically used in the back-end of device interrupt processing, and the DPC code is part of the device driver. In this dump the graphics driver (nvlddmkm.sys) is where the failure happens...

Code:

FAILURE_BUCKET_ID: 0x133_ISR_nvlddmkm!unknown_function

Note that this failure bucket blames the ISR, the Interrupt Service Routine, which is the front-end of device interrupt processing; the ISR code is also part of the device driver. If either the ISR or the DPC runs for too long, it will cause this 0x133 bugcheck.

You thus have two dumps here, both pointing very clearly at a graphics problem, just as the earlier dump did. We now know it's not a power problem, although be sure that the additional power cable is securely plugged into the 4090 card and the PSU.

The first thing I'd suggest is that you remove the 4090 card and then re-seat it firmly. You'd be surprised how many times this simple action solves the problem. The slightest bit of dust or dirt between a card pin and the slot can cause all sorts of issues.

If that doesn't help then your next best option is to remove the 4090 and plug the monitor into the motherboard port and use the Radeon graphics iGPU on the CPU and see whether it crashes or BSODs then. If it's stable without the 4090 installed then you know for certain that the problem does lie with the 4090 or the driver. You might combine this test with removing and re-seating the 4090.

The best way to check whether the 4090 or the driver is at fault is to download the four most recent driver versions for that card from the Nvidia website. Also download DDU. Use DDU to uninstall the existing driver and then manually install the most recent driver. If it crashes or BSODs use DDU again to remove that driver and then manually install the next most recent driver. Keep doing this until you either find a driver where it's stable or it BSODs/crashes on every driver version. If it fails on the four most recent driver versions then the problem is likely the 4090 card.

Let me start by saying I was using this GPU in a previous build with no crashes. It’s now in a water block as well as using a riser cable. I used the stock cable that came with the T1 and after some crashing I replaced it. I thought this fixed my problems but now obviously I’m crashing more. I’ve seen loads and loads of people expressing issues with riser cables, and I can try to get another one to test or bypass it altogether. My only issue is just the way the build is, it would be very difficult for me to bypass the riser cable. But I think this would have to be something to try as well, bypass the riser and go straight to the motherboard.

I’ve reseated the riser cable/GPU multiple times. I have also tried to use DDU multiple times. But only with the most recent driver, I can try it with some previous versions too. Maybe with an older version?

Lightning13 · Sep 30, 2024

xilolee said:
Check if the temperatures are within the limits.
Using SpeedFan, log the GPU/CPU temperatures and fan speeds to a logfile.

There are also a new BIOS version and new chipset drivers available.
Also, you may want to disable the RGB lights on your graphics card as they may cause problems.

I’ll try these and update with some result.

Does SpeedFan log the temps even in the event of a crash? For the most part, my temps have been fine from what I see.

My GPU is in a water block so lighting is not connected.

xilolee · Sep 30, 2024

Lightning13 said:
I’ll try these and update with some result.

Does SpeedFan log the temps even in the event of a crash? For the most part, my temps have been fine from what I see.

My GPU is in a water block so lighting is not connected.

Up to the crash.

Lightning13 · Sep 30, 2024

Okay, so I have been replicating the crash by running the 3DMark Steel Nomad stress test. It consistently crashes on loop 2-4 out of the 20 total that it should do.

I unplugged my GPU and used my motherboard's HDMI. I performed the same Steel Nomad stress test and it went all the way through the test, very slowly of course.

I have also performed a DDU and install of the two most recent drivers and also the oldest one I was able to download, from around March 2024. I am still getting crashes after these steps.

I saw some post about going into ‘Debug Mode’ in Nvidia Control Panel, this actually let the test run a bit longer, to around loop 5-6 but still crashed.

My next steps will be to bypass the riser cable and connect my GPU directly to the motherboard. If this works, I’ll get a new riser cable. If still crashing, I’ll look into RMA’ing the card.

ET_Explorer · Sep 30, 2024

Lightning13 said:
I’ll get a new riser cable.

What is the reason for having a riser cable if you can connect the gpu directly to the motherboard?
