BSOD Root Cause

robdb

Hello,

A member in this thread suggested I create a post here to troubleshoot BSODs. As I mentioned in that thread, I haven't experienced any BSODs since I replaced RAM. However, the responder pointed out that, according to the logs, crashes have occurred since the RAM swap.

  • A brief description of your problem (but you can also include the steps you tried)
The original cause of the issues was that Windows updates took multiple hours, resulting in a forced reboot during upgrades. System instability followed: Windows updates failed to install, BSODs occurred, and DISM and SFC failed to repair the system.
  • System Manufacturer?
HP
  • Laptop or Desktop?
Desktop
  • Exact model number (if laptop, check label on bottom)
OMEN by HP Obelisk Desktop 875-1040st CTO
  • OS ? (Windows 11, 10, 8.1, 8, 7, Vista)
Windows 10 22H2
  • x86 (32bit) or x64 (64bit)?
x64
  • What was original installed OS on system?
Windows 10 Home
  • Is the OS an OEM version (came pre-installed on system) or full retail version (YOU purchased it from retailer)?
OEM
  • Age of system? (hardware)
Built to order, received in January of 2020
  • Age of OS installation?
Uncertain
  • Have you re-installed the OS?
I'm not sure if I reinstalled the OS or cloned my drive when I moved to a new system drive.
  • CPU
i7-9700K
  • RAM (brand, EXACT model, what slots are you using?)
G.SKILL, F4-3600C16D-32GVKC, all four slots.
  • Video Card
NVIDIA RTX2070 Super
  • MotherBoard - (if NOT a laptop)
HP OEM
  • Power Supply - brand & wattage (if laptop, skip this one)
HP 750W
  • Is driver verifier enabled or disabled?
Disabled (I think - verifier /querysettings shows no flags enabled)
  • What security software are you using? (Firewall, antivirus, antimalware, antispyware, and so forth)
Built-in Windows Tools
  • Are you using proxy, vpn, ipfilters or similar software?
Windows built-in and Tailscale
  • Are you using Disk Image tools? (like daemon tools, alcohol 52% or 120%, virtual CloneDrive, roxio software)
No
  • Are you currently under/overclocking? Are there overclocking software installed on your system?
I think the HP OC tools are running a profile that isn't standard, but it's been a while since I've even looked at them.

Speccy Snapshot
 

System Information > Problem Devices:

Realtek RTL8822BE 802.11ac PCIe Adapter
PCI\VEN_10EC&DEV_B822&SUBSYS_831B103C&REV_00\00E04CFFFEB8220100 This device is disabled.
Generic SuperSpeed USB Hub
USB\VID_2109&PID_0817\5&3627C2AD&0&21 This device cannot start.
 
I have manually disabled my wireless NIC because I don't want to use it, preferring ethernet instead. The USB device didn't recover after the problematic Windows update earlier today, but I was able to reseat it and clear the error.
 
I rather suspect that it's your Samsung 980 Pro NVMe drive. The System log contains a whole bunch of WHEA Logger warnings for your 980 Pro going back to the start of the log on 28th August. Here's an example...
Code:
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          16/10/2024 23:08:55
Event ID:      17
Task Category: None
Level:         Warning
Keywords:    
User:          LOCAL SERVICE
Computer:      VORTEX
Description:
A corrected hardware error has occurred.

Component: PCI Express Endpoint
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0x5:0x0:0x0
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_144D&DEV_A80A&SUBSYS_A801144D&REV_00
Secondary Device Name:
Note that these are warnings, not errors; actual errors would likely have caused a BSOD. The device hardware ID of PCI\VEN_144D&DEV_A80A is your Samsung 980 Pro. It would appear that your Samsung 980 Pro isn't completely happy, so I would download Samsung Magician and use that to check the health and status of the drive. Also run a full diagnostic test and check for driver and/or firmware updates for the drive.
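If you want to decode these hardware IDs yourself, here is a rough Python sketch of how the string breaks down. The function name `parse_pci_hwid` and the small vendor table are my own illustration, not part of any Windows tool, and the table only covers the vendor IDs that appear in this thread. It assumes the usual Windows layout of VEN_ (vendor), DEV_ (device), SUBSYS_ (subsystem ID followed by subsystem vendor ID) and REV_ (revision).

```python
import re

# Hand-picked vendor IDs seen in this thread only; not a complete PCI ID database.
KNOWN_VENDORS = {
    "144D": "Samsung Electronics",
    "10EC": "Realtek Semiconductor",
    "103C": "Hewlett-Packard",
}

# VEN_ and DEV_ are required; SUBSYS_ and REV_ are treated as optional.
HWID_PATTERN = re.compile(
    r"PCI\\VEN_(?P<vendor>[0-9A-F]{4})&DEV_(?P<device>[0-9A-F]{4})"
    r"(?:&SUBSYS_(?P<subsys>[0-9A-F]{8}))?"
    r"(?:&REV_(?P<rev>[0-9A-F]{2}))?",
    re.IGNORECASE,
)


def parse_pci_hwid(hwid: str) -> dict:
    """Split a Windows PCI hardware ID string into its fields."""
    match = HWID_PATTERN.search(hwid)
    if match is None:
        raise ValueError(f"not a PCI hardware ID: {hwid!r}")
    fields = {k: (v.upper() if v else None) for k, v in match.groupdict().items()}
    fields["vendor_name"] = KNOWN_VENDORS.get(fields["vendor"], "unknown")
    # The last four hex digits of SUBSYS are the subsystem vendor ID.
    subsys = fields["subsys"]
    fields["subsys_vendor"] = subsys[4:] if subsys else None
    return fields


info = parse_pci_hwid(r"PCI\VEN_144D&DEV_A80A&SUBSYS_A801144D&REV_00")
print(info["vendor"], info["device"], info["vendor_name"])
```

Running the same function on the Realtek entry from Problem Devices gives vendor 10EC with a subsystem vendor of 103C, which is what you'd expect for an adapter fitted by HP.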
 
This is an interesting find. It's my old system drive, which I replaced in 2023. I moved it to a secondary M.2 slot because I only replaced it for performance improvements. I don't use it, and there are no files on it. I have run Magician on my system since I installed it many years ago. You're correct that there have been no BSODs - and as I mentioned, I haven't had any BSODs since I replaced two of my RAM sticks.

A cursory opening of Magician does not report any health issues, and no firmware updates are available. Digging a little deeper, a full diagnostic scan reports the drive is in good condition with no errors in any LBAs. An extended SMART self-test passed.

Are there any other diagnostics I could perform to check the drive further? I'm happy to remove it, as I'm not using it.

------

Unrelated to the drive issue, would those reviewing this information concur with the member in the other thread who suggested I look into resolving BSOD errors before repairing/restoring corrupted files? I'm wondering if I may be in a catch-22/loop scenario where random failures are being introduced by file corruption (or vice versa, for that matter).

Thank you all for the ongoing assistance.
 
If there are no files on the drive then remove it for a week or so and see whether the log messages stop. In my experience the M.2 socket is less than perfect, and reseating M.2 drives often resolves niggly issues, but I would leave it out for a while first.
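To check whether the messages really do stop, one option is to count the WHEA warnings per day. The following is only a sketch: it assumes you have saved the System log events as plain text in the same layout as the example I posted earlier (with a dd/mm/yyyy date), and the function name `count_whea_warnings` and the `SAMPLE` text are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def count_whea_warnings(log_text: str, device_id: str) -> Counter:
    """Count WHEA-Logger events per calendar day for one device ID."""
    counts = Counter()
    current_date = None
    is_whea = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Log Name:"):
            # A new event record starts here, so reset the per-event state.
            current_date, is_whea = None, False
        elif line.startswith("Source:"):
            is_whea = "WHEA-Logger" in line
        elif line.startswith("Date:"):
            stamp = line.split(":", 1)[1].strip()
            # Assumes the day/month/year format shown in the posted log.
            current_date = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S").date()
        elif line.startswith("Primary Device Name:"):
            if is_whea and current_date and device_id.upper() in line.upper():
                counts[current_date] += 1
    return counts


# Illustrative input only, trimmed from the event quoted in this thread.
SAMPLE = r"""Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          16/10/2024 23:08:55
Event ID:      17
Primary Device Name:PCI\VEN_144D&DEV_A80A&SUBSYS_A801144D&REV_00
"""

for day, total in sorted(count_whea_warnings(SAMPLE, "VEN_144D&DEV_A80A").items()):
    print(day.isoformat(), total)
```

If the daily counts drop to zero once the drive is out, that points at the drive or its socket rather than something else on the PCIe bus.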
 
I've had the M.2 removed for a while now; however, I've since discovered that I had more bad RAM. I went through the HP hardware tests and tested each stick individually to find the bad RAM. Are there any tests or diagnostics I should rerun now?
 
There are more thorough RAM testers, but if the HP hardware test is finding bad RAM then that needs to be replaced ASAP. Be aware that RAM works best in matched sets, so I would not advise just replacing the faulty sticks; instead, replace the whole set. Buy a pack of matched sticks, because all the internal timings will match and they will work well together. If you just replace the faulty sticks you may well run into internal timing conflicts. Mismatched RAM causes no end of problems.

Replace the RAM and then see how things are. If you want to stress test the system then I would suggest OCCT; it's the tool I use to check the stability of my own new builds.
 