3.0.0

Even when running 6.2.9200 with all symbols found in local sym cache (i.e., MSDL SYM server not used as I turn wifi Internet OFF), kd speed is awful compared to 6.11 - same test.

EDIT: Windbg 6.2.9200 exhibits same basic slowness as kd
 
After the changes I mentioned in prev posts I don't see differences between versions: analysis of 98 dumps in 6.11.1.404 takes 66.908 sec vs 68.431 sec in 6.2.9200.16384.

m.g.
 
Parallelization is now working. It will be a week before I iron out all the bugs and functionality it broke, but I should have an update in by next weekend.

OK, guys, maybe there's something wrong with my installation, but basic analysis in multiple threads is _much_ faster on my system. The app (+kd.exe) spends most of its time on I/O, so the CPU is rather idle, even with more than 20 threads. I wrote a simple test app that invokes some commands: ".reload, !analyze -v, r, kv, lmtn, lmtsmn, .bugcheck, !peb, !sysinfo machineid, !sysinfo cpuspeed, !sysinfo smbios" (every command invoked separately) for each of the .dmp files in the directory containing the app. I use PowerDbg via automation (which, in fact, uses the cdb.exe debugger), so it's quite a heavy solution, but still much faster than the current version of the SysnativeBSODApps. Of course, it's only the debugger part of the app, but I suspect that all other activity (downloading the drivers.txt file, parsing the text files and combining them into result files) is rather negligible. For example, analysis of 90 sample minidumps in 20 threads takes about 6 minutes, which in SysnativeBSODApps takes more than 100 minutes.
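A minimal Python sketch of this kind of test harness, for illustration only. It is not the actual test app (which drives the debugger through PowerDbg). It assumes cdb.exe is on the PATH and uses the debugger's -z (open a dump file) and -c (run initial commands) options. The names `build_cdb_args`, `analyze_dump` and `analyze_directory` are hypothetical, and the commands are joined into one -c script for brevity rather than invoked one by one.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Commands listed in the post; "q" is appended so the debugger exits.
COMMANDS = [
    ".reload", "!analyze -v", "r", "kv", "lmtn", "lmtsmn", ".bugcheck",
    "!peb", "!sysinfo machineid", "!sysinfo cpuspeed", "!sysinfo smbios",
]


def build_cdb_args(dump_path, debugger="cdb.exe", commands=COMMANDS):
    """Return the argument list for one non-interactive debugger run."""
    script = "; ".join(list(commands) + ["q"])
    return [debugger, "-z", str(dump_path), "-c", script]


def analyze_dump(dump_path):
    """Run the debugger on one dump and return its captured text output."""
    result = subprocess.run(
        build_cdb_args(dump_path), capture_output=True, text=True
    )
    return result.stdout


def analyze_directory(directory, workers=20):
    """Analyze every .dmp file in a directory with a pool of worker threads."""
    dumps = sorted(Path(directory).glob("*.dmp"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(dumps, pool.map(analyze_dump, dumps)))
```

Threads are enough here because each worker mostly waits on an external debugger process, which matches the observation that the work is I/O-bound.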

m.g.
 
2.11.4 now has the parallel threading built in as an advanced option: Options->ParallelThreading->(ChangeNumberThreads). Let me know if there is anything further I can do to speed up the debugging process.

Sysnative BSOD Processing Apps 2.11.4


Just as an FYI to those who use the parallel threading: cdb.exe uses less of the processor than kd.exe, so if you do not want the apps to bog down your system during parallel threading, cdb.exe is the way to go.

Thanks to mgrzeg for the inspiration to do parallel threading, and also for the info about cdb.exe
 
Thanks for this version! :) I wasn't expecting this in the current 2.x builds, but rather in the 3.0.0 release in a few months, so you are faster than I could imagine! :)
Full analysis of 98 .dmps (the same set I used earlier): "742.911 seconds to run the apps", after manually disabling _NT_SYMBOL_PATH, which is much better than before :). A few ideas:
1. Don't wait for all 20 dumps to finish before starting the next 20; you can use a semaphore or another sync object to see which ones are finished and then start the next (I'm using a thread pool with n=20 ManualResetEvents, or fewer if there are fewer .dmp files - see the code). As I said before, timings differ between dumps.
2. I have used _NT_SYMBOL_PATH for years to feed many tools: debugging tools from MS, sysinternals (procmon, procexp, vmmap), disasm (IDA, PEBrowsePro), Perf Tools (xperf, xperfview, wpa), to name just a few. And every tool needs its own treatment. Consider the analysis path I wrote about, so you can get close to my results.
3. Consider using additional low-priority threads for the 'drivers.txt' download and for post-processing of the .dmp analysis results (I think you use one thread for all, but I may be mistaken).
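Idea 1 can be sketched in Python as follows. This is an illustration of the technique, not the apps' actual code: a thread pool capped at n workers hands the next dump to whichever worker becomes free, so a slow dump holds up only its own slot instead of a whole batch of 20. `run_all` and `process_dump` are hypothetical names; in practice `process_dump` would launch the debugger for one file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(dumps, process_dump, max_workers=20):
    """Process dumps with at most max_workers running at any moment.

    The executor starts the next queued dump as soon as any worker is
    free, so there is no barrier between groups of max_workers dumps.
    """
    if not dumps:
        return {}
    results = {}
    workers = min(max_workers, len(dumps))  # fewer threads if fewer dumps
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Queue everything up front; the pool enforces the concurrency cap.
        futures = {pool.submit(process_dump, d): d for d in dumps}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

The same effect can be had with a counting semaphore around each worker; the pool is simply the shortest way to express it.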

m.g.
 
You're welcome! :-}

1. Yes, I am aware that there are faster methods to do this. This was just a first iteration to see if it could be done and what the results would be. I will work on more later. There are a few more pressing concerns that would be considered big bugs to fix before looking into increasing parallel optimization.

I also find that 10 threads works faster than 20 on my system, so it may vary by system what the faster method is.


2. Path doesn't matter with parallel threading. I found that the apps run about as fast with online symbols as local with parallel threading.


3. It's not possible to multi-thread the driver parsing, since it has to run as a sequential set of steps to set up the output files, but thanks for the idea anyway. The one place I could speed this up is 3rd party driver checking, but unless you are running 600+ .dmps, that step takes less than 0.1 seconds anyway.
 
I was talking about the path of analysis:
1. clear the _NT_SYMBOL_PATH;
2. run kd.exe (without _NT_SYMBOL_PATH) and:
2.1 .sympath srv*c:\path_to_symbols*http://msdl.microsoft.com/downloads/symbols
2.2 .reload
2.3 .sympath srv*c:\path_to_symbols <- this is crucial!!! Symbols are already downloaded, so no more online checks and symsrv loads only local symbols.
2.4 !analyze -v; !sysinfo cpuspeed; !sysinfo SMBIOS; lmtsmn; q;
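The steps above could be prepared from a script roughly like this. It is a Python sketch under stated assumptions: `env_without_symbol_path` and `two_stage_script` are hypothetical helpers, and the code only builds the environment and the command lines; how they are fed to the debugger (typed in, or read from a script file) is left open.

```python
import os

MS_SYMBOL_SERVER = "http://msdl.microsoft.com/downloads/symbols"


def env_without_symbol_path(environ=None):
    """Step 1: copy the environment, dropping _NT_SYMBOL_PATH."""
    source = os.environ if environ is None else environ
    return {k: v for k, v in source.items() if k.upper() != "_NT_SYMBOL_PATH"}


def two_stage_script(cache_dir):
    """Steps 2.1-2.4 as one debugger command per line.

    Each .sympath command sits on its own line because semicolons
    separate entries inside a symbol path.
    """
    return [
        # 2.1 local cache backed by the online server
        f".sympath srv*{cache_dir}*{MS_SYMBOL_SERVER}",
        # 2.2 load symbols, downloading into the cache where needed
        ".reload",
        # 2.3 local cache only from here on
        f".sympath srv*{cache_dir}",
        # 2.4 the actual analysis, then quit
        "!analyze -v; !sysinfo cpuspeed; !sysinfo SMBIOS; lmtsmn; q",
    ]
```

For example, `"\n".join(two_stage_script(r"c:\path_to_symbols"))` gives the four lines in the order listed above.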

Try to do this directly from WinDbg.exe, even with the latest version of the dbg tools -> the longest lmtsmn takes milliseconds, the same as the other commands :)

m.g.
 
Thanks!!!

jcgriff2 and I had a conversation about this the other day. He found that if he removed his internet connection, the debugging tools ran much faster even using local symbols. I wonder if srv*C:\symbols forces the debugging tools not to check online at all even with local symbols set up as the path.

Appreciate you laying out the steps so specifically. I will take a look at that over the coming weekend. :-}

 
2.12.0 - saved another 100 secs for 98 .dmps (20 threads running, time to run = 628.54 seconds)! :) For 10 threads it was more than 700 secs.

m.g.
 
I can now run with 20 threads on my system with local symbols: 90 .dmps in 70 seconds, 70 .dmps in 55 seconds, and 10 .dmps in a little less than 10 seconds. Amazing!!!

First run against 90 .dmps: 97.462 sec
Second run: 63.426 sec

First run with your 90 .dmps was: 70 seconds
second run w/ your 90 .dmps was: 53 seconds.

70.561 seconds to runDmps()
53.829 seconds to runDmps()

:-}


One problem I am running into with kd.exe: user commands using !niemiro slow the system to a crawl and actually cause jerky mouse behavior. cdb.exe does not cause the same problem. It seems to be due to the fast I/O processing since the !niemiro commands do not take as long as the default commands, so the output is being saved and loaded at a faster pace. cdb.exe definitely seems like the way to go for parallel threading, though.
 
Which would you recommend using, cdb.exe or kd.exe for general dump processing with Parallel Threading enabled? Which one is faster? Do they both do the same job?
 
cdb.exe seems to work better on my system for parallel threading, and both give the same output. cdb.exe is the console debugger in the Windows debugging tools. As far as I can tell, the only real difference between it and kd.exe is that cdb.exe is designed more for live user-mode process debugging, to find problems on the system it is running on. It also has the ability to analyze multiple threads, which kd.exe does not seem to have, and that may be why cdb.exe runs better with multi-threading enabled.

 
I can now run with 20 threads on my system with local symbols: 90 .dmps in 70 seconds, 70 .dmps in 55 seconds, and 10 .dmps in a little less than 10 seconds. Amazing!!!

This is great! :)
Are you talking about the latest version, 2.13.3? I tried different setups and my best result is "318.286 seconds to runDmps()" for 90 .dmps.

m.g.
 

Yikes, how much load does that give you? If your computer is average, and you don't have some 6-8 core Xeon or other Intel processor running at 4.0+ GHz, then it should be fine for others, but keep in mind that not everybody has the same hardware as you, lol.

Keep in mind that speed isn't always everything with threads. That's some cool information in those links, though, which I will thank you for :)
 

It's actually not much load. 20-30% most of the time on an i7 2.4 GHz processor that boosts to 3.4 GHz during the run of the .dmps on this system. It does get up to 80-90% and my fans start whirring a bit, but my PhD simulation causes more heat than the apps do. :-}

Also, the apps have the option to choose how many threads to run on, so users with older hardware can experiment to find out what their hardware can handle.
 

Some people would look at that as a crazy option to give the end user, unless it was a program designed for programmers who understand how things work at that level. BSOD analysts, I'm sure, have an idea of what threads are, however. Usually you wouldn't do that, though.

80-90% is a lot of load, though. Is this still with KD, or with CDB or NTSD?
 
This is with cdb.exe and with kd.exe; both have similar performance in terms of processor usage, but cdb.exe handles threading better and hangs the system less.

As to the option for threading, it is not enabled by default and was added as an advanced option for users to change if desired.
 
Hmm, I'm bookmarking this thread, I'm going to have to look into this and the difference between what KD does and CDB.
 
Wow, the apps are fast with Parallel Threading!!!

Using cdb.exe (6.11), I ran 136 dumps in 59 SECONDS!!!! WOW. :thud: I had 20 threads enabled, local symbols, no user commands, and the CPU was running at 100% throughout processing.

Using cdb.exe (6.2.9200), the same 136 dumps ran in a much less spectacular 209 seconds. Not sure why that was so slow....

This stresses my computer more than Prime95! :lol: But it is great to be able to process that many dumps so fast
 

> 136 dumps / ~20 threads = 6.8 dumps per thread (run sequentially)
> 59 seconds / 6.8
= ~8.68

Roughly 8.68 seconds to run a single dump then? How many dumps were running simultaneously?
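The same back-of-the-envelope estimate as a small Python snippet, applied to both timings quoted above. It assumes the dumps are spread evenly across the threads and that every thread is busy the whole time, which is only a rough model; `seconds_per_dump_per_thread` is a hypothetical helper name.

```python
def seconds_per_dump_per_thread(total_seconds, dump_count, threads):
    """Estimate time per dump on one thread, assuming an even spread."""
    dumps_per_thread = dump_count / threads  # e.g. 136 / 20 = 6.8
    return total_seconds / dumps_per_thread


print(round(seconds_per_dump_per_thread(59, 136, 20), 2))   # cdb.exe 6.11 -> 8.68
print(round(seconds_per_dump_per_thread(209, 136, 20), 2))  # cdb.exe 6.2.9200 -> 30.74
```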

Your specs show that you've got an i7 as well, a 64-bit OS, and 6 GB of RAM (physical?).
 
