BSOD on every server in the cluster

Were RDMA, Jumbo and SR-IOV enabled before the network card was replaced on any of them?
 
Does a crash happen only when Jumbo is disabled? What about RDMA and SR-IOV?
 
Since disabling Jumbo I have not seen any crashes. RDMA is still enabled. SR-IOV had to be disabled a couple of days ago because virtual machines sometimes lost network connectivity. I have not found the cause of that or a solution for it.
 
Apologies for the delay in my response; my family has been a little sick, and I've been a little sick myself as a result. I'm mostly fine, but not yet 100%.

Have you tried disabling RDMA and enabling Jumbo on one of the machines? Because a few things have changed in a short period (network card and a few settings), I'm trying to pinpoint this to a specific element so we can hopefully find a better solution.
 
Hi. I hope you and your family are okay now. I have had to visit the hospital myself.

No, we can't disable RDMA as it causes a catastrophic drop in performance in our software. I can try to schedule it for the next maintenance window. Today I went to work and found out that we have a big audit and we have to cancel all scheduled work related to stopping services.
 
Something else you could try is to install the previous network card with RDMA and Jumbo frames enabled and see if that works, assuming that is possible.
 
Hi,

It's been a while. Are there any updates?
 
Hi, colleague.

Sorry for the late reply, I just got out of the hospital after surgery. Everything is fine now.

My colleague ran the tests while I was away. The results are as follows:

1) Enabling Jumbo results in a blue screen even if RDMA is off.

2) When SR-IOV is enabled in the virtual machine settings, the VM periodically loses its connection. It doesn't matter whether RDMA or Jumbo is on or off. If you move the VM to another node or reboot it, the connection comes back.

3) During all this time, with Jumbo off, there was not a single blue screen.

Unfortunately, we were unable to run tests on the old network cards, as they had already been sold.
 
I'm afraid there's not much more I can offer at this point since it seems clear where the issue lies.
 
I am very grateful to you for your help. It completely solved the problem. Thank you very much for not refusing to help. God bless you and your whole family.
 
