8 Mar 2011
Networks are created by connecting device NICs (Network Interface Cards) together using Ethernet cables. When two NICs are connected together they must operate at the same speed and duplex. On managed devices you can usually choose fixed settings or a setting called autonegotiate for each NIC. Unmanaged devices always have their NICs set to autonegotiate.
The main cause of NIC duplex conflicts is when one NIC has fixed settings (say 100M full duplex) and its partner is set to autonegotiate. So far, that doesn't sound like a problem, does it? Surely the autonegotiating NIC will simply realize its partner is 100M and full duplex and set itself to match. If only! The clue is in the name. If a NIC has fixed settings it cannot 'negotiate' with another NIC, so when the autonegotiating NIC tries to 'negotiate' the other end simply doesn't respond. The autonegotiating NIC can usually sense its partner's speed from the signal on the wire, but it has no way to learn the duplex, so in this situation it nearly always sets itself to half duplex.
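To make that fallback concrete, here is a minimal Python sketch of the decision each NIC ends up making. The names `NicConfig`, `LinkState`, `resolve` and `duplex_conflict` are made up for illustration and do not come from any driver or standard library. The model is deliberately simplified: it assumes the link comes up at all, and it treats "best common mode" as simply the lower speed and the weaker duplex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicConfig:
    autonegotiate: bool
    speed_mbps: int = 100     # fixed speed, or the best speed an autonegotiating NIC offers
    full_duplex: bool = True  # fixed duplex, or whether full duplex is offered


@dataclass(frozen=True)
class LinkState:
    speed_mbps: int
    full_duplex: bool


def resolve(local: NicConfig, partner: NicConfig) -> LinkState:
    """Return the settings `local` ends up using when cabled to `partner`."""
    if not local.autonegotiate:
        # Fixed settings: the NIC uses what it was told, whatever the partner does.
        return LinkState(local.speed_mbps, local.full_duplex)
    if partner.autonegotiate:
        # Both ends negotiate, so they agree on the best mode they share.
        return LinkState(min(local.speed_mbps, partner.speed_mbps),
                         local.full_duplex and partner.full_duplex)
    # Partner is fixed and stays silent: speed can be sensed from the signal,
    # duplex cannot, so the autonegotiating side falls back to half duplex.
    return LinkState(partner.speed_mbps, False)


def duplex_conflict(a: NicConfig, b: NicConfig) -> bool:
    """True when the two ends of one cable settle on different duplex settings."""
    return resolve(a, b).full_duplex != resolve(b, a).full_duplex


fixed = NicConfig(autonegotiate=False, speed_mbps=100, full_duplex=True)
auto = NicConfig(autonegotiate=True, speed_mbps=100, full_duplex=True)

print("auto end settles on:", resolve(auto, fixed))
print("fixed end stays at: ", resolve(fixed, auto))
print("conflict:", duplex_conflict(fixed, auto))
```

The same function shows the two safe combinations: both ends autonegotiating, or both ends fixed to identical settings.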
Why is that crucial to application performance?
Imagine Mary and Fred are two partner NICs. Mary has fixed settings of 100M and full duplex so can talk and listen at the same time. Fred, ever the diplomat, is set to autonegotiate. Because her settings are fixed, Mary cannot participate in the autonegotiation process so Fred sets himself to half duplex and can therefore only be talking or listening. Sadly, both Mary and Fred operate under the assumption that their partners are set the same as themselves and that's where the trouble starts.
Fred starts talking to Mary but Mary also starts talking to Fred. As Fred beat Mary to start the conversation he cannot hear what Mary is saying and all of Mary's eloquence is wasted (i.e. Fred drops all incoming packets from Mary). Mary, assuming that Fred can talk and listen at the same time, continues talking while Fred is talking - more packets dropped.
Now Fred has stopped talking and can listen to Mary but, depending on how long Fred was talking for, dozens, hundreds or thousands of Mary's packets have already been dropped, and as Mary operates at layer 2 there is no immediate awareness they've been dropped - that comes later. Mary is still talking and now Fred is listening, but can no longer talk, so his transmit buffer begins to overflow with conversations queuing up behind him - more packets evaporate.
By now an unknown number of packets have been dropped in both directions. TCP's job is to recover this situation, recognise packets have been dropped and retransmit them. Things go from bad to worse. First, TCP assumes that packet loss means congestion so it slows down all the flows with the lost packets - a performance killer in its own right. Second, the whole process repeats, resulting in a cycle of more packet loss, retransmissions and TCP continuing the ramp down in flow rates because of the perceived but non-existent congestion.
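To get a feel for how hard sustained loss hits a TCP flow, the widely quoted Mathis approximation is a handy rule of thumb: throughput is roughly (MSS / RTT) x (C / sqrt(loss rate)), with C about 1.22. The Python sketch below just evaluates that formula. The function name and the sample figures (1460 byte segments, 50 ms round trip) are assumptions picked for illustration, and a real duplex conflict loses packets in bursts, so treat the output as an indication of scale rather than a prediction.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Rough upper bound on one TCP flow's throughput in bits per second."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    c = math.sqrt(1.5)  # the constant from the Mathis approximation, about 1.22
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Assumed example flow: 1460 byte segments over a 50 ms round trip.
for loss in (0.0001, 0.001, 0.01, 0.05):
    mbps = mathis_throughput_bps(1460, 0.05, loss) / 1e6
    print(f"loss {loss:.2%}: about {mbps:.1f} Mbit/s per flow")
```

The point to take away is the square root: a hundredfold increase in loss costs each flow roughly a factor of ten in throughput, no matter how fast the link itself is.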
Imagine now that Mary and Fred are the NICs between a data centre's switch and WAN router and that every conversation for the entire business passes between them. Get the picture? When Mary is talking the entire organisation's users cannot receive anything. When Fred is talking user requests are simply being dropped from the network before they get to the application servers.
That's why duplex conflicts cause unpredictable application performance problems with severities ranging from periodic & mild to complete collapse.
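Duplex conflicts can be detected by inspecting the NIC counters on devices, looking for incrementing error or collision counts (typically collisions, especially late collisions, on the half duplex side and CRC errors on the full duplex side). That's OK (if very tedious) for devices you own and manage but what about those you don't?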
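For devices you do manage, the counter check can be scripted. The Python sketch below assumes a Linux host, where the kernel exposes per-interface counters under `/sys/class/net/<iface>/statistics/` and the current duplex under `/sys/class/net/<iface>/duplex`. The function names and the simple "any increase is suspicious" rule are illustrative assumptions, not a definitive test: collisions are normal on a genuinely shared half duplex segment, and CRC errors can also come from bad cabling.

```python
from pathlib import Path
from typing import Optional

COUNTERS = ("collisions", "rx_crc_errors")


def read_snapshot(iface: str, root: str = "/sys/class/net") -> dict:
    """Read the counters of interest for one interface from Linux sysfs."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def read_duplex(iface: str, root: str = "/sys/class/net") -> str:
    """Return 'full', 'half' or 'unknown' for the interface."""
    try:
        return (Path(root) / iface / "duplex").read_text().strip()
    except OSError:
        # Reading can fail when the link is down or the driver does not report duplex.
        return "unknown"


def suspect_duplex_conflict(before: dict, after: dict, duplex: str) -> Optional[str]:
    """Compare two snapshots and return a reason string if the pattern looks suspicious."""
    collisions = after["collisions"] - before["collisions"]
    crc_errors = after["rx_crc_errors"] - before["rx_crc_errors"]
    if duplex == "half" and collisions > 0:
        return f"half duplex with {collisions} new collisions: is the partner fixed at full duplex?"
    if duplex == "full" and crc_errors > 0:
        return f"full duplex with {crc_errors} new CRC errors: is the partner running half duplex?"
    return None
```

A typical use would be to call `read_snapshot("eth0")` twice, a minute or so apart while traffic is flowing, and pass both results to `suspect_duplex_conflict` together with `read_duplex("eth0")`. The interface name `eth0` is only an example.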
The best way by far, in fact the only reliable way I know, is to use PathView Cloud to test end to end network Paths. PathView lets you see the performance of your entire network from a single screen. PathView's diagnostics include tests to identify and locate NIC duplex conflicts no matter who owns the network devices in the Path.
You can read a much more technically expansive White Paper about the problem here.
Got a long term performance problem that you just can't pinpoint? Sign up to try PathView free. You'll know what's wrong and how to fix it in minutes.