Flow Control: to be or not to be?

  • We deploy two 3750-X switches in a stack and connect Dell storage to the 10G-T ports. Dell recommends using flow control on these ports, but some people have had a lot of problems with this feature (packet loss, traffic blocking).

    So, is it best practice to use flow control on 10G ports?

    FCoE, iSCSI, NFS, or CIFS?


  • I think it's also relevant to understand the directionality of pause frames and what it means.

    Essentially, sending a pause frame means: 'I am congested, and I would rather you buffer frames in your TX queue than have me buffer them in my RX queue.'

    The 3750-X cannot send pause frames; it can only receive them.

    This means that if the 3750-X's buffers are in danger of being depleted (which happens very easily; the 3750-X has tiny buffers and is badly suited to applications where egress capacity isn't significantly greater than ingress), pause frames cannot help: the 3750-X is unable to take the desirable action of asking the sender to slow down, which would cause the Dell to do the buffering.

    However, if the Dell is receiving data so fast that it is in danger of congestion, it can send a pause frame and ask the 3750-X to stop sending (effectively asking the 3750-X to buffer frames FOR it, so it does not have to buffer them itself). In my opinion this does not make sense: I expect every storage device to have more buffer than a 3750-X (under 1 ms per port on average), so asking the 3750-X to do your buffering should only increase packet loss, as it will start dropping sooner.

    As I see it, you can only enable pause frames in the one direction where they do not even make sense for this application.
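    The asymmetry above can be illustrated with a toy simulation (illustrative only; the buffer sizes and rates are arbitrary assumptions, and storage-side drops are not modeled): when the large-buffered array pauses the small-buffered switch, loss shifts onto the device least able to absorb it.

```python
# Toy model of asymmetric 802.3x flow control: a small-buffered switch
# (3750-X-like: honors received pause frames, never sends its own) sits
# between upstream senders and a large-buffered storage array.
# All numbers are arbitrary assumptions; units are "frames per tick".

def simulate(ticks, ingress_per_tick, egress_per_tick,
             switch_buf, storage_buf, storage_drain, storage_sends_pause):
    """Return the number of frames the switch drops."""
    sw_q = st_q = drops = 0
    paused = False
    for _ in range(ticks):
        # Upstream senders never slow down: the switch cannot pause them.
        sw_q += ingress_per_tick
        if sw_q > switch_buf:                  # tiny buffer overflows -> loss
            drops += sw_q - switch_buf
            sw_q = switch_buf
        if not paused:                         # honor any received pause
            sent = min(sw_q, egress_per_tick)
            sw_q -= sent
            st_q += sent
        st_q = max(0, st_q - storage_drain)    # array consumes frames
        # The array asserts pause when its (much larger) queue runs deep.
        paused = storage_sends_pause and st_q > 0.8 * storage_buf
    return drops
```

    With pause disabled, the switch forwards at line rate and drops nothing; with pause enabled, the array's pause frames force the tiny switch buffer to absorb the backlog, and the switch starts dropping, which is the "do my buffering for me" effect described above.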

  • FCoE storage relies on the assumption of what they call lossless Ethernet... FCoE storage is also notoriously quirky about interoperability. The official answer is "yes"; enable flow control on all FCoE storage ports, but test thoroughly before putting the system into production. All that said, a 3750X doesn't support all the FCoE extensions required to do it properly, so I can only hope you're talking about IP-based storage...

    If it is any other IP-based storage technology, I would keep flow control turned off and let the upper-layer protocols deal with drops... they are used to it.

    Would you mind elaborating on what you have seen that was quirky?

    @IanK, driver issues like this are not uncommon, especially with Brand X CNA driver, and Brand Y FCoE switchport... where X != Y

  • There is a traditional standard for flow control (802.3x): a MAC-layer frame that causes all traffic on the wire to pause while the signaling switch drains its buffers. This is exactly wrong for FCoE (which responds badly to dropped frames) and is distinct from Priority Flow Control (PFC), which is a component of Data Center Bridging (DCB).
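    As a back-of-the-envelope aside on the mechanics (not from the thread itself): the 802.3x PAUSE frame carries a 16-bit pause_time field measured in quanta of 512 bit times, so the longest pause a single frame can request shrinks as the link speed grows.

```python
# 802.3x pause_time is a 16-bit count of "quanta"; one quantum is 512 bit
# times, so the maximum pause one PAUSE frame can request depends on speed.
def max_pause_seconds(link_bps):
    quantum = 512 / link_bps        # duration of one quantum at this speed
    return 0xFFFF * quantum         # 0xFFFF = largest 16-bit pause_time

print(round(max_pause_seconds(10e9) * 1e3, 3))  # → 3.355 (ms on 10GbE)
```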

    In contrast to 802.3x, PFC allows the traffic to be paused on a per-CoS basis. This is a key element in providing lossless forwarding, as non-protected classes of traffic can be slowed down such that there is always bandwidth available for critical traffic.

    The 3750X doesn't support DCB (..or PFC) and isn't intended as a platform for lossless Ethernet. It does support the older style of flow control.

    I'm going to assume that the storage in question is IP-based (e.g. EqualLogic), in which case you should follow the storage vendor's recommendations and enable flow control end-to-end. Some have run into anomalous issues with this setup and have gotten better results with flow control disabled, but I wouldn't try that unless it's dictated by troubleshooting.

    The per-CoS pause frame is called 802.1Qbb. In this particular example I'm not sure it's relevant, as the OP has ONLY a storage device on the port, so all traffic is equal. On an interface that carries many traffic types, 802.3x is too big a hammer and IMHO causes more problems than it solves; 802.1Qbb is less of a hammer, but only if you think about your CoS classification very carefully.
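    For the 3750-X specifically, the only available knob is the legacy receive-side 802.3x support. A minimal sketch of following the vendor recommendation might look like this (interface names and descriptions are assumed, not from the thread):

```
! Hypothetical example: storage-facing 10G port (name assumed)
interface TenGigabitEthernet1/1/1
 description Dell storage, controller port
 flowcontrol receive on
```

    `show flowcontrol interface TenGigabitEthernet1/1/1` then shows whether flow control was actually negotiated with the peer.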

Licensed under CC BY-SA with attribution

Content dated before 7/24/2021 11:53 AM
