[Enhancement] Add independent control for UDP GSO and GRO (--gso, --gro)
Problem Statement
Currently, iperf3 supports UDP Generic Segmentation Offload (GSO) and Generic Receive Offload (GRO) via a single combined flag: --gsro. While this is convenient, it forces both offloads to be active simultaneously.
Testing shows that while GSO (sender-side) is almost universally beneficial, GRO (receiver-side) can cause significant CPU saturation on the receiver depending on the hardware and kernel version. Currently, there is no way to benefit from GSO without incurring the receiver-side overhead of GRO.
Proposed Solution
I propose splitting the offload controls into two independent flags while maintaining backward compatibility:
- --gso: Enables only UDP_SEGMENT on the sender.
- --gro: Enables only UDP_GRO on the receiver.
- --gsro: Remains as a legacy/convenience flag that enables both.
Technical Motivation & Test Evidence
Testing across different Network Interface Cards (NICs) reveals that a "one-size-fits-all" approach to offloading is suboptimal.
Test Environments
- Env 1: Intel X552 (10GbE) | Ubuntu 22.04 (Kernel 6.8)
- Env 2: Broadcom BCM57800 (10GbE) | Ubuntu 22.04 (Kernel 5.15)
- Env 3: Mellanox ConnectX-5 (25GbE+) | Ubuntu 24.04 (Kernel 6.17)
Comparative Data
| Setup | Mode | Sender CPU | Receiver CPU | Notes |
|---|---|---|---|---|
| Intel | Baseline | 78.3% | 43.1% | |
| Intel | --gsro | 32.1% | 100.0% | Receiver saturated by GRO |
| Intel | --gso | 32.0% | 41.4% | Optimal efficiency |
| Broadcom | Baseline | 99.9% | 100.0% | 9.92 Gbps |
| Broadcom | --gsro | 70.6% | 99.5% | 9.92 Gbps |
| Broadcom | --gso | 72.1% | 98.6% | 9.92 Gbps (Stable) |
| Mellanox | Baseline | 99.8% | 76.8% | Bitrate stuck at 18.7G |
| Mellanox | --gsro | 66.0% | 99.8% | Bitrate reached 25G |
| Mellanox | --gso | 65.7% | 96.5% | Bitrate reached 25G |
Observations & Hardware Variance
- The Intel Bottleneck: On the X552, GRO is extremely "expensive," saturating the receiver CPU (100.0% with --gsro vs. 41.4% with --gso). Splitting the flags lets us keep the roughly 46-percentage-point drop in sender CPU (78.3% to 32.0%, via GSO) without breaking the receiver.
- The Broadcom Stability: Unlike the Intel card, the Broadcom BCM57800 shows similar behavior whether using --gsro or --gso. While it doesn't suffer as much from GRO overhead, providing independent flags ensures consistent behavior across different testing toolsets.
- The Mellanox Throughput: On high-speed ConnectX-5 cards, GSO is the difference between hitting line rate (25G) and being CPU-bound at 18.7G. As with the Intel tests, using only GSO keeps the receiver below the 100% ceiling, though by a smaller margin (96.5% with --gso vs. 99.8% with --gsro).
Conclusion: Because different NIC models (Intel vs. Broadcom vs. Mellanox) handle offloads differently, users need granular control to avoid artificial bottlenecks during performance validation.
Observations
- GSO Efficiency: In all environments, GSO drastically reduced sender CPU load or allowed for higher bitrates by offloading segmentation to the NIC.
- GRO Bottleneck: In Environments 1 and 3, GRO pushed the receiver CPU to or near saturation (100.0% and 99.8%). Being able to disable GRO while keeping GSO active allows for higher-performance tests without receiver-side bottlenecks.
Implementation Details
The changes involve:
- Updating iperf_api.h/c to include the new boolean flags in the test settings.
- Modifying the logic in the UDP stream handlers to check for --gso or --gro specifically.
- Updating documentation in iperf3.1 (man page).
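The flag handling described above could look roughly like the sketch below. It is an illustration only: struct offload_settings, parse_offload_flags, and the OPT_* codes are hypothetical names, not iperf3's real identifiers. The point is that --gsro simply sets both booleans, which keeps existing command lines working.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings struct; iperf3's real field names may differ. */
struct offload_settings {
    bool gso; /* enable UDP_SEGMENT on the sender */
    bool gro; /* enable UDP_GRO on the receiver */
};

/* Hypothetical option codes, chosen above the single-character range. */
enum { OPT_GSO = 256, OPT_GRO, OPT_GSRO };

/* Parse the three offload flags. Returns 0 on success, -1 on an unknown option. */
int parse_offload_flags(int argc, char **argv, struct offload_settings *out)
{
    static const struct option longopts[] = {
        {"gso",  no_argument, NULL, OPT_GSO},
        {"gro",  no_argument, NULL, OPT_GRO},
        {"gsro", no_argument, NULL, OPT_GSRO},
        {NULL, 0, NULL, 0},
    };
    int c;

    out->gso = false;
    out->gro = false;
    optind = 1; /* restart scanning so the function can be called again */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case OPT_GSO:
            out->gso = true;
            break;
        case OPT_GRO:
            out->gro = true;
            break;
        case OPT_GSRO: /* legacy flag: equivalent to --gso --gro */
            out->gso = true;
            out->gro = true;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

The UDP stream handlers would then check out->gso and out->gro separately instead of a single combined setting.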
Use Cases
- Precise performance tuning for high-speed (10G/40G/100G) networks.
- Isolating kernel stack vs. hardware offload issues during debugging.
- Supporting environments where only the sender hardware supports offloading.