Traffic outage #54

When using `sar` to monitor network traffic, an issue involving intermittent periods of zero reported traffic was observed. An analysis using Codex revealed that this anomaly stems from the driver's statistics-gathering path itself, rather than indicating an actual "traffic outage."

1. The logic for updating `netdev` statistics in this specific driver version contains a significant flaw. Tools such as `sar -n DEV 1` ultimately invoke the `ndo_get_stats64` handler, which maps to `ice_get_stats64()`; see `/ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:9752`. The intended design here was to first call `ice_update_vsi_ring_stats(vsi)` during the read operation—thereby calculating the current ring statistics—and then return the resulting values.

2. However, the implementation of `ice_update_vsi_ring_stats()` is defective: it accumulates the statistics into a temporary `vsi_stats` buffer but subsequently calls `kfree(vsi_stats)` directly without writing the accumulated TX/RX packet and byte counts back to the persistent `vsi->net_stats` structure. See lines `/ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:9462` through `/ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:9515`. This implies that the values read by `ice_get_stats64()` are frequently *not* the "ring counts freshly aggregated during the current request," but rather stale cached data.

3. This stale cached data comes primarily from periodic updates performed by the watchdog mechanism: `ice_watchdog_subtask()` refreshes the statistics once every `pf->serv_tmr_period`, which is defined as `HZ` (one second). See `/ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:2245` and `/ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:5583`. If two consecutive reads by a sampling tool fall between the same pair of watchdog updates, the tool sees an unchanged counter, which converts to a traffic rate of zero. On the next sampling cycle, the tool reads a value that includes everything accumulated since the previous update, so it reports a much larger figure. This produces the observed alternating pattern of "0 / very large value / 0 / very large value." The pattern shown in your screenshot aligns very closely with this scenario.

4. There is also a secondary implementation issue: `ice_fetch_u64_stats_per_ring()` takes the `struct ice_q_stats` by value, so the counters are copied when the function is called and the `u64_stats_fetch_begin/retry` loop only re-reads that copy; it does not actually protect the read of the live ring counters (see `ethernet-linux-ice-main/ethernet-linux-ice-main/src/ice_main.c:9392`). This is more of a consistency flaw and is unlikely on its own to cause the "reset-to-zero-every-other-second" behavior, but it does suggest that the current statistics-handling code has quality problems.
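
To make the mechanism in point 3 concrete, here is a minimal userspace sketch of a counter that readers only see through a periodically refreshed cache. It is not driver code: the function names, the constant traffic rate, and the millisecond timeline are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not driver code): the real counter grows by
 * rate_per_ms every millisecond, but readers only see a cached copy
 * that is refreshed once every refresh_ms.
 */
uint64_t cached_counter_at(uint64_t t_ms, uint64_t refresh_ms,
                           uint64_t rate_per_ms)
{
    uint64_t last_refresh_ms;

    assert(refresh_ms > 0);
    /* Time of the most recent refresh at or before t_ms. */
    last_refresh_ms = (t_ms / refresh_ms) * refresh_ms;
    return last_refresh_ms * rate_per_ms;
}

/* Delta a sampling tool would compute between two consecutive reads. */
uint64_t sampled_delta(uint64_t t_prev_ms, uint64_t t_now_ms,
                       uint64_t refresh_ms, uint64_t rate_per_ms)
{
    return cached_counter_at(t_now_ms, refresh_ms, rate_per_ms) -
           cached_counter_at(t_prev_ms, refresh_ms, rate_per_ms);
}
```

With a 1000 ms refresh, a steady 10 packets/ms, and reads at 0, 990, 2010, 2990 and 4010 ms (roughly one-second sampling with a little jitter), the deltas come out as 0, 20000, 0, 20000: a read interval that contains no refresh reports zero, and the following one absorbs two seconds of traffic.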

Conclusion: What you are observing appears to be a lag in the driver's statistics updates, rather than the actual business traffic intermittently dropping to zero. The most likely primary cause is that `ice_update_vsi_ring_stats()` fails to write the aggregated ring statistics back to `vsi->net_stats`; as a result, `sar` only ever reads the values last refreshed by the watchdog timer, which are often stale.

If you intend to fix this, I recommend prioritizing the following two points:

1. After the aggregation process within `ice_update_vsi_ring_stats()` is complete, write the values from `vsi_stats->{tx_packets, tx_bytes, rx_packets, rx_bytes}` back to `vsi->net_stats`.
2. Modify `ice_fetch_u64_stats_per_ring()` to accept a pointer—`struct ice_q_stats *stats`—and perform the dereferenced reads directly within the `u64_stats_fetch_begin/retry` loop.
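
The two recommendations can be sketched in plain userspace C as follows. Everything here is a hypothetical stand-in: the struct layouts, the function names, and the simplified sequence counter (single-threaded, no memory barriers) are assumptions and do not reproduce the real `ice` types or the kernel's `u64_stats_sync` API. The sketch only shows the shape of the change: read through a pointer inside the retry loop, then write the aggregate back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real driver structures differ. */
struct q_stats { uint64_t pkts; uint64_t bytes; };

struct ring {
    unsigned int seq;      /* simplified sequence counter bumped by writers */
    struct q_stats stats;  /* live per-ring counters */
};

struct net_stats {
    uint64_t tx_packets, tx_bytes, rx_packets, rx_bytes;
};

struct vsi {
    struct ring *tx_rings; size_t num_txq;
    struct ring *rx_rings; size_t num_rxq;
    struct net_stats net_stats;  /* persistent copy returned to readers */
};

/* Recommendation 2: take a pointer so the live counters are read inside
 * the retry loop, not copied once before it starts. */
void fetch_stats_per_ring(const unsigned int *seq, const struct q_stats *stats,
                          uint64_t *pkts, uint64_t *bytes)
{
    unsigned int start;

    assert(seq && stats && pkts && bytes);
    do {
        start = *seq;           /* stand-in for the fetch_begin step */
        *pkts = stats->pkts;
        *bytes = stats->bytes;
    } while (*seq != start);    /* stand-in for the fetch_retry step */
}

void update_vsi_ring_stats(struct vsi *vsi)
{
    struct net_stats tmp = {0, 0, 0, 0};
    uint64_t pkts, bytes;
    size_t i;

    for (i = 0; i < vsi->num_txq; i++) {
        fetch_stats_per_ring(&vsi->tx_rings[i].seq, &vsi->tx_rings[i].stats,
                             &pkts, &bytes);
        tmp.tx_packets += pkts;
        tmp.tx_bytes += bytes;
    }
    for (i = 0; i < vsi->num_rxq; i++) {
        fetch_stats_per_ring(&vsi->rx_rings[i].seq, &vsi->rx_rings[i].stats,
                             &pkts, &bytes);
        tmp.rx_packets += pkts;
        tmp.rx_bytes += bytes;
    }

    /* Recommendation 1: write the aggregate back before discarding tmp. */
    vsi->net_stats.tx_packets = tmp.tx_packets;
    vsi->net_stats.tx_bytes = tmp.tx_bytes;
    vsi->net_stats.rx_packets = tmp.rx_packets;
    vsi->net_stats.rx_bytes = tmp.rx_bytes;
}
```

In the actual driver the accumulation buffer is heap-allocated and freed with `kfree()`, so the write-back would need to happen before that call; locking and any additional fields the driver maintains are deliberately left out of this sketch.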
