ip_loadbalance_frr_test.sh fails sporadically (100% packet loss) when using real hardware ports #463

@rsafrono

Description

Platform

NUMA 0
======

Memory: 125GB
2MB hugepages: 0
1GB hugepages: 8

CPUs
----

Model name:                         Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
Cores IDs:
0,48	6,54	12,60	18,66	24,72	30,78	36,84	42,90
2,50	8,56	14,62	20,68	26,74	32,80	38,86	44,92
4,52	10,58	16,64	22,70	28,76	34,82	40,88	46,94

NICs
----

SLOT          DRIVER    IFNAME    MAC                LINK/STATE  SPEED   DEVICE
0000:18:00.0  tg3       eno1      34:73:5a:9d:c1:5c  1/up        1Gb/s   NetXtreme BCM5720 Gigabit Ethernet PCIe
0000:18:00.1  tg3       eno2      34:73:5a:9d:c1:5d  0/down      -       NetXtreme BCM5720 Gigabit Ethernet PCIe
0000:19:00.0  tg3       eno3      34:73:5a:9d:c1:5e  0/down      -       NetXtreme BCM5720 Gigabit Ethernet PCIe
0000:19:00.1  tg3       eno4      34:73:5a:9d:c1:5f  0/down      -       NetXtreme BCM5720 Gigabit Ethernet PCIe
0000:3b:00.0  i40e      ens1f0    6c:fe:54:0a:af:b0  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
0000:3b:00.1  i40e      ens1f1    6c:fe:54:0a:af:b1  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
0000:3b:00.2  i40e      ens1f2    6c:fe:54:0a:af:b2  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
0000:3b:00.3  i40e      ens1f3    6c:fe:54:0a:af:b3  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
0000:3b:02.0  vfio-pci  -         -                  -/-         -       Ethernet Virtual Function 700 Series
0000:3b:02.1  iavf      ens1f0v1  32:9b:fa:83:d2:cc  1/up        10Gb/s  Ethernet Virtual Function 700 Series
0000:3b:06.0  vfio-pci  -         -                  -/-         -       Ethernet Virtual Function 700 Series
0000:3b:06.1  iavf      ens1f1v1  7a:8e:ac:a0:a7:16  1/up        10Gb/s  Ethernet Virtual Function 700 Series
0000:3b:0a.0  vfio-pci  -         -                  -/-         -       Ethernet Virtual Function 700 Series
0000:3b:0a.1  iavf      ens1f2v1  fe:6a:c6:fc:4a:33  1/up        10Gb/s  Ethernet Virtual Function 700 Series
0000:3b:0e.0  vfio-pci  -         -                  -/-         -       Ethernet Virtual Function 700 Series
0000:3b:0e.1  iavf      ens1f3v1  92:d3:a6:40:bd:b4  1/up        10Gb/s  Ethernet Virtual Function 700 Series
ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 34:73:5a:9d:c1:5c brd ff:ff:ff:ff:ff:ff
    altname enp24s0f0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 34:73:5a:9d:c1:5d brd ff:ff:ff:ff:ff:ff
    altname enp24s0f1
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 34:73:5a:9d:c1:5e brd ff:ff:ff:ff:ff:ff
    altname enp25s0f0
5: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:0a:af:b0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 42:26:a7:61:c4:46 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    vf 1     link/ether 32:9b:fa:83:d2:cc brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    altname enp59s0f0
6: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 34:73:5a:9d:c1:5f brd ff:ff:ff:ff:ff:ff
    altname enp25s0f1
7: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:0a:af:b1 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether f6:28:f8:0a:b3:7f brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    vf 1     link/ether 7a:8e:ac:a0:a7:16 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    altname enp59s0f1
8: ens1f2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:0a:af:b2 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether a6:1f:92:fd:89:da brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    vf 1     link/ether fe:6a:c6:fc:4a:33 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    altname enp59s0f2
9: ens1f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 6c:fe:54:0a:af:b3 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 86:c4:af:d9:c9:72 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    vf 1     link/ether 92:d3:a6:40:bd:b4 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust on
    altname enp59s0f3
10: ens1f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 32:9b:fa:83:d2:cc brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v1
11: ens1f1v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 7a:8e:ac:a0:a7:16 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f1v1
12: ens1f2v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether fe:6a:c6:fc:4a:33 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f2v1
13: ens1f3v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 92:d3:a6:40:bd:b4 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f3v1
14: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0

Symptoms

+ ip -n n1 route add default via 172.16.2.1
+ ip netns exec n0 ping -i0.01 -c3 -n 192.0.0.2
PING 192.0.0.2 (192.0.0.2) 56(84) bytes of data.
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 42:26:a7:61:c4:46 / IP 172.16.0.2 > 192.0.0.2 ttl=64 proto=ICMP(1) / ICMP echo request id=41431 seq=1, (pkt_len=98)
NOTICE: GROUT: [tx p1] f6:28:f8:0a:b3:7f > 7a:8e:ac:a0:a7:16 / IP 172.16.0.2 > 192.0.0.2 ttl=63 proto=ICMP(1) / ICMP echo request id=41431 seq=1, (pkt_len=98)
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 42:26:a7:61:c4:46 / IP 172.16.0.2 > 192.0.0.2 ttl=64 proto=ICMP(1) / ICMP echo request id=41431 seq=2, (pkt_len=98)
NOTICE: GROUT: [tx p1] f6:28:f8:0a:b3:7f > 7a:8e:ac:a0:a7:16 / IP 172.16.0.2 > 192.0.0.2 ttl=63 proto=ICMP(1) / ICMP echo request id=41431 seq=2, (pkt_len=98)
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 42:26:a7:61:c4:46 / IP 172.16.0.2 > 192.0.0.2 ttl=64 proto=ICMP(1) / ICMP echo request id=41431 seq=3, (pkt_len=98)
NOTICE: GROUT: [tx p1] f6:28:f8:0a:b3:7f > 7a:8e:ac:a0:a7:16 / IP 172.16.0.2 > 192.0.0.2 ttl=63 proto=ICMP(1) / ICMP echo request id=41431 seq=3, (pkt_len=98)
NOTICE: GROUT: [cp rx p0] 86:c5:64:e2:68:4a > 33:33:00:00:00:02 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=86:c5:64:e2:68:4a, (pkt_len=70)
NOTICE: GROUT: [tx p0] 86:c5:64:e2:68:4a > 33:33:00:00:00:02 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=86:c5:64:e2:68:4a, (pkt_len=70)
NOTICE: GROUT: [rx p2] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 :: > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [rx p0] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 :: > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 2 packets
NOTICE: GROUT: [rx p2] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [rx p0] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 2 packets
NOTICE: GROUT: [rx p0] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:02 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=fe:6a:c6:fc:4a:33, (pkt_len=70)
NOTICE: GROUT: [rx p2] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:02 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=fe:6a:c6:fc:4a:33, (pkt_len=70)
NOTICE: GROUT: [tx p0] 42:26:a7:61:c4:46 > 33:33:00:00:00:01 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::1 ttl=255 proto=ICMPv6(58) / ICMPv6 router advert / Option src_lladdr=42:26:a7:61:c4:46, (pkt_len=78)
NOTICE: GROUT: [tx p2] a6:1f:92:fd:89:da > 33:33:00:00:00:01 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::1 ttl=255 proto=ICMPv6(58) / ICMPv6 router advert / Option src_lladdr=a6:1f:92:fd:89:da, (pkt_len=78)
NOTICE: GROUT: [rx p2] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [rx p0] fe:6a:c6:fc:4a:33 > 33:33:00:00:00:16 / IPv6 fe80::fc6a:c6ff:fefc:4a33 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 2 packets
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 33:33:00:00:00:16 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [rx p2] 32:9b:fa:83:d2:cc > 33:33:00:00:00:16 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [rx p2] 32:9b:fa:83:d2:cc > 33:33:00:00:00:02 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=32:9b:fa:83:d2:cc, (pkt_len=70)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 33:33:00:00:00:02 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=32:9b:fa:83:d2:cc, (pkt_len=70)
NOTICE: GROUT: [tx p2] a6:1f:92:fd:89:da > 33:33:00:00:00:01 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::1 ttl=255 proto=ICMPv6(58) / ICMPv6 router advert / Option src_lladdr=a6:1f:92:fd:89:da, (pkt_len=78)
NOTICE: GROUT: [tx p0] 42:26:a7:61:c4:46 > 33:33:00:00:00:01 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::1 ttl=255 proto=ICMPv6(58) / ICMPv6 router advert / Option src_lladdr=42:26:a7:61:c4:46, (pkt_len=78)
NOTICE: GROUT: [rx p1] 7a:8e:ac:a0:a7:16 > 33:33:00:00:00:16 / IPv6 fe80::788e:acff:fea0:a716 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [cp rx p1] 7a:54:02:9a:08:49 > 33:33:00:00:00:02 / IPv6 fe80::f428:f8ff:fe0a:b37f > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=7a:54:02:9a:08:49, (pkt_len=70)
NOTICE: GROUT: [rx p1] 7a:8e:ac:a0:a7:16 > 33:33:00:00:00:02 / IPv6 fe80::788e:acff:fea0:a716 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=7a:8e:ac:a0:a7:16, (pkt_len=70)
NOTICE: GROUT: [tx p1] 7a:54:02:9a:08:49 > 33:33:00:00:00:02 / IPv6 fe80::f428:f8ff:fe0a:b37f > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=7a:54:02:9a:08:49, (pkt_len=70)
NOTICE: GROUT: [tx p1] f6:28:f8:0a:b3:7f > 33:33:00:00:00:01 / IPv6 fe80::f428:f8ff:fe0a:b37f > ff02::1 ttl=255 proto=ICMPv6(58) / ICMPv6 router advert / Option src_lladdr=f6:28:f8:0a:b3:7f, (pkt_len=78)
NOTICE: GROUT: [rx p1] 7a:8e:ac:a0:a7:16 > 33:33:00:00:00:16 / IPv6 fe80::788e:acff:fea0:a716 > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [rx p0] 32:9b:fa:83:d2:cc > 33:33:00:00:00:16 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [rx p2] 32:9b:fa:83:d2:cc > 33:33:00:00:00:16 / IPv6 fe80::309b:faff:fe83:d2cc > ff02::16 ttl=1 proto=HOPOPT(0) / ICMPv6 type=143 code=0, (pkt_len=90)
NOTICE: GROUT: [drop icmp6_input_unsupported] 1 packets
NOTICE: GROUT: [cp rx p2] b2:37:c5:7f:ce:e4 > 33:33:00:00:00:02 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=b2:37:c5:7f:ce:e4, (pkt_len=70)
NOTICE: GROUT: [tx p2] b2:37:c5:7f:ce:e4 > 33:33:00:00:00:02 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=b2:37:c5:7f:ce:e4, (pkt_len=70)
NOTICE: GROUT: [rx p1] 7a:8e:ac:a0:a7:16 > f6:28:f8:0a:b3:7f / ARP request who has 172.16.1.1? tell 172.16.1.2, (pkt_len=60)
NOTICE: GROUT: [tx p1] f6:28:f8:0a:b3:7f > 7a:8e:ac:a0:a7:16 / ARP reply 172.16.1.1 is at f6:28:f8:0a:b3:7f, (pkt_len=42)
NOTICE: GROUT: [rx p1] 92:d3:a6:40:bd:b4 > ff:ff:ff:ff:ff:ff / IP 0.0.0.0 > 255.255.255.255 ttl=64 proto=UDP(17) / UDP 68 > 67, (pkt_len=328)
NOTICE: GROUT: [cp rx p0] 86:c5:64:e2:68:4a > 33:33:00:00:00:02 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=86:c5:64:e2:68:4a, (pkt_len=70)
NOTICE: GROUT: [tx p0] 86:c5:64:e2:68:4a > 33:33:00:00:00:02 / IPv6 fe80::4026:a7ff:fe61:c446 > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=86:c5:64:e2:68:4a, (pkt_len=70)
NOTICE: GROUT: [cp rx p1] 7a:54:02:9a:08:49 > 33:33:00:00:00:02 / IPv6 fe80::f428:f8ff:fe0a:b37f > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=7a:54:02:9a:08:49, (pkt_len=70)
NOTICE: GROUT: [tx p1] 7a:54:02:9a:08:49 > 33:33:00:00:00:02 / IPv6 fe80::f428:f8ff:fe0a:b37f > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=7a:54:02:9a:08:49, (pkt_len=70)
NOTICE: GROUT: [cp rx p2] b2:37:c5:7f:ce:e4 > 33:33:00:00:00:02 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=b2:37:c5:7f:ce:e4, (pkt_len=70)
NOTICE: GROUT: [tx p2] b2:37:c5:7f:ce:e4 > 33:33:00:00:00:02 / IPv6 fe80::a41f:92ff:fefd:89da > ff02::2 ttl=255 proto=ICMPv6(58) / ICMPv6 router solicit / Option src_lladdr=b2:37:c5:7f:ce:e4, (pkt_len=70)

--- 192.0.0.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 22ms
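In the trace above, each echo request is received on p0 and forwarded on p1, and no echo reply appears on any port. To make that kind of pattern easier to spot in a long trace, the following sketch tallies ICMP echo lines per direction and port. It assumes the trace was saved to a file and that lines keep the `[rx pN]` / `[tx pN]` layout shown above; `summarize_icmp` is a hypothetical helper name, not part of grout.

```shell
# Tally ICMP echo requests and replies per direction and port
# from a saved grout packet trace.
# usage: summarize_icmp TRACE_FILE
summarize_icmp() {
    awk '
        /ICMP echo (request|reply)/ {
            # Extract the bracketed tag, e.g. "rx p0" or "tx p1".
            if (match($0, /\[[^]]*\]/)) {
                tag = substr($0, RSTART + 1, RLENGTH - 2)
                kind = ($0 ~ /ICMP echo request/) ? "request" : "reply"
                count[tag " " kind]++
            }
        }
        END { for (k in count) print k, count[k] }
    ' "$1" | sort
}
```

If the output lists only `request` rows, the replies were either never sent by the peer or never delivered back to grout.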

How to reproduce

NET_INTERFACES='ens1f0v1 ens1f1v1 ens1f2v1 ens1f3v1' VFIO_PCI_PORTS='0000:3b:02.0 0000:3b:06.0 0000:3b:0a.0 0000:3b:0e.0' smoke/ip_loadbalance_frr_test.sh build/

Note: the issue does not occur on every run; sometimes the test passes, so there is probably a race condition.
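Since the failure is intermittent, one way to catch it is to repeat the test until a run fails. The helper below is a minimal sketch under that assumption; `run_until_fail` is a hypothetical name and is not part of the grout test suite.

```shell
# Re-run a command up to MAX_RUNS times and stop at the first failure.
# usage: run_until_fail MAX_RUNS COMMAND [ARGS...]
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max runs passed"
    return 0
}

# Example (variables are exported so the test script sees them):
#   export NET_INTERFACES='ens1f0v1 ens1f1v1 ens1f2v1 ens1f3v1'
#   export VFIO_PCI_PORTS='0000:3b:02.0 0000:3b:06.0 0000:3b:0a.0 0000:3b:0e.0'
#   run_until_fail 20 smoke/ip_loadbalance_frr_test.sh build/
```

The run count of 20 is arbitrary; the number of the failing run gives a rough idea of how often the failure occurs.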

Version

grout v0.14.2-3-ge2f2dc78 (DPDK 25.11.0)
