User Tools

Site Tools


documentation:examples:forwarding_performance_lab_of_a_pc_engines_apu

Forwarding performance lab of a PC Engines APU

Hardware detail

This lab will test a PC Engines APU 1 (dmesg):

Forwarding performance of APU version 2 is here.

Lab set-up

For more information about full setup of this lab: Setting up a forwarding performance benchmark lab (switch configuration, etc.).

Diagram

 +------------------------------------------+      +-----------------------+
 |              Device under Test           |      |       Packet gen      |
 |                                          |      |                       |
 |                     re1: 198.18.0.207/24 |<=====| igb2: 198.18.0.203/24 |
 |                           2001:2::207/64 |      |        2001:2::203/64 |
 |                        00:0d:b9:3c:dd:3d |      |     00:1b:21:c4:95:7a |
 |                                          |      |                       |
 |                     re2: 198.19.0.207/24 |=====>| igb3: 198.19.0.203/24 |
 |                      2001:2:0:8000::8/64 |      | 2001:2:0:8000::203/64 |
 |                        00:0d:b9:3c:dd:3e |      |     00:1b:21:c4:95:7b |
 |                                          |      |                       |
 |               static routes              |      |                       |
 |      198.19.0.0/16 => 198.19.0.203       |      +-----------------------+
 |      198.18.0.0/16 => 198.18.0.203       |
 | 2001:2::/49        => 2001:2::203        |
 | 2001:2:0:8000::/49 => 2001:2:0:8000::203 |
 |                                          |
 |        static arp and ndp                |
 | 198.18.0.203        => 00:1b:21:c4:95:7a |
 | 2001:2::203                              |
 |                                          |
 | 198.19.0.203        => 00:1b:21:c4:95:7b |
 | 2001:2:0:8000::203                       |
 |                                          |
 +------------------------------------------+

The generator MUST generate lot's of IP flows (multiple source/destination IP addresses and/or UDP src/dst port) and minimum packet size (for generating maximum packet rate) with one of these commands:

Multiple source/destination IP addresses (don't forget to precise port to use for avoiding to use port number 0 filtered by pf):

pkt-gen -U -i igb3 -f tx -n 80000000 -l 60 -d 198.19.10.1:2000-198.19.10.20 -D 00:0d:b9:3c:dd:3e -s 198.18.10.1:2000-198.18.10.100 -w 4

Receiver will use these commands:

pkt-gen -i igb2 -f rx -w 4

Basic configuration

Disabling Ethernet flow-control

re(4) drivers didn't seems to support flow-control and the switch confirms this behavior:

switch#sh int Gi1/0/16 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/16   Unsupp.  Unsupp.  off      off         0       0
switch#sh int Gi1/0/17 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/17   Unsupp.  Unsupp.  off      off         0       0

Static routes and ARP entries

Configure static routes, configure IP addresses and static ARP. A router should not use LRO and TSO. BSDRP disable by default using a RC script (disablelrotso_enable=“YES” in /etc/rc.conf.misc), but re(4) drivers didn't support it.

/etc/rc.conf:

# IPv4 router
gateway_enable="YES"
ifconfig_re1="inet 198.18.0.207/24"
ifconfig_re2="inet 198.19.0.207/24"
static_routes="generator receiver"
route_generator="-net 198.18.0.0/16 198.18.0.203"
route_receiver="-net 198.19.0.0/16 198.19.0.203"
static_arp_pairs="receiver generator"
static_arp_generator="198.18.0.203 00:1b:21:c4:95:7a"
static_arp_receiver="198.19.0.203 00:1b:21:c4:95:7b"

# IPv6 router
ipv6_gateway_enable="YES"
ipv6_activate_all_interfaces="YES"
ipv6_static_routes="generator receiver"
ipv6_route_generator="2001:2:: -prefixlen 49 2001:2::203"
ipv6_route_receiver="2001:2:0:8000:: -prefixlen 49 2001:2:0:8000::203"
ifconfig_re1_ipv6="inet6 2001:2::207 prefixlen 64"
ifconfig_re2_ipv6="inet6 2001:2:0:8000::207 prefixlen 64"
static_ndp_pairs="receiver generator"
static_ndp_generator="2001:2::203 00:1b:21:c4:95:7a"
static_ndp_receiver="2001:2:0:8000::203 00:1b:21:c4:95:7b"

Default forwarding rate

We start the first test by starting one packet generator at gigabit line-rate (1.488Mpps) and found:

  • APU is still responsive during this test (thanks to the dual core);
  • About 154Kpps are accepted by the re(4) Ethernet interface.
[root@BSDRP]~# netstat -iw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    154273     0     0    9256386     154241     0    9256550     0
    154081     0     0    9244866     154081     0    9244982     0
    154113     0     0    9246786     154113     0    9246902     0
    154151     0     0    9249066     154177     0    9249182     0
    154139     0     0    9248346     154113     0    9248462     0
    154113     0     0    9246786     154113     0    9246902     0
    154145     0     0    9248706     154145     0    9248822     0
    154193     0     0    9251586     154209     0    9252662     0
    154135     0     0    9248106     154145     0    9247322     0
    154139     0     0    9248346     154113     0    9248402     0
    154151     0     0    9249066     154177     0    9249242     0
    154145     0     0    9248706     154145     0    9248822     0
    154147     0     0    9248826     154145     0    9248882     0
    154169     0     0    9250146     154177     0    9250262     0
    154145     0     0    9248706     154113     0    9248822     0

The forwarding rate is not very high: RealTek NIC are not very very fast and doesn't support multi-queues and it's only a 1Ghz CPU. We notice input error counters of re(4) are not updated: re(4) drivers bugs?.

We can force drivers stats with this command:

[root@BSDRP]~# sysctl dev.re.1.stats=1
dev.re.1.stats: -1 -> -1
[root@BSDRP]~# dmesg
(etc...)
re1 statistics:
Tx frames : 6
Rx frames : 16394206
Tx errors : 0
Rx errors : 0
Rx missed frames : 16421
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 16394204
Rx broadcast frames : 2
Rx multicast frames : 0
Tx aborts : 0
Tx underruns : 0

But the RX missed frame counter is still not accurate.

About system load during this test:

[root@BSDRP]/# top -nCHSIzs1
last pid:  4067;  load averages:  0.49,  0.16,  0.04  up 0+01:04:10    18:32:04
86 processes:  3 running, 67 sleeping, 16 waiting

Mem: 6312K Active, 19M Inact, 75M Wired, 12M Buf, 1849M Free
Swap:


  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
   11 root       -92    -     0K   256K WAIT    0   0:25  68.26% intr{irq260: re1}
   11 root       -92    -     0K   256K WAIT    0   0:03   8.25% intr{irq261: re2}

Firewalls impact

This test will generate 2000 different flows by using 2000 different UDP destination ports.

pf and ipfw configurations used are detailed on the previous Forwarding performance lab of an IBM System x3550 M3 with Intel 82580.

Graph

scale information about Gigabit Ethernet:

  • 1.488Mpps is the maximum paquet-per-second (pps) rate with smallest 46 bytes packets.
  • 81Kpps is the minimum pps rate with biggest 1500 bytes packets.

forwarding and firewalling rate with a PC Engines APU running FreeBSD FreeBSD 10.3

Ministat

All benchs were done 5 times, with a reboot between them.

x forwarding
+ ipfw-statefull
* pf-statefull
+--------------------------------------------------------------------------+
|*                                                                         |
|*                            +                                           x|
|*                            +                                           x|
|*                            +                                           x|
|*                           ++                                           x|
|                                                                         A|
|                            |A                                            |
|A                                                                         |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        154144        154200        154167      154171.6     20.671236
+   5        113357        114637        114173      114152.6     486.93151
Difference at 95.0% confidence
        -40019 +/- 502.612
        -25.9574% +/- 0.326008%
        (Student's t, pooled s = 344.623)
*   5         88037         88385         88108         88169     144.98793
Difference at 95.0% confidence
        -66002.6 +/- 151.034
        -42.8111% +/- 0.0979651%
        (Student's t, pooled s = 103.559)

Netmap's pkt-gen performance

re(4) has netmap support… what's about the rate with the netmap's packet generator/receiver ?

As a receiver (the sender is emitting at 1.48 Mpps):

[root@APU]~# pkt-gen -i re1 -f rx -w 4 -c 2
854.089137 main [1641] interface is re1
854.089501 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
854.089523 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
854.111967 main [1824] mapped 334980KB at 0x801dff000
Receiving from netmap:re1: 1 queues, 1 threads and 2 cpus.
854.112634 main [1904] Wait 4 secs for phy reset
858.123495 main [1906] Ready...
858.123756 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
859.124355 receiver_body [1189] waiting for initial packets, poll returns 0 0
(etc...)
862.129332 main_thread [1438] 579292 pps (580438 pkts in 1001978 usec)
863.131433 main_thread [1438] 579115 pps (580332 pkts in 1002101 usec)
894.184371 main_thread [1438] 577549 pps (578725 pkts in 1002036 usec)
895.185330 main_thread [1438] 577483 pps (578037 pkts in 1000959 usec)
896.191334 main_thread [1438] 580069 pps (583552 pkts in 1006004 usec)
897.193330 main_thread [1438] 578174 pps (579328 pkts in 1001996 usec)
898.195328 main_thread [1438] 581974 pps (583137 pkts in 1001998 usec)
899.196916 main_thread [1438] 579600 pps (580520 pkts in 1001588 usec)
900.198344 main_thread [1438] 578366 pps (579191 pkts in 1001427 usec)
901.200327 main_thread [1438] 579327 pps (580476 pkts in 1001984 usec)
902.202328 main_thread [1438] 581601 pps (582765 pkts in 1002001 usec)
903.204329 main_thread [1438] 577499 pps (578655 pkts in 1002001 usec)

Netmap usage improve the receiving packet rate to about 580Kpps only: It's strange that it didn't reach the maximum Ethernet frame rate (1.48Mpps) with netmap.

As a packet generator:

[root@APU]~# pkt-gen -i re1 -f tx -w 4 -c 2 -n 80000000 -l 60 -d 2.1.3.1-2.1.3.20 -D 00:1b:21:d4:3f:2a -s 1.1.3.3-1.1.3
.100 -c 2
759.415059 main [1641] interface is re1
759.415387 extract_ip_range [275] range is 1.1.3.3:0 to 1.1.3.100:0
759.415409 extract_ip_range [275] range is 2.1.3.1:0 to 2.1.3.20:0
759.922110 main [1824] mapped 334980KB at 0x801dff000
Sending on netmap:re1: 1 queues, 1 threads and 2 cpus.
1.1.3.3 -> 2.1.3.1 (00:00:00:00:00:00 -> 00:1b:21:d4:3f:2a)
759.922737 main [1880] --- SPECIAL OPTIONS: copy
759.922750 main [1902] Sending 512 packets every  0.000000000 s
759.922763 main [1904] Wait 4 secs for phy reset
763.923715 main [1906] Ready...
763.924310 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
763.924929 sender_body [1016] start
764.926557 main_thread [1438] 407993 pps (408672 pkts in 1001665 usec)
765.928548 main_thread [1438] 408091 pps (408904 pkts in 1001991 usec)
766.929550 main_thread [1438] 407939 pps (408348 pkts in 1001002 usec)
767.931548 main_thread [1438] 407808 pps (408623 pkts in 1001998 usec)
768.933359 main_thread [1438] 407880 pps (408619 pkts in 1001811 usec)
769.934548 main_thread [1438] 408138 pps (408623 pkts in 1001189 usec)
770.936548 main_thread [1438] 407825 pps (408641 pkts in 1002000 usec)
(etc...)
792.976553 main_thread [1438] 407872 pps (408690 pkts in 1002005 usec)
793.978549 main_thread [1438] 408184 pps (408999 pkts in 1001996 usec)
794.980547 main_thread [1438] 408201 pps (409017 pkts in 1001998 usec)
795.982552 main_thread [1438] 407892 pps (408710 pkts in 1002005 usec)
796.984546 main_thread [1438] 407984 pps (408798 pkts in 1001994 usec)
797.986546 main_thread [1438] 408069 pps (408885 pkts in 1002000 usec)
798.988442 main_thread [1438] 408080 pps (408854 pkts in 1001896 usec)
799.989548 main_thread [1438] 407815 pps (408266 pkts in 1001106 usec)
^C800.990686 main_thread [1438] 183685 pps (183894 pkts in 1001137 usec)
Sent 14897486 packets, 60 bytes each, in 36.52 seconds.
Speed: 407.98 Kpps Bandwidth: 195.83 Mbps (raw 274.16 Mbps)

Still not able to reach the maximum Ethernet throughput with netmap !?!? Realtek chipset limitation ?

documentation/examples/forwarding_performance_lab_of_a_pc_engines_apu.txt · Last modified: 2017/01/28 14:18 by olivier