====== Forwarding performance lab of a PC Engines APU ======
{{description>Forwarding performance lab of a dual core AMD G series T40E APU (1 GHz) with 3 Realtek RTL8111E Gigabit}}
===== Hardware detail =====

This lab will test a [[http://www.pcengines.ch/apu.htm|PC Engines APU 1]] ([[PC Engines APU|dmesg]]):
  * Dual core [[http://www.amd.com/us/Documents/49282_G-Series_platform_brief.pdf|AMD G-T40E Processor]] (1 GHz)
  * 3 Realtek RTL8111E Gigabit Ethernet ports
  * 2 GB of RAM

[[documentation:examples:Forwarding performance lab of a PC Engines APU2|Forwarding performance of the APU version 2 is here.]]
===== Lab set-up =====

For more information about the full setup of this lab, see [[documentation:examples:Setting up a forwarding performance benchmark lab]] (switch configuration, etc.).
==== Diagram ====

<code>
+------------------------------------------+      +-----------------------+
|              Device under Test           |      |       Packet gen      |
|                                          |      |                       |
|                     re1: 198.18.0.207/24 |<=====| igb2: 198.18.0.203/24 |
|                           2001:2::207/64 |      |        2001:2::203/64 |
|                        00:0d:b9:3c:dd:3d |      |     00:1b:21:c4:95:7a |
|                                          |      |                       |
|                     re2: 198.19.0.207/24 |=====>| igb3: 198.19.0.203/24 |
|                    2001:2:0:8000::207/64 |      | 2001:2:0:8000::203/64 |
|                        00:0d:b9:3c:dd:3e |      |     00:1b:21:c4:95:7b |
|                                          |      |                       |
|               static routes              |      |                       |
|      198.19.0.0/16 => 198.19.0.203       |      +-----------------------+
|      198.18.0.0/16 => 198.18.0.203       |
| 2001:2::/49        => 2001:2::203        |
| 2001:2:0:8000::/49 => 2001:2:0:8000::203 |
|                                          |
|        static arp and ndp                |
| 198.18.0.203        => 00:1b:21:c4:95:7a |
| 2001:2::203                              |
|                                          |
| 198.19.0.203        => 00:1b:21:c4:95:7b |
| 2001:2:0:8000::203                       |
|                                          |
+------------------------------------------+
</code>

The generator **MUST** generate a lot of IP flows (multiple source/destination IP addresses and/or UDP source/destination ports) at minimum packet size (to generate the maximum packet rate), with one of these commands:

Multiple source/destination IP addresses (don't forget to specify the ports to use, otherwise pkt-gen uses port number 0, which is filtered by pf):
<code>
pkt-gen -U -i igb3 -f tx -n 80000000 -l 60 -d 198.19.10.1:2000-198.19.10.20 -D 00:0d:b9:3c:dd:3e -s 198.18.10.1:2000-198.18.10.100 -w 4
</code>

The receiver will use this command:
<code>
pkt-gen -i igb2 -f rx -w 4
</code>
===== Basic configuration =====

==== Disabling Ethernet flow-control ====

The re(4) driver does not seem to support flow-control, and the switch confirms this behavior:
<code>
switch#sh int Gi1/0/16 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/16   Unsupp.  Unsupp.  off      off         0       0
switch#sh int Gi1/0/17 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/17   Unsupp.  Unsupp.  off      off         0       0
</code>
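
For comparison, on NICs whose FreeBSD drivers do expose a flow-control knob (the igb(4) example below is an illustration only, not part of this lab; re(4) offers no such tunable), it can typically be checked and disabled via sysctl:
<code>
# 0 = no flow control, 1 = Rx pause, 2 = Tx pause, 3 = full (default)
sysctl dev.igb.2.fc      # read the current setting
sysctl dev.igb.2.fc=0    # disable flow control on this port
</code>
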
==== Static routes and ARP entries ====

Configure the IP addresses, static routes and static ARP entries.
A router [[Documentation:Technical docs:Performance|should not use LRO and TSO]]. BSDRP disables them by default using an RC script (disablelrotso_enable="YES" in /etc/rc.conf.misc), but the re(4) driver doesn't support them anyway.

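On drivers that do implement these offloads, they can also be disabled by hand with ifconfig; a minimal sketch, assuming an igb(4) interface for illustration (the APU's re(4) ports lack these capabilities to begin with):
<code>
# Turn off TCP segmentation offload (IPv4/IPv6) and large receive offload
ifconfig igb0 -tso4 -tso6 -lro
</code>
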
/etc/rc.conf:
<code>
# IPv4 router
gateway_enable="YES"
ifconfig_re1="inet 198.18.0.207/24"
ifconfig_re2="inet 198.19.0.207/24"
static_routes="generator receiver"
route_generator="-net 198.18.0.0/16 198.18.0.203"
route_receiver="-net 198.19.0.0/16 198.19.0.203"
static_arp_pairs="receiver generator"
static_arp_generator="198.18.0.203 00:1b:21:c4:95:7a"
static_arp_receiver="198.19.0.203 00:1b:21:c4:95:7b"

# IPv6 router
ipv6_gateway_enable="YES"
ipv6_activate_all_interfaces="YES"
ipv6_static_routes="generator receiver"
ipv6_route_generator="2001:2:: -prefixlen 49 2001:2::203"
ipv6_route_receiver="2001:2:0:8000:: -prefixlen 49 2001:2:0:8000::203"
ifconfig_re1_ipv6="inet6 2001:2::207 prefixlen 64"
ifconfig_re2_ipv6="inet6 2001:2:0:8000::207 prefixlen 64"
static_ndp_pairs="receiver generator"
static_ndp_generator="2001:2::203 00:1b:21:c4:95:7a"
static_ndp_receiver="2001:2:0:8000::203 00:1b:21:c4:95:7b"
</code>
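
Once applied, the static entries can be verified with the standard tools (a quick sanity check, not part of the original lab transcript): the generator and receiver entries should show as permanent.
<code>
arp -an   # static IPv4 entries for 198.18.0.203 and 198.19.0.203
ndp -an   # static IPv6 entries for 2001:2::203 and 2001:2:0:8000::203
</code>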

===== Default forwarding rate =====

We start the first test with one packet generator emitting at gigabit line-rate (1.488 Mpps) and find:
  * the APU is still responsive during this test (thanks to the dual core);
  * about 154 Kpps are accepted by the re(4) Ethernet interface.

<code>
[root@BSDRP]~# netstat -iw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    154273     0     0    9256386     154241     0    9256550     0
    154081     0     0    9244866     154081     0    9244982     0
    154113     0     0    9246786     154113     0    9246902     0
    154151     0     0    9249066     154177     0    9249182     0
    154139     0     0    9248346     154113     0    9248462     0
    154113     0     0    9246786     154113     0    9246902     0
    154145     0     0    9248706     154145     0    9248822     0
    154193     0     0    9251586     154209     0    9252662     0
    154135     0     0    9248106     154145     0    9247322     0
    154139     0     0    9248346     154113     0    9248402     0
    154151     0     0    9249066     154177     0    9249242     0
    154145     0     0    9248706     154145     0    9248822     0
    154147     0     0    9248826     154145     0    9248882     0
    154169     0     0    9250146     154177     0    9250262     0
    154145     0     0    9248706     154113     0    9248822     0
</code>

The forwarding rate is not very high: Realtek NICs are not very fast, they don't support multiple queues, and this is only a 1 GHz CPU. We also notice that the input error counters of re(4) are never updated: a re(4) driver bug?

We can force the driver to dump its statistics with this command:
<code>
[root@BSDRP]~# sysctl dev.re.1.stats=1
dev.re.1.stats: -1 -> -1
[root@BSDRP]~# dmesg
(etc...)
re1 statistics:
Tx frames : 6
Rx frames : 16394206
Tx errors : 0
Rx errors : 0
Rx missed frames : 16421
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 16394204
Rx broadcast frames : 2
Rx multicast frames : 0
Tx aborts : 0
Tx underruns : 0
</code>

But the Rx missed frames counter is still not accurate.

About the system load during this test:

<code>
[root@BSDRP]/# top -nCHSIzs1
last pid:  4067;  load averages:  0.49,  0.16,  0.04  up 0+01:04:10    18:32:04
86 processes:  3 running, 67 sleeping, 16 waiting

Mem: 6312K Active, 19M Inact, 75M Wired, 12M Buf, 1849M Free
Swap:


  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
   11 root       -92    -     0K   256K WAIT    0   0:25  68.26% intr{irq260: re1}
   11 root       -92    -     0K   256K WAIT    0   0:03   8.25% intr{irq261: re2}

</code>
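
The per-IRQ interrupt rates tell the same story and can be inspected directly (a complementary check, not part of the original capture):
<code>
vmstat -i    # per-interrupt totals and rates; irq260 (re1) should dominate
</code>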

===== Firewalls impact =====

This test generates 2000 different flows by using 2000 different UDP destination ports (a possible generator command is sketched below).

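A pkt-gen invocation of this shape could produce that traffic pattern (a sketch assuming pkt-gen's address:port range syntax; the exact command used for this test is not recorded on this page):
<code>
# Sweep UDP destination ports 2000-3999 towards a single address
pkt-gen -U -i igb3 -f tx -n 80000000 -l 60 -d 198.19.10.1:2000-198.19.10.1:3999 -D 00:0d:b9:3c:dd:3e -s 198.18.10.1:2000 -w 4
</code>
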
The pf and ipfw configurations used are detailed in the previous lab: [[documentation:examples:forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580#firewall_impact|Forwarding performance lab of an IBM System x3550 M3 with Intel 82580]].

==== Graph ====

Scale information about Gigabit Ethernet (the arithmetic behind these two figures is sketched after the list):
  * 1.488 Mpps is the maximum packet-per-second (pps) rate, reached with the smallest packets (46-byte payload);
  * 81 Kpps is the minimum pps rate, reached with the biggest 1500-byte packets.
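
Both numbers follow from the Ethernet framing overhead (8-byte preamble and 12-byte inter-frame gap around every frame):
<code>
# Minimum-size frames: 46-byte payload = 64-byte frame = 84 bytes on the wire
echo $((1000000000 / 8 / 84))      # => 1488095 pps
# 1500-byte payload = 1518-byte frame = 1538 bytes on the wire
echo $((1000000000 / 8 / 1538))    # => 81274 pps
</code>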

{{bench.forwarding.and.firewalling.rate.on.pc.engines.apu.png|forwarding and firewalling rate with a PC Engines APU running FreeBSD 10.3}}

==== Ministat ====

All benchmarks were run 5 times, with a reboot between each run.

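The comparison below is a standard ministat(1) report; it was presumably produced by an invocation of this shape (the data-file names are hypothetical):
<code>
ministat forwarding ipfw-statefull pf-statefull
</code>
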
<code>
x forwarding
+ ipfw-statefull
* pf-statefull
+--------------------------------------------------------------------------+
|*                                                                         |
|*                            +                                           x|
|*                            +                                           x|
|*                            +                                           x|
|*                           ++                                           x|
|                                                                         A|
|                            |A                                            |
|A                                                                         |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        154144        154200        154167      154171.6     20.671236
+   5        113357        114637        114173      114152.6     486.93151
Difference at 95.0% confidence
        -40019 +/- 502.612
        -25.9574% +/- 0.326008%
        (Student's t, pooled s = 344.623)
*   5         88037         88385         88108         88169     144.98793
Difference at 95.0% confidence
        -66002.6 +/- 151.034
        -42.8111% +/- 0.0979651%
        (Student's t, pooled s = 103.559)
</code>

===== Netmap's pkt-gen performance =====

re(4) has [[http://info.iet.unipi.it/~luigi/netmap/|netmap]] support… so what about the rates with netmap's packet generator/receiver?

As a receiver (the sender is emitting at 1.48 Mpps):
<code>
[root@APU]~# pkt-gen -i re1 -f rx -w 4 -c 2
854.089137 main [1641] interface is re1
854.089501 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
854.089523 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
854.111967 main [1824] mapped 334980KB at 0x801dff000
Receiving from netmap:re1: 1 queues, 1 threads and 2 cpus.
854.112634 main [1904] Wait 4 secs for phy reset
858.123495 main [1906] Ready...
858.123756 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
859.124355 receiver_body [1189] waiting for initial packets, poll returns 0 0
(etc...)
862.129332 main_thread [1438] 579292 pps (580438 pkts in 1001978 usec)
863.131433 main_thread [1438] 579115 pps (580332 pkts in 1002101 usec)
894.184371 main_thread [1438] 577549 pps (578725 pkts in 1002036 usec)
895.185330 main_thread [1438] 577483 pps (578037 pkts in 1000959 usec)
896.191334 main_thread [1438] 580069 pps (583552 pkts in 1006004 usec)
897.193330 main_thread [1438] 578174 pps (579328 pkts in 1001996 usec)
898.195328 main_thread [1438] 581974 pps (583137 pkts in 1001998 usec)
899.196916 main_thread [1438] 579600 pps (580520 pkts in 1001588 usec)
900.198344 main_thread [1438] 578366 pps (579191 pkts in 1001427 usec)
901.200327 main_thread [1438] 579327 pps (580476 pkts in 1001984 usec)
902.202328 main_thread [1438] 581601 pps (582765 pkts in 1002001 usec)
903.204329 main_thread [1438] 577499 pps (578655 pkts in 1002001 usec)
</code>

Netmap improves the receive rate, but only to about 580 Kpps: it's strange that it doesn't reach the maximum Ethernet frame rate (1.48 Mpps) even with netmap.

As a packet generator:
<code>
[root@APU]~# pkt-gen -i re1 -f tx -w 4 -c 2 -n 80000000 -l 60 -d 2.1.3.1-2.1.3.20 -D 00:1b:21:d4:3f:2a -s 1.1.3.3-1.1.3.100 -c 2
759.415059 main [1641] interface is re1
759.415387 extract_ip_range [275] range is 1.1.3.3:0 to 1.1.3.100:0
759.415409 extract_ip_range [275] range is 2.1.3.1:0 to 2.1.3.20:0
759.922110 main [1824] mapped 334980KB at 0x801dff000
Sending on netmap:re1: 1 queues, 1 threads and 2 cpus.
1.1.3.3 -> 2.1.3.1 (00:00:00:00:00:00 -> 00:1b:21:d4:3f:2a)
759.922737 main [1880] --- SPECIAL OPTIONS: copy
759.922750 main [1902] Sending 512 packets every  0.000000000 s
759.922763 main [1904] Wait 4 secs for phy reset
763.923715 main [1906] Ready...
763.924310 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
763.924929 sender_body [1016] start
764.926557 main_thread [1438] 407993 pps (408672 pkts in 1001665 usec)
765.928548 main_thread [1438] 408091 pps (408904 pkts in 1001991 usec)
766.929550 main_thread [1438] 407939 pps (408348 pkts in 1001002 usec)
767.931548 main_thread [1438] 407808 pps (408623 pkts in 1001998 usec)
768.933359 main_thread [1438] 407880 pps (408619 pkts in 1001811 usec)
769.934548 main_thread [1438] 408138 pps (408623 pkts in 1001189 usec)
770.936548 main_thread [1438] 407825 pps (408641 pkts in 1002000 usec)
(etc...)
792.976553 main_thread [1438] 407872 pps (408690 pkts in 1002005 usec)
793.978549 main_thread [1438] 408184 pps (408999 pkts in 1001996 usec)
794.980547 main_thread [1438] 408201 pps (409017 pkts in 1001998 usec)
795.982552 main_thread [1438] 407892 pps (408710 pkts in 1002005 usec)
796.984546 main_thread [1438] 407984 pps (408798 pkts in 1001994 usec)
797.986546 main_thread [1438] 408069 pps (408885 pkts in 1002000 usec)
798.988442 main_thread [1438] 408080 pps (408854 pkts in 1001896 usec)
799.989548 main_thread [1438] 407815 pps (408266 pkts in 1001106 usec)
^C800.990686 main_thread [1438] 183685 pps (183894 pkts in 1001137 usec)
Sent 14897486 packets, 60 bytes each, in 36.52 seconds.
Speed: 407.98 Kpps Bandwidth: 195.83 Mbps (raw 274.16 Mbps)
</code>

Still not able to reach the maximum Ethernet throughput, even with netmap. A Realtek chipset limitation?