====== Forwarding performance lab of a PC Engines APU ======
{{description>Forwarding performance lab of a dual core AMD G series T40E APU (1 GHz) with 3 Realtek RTL8111E Gigabit}}
===== Hardware detail =====

This lab will test a [[http://www.pcengines.ch/apu.htm|PC Engines APU 1]] ([[PC Engines APU|dmesg]]):
  * Dual core [[http://www.amd.com/us/Documents/49282_G-Series_platform_brief.pdf|AMD G-T40E Processor]] (1 GHz)
  * 3 Realtek RTL8111E Gigabit Ethernet ports
  * 2 GB of RAM

[[documentation:examples:Forwarding performance lab of a PC Engines APU2|Forwarding performance of APU version 2 is here.]]
===== Lab set-up =====

For more information about the full setup of this lab (switch configuration, etc.), see [[documentation:examples:Setting up a forwarding performance benchmark lab]].
==== Diagram ====

<code>
+------------------------------------------+      +-----------------------+
|            Device under Test             |      |       Packet gen      |
|                                          |      |                       |
|                     re1: 198.18.0.207/24 |<=====| igb2: 198.18.0.203/24 |
|                           2001:2::207/64 |      |        2001:2::203/64 |
|                        00:0d:b9:3c:dd:3d |      |     00:1b:21:c4:95:7a |
|                                          |      |                       |
|                     re2: 198.19.0.207/24 |=====>| igb3: 198.19.0.203/24 |
|                      2001:2:0:8000::8/64 |      | 2001:2:0:8000::203/64 |
|                        00:0d:b9:3c:dd:3e |      |     00:1b:21:c4:95:7b |
|                                          |      |                       |
|              static routes               |      |                       |
| 198.19.0.0/16      => 198.19.0.203       |      +-----------------------+
| 198.18.0.0/16      => 198.18.0.203       |
| 2001:2::/49        => 2001:2::203        |
| 2001:2:0:8000::/49 => 2001:2:0:8000::203 |
|                                          |
|            static arp and ndp            |
| 198.18.0.203        => 00:1b:21:c4:95:7a |
| 2001:2::203                              |
|                                          |
| 198.19.0.203        => 00:1b:21:c4:95:7b |
| 2001:2:0:8000::203                       |
|                                          |
+------------------------------------------+
</code>

The generator **MUST** generate lots of IP flows (multiple source/destination IP addresses and/or UDP source/destination ports) at minimum packet size (to generate the maximum packet rate), with one of these commands:

Multiple source/destination IP addresses (don't forget to specify the port range, to avoid using port number 0, which is filtered by pf):
<code>
pkt-gen -U -i igb3 -f tx -n 80000000 -l 60 -d 198.19.10.1:2000-198.19.10.20 -D 00:0d:b9:3c:dd:3e -s 198.18.10.1:2000-198.18.10.100 -w 4
</code>

The receiver will use this command:
<code>
pkt-gen -i igb2 -f rx -w 4
</code>
===== Basic configuration =====

==== Disabling Ethernet flow-control ====

The re(4) driver does not seem to support Ethernet flow control, and the switch confirms this behavior:
<code>
switch#sh int Gi1/0/16 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/16   Unsupp.  Unsupp.  off      off               0
switch#sh int Gi1/0/17 flowcontrol
Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi1/0/17   Unsupp.  Unsupp.  off      off               0
</code>
==== Static routes and ARP entries ====

Configure IP addresses, static routes and static ARP/NDP entries.
A router [[Documentation:Technical docs:Performance|should not use LRO and TSO]]: BSDRP disables them by default using an RC script (disablelrotso_enable="YES" in /etc/rc.conf.misc), but the re(4) driver does not support these features anyway.

/etc/rc.conf:
<code>
# IPv4 router
gateway_enable="YES"
ifconfig_re1="inet 198.18.0.207/24"
ifconfig_re2="inet 198.19.0.207/24"
static_routes="generator receiver"
route_generator="-net 198.18.0.0/16 198.18.0.203"
route_receiver="-net 198.19.0.0/16 198.19.0.203"
static_arp_pairs="receiver generator"
static_arp_generator="198.18.0.203 00:1b:21:c4:95:7a"
static_arp_receiver="198.19.0.203 00:1b:21:c4:95:7b"

# IPv6 router
ipv6_gateway_enable="YES"
ipv6_activate_all_interfaces="YES"
ipv6_static_routes="generator receiver"
ipv6_route_generator="2001:2:: -prefixlen 49 2001:2::203"
ipv6_route_receiver="2001:2:0:8000:: -prefixlen 49 2001:2:0:8000::203"
ifconfig_re1_ipv6="inet6 2001:2::207 prefixlen 64"
ifconfig_re2_ipv6="inet6 2001:2:0:8000::207 prefixlen 64"
static_ndp_pairs="receiver generator"
static_ndp_generator="2001:2::203 00:1b:21:c4:95:7a"
static_ndp_receiver="2001:2:0:8000::203 00:1b:21:c4:95:7b"
</code>
 +
===== Default forwarding rate =====

For the first test, we start one packet generator at Gigabit line rate (1.488 Mpps) and observe:
  * The APU is still responsive during this test (thanks to the dual core);
  * About 154 Kpps are accepted by the re(4) Ethernet interface.

<code>
[root@BSDRP]~# netstat -iw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    154273            9256386     154241        9256550     0
    154081            9244866     154081        9244982     0
    154113            9246786     154113        9246902     0
    154151            9249066     154177        9249182     0
    154139            9248346     154113        9248462     0
    154113            9246786     154113        9246902     0
    154145            9248706     154145        9248822     0
    154193            9251586     154209        9252662     0
    154135            9248106     154145        9247322     0
    154139            9248346     154113        9248402     0
    154151            9249066     154177        9249242     0
    154145            9248706     154145        9248822     0
    154147            9248826     154145        9248882     0
    154169            9250146     154177        9250262     0
    154145            9248706     154113        9248822     0
</code>

The forwarding rate is not very high: the Realtek NIC is not very fast and does not support multiple queues, and this is only a 1 GHz CPU. We also notice that the input error counters of re(4) are not updated: a re(4) driver bug?
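To put that figure in perspective, a rough per-packet CPU budget can be derived from the clock rate (a back-of-the-envelope sketch; it assumes a single fully busy 1 GHz core, while top below shows the interrupt thread at about 68%, so the real budget is somewhat smaller):

```python
cpu_hz = 1_000_000_000   # one AMD G-T40E core at 1 GHz
pps = 154_000            # observed forwarding rate

# Upper bound on CPU cycles available per forwarded packet
cycles_per_packet = cpu_hz / pps
print(round(cycles_per_packet))  # roughly 6.5K cycles per packet
```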

We can force the driver to dump its statistics with this command:
<code>
[root@BSDRP]~# sysctl dev.re.1.stats=1
dev.re.1.stats: -1 -> -1
[root@BSDRP]~# dmesg
(etc...)
re1 statistics:
Tx frames : 6
Rx frames : 16394206
Tx errors : 0
Rx errors : 0
Rx missed frames : 16421
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 16394204
Rx broadcast frames : 2
Rx multicast frames : 0
Tx aborts : 0
Tx underruns : 0
</code>

But the Rx missed frames counter is still not accurate.

About the system load during this test:

<code>
[root@BSDRP]/# top -nCHSIzs1
last pid:  4067;  load averages:  0.49,  0.16,  0.04  up 0+01:04:10    18:32:04
86 processes:  3 running, 67 sleeping, 16 waiting

Mem: 6312K Active, 19M Inact, 75M Wired, 12M Buf, 1849M Free
Swap:

  PID USERNAME   PRI NICE   SIZE    RES STATE     TIME     CPU COMMAND
   11 root       -92    -     0K   256K WAIT    0   0:25  68.26% intr{irq260: re1}
   11 root       -92    -     0K   256K WAIT    0   0:03   8.25% intr{irq261: re2}
</code>

===== Firewall impact =====

This test generates 2000 different flows by using 2000 different UDP destination ports.

The pf and ipfw configurations used are detailed in the previous [[documentation:examples:forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580#firewall_impact|Forwarding performance lab of an IBM System x3550 M3 with Intel 82580]].

==== Graph ====

Some scale information about Gigabit Ethernet:
  * 1.488 Mpps is the maximum packet-per-second (pps) rate, reached with the smallest packets (46-byte payload, 64-byte frame);
  * 81 Kpps is the line-rate pps with the biggest packets (1500-byte payload).
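Both reference rates can be recomputed from the Gigabit Ethernet framing overhead (a minimal sketch; it uses the standard 8 bytes of preamble/SFD plus 12 bytes of inter-frame gap per frame):

```python
LINE_RATE = 10**9   # Gigabit Ethernet, bits per second
OVERHEAD = 8 + 12   # preamble/SFD + inter-frame gap, bytes per frame

def max_pps(frame_size):
    """Line-rate packet rate for a given on-wire frame size (incl. CRC)."""
    return LINE_RATE // ((frame_size + OVERHEAD) * 8)

print(max_pps(64))    # smallest frame (46-byte payload) -> 1488095 pps
print(max_pps(1518))  # biggest frame (1500-byte payload) -> 81274 pps
```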

{{bench.forwarding.and.firewalling.rate.on.pc.engines.apu.png|forwarding and firewalling rate with a PC Engines APU running FreeBSD 10.3}}

==== Ministat ====

All benchmarks were run 5 times, with a reboot between each run.

<code>
x forwarding
+ ipfw-statefull
* pf-statefull
+--------------------------------------------------------------------------+
|*                                                                         |
|*                            +                                           x|
|*                            +                                           x|
|*                            +                                           x|
|*                           ++                                           x|
|                                                                         A|
|                            |A                                            |
|A                                                                         |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x          154144        154200        154167      154171.6     20.671236
+          113357        114637        114173      114152.6     486.93151
Difference at 95.0% confidence
        -40019 +/- 502.612
        -25.9574% +/- 0.326008%
        (Student's t, pooled s = 344.623)
*           88037         88385         88108         88169     144.98793
Difference at 95.0% confidence
        -66002.6 +/- 151.034
        -42.8111% +/- 0.0979651%
        (Student's t, pooled s = 103.559)
</code>
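The relative differences reported by ministat can be cross-checked directly from the Avg column above (a quick sketch using the three averages from the output):

```python
forwarding = 154171.6   # x: plain forwarding, Avg
ipfw = 114152.6         # +: ipfw stateful, Avg
pf = 88169.0            # *: pf stateful, Avg

# Percentage drop relative to plain forwarding
for name, avg in (("ipfw", ipfw), ("pf", pf)):
    drop = (avg - forwarding) / forwarding * 100
    print(f"{name}: {drop:.2f}% vs plain forwarding")
```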

===== Netmap's pkt-gen performance =====

re(4) has [[http://info.iet.unipi.it/~luigi/netmap/|netmap]] support… so what about the rate with netmap's packet generator/receiver?

As a receiver (the sender is emitting at 1.488 Mpps):
<code>
[root@APU]~# pkt-gen -i re1 -f rx -w 4 -c 2
854.089137 main [1641] interface is re1
854.089501 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
854.089523 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
854.111967 main [1824] mapped 334980KB at 0x801dff000
Receiving from netmap:re1: 1 queues, 1 threads and 2 cpus.
854.112634 main [1904] Wait 4 secs for phy reset
858.123495 main [1906] Ready...
858.123756 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
859.124355 receiver_body [1189] waiting for initial packets, poll returns 0 0
(etc...)
862.129332 main_thread [1438] 579292 pps (580438 pkts in 1001978 usec)
863.131433 main_thread [1438] 579115 pps (580332 pkts in 1002101 usec)
894.184371 main_thread [1438] 577549 pps (578725 pkts in 1002036 usec)
895.185330 main_thread [1438] 577483 pps (578037 pkts in 1000959 usec)
896.191334 main_thread [1438] 580069 pps (583552 pkts in 1006004 usec)
897.193330 main_thread [1438] 578174 pps (579328 pkts in 1001996 usec)
898.195328 main_thread [1438] 581974 pps (583137 pkts in 1001998 usec)
899.196916 main_thread [1438] 579600 pps (580520 pkts in 1001588 usec)
900.198344 main_thread [1438] 578366 pps (579191 pkts in 1001427 usec)
901.200327 main_thread [1438] 579327 pps (580476 pkts in 1001984 usec)
902.202328 main_thread [1438] 581601 pps (582765 pkts in 1002001 usec)
903.204329 main_thread [1438] 577499 pps (578655 pkts in 1002001 usec)
</code>

Using netmap improves the receive rate, but only to about 580 Kpps: it is strange that it does not reach the maximum Ethernet frame rate (1.488 Mpps) even with netmap.
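Put differently, with the generator sending at line rate, the netmap receiver misses most of the offered frames (a quick estimate from the numbers above):

```python
offered = 1_488_095   # generator rate, pps (Gigabit line rate, 64-byte frames)
received = 580_000    # approximate netmap receive rate on re1, pps

# Fraction of offered packets never seen by the receiver
loss = (offered - received) / offered * 100
print(f"{loss:.1f}% of offered packets are missed")
```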

As a packet generator:
<code>
[root@APU]~# pkt-gen -i re1 -f tx -w 4 -c 2 -n 80000000 -l 60 -d 2.1.3.1-2.1.3.20 -D 00:1b:21:d4:3f:2a -s 1.1.3.3-1.1.3.100 -c 2
759.415059 main [1641] interface is re1
759.415387 extract_ip_range [275] range is 1.1.3.3:0 to 1.1.3.100:0
759.415409 extract_ip_range [275] range is 2.1.3.1:0 to 2.1.3.20:0
759.922110 main [1824] mapped 334980KB at 0x801dff000
Sending on netmap:re1: 1 queues, 1 threads and 2 cpus.
1.1.3.3 -> 2.1.3.1 (00:00:00:00:00:00 -> 00:1b:21:d4:3f:2a)
759.922737 main [1880] --- SPECIAL OPTIONS: copy
759.922750 main [1902] Sending 512 packets every  0.000000000 s
759.922763 main [1904] Wait 4 secs for phy reset
763.923715 main [1906] Ready...
763.924310 nm_open [457] overriding ifname re1 ringid 0x0 flags 0x1
763.924929 sender_body [1016] start
764.926557 main_thread [1438] 407993 pps (408672 pkts in 1001665 usec)
765.928548 main_thread [1438] 408091 pps (408904 pkts in 1001991 usec)
766.929550 main_thread [1438] 407939 pps (408348 pkts in 1001002 usec)
767.931548 main_thread [1438] 407808 pps (408623 pkts in 1001998 usec)
768.933359 main_thread [1438] 407880 pps (408619 pkts in 1001811 usec)
769.934548 main_thread [1438] 408138 pps (408623 pkts in 1001189 usec)
770.936548 main_thread [1438] 407825 pps (408641 pkts in 1002000 usec)
(etc...)
792.976553 main_thread [1438] 407872 pps (408690 pkts in 1002005 usec)
793.978549 main_thread [1438] 408184 pps (408999 pkts in 1001996 usec)
794.980547 main_thread [1438] 408201 pps (409017 pkts in 1001998 usec)
795.982552 main_thread [1438] 407892 pps (408710 pkts in 1002005 usec)
796.984546 main_thread [1438] 407984 pps (408798 pkts in 1001994 usec)
797.986546 main_thread [1438] 408069 pps (408885 pkts in 1002000 usec)
798.988442 main_thread [1438] 408080 pps (408854 pkts in 1001896 usec)
799.989548 main_thread [1438] 407815 pps (408266 pkts in 1001106 usec)
^C800.990686 main_thread [1438] 183685 pps (183894 pkts in 1001137 usec)
Sent 14897486 packets, 60 bytes each, in 36.52 seconds.
Speed: 407.98 Kpps Bandwidth: 195.83 Mbps (raw 274.16 Mbps)
</code>
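pkt-gen's summary line is consistent with simple arithmetic on the frame size (a sketch; it assumes pkt-gen's -l 60 length excludes the 4-byte CRC, and that the "raw" figure adds CRC, preamble/SFD and inter-frame gap):

```python
pps = 407_980        # reported transmit rate, packets per second
frame = 60           # pkt-gen -l 60: frame length without CRC

# Data bandwidth counts only the 60 transmitted bytes per packet
data_mbps = pps * frame * 8 / 1e6
# Raw bandwidth adds CRC (4), preamble/SFD (8) and inter-frame gap (12)
raw_mbps = pps * (frame + 4 + 8 + 12) * 8 / 1e6

print(f"{data_mbps:.2f} Mbps (raw {raw_mbps:.2f} Mbps)")
```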

Even with netmap, we are still unable to reach the maximum Ethernet packet rate. Is this a Realtek chipset limitation?
documentation/examples/forwarding_performance_lab_of_a_pc_engines_apu.txt · Last modified: 2017/01/28 14:18 (external edit)