====== Forwarding performance lab of an HP ProLiant DL360p Gen8 with 10-Gigabit Chelsio T540-CR ======
The generator **MUST** generate lots of small IP flows (multiple source/destination addresses and/or ports), so that the traffic spreads across all the NIC queues.
Here is an example for generating multiple IPv4 flows at minimum frame size (60 bytes here):
<code>
pkt-gen -N -f tx -w 2 -i vcxl0 -n 1000000000 -l 60 -4 -p 2 -S 00:...
</code>
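
The source and destination parameters are not fully shown above. As a sketch only, a fuller invocation could look like the following; the MAC addresses and IP ranges are placeholders, and it is the address/port ranges that make pkt-gen emit many distinct flows:
<code>
# Placeholders: -S/-D are the generator's and router's MACs, -s/-d are IP:port ranges
pkt-gen -N -f tx -w 2 -i vcxl0 -n 1000000000 -l 60 -4 -p 2 \
  -S 00:07:43:aa:aa:aa -D 00:07:43:bb:bb:bb \
  -s 198.18.10.1:2000-198.18.10.20 \
  -d 198.19.10.1:2000-198.19.10.100
</code>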
And the same with IPv6 flows (minimum frame size of 62 here):
<code>
pkt-gen -f tx -i vcxl0 -n 1000000000 -l 62 -6 -d "..."
</code>
The receiver will use this command:
<code>
pkt-gen -i vcxl1 -f rx -w 2
</code>
==== Basic configuration ====

=== Disabling Ethernet flow-control ===

First, disable Ethernet flow-control on both servers. The Chelsio T540 ports are configured like this:
<code>
echo "..." >> /etc/sysctl.conf
echo "..." >> /etc/sysctl.conf
service sysctl reload
</code>
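
The exact sysctl names are not shown here; assuming the per-port pause_settings OIDs documented in cxgbe(4), the intent is roughly:
<code>
# Assumed OIDs from cxgbe(4): value 0 disables both RX and TX PAUSE frames
echo "dev.cxl.0.pause_settings=0" >> /etc/sysctl.conf
echo "dev.cxl.1.pause_settings=0" >> /etc/sysctl.conf
service sysctl reload
</code>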

=== Disabling LRO and TSO ===

A router must not use LRO and TSO (see the performance page in the documentation); disable both on the forwarding interfaces.
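
One way to do this from the shell (a sketch; adapt the interface names to your setup):
<code>
# Disable TCP segmentation offload and large receive offload on both ports
ifconfig cxl0 -tso4 -tso6 -lro
ifconfig cxl1 -tso4 -tso6 -lro
</code>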

==== IP Configuration ====

/etc/rc.conf:
<code>
# IPv4 router
gateway_enable="YES"
ifconfig_cxl0="..."
ifconfig_cxl1="..."
static_routes="generator receiver"
route_generator="..."
route_receiver="..."
static_arp_pairs="generator receiver"
static_arp_generator="..."
static_arp_receiver="..."

# IPv6 router
ipv6_gateway_enable="YES"
ipv6_activate_all_interfaces="YES"
ifconfig_cxl0_ipv6="..."
ifconfig_cxl1_ipv6="..."
ipv6_static_routes="generator receiver"
ipv6_route_generator="..."
ipv6_route_receiver="..."
static_ndp_pairs="generator receiver"
static_ndp_generator="..."
static_ndp_receiver="..."
</code>
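
The concrete values are not shown above. As an illustration only, with assumed addressing (198.18.10.0/24 toward the generator, 198.19.10.0/24 toward the receiver) and placeholder MAC addresses, the IPv4 half could look like:
<code>
# Assumed addresses and MACs, for illustration only
gateway_enable="YES"
ifconfig_cxl0="inet 198.18.10.3/24"
ifconfig_cxl1="inet 198.19.10.3/24"
static_routes="generator receiver"
route_generator="-net 198.18.0.0/16 198.18.10.1"
route_receiver="-net 198.19.0.0/16 198.19.10.1"
static_arp_pairs="generator receiver"
static_arp_generator="198.18.10.1 00:07:43:aa:aa:aa"
static_arp_receiver="198.19.10.1 00:07:43:cc:cc:cc"
</code>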

===== Routing performance with default values =====

==== Default forwarding performance in front of a line-rate generator ====

First, push packets at line rate from the generator and watch what the router manages to forward:

<code>
[root@hp]~# netstat -iw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
...
</code>

The traffic is correctly load-balanced between the NIC queues (and therefore between the CPUs):

<code>
[root@hp]~# vmstat -i | grep t5nex0
irq291: t5nex0:...
irq292: t5nex0:...
...
irq312: t5nex0:...

[root@hp]~# top -nCHSIzs1
last pid:  2032;  load averages: ...
205 processes: 12 running, 106 sleeping, 87 waiting

Mem: 13M Active, 728K Inact, 504M Wired, 23M Buf, 62G Free
Swap:

  PID USERNAME ...
   11 root     ...
   11 root     ...
...
</code>

Where does the system spend its time?

<code>
[root@hp]~# kldload hwpmc
[root@hp]~# pmcstat -TS CPU_CLK_UNHALTED_CORE -w 1
PMC: [CPU_CLK_UNHALTED_CORE] Samples: 320832 (100.0%) , 0 unresolved

%SAMP IMAGE      FUNCTION
 21.4 kernel     ...
 15.8 kernel     ...
  8.8 kernel     ...
  6.3 kernel     ...
  4.1 kernel     ...
  3.6 kernel     ...
  2.6 kernel     ...
  2.0 kernel     ...
  2.0 kernel     ...
  2.0 libc.so.7  ...
  1.7 kernel     ...
  1.6 kernel     ...
  1.5 kernel     ...
  1.4 kernel     ...
  1.3 kernel     ...
  1.2 kernel     ...
  1.2 kernel     ...
  1.1 kernel     ...
  1.1 kernel     ...
  1.0 kernel     ...
  1.0 kernel     ...
</code>

There is some lock contention around fib4_lookup_nh_basic().

==== Equilibrium throughput ====

The previous measurement, taken while the generator pushes full line rate, is effectively an "under DoS" figure. The equilibrium method instead adjusts the offered load by dichotomy until the offered load and the measured forwarding rate converge.

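To illustrate the method (this is not the actual benchmark script; measure_forwarding_rate is a hypothetical helper that would drive pkt-gen at the given load and read the receiver's counters), the search loop behaves roughly like this:
<code>
#!/bin/sh
# Sketch only: dichotomy search for the equilibrium throughput (values in Kpps)
link_rate=10000
load=$(( link_rate / 2 ))
step=$(( link_rate / 4 ))
while [ ${step} -gt 0 ]; do
    fwd=$( measure_forwarding_rate ${load} )   # hypothetical helper
    if [ ${fwd} -ge ${load} ]; then
        load=$(( load + step ))                # router kept up: offer more
    else
        load=$(( load - step ))                # drops observed: offer less
    fi
    step=$(( step / 2 ))                       # halve the step each iteration
done
echo "Estimated equilibrium throughput: ${fwd} Kpps"
</code>
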
=== IPv4 ===

From the packet generator, start the IPv4 bench:
<code>
[root@pkt-gen]~# ...
Benchmark tool using equilibrium throughput method
- Benchmark mode: Throughput (pps) for Router
- UDP load = 18B, IPv4 packet size=46B, Ethernet frame size=60B
- Link rate = 10000 Kpps
- Tolerance = 0.01
Iteration 1
- Offering load = 5000 Kpps
- Step = 2500 Kpps
- Measured forwarding rate = 5000 Kpps
Iteration 2
- Offering load = 7500 Kpps
- Step = 2500 Kpps
- Trend = increasing
- Measured forwarding rate = 5440 Kpps
Iteration 3
- Offering load = 6250 Kpps
- Step = 1250 Kpps
- Trend = decreasing
- Measured forwarding rate = 5437 Kpps
Iteration 4
- Offering load = 5625 Kpps
- Step = 625 Kpps
- Trend = decreasing
- Measured forwarding rate = 5442 Kpps
Iteration 5
- Offering load = 5313 Kpps
- Step = 312 Kpps
- Trend = decreasing
- Measured forwarding rate = 5313 Kpps
Iteration 6
- Offering load = 5469 Kpps
- Step = 156 Kpps
- Trend = increasing
- Measured forwarding rate = 5434 Kpps
Iteration 7
- Offering load = 5391 Kpps
- Step = 78 Kpps
- Trend = decreasing
- Measured forwarding rate = 5390 Kpps
Estimated Equilibrium Ethernet throughput = 5390 Kpps (maximum value seen: 5442 Kpps)
</code>

=> About the same performance as the "under DoS" bench (only by running this same bench multiple times can we obtain statistically valid results).

=== IPv6 ===

From the packet generator, start the IPv6 bench:
<code>
[root@pkt-gen]~# ...
Benchmark tool using equilibrium throughput method
- Benchmark mode: Throughput (pps) for Router
- UDP load = 0B, IPv6 packet size=48B, Ethernet frame size=62B
- Link rate = 10000 Kpps
- Tolerance = 0.01
Iteration 1
- Offering load = 5000 Kpps
- Step = 2500 Kpps
- Measured forwarding rate = 2681 Kpps
Iteration 2
- Offering load = 2500 Kpps
- Step = 2500 Kpps
- Trend = decreasing
- Measured forwarding rate = 2499 Kpps
Iteration 3
- Offering load = 3750 Kpps
- Step = 1250 Kpps
- Trend = increasing
- Measured forwarding rate = 2682 Kpps
Iteration 4
- Offering load = 3125 Kpps
- Step = 625 Kpps
- Trend = decreasing
- Measured forwarding rate = 2681 Kpps
Iteration 5
- Offering load = 2813 Kpps
- Step = 312 Kpps
- Trend = decreasing
- Measured forwarding rate = 2681 Kpps
Iteration 6
- Offering load = 2657 Kpps
- Step = 156 Kpps
- Trend = decreasing
- Measured forwarding rate = 2657 Kpps
Iteration 7
- Offering load = 2735 Kpps
- Step = 78 Kpps
- Trend = increasing
- Measured forwarding rate = 2680 Kpps
Iteration 8
- Offering load = 2696 Kpps
- Step = 39 Kpps
- Trend = decreasing
- Measured forwarding rate = 2679 Kpps
Estimated Equilibrium Ethernet throughput = 2679 Kpps (maximum value seen: 2682 Kpps)
</code>

From about 5.4 Mpps in IPv4, the forwarding rate drops to about 2.67 Mpps in IPv6 (there is no fastforwarding path for IPv6).

==== Firewall impact ====

One rule was loaded for each firewall tested, in order to measure the cost of simply enabling it:

{{documentation:...}}
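
The exact rules are not shown in this revision; as an illustration only, a minimal one-rule setup on FreeBSD would be something like:
<code>
# ipfw: a single accept-all rule (ipfw denies everything by default once loaded)
kldload ipfw
ipfw add 100 allow ip from any to any

# pf: a single pass-all rule
echo "pass all" > /etc/pf.conf
service pf onestart
</code>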

===== Tuning =====

=== BIOS ===

Disable Hyperthreading:

{{documentation:...}}

=== Chelsio drivers ===

== Reducing NIC queues (FreeBSD 11.0 or older only) ==

By default the queue counts are:
  * TX: 16, or ncpu if ncpu < 16
  * RX: 8, or ncpu if ncpu < 8

On this 8-core machine they are therefore both equal to 8:
<code>
[root@hp]~# sysctl dev.cxl.3.nrxq
dev.cxl.3.nrxq: 8
[root@hp]~# sysctl dev.cxl.3.ntxq
dev.cxl.3.ntxq: 8
</code>

Here is how to change the number of queues to 4:
<code>
mount -uw /
echo '...' >> /boot/loader.conf.local
echo '...' >> /boot/loader.conf.local
mount -ur /
reboot
</code>
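
The exact tunable names are not shown here; assuming the cxgbe(4) loader tunables of that era, the two echo lines would be something like:
<code>
# Assumed tunable names; they only take effect at the next boot
echo 'hw.cxgbe.nrxq10g="4"' >> /boot/loader.conf.local
echo 'hw.cxgbe.ntxq10g="4"' >> /boot/loader.conf.local
</code>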

{{:...}}

<note warning>
On an 8-core machine, we had to reduce the number of NIC queues to 4 on FreeBSD 11.0 and older.
</note>

=== Descriptor ring size ===

The size, in number of entries, of the descriptor ring used for each RX and TX queue is 1024 by default:

<code>
[root@hp]~# sysctl dev.cxl.3.qsize_rxq
dev.cxl.3.qsize_rxq: 1024
[root@hp]~# sysctl dev.cxl.2.qsize_rxq
dev.cxl.2.qsize_rxq: 1024
</code>

Let's change it to different values (1024, 2048 and 4096) and measure the impact:

<code>
mount -uw /
echo '...' >> /boot/loader.conf.local
echo '...' >> /boot/loader.conf.local
mount -ur /
reboot
</code>
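
Again the tunable names are not shown; assuming the cxgbe(4) queue-size tunables, a 2048-entry run would be configured like this (repeat with 1024 and 4096):
<code>
# Assumed tunable names from cxgbe(4)
echo 'hw.cxgbe.qsize_rxq="2048"' >> /boot/loader.conf.local
echo 'hw.cxgbe.qsize_txq="2048"' >> /boot/loader.conf.local
</code>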

{{:...}}

Ministat:

<code>
x pps.qsize1024
+ pps.qsize2048
* pps.qsize4096
+--------------------------------------------------------------------------+
| ...                                                                      |
+--------------------------------------------------------------------------+
    N ...
x   5 ...
+   5 ...
No difference proven at 95.0% confidence
*   5 ...
No difference proven at 95.0% confidence
</code>

Reading the graph suggests a slightly better behaviour with a qsize of 2048, but ministat, applied to the 5 bench runs, says the difference is not statistically proven.
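
For reference, the comparison above can be reproduced with ministat(1) on the collected samples (one pps value per line in each file; the file names are those shown in the output above):
<code>
ministat pps.qsize1024 pps.qsize2048 pps.qsize4096
</code>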

=== Results ===

[[https://...]]

{{https://...}}