====== FreeBSD Forwarding Performance ======
{{description>Tips and information about FreeBSD forwarding performance}}
There are lots of guides about [[http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel|tuning FreeBSD TCP performance]] (where the FreeBSD host is an endpoint of the TCP session), but that is not the same as tuning forwarding performance (where the FreeBSD host does not need to read the TCP information of the packets being forwarded) or firewalling performance.

===== Concepts =====
==== How to bench a router ====

Benchmarking a router **is not** about measuring the maximum bandwidth crossing the router; it's about measuring the network throughput in packets per second (pps):
* [[http://www.ietf.org/rfc/rfc1242.txt|RFC 1242: Benchmarking Terminology for Network Interconnection Devices]]
* [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544: Benchmarking Methodology for Network Interconnect Devices]]
==== Definition ====

A clear definition of the relationship between bandwidth and frame rate is necessary:
* [[http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html|Bandwidth, Packets Per Second, and Other Network Performance Metrics]]: The relationship between bandwidth and packet forwarding rate
* [[http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring|LAN Ethernet Maximum Rates, Generation, Capturing & Monitoring]]: Gives another good explanation of Ethernet maximum rates
* [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]] (2008): Includes a hardware comparison between a "real" router and a PC
* [[https://wiki.fd.io/images/7/7b/Performance_Consideration_for_packet_processing_on_Intel_Architecture.pptx|Performance consideration for packet processing on Intel Architecture (ppt)]]

==== FreeBSD ====

Here are some benchmarks of FreeBSD network forwarding performance, conducted by the BSDRP team:
* AsiaBSDCon 2018 - Tuning FreeBSD for routing and firewalling ([[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Paper.pdf|paper]], [[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Slides.pdf|slides]] and [[https://www.youtube.com/watch?v=SLlzep0IxVY|video]])
* [[http://blog.cochard.me/2015/09/receipt-for-building-10mpps-freebsd.html|Recipe for building a 10Mpps FreeBSD based router]]
===== Bench lab =====

The [[bench lab]] should be set up to measure pps. For accurate results, [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)]] is a good reference. If switches are used, they also need proper configuration; refer to the [[documentation:examples:setting_up_a_forwarding_performance_benchmark_lab|BSDRP performance lab]] for examples.

===== Tuning =====
==== Literature ====

Here is a list of sources about optimizing and analyzing forwarding performance under FreeBSD.

How to benchmark or tune the network stack:
* [[http://wiki.freebsd.org/NetworkPerformanceTuning|FreeBSD Network Performance Tuning]]: What needs to be done to tune the networking stack
* [[http://www.slideshare.net/brendangregg/meetbsd2014-performance-analysis|Brendan Gregg's performance analysis presentation]]: The "must read" HOW TO
==== Multiple flows ====

Do not try to benchmark a router with only one flow (same source and destination address, and same source and destination port): you need to generate multiple flows, as in the pkt-gen sketch below.
Multi-queue NICs use a feature like the [[https://en.wikipedia.org/wiki/Toeplitz_Hash_Algorithm|Toeplitz Hash Algorithm]] to balance multiple flows across all cores, so generating only one flow will exercise only a single NIC queue and core.
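A minimal sketch with netmap's pkt-gen (the interface ''ix0'' and the RFC 2544-style address ranges are assumptions; adapt them to your bench lab). The address and port ranges are what make the generator emit many distinct flows:
<code>
# Transmit with ranges of source/destination addresses and ports,
# producing thousands of distinct flows for the NIC to hash
pkt-gen -f tx -i ix0 -s 198.18.0.1:2000-198.18.0.100 -d 198.19.0.1:2000-198.19.0.100
</code>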

During your load test, check that each queue is actually used, either with sysctl or with a [[https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/usr/local/bin/nic-queue-usage|Python script like this one]] that displays real-time queue usage. In a correctly balanced bench, the flows are shared evenly between the queues: on an 8-queue NIC in one of our tests, each queue carried about 340K packets per second.
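As a minimal sketch of the sysctl approach (per-queue OID names vary by driver; the ''queueN.rx_packets'' pattern below is an ix(4)-style assumption, adjust it to your NIC):
<code>
# Dump per-queue RX packet counters twice, one second apart;
# the deltas show how evenly flows are spread across the queues
sysctl -a | grep -E 'queue[0-9]+\.rx_packets'
sleep 1
sysctl -a | grep -E 'queue[0-9]+\.rx_packets'
</code>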
<note warning>
Beware of configurations that prevent multi-queueing, such as GRE, GIF or IPSec tunnels and PPPoE (all packets share the same source/destination address pair). If PPPoE is mandatory on your Gigabit Internet link, small hardware like a 4-core AMD GX (PC Engines APU2) will prevent you from reaching Gigabit speed.
</note>
==== Choosing hardware ====
=== CPU ===

Avoid NUMA architectures; prefer a CPU in a single package with the maximum number of cores.
If you are using NUMA, check that inbound and outbound NIC queues are correctly bound to cores in their local NUMA domain, to avoid unnecessary QPI crossing, as sketched below.
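A hedged sketch of such a binding (the IRQ number is hypothetical; read the real ones, with names like ''ix0:rxq0'', from ''vmstat -i''):
<code>
# List interrupt sources to find the NIC queue IRQs
vmstat -i
# Pin hypothetical IRQ 264 (ix0:rxq0) to CPU 0, assumed to sit in
# the NIC's local NUMA domain
cpuset -l 0 -x 264
</code>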

=== Network Interface Card ===

Mellanox and Chelsio, which combine good chipsets with excellent drivers, are an excellent choice.

Intel seems to have problems managing a large number of PPS (i.e., interrupts), and its team seems to lack FreeBSD developers.

Avoid the "embedded" NICs found in common Dell/HP servers, such as the following, which perform very poorly in terms of maximum packets per second (a quick way to identify the chipset follows the list):
* 10G Emulex OneConnect (be3)
* 10G Broadcom NetXtreme II BCM57810
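One quick way to check which chipset an onboard NIC really uses is pciconf(8) (a sketch):
<code>
# List PCI devices with vendor/device strings; the lines around the
# Ethernet entries reveal the controller and the driver attached to it
pciconf -lv | grep -B4 -i ethernet
</code>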

==== Choosing the right FreeBSD release ====

Before tuning, you need to use the right FreeBSD version, which means a recent one (main branch advised).

BSDRP follows the FreeBSD main branch to strike a balance between recent features and stability (yes, it is quite stable).

==== Disabling Hyper-Threading (on specific CPUs only) ====

By default, multi-queue NIC drivers create one queue per core.
However, on some older CPUs (like the Xeon E5-2650 V1), these logical cores do not help at all with managing the interrupts generated by a high-speed NIC.
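HT can be disabled with the ''machdep.hyperthreading_allowed'' loader tunable (a minimal sketch; a reboot is required):
<code>
# Prevent the kernel from scheduling on HTT logical CPUs (reboot required)
echo 'machdep.hyperthreading_allowed="0"' >> /boot/loader.conf
</code>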

Here is an example on a Xeon E5 2650 (8 cores, 16 threads) with a 10G Chelsio NIC: disabling HT improved forwarding performance by about 24% on this old CPU.

However, on a Xeon E5 2650L (10 cores, 20 threads), it is beneficial to keep HT enabled and configure the NIC to use all of its threads.

==== fastforwarding ====

=== FreeBSD 12.0 or newer ===

Nothing to do: fastforwarding was replaced by the tryforward code path, which is enabled by default.
==== Entropy harvest impact ====

Many tuning guides suggest disabling these entropy sources:
* kern.random.sys.harvest.ethernet
* kern.random.sys.harvest.interrupt
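On recent FreeBSD these per-source knobs were merged into the ''kern.random.harvest.mask'' bitmask. A hedged sketch (the value 351, often seen in tuning guides, clears the interrupt and Ethernet source bits; check random(4) on your release before using it):
<code>
# Stop harvesting entropy from interrupts and Ethernet traffic
sysctl kern.random.harvest.mask=351
# Make the setting persistent across reboots
echo 'kern.random.harvest.mask=351' >> /etc/sysctl.conf
</code>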

On FreeBSD 11.1, the impact on forwarding performance is noticeable:

{{documentation:technical_docs:entropy_source_impact.png|Impact of disabling some entropy sources on FreeBSD forwarding performance}}

==== NIC drivers tuning ====

You can display the most time-consuming kernel functions with pmcstat(8), using the unhalted-CPU-cycles event name for your architecture:
* AMD: ls_not_halted_cyc
* Intel: cpu_clk_unhalted.thread_p
* ARM: CPU_CYCLES
<code>
pmcstat -TS cpu_clk_unhalted.thread_p -w1
</code>
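If pmcstat reports that no PMC is available, the hwpmc(4) driver probably needs to be loaded first (a sketch, assuming it is built as a module, as in GENERIC):
<code>
# Load the performance-counter driver, then re-run pmcstat
kldload hwpmc
pmcstat -TS cpu_clk_unhalted.thread_p -w1
</code>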
| |