documentation:technical_docs:performance

Differences

This shows you the differences between two versions of the page.

documentation:technical_docs:performance [2025/08/15 15:30] – [Choosing good FreeBSD release] olivier
documentation:technical_docs:performance [2025/08/15 15:52] (current) olivier
Line 1: Line 1:
-====== FreeBSD forwarding Performance ======
+====== FreeBSD Forwarding Performance ======
 {{description>Tips and information about FreeBSD forwarding performance}} {{description>Tips and information about FreeBSD forwarding performance}}
-There are lot's of guide about [[http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel|tuning FreeBSD TCP performance]] (where the FreeBSD host is an end-point of the TCP session), but it's not the same that tunig forwarding performance (where the FreeBSD host don't have to read the TCP information of the packet being forwarded) or firewalling performance.
+There are lots of guides about [[http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel|tuning FreeBSD TCP performance]] (where the FreeBSD host is an endpoint of the TCP session), but it's not the same as tuning forwarding (where the FreeBSD host does not need to read the TCP information of the packets being forwarded) or firewalling performance.
  
 ===== Concepts ===== ===== Concepts =====
Line 7: Line 7:
 ==== How to bench a router ==== ==== How to bench a router ====
  
-Benchmarking a router **is not** measuring the maximum bandwidth crossing the router, but it's about measuring the network throughput (in packets-per-second unit):
+Benchmarking a router **is not** about measuring the maximum bandwidth crossing the router; it's about measuring the network throughput in packets-per-second (pps):
   * [[http://www.ietf.org/rfc/rfc1242.txt|RFC1242: Benchmarking Terminology for Network Interconnection Devices]]   * [[http://www.ietf.org/rfc/rfc1242.txt|RFC1242: Benchmarking Terminology for Network Interconnection Devices]]
   * [[http://www.ietf.org/rfc/rfc2544.txt|RFC2544: Benchmarking Methodology for Network Interconnect Devices]]   * [[http://www.ietf.org/rfc/rfc2544.txt|RFC2544: Benchmarking Methodology for Network Interconnect Devices]]
Line 15: Line 15:
 ==== Definition ==== ==== Definition ====
  
-Clear definition regarding some relations between the bandwidth and frame rate is mandatory:
+A clear definition of the relationship between bandwidth and frame rate is necessary:
   * [[http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html|Bandwidth, Packets Per Second, and Other Network Performance Metrics]]: The relationship of bandwidth and packet forwarding rate   * [[http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html|Bandwidth, Packets Per Second, and Other Network Performance Metrics]]: The relationship of bandwidth and packet forwarding rate
   * [[http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring|LAN Ethernet Maximum Rates, Generation, Capturing & Monitoring]] : Give another good explanation of the Ethernet maximum rates   * [[http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring|LAN Ethernet Maximum Rates, Generation, Capturing & Monitoring]] : Give another good explanation of the Ethernet maximum rates
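
As a rough illustration of the bandwidth versus frame rate relationship, the theoretical maximum Ethernet frame rate can be derived from the link speed and the frame size. This is a minimal Python sketch, not taken from the linked pages: the function name is illustrative, and it assumes the standard 20 bytes of per-frame overhead on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap).

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def max_frame_rate(link_bps, frame_bytes):
    """Return the theoretical maximum frames per second on an Ethernet link.

    frame_bytes is the full frame size including the 4-byte FCS
    (64 bytes is the Ethernet minimum, 1518 the untagged maximum).
    """
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for link_name, link_bps in (("1 Gb/s", 10**9), ("10 Gb/s", 10**10)):
        for frame_bytes in (64, 512, 1518):
            pps = max_frame_rate(link_bps, frame_bytes)
            print(f"{link_name}, {frame_bytes}-byte frames: {pps:,.0f} pps")
```

With 64-byte frames this gives about 1.488 Mpps at 1 Gb/s and 14.88 Mpps at 10 Gb/s, which is why small-packet tests are the demanding case for a router.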
Line 28: Line 28:
   * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]] (2008): Include an hardware comparison between a "real" router and a PC.   * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]] (2008): Include an hardware comparison between a "real" router and a PC.
   * [[https://wiki.fd.io/images/7/7b/Performance_Consideration_for_packet_processing_on_Intel_Architecture.pptx|Performance consideration for packet processing on Intel Architecture (ppt)]]   * [[https://wiki.fd.io/images/7/7b/Performance_Consideration_for_packet_processing_on_Intel_Architecture.pptx|Performance consideration for packet processing on Intel Architecture (ppt)]]
 +
 ==== FreeBSD ==== ==== FreeBSD ====
  
-Here are some benchs regarding network forwarding performance of FreeBSD (made by BSDRP team):
+Here are some benchmarks regarding FreeBSD network forwarding performance, conducted by the BSDRP team:
   * AsiaBSDCon 2018 - Tuning FreeBSD for routing and firewalling ([[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Paper.pdf|paper]],[[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Slides.pdf|slides]] and [[https://www.youtube.com/watch?v=SLlzep0IxVY|video]])   * AsiaBSDCon 2018 - Tuning FreeBSD for routing and firewalling ([[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Paper.pdf|paper]],[[https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Slides.pdf|slides]] and [[https://www.youtube.com/watch?v=SLlzep0IxVY|video]])
   * [[http://blog.cochard.me/2015/09/receipt-for-building-10mpps-freebsd.html|Recipe for building a 10Mpps FreeBSD based router]]   * [[http://blog.cochard.me/2015/09/receipt-for-building-10mpps-freebsd.html|Recipe for building a 10Mpps FreeBSD based router]]
Line 40: Line 41:
 ===== Bench lab ===== ===== Bench lab =====
  
-The [[bench lab]] should permit to measure the pps. For obtaining accurate result the [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)]] is a good reference. If switches are used, they need to have proper configuration too, refers to the [[documentation:examples:setting_up_a_forwarding_performance_benchmark_lab|BSDRP performance lab]] for some examples.
+The [[bench lab]] should be set up to measure pps. For accurate results, the [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)]] is a good reference. If switches are used, they need proper configuration too; refer to the [[documentation:examples:setting_up_a_forwarding_performance_benchmark_lab|BSDRP performance lab]] for examples.
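
The RFC 2544 throughput test looks for the highest offered rate at which the device forwards every frame without loss. It is commonly implemented as a binary search over the offered rate. The following Python sketch shows the idea under stated assumptions: `find_throughput`, `run_trial` and `simulated_dut` are hypothetical names, and the simulated device simply drops everything above a fixed capacity. In a real lab, `run_trial` would drive a packet generator and read the receiver counters.

```python
def find_throughput(run_trial, max_rate_pps, resolution_pps=1000):
    """Binary search for the highest rate (in pps) forwarded with zero loss.

    run_trial(rate_pps) must return a (sent, received) tuple for one trial.
    """
    sent, received = run_trial(max_rate_pps)
    if received == sent:
        return max_rate_pps  # no loss even at line rate

    low, high = 0, max_rate_pps  # low: known lossless, high: known lossy
    while high - low > resolution_pps:
        rate = (low + high) // 2
        sent, received = run_trial(rate)
        if received == sent:
            low = rate   # no loss: try a higher rate
        else:
            high = rate  # loss seen: try a lower rate
    return low


def simulated_dut(capacity_pps):
    """Return a trial function for a fake router that forwards at most capacity_pps."""
    def run_trial(rate_pps):
        return rate_pps, min(rate_pps, capacity_pps)
    return run_trial


if __name__ == "__main__":
    # Hypothetical router able to forward 2.7 Mpps, tested on a 10 Gb/s link
    # with 64-byte frames (14,880,952 pps line rate).
    result = find_throughput(simulated_dut(2_700_000), 14_880_952)
    print(f"Measured throughput: {result} pps")
```

The search resolution trades test duration against precision: each halving of the interval costs one full trial.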
  
 ===== Tuning ===== ===== Tuning =====
Line 46: Line 47:
 ==== Literature ==== ==== Literature ====
  
-Here is a list of sources about optimizing/analysis forwarding performance under FreeBSD.
+Here is a list of sources for optimizing and analyzing forwarding performance under FreeBSD.
  
-How to bench or tune the network stack:
+How to benchmark or tune the network stack:
   * [[http://wiki.freebsd.org/NetworkPerformanceTuning |  FreeBSD Network Performance Tuning]]: What need to be done to tune networking stack   * [[http://wiki.freebsd.org/NetworkPerformanceTuning |  FreeBSD Network Performance Tuning]]: What need to be done to tune networking stack
   * [[http://www.slideshare.net/brendangregg/meetbsd2014-performance-analysis | Brendan Gregg's Performance analysis presentation]]: The "must read" HOW TO   * [[http://www.slideshare.net/brendangregg/meetbsd2014-performance-analysis | Brendan Gregg's Performance analysis presentation]]: The "must read" HOW TO
Line 67: Line 68:
 ==== Multiple flows ==== ==== Multiple flows ====
  
-Don't try to bench a router with only one flow (same source|destination address and same source|destination port): You need to generate multiples flows.
+Do not try to benchmark a router with only one flow (same source and destination address, and same source and destination port): you need to generate multiple flows.
-Multi-queue NIC uses feature like [[https://en.wikipedia.org/wiki/Toeplitz_Hash_Algorithm|Toeplitz Hash Algorithm]] that balance multiples flows between all cores. Then generating only one flow will use only one NIC queue.
+Multi-queue NICs use features like the [[https://en.wikipedia.org/wiki/Toeplitz_Hash_Algorithm|Toeplitz Hash Algorithm]] that balance multiple flows across all cores. Generating only one flow will use only a single NIC queue and core.
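
To see why a single flow cannot be spread over several queues, here is a minimal Python sketch of a Toeplitz hash over the IPv4 4-tuple. Several parts are assumptions made for illustration: the 40-byte key is only an example (drivers choose their own), the addresses are documentation prefixes, and the queue is picked with a simple modulo, whereas real NICs index an indirection table with the low bits of the hash.

```python
import socket
import struct
from collections import Counter

# Example 40-byte RSS key (assumption: real drivers pick their own key).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key, data):
    """Compute the 32-bit Toeplitz hash of data with the given key."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                position = byte_index * 8 + bit
                # XOR in the 32-bit window of the key starting at this bit.
                result ^= (key_int >> (key_bits - 32 - position)) & 0xFFFFFFFF
    return result


def queue_for_flow(src, dst, sport, dport, num_queues=8):
    """Map an IPv4 4-tuple to a queue index (simplified: hash modulo queues)."""
    data = socket.inet_aton(src) + socket.inet_aton(dst)
    data += struct.pack("!HH", sport, dport)
    return toeplitz_hash(RSS_KEY, data) % num_queues


if __name__ == "__main__":
    # One flow: every packet carries the same 4-tuple, so it hits one queue.
    single = Counter(
        queue_for_flow("198.51.100.1", "203.0.113.1", 1024, 80)
        for _ in range(1000)
    )
    print("single flow :", dict(single))

    # Many flows: varying the source port spreads packets over the queues.
    many = Counter(
        queue_for_flow("198.51.100.1", "203.0.113.1", sport, 80)
        for sport in range(1024, 2024)
    )
    print("1000 flows  :", dict(sorted(many.items())))
```

Because the hash depends only on the 4-tuple, every packet of a given flow always lands on the same queue, whatever the packet rate.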
  
-During your load, check that each queues are used with sysctl or with [[https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/usr/local/bin/nic-queue-usage|python script like this one]] that will display real-time usage of each queue.
+During your load test, check that each queue is used with sysctl or a [[https://github.com/ocochard/BSDRP/blob/master/BSDRP/Files/usr/local/bin/nic-queue-usage|Python script like this one]] that displays real-time queue usage.
  
-On this example we can see that all flows are correctly shared between each 8 queues (about 340K paquets-per-seconds for each):
+In this example, all flows are correctly shared between the 8 queues (about 340K packets per second each):
 <code> <code>
  
Line 85: Line 86:
  
 <note warning> <note warning>
-Beware of configurations setup that prevent multi-queue, like GRE,GIF,IPSec tunnels or PPPoE (same source/destination address). If PPPoE usage is mandatory on your Gigabit Internet link, using small hardware, like 4 cores AMD GX (PC Engines APU2), will prevent to reach Gigabit speed.
+Beware of configurations that prevent multi-queueing, such as GRE, GIF, and IPSec tunnels or PPPoE (which use the same source/destination address). If you must use PPPoE on your Gigabit Internet link, using small hardware like a 4-core AMD GX (PC Engines APU2) will prevent you from reaching Gigabit speed.
 </note> </note>
 +
 ==== Choosing hardware ==== ==== Choosing hardware ====
 === CPU === === CPU ===
  
-Avoid NUMA architecture but prefer a CPU in only one package with maximum core (8 or 16).
+Avoid NUMA architecture and instead prefer a CPU in a single package with the maximum number of cores.
-If you are using NUMA, you need to check that inbound/outbound NIC queues are correctly bind to their local domain to avoid useless QPI crossing.
+If you are using NUMA, you need to check that inbound and outbound NIC queues are correctly bound to their local domain to avoid unnecessary QPI crossing.
  
 === Network Interface Card === === Network Interface Card ===
  
-Mellanox or Chelsio, by mixing good chipset and excellent drivers are an excellent choice.
+Mellanox or Chelsio, which combine good chipsets and excellent drivers, are an excellent choice.
  
-Intel seems to have problem for managing lot's of PPS (= IRQ).
+Intel seems to have problems managing a large number of PPS (interrupts), and their development team seems to lack FreeBSD developers.
  
-Avoid "embedded" NIC into common Dell/HP servers like these one that are very bad regarding their maximum packets-per-second performance:
+Avoid "embedded" NICs on common Dell/HP servers, such as the following, which perform very poorly in terms of maximum packets-per-second:
   * 10G Emulex OneConnect (be3)   * 10G Emulex OneConnect (be3)
   * 10G Broadcom NetXtreme II BCM57810   * 10G Broadcom NetXtreme II BCM57810
  
-==== Choosing good FreeBSD release ====
+==== Choosing the right FreeBSD release ====
  
-Before tuning, you need to use the good FreeBSD version... this mean a recent FreeBSD -head.
+Before tuning, you need to use the right FreeBSD version, which means a recent FreeBSD (main branch advised).
  
-BSDRP is following FreeBSD main branch, to try to have mix between recent features and stability
+BSDRP follows the FreeBSD main branch to strike a balance between recent features and stability (yes, it is quite stable).
-==== Disabling Hyper Threading (on specific CPU only) ====
-
-By default a multi-queue NIC drivers create one queue per core.
-But on some older CPU (like Xeon E5-2650 V1) those logical cores didn't help at all for managing interrupts generated by high speed NIC.
+
+==== Disabling Hyper-Threading (on specific CPUs only) ====
+
+By default, multi-queue NIC drivers create one queue per core.
+However, on some older CPUs (like Xeon E5-2650 V1), these logical cores do not help at all with managing interrupts generated by high-speed NICs.
  
 HT can be disabled with this command: HT can be disabled with this command:
Line 118: Line 121:
 </code> </code>
  
-Here is an example on a Xeon E5 2650 (8c,16t) and 10G Chelsio NIC where it improve performance by disabling HT:
+Here is an example on a Xeon E5 2650 (8c,16t) with a 10G Chelsio NIC, where disabling HT improves performance:
  
 <code> <code>
Line 147: Line 150:
 There is a benefit of about 24% to disable hyper threading on this old CPU. There is a benefit of about 24% to disable hyper threading on this old CPU.
  
-But here is another example where there is a benefit to kept it enabled (and with the NIC configured to uses all the treads) on Xeon E5 2650L (10c, 20t):
+However, here is another example on a Xeon E5 2650L (10c, 20t) where it is beneficial to keep HT enabled and configure the NIC to use all threads:
  
 <code> <code>
Line 173: Line 176:
  
 ==== fastforwarding ==== ==== fastforwarding ====
- 
-=== FreeBSD 10.3 or older === 
- 
-You should enable fastforwarding with a: 
-<code> 
-echo "net.inet.ip.fastforwarding=1" >> /etc/sysctl.conf 
-service sysctl restart 
-</code> 
  
 === FreeBSD 12.0 or newer === === FreeBSD 12.0 or newer ===
Line 193: Line 188:
 ==== Entropy harvest impact ==== ==== Entropy harvest impact ====
  
-Lot's of tuning guide indicate to disable:
+Many tuning guides suggest disabling:
   * kern.random.sys.harvest.ethernet   * kern.random.sys.harvest.ethernet
   * kern.random.sys.harvest.interrupt   * kern.random.sys.harvest.interrupt
Line 211: Line 206:
 </code> </code>
  
-And we can notice on forwarding performance of a FreeBSD 11.1:
+On FreeBSD 11.1, we can see the impact on forwarding performance:
  
 <code> <code>
documentation/technical_docs/performance.txt · Last modified: 2025/08/15 15:52 by olivier

Except where otherwise noted, content on this wiki is licensed under the following license: BSD 2-Clause