documentation:technical_docs:performance

Differences

This shows you the differences between two versions of the page.

Old revision: documentation:technical_docs:performance [2019/03/26 00:22] – external edit 127.0.0.1
New revision: documentation:technical_docs:performance [2020/01/18 01:02] – [Polling mode] olivier
Old line 27, new line 27:
   * [[http://www.telematica.polito.it/oldsite/courmayeur06/papers/06-A.2.1.pdf|RFC2544 Performance Evaluation for a Linux Based Open Router]] (2006, June)
   * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]] (2008): includes a hardware comparison between a "real" router and a PC.
+  * [[https://wiki.fd.io/images/7/7b/Performance_Consideration_for_packet_processing_on_Intel_Architecture.pptx|Performance consideration for packet processing on Intel Architecture (ppt)]]
 ==== FreeBSD ====
 
Old line 90, new line 91:
 
 Avoid NUMA architectures and prefer a single-package CPU with the maximum number of cores (8 or 16).
-If you are using NUMA, check that inbound/outbound NIC queues are correctly mapped to the same package.
+If you are using NUMA, you need to check that inbound/outbound NIC queues are correctly bound to their local package, to avoid useless QPI crossing.
 
 === Network Interface Card ===
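The NUMA check described above (NIC queues bound to their local package) could be done from a shell roughly as follows. This is a minimal sketch under several assumptions. It assumes FreeBSD 12 or later, where newbus exposes a `dev.<driver>.<unit>.%domain` sysctl for devices attached to a NUMA domain. It also assumes a driver whose MSI-X vectors appear in `vmstat -ia` as `irqNNN: ix0:rxq0`. Interrupt naming varies between drivers, so the match pattern may need adjusting. The helper names `nic_irqs` and `show_nic_affinity` are hypothetical.

```sh
#!/bin/sh
# Sketch, assuming FreeBSD 12+ and interrupt descriptions of the form
# "irq264: ix0:rxq0". Helper names are hypothetical.

# nic_irqs NIC: read `vmstat -ia` output on stdin and print the IRQ
# number of every interrupt whose description starts with "NIC:".
nic_irqs() {
    awk -v nic="$1" '$2 ~ ("^" nic ":") {
        irq = $1
        sub(/^irq/, "", irq)    # "irq264:" -> "264:"
        sub(/:$/, "", irq)      # "264:"    -> "264"
        print irq
    }'
}

# show_nic_affinity NIC: print the NUMA domain the NIC is attached to,
# then the CPU mask each of its IRQs is currently allowed to run on.
show_nic_affinity() {
    nic=$1
    unit=$(printf '%s\n' "$nic" | sed 's/.*[^0-9]//')   # "ix0" -> "0"
    driver=${nic%"$unit"}                                  # "ix0" -> "ix"
    # Assumed sysctl; falls back to "unknown" if it does not exist.
    domain=$(sysctl -n "dev.$driver.$unit.%domain" 2>/dev/null) || domain=unknown
    echo "$nic NUMA domain: $domain"
    for irq in $(vmstat -ia | nic_irqs "$nic"); do
        cpuset -g -x "$irq"     # show the CPU mask of this IRQ
    done
}
```

For example, `show_nic_affinity ix0` prints the NIC's NUMA domain and the CPU mask of each of its queue interrupts. If a mask contains CPUs from the other package, `cpuset -l <cpu-list> -x <irq>` can rebind that IRQ to CPUs local to the NIC.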
Old line 104, new line 105:
 ==== Choosing a good FreeBSD release ====
 
-Before tuning, you need to use the right FreeBSD version.
-This means a FreeBSD -head version r309257 or newer (Andrey V. Elsukov's improvement: Rework ip_tryforward() to use FIB4 KPI), backported to FreeBSD 11-stable in r310771 (MFC to stable).
+Before tuning, you need to use the right FreeBSD version, which means a recent FreeBSD -head.
 
-BSDRP since version 1.70 uses a FreeBSD 11-stable (r312663) that includes this improvement.
+BSDRP currently follows the FreeBSD 12-stable branch, to get a balance between recent features and stability.
+==== Disabling Hyper Threading (on specific CPUs only) ====
 
-{{documentation:technical_docs:2016-performance-evolution.png|2016 Forwarding performance evolution of FreeBSD -head on an 8 core Atom}}
-
-For better (and linearly scaling) performance there is also the [[https://svnweb.freebsd.org/base/projects/routing/|projects/routing]] branch, [[http://blog.cochard.me/2015/09/receipt-for-building-10mpps-freebsd.html|which still gives better performance]].
-
-==== Disabling Hyper Threading ====
-
-Disable Hyper Threading (HT): by default, lots of multi-queue NIC drivers create one queue per core.
-But "logical" cores don't help at all with managing the interrupts generated by high-speed NICs.
+By default, multi-queue NIC drivers create one queue per core.
+But on some older CPUs (like the Xeon E5-2650 v1), these logical cores don't help at all with managing the interrupts generated by high-speed NICs.
 
 HT can be disabled with this command:
Old line 123, new line 118:
 </code>
 
-Here is an example on an 8cores x hardware threads Intel CPU and a 10G Chelsio NIC:
+Here is an example on a Xeon E5-2650 (8c, 16t) with a 10G Chelsio NIC, where disabling HT improves performance:
 
 <code>
-x HT-enabled-8rxq(default).packets-per-seconds
-+ HT-enabled-16rxq.packets-per-seconds
-* HT-disabled.packets-per-seconds
+x HT-enabled-8rxq(default): inet packets-per-second forwarded
++ HT-enabled-16rxq: inet packets-per-second forwarded
+* HT-disabled-8rxq: inet packets-per-second forwarded
 +--------------------------------------------------------------------------+
 |                                                                        **|
Old line 150, new line 145:
 </code>
 
-Disabling hyper threading gives a benefit of about 24%.
+Disabling hyper threading gives a benefit of about 24% on this old CPU.
+
+But here is another example, on a Xeon E5-2650L (10c, 20t), where it is better to keep HT enabled (with the NIC configured to use all the threads):
+
+<code>
+x HT on, 8q (default): inet4 packets-per-second forwarded
++ HT off, 8q: inet4 packets-per-second forwarded
+* HT on, 16q: inet4 packets-per-second forwarded
++--------------------------------------------------------------------------+
+|x x              ++                                                    *|
+|x xx            +++                                                 * *  *|
+||AM|            |A_|                                                |_MA_||
++--------------------------------------------------------------------------+
+    N           Min           Max        Median           Avg        Stddev
+x         4265579     4433699.5     4409249.5     4359580.3       81559.4
++         5257621       5443012       5372493     5372693.5     73316.243
+Difference at 95.0% confidence
+        1.01311e+06 +/- 113098
+        23.2388% +/- 2.94299%
+        (Student's t, pooled s = 77547.4)
+*         8566972       8917315     8734750.5     8769616.1     147186.74
+Difference at 95.0% confidence
+        4.41004e+06 +/- 173536
+        101.157% +/- 5.21388%
+        (Student's t, pooled s = 118987)
+</code>
  
 ==== fastforwarding ====
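In the Xeon E5-2650L ministat comparison above, each percentage is the difference between means, relative to the mean of the first (x) data set. Here is a quick cross-check that recomputes those percentages from the Avg column. The `pct_diff` helper is hypothetical and is not part of ministat.

```sh
#!/bin/sh
# pct_diff BASE NEW: relative difference of NEW versus BASE, in percent,
# rounded to two decimals. Hypothetical helper, not part of ministat.
pct_diff() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

pct_diff 4359580.3 5372693.5   # "+" (HT off, 8q) vs "x" (HT on, 8q)
pct_diff 4359580.3 8769616.1   # "*" (HT on, 16q) vs "x" (HT on, 8q)
```

This prints 23.24 and 101.16, matching ministat's 23.2388% and 101.157%. So on this CPU, disabling HT gains about 23% at 8 queues. Keeping HT and using 16 queues roughly doubles the forwarding rate instead. The percentage alone says nothing about significance; use the +/- interval that ministat prints next to it for that.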
Old line 235, new line 255:
 {{documentation:technical_docs:entropy_source_impact.png|Impact of disabling some entropy sources on FreeBSD forwarding performance}}
 
-==== Polling mode ====
+==== Polling mode (very old hardware) ====
 
 Polling can be used in 2 cases:
Old line 746, new line 766:
 </code>
 
-In this case the bottleneck is just the network stack.
+In this case the bottleneck is just the network stack (most of the time is spent in the ip_findroute function, called by ip_tryforward).
 
 == CPU cycles spent ==
Old line 761, new line 781:
 <code>
 pmcstat -z 50 -S cpu_clk_unhalted.thread -l 20 -O /data/pmc.out
+pmcstat -R /data/pmc.out -z50 -G /data/pmc.stacks
+less /data/pmc.stacks
 </code>
  
-Then analyze the output with:
-<code>
-fetch http://BSDRP-release-debug
-tar xzfv BSDRP-release-debug.tar.xz
-pmcannotate /data/pmc.out /data/debug/boot/kernel/kernel.symbols
-</code>
+=== Lock contention source ===
+
+To identify the source of lock contention (for example, if functions like lock_delay or __mtx_lock_sleep rank high in the pmc output), you can use lockstat to find which lock is contended and why.
+
+You can generate two outputs:
+  * contended locks broken down by type: <code>lockstat -x aggsize=4m sleep 10 > lock-type.txt</code>
+  * stacks associated with the lock contention, to identify its source: <code>lockstat -x aggsize=4m -s 10 sleep 10 > lock-stacks.txt</code>

documentation/technical_docs/performance.txt · Last modified: 2020/01/18 01:04 by olivier

Except where otherwise noted, content on this wiki is licensed under the following license: BSD 2-Clause