documentation:technical_docs:performance

Differences

This shows you the differences between two versions of the page.

documentation:technical_docs:performance [2019/06/27 01:37] – [Where is the bottleneck ?] olivier
documentation:technical_docs:performance [2020/01/18 01:02] – [Polling mode] olivier
Line 27: Line 27:
   * [[http://www.telematica.polito.it/oldsite/courmayeur06/papers/06-A.2.1.pdf|RFC2544 Performance Evaluation for a Linux Based Open Router]] (2006, June)
   * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]] (2008): Includes a hardware comparison between a "real" router and a PC.
 +  * [[https://wiki.fd.io/images/7/7b/Performance_Consideration_for_packet_processing_on_Intel_Architecture.pptx|Performance consideration for packet processing on Intel Architecture (ppt)]]
 ==== FreeBSD ====
  
Line 90: Line 91:
  
 Avoid NUMA architectures; prefer a single-package CPU with the maximum number of cores (8 or 16).
-If you are using NUMA, check that inbound/outbound NIC queues are correctly mapping to the same package.
+If you are using NUMA, you need to check that inbound/outbound NIC queues are correctly bound to their local package, to avoid useless QPI crossings.
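A minimal sketch of such a locality check, not taken from the original page. It assumes you have already collected, by whatever means your system offers, which CPU each NIC queue is pinned to and which package (NUMA domain) each CPU and the NIC belong to. The function name, the dictionaries and the sample topology are all hypothetical.

```python
def remote_queues(queue_to_cpu, cpu_to_domain, nic_domain):
    """Return the sorted queue ids whose CPU is in another package than the NIC."""
    return sorted(
        queue
        for queue, cpu in queue_to_cpu.items()
        if cpu_to_domain[cpu] != nic_domain
    )


# Hypothetical two-package machine: CPUs 0-3 in domain 0, CPUs 4-7 in domain 1.
cpu_to_domain = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}
# Hypothetical pinning of 4 NIC queues; the NIC itself is attached to domain 0.
queue_to_cpu = {0: 0, 1: 1, 2: 4, 3: 5}

# Queues listed here would have to cross the inter-package link.
print(remote_queues(queue_to_cpu, cpu_to_domain, nic_domain=0))
```

An empty list means every queue is served by a CPU local to the NIC.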
  
 === Network Interface Card ===
Line 104: Line 105:
 ==== Choosing good FreeBSD release ====
  
-Before tuning, you need to use the good FreeBSD version.
-This mean a FreeBSD -head version older than r309257 (Andrey V. Elsukov 's improvement: Rework ip_tryforward() to use FIB4 KPI) backported to FreeBSD 11-stable r310771 (MFC to stable).
+Before tuning, you need to use the right FreeBSD version: this means a recent FreeBSD -head.
  
-BSDRP since version 1.70 is using a FreeBSD 11-stable (r312663) that includes this improvement.
+BSDRP currently follows the FreeBSD 12-stable branch, as a compromise between recent features and stability.
 +==== Disabling Hyper Threading (on specific CPU only) ====
  
-{{documentation:technical_docs:2016-performance-evolution.png|2016 Forwarding performance evolution of FreeBSD -head on a 8 core Atom}}
-
-For better (and linear scale) performance there is the [[https://svnweb.freebsd.org/base/projects/routing/|projects/routing]] too [[http://blog.cochard.me/2015/09/receipt-for-building-10mpps-freebsd.html|that still give better performance]].
-
-==== Disabling Hyper Threading ====
-
-Disable Hyper Threading (HT): By default, lot's of multi-queue NIC drivers create one queue per core.
-But "logical" cores didn't help at all for managing interrupts generated by high speed NIC.
+By default, multi-queue NIC drivers create one queue per core.
+But on some older CPUs (like the Xeon E5-2650 V1), the logical cores do not help at all with handling the interrupts generated by a high-speed NIC.
  
 HT can be disabled with this command:
Line 123: Line 118:
 </code>
  
-Here is an example on a 8 cores x 2 hardware threads Intel CPU and 10G Chelsio NIC:
+Here is an example on a Xeon E5 2650 (8c, 16t) with a 10G Chelsio NIC, where disabling HT improves performance:
  
 <code>
-x HT-enabled-8rxq(default).packets-per-seconds
-+ HT-enabled-16rxq.packets-per-seconds
-* HT-disabled.packets-per-seconds
+x HT-enabled-8rxq(default): inet packets-per-second forwarded
++ HT-enabled-16rxq: inet packets-per-second forwarded
+* HT-disabled-8rxq: inet packets-per-seconds forwarded
 +--------------------------------------------------------------------------+
 |                                                                        **|
Line 150: Line 145:
 </code>
  
-There is a benefit of about 24% to disable hyper threading.
+Disabling hyper threading gives a benefit of about 24% on this old CPU.
 + 
 +But here is another example, on a Xeon E5 2650L (10c, 20t), where there is a benefit to keeping it enabled (with the NIC configured to use all the threads):
 + 
 +<code> 
 +x HT on, 8q (default): inet4 packets-per-second forwarded 
 ++ HT off, 8q: inet4 packets-per-second forwarded 
 +* HT on, 16q: inet4 packets-per-second forwarded 
 ++--------------------------------------------------------------------------+ 
 +|x x              ++                                                    *| 
 +|x xx            +++                                                 * *  *| 
 +||AM|            |A_|                                                |_MA_|| 
 ++--------------------------------------------------------------------------+ 
 +    N           Min           Max        Median           Avg        Stddev 
 +x         4265579     4433699.5     4409249.5     4359580.3       81559.4 
 ++         5257621       5443012       5372493     5372693.5     73316.243 
 +Difference at 95.0% confidence 
 +        1.01311e+06 +/- 113098 
 +        23.2388% +/- 2.94299% 
 +        (Student's t, pooled s = 77547.4) 
 +*         8566972       8917315     8734750.5     8769616.1     147186.74 
 +Difference at 95.0% confidence 
 +        4.41004e+06 +/- 173536 
 +        101.157% +/- 5.21388% 
 +        (Student's t, pooled s = 118987) 
 +</code>
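The percentages reported above are the difference between two averages, expressed relative to the baseline average. A minimal Python sketch of that arithmetic, using the values from the Avg column above (the function and variable names are illustrative, not part of any tool):

```python
def relative_gain(baseline_avg, candidate_avg):
    """Return (absolute difference, gain in percent of the baseline average)."""
    diff = candidate_avg - baseline_avg
    return diff, 100.0 * diff / baseline_avg


# Averages in packets per second, copied from the Avg column of the output above.
ht_on_8q = 4359580.3   # baseline: HT on, 8 queues (default)
ht_off_8q = 5372693.5  # HT off, 8 queues
ht_on_16q = 8769616.1  # HT on, 16 queues

for label, avg in (("HT off, 8q", ht_off_8q), ("HT on, 16q", ht_on_16q)):
    diff, pct = relative_gain(ht_on_8q, avg)
    print(f"{label}: {diff:+.0f} pps ({pct:+.1f}%)")
```

The results should agree with the 23.2388% and 101.157% figures reported above. The confidence intervals come from the Student's t computation in the output and are not reproduced by this sketch.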
  
 ==== fastforwarding ====
Line 235: Line 255:
 {{documentation:technical_docs:entropy_source_impact.png|Impact of disabling some entropy source on FreeBSD forwarding performance}}
  
-==== Polling mode ====
+==== Polling mode (very old hardware) ====
  
 Polling can be used in 2 cases:
Line 746: Line 766:
 </code>
  
-On this case the bootleneck is just the network stack.
+In this case the bottleneck is just the network stack (most of the time is spent in the function ip_findroute, called by ip_tryforward).
  
 == CPU cycles spent ==
Line 764: Line 784:
 less /data/pmc.stacks
 </code>
 +
 +=== Lock contention source ===
 +
 +To identify the source of lock contention (for example, if the functions lock_delay or __mtx_lock_sleep rank high in the pmc output), you can use lockstat to find which lock is contended and why.
 +
 +You can generate 2 outputs:
 +  * contended locks broken down by type: <code>lockstat -x aggsize=4m sleep 10 > lock-type.txt</code>
 +  * stacks associated with the lock contention to identify the source: <code>lockstat -x aggsize=4m -s 10 sleep 10 > lock-stacks.txt </code>
documentation/technical_docs/performance.txt · Last modified: 2020/01/18 01:04 by olivier

Except where otherwise noted, content on this wiki is licensed under the following license: BSD 2-Clause