documentation:technical_docs:performance [2012/04/30 15:28]
documentation:technical_docs:performance [2013/05/17 14:13] (current)
olivier [Guides]
====== FreeBSD forwarding performance ======
{{description>Tips and information about FreeBSD forwarding performance}}
There are lots of guides about [[http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel|tuning FreeBSD TCP performance]] (where the FreeBSD host is an endpoint of the TCP session), but that is not the same as tuning forwarding performance (where the FreeBSD host does not have to read the TCP information of the packets it forwards).

===== Concept =====

==== How to bench a router ====

Benchmarking a router is not about measuring the maximum bandwidth crossing it; what matters is how many packets per second (pps) it can forward. The reference documents are:
  * [[http://www.ietf.org/rfc/rfc2544.txt|RFC2544: Benchmarking Methodology for Network Interconnect Devices]]
  * [[http://tools.ietf.org/html/rfc3222|RFC3222: Terminology for Forwarding Information Base (FIB) based Router Performance]]
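
RFC 2544 defines throughput as the maximum rate at which the device forwards all offered frames without dropping any. A common way to find that rate is a binary search over the offered load. This is a minimal sketch, not a complete RFC 2544 implementation: `run_trial(rate_pps)` is a hypothetical callback that should drive your packet generator for one trial and return the number of lost frames, and `simulated_router` is a toy stand-in that starts losing packets above 370 Kpps.

<code python>
from typing import Callable


def find_throughput(run_trial: Callable[[float], int],
                    max_rate_pps: float,
                    resolution_pps: float = 1000.0) -> float:
    """Binary search for the highest offered rate with zero frame loss.

    run_trial(rate) is a hypothetical callback: send traffic at `rate`
    packets per second for one trial and return the number of lost frames.
    """
    if run_trial(max_rate_pps) == 0:
        return max_rate_pps  # no loss even at the maximum offered rate
    low, high = 0.0, max_rate_pps  # low: no loss seen, high: loss seen
    while high - low > resolution_pps:
        mid = (low + high) / 2
        if run_trial(mid) == 0:
            low = mid   # no loss: throughput is at least mid
        else:
            high = mid  # loss: throughput is below mid
    return low


def simulated_router(rate_pps: float, capacity_pps: float = 370_000) -> int:
    """Toy device under test: drops whatever exceeds its capacity in a 1 s trial."""
    return max(0, int(rate_pps - capacity_pps))


if __name__ == "__main__":
    line_rate = 1_488_095  # 64-byte frames on 1 Gbit/s Ethernet
    print(f"throughput ~ {find_throughput(simulated_router, line_rate):.0f} pps")
</code>

With a real generator, each trial should last long enough for the counters to be meaningful; the search then converges to within `resolution_pps` of the loss-free rate.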

==== Definitions ====

Clear definitions of the relations between bandwidth, packet rate and other metrics are mandatory:
  * [[http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html|Bandwidth, Packets Per Second, and Other Network Performance Metrics]]
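
Bandwidth and packets per second are linked by the frame size. On Ethernet, every frame also uses 20 bytes of wire time outside the frame itself (7-byte preamble, 1-byte start-of-frame delimiter and 12-byte inter-frame gap), so the theoretical maximum is max_pps = link_bps / ((frame_size + 20) * 8). The sketch below tabulates this for the Ethernet frame sizes listed in RFC 2544; for 64-byte frames at 10 Gbit/s it gives 14,880,952 pps, which is where the 14.8 Mpps figure quoted for netmap comes from.

<code python>
# Each Ethernet frame costs 20 extra bytes on the wire:
# 7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Ethernet frame sizes (bytes, FCS included) used by RFC 2544 tests.
RFC2544_FRAME_SIZES = (64, 128, 256, 512, 1024, 1280, 1518)


def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second at line rate."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for size in RFC2544_FRAME_SIZES:
        print(f"{size:5d} B   1 Gbit/s: {max_pps(1e9, size):>12,.0f} pps"
              f"   10 Gbit/s: {max_pps(10e9, size):>12,.0f} pps")
</code>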

===== Benchmarks =====

==== Cisco or Linux ====

  * [[http://www.cisco.com/web/partners/downloads/765/tools/quickreference/routerperformance.pdf|Routing performance of Cisco routers]] (PDF)
  * [[http://www.telematica.polito.it/oldsite/courmayeur06/papers/06-A.2.1.pdf|RFC2544 Performance Evaluation for a Linux Based Open Router]]
  * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]]: includes a hardware comparison between a "real" router and a PC.

==== FreeBSD ====

Here are some benchmarks of FreeBSD network forwarding performance:
  * [[http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032832.html|FreeBSD as 10 Gigabit router-on-a-stick]]: about 1 Mpps; this thread has lots of very useful tips.
  * [[http://www.net.t-labs.tu-berlin.de/papers/SWF-PCCH10GEE-07.pdf|Packet capture in 10-Gigabit environments using Contemporary Commodity Hardware]] (PDF)

===== Bench lab =====

The bench lab should make it possible to measure the packet rate (pps). For accurate results, [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)]] is a good reference.

==== Packet generator ====

Some packet generation resources:
  * [[http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring|LAN Ethernet Maximum Rates, Generation, Capturing & Monitoring]] (on GNU/Linux)
  * pkt-gen from the netmap suite

===== Tuning FreeBSD =====

==== Guides ====

Here is a list of sources about optimizing forwarding performance under FreeBSD.

How to bench or tune the network stack:
  * [[http://wiki.freebsd.org/NetworkPerformanceTuning|FreeBSD Network Performance Tuning]]: what needs to be done to tune the networking stack
  * [[https://calomel.org/network_performance.html|Calomel.org advice for 10 Gigabit tuning]]: a simple and quick guide
  * [[http://www.freebsd.org/projects/netperf/index.html|FreeBSD Network Performance Project (netperf)]]
  * [[http://www.watson.org/~robert/freebsd/netperf/20051027-eurobsdcon2005-netperf.pdf|Introduction to Multithreading and Multiprocessing in the FreeBSD SMPng Network Stack]], EuroBSDCon 2005 (PDF)
  * [[http://www.freebsd.org/cgi/man.cgi?query=tuning&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html|man tuning]]: performance tuning under FreeBSD
  * [[http://wwwx.cs.unc.edu/~krishnan/classes/spring_07/os_impl/report.pdf|Improving Memory and Interrupt Processing in FreeBSD Network Stack]] (PDF)
  * [[http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf|Optimizing the BSD Routing System for Parallel Processing]] (PDF)
  * [[https://people.sunyit.edu/~sengupta/CSC521/systemperformance.ppt|Using netstat and vmstat for performance analysis]] (PowerPoint)
  * [[http://www.freebsd.org/cgi/man.cgi?query=polling&sektion=4|polling man page]] (warning: enabling polling is not a good idea with the newer generation of Ethernet controllers, which include interrupt moderation)
  * [[http://info.iet.unipi.it/~luigi/polling/|Device Polling support for FreeBSD]]: the original presentation of the polling implementation
  * [[http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html|Tuning Kernel Limits]] in the FreeBSD Handbook

Experimental high-performance network stacks for FreeBSD:
  * [[http://info.iet.unipi.it/~luigi/netmap/|Netmap - memory mapping of network devices]] //"(...)a single core running at 1.33GHz can generate the 14.8Mpps that saturate a 10GigE interface."//
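
A packet rate can also be read as a time budget per packet. At the 10 Gigabit line rate of 14.88 Mpps (64-byte frames), the 1.33 GHz core mentioned in the netmap quote has roughly 67 ns, or about 89 CPU cycles, for each packet. A minimal sketch of that arithmetic (the function name is only illustrative):

<code python>
def per_packet_budget(rate_pps: float, cpu_hz: float) -> tuple[float, float]:
    """Return (nanoseconds, CPU cycles) available per packet on one core."""
    ns_per_packet = 1e9 / rate_pps
    cycles_per_packet = cpu_hz / rate_pps
    return ns_per_packet, cycles_per_packet


if __name__ == "__main__":
    # 10GbE line rate with 64-byte frames, on the 1.33 GHz core of the netmap quote.
    ns, cycles = per_packet_budget(14_880_952, 1.33e9)
    print(f"{ns:.1f} ns and {cycles:.0f} cycles per packet")
</code>

Any per-packet work in the forwarding path (route lookup, firewall rules, mbuf allocation) has to fit in that budget to sustain line rate on a single core.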

==== Where is the bottleneck? ====

Tools:
  * [[http://www.freebsd.org/cgi/man.cgi?query=netstat|netstat]]: show network status
  * [[http://www.freebsd.org/cgi/man.cgi?query=vmstat|vmstat]]: report virtual memory statistics
  * [[http://www.freebsd.org/cgi/man.cgi?query=top|top]]: display and update information about the top CPU processes

=== Packet traffic ===

Display packet traffic statistics for all interfaces, refreshed every second.

Here is a first example:

<code>
[root@BSDRP3]~# netstat -i -h -w 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      370k     0      0        38M       370k     0        38M     0
      369k     0      0        38M       368k     0        38M     0
      370k     0      0        38M       370k     0        38M     0
      373k     0      0        38M       376k     0        38M     0
      370k     0      0        38M       368k     0        38M     0
      368k     0      0        38M       368k     0        38M     0
      368k     0      0        38M       369k     0        38M     0
</code>

=> This system is forwarding 370 Kpps (in and out) without any input or output errors (the packet generator used netblast with a 64-byte packet size at 370 Kpps).
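
To average many samples instead of reading them by eye, the output can be saved to a file and parsed. A minimal sketch, assuming the `netstat -i -h -w 1` layout shown above (8 numeric columns per sample line); the sample is an excerpt of the first example. The k/M/G suffixes are treated as powers of 1000 by default: this is an assumption, so pass `base=1024` if your netstat version uses binary multiples.

<code python>
import re
from statistics import mean


def parse_human(value: str, base: int = 1000) -> float:
    """Convert a netstat -h counter such as '370k' or '38M' to a number.

    Assumption: suffixes are powers of `base` (1000 unless told otherwise).
    """
    match = re.fullmatch(r"([\d.]+)([kMG]?)", value)
    if match is None:
        raise ValueError(f"unexpected counter: {value!r}")
    number, suffix = match.groups()
    exponent = {"": 0, "k": 1, "M": 2, "G": 3}[suffix]
    return float(number) * base ** exponent


def average_rates(netstat_output: str, base: int = 1000) -> dict[str, float]:
    """Average each column of the per-second lines of 'netstat -i -h -w 1'."""
    columns = ("in_packets", "in_errs", "in_idrops", "in_bytes",
               "out_packets", "out_errs", "out_bytes", "colls")
    samples = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Sample lines have 8 fields starting with a digit; headers do not.
        if len(fields) == len(columns) and fields[0][0].isdigit():
            samples.append([parse_human(f, base) for f in fields])
    if not samples:
        raise ValueError("no sample lines found")
    return {name: mean(row[i] for row in samples)
            for i, name in enumerate(columns)}


SAMPLE = """\
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      370k     0      0        38M       370k     0        38M     0
      369k     0      0        38M       368k     0        38M     0
      373k     0      0        38M       376k     0        38M     0
"""

if __name__ == "__main__":
    rates = average_rates(SAMPLE)
    print(f"in: {rates['in_packets']:.0f} pps, out: {rates['out_packets']:.0f} pps")
</code>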

Here is a second example:

<code>
[root@BSDRP3]~# netstat -ihw 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      399k  915k      0        25M       395k     0        24M     0
      398k  914k      0        24M       398k     0        24M     0
      399k  915k      0        25M       399k     0        25M     0
      398k  915k      0        24M       397k     0        24M     0
      399k  914k      0        25M       398k     0        24M     0
      398k  914k      0        24M       400k     0        25M     0
      398k  915k      0        24M       396k     0        24M     0
      400k  915k      0        25M       401k     0        25M     0
      397k  914k      0        24M       397k     0        24M     0
      398k  914k      0        24M       399k     0        25M     0
      400k  914k      0        25M       401k     0        25M     0
      398k  914k      0        24M       397k     0        24M     0
</code>

=> This system is forwarding about 400 Kpps (in and out), but it is overloaded: it drops about 914 Kpps, reported in the input errs column (the generator used netmap pkt-gen with a 64-byte packet size at a rate of 1.34 Mpps).
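
The figures are consistent with the generator rate: accepted packets plus input errors give about 398 K + 914 K = 1.31 Mpps, close to the 1.34 Mpps offered, and about 70% of the arriving packets are dropped. A small sketch of this check; the function names and the 5% tolerance are arbitrary choices:

<code python>
def drop_ratio(received_pps: float, input_errs_pps: float) -> float:
    """Fraction of the arriving packets counted as input errors (dropped)."""
    arriving = received_pps + input_errs_pps
    return input_errs_pps / arriving if arriving else 0.0


def matches_offered_load(received_pps: float, input_errs_pps: float,
                         offered_pps: float, tolerance: float = 0.05) -> bool:
    """True if received + dropped packets are within `tolerance` of the offered rate."""
    seen = received_pps + input_errs_pps
    return abs(seen - offered_pps) <= tolerance * offered_pps


if __name__ == "__main__":
    # Per-second values read from the second netstat example above.
    received, errs, offered = 398e3, 914e3, 1.34e6
    print(f"drop ratio: {drop_ratio(received, errs):.1%}")
    print("accounts for the offered load:", matches_offered_load(received, errs, offered))
</code>

If received plus errors falls well short of the offered rate, the missing packets are being lost before netstat can count them (for example in the NIC itself), which the driver counters can reveal.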

=== Interrupt usage ===

vmstat -i reports the number of interrupts taken by each device since system startup (total column), along with the average number per second over that same period (rate column).

Here is a first example:
<code>
[root@BSDRP3]~# vmstat -i
interrupt                          total       rate
irq4: uart0                         6670          5
irq14: ata0                            5          0
irq16: bge0                           27          0
irq17: em0 bge1                  5209668       4510
cpu0:timer                       1299291       1124
irq256: ahci0                       1172          1
Total                            6516833       5642
</code>
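
=> Notice that em0 and bge1 share the same IRQ (irq17). This is not good news: both NICs are serviced by the same interrupt thread (it appears as intr{irq17: em0 bge1} in the top output further down).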
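
With many interfaces, shared IRQs are easy to miss. A minimal sketch that scans saved `vmstat -i` output (the first example is used as sample) and lists the IRQs with more than one device attached. It assumes the `irqN: dev1 dev2 total rate` layout shown above; interrupt names containing spaces, as some MSI-X vectors have, would need extra handling.

<code python>
def shared_irqs(vmstat_output: str) -> dict[str, list[str]]:
    """Map each IRQ to its devices, keeping only IRQs with two or more devices."""
    shared = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Expected layout: "irq17: em0 bge1   5209668   4510"
        if len(fields) < 4 or not fields[0].startswith("irq"):
            continue
        irq, devices = fields[0].rstrip(":"), fields[1:-2]
        if len(devices) > 1:
            shared[irq] = devices
    return shared


SAMPLE = """\
interrupt                          total       rate
irq4: uart0                         6670          5
irq14: ata0                            5          0
irq16: bge0                           27          0
irq17: em0 bge1                  5209668       4510
cpu0:timer                       1299291       1124
irq256: ahci0                       1172          1
Total                            6516833       5642
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(f"{irq} is shared by: {', '.join(devices)}")
</code>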

Here is a second example:

<code>
[root@BSDRP3]# vmstat -i
interrupt                          total       rate
irq4: uart0                        17869          0
irq14: ata0                            5          0
irq16: bge0                            1          0
irq17: em0 bge1                        2          0
cpu0:timer                     214331752       1125
irq256: ahci0                       1725          0
Total                          214351354       1126
</code>

=> Near-zero counters and rates for the NIC IRQs mean that polling is enabled. This is not recommended: the interrupt management of current NICs makes polling unnecessary.
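
Keep in mind that the rate column is an average since boot: in the first example, total divided by rate gives about 1155 seconds of uptime for the em0/bge1, timer and Total lines. It therefore reacts slowly to load changes. To check whether a NIC is taking interrupts right now, compare two `vmstat -i` snapshots taken a few seconds apart; a NIC counter that does not increase while traffic is flowing is consistent with polling. A minimal sketch, where `interval_s` is the assumed delay between the two snapshots:

<code python>
def irq_totals(vmstat_output: str) -> dict[str, int]:
    """Return {interrupt name: total count} parsed from 'vmstat -i' output."""
    totals = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Keep rows ending in two numbers (total, rate); skip prompt, header, Total.
        if (len(fields) < 3 or not fields[-2].isdigit()
                or not fields[-1].isdigit() or fields[0] == "Total"):
            continue
        name = " ".join(fields[:-2])  # e.g. "irq17: em0 bge1"
        totals[name] = int(fields[-2])
    return totals


def current_rates(before: str, after: str, interval_s: float) -> dict[str, float]:
    """Interrupts per second for each source between two snapshots."""
    old, new = irq_totals(before), irq_totals(after)
    return {name: (count - old.get(name, 0)) / interval_s
            for name, count in new.items()}
</code>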

=== Memory buffers ===

netstat -m shows statistics recorded by the memory management routines; the network stack manages its own private pool of memory buffers (mbufs).

<code>
[root@BSDRP3]~# netstat -m
5220/810/6030 mbufs in use (current/cache/total)
5219/675/5894/512000 mbuf clusters in use (current/cache/total/max)
5219/669 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/256000 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/128000 9k jumbo clusters in use (current/cache/total/max)
0/0/0/64000 16k jumbo clusters in use (current/cache/total/max)
11743K/1552K/13295K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
</code>
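
The "bytes allocated to network" line can be cross-checked against the lines above it. Assuming 256-byte mbufs and 2048-byte clusters (the SIZE column of the vmstat -z output below), the current column gives 5220 * 256 + 5219 * 2048 = 12,024,832 bytes, exactly the 11743K reported, and the cache and total columns match the same way. The cluster maximum of 512000 also shows that only about 1.2% of the allowed clusters are in use. A sketch of that arithmetic, ignoring jumbo clusters (all zero here):

<code python>
MBUF_SIZE = 256       # bytes, SIZE column of "mbuf" in vmstat -z
CLUSTER_SIZE = 2048   # bytes, SIZE column of "mbuf_cluster" in vmstat -z


def network_kbytes(mbufs: int, clusters: int) -> int:
    """Memory held by mbufs and standard clusters, in KiB (rounded down)."""
    return (mbufs * MBUF_SIZE + clusters * CLUSTER_SIZE) // 1024


if __name__ == "__main__":
    # current/cache/total counts from the netstat -m output above
    for label, mbufs, clusters in (("current", 5220, 5219),
                                   ("cache", 810, 675),
                                   ("total", 6030, 5894)):
        print(f"{label}: {network_kbytes(mbufs, clusters)}K")
    print(f"clusters in use: {5894 / 512000:.1%} of the maximum")
</code>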

Or, in more detail, with vmstat -z:

<code>
[root@BSDRP3]~# vmstat -z | head -1 ; vmstat -z | grep -i mbuf
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256,      0,    5221,     667,414103198,   0,   0
mbuf:                   256,      0,       1,     141,     135,   0,   0
mbuf_cluster:          2048, 512000,    5888,       6,    5888,   0,   0
mbuf_jumbo_page:       4096, 256000,       0,       0,       0,   0,   0
mbuf_jumbo_9k:         9216, 128000,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  64000,       0,       0,       0,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
</code>

=> No failures here: the FAIL column is 0 for every mbuf zone.
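
To watch these counters automatically, here is a sketch that parses the mbuf lines of vmstat -z and reports zones with a non-zero FAIL count or with USED close to LIMIT. It assumes the column order of the header above (SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP); other FreeBSD versions may print extra columns, so check the header first. The 80% threshold is an arbitrary example.

<code python>
def check_zones(vmstat_z_output: str, usage_threshold: float = 0.8) -> list[str]:
    """Return warnings for zones with failed allocations or high usage."""
    warnings = []
    for line in vmstat_z_output.splitlines():
        name, sep, values = line.partition(":")
        if not sep or line.startswith("ITEM"):
            continue  # not a zone line (header, prompt, blank line)
        # Assumed columns after the name: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP
        fields = [int(v) for v in values.split(",")]
        limit, used, fail = fields[1], fields[2], fields[5]
        if fail > 0:
            warnings.append(f"{name}: {fail} failed allocations")
        if limit and used / limit >= usage_threshold:
            warnings.append(f"{name}: {used}/{limit} in use")
    return warnings


SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256,      0,    5221,     667,414103198,   0,   0
mbuf:                   256,      0,       1,     141,     135,   0,   0
mbuf_cluster:          2048, 512000,    5888,       6,    5888,   0,   0
"""

if __name__ == "__main__":
    print(check_zones(SAMPLE) or "no failures, no zone near its limit")
</code>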

=== CPU / NIC ===

top can give very useful information about CPU/NIC affinity:

<code>
[root@BSDRP3]~# top -nCHSIzs1
last pid:  1392;  load averages:  0.15,  0.48,  0.33  up 0+00:22:06    15:44:26
75 processes:  2 running, 57 sleeping, 16 waiting

Mem: 10M Active, 8752K Inact, 79M Wired, 272K Cache, 17M Buf, 878M Free
Swap:


  PID USERNAME PRI NICE   SIZE    RES STATE    TIME    CPU COMMAND
    0 root     -92    0     0K   176K -       16:24 96.39% kernel{em0 taskq}
   11 root     -92    -     0K   256K WAIT     1:01  5.76% intr{irq17: em0 bge1}
</code>

=> Not a very interesting output: there is only one CPU here.

Here is another example, on a dual-core computer:

<code>
[root@BSDRP2]~# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf "%7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'
  STATE  C   TIME        CPU         COMMAND
   CPU0  0   7:23     99.76%      kernel{em0 rxq}
    RUN  1   0:44      6.40%    intr{irq260: bce1}
 istorm  1   4:18      4.05%    intr{irq256: em0:rx
    RUN  1   0:04      0.68%    intr{irq258: em0:link}
</code>
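
=> em0 is under an interrupt storm (its irq256 receive interrupt thread is in the istorm state), and its receive queue thread kernel{em0 rxq} consumes almost 100% of the first CPU (CPU0).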
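
The same filtering can be scripted to flag threads that monopolize a CPU. A minimal sketch, assuming the SMP layout of `top -nCHSIzs1` that the awk command relies on (STATE in field 7, C in field 8, CPU in field 10 and COMMAND from field 11, counting from 1 like awk); like the awk filter, it only keeps lines whose SIZE field is expressed in K. The 50% threshold is arbitrary.

<code python>
from typing import Iterator, Tuple


def busy_threads(top_output: str,
                 threshold_pct: float = 50.0) -> Iterator[Tuple[int, float, str]]:
    """Yield (cpu, cpu_pct, command) for threads at or above threshold_pct.

    Assumed line layout (top -nCHSIzs1 on an SMP system):
    PID USERNAME PRI NICE SIZE RES STATE C TIME CPU COMMAND...
    """
    for line in top_output.splitlines():
        fields = line.split()
        # Like the awk filter: keep lines whose SIZE field (5th) ends in "K".
        if len(fields) < 11 or not fields[4].endswith("K"):
            continue
        cpu_pct = float(fields[9].rstrip("%"))
        if cpu_pct >= threshold_pct:
            yield int(fields[7]), cpu_pct, " ".join(fields[10:])
</code>

On the dual-core example above, such a script would be expected to report only the kernel{em0 rxq} thread.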

=== Drivers ===

Depending on the NIC driver, some additional counters are available through sysctl:

<code>
[root@BSDRP3]~# sysctl dev.em.0.mac_stats. | grep -v ': 0'
dev.em.0.mac_stats.missed_packets: 221189883
dev.em.0.mac_stats.recv_no_buff: 94987654
dev.em.0.mac_stats.total_pkts_recvd: 351270928
dev.em.0.mac_stats.good_pkts_recvd: 130081045
dev.em.0.mac_stats.bcast_pkts_recvd: 1
dev.em.0.mac_stats.rx_frames_64: 2
dev.em.0.mac_stats.rx_frames_65_127: 130081043
dev.em.0.mac_stats.good_octets_recvd: 14308901524
dev.em.0.mac_stats.good_octets_txd: 892
dev.em.0.mac_stats.total_pkts_txd: 10
dev.em.0.mac_stats.good_pkts_txd: 10
dev.em.0.mac_stats.bcast_pkts_txd: 2
dev.em.0.mac_stats.mcast_pkts_txd: 5
dev.em.0.mac_stats.tx_frames_64: 2
dev.em.0.mac_stats.tx_frames_65_127: 8
</code>

=> Notice the high values of missed_packets and recv_no_buff: the NIC is missing a large share of the incoming packets, which points to a performance limit of the NIC or of its driver (in this example, the packet generator sends packets at about 1.38 Mpps).
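
These counters add up: good_pkts_recvd + missed_packets = 130081045 + 221189883 = 351270928, exactly total_pkts_recvd, so the NIC missed about 63% of the frames that reached it. A sketch computing this ratio from saved sysctl output; the counter names are those printed by the em driver above and will differ for other drivers.

<code python>
def parse_sysctl(output: str) -> dict[str, int]:
    """Map the last component of each 'name: value' sysctl line to its integer value."""
    counters = {}
    for line in output.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            counters[name.rsplit(".", 1)[-1]] = int(value)
    return counters


def missed_ratio(counters: dict[str, int]) -> float:
    """Share of received frames missed by the NIC (em driver counter names)."""
    total = counters["total_pkts_recvd"]
    return counters["missed_packets"] / total if total else 0.0


SAMPLE = """\
dev.em.0.mac_stats.missed_packets: 221189883
dev.em.0.mac_stats.recv_no_buff: 94987654
dev.em.0.mac_stats.total_pkts_recvd: 351270928
dev.em.0.mac_stats.good_pkts_recvd: 130081045
"""

if __name__ == "__main__":
    stats = parse_sysctl(SAMPLE)
    consistent = stats["good_pkts_recvd"] + stats["missed_packets"] == stats["total_pkts_recvd"]
    print(f"good + missed == total: {consistent}")
    print(f"missed: {missed_ratio(stats):.1%}")
</code>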