This shows you the differences between two versions of the page.
|
documentation:technical_docs:performance [2012/04/30 15:28] |
documentation:technical_docs:performance [2013/05/17 14:13] (current) olivier [Guides] |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== FreeBSD forwarding Performance ====== | ||
| + | {{description>Tips and information about FreeBSD forwarding performance}} | ||
| + | There are lots of guides about [[http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel|tuning FreeBSD TCP performance]] (where the FreeBSD host is an end-point of the TCP session), but this is not the same as tuning forwarding performance (where the FreeBSD host does not have to read the TCP information of the packet being forwarded). | ||
| + | |||
| + | ===== Concept ===== | ||
| + | |||
| + | ==== How to bench a router ==== | ||
| + | |||
| + | Benchmarking a router is not a matter of measuring the maximum bandwidth crossing it. The reference methodology and terminology are: | ||
| + | * [[http://www.ietf.org/rfc/rfc2544.txt|RFC2544: Benchmarking Methodology for Network Interconnect Devices]] | ||
| + | * [[http://tools.ietf.org/html/rfc3222|RFC3222: Terminology for Forwarding Information Base (FIB) based Router Performance]] | ||
| + | |||
| + | ==== Definition ==== | ||
| + | |||
| + | A clear definition of the metrics involved, and of how they relate to each other, is mandatory: | ||
| + | * [[http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html|Bandwidth, Packets Per Second, and Other Network Performance Metrics]] | ||
| + | |||
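The relation between bandwidth and packets per second can be illustrated with a small calculation. This is only a sketch: it assumes classic Ethernet framing, where every frame costs 20 extra bytes on the wire (7 bytes of preamble, 1 byte of start-of-frame delimiter and 12 bytes of inter-frame gap), and that the frame size already includes the 4-byte FCS. The function name is made up for this example.

```python
# Maximum Ethernet frame rate for a given frame size.
# Assumption: 20 bytes of per-frame wire overhead
# (7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap),
# and frame_bytes already includes the 4-byte FCS.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps, frame_bytes):
    """Return the maximum whole number of frames per second on the link."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps // bits_per_frame


if __name__ == "__main__":
    for name, bps in (("1GigE", 10**9), ("10GigE", 10**10)):
        for size in (64, 512, 1518):
            print(f"{name} {size}B frames: {line_rate_pps(bps, size)} pps")
```

With 64-byte frames this gives about 1.49Mpps on a 1GigE link and about 14.88Mpps on a 10GigE link, which is why small-packet tests stress a router far more than large-packet bandwidth tests.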
| + | ===== Benchmarks ===== | ||
| + | |||
| + | ==== Cisco or Linux ==== | ||
| + | |||
| + | * [[http://www.cisco.com/web/partners/downloads/765/tools/quickreference/routerperformance.pdf|Routing performance of Cisco routers]] (PDF) | ||
| + | * [[http://www.telematica.polito.it/oldsite/courmayeur06/papers/06-A.2.1.pdf|RFC2544 Performance Evaluation for a Linux Based Open Router]] | ||
| + | * [[http://data.guug.de/slides/lk2008/10G_preso_lk2008.pdf|Towards 10Gb/s open-source routing]]: Includes a hardware comparison between a "real" router and a PC. | ||
| + | ==== FreeBSD ==== | ||
| + | |||
| + | Here are some benchmarks regarding the network forwarding performance of FreeBSD: | ||
| + | * [[http://lists.freebsd.org/pipermail/freebsd-net/2012-July/032832.html|FreeBSD as 10 Gigabit router-on-a-stick]]: About 1Mpps; this thread has lots of very useful tips. | ||
| + | * [[http://www.net.t-labs.tu-berlin.de/papers/SWF-PCCH10GEE-07.pdf|Packet capture in 10-Gigabit environments using Contemporary Commodity Hardware]] (PDF) | ||
| + | |||
| + | ===== Bench lab ===== | ||
| + | |||
| + | The bench lab should make it possible to measure the pps. To obtain accurate results, [[http://www.ietf.org/rfc/rfc2544.txt|RFC 2544 (Benchmarking Methodology for Network Interconnect Devices)]] is a good reference. | ||
| + | |||
| + | ==== Packet generator ==== | ||
| + | |||
| + | Packet generators: | ||
| + | * [[http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring|LAN Ethernet Maximum Rates, Generation, Capturing & Monitoring]] ... on GNU/Linux | ||
| + | * pkt-gen from the netmap suite | ||
| + | |||
| + | ===== Tuning FreeBSD ===== | ||
| + | |||
| + | ==== Guides ==== | ||
| + | |||
| + | Here is a list of sources about optimizing forwarding performance under FreeBSD. | ||
| + | |||
| + | How to bench or tune the network stack: | ||
| + | * [[http://wiki.freebsd.org/NetworkPerformanceTuning | FreeBSD Network Performance Tuning]]: What needs to be done to tune the networking stack | ||
| + | * [[https://calomel.org/network_performance.html | Calomel.org advice for 10Giga tuning]]: A simple and quick guide | ||
| + | * [[http://www.freebsd.org/projects/netperf/index.html|FreeBSD Network Performance Project (netperf)]] | ||
| + | * [[http://www.watson.org/~robert/freebsd/netperf/20051027-eurobsdcon2005-netperf.pdf|Introduction to Multithreading and Multiprocessing in the FreeBSD SMPng Network Stack]], EuroBSDCon 2005 (PDF) | ||
| + | * [[http://www.freebsd.org/cgi/man.cgi?query=tuning&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html|man tuning]] : performance tuning under FreeBSD | ||
| + | * [[http://wwwx.cs.unc.edu/~krishnan/classes/spring_07/os_impl/report.pdf|Improving Memory and Interrupt Processing in FreeBSD Network Stack]] (PDF) | ||
| + | * [[http://conferences.sigcomm.org/sigcomm/2009/workshops/presto/papers/p37.pdf|Optimizing the BSD Routing System for Parallel Processing]] (PDF) | ||
| + | * [[https://people.sunyit.edu/~sengupta/CSC521/systemperformance.ppt|Using netstat and vmstat for performance analysis]] (Powerpoint) | ||
| + | * [[http://www.freebsd.org/cgi/man.cgi?query=polling&sektion=4|polling man page]] (Warning: enabling polling is not a good idea with the new generation of Ethernet controllers, which include their own interrupt management) | ||
| + | * [[http://info.iet.unipi.it/~luigi/polling/|Device Polling support for FreeBSD ]], the original presentation of polling implementation | ||
| + | * [[http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html|Tuning Kernel Limits]] on the FreeBSD Handbook | ||
| + | |||
| + | FreeBSD Experimental high-performance network stacks: | ||
| + | * [[http://info.iet.unipi.it/~luigi/netmap/|Netmap - memory mapping of network devices]] //"(...)a single core running at 1.33GHz can generate the 14.8Mpps that saturate a 10GigE interface."// | ||
| + | |||
| + | ==== Where is the bottleneck ? ==== | ||
| + | |||
| + | Tools: | ||
| + | * [[http://www.freebsd.org/cgi/man.cgi?query=netstat|netstat]]: show network status | ||
| + | * [[http://www.freebsd.org/cgi/man.cgi?query=vmstat|vmstat]]: report virtual memory statistics | ||
| + | * [[http://www.freebsd.org/cgi/man.cgi?query=top|top]]: display and update information about the top cpu processes | ||
| + | |||
| + | |||
| + | === Packet traffic === | ||
| + | |||
| + | Display information about packet traffic, refreshed every second. | ||
| + | |||
| + | Here is a first example: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# netstat -i -h -w 1 | ||
| + | input (Total) output | ||
| + | packets errs idrops bytes packets errs bytes colls | ||
| + | 370k 0 0 38M 370k 0 38M 0 | ||
| + | 369k 0 0 38M 368k 0 38M 0 | ||
| + | 370k 0 0 38M 370k 0 38M 0 | ||
| + | 373k 0 0 38M 376k 0 38M 0 | ||
| + | 370k 0 0 38M 368k 0 38M 0 | ||
| + | 368k 0 0 38M 368k 0 38M 0 | ||
| + | 368k 0 0 38M 369k 0 38M 0 | ||
| + | </code> | ||
| + | |||
| + | => This system is forwarding 370Kpps (in and out) without any in/out errs (the packet generator was netblast, sending 64B packets at 370Kpps). | ||
| + | |||
| + | Here is a second example: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# netstat -ihw 1 | ||
| + | input (Total) output | ||
| + | packets errs idrops bytes packets errs bytes colls | ||
| + | 399k 915k 0 25M 395k 0 24M 0 | ||
| + | 398k 914k 0 24M 398k 0 24M 0 | ||
| + | 399k 915k 0 25M 399k 0 25M 0 | ||
| + | 398k 915k 0 24M 397k 0 24M 0 | ||
| + | 399k 914k 0 25M 398k 0 24M 0 | ||
| + | 398k 914k 0 24M 400k 0 25M 0 | ||
| + | 398k 915k 0 24M 396k 0 24M 0 | ||
| + | 400k 915k 0 25M 401k 0 25M 0 | ||
| + | 397k 914k 0 24M 397k 0 24M 0 | ||
| + | 398k 914k 0 24M 399k 0 25M 0 | ||
| + | 400k 914k 0 25M 401k 0 25M 0 | ||
| + | 398k 914k 0 24M 397k 0 24M 0 | ||
| + | </code> | ||
| + | |||
| + | => This system is forwarding about 400Kpps (in and out), but it's overloaded because it drops (errs) about 914Kpps (the generator used netmap pkt-gen with 64B packet size at a rate of 1.34Mpps). | ||
| + | |||
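The share of offered traffic that is lost can be estimated from one line of the `netstat -ihw 1` output. The following is a minimal sketch, not a FreeBSD tool: the helper names are hypothetical, it assumes the `-h` suffixes are decimal (k = 1000, M = 1000000), and it assumes that input packets plus input errs approximate the offered load.

```python
# Hypothetical helpers: estimate input loss from one `netstat -ihw 1` data line.
# Assumption: the -h suffixes are decimal (k = 1e3, M = 1e6, G = 1e9).
SUFFIXES = {"k": 10**3, "K": 10**3, "M": 10**6, "G": 10**9}


def parse_human(value):
    """Convert '399k' or '25M' to a number; plain digits pass through."""
    if value and value[-1] in SUFFIXES:
        return float(value[:-1]) * SUFFIXES[value[-1]]
    return float(value)


def input_loss_ratio(line):
    """Return errs / (packets + errs) for the input side of one data line."""
    fields = line.split()
    packets, errs = parse_human(fields[0]), parse_human(fields[1])
    offered = packets + errs
    return errs / offered if offered else 0.0


# First data line of the second example.
sample = "399k  915k     0        25M       395k     0        24M     0"
print(f"input loss: {input_loss_ratio(sample):.1%}")
```

For the second example this gives roughly 70% of the offered packets lost on input, while the first example gives 0%.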
| + | |||
| + | === Interrupt usage === | ||
| + | |||
| + | Report on the number of interrupts taken by each device since system startup. | ||
| + | |||
| + | Here is a first example: | ||
| + | <code> | ||
| + | [root@BSDRP3]~# vmstat -i | ||
| + | interrupt total rate | ||
| + | irq4: uart0 6670 5 | ||
| + | irq14: ata0 5 0 | ||
| + | irq16: bge0 27 0 | ||
| + | irq17: em0 bge1 5209668 4510 | ||
| + | cpu0:timer 1299291 1124 | ||
| + | irq256: ahci0 1172 1 | ||
| + | Total 6516833 5642 | ||
| + | </code> | ||
| + | |||
| + | => Notice that em0 and bge1 are sharing the same IRQ. That is not good news. | ||
| + | |||
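Shared IRQs can also be spotted automatically. The sketch below is only an illustration: the function name is hypothetical and it assumes the `vmstat -i` line layout of the first example, where an IRQ line lists the devices separated by spaces before the two numeric columns.

```python
import re

# Matches lines such as "irq17: em0 bge1    5209668   4510".
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for every IRQ line listing two or more devices."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if match:
            devices = match.group(2).split()
            if len(devices) > 1:
                shared[match.group(1)] = devices
    return shared


# Output of the first example.
sample = """\
interrupt                          total       rate
irq4: uart0                         6670          5
irq14: ata0                            5          0
irq16: bge0                           27          0
irq17: em0 bge1                  5209668       4510
cpu0:timer                       1299291       1124
irq256: ahci0                       1172          1
Total                            6516833       5642
"""
print(shared_irqs(sample))
```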
| + | Here is a second example: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]# vmstat -i | ||
| + | interrupt total rate | ||
| + | irq4: uart0 17869 0 | ||
| + | irq14: ata0 5 0 | ||
| + | irq16: bge0 1 0 | ||
| + | irq17: em0 bge1 2 0 | ||
| + | cpu0:timer 214331752 1125 | ||
| + | irq256: ahci0 1725 0 | ||
| + | Total 214351354 1126 | ||
| + | </code> | ||
| + | |||
| + | => Near-zero rates and counters for the NIC IRQs mean that polling is enabled. With current NICs this is unnecessary: their built-in interrupt management removes the need for polling. | ||
| + | |||
| + | === Memory Buffer === | ||
| + | |||
| + | Show statistics recorded by the memory management routines. The network manages a private pool of memory buffers. | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# netstat -m | ||
| + | 5220/810/6030 mbufs in use (current/cache/total) | ||
| + | 5219/675/5894/512000 mbuf clusters in use (current/cache/total/max) | ||
| + | 5219/669 mbuf+clusters out of packet secondary zone in use (current/cache) | ||
| + | 0/0/0/256000 4k (page size) jumbo clusters in use (current/cache/total/max) | ||
| + | 0/0/0/128000 9k jumbo clusters in use (current/cache/total/max) | ||
| + | 0/0/0/64000 16k jumbo clusters in use (current/cache/total/max) | ||
| + | 11743K/1552K/13295K bytes allocated to network (current/cache/total) | ||
| + | 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) | ||
| + | 0/0/0 requests for jumbo clusters denied (4k/9k/16k) | ||
| + | 0/0/0 sfbufs in use (current/peak/max) | ||
| + | 0 requests for sfbufs denied | ||
| + | 0 requests for sfbufs delayed | ||
| + | 0 requests for I/O initiated by sendfile | ||
| + | 0 calls to protocol drain routines | ||
| + | </code> | ||
| + | |||
| + | Or more verbose: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# vmstat -z | head -1 ; vmstat -z | grep -i mbuf | ||
| + | ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP | ||
| + | mbuf_packet: 256, 0, 5221, 667,414103198, 0, 0 | ||
| + | mbuf: 256, 0, 1, 141, 135, 0, 0 | ||
| + | mbuf_cluster: 2048, 512000, 5888, 6, 5888, 0, 0 | ||
| + | mbuf_jumbo_page: 4096, 256000, 0, 0, 0, 0, 0 | ||
| + | mbuf_jumbo_9k: 9216, 128000, 0, 0, 0, 0, 0 | ||
| + | mbuf_jumbo_16k: 16384, 64000, 0, 0, 0, 0, 0 | ||
| + | mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0, 0 | ||
| + | </code> | ||
| + | |||
| + | => No failed requests here: the "requests denied" counters and the FAIL column are all at 0. | ||
| + | |||
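Checking the FAIL column by eye gets tedious when there are many zones. Here is a small sketch that does it on captured `vmstat -z` output. The function name is hypothetical, the column order is taken from the header shown in the example (ITEM, SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP), and the failing line in the sample is invented purely to show what a hit looks like.

```python
def zones_with_failures(vmstat_z_output):
    """Return {zone: fail_count} for zones whose FAIL column is non-zero."""
    failing = {}
    for line in vmstat_z_output.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line: no "zone:" prefix
        columns = [c.strip() for c in rest.split(",")]
        # Columns after the zone name: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP
        if len(columns) >= 6 and columns[5].isdigit() and int(columns[5]) > 0:
            failing[name.strip()] = int(columns[5])
    return failing


sample = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256,      0,    5221,     667,414103198,   0,   0
mbuf:                   256,      0,       1,     141,     135,   0,   0
mbuf_cluster:          2048, 512000,    5888,       6,    5888,   0,   0
"""
# Invented line, not taken from a real system: a zone with 12 failed requests.
hypothetical = "mbuf_jumbo_9k:         9216, 128000,       0,       0,       0,  12,   0\n"

print(zones_with_failures(sample))
print(zones_with_failures(sample + hypothetical))
```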
| + | === CPU / NIC === | ||
| + | |||
| + | top can give very useful information regarding the CPU/NIC affinity: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# top -nCHSIzs1 | ||
| + | last pid: 1392; load averages: 0.15, 0.48, 0.33 up 0+00:22:06 15:44:26 | ||
| + | 75 processes: 2 running, 57 sleeping, 16 waiting | ||
| + | |||
| + | Mem: 10M Active, 8752K Inact, 79M Wired, 272K Cache, 17M Buf, 878M Free | ||
| + | Swap: | ||
| + | |||
| + | |||
| + | PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND | ||
| + | 0 root -92 0 0K 176K - 16:24 96.39% kernel{em0 taskq} | ||
| + | 11 root -92 - 0K 256K WAIT 1:01 5.76% intr{irq17: em0 bge1} | ||
| + | </code> | ||
| + | |||
| + | => Not a very interesting output: there is only one CPU here. | ||
| + | |||
| + | Here is another example on a dual-core computer: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP2]~# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf "%7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}' | ||
| + | STATE C TIME CPU COMMAND | ||
| + | CPU0 0 7:23 99.76% kernel{em0 rxq} | ||
| + | RUN 1 0:44 6.40% intr{irq260: bce1} | ||
| + | istorm 1 4:18 4.05% intr{irq256: em0:rx | ||
| + | RUN 1 0:04 0.68% intr{irq258: em0:link} | ||
| + | </code> | ||
| + | |||
| + | => em0 is under an interrupt storm, and consumes almost 100% of one CPU (CPU 0 in the C column). | ||
| + | |||
| + | === Drivers === | ||
| + | |||
| + | Depending on the NIC driver used, some counters are available: | ||
| + | |||
| + | <code> | ||
| + | [root@BSDRP3]~# sysctl dev.em.0.mac_stats. | grep -v ': 0' | ||
| + | dev.em.0.mac_stats.missed_packets: 221189883 | ||
| + | dev.em.0.mac_stats.recv_no_buff: 94987654 | ||
| + | dev.em.0.mac_stats.total_pkts_recvd: 351270928 | ||
| + | dev.em.0.mac_stats.good_pkts_recvd: 130081045 | ||
| + | dev.em.0.mac_stats.bcast_pkts_recvd: 1 | ||
| + | dev.em.0.mac_stats.rx_frames_64: 2 | ||
| + | dev.em.0.mac_stats.rx_frames_65_127: 130081043 | ||
| + | dev.em.0.mac_stats.good_octets_recvd: 14308901524 | ||
| + | dev.em.0.mac_stats.good_octets_txd: 892 | ||
| + | dev.em.0.mac_stats.total_pkts_txd: 10 | ||
| + | dev.em.0.mac_stats.good_pkts_txd: 10 | ||
| + | dev.em.0.mac_stats.bcast_pkts_txd: 2 | ||
| + | dev.em.0.mac_stats.mcast_pkts_txd: 5 | ||
| + | dev.em.0.mac_stats.tx_frames_64: 2 | ||
| + | dev.em.0.mac_stats.tx_frames_65_127: 8 | ||
| + | </code> | ||
| + | |||
| + | => Notice the high level of missed_packets and recv_no_buff. | ||
This points to a performance problem with the NIC or its driver (in this example, the packet generator sends packets at a rate of about 1.38Mpps).
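To put a number on "high level", the missed share can be computed from the counters. This is a minimal sketch with hypothetical helper names. It works on a copy of four of the counters shown above and assumes that total_pkts_recvd counts both good and missed packets, which the sample values are consistent with.

```python
def parse_sysctl(text):
    """Turn 'a.b.name: value' lines into {name: int}, keyed by the last component."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip().rsplit(".", 1)[-1]] = int(value)
    return stats


def missed_ratio(stats):
    """Share of received packets that the NIC reported as missed."""
    total = stats["total_pkts_recvd"]
    return stats["missed_packets"] / total if total else 0.0


# Counters copied from the sysctl output above.
sample = """\
dev.em.0.mac_stats.missed_packets: 221189883
dev.em.0.mac_stats.recv_no_buff: 94987654
dev.em.0.mac_stats.total_pkts_recvd: 351270928
dev.em.0.mac_stats.good_pkts_recvd: 130081045
"""
stats = parse_sysctl(sample)
print(f"missed: {missed_ratio(stats):.1%}")
```

For these counters roughly 63% of the packets reaching the NIC were missed.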