PC 1 and PC 2 are both old HP Compaq T5000 thin clients with BSDRP installed:
PC 1 (10.0.1.1) is used as the data sender, and PC 2 (10.0.1.2) as the data receiver.
BSDRP includes 3 network benchmark tools:
All tests are done with UDP, because the purpose is to have a simple packet generator.
First, we start iperf in IPv4 server mode on PC 2 (receiver):
[root@PC2]~# iperf -s -u
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 41.1 KByte (default)
------------------------------------------------------------
Then, from PC 1, we start sending data to PC 2. The test must be run at least 3 times:
[root@PC1]~# iperf -u -c 10.0.1.2 -f m -b 200M -t 60
------------------------------------------------------------
Client connecting to 10.0.1.2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.1 port 62471 connected with 10.0.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 505 MBytes 70.5 Mbits/sec
[ 3] Sent 359938 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 505 MBytes 70.5 Mbits/sec 0.197 ms 27/359937 (0.0075%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
[root@PC1]~# iperf -u -c 10.0.1.2 -f m -b 200M -t 60
------------------------------------------------------------
Client connecting to 10.0.1.2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.1 port 20006 connected with 10.0.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 505 MBytes 70.6 Mbits/sec
[ 3] Sent 360103 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 503 MBytes 70.4 Mbits/sec 0.225 ms 967/360102 (0.27%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
[root@PC1]~# iperf -u -c 10.0.1.2 -f m -b 200M -t 60
------------------------------------------------------------
Client connecting to 10.0.1.2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.1 port 48945 connected with 10.0.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 505 MBytes 70.6 Mbits/sec
[ 3] Sent 360306 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 505 MBytes 70.6 Mbits/sec 0.229 ms 24/360305 (0.0067%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
Iperf generates about 70 Mbit/s of IPv4 traffic using full-size datagrams (1470 bytes).
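These iperf figures can be cross-checked from the raw counters. The Python sketch below recomputes the first IPv4 run. It assumes a 60.0 s interval, that iperf's bandwidth counts only the 1470-byte UDP payload, and that "-f m" reports 10^6 bit/s while MBytes are 2^20 bytes; these are the choices that reproduce the printed values. The helper iperf_udp_summary is illustrative only, not part of iperf.

```python
def iperf_udp_summary(datagrams, payload_bytes, seconds, lost, total):
    """Recompute iperf-style UDP figures from the datagram counters."""
    payload_total = datagrams * payload_bytes
    mbit_per_s = payload_total * 8 / seconds / 1e6   # 10^6 bits per Mbit
    mbytes = payload_total / 2**20                   # 2^20 bytes per MByte
    loss_pct = 100.0 * lost / total                  # server-side loss
    return mbit_per_s, mbytes, loss_pct


# First IPv4 run: 359938 datagrams sent, server lost 27 of 359937.
rate, size, loss = iperf_udp_summary(359938, 1470, 60.0, 27, 359937)
print(f"{rate:.1f} Mbit/s, {size:.0f} MBytes, {loss:.4f}% lost")
# prints: 70.5 Mbit/s, 505 MBytes, 0.0075% lost
```

Feeding in the second run's counters (360103 datagrams, 967/360102 lost) gives the 0.27% loss shown above.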
During these iperf IPv4 tests, the system load was:
On PC 1:
load averages: 0.25, 0.51, 0.46 up 0+00:59:00 12:59:20
15 processes: 2 running, 12 sleeping
CPU: 3.5% user, 0.0% nice, 23.0% system, 68.5% interrupt, 5.1% idle
Mem: 4.3K Active, 5740K Inact, 17.96M Wired, 72.4 Cache, 18M Buf, 196M Free
On PC 2:
15 processes: 4 running, 11 sleeping
CPU: 0.4% user, 0.0% nice, 7.7% system, 80.7% interrupt, 11.2% idle
Mem: 8552K Active, 5852K Inact, 16M Wired, 764K Cache, 19M Buf, 196M Free
For the IPv6 tests, we first start iperf in IPv6 mode on PC 2 (receiver):
[root@PC2]~# iperf -s -u
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 41.1 KByte (default)
------------------------------------------------------------
Then, from PC 1, we start sending data to PC 2. The test must be run at least 3 times:
[root@PC1]~# iperf -V -u -c 2001:db8::2 -f m -b 200M -t 60
------------------------------------------------------------
Client connecting to 2001:db8::2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 2001:db8::1 port 26388 connected with 2001:db8::2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 372 MBytes 52.0 Mbits/sec
[ 3] Sent 265193 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 372 MBytes 52.0 Mbits/sec 0.340 ms 0/265192 (0%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
[root@PC1]~# iperf -V -u -c 2001:db8::2 -f m -b 200M -t 60
------------------------------------------------------------
Client connecting to 2001:db8::2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 2001:db8::1 port 33815 connected with 2001:db8::2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 371 MBytes 51.9 Mbits/sec
[ 3] Sent 264895 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 371 MBytes 51.9 Mbits/sec 0.337 ms 0/264894 (0%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
[root@PC1]~# iperf -V -u -c 2001:db8::2 -f m -b 500M -t 60
------------------------------------------------------------
Client connecting to 2001:db8::2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 0.01 MByte (default)
------------------------------------------------------------
[ 3] local 2001:db8::1 port 46141 connected with 2001:db8::2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-60.0 sec 371 MBytes 51.8 Mbits/sec
[ 3] Sent 264493 datagrams
[ 3] Server Report:
[ 3] 0.0-60.0 sec 370 MBytes 51.7 Mbits/sec 0.332 ms 532/264492 (0.2%)
[ 3] 0.0-60.0 sec 1 datagrams received out-of-order
During these buggy iperf IPv6 tests, the system load was:
On PC 1:
load averages: 0.45, 0.62, 0.47 up 0+00:55:09 12:55:29
14 processes: 2 running, 12 sleeping
CPU: 3.5% user, 0.0% nice, 39.3% system, 57.2% interrupt, 0.0% idle
Mem: 0.8K Active, 5744K Inact, 44.06M Wired, 55.3 Cache, 18M Buf, 196M Free
On PC 2:
last pid: 1188; load averages: 0.37, 0.41, 0.58 up 0+00:56:27 12:33:22
15 processes: 4 running, 11 sleeping
CPU: 3.1% user, 0.0% nice, 8.1% system, 69.0% interrupt, 19.8% idle
Mem: 8548K Active, 5856K Inact, 16M Wired, 764K Cache, 19M Buf, 196M Free
Now we test FreeBSD's netblast, using the same packet size as iperf (we are not measuring forwarding performance here, only bandwidth).
First, we will start netreceive on PC 2 (receiver, UDP port 9090) and monitor the network usage with systat:
[root@pc2]~# netreceive 9090 &
[root@pc2]~# systat -ifstat
:scale mbit
Then, from PC 1 (sender), we send data in 3 runs:
[root@PC1]~# netblast 10.0.1.2 9090 1470 60
start: 1325166025.031250619
finish: 1325166085.073804351
send calls: 403121
send errors: 0
send success: 403121
approx send rate: 6718
approx error rate: 0
approx throughput: 81 Mib/s
[root@PC1]~# netblast 10.0.1.2 9090 1470 60
start: 1325166135.210622858
finish: 1325166195.252919574
send calls: 402469
send errors: 0
send success: 402469
approx send rate: 6707
approx error rate: 0
approx throughput: 81 Mib/s
[root@PC1]~# netblast 10.0.1.2 9090 1470 60
start: 1325166224.939911636
finish: 1325166284.982166727
send calls: 402361
send errors: 0
send success: 402361
approx send rate: 6706
approx error rate: 0
approx throughput: 81 Mib/s
PC 2 receives the same throughput:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average ||||
Interface Traffic Peak Total
vr0 in 38.237 Mb/s 81.254 Mb/s 1.341 GB
out 0.000 Mb/s 0.000 Mb/s 1.685 MB
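netblast's summary lines can be cross-checked in the same way. The Python sketch below is a reconstruction inferred from the numbers on this page, not taken from netblast's source. Every run shown here matches if "approx send rate" is "send success" divided by the whole number of elapsed seconds, and "approx throughput" counts each datagram plus 8 bytes of UDP header, 20 bytes of IPv4 header (40 for IPv6) and 14 bytes of Ethernet header, in 10^6 bit/s despite the "Mib/s" label, both truncated to integers. The function netblast_summary is hypothetical.

```python
UDP_HDR = 8
ETH_HDR = 14                 # Ethernet header without FCS (assumed)
IP_HDR = {4: 20, 6: 40}      # IPv4 / IPv6 header sizes


def netblast_summary(success, start, finish, payload, ip_version=4):
    """Hypothetical reconstruction of netblast's 'approx' lines."""
    whole_seconds = int(finish - start)           # e.g. 60.04 s -> 60
    send_rate = success // whole_seconds          # packets per second, truncated
    frame = payload + UDP_HDR + IP_HDR[ip_version] + ETH_HDR
    throughput = send_rate * frame * 8 // 1_000_000   # 10^6 bit/s, truncated
    return send_rate, throughput


# First IPv4 run above: 403121 successful 1470-byte sends.
print(netblast_summary(403121, 1325166025.031250619, 1325166085.073804351, 1470))
# prints: (6718, 81)
```

Under this assumption the untruncated estimate is about 81.26 Mbit/s, close to the 81.254 Mb/s peak that systat reports on PC 2.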
The packet loss on PC 2 is quite low:
[root@PC2]~# netstat -ss
udp:
1610767 datagrams received
2 broadcast/multicast datagrams undelivered
18614 dropped due to full socket buffers
1592151 delivered
ip:
1610767 total packets received
1610767 packets for this host
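To put "quite low" into numbers, the Python sketch below computes the share of datagrams dropped for lack of socket-buffer space from the counters above. It also checks that the counters add up, which they do here: delivered + dropped + undelivered equals received. The helper udp_drop_pct is illustrative only.

```python
def udp_drop_pct(received, dropped_full):
    """Percentage of received UDP datagrams dropped on full socket buffers."""
    return 100.0 * dropped_full / received


# netstat -ss counters from PC 2 after the IPv4 netblast runs
received = 1610767
undelivered = 2        # broadcast/multicast datagrams undelivered
dropped = 18614        # dropped due to full socket buffers
delivered = 1592151

assert delivered + dropped + undelivered == received
print(f"{udp_drop_pct(received, dropped):.2f}% dropped")   # prints: 1.16% dropped
```

The IPv6 counters further down (32877 dropped out of 779940 received) give about 4.2%.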
During this IPv4 netblast test, the CPU usage of PC 1 was:
last pid: 1333; load averages: 0.76, 0.78, 0.46 up 0+01:48:28 13:48:48
8 processes: 2 running, 6 sleeping
CPU: 1.6% user, 0.0% nice, 14.4% system, 84.0% interrupt, 0.0% idle
Mem: 5892K Active, 5552K Inact, 16M Wired, 760K Cache, 19M Buf, 199M Free
And on PC 2:
last pid: 1287; load averages: 0.79, 0.46, 0.25 up 0+01:50:15 13:27:10
8 processes: 2 running, 6 sleeping
CPU: 0.4% user, 0.0% nice, 6.9% system, 89.3% interrupt, 3.4% idle
Mem: 6008K Active, 5856K Inact, 16M Wired, 764K Cache, 19M Buf, 199M Free
CPU interrupt usage is very high on both sides; these interrupts are generated by the NIC.
There is a difference of about 10 Mbit/s between IPv4 iperf (70 Mbit/s) and IPv4 netblast (81 Mbit/s)!
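Part of this gap may be accounting rather than speed: iperf's bandwidth counts UDP payload bits only, whereas netblast's throughput figure appears to include protocol headers (an inference from its counters, not from its source). Comparing packets per second sidesteps the question. The Python sketch below uses the first run of each tool; the 1512-byte frame size (1470 payload + 8 UDP + 20 IPv4 + 14 Ethernet) is an assumption.

```python
FRAME_BYTES = 1470 + 8 + 20 + 14   # payload + UDP + IPv4 + Ethernet (assumed)
SECONDS = 60

iperf_pps = 359938 / SECONDS       # "Sent 359938 datagrams" (first iperf run)
netblast_pps = 403121 / SECONDS    # "send success: 403121" (first netblast run)

for name, pps in (("iperf", iperf_pps), ("netblast", netblast_pps)):
    mbit = pps * FRAME_BYTES * 8 / 1e6   # same per-packet size for both tools
    print(f"{name:<8} {pps:5.0f} pps  {mbit:4.1f} Mbit/s including headers")

gain_pct = 100 * (netblast_pps / iperf_pps - 1)
print(f"netblast sends {gain_pct:.0f}% more packets per second")
```

If the header assumption holds, the like-for-like gap is closer to 9 Mbit/s (about 72.6 versus 81.3 Mbit/s), i.e. netblast sends roughly 12% more packets per second than iperf.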
For the IPv6 netblast test, we keep netreceive and systat running on PC 2, as in the IPv4 netblast test.
Then, from PC 1 (sender), we send data in 3 runs:
[root@PC1]/tmp# netblast 2001:db8::2 9090 1430 60
start: 1325175022.973543284
finish: 1325175083.016163506
send calls: 388622
send errors: 0
send success: 388622
approx send rate: 6477
approx error rate: 0
approx throughput: 77 Mib/s
[root@PC1]/tmp# netblast 2001:db8::2 9090 1430 60
start: 1325174850.578943412
finish: 1325174910.621620344
send calls: 389590
send errors: 0
send success: 389590
approx send rate: 6493
approx error rate: 0
approx throughput: 77 Mib/s
[root@PC1]/tmp# netblast 2001:db8::2 9090 1430 60
start: 1325174934.827957183
finish: 1325174994.869647677
send calls: 389118
send errors: 0
send success: 389118
approx send rate: 6485
approx error rate: 0
approx throughput: 77 Mib/s
PC 2 receives the same throughput:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average |||||||||||||
Interface Traffic Peak Total
vr0 in 77.460 Mb/s 77.507 Mb/s 2.969 GB
out 0.000 Mb/s 0.000 Mb/s 1.707 MB
The packet loss on PC 2 is quite low:
[root@PC2]~# netstat -ss -f inet6
udp:
779940 datagrams received
2 broadcast/multicast datagrams undelivered
32877 dropped due to full socket buffers
747061 delivered
ip6:
779953 total packets received
779938 packets for this host
14 packets sent from this host
Input histogram:
UDP: 779938
ICMP6: 15
Mbuf statistics:
0 one mbuf
779953 one ext mbuf
0 two or more ext mbuf
During this IPv6 netblast test, the CPU usage of PC 1 was:
last pid: 2031; load averages: 1.95, 0.97, 0.68 up 0+04:26:24 16:26:44
9 processes: 3 running, 6 sleeping
CPU: 2.3% user, 0.0% nice, 15.6% system, 82.1% interrupt, 0.0% idle
Mem: 6588K Active, 6156K Inact, 16M Wired, 304K Cache, 20M Buf, 198M Free
And on PC 2:
last pid: 2011; load averages: 0.58, 0.50, 0.53 up 0+04:27:10 16:04:05
10 processes: 2 running, 8 sleeping
CPU: 1.1% user, 0.0% nice, 8.0% system, 90.4% interrupt, 0.4% idle
Mem: 7092K Active, 6720K Inact, 16M Wired, 308K Cache, 20M Buf, 196M Free
Again, CPU interrupt usage is very high on both sides; these interrupts are generated by the NIC.
There is a 4 Mbit/s difference between IPv4 netblast (81 Mbit/s) and IPv6 netblast (77 Mbit/s); even with the same payload size for both (1430 bytes), there is still a 3 Mbit/s gap.
The netblast tests caused a very high CPU interrupt load. Here is how to reduce it:
On both PC 1 and PC 2, enable NIC polling:
Now, run the netblast benchmark again:
From PC 1, re-send data to PC 2:
[root@PC1]~# netblast 10.0.1.2 9090 1470 60
start: 1325178791.914166744
finish: 1325178851.956046648
send calls: 5166494
send errors: 4706483
send success: 460011
approx send rate: 7666
approx error rate: 0
approx throughput: 92 Mib/s
[root@PC1]~# netblast 2001:db8::2 9090 1430 60
start: 1325178962.979784581
finish: 1325179023.021798021
send calls: 2347583
send errors: 1880785
send success: 466798
approx send rate: 7779
approx error rate: 0
approx throughput: 92 Mib/s
With polling enabled, the maximum throughput increases to 92 Mbit/s, for both IPv4 and IPv6!
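Note that with polling, most send() calls fail: netblast reports millions of send errors, yet the number of successful sends still rises. The Python sketch below compares the successful send rate and the success ratio of the IPv4 runs with and without polling, using the counters printed above. It does not explain why the calls fail, and the helper send_summary is illustrative only.

```python
def send_summary(calls, errors, seconds=60):
    """Successful packets per second and share of send() calls that succeeded."""
    success = calls - errors
    return success // seconds, 100.0 * success / calls


runs = {
    "no polling": (403121, 0),        # first IPv4 run without polling
    "polling": (5166494, 4706483),    # IPv4 run with polling
}
for label, (calls, errors) in runs.items():
    pps, ok_pct = send_summary(calls, errors)
    print(f"{label:<10} {pps:5d} pps, {ok_pct:5.1f}% of send() calls succeeded")
```

That is about 14% more packets per second (7666 versus 6718), while roughly 10 calls fail for every one that succeeds.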
And the new CPU load on PC 1 (sender):
last pid: 1108; load averages: 0.59, 0.66, 0.42 up 0+00:12:17 17:18:42
7 processes: 2 running, 5 sleeping
CPU: 8.6% user, 0.0% nice, 77.4% system, 14.0% interrupt, 0.0% idle
Mem: 4924K Active, 3244K Inact, 15M Wired, 696K Cache, 16M Buf, 203M Free
At the same time, the incoming bandwidth measured on PC 2 (receiver):
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average ||||
Interface Traffic Peak Total
vr0 in 92.638 Mb/s 92.642 Mb/s 826.272 MB
out 0.000 Mb/s 0.000 Mb/s 0.518 KB
And the CPU usage on PC 2 (receiver):
last pid: 1112; load averages: 0.44, 0.53, 0.33 up 0+00:11:14 16:56:05
7 processes: 1 running, 6 sleeping
CPU: 0.8% user, 0.0% nice, 12.1% system, 19.1% interrupt, 68.1% idle
Mem: 5100K Active, 3748K Inact, 15M Wired, 696K Cache, 17M Buf, 202M Free
UDP statistics on PC 2:
udp:
1441998 datagrams received
1 broadcast/multicast datagram undelivered
4585 dropped due to full socket buffers
1437412 delivered
ip:
1379658 total packets received
1379656 packets for this host
2 packets not forwardable
The main goal of a packet generator is not to fill the bandwidth with big packets, but to generate a large number of packets per second (pps) with small packets. So we need a small packet size to reach a high pps rate.
But what is the maximum number of frames per second on a 100 Mbit/s Ethernet link?
The document "Bandwidth, Packets Per Second, and Other Network Performance Metrics" gives the answer in Table 1, "Maximum Frame Rate and Throughput Calculations For a 1-Gb/s Ethernet Link": the minimum physical frame size on the wire is 84 bytes (including the preamble and the inter-frame gap). From this we can calculate the maximum frame rate of a Fast Ethernet link:
$$\frac{100\,000\,000\ \text{b/s}}{84\ \text{B} \times 8\ \text{b/B}} \approx 148\,809\ \text{frames/s} \approx 149\ \text{kpps}$$
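The same arithmetic in Python, assuming the standard 64-byte minimum Ethernet frame (FCS included) plus 8 bytes of preamble/start-of-frame delimiter and a 12-byte inter-frame gap. The hypothetical helper max_pps also assumes that netblast's size argument is a UDP payload size (its usage message calls it payloadsize): a 64-byte payload then builds a frame larger than the 64-byte minimum, so the reachable maximum is lower than 149 kpps.

```python
LINK_BPS = 100_000_000   # Fast Ethernet line rate, bits per second
MIN_FRAME = 64           # minimum Ethernet frame, FCS included
PREAMBLE_SFD = 8         # preamble + start-of-frame delimiter
IFG = 12                 # inter-frame gap
ETH_OVERHEAD = 14 + 4    # Ethernet header + FCS
UDP_HDR = 8


def max_pps(udp_payload=None, ip_header=20):
    """Theoretical maximum frames/s; udp_payload=None means minimum-size frames."""
    if udp_payload is None:
        frame = MIN_FRAME
    else:
        # short frames are padded up to the 64-byte minimum
        frame = max(MIN_FRAME, udp_payload + UDP_HDR + ip_header + ETH_OVERHEAD)
    wire_bits = (frame + PREAMBLE_SFD + IFG) * 8
    return LINK_BPS // wire_bits


print(max_pps())          # 148809: minimum frames, 84 bytes on the wire
print(max_pps(64))        # 96153: 64-byte UDP payload over IPv4
print(max_pps(64, 40))    # 83333: 64-byte UDP payload over IPv6
```

Under these assumptions, minimum-size frames over IPv4 need a UDP payload of 18 bytes or less.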
Without polling, with a payload size of 64 bytes:
[root@PC1]~# netblast 10.0.1.2 9090 64 60
start: 1325177692.699528012
finish: 1325177752.741911611
send calls: 434693
send errors: 0
send success: 434693
approx send rate: 7244
approx error rate: 0
approx throughput: 6 Mib/s
[root@PC1]~# netblast 2001:db8::2 9090 64 60
start: 1325177922.574319641
finish: 1325177982.615896712
send calls: 412694
send errors: 0
send success: 412694
approx send rate: 6878
approx error rate: 0
approx throughput: 6 Mib/s
Without polling, this device is able to generate about 7.2 kpps of IPv4 and 6.9 kpps of IPv6.
We have a strange result with polling:
[root@PC1]~# netblast 10.0.1.2 9090 64 60
start: 1325179600.042469922
finish: 1325179660.084268251
send calls: 4190292
send errors: 0
send success: 4190292
approx send rate: 69838
approx error rate: 0
approx throughput: 59 Mib/s
[root@PC1]~# netblast 10.0.1.2 9090 64 60
start: 1325179714.014201525
finish: 1325179774.055861009
send calls: 4227953
send errors: 0
send success: 4227953
approx send rate: 70465
approx error rate: 0
approx throughput: 59 Mib/s
[root@PC1]~# netblast 2001:db8::2 9090 64 60
start: 1325180513.563255106
finish: 1325180573.604868216
send calls: 2977488
send errors: 0
send success: 2977488
approx send rate: 49624
approx error rate: 0
approx throughput: 50 Mib/s
[root@PC1]~# netblast 2001:db8::2 9090 64 60
start: 1325180599.589205331
finish: 1325180659.630854200
send calls: 3007401
send errors: 0
send success: 3007401
approx send rate: 50123
approx error rate: 0
approx throughput: 50 Mib/s
Throughput received on PC 2:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average |||||
Interface Traffic Peak Total
vr0 in 44.072 Mb/s 44.072 Mb/s 749.572 MB
out 0.000 Mb/s 0.000 Mb/s 1.754 KB
With polling, this device is able to generate about 70 kpps of IPv4 and 50 kpps of IPv6: almost 10 times better for IPv4, and about 7 times better for IPv6, than without polling!
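The polling gain, and the share of the theoretical 100 Mbit/s maximum, can be computed from netblast's approx send rate values above. A Python sketch: 148809 frames/s is the minimum-size-frame bound calculated earlier, so with 64-byte payloads the real ceiling is somewhat lower and these percentages are conservative.

```python
LINE_RATE_PPS = 148_809   # 100 Mbit/s bound for minimum-size frames

# netblast "approx send rate" with 64-byte payloads (first run of each case)
without_polling = {"IPv4": 7244, "IPv6": 6878}
with_polling = {"IPv4": 69838, "IPv6": 49624}

for proto in ("IPv4", "IPv6"):
    gain = with_polling[proto] / without_polling[proto]
    share = 100 * with_polling[proto] / LINE_RATE_PPS
    print(f"{proto}: x{gain:.1f} with polling, {share:.0f}% of the line rate")
```

This gives roughly x9.6 for IPv4 and x7.2 for IPv6, reaching about 47% and 33% of the minimum-frame line rate.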
For reference, the UDP statistics on PC 2:
[root@PC2]~# netstat -ss
udp:
19145968 datagrams received
8 broadcast/multicast datagrams undelivered
2951504 dropped due to full socket buffers
16194456 delivered
ip:
10732239 total packets received
10732231 packets for this host
ip6:
8413759 total packets received
8413739 packets for this host
26 packets sent from this host
Now that we have a packet generator, we can build the full lab using a PC Engines WRAP 1e203 as the router.
We begin by testing the maximum throughput and pps that the WRAP itself can generate, with the same test as before, between the WRAP and PC 2.
Max throughput:
[root@wrap]/# netblast 10.0.2.2 9090 1470 60
start: 946717690.521191814
finish: 946717750.558459518
send calls: 147019
send errors: 0
send success: 147019
approx send rate: 2450
approx error rate: 0
approx throughput: 29 Mib/s
Max PPS:
[root@wrap]/# netblast 10.0.2.2 9090 64 60
start: 946718035.917633110
finish: 946718095.954543332
send calls: 248378
send errors: 0
send success: 248378
approx send rate: 4139
approx error rate: 0
approx throughput: 3 Mib/s
⇒ Without polling, only about 4 kpps.
Max throughput:
[root@wrap]~# netblast 10.0.2.2 9090 1470 60
start: 947743395.648211295
finish: 947743455.685888369
send calls: 71946
send errors: 0
send success: 71946
approx send rate: 1199
approx error rate: 0
approx throughput: 14 Mib/s
Note: the throughput is lower with polling enabled!
Max PPS:
[root@wrap]~# netblast 10.0.2.2 9090 64 60
start: 947743786.379087368
finish: 947743846.417400442
send calls: 117315
send errors: 0
send success: 117315
approx send rate: 1955
approx error rate: 0
approx throughput: 1 Mib/s
⇒ With polling enabled, it generates only about 2 kpps.
Enabling polling on the WRAP is not a good idea when it is used as an end-point packet generator.
Now PC 1 sends its maximum pps to PC 2, through the WRAP.
From PC1:
[root@PC1]~# netblast 10.0.2.2 9090 64 60
start: 1326210268.937558253
finish: 1326210328.979533141
send calls: 4053658
send errors: 0
send success: 4053658
approx send rate: 67560
approx error rate: 0
approx throughput: 57 Mib/s
The WRAP begins to fill its log with:
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
Then it crashes:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc07533f7
stack pointer = 0x28:0xc7eb3af0
frame pointer = 0x28:0xc7eb3b1c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq10: sis0)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 8m5s
Cannot dump. Device not defined or unavailable