====== Dropping packets at high rate ======
===== Objective =====
{{:documentation:examples:labs.examples.ddos.png}}
===== Using IPFW =====
==== Standard IP level configuration ====
The IPFW configuration script in standard mode does the following:
- First rule denies UDP packets whose source address is in the blacklist table
- Second rule allows everything else
- Disable the outgoing [[https://www.freebsd.org/cgi/man.cgi?query=pfil&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html|pfil(9)]] hook at IP level because we don't need to filter outgoing traffic in this case
#!/bin/sh
set -eu
fwcmd="/sbin/ipfw"
${fwcmd} -f flush
${fwcmd} table blacklist destroy || true
${fwcmd} table blacklist create type addr
${fwcmd} table blacklist add 198.18.2.0/24
${fwcmd} add deny udp from table\(blacklist\) to any
${fwcmd} add pass ip from any to any
pfilctl unlink -o ipfw:default inet || true
pfilctl unlink -o ipfw:default6 inet6 || true
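When the blacklist holds more than one prefix, the "table blacklist add" line can be generated from a list. The following is a minimal sketch and not part of the original setup: the function name blacklist_cmds and the idea of feeding prefixes on standard input are assumptions made for illustration, and the function only prints the commands (dry run) instead of executing them.

```sh
#!/bin/sh
# Hypothetical helper: read one prefix per line on stdin, skip blank lines
# and '#' comments, and print the matching "ipfw table ... add" commands.
fwcmd="/sbin/ipfw"

blacklist_cmds() {
    while read -r prefix rest; do
        case "${prefix}" in
            ''|'#'*) continue ;;
        esac
        echo "${fwcmd} table blacklist add ${prefix}"
    done
}

# Example input: a comment line and the prefix used on this page.
blacklist_cmds <<EOF
# prefixes to drop
198.18.2.0/24
EOF
```

Once the printed commands look right, the output can be piped to sh to apply them.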
==== NIC level configuration ====
Currently the [[https://svnweb.freebsd.org/changeset/base/343631|Pfil Memory Pointer Hooks]] feature is supported by [[https://svnweb.freebsd.org/changeset/base/346632|iflib]], [[https://svnweb.freebsd.org/changeset/base/356613|vtnet]], [[https://svnweb.freebsd.org/changeset/base/346247|Mellanox]] and [[https://svnweb.freebsd.org/changeset/base/357483|Chelsio]] drivers.
The IPFW configuration script at NIC level does the following:
- First rule denies UDP packets whose source address is in the blacklist table
- Second rule allows everything else
- Enable pfil(9) at NIC level (input only)
- Remove pfil(9) from the IP level (input and output)
#!/bin/sh
set -eu
fwcmd="/sbin/ipfw"
${fwcmd} -f flush
${fwcmd} table blacklist destroy || true
${fwcmd} table blacklist create type addr
${fwcmd} table blacklist add 198.18.2.0/24
${fwcmd} add deny udp from table\(blacklist\) to any
${fwcmd} add pass ip from any to any
if pfilctl link -i ipfw:default-link cxl0; then
pfilctl unlink -i ipfw:default inet || true
pfilctl unlink -o ipfw:default inet || true
pfilctl unlink -i ipfw:default6 inet6 || true
pfilctl unlink -o ipfw:default6 inet6 || true
fi
==== Performance benches ====
Hardware:
* Intel Xeon CPU E5-2697A v4 @ 2.60GHz (16 cores, 32 threads)
* Input NIC (filtering): Chelsio T580-LP-CR (QSFP+ 40GBASE-SR4)
* Output NIC: Mellanox ConnectX-4 MCX416A-CCAT (QSFP28 100GBASE-SR4)
* FreeBSD 13.0-CURRENT r357572
Rate of legitimate IPv4 packets forwarded (in packets per second) while dropping 42Mpps of denied packets, for each configuration:
x ipfw-standard
+ ipfw-at-nic-level
+--------------------------------------------------------------------------+
| +|
| +|
| +|
| x +|
|xx xx +|
||MA| |
| A|
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 9902014.5 10004945 9940500.5 9949589.9 41823.097
+ 5 12013847 12018302 12015790 12015667 1720.41
Difference at 95.0% confidence
2.06608e+06 +/- 43167.6
20.7655% +/- 0.523817%
(Student's t, pooled s = 29598.4)
Out of the 14Mpps of legitimate traffic, this generic software firewall (i.e. supported by multiple drivers) is still able to forward 12Mpps at NIC level while dropping 42Mpps of denied packets.
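The relative difference reported by ministat can be re-derived from the two averages. This is only a sanity-check sketch: the function name gain_pct is an assumption, and the two values are copied from the Avg column of the benchmark output.

```sh
#!/bin/sh
# Averages copied from the ministat output (packets per second).
std_avg=9949589.9
nic_avg=12015667

# Relative gain of the second value over the first, in percent.
gain_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

gain_pct "${std_avg}" "${nic_avg}"
```

The result agrees with the "Difference at 95.0% confidence" line, rounded to two decimals.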
===== Using Chelsio's TCAM firewall =====
Chelsio NICs include a hardware firewall that is configured with cxgbetool(8). The [[https://service.chelsio.com/beta/drivers/ChelsioUwire-3.1.0.0/Chelsio-UnifiedWire-Linux-UserGuide.pdf|Linux user guide]] gives a lot more detail than the [[https://service.chelsio.com/beta/drivers/ChelsioUwire-FBSD-3.3.0.1/Chelsio-UnifiedWire-FreeBSD-UserGuide.pdf|FreeBSD user guide]].
A Chelsio NIC is identified by its family name and id, plus the port id (on a 4-port card, ports 0 to 3).
Example with a single Chelsio NIC (t5nex0) with 2 ports (0 and 1):
# grep t.nex /var/run/dmesg.boot
t5nex0: mem 0xf9300000-0xf937ffff,0xf8000000-0xf8ffffff,0xf9984000-0xf9985fff irq 40 at device 0.4 on pci4
cxl0: on t5nex0
cxl1: on t5nex0
t5nex0: PCIe gen3 x8, 2 ports, 66 MSI-X interrupts, 130 eq, 65 iq
Translating the firewall rule for the Chelsio:
* Add a filter to drop packets incoming from Chelsio NIC 0 (t5nex0) port 0 matching source IP range 198.18.2.0/24
# cxgbetool t5nex0 filter 0 iport 0 sip 198.18.2.0/24 action drop
# cxgbetool t5nex0 filter list
Idx Hits FCoE Port vld:VLAN Prot MPS Frag DIP SIP DPORT SPORT Action
0 0 0/0 0/7 0:0000/0:0000 00/00 0/0 0/0 00000000/00000000 c6120200/ffffff00 0000/0000 0000/0000 Drop
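The DIP and SIP columns are printed as hexadecimal address/mask pairs. To check that the filter matches the intended range, they can be converted back to dotted-quad notation. A minimal sketch, where the function name hex_to_ipv4 is an assumption and the input is expected to be exactly 8 hexadecimal digits:

```sh
#!/bin/sh
# Convert an 8-digit hexadecimal IPv4 value (as printed by "filter list")
# to dotted-quad notation.
hex_to_ipv4() {
    hex=$1
    o1=$(printf '%s' "${hex}" | cut -c1-2)
    o2=$(printf '%s' "${hex}" | cut -c3-4)
    o3=$(printf '%s' "${hex}" | cut -c5-6)
    o4=$(printf '%s' "${hex}" | cut -c7-8)
    printf '%d.%d.%d.%d\n' "0x${o1}" "0x${o2}" "0x${o3}" "0x${o4}"
}

hex_to_ipv4 c6120200   # address part of the SIP column
hex_to_ipv4 ffffff00   # mask part of the SIP column
```

For the filter above this gives 198.18.2.0 with mask 255.255.255.0, which is the 198.18.2.0/24 range passed to cxgbetool.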
To check the packet drop rate, this small script will be used:
#!/bin/sh
set -euf -o pipefail
if [ $# -eq 0 ]; then
echo "Need Chelsio nexus name (example: t5nex0)"
echo "List of Nexus detected:"
grep t.nex /var/run/dmesg.boot || true
exit 1
fi
VALUE=$(cxgbetool $1 filter list | awk '{if (NR!=1) {print $2}}')
echo "Filter hit rate"
while true; do
sleep 1
NEW_VALUE=$(cxgbetool $1 filter list | awk '{if (NR!=1) {print $2}}')
RATE=$((NEW_VALUE - VALUE))
VALUE=${NEW_VALUE}
echo ${RATE}
done
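The script above reads the Hits column of every listed filter, so it only works while a single filter is installed: with several filters, VALUE would hold several numbers and the arithmetic would fail. A possible variant, sketched here with a hypothetical helper named filter_hits, selects the counter of one filter index:

```sh
#!/bin/sh
# Print the Hits column for one filter index, reading "filter list" output
# on stdin. The header line (NR == 1) is skipped.
filter_hits() {
    awk -v idx="$1" 'NR > 1 && $1 == idx { print $2 }'
}

# Sample taken from the listing shown earlier on this page.
filter_hits 0 <<'EOF'
Idx  Hits  FCoE  Port  vld:VLAN  Prot  MPS  Frag  DIP  SIP  DPORT  SPORT  Action
0  0  0/0  0/7  0:0000/0:0000  00/00  0/0  0/0  00000000/00000000  c6120200/ffffff00  0000/0000  0000/0000  Drop
EOF
```

On a live system the input would come from a pipe such as: cxgbetool t5nex0 filter list | filter_hits 0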
Output of cxgbe-filter-rate.sh during the 42Mpps DDoS:
# /tmp/cxgbe-filter-rate.sh t5nex0
32361520
32365722
32368802
32492850
32494303
32434792
32398556
The script reports a hardware drop rate of 32Mpps: where are the other 10Mpps?
Let's read the [[https://cgit.freebsd.org/src/tree/sys/dev/cxgbe/firmware/t5fw_cfg_hashfilter.txt|Chelsio default firmware configuration file of our T5 family NIC]]:
# TCAM has 8K cells; each region must start at a multiple of 128 cell.
# Each entry in these categories takes 4 cells each. nhash will use the
# TCAM iff there is room left (that is, the rest don't add up to 2048).
nroute = 32
nclip = 32
nfilter = 1008
nserver = 512
nhash = 524288
And we can display the values currently applied:
# sysctl -n dev.t5nex.0.misc.devlog | grep -w le
12 576921 INFO RES le configuration: nentries 2048 route 32 clip 32 filter 1440 server 416 active 128 hash 0 nserversram 0
16 619796 INFO RES le initialization: nentries 2048 route 32 clip 32 filter 1440 server 416 active 128 hash 0 nserversram 0
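In the log line above, the route, clip, filter, server and active fields add up to nentries (32 + 32 + 1440 + 416 + 128 = 2048). The sketch below automates that check. The function name check_le_regions is an assumption, and so is the idea that these five fields always partition the TCAM: it is only an observation from the output on this page.

```sh
#!/bin/sh
# Read devlog lines on stdin; for each "le configuration" line, add up the
# route, clip, filter, server and active fields and print them next to
# nentries for comparison.
check_le_regions() {
    awk '/le configuration:/ {
        sum = 0; total = 0
        for (i = 1; i < NF; i++) {
            if ($i == "nentries") total = $(i + 1)
            if ($i == "route" || $i == "clip" || $i == "filter" ||
                $i == "server" || $i == "active") sum += $(i + 1)
        }
        printf "nentries=%d regions=%d\n", total, sum
    }'
}

check_le_regions <<'EOF'
12 576921 INFO RES le configuration: nentries 2048 route 32 clip 32 filter 1440 server 416 active 128 hash 0 nserversram 0
EOF
```

On a live system the input would be: sysctl -n dev.t5nex.0.misc.devlog | check_le_regions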
To improve TCAM performance for filtering, all unused "regions" will be disabled to keep only route and filter (32 entries for route + 2016 for filter = 2048 total).
For that, we need to download a [[https://cgit.freebsd.org/src/tree/sys/dev/cxgbe/firmware/t5fw_cfg_hashfilter.txt|default TCAM firmware configuration file for our T5 NIC]], modify its parameters, load the modified configuration into the NIC flash, and instruct the NIC to use the file from its flash.
# fetch -o /etc/t5fw.txt https://cgit.freebsd.org/src/plain/sys/dev/cxgbe/firmware/t5fw_cfg_hashfilter.txt
# sed -i "" -e "s/nclip.*/nclip = 0/" /etc/t5fw.txt
# sed -i "" -e "s/nfilter.*/nfilter = 2016/" /etc/t5fw.txt
# sed -i "" -e "s/nserver.*/nserver = 0/" /etc/t5fw.txt
# sed -i "" -e "s/nhash.*/nhash = 0/" /etc/t5fw.txt
# echo 'hw.cxgbe.config_file="flash"' >> /boot/loader.conf.local
# cxgbetool t5nex0 loadcfg /etc/t5fw.txt
# reboot
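The sed patterns above are not anchored, so they also rewrite any other text containing these words, for example the "nhash" mention in the comment of the configuration file (harmless there, since it is a comment). A stricter variant is sketched below. It assumes each parameter sits on its own line in the form "name = value" with optional leading whitespace, and the function name tune_tcam_config is hypothetical. It filters stdin to stdout instead of editing in place, so the result can be reviewed before use.

```sh
#!/bin/sh
# Rewrite only real "name = value" assignment lines, keeping indentation.
tune_tcam_config() {
    sed -e 's/^\([[:space:]]*\)nclip[[:space:]]*=.*/\1nclip = 0/' \
        -e 's/^\([[:space:]]*\)nfilter[[:space:]]*=.*/\1nfilter = 2016/' \
        -e 's/^\([[:space:]]*\)nserver[[:space:]]*=.*/\1nserver = 0/' \
        -e 's/^\([[:space:]]*\)nhash[[:space:]]*=.*/\1nhash = 0/'
}

# Example on the excerpt quoted earlier on this page.
tune_tcam_config <<'EOF'
# TCAM iff there is room left (that is, the rest don't add up to 2048).
nroute = 32
nclip = 32
nfilter = 1008
nserver = 512
nhash = 524288
EOF
```

To use it on the downloaded file, redirect the file into the function and write the result to a new file, then pass that file to "cxgbetool t5nex0 loadcfg".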
Check the newly tuned parameters:
# sysctl -n dev.t5nex.0.misc.devlog | grep -w le
12 690716 INFO RES le configuration: nentries 2048 route 32 clip 32 filter 1024 server 0 active 960 hash 0 nserversram 0
This confirms that the server and hash regions are at 0 (disabled). Notice that the clip region is not disabled and the filter region does not have the size we requested, but the expected filtering performance is reached.
Now the packet drop rate of the TCAM firewall matches the generator's 42Mpps:
# /tmp/cxgbe-filter-rate.sh t5nex0
42218564
42147194
42229442
42171263
42210128
42165602
42165483
42223090
And the firewall is now able to forward all packets **without being too busy** at the same time:
[root@firewall]~# netstat -ihw 1
input (Total) output
packets errs idrops bytes packets errs bytes colls
46M 0 5.0k 2.7G 14M 0 801M 0
46M 0 2.6k 2.7G 14M 0 801M 0
46M 0 2.1k 2.8G 14M 0 801M 0
45M 0 515 2.7G 14M 0 801M 0
46M 0 1.7k 2.7G 14M 0 801M 0
46M 0 422 2.7G 14M 0 801M 0
46M 0 3.9k 2.7G 14M 0 801M 0
[root@firewall]~# nstat -I cxl0
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
45.60 0.00 23.35 0.00 1553 1 67.15 393 3168947 1728658 123.75
45.56 0.00 23.33 0.00 4497 1 66.87 398 3171213 1729358 123.75
45.53 0.00 23.31 0.00 12418 1 66.76 372 3182570 1734057 123.75
45.52 0.00 23.31 0.00 9215 1 65.43 371 3183916 1735372 123.75