pfSense KVM with Proxmox problem: pings go out, but other traffic doesn't? [Solved]
-
I have a Proxmox cluster of machines installed and want to set up pfSense 2.2 as a firewall/router to control access and allow VPN L2TP/IPsec connections to be made to the VM in the cluster. Here's my setup:
I have battled with the L2TP setup extensively. Once the link is up, I can ping all the hosts on the Cluster LAN, but not SSH, HTTP or anything else for that matter.
I then realised that if I attempt to access the internet from any of the hosts on the LAN, I can ping anything on the internet, but also not use other services like SSH for example.
To be more specific: host S1 has the following network config (vmbr0 is LAN in my diagram above and vmbr1 is WAN):
# network interface settings
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto eth1
iface eth1 inet manual

auto eth3
iface eth3 inet manual

iface eth2 inet manual

auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer2

auto vmbr0
iface vmbr0 inet static
    address 192.168.121.33
    netmask 255.255.255.0
    dns-nameserver 192.168.121.1 8.8.8.8
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    gateway 192.168.121.1

auto vmbr1
iface vmbr1 inet manual
    bridge_ports eth3
    bridge_stp off
    bridge_fd 0
eth3 does not have an IP address on the host; only the WAN port of the pfSense VM has one, via vmbr1. No traffic can enter or leave the network except through the pfSense WAN port.
The routes on S1:
~# ip route
192.168.121.0/24 dev vmbr0  proto kernel  scope link  src 192.168.121.33
default via 192.168.121.1 dev vmbr0
~# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:21:28:8e:b6:52 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:21:28:8e:b6:52 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:21:28:8e:b6:54 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:21:28:8e:b6:55 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:21:28:8e:b6:52 brd ff:ff:ff:ff:ff:ff
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:21:28:8e:b6:52 brd ff:ff:ff:ff:ff:ff
    inet 192.168.121.33/24 brd 192.168.121.255 scope global vmbr0
    inet6 fe80::221:28ff:fe8e:b652/64 scope link
       valid_lft forever preferred_lft forever
8: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:21:28:8e:b6:55 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::221:28ff:fe8e:b655/64 scope link
       valid_lft forever preferred_lft forever
9: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/void
    inet6 fe80::1/128 scope link
       valid_lft forever preferred_lft forever
10: tap103i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether a6:54:df:4a:c4:39 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a454:dfff:fe4a:c439/64 scope link
       valid_lft forever preferred_lft forever
11: tap103i1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 1a:96:aa:1e:32:8f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1896:aaff:fe1e:328f/64 scope link
       valid_lft forever preferred_lft forever
12: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 4e:15:b3:15:b1:4b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::4c15:b3ff:fe15:b14b/64 scope link
       valid_lft forever preferred_lft forever
13: tap105i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    link/ether 0e:5c:b9:c1:e4:17 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c5c:b9ff:fec1:e417/64 scope link
       valid_lft forever preferred_lft forever
Just for clarity: I use bonded LACP ethernet between my hosts because I'm also using Ceph to pool the hard disks into a cluster. That's where the bonded ethernet fits in.
My question is: what is wrong with this setup, and what should I add to pfSense to make it work?
~# traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  pfSense.aaaa.com (192.168.121.1)  0.251 ms  0.232 ms  0.231 ms
 2  xx.yy.zz.129 (xx.yy.zz.129)  1.020 ms  1.039 ms  1.045 ms
 3  xx.qq.ww.34 (xx.qq.ww.34)  16.400 ms  16.437 ms  16.411 ms
 4  google.jb1.napafrica.net (196.46.25.166)  16.566 ms  16.576 ms  16.579 ms
 5  72.14.239.35 (72.14.239.35)  16.950 ms  72.14.239.117 (72.14.239.117)  16.922 ms  72.14.239.35 (72.14.239.35)  17.106 ms
 6  * * *
 7  * * *
However, the following just sits there:
~# wget http://www.google.com
--2015-02-17 00:11:50--  http://www.google.com/
Resolving www.google.com (www.google.com)... 216.58.223.36, 2c0f:fb50:4002:803::2004
Connecting to www.google.com (www.google.com)|216.58.223.36|:80... connected.
HTTP request sent, awaiting response...
pfSense has the following routing table:
| IPv4 |
| Destination | Gateway | Flags | Use | MTU | NetIF |
| default | 41.71.68.129 | UGS | 20076 | 1366 | vtnet1 |
| 41.71.68.128/29 | link#2 | U | 30020 | 1366 | vtnet1 |
| 41.71.68.130 | link#2 | UHS | 8 | 16384 | lo0 |
| 127.0.0.1 | link#5 | UH | 110 | 16384 | lo0 |
| 192.168.121.0/24 | link#1 | U | 9665 | 1500 | vtnet0 |
| 197.242.201.218 | 41.71.68.129 | UGHS | 0 | 1366 | vtnet1 |
-
You are running into this problem: https://forum.pfsense.org/index.php?topic=88467.0
Disable tx offloading on the pfSense interfaces on the hypervisor side and you're good to go. If you want to overdo it, disable tx offloading on the whole bridge, or on all bridges.
If your bridge were to be called lan-br, you would run: sudo ethtool -K lan-br tx off
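A minimal sketch of that, assuming the vmbr0 bridge name from the config above (an assumption; substitute your own). The ethtool call only fires when an interface name is passed in, since it needs root and a real interface:

```shell
#!/bin/sh
# Sketch: disable TX checksum offloading on a bridge and verify it stuck.
# The bridge name (e.g. vmbr0) is taken from this thread; adjust as needed.
set -eu

offload_cmd() {
    # Build the ethtool command line for a given interface name.
    printf 'ethtool -K %s tx off' "$1"
}

if [ "$#" -ge 1 ]; then
    # Apply the change, then confirm it. Needs root.
    sh -c "$(offload_cmd "$1")"
    ethtool -k "$1" | grep 'tx-checksumming'   # should now report: off
fi
```

Run as `sh disable-tx.sh vmbr0`. Note the change does not survive a reboot, so it needs to be made persistent somewhere like /etc/network/interfaces.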
-
Spot on, John!
I had to read up on how to make the ethtool change permanent. On Debian/Ubuntu, in /etc/network/interfaces, simply add a post-up line:

iface vmbr0 inet static
    address 192.168.121.33
    netmask 255.255.255.0
    gateway 192.168.121.1
    post-up /sbin/ethtool -K $IFACE tx off

Other distros will have similar mechanisms.
Also: pfSense has an option to turn "Hardware Checksum Offloading" off. Check it under System: Advanced: Networking.
Thank you!
-
Tiny remark on that: if you disable tx offloading on the complete bridge, checksums will be recalculated for all traffic. That means the CPU will need to compute a checksum for every single packet passing that bridge. Depending on the amount of traffic, that can be quite a heavy performance impact.
Because the problem lies within FreeBSD and its netfront handling, in practice, disabling tx offloading only on the hypervisor-side LAN interface for pfSense is enough.
That means that if you can add an ethtool line somewhere after the pfSense VM is started, to disable tx offloading for just that single pfSense LAN interface, it may save CPU cycles for traffic on that bridge that flows between VMs and never touches pfSense.
-
@johnkeates:
Tiny remark on that: if you disable tx offloading on the complete bridge, checksums will be recalculated for all traffic. That means the CPU will need to compute a checksum for every single packet passing that bridge. Depending on the amount of traffic, that can be quite a heavy performance impact.
Because the problem lies within FreeBSD and its netfront handling, in practice, disabling tx offloading only on the hypervisor-side LAN interface for pfSense is enough.
That means that if you can add an ethtool line somewhere after the pfSense VM is started, to disable tx offloading for just that single pfSense LAN interface, it may save CPU cycles for traffic on that bridge that flows between VMs and never touches pfSense.

Is it not enough to just uncheck the box in the pfSense VM for tx offloading?
-
No, it is not. The problem isn't with pfSense, but with packets from the other VMs not having correct checksums and getting dropped inside the netfront driver on BSD.
Packets for pfSense need to have a correct checksum before they reach any pfSense virtual interface.
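One way to see those bad checksums is with tcpdump on the hypervisor. This is a diagnostic sketch, not from the thread; the bridge name (vmbr0) and the 20-packet sample size are assumptions. With tx offloading enabled, `tcpdump -vv` marks locally generated TCP checksums as incorrect, because filling them in was deferred to "hardware" that, on a virtual NIC, never does it:

```shell
#!/bin/sh
# Sketch: look for TCP packets with bad checksums on a bridge.
# Pass the bridge name (e.g. vmbr0) as the first argument; needs root.
set -eu

capture_cmd() {
    # Build a tcpdump command that prints checksum verdicts (-vv)
    # for the first 20 TCP packets on the given interface.
    printf 'tcpdump -i %s -nn -vv -c 20 tcp' "$1"
}

if [ "$#" -ge 1 ]; then
    # Lines containing "cksum ... incorrect" are the packets that get
    # dropped on the BSD side before they reach pfSense's rules.
    sh -c "$(capture_cmd "$1")" | grep -i 'incorrect' || true
fi
```

If those "incorrect" flags disappear after disabling tx offloading on the hypervisor-side interface, the fix is working.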
-
I'm curious: what is the recommended method for disabling tx checksums on the hypervisor side for kvm/qemu? I'm running Proxmox, and it creates a tap interface for each VM (e.g. tap110i0). ethtool doesn't seem to support tap interfaces; it returns:
ethtool -K tap118i0 tx off
Cannot change tx-checksumming
Actual changes:
generic-segmentation-offload: on
I would prefer not to disable it on the bridge interface (vmbr0) due to the performance hit johnkeates mentions. Also, does this only affect virtio, or is e1000e unaffected?