Inter-Device Connectivity Issues on pfSense

AG23

Hi there,

I have a homelab setup and will soon switch to an ISP offering fiber speeds above 1 Gbps. To prepare, I upgraded my firewall appliance to one with 6x 2.5GbE LAN ports using the latest Intel chipset.

Hardware Specifications:
Model: Glovary Firewall Mini PC Octa Core i3 N305
RAM: DDR5 16GB
Storage: 128GB NVMe SSD
LAN Ports: 6 x 2.5GbE i226V
Other Features: Fanless, AES-NI, Type-C Port, TF Card Slot

Setup (Bare-metal):
Before integrating the new router into my network, I performed a clean install and configuration of pfSense 2.7.2 community edition.

Network Configuration:
OS: pfSense 2.7.2 community edition
ISP Connection: Fiber <-> 10Gb converter <-> RJ45

pfSense Interfaces:

Port 0 (WAN): Will be used to connect to the ISP
Port 1 (LAN): Connected to a 1Gb switch with other 1Gb devices
Port 2 (LAN2): Connected to a 2.5Gb switch with multiple 2.5Gb and some 1Gb devices
Port 3 (WLAN): Connected to a Wireless Access Point in AP Mode
Port 4 (Guest - 192.168.60.1/24): Connected to another Wireless Access Point in AP Mode
Port 5 (OPT4 - 192.168.2.1/24): Initial configuration to set up the bridge interface (will be deleted later)
Bridge0 (192.168.178.1/24): Configured in pfSense to group LAN, LAN2, and WLAN interfaces into the same subnet

Kea DHCP Configuration:

GUEST: 192.168.60.100-200, Gateway 192.168.60.1
OPT4: 192.168.2.100-200, Gateway 192.168.2.1
BRIDGE: 192.168.178.100-200, Gateway 192.168.178.1
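To double-check the addressing above, here is a small Python sketch (my own helper, not something pfSense provides) using the standard `ipaddress` module. It confirms that each DHCP pool sits inside its interface subnet, that gateways are outside the pools, that no subnets overlap, and that the two test hosts used later (192.168.178.100 and 192.168.60.101) land in different subnets, so traffic between them has to be routed by pfSense rather than switched on the bridge.

```python
import ipaddress

# Interface subnets, Kea DHCP pools and gateways as listed in the post.
SUBNETS = {
    "GUEST": ipaddress.ip_network("192.168.60.0/24"),
    "OPT4": ipaddress.ip_network("192.168.2.0/24"),
    "BRIDGE": ipaddress.ip_network("192.168.178.0/24"),
}
POOLS = {
    "GUEST": ("192.168.60.100", "192.168.60.200"),
    "OPT4": ("192.168.2.100", "192.168.2.200"),
    "BRIDGE": ("192.168.178.100", "192.168.178.200"),
}
GATEWAYS = {
    "GUEST": "192.168.60.1",
    "OPT4": "192.168.2.1",
    "BRIDGE": "192.168.178.1",
}


def check_dhcp_layout():
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    for name, net in SUBNETS.items():
        start, end = (ipaddress.ip_address(a) for a in POOLS[name])
        gateway = ipaddress.ip_address(GATEWAYS[name])
        if start not in net or end not in net:
            problems.append(f"{name}: pool {start}-{end} is outside {net}")
        if gateway not in net:
            problems.append(f"{name}: gateway {gateway} is outside {net}")
        elif start <= gateway <= end:
            problems.append(f"{name}: gateway {gateway} is inside its DHCP pool")
    # Overlapping interface subnets would make routing between them ambiguous.
    names = list(SUBNETS)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SUBNETS[a].overlaps(SUBNETS[b]):
                problems.append(f"{a} and {b} overlap")
    return problems


def subnet_of(host):
    """Name of the configured subnet containing host, or None."""
    addr = ipaddress.ip_address(host)
    for name, net in SUBNETS.items():
        if addr in net:
            return name
    return None


if __name__ == "__main__":
    print(check_dhcp_layout())  # [] for the values above
    # Different subnets, so pfSense must route between the two test hosts.
    print(subnet_of("192.168.178.100"), subnet_of("192.168.60.101"))  # BRIDGE GUEST
```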

Firewall Rules:

All interfaces, except WAN, are set to allow all IPv4 traffic for any protocol to troubleshoot this issue.

VLANs:

No VLAN configurations are present.

The Problem:
After configuring the setup, I tested connectivity by connecting two devices to different ports (Port 1 and Port 2). Both devices received IP addresses from the DHCP server (192.168.178.1) and can ping each other.
When I connect one of the devices to Port 4 (the GUEST interface), it gets its address from the DHCP server (192.168.60.1), but pinging between the two devices fails. Both devices can ping their respective gateways.

Troubleshooting Steps Taken:

Checked/enabled ICMPv4 on both machines.
Connected both devices to BRIDGE (Port 1, 2, or 3) interface, ping check OK.
Connected one device to the BRIDGE interface and one device to the GUEST interface, ping check 'Request timed out.' Both devices can ping their respective gateways.
Deleted all rules and set allow-all rules on all interfaces (excluding WAN).
Connected one device to the BRIDGE interface and one device to the GUEST interface, ping check 'Request timed out.' Both devices can ping their respective gateways.
Disabled hardware checksum offloading, since it can occasionally cause issues.
Connected one device to the BRIDGE interface and one device to the GUEST interface, ping check 'Request timed out.' Both devices can ping their respective gateways.

Despite these efforts, the problem persists. My guess is that I am overlooking some setting that needs to be configured, but at this point, I don’t know where to look anymore. I'm looking for guidance on resolving this issue.

I've included some screenshots for reference. Any help would be greatly appreciated.

stephenw10

Are you sure the targets you are pinging between allow pings from outside their own subnet? Windows firewall will block that by default.

Check the state table in Diag > States whilst you're pinging. Make sure it's opening states on both interfaces.

AG23

@stephenw10 Good question. I checked and found that ICMPv4-Out was indeed disabled for all profiles on both devices. I enabled ICMPv4 for both inbound and outbound traffic for all profiles (Public, Private, and Domain) in Windows on both devices. Unfortunately, this did not resolve the issue. Pings from both devices still return 'Request timed out.'

I also checked the "Diag > States" page while pinging (see attachment for the results). During the ping, I observed that the first number in the "Packets" column increased with pings from 192.168.178.100 to 192.168.60.101, and the second number increased with pings from 192.168.60.101 to 192.168.178.100.

stephenw10

Windows uses the same ICMP ID (1) for all ping traffic. So if you send pings from 192.168.60.101 to 192.168.178.100 whilst a ping is already running the other way, they will match the existing open state. Hence you see them counted as replies.

It's opening states correctly there though. It looks like the hosts are still just not replying.
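A toy Python model of the effect described above, under a loud simplification: states are looked up by source, destination and ICMP ID only, ignoring ICMP type and everything else real pf keys on. With that assumption, a ping started from each side with ID 1 collapses into a single state, and the second side's requests land in the reply counter even though neither host ever answered. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IcmpState:
    initiator: str
    responder: str
    icmp_id: int
    # [packets initiator -> responder, packets responder -> initiator]
    packets: list = field(default_factory=lambda: [0, 0])


class ToyStateTable:
    """Toy model: ICMP states keyed by (src, dst, ICMP id), ignoring ICMP type.

    Real pf state keys carry more detail; this only illustrates why two pings
    started from opposite ends with the same ID share one state.
    """

    def __init__(self):
        self.states = {}

    def packet(self, src, dst, icmp_id):
        forward = (src, dst, icmp_id)
        reverse = (dst, src, icmp_id)
        if forward in self.states:
            state = self.states[forward]
            state.packets[0] += 1
        elif reverse in self.states:
            # Matches a state opened from the other side: counted as a "reply".
            state = self.states[reverse]
            state.packets[1] += 1
        else:
            state = IcmpState(src, dst, icmp_id)
            state.packets[0] = 1
            self.states[forward] = state
        return state


if __name__ == "__main__":
    LAN_HOST, GUEST_HOST = "192.168.178.100", "192.168.60.101"
    table = ToyStateTable()
    for _ in range(4):
        table.packet(LAN_HOST, GUEST_HOST, 1)  # requests, never answered
    for _ in range(4):
        table.packet(GUEST_HOST, LAN_HOST, 1)  # ping started the other way, same ID
    print(len(table.states), table.states[(LAN_HOST, GUEST_HOST, 1)].packets)  # 1 [4, 4]
```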

You could confirm that with a packet capture on the guest interface to be sure the ping requests are leaving there towards the correct MAC.
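One way to read such a capture: from the pfSense shell, something like `tcpdump -eni igc4 icmp` prints the Ethernet header (`-e`) of every ICMP packet on that port (assuming Port 4, the GUEST interface, is igc4). The Python sketch below parses those lines and flags echo requests to the guest host whose destination MAC is not the one you expect, and counts replies from it. The expected MAC and the sample lines in the comments are hypothetical; the regex assumes tcpdump's usual `-e -n` line layout.

```python
import re

# Matches one `tcpdump -e -n` line for an ICMP echo packet, e.g.
# "... 00:11:22:33:44:55 > 66:77:88:99:aa:bb, ethertype IPv4 (0x0800), length 74:
#  192.168.178.100 > 192.168.60.101: ICMP echo request, id 1, seq 5, length 40"
ECHO_RE = re.compile(
    r"(?P<src_mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) > "
    r"(?P<dst_mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), ethertype IPv4 .*?: "
    r"(?P<src_ip>[\d.]+) > (?P<dst_ip>[\d.]+): ICMP echo (?P<kind>request|reply)"
)


def wrong_mac_requests(lines, target_ip, expected_mac):
    """Return echo requests to target_ip whose destination MAC is not expected_mac."""
    bad = []
    for line in lines:
        m = ECHO_RE.search(line)
        if m and m["kind"] == "request" and m["dst_ip"] == target_ip:
            if m["dst_mac"].lower() != expected_mac.lower():
                bad.append(m.groupdict())
    return bad


def replies_from(lines, host_ip):
    """Count echo replies sent by host_ip."""
    count = 0
    for line in lines:
        m = ECHO_RE.search(line)
        if m and m["kind"] == "reply" and m["src_ip"] == host_ip:
            count += 1
    return count
```

If `wrong_mac_requests` comes back empty and `replies_from` stays at 0, the requests are reaching the right host and it simply isn't answering.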

AG23

@stephenw10 Your response got me thinking that the issue might be with how the Windows machines respond to ping requests rather than a pfSense configuration problem. To be sure, I ran a simple web server on the machine connected to the GUEST interface and successfully connected to it from the machine on the LAN interface.

I believe that using only a ping as a test mechanism to verify the connection between the two subnets led me in the wrong direction. Can I conclude that being able to connect to the web server on port 5000 indicates that pfSense is working correctly?

If so, I will continue configuring the router before integrating it into the network, keeping in mind that ping is not a reliable test method here.

(favicon.ico 404 error is expected because it doesn't have a favicon.ico)
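The web-server test can be reduced to a TCP handshake check, which exercises routing and the pfSense rules without depending on Windows' ICMP settings. A minimal Python sketch; the helper name `tcp_reachable` is my own, and it only proves that something accepted a connection on that port.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection resolves the address and performs the 3-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or no route: all count as "not reachable".
        return False
```

From the LAN machine, `tcp_reachable("192.168.60.101", 5000)` should return True while the test web server on the GUEST machine is running.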

stephenw10

Yup seems good.

Something is rejecting the pings on those hosts.

HLPPC

@AG23 When you set up a bridge using an Intel NIC, pfSense puts the interrupts on the same CPU thread.

If you try some TCP/IP protocols from interfaces other than the bridge, the interrupt that creates the packet and/or state may not work properly.

Try verifying this in the shell with:

sysctl -a | grep interrupt

Also dmesg | grep msi

There is a lot going on with a multicore CPU.

Also, with that bridge, your ISP may send you Spanning Tree Protocol (STP) frames, and depending on the switch vendor you could get loop-detection errors. I would suggest looking at Netgate's L2 firewall, and a /23 subnet for communicating between two different subnets.
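When sizing a /23 for this setup it helps to check what it actually spans. This Python sketch (standard `ipaddress` module; the helper name is mine) shows the /23 around the GUEST subnet and the smallest single prefix that would contain both GUEST and BRIDGE, given the addresses used in this thread.

```python
import ipaddress

GUEST = ipaddress.ip_network("192.168.60.0/24")
BRIDGE = ipaddress.ip_network("192.168.178.0/24")


def smallest_common_supernet(a, b):
    """Widen a one prefix bit at a time until it also contains b."""
    net = a
    while not b.subnet_of(net):
        net = net.supernet()
    return net


if __name__ == "__main__":
    slash23 = GUEST.supernet()  # the /23 containing the GUEST /24
    print(slash23, slash23[0], slash23[-1])  # 192.168.60.0/23 192.168.60.0 192.168.61.255
    print(ipaddress.ip_address("192.168.178.100") in slash23)  # False
    print(smallest_common_supernet(GUEST, BRIDGE))  # 192.168.0.0/16
```

A /23 covers exactly two adjacent /24s, so with 192.168.60.0/24 and 192.168.178.0/24 the two subnets would need renumbering to be adjacent before a /23 could cover both.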

Also, some Realtek NICs use DPKG and try to prioritize some UDP packets over others in their chipsets. Good luck.

HLPPC

@AG23 Also, there are packet filters on both the bridge interface and its member interfaces, and a default deny rule could be blocking your pings.
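Which of those filter points is active is controlled by the if_bridge(4) sysctls `net.link.bridge.pfil_member` (filter on member interfaces) and `net.link.bridge.pfil_bridge` (filter on the bridge interface itself). A small Python sketch that reads the output of `sysctl net.link.bridge` and reports where pf will see bridged traffic; the sample values below are hypothetical, so check the real ones on the firewall.

```python
def bridge_filter_points(sysctl_output):
    """Report where pf filters bridged traffic, based on if_bridge(4) sysctls."""
    prefix = "net.link.bridge."
    values = {}
    for line in sysctl_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith(prefix):
            values[key[len(prefix):]] = value.strip()
    points = []
    if values.get("pfil_member") == "1":
        points.append("member interfaces")
    if values.get("pfil_bridge") == "1":
        points.append("bridge interface")
    return points


if __name__ == "__main__":
    # HYPOTHETICAL output; run `sysctl net.link.bridge` on the firewall for real values.
    sample = "net.link.bridge.pfil_member: 1\nnet.link.bridge.pfil_bridge: 0\n"
    print(bridge_filter_points(sample))  # ['member interfaces']
```

If both come back enabled, rules on the member interfaces and on the bridge interface can each block the pings.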

HLPPC

@AG23

https://man.freebsd.org/cgi/man.cgi?query=if_bridge&sektion=4&apropos=0&manpath=FreeBSD+10.1-RELEASE

https://youtu.be/XoLPGH4awKc?si=zLdYHiHUouDkxScT

https://docs.netgate.com/pfsense/en/latest/bridges/create.html

Be careful with creating bridges. You may end up needing much better hardware to handle everything.

At 16:40 in the video, he says "we don't want to allow spanning tree to just happen, but it will", and if your ISP starts sending it to you, you may need to call them.

AG23

@stephenw10: Thanks for the help! I will continue configuring the router. Once I plug it into my network, I can perform more extensive testing.

@HLPPC:
/root: sysctl -a | grep interrupt
igc0: Using MSI-X interrupts with 5 vectors
igc1: Using MSI-X interrupts with 5 vectors
igc2: Using MSI-X interrupts with 5 vectors
igc3: Using MSI-X interrupts with 5 vectors
igc4: Using MSI-X interrupts with 5 vectors
igc5: Using MSI-X interrupts with 5 vectors
atrtc0: Can't map interrupt.
vm.stats.vm.v_interrupt_free_min: 2
hw.bxe.interrupt_mode: 2
hw.ix.max_interrupt_rate: 31250
hw.igc.max_interrupt_rate: 20000
hw.em.max_interrupt_rate: 8000
hw.cxgbe.interrupt_types: 7
dev.igc.5.interrupts.rx_desc_min_thresh: 0
dev.igc.5.interrupts.asserts: 1
dev.igc.4.interrupts.rx_desc_min_thresh: 0
dev.igc.4.interrupts.asserts: 57662
dev.igc.3.interrupts.rx_desc_min_thresh: 0
dev.igc.3.interrupts.asserts: 1
dev.igc.2.interrupts.rx_desc_min_thresh: 0
dev.igc.2.interrupts.asserts: 1
dev.igc.1.interrupts.rx_desc_min_thresh: 0
dev.igc.1.interrupts.asserts: 60463
dev.igc.0.interrupts.rx_desc_min_thresh: 0
dev.igc.0.interrupts.asserts: 1
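The per-port counters at the end of that output can be summarized with a short Python sketch. Assuming igcN corresponds to Port N, only igc1 (LAN) and igc4 (GUEST) show real activity, which matches where the two test machines were plugged in.

```python
import re

# Counter lines as printed by `sysctl -a | grep interrupt` above.
SAMPLE = """\
dev.igc.5.interrupts.rx_desc_min_thresh: 0
dev.igc.5.interrupts.asserts: 1
dev.igc.4.interrupts.rx_desc_min_thresh: 0
dev.igc.4.interrupts.asserts: 57662
dev.igc.3.interrupts.rx_desc_min_thresh: 0
dev.igc.3.interrupts.asserts: 1
dev.igc.2.interrupts.rx_desc_min_thresh: 0
dev.igc.2.interrupts.asserts: 1
dev.igc.1.interrupts.rx_desc_min_thresh: 0
dev.igc.1.interrupts.asserts: 60463
dev.igc.0.interrupts.rx_desc_min_thresh: 0
dev.igc.0.interrupts.asserts: 1
"""

ASSERTS_RE = re.compile(r"^dev\.igc\.(\d+)\.interrupts\.asserts:\s*(\d+)$")


def igc_asserts(sysctl_output):
    """Map igcN -> interrupt assert count."""
    counts = {}
    for line in sysctl_output.splitlines():
        m = ASSERTS_RE.match(line.strip())
        if m:
            counts[f"igc{m.group(1)}"] = int(m.group(2))
    return counts


if __name__ == "__main__":
    counts = igc_asserts(SAMPLE)
    # Ports whose counter moved beyond 1.
    print([nic for nic, n in counts.items() if n > 1])  # ['igc4', 'igc1']
```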

/root: dmesg | grep msi
dmseg: Command not found.
/root: dmesg
dmseg: Command not found.
/root: which dmesg
dmseg: Command not found.

I did have packet logging enabled for the GUEST and BRIDGE interfaces. I've now disabled it, but it made no difference: pings still return 'Request timed out.'

My ISP provided me with a 'Huawei OptiXstar HN8010Ts-20', which has 1x SC/APC and 1x 10Gb RJ45 port. My plan is to connect the pfSense device directly to the Huawei device. On my ISP's community forum I've seen how to connect my own router directly, and I will need to configure VLAN 300 on igc0 (WAN).
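For reference, "VLAN 300 on igc0" means WAN frames carry an IEEE 802.1Q tag with VLAN ID 300 (in pfSense this is usually created under Interfaces > Assignments > VLANs and appears as igc0.300). This Python sketch builds the 4-byte tag to show what that looks like on the wire. The priority (PCP) default of 0 is an assumption; some ISPs expect a specific priority, and the post doesn't say.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value marking an 802.1Q-tagged frame


def dot1q_tag(vid, pcp=0, dei=0):
    """Build the 4-byte 802.1Q tag: TPID followed by TCI (PCP | DEI | VID)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be 1-4094")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must be 0-7 and DEI 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


if __name__ == "__main__":
    print(dot1q_tag(300).hex())  # 8100012c (300 == 0x12c)
```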

I'm somewhat familiar with Spanning Tree Protocol (STP) and the problems that can arise when bridging ports and creating loops, but I hadn't considered that my ISP could cause this problem. I will check whether others have encountered these issues on my ISP's community forum and keep this in mind if I run into further problems.

stephenw10

@AG23 said in Inter-Device Connectivity Issues on pfSense:

dmseg: Command not found.

It's: dmesg

AG23

/root: dmesg | grep MSI
igc0: Using MSI-X interrupts with 5 vectors
igc1: Using MSI-X interrupts with 5 vectors
igc2: Using MSI-X interrupts with 5 vectors
igc3: Using MSI-X interrupts with 5 vectors
igc4: Using MSI-X interrupts with 5 vectors
igc5: Using MSI-X interrupts with 5 vectors
ig4iic0: Using MSI

I don't have in-depth knowledge of interrupt handling, so I'm not sure how to interpret this or whether it could cause performance issues, either in combination with a bridge interface or generally on a 2.5Gb network.
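A hedged way to read that dmesg output: with iflib-based drivers such as igc, the usual layout is one MSI-X vector per RX/TX queue pair plus one for link events, so 5 vectors would mean 4 queue pairs per port. That layout is an assumption on my part, not something the output states. The Python sketch below parses the lines posted above.

```python
import re

# `dmesg | grep MSI` output as posted above.
SAMPLE = """\
igc0: Using MSI-X interrupts with 5 vectors
igc1: Using MSI-X interrupts with 5 vectors
igc2: Using MSI-X interrupts with 5 vectors
igc3: Using MSI-X interrupts with 5 vectors
igc4: Using MSI-X interrupts with 5 vectors
igc5: Using MSI-X interrupts with 5 vectors
ig4iic0: Using MSI
"""

MSIX_RE = re.compile(r"^(\S+): Using MSI-X interrupts with (\d+) vectors$")


def msix_vectors(dmesg_output):
    """Map device name -> MSI-X vector count (plain-MSI lines are skipped)."""
    vectors = {}
    for line in dmesg_output.splitlines():
        m = MSIX_RE.match(line.strip())
        if m:
            vectors[m.group(1)] = int(m.group(2))
    return vectors


def queue_pairs(vectors):
    # ASSUMPTION: one vector per RX/TX queue pair plus one for link/admin events.
    return {nic: n - 1 for nic, n in vectors.items()}


if __name__ == "__main__":
    print(queue_pairs(msix_vectors(SAMPLE)))  # each igc port -> 4
```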

HLPPC

@AG23 I suppose I was wrong about searching for interrupts, although I swear I found them that way before. Regardless, here is the correct search:

sysctl -ad | grep link_irq

On my bridge the output is:
dev.igc.x.link_irq

All of the bridged interfaces are on the same IRQ. I could probably find out somewhere which thread it is.

Had to boot some old equipment for that.

Remove the d from -ad to drop the descriptions.

HLPPC

@AG23

I don't have in-depth knowledge of interrupt handling, so I'm not sure how to interpret this or whether it could cause performance issues, either in combination with a bridge interface or generally on a 2.5Gb network.

Well, CPU mapping and machdep settings are kind of crazy sometimes. NTP servers can also slow you down by 20 ms while you're using them or traffic shaping them.

I don't recommend mixing 2500 Mbps and 1 Gbps links, because duplex settings get messed with and autonegotiation can behave unpredictably with NBASE-T downshifting. Mixing them can also cause bufferbloat, triggering head-of-line QoS features on various hardware.
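To put rough numbers on the bufferbloat point: when traffic arrives on a 2.5 Gbps link and leaves on a 1 Gbps link, the 1.5 Gbps excess piles up in a buffer. This Python sketch computes how quickly a buffer fills and how much delay a full buffer adds. The 1 MB buffer size is hypothetical; real switch and NIC buffers vary widely.

```python
def queue_fill_time(buffer_bytes, ingress_bps, egress_bps):
    """Seconds until a buffer fills when ingress outpaces egress."""
    excess = ingress_bps - egress_bps
    if excess <= 0:
        return float("inf")  # the egress link keeps up, so the queue never fills
    return buffer_bytes * 8 / excess


def full_queue_delay(buffer_bytes, egress_bps):
    """Extra latency a packet sees behind a full buffer draining at egress_bps."""
    return buffer_bytes * 8 / egress_bps


if __name__ == "__main__":
    buf = 1_000_000  # HYPOTHETICAL 1 MB buffer
    print(f"fills in {queue_fill_time(buf, 2.5e9, 1e9) * 1000:.2f} ms")  # fills in 5.33 ms
    print(f"adds {full_queue_delay(buf, 1e9) * 1000:.2f} ms")  # adds 8.00 ms
```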

Also, sometimes you may want crossover cables, and your ISP may have SD-WAN software trying to manage your bandwidth on your behalf.

You may also need high-quality Cat5e cables, or shielded Cat6 or Cat6a cables with appropriate electrical grounding, to minimize crosstalk at those speeds, or keep all of your cables parallel and away from power sources.

With NBASE-T, I have run into issues where an uplink was 2500 Mbps and a downlink was 1 Gbps. In promiscuous mode or during a tcpdump, the link may even start communicating with the motherboard's built-in NIC if the add-in NIC is from an external vendor, causing NBASE-T connections to use the onboard gigabit NIC's drivers for kernel-level packet capture, resulting in gigabit speeds.
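To spot a downshift like this, check the negotiated speed in the `media:` line of `ifconfig` output on pfSense. This Python sketch extracts that speed and compares it against what the port should run at. The 2500 Mbps expectation and the sample lines in the usage note are illustrative, and the regex assumes FreeBSD's usual media naming (e.g. `2500Base-T`, `1000baseT`), which is worth confirming on your own box.

```python
import re
from typing import Optional

# e.g. "media: Ethernet autoselect (2500Base-T <full-duplex>)"
MEDIA_RE = re.compile(r"media: Ethernet \S+ \((\d+)(G?)base-?T", re.IGNORECASE)


def negotiated_mbps(ifconfig_output: str) -> Optional[int]:
    """Negotiated copper speed in Mbps, or None if no speed is shown."""
    m = MEDIA_RE.search(ifconfig_output)
    if not m:
        return None
    speed = int(m.group(1))
    return speed * 1000 if m.group(2) else speed  # "10Gbase-T" -> 10000


def is_downshifted(ifconfig_output: str, expected_mbps: int = 2500) -> bool:
    """True if the link came up below expected_mbps (False when no link/speed)."""
    speed = negotiated_mbps(ifconfig_output)
    return speed is not None and speed < expected_mbps
```

For example, `is_downshifted("media: Ethernet autoselect (1000baseT <full-duplex>)")` returns True, while the same line with `2500Base-T` returns False.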

Also, gigabit devices and programs sometimes scan your network with curl and UDP packets and run into the 2500 Mbps link speeds, especially on ISP all-in-one routers, causing POSIX and memory issues because the data is moving too fast.

Also, DNS and DNSSEC can allow for encryption, but links getting slower because of encryption may cause a downshift, and encryption suddenly dropping out can cause data to suddenly move faster. It is a good idea to use an unmanaged switch with no web interface between your pfSense and the ISP router. I found a YuanLey 2500 Mbps unmanaged switch that forces NBASE-T to stay at that speed. They have no weird EtherTypes either.

HLPPC

@AG23

https://archive.nbaset.ethernetalliance.org/wp-content/uploads/2017/05/NBASET-Downshift-WP-1217.pdf