pfSense fails to reply to ARP requests



  • Hi,

    Very rarely, our pfSense router stops replying to ARP requests, which makes the internet unreachable from the LAN.
    A reboot of the pfSense router fixes the issue. The router itself remains reachable remotely over the internet during such an outage.

    I ran the packet capture tool on the pfSense box during the outage. You can find the capture here: https://www.cloudshark.org/captures/553e8d631a65

    What could cause such behaviour?


  • LAYER 8 Global Moderator

    So that is a sniff on the pfSense LAN?

    What is the mask on your interface? I see one packet from 192.168.30.11 to 8.8.8.8; what MAC did it send that to? Why would you be seeing traffic from 192.168.30.x (mask?) when your pfSense LAN IP is 192.168.1.254 (mask?)?
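    The question about the mask matters because whether 192.168.30.11 belongs on the LAN at all depends on the prefix length of 192.168.1.254. A minimal sketch using Python's standard ipaddress module; the /24 and /16 prefixes below are assumptions, since the actual mask was never posted in the thread.

```python
import ipaddress


def same_subnet(interface: str, host: str) -> bool:
    """Return True if `host` falls inside the network of `interface` (given as IP/prefix)."""
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(host) in iface.network


# Assumed prefixes: the thread never states the real LAN mask.
for prefix in (24, 16):
    iface = f"192.168.1.254/{prefix}"
    print(iface, "contains 192.168.30.11:", same_subnet(iface, "192.168.30.11"))
```

    With a /24 the guest address is off-subnet and should never appear as local LAN traffic; only a mask as wide as /16 would make it "belong" there.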



  • Yep, the sniff is on the pfSense LAN. We have a couple of guest (Wi-Fi) networks on separate VLANs. Not sure why that traffic shows up on the LAN capture…


  • LAYER 8 Global Moderator

    It shouldn't, so you should definitely figure out why that is. It looks to be tagged with VLAN ID 56 and sent to 00:dd:2a:e8:31:02.
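    For anyone following along, the VLAN ID Wireshark reports comes from the 4-byte 802.1Q tag that sits right after the source MAC: a 0x8100 TPID followed by a 16-bit TCI holding a 3-bit priority, a 1-bit drop-eligible flag and the 12-bit VLAN ID. A minimal decoding sketch; the source MAC in the example frame is made up for illustration, and only the destination MAC and VLAN 56 come from the thread.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid, inner_ethertype) for a single-tagged frame, else None."""
    if len(frame) < 18:
        return None
    tpid, tci, inner = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None
    pcp = tci >> 13          # 3-bit priority code point
    dei = (tci >> 12) & 0x1  # drop eligible indicator
    vid = tci & 0x0FFF       # 12-bit VLAN ID
    return pcp, dei, vid, inner


# Hypothetical header: pfSense MAC from the thread as destination,
# a made-up source MAC, VLAN 56, inner EtherType IPv4 (0x0800).
dst = bytes.fromhex("00dd2ae83102")
src = bytes.fromhex("020000000001")
header = dst + src + struct.pack("!HHH", TPID_8021Q, 56, 0x0800)
print(parse_vlan_tag(header))  # (0, 0, 56, 2048)
```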

    What is that? I can't find any vendor for 00:dd:2a.

    You're saying that when this happens you reboot pfSense and everything is fine? Did you try just taking the interface down and back up, or bouncing the switch port it's connected to? Is that MAC your pfSense interface?

    So the pfSense LAN interface is a physical interface, and the IP you're ARPing for is on that interface, not a VIP? And pfSense isn't a VM or anything, is it? What hardware is it running on?



  • 00:dd:2a:e8:31:02 is the MAC of the LAN interface on the pfSense box. The (tagged) VLANs are also created on this interface. VLAN ID 56 is indeed one of our guest networks.
    I didn't try just disabling and re-enabling the interface. I will do that next time it happens.

    The pfSense LAN is a physical interface, connected to an unmanaged switch. The IP the clients are ARPing for is not a VIP. pfSense is running on a router motherboard with 6 Intel network interfaces, 2 GB RAM, a 32 GB SSD and a Celeron 1037U CPU.

    Thanks for your effort so far!


  • LAYER 8 Global Moderator

    "The PFsense lan is a physical interface, connected to a non managed switch."

    How are you doing VLAN tagging if you're connected to a dumb switch??



  • The switch doesn't strip the VLAN tags, so that works perfectly. In fact, most unmanaged switches let VLAN tags through.
    Also, this issue occurred when we had a different (managed) switch in place as well.


  • LAYER 8 Global Moderator

    While the switch might not strip the tags, it doesn't actually isolate the traffic, so you have no actual barrier between your VLANs. You might as well just be running multiple layer 3 networks on the same layer 2.

    So if I am on your guest vlan, I could access stuff on your other vlans if I just arp for the IP, etc.

    It's a borked configuration for sure.



  • I set up the firewall to not allow VLANs to access each other's subnets. This works fine.

    As I said, the issue happened with a managed switch as well.


  • LAYER 8 Netgate

    Maybe an improperly-configured managed switch.


  • LAYER 8 Global Moderator

    ^ Possible. Whatever problems you're seeing are now suspect because you're trying to run multiple layer 3 networks over the same layer 2, whether you tag the traffic or not. Trying to use VLANs on a switch that does not support VLANs throws in all kinds of new variables that could be causing problems.

    I also have a question about the hardware you're running on and the MAC. I cannot find the maker of that MAC anywhere.

    That you think it's OK to run tagged traffic over a dumb switch just because it doesn't strip the tags just makes no sense to me.
    "I set up the firewall to not allow VLANs to access each other's subnets. This works fine."

    No, it doesn't. You're on one big broadcast domain; anyone can talk to anyone else if they know the IP address, or just freaking ARP for it… So while your setup might keep grandma Jane from talking to stuff on the other VLANs, anyone with a clue, or who knows how to Google for information, can talk to anything else they want no matter what firewall rules you put in place on pfSense, because whether you tag the traffic or not, you're connected to the same layer 2. Your switch is not isolating the traffic, it's dumb!!



  • That you think it's OK to run tagged traffic over a dumb switch just because it doesn't strip them.. Just makes no sense to me..

    Actually, it's entirely accurate.  A VLAN frame is still just an Ethernet frame, with the VLAN header inserted.  A non-managed switch should pass VLAN frames, but it can't create or terminate them.


  • LAYER 8 Netgate

    But you are mixing your broadcast domains. It's messy and pretty much an invalid configuration just about guaranteed to fail in unpredictable ways such as what you are seeing.

    Actually, it's entirely accurate.  A VLAN frame is still just an Ethernet frame, with the VLAN header inserted. A non-managed switch should pass VLAN frames, but it can't create or terminate them.

    It also forwards all frames to all ports instead of just the ports on that VLAN, which is broken behavior.

    If you think "pfSense is failing to respond to ARP request" and you have a misconfigured network, don't be surprised if people tell you to fix your network first.

    At least back up your claim with some packet captures.



  • It also forwards all frames to all ports instead of just the ports on that VLAN, which is broken behavior.

    You may want to read up on Ethernet frames. An Ethernet frame contains destination and source MACs, payload and frame check sequence. Everything between the destination MAC and FCS is payload, including the VLAN header(s). Since the MAC addresses are in the same location as always, switch forwarding works as always. Passing VLAN traffic through an un-managed switch is no different than passing it through a trunk port in a managed switch, in that all traffic, VLAN or not, is passing through it. Please compare an un-managed switch with a managed switch where all ports are trunk ports. As for broadcast domains, the same amount of traffic is present, but devices see it according to the VLAN, if any, they're listening to.

    https://en.wikipedia.org/wiki/Ethernet_frame
    https://en.wikipedia.org/wiki/Virtual_LAN
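    The claim about MAC positions can be checked with a few lines of Python: the 802.1Q tag is inserted right after the source MAC, so bytes 0 to 11 (destination and source MAC) are unchanged and a switch that forwards on the destination MAC sees the same key. A minimal sketch; the example frame is hypothetical, the FCS is omitted for brevity, and `forwarding_key` is an illustrative helper, not a real switch API.

```python
import struct


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Return a copy of an untagged frame (no FCS) with an 802.1Q tag after the source MAC."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must fit in 3 bits")
    tag = struct.pack("!HH", 0x8100, (pcp << 13) | vid)
    return frame[:12] + tag + frame[12:]


def forwarding_key(frame: bytes) -> bytes:
    """Hypothetical helper: what a simple learning switch looks up, the destination MAC."""
    return frame[:6]


# Hypothetical untagged ARP broadcast: dst, made-up src, EtherType 0x0806, dummy payload.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"\x00" * 28
tagged = insert_vlan_tag(untagged, vid=56)

print(forwarding_key(untagged) == forwarding_key(tagged))  # True
print(len(tagged) - len(untagged))                          # 4
```

    What the sketch does not show is VLAN membership: forwarding on the destination MAC alone is exactly why an unmanaged switch floods tagged broadcasts to every port.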


  • LAYER 8 Netgate

    And you may want to read up on broadcast domains. Your configuration is nothing that someone who wanted to actually do work on his network would do.

    And, again, provide the pcap of pfSense not responding to ARP.

    Not sure why that traffic shows up on the LAN capture…

    Because you are mixing all your broadcast domains up. They are to be separated using VLANs for a reason.



  • I am quite familiar with what broadcast domains are. I am well aware that VLANs are used to isolate broadcast domains. However, again, VLANs passing through an un-managed switch is no different than passing through a managed switch with all ports configured as trunks. In neither case is the VLAN traffic isolated from non-VLAN traffic. The only difference is on a managed switch's access ports, configured for a specific VLAN or the native LAN; there you only see the VLAN or native traffic, as configured.

    BTW, I am a Cisco CCNA and have been working with LANs since 1978 (yep, before there was such a thing as Ethernet). I have also worked with the original 10base5 "Thicknet" Ethernet and Token Ring. I have worked with VLANs and had them pass through an un-managed switch without issue. I have fired up Wireshark to see the mix of LAN & VLAN traffic on a network. You can connect a computer to a trunk port, configure the NIC for whatever combination of VLANs you wish (very easy to do in Linux) and not even need an access port configured for a VLAN. Again, I have done that. I have even worked on systems with double VLAN headers, so yes, I have lots of experience in this area.


  • LAYER 8 Netgate

    I'm a CCNA too and have tapped my share of thicknet and there is no way I would ever design a network passing dot1q through an unmanaged switch.

    Difference is I am not posting on some forum asking why ARP is showing up on the wrong layer 3 interface.



  • I would ever design a network passing dot1q through an unmanaged switch.

    I guess it's time for a "black box" test.  You have a switch, you don't know whether it's a managed switch with all trunk ports or an un-managed switch and no way to find out.  Please explain any difference when passing mixed VLAN and non-VLAN traffic through it that might help you decide and why.  If you can't determine a difference, then there will be no difference in practice.
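    The "black box" question can be made concrete with a toy model of broadcast flooding. This is a deliberately simplified sketch (no MAC learning, no ingress filtering), and the port layout and VLAN set are hypothetical: with every port a trunk carrying every VLAN, a managed switch floods exactly like an unmanaged one; isolation only appears once access ports or restricted allowed lists are configured, which is the point the moderators are making about the OP's setup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Port:
    mode: str                 # "trunk" or "access"
    native_vlan: int = 1      # access VLAN, or the VLAN untagged frames belong to on a trunk
    allowed: Set[int] = field(default_factory=set)  # tagged VLANs a trunk carries


def flood_unmanaged(ports: Dict[int, Port], ingress: int, vlan: Optional[int]) -> Set[int]:
    """Unmanaged switch: the tag (vlan) is ignored, so a broadcast leaves every other port."""
    return {p for p in ports if p != ingress}


def flood_managed(ports: Dict[int, Port], ingress: int, vlan: Optional[int]) -> Set[int]:
    """802.1Q-aware switch: flood only to other ports that are members of the frame's VLAN."""
    vid = ports[ingress].native_vlan if vlan is None else vlan  # untagged -> native VLAN
    return {
        p for p, port in ports.items()
        if p != ingress
        and (port.native_vlan == vid or (port.mode == "trunk" and vid in port.allowed))
    }


# Hypothetical 4-port switch; VLAN 1 = LAN, VLAN 56 = guest (the ID seen in the thread).
trunks = {n: Port("trunk", native_vlan=1, allowed={1, 56}) for n in (1, 2, 3, 4)}
for vlan in (None, 56):
    same = flood_managed(trunks, 1, vlan) == flood_unmanaged(trunks, 1, vlan)
    print(f"VLAN {vlan}: same flooding as unmanaged = {same}")  # True both times

# Make port 3 an access port in guest VLAN 56: the LAN broadcast no longer reaches it.
mixed = dict(trunks)
mixed[3] = Port("access", native_vlan=56)
print(sorted(flood_managed(mixed, 1, None)))  # [2, 4]
print(sorted(flood_managed(mixed, 1, 56)))    # [2, 3, 4]
```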


  • LAYER 8 Netgate

    If you know you are dealing with all trunk ports connected and you know the switch can handle the increased frame size and doesn't choke on passing the traffic and you know that all ports should receive all VLANs then you could use one. I would still not do it in any network that mattered.

    Obviously OP is missing something in this regard.



  • Why do you think a switch can't handle the increased frame size? It's only when the EtherType/Length field is used as a length that it's an issue. With Ethernet II (DIX), there is no length field and the frame ends only when the data stops. This is what allows jumbo frames. However, jumbo frames are so much larger that hardware has to be built to handle them. Standard Ethernet II frames can handle 1536 bytes, with IP MTU limited to 1500, so there's plenty of space for the VLAN header, or even 2 at 4 bytes per header. If a switch can properly handle Ethernet II, it can handle VLAN & IP, and IP is normally carried on Ethernet II.
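    For reference, a minimal Python sketch of the usual frame-size arithmetic, counting destination MAC, source MAC, EtherType, payload and FCS (preamble and start-of-frame delimiter excluded). With a 1500-byte payload an untagged frame is 1518 bytes, one 802.1Q tag makes it 1522 (the tagged-frame limit introduced by IEEE 802.3ac) and two tags (Q-in-Q) make it 1526. Whether a particular unmanaged switch forwards those sizes still depends on its hardware.

```python
# Standard Ethernet field sizes in bytes (preamble and SFD excluded).
HEADER = 6 + 6 + 2   # destination MAC, source MAC, EtherType
FCS = 4              # frame check sequence
VLAN_TAG = 4         # one 802.1Q tag (TPID + TCI)
MTU = 1500           # standard maximum payload


def max_frame_size(tags: int = 0, mtu: int = MTU) -> int:
    """Largest frame on the wire for a given payload MTU and number of 802.1Q tags."""
    return HEADER + tags * VLAN_TAG + mtu + FCS


for tags in (0, 1, 2):
    print(tags, max_frame_size(tags))
# 0 1518
# 1 1522
# 2 1526
```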

    Incidentally, over the years, I've often challenged "common knowledge" and found that it's not entirely accurate.  This is one example, where people make assumptions based on this common knowledge.  While they're generally true, they're not absolutely true.  I try to verify this through experiment, if possible.  You can do the same with an un-managed switch, Wireshark and a couple of computers running Linux.  Give it a try and see what turns up.  Try again with a managed switch and trunk ports.  You can also learn a lot by getting into the details of the protocols to see where limits may or may not exist.  One example of this is the length field in 802.3 vs type field in Ethernet II.  It is that length field that's the origin of the 1500 MTU limit, even though IP doesn't normally use 802.3.  In comparison, other network types, such as token ring or WiFi have a much larger MTU.  In fact the maximum IP MTU is 65K and even that's exceeded in some circumstances with "Jumbograms".

    https://en.wikipedia.org/wiki/Jumbogram
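    The length-versus-type distinction mentioned above comes down to how the two bytes after the source MAC are read: values up to 1500 (0x05DC) are an IEEE 802.3 length, values of 1536 (0x0600) and above are an EtherType (for example 0x0800 for IPv4, 0x0806 for ARP, 0x8100 for an 802.1Q tag), and values in between are undefined. A minimal classification sketch:

```python
MAX_8023_LENGTH = 1500   # 0x05DC: largest value interpreted as an 802.3 length
MIN_ETHERTYPE = 1536     # 0x0600: smallest value interpreted as an EtherType


def classify_type_length(value: int) -> str:
    """Interpret the 2-byte type/length field that follows the source MAC."""
    if value <= MAX_8023_LENGTH:
        return "802.3 length"
    if value >= MIN_ETHERTYPE:
        return "EtherType"
    return "undefined"


# 46-byte 802.3 length, IPv4, ARP, 802.1Q tag, and a value in the undefined gap.
for v in (0x002E, 0x0800, 0x0806, 0x8100, 0x05EE):
    print(hex(v), classify_type_length(v))
```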

    Bottom line, don't take "common knowledge" as absolute.  It isn't.


  • LAYER 8 Netgate

    Thank you very much for the education. I hadn't learned anything new today.

    I have seen it both work and not work. I have seen bridge devices both properly pass 1q and not. Like I said, if all those conditions are true, you can use one.



  • This discussion has given me a lot of useful information, thank you guys.

    I set up a test lab to match the production environment. The only difference is the switch (an 8-port 100 Mbit TP-Link switch instead of a 24-port TP-Link gigabit switch).

    Funny thing is that when I capture on the LAN (untagged) interface while querying a nameserver on the guest network, no traffic from the guest network is captured.

    Also, I can't reach the regular LAN from within the guest LAN, even when I change my IP to something in the LAN range/subnet. This conflicts with some posts here. Can somebody tell me what is supposed to happen?

    I'm also wondering how you guys would set up a network with 2 Wi-Fi guest networks and one LAN Wi-Fi network. I use EnGenius access points and bind the SSIDs to the VLAN tags. What is the preferred way of setting this up?



  • OK, weird things are going on here. Yesterday I removed all VLANs and guest networks, because I wanted to fix the intermittent downtime.

    Today, we experienced the same issues. We only have one LAN now. I made a capture again during the outage. https://www.cloudshark.org/captures/66c61a1e0b60

    What should I be looking for? Could the pfSense box have a hardware issue?


  • LAYER 8 Global Moderator

    Same issues?? I see a bunch of SYNs going to different IPs via HTTPS that get no answer at all, so the client retransmits. And yeah, I see lots of ARPs for 192.168.1.254 and no answers.

    What exactly is your pfSense running on, and with what NIC? I see traffic destined to 00:dd:2a:e8:31:02, which you stated was the MAC of pfSense. But this 00:dd:2a prefix does not seem to be owned/registered by anyone. I have tried multiple databases, Wireshark's lookup, https://macvendors.com/ etc.
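    Vendor databases key on the first three octets (the OUI). Before concluding that a MAC is unregistered, it is worth checking the two flag bits in the first octet: bit 0 marks a multicast address, and bit 1 marks a locally administered address, which would never appear in a vendor registry. A minimal sketch that only inspects those bits; it does not query any registry.

```python
def mac_info(mac: str) -> dict:
    """Split a MAC address into its OUI and the two flag bits of the first octet."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC: {mac!r}")
    first = octets[0]
    return {
        "oui": ":".join(f"{b:02x}" for b in octets[:3]),
        "multicast": bool(first & 0x01),             # I/G bit
        "locally_administered": bool(first & 0x02),  # U/L bit
    }


print(mac_info("00:dd:2a:e8:31:02"))
# {'oui': '00:dd:2a', 'multicast': False, 'locally_administered': False}
```

    For the MAC in this thread both bits are clear, so the address presents itself as a globally unique, vendor-assigned one; the missing registry entry is not explained by local administration.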

    If you're using some cheap-ass NIC, then I would look at that as the problem.

    "I'm also wondering how you guys would setup a network with 2 wifi guest networks and one lan wifi network"

    As to how I would set that up: you would tag the SSIDs you want as guest and just use a native untagged network for the LAN. Or you could tag all 3 if you wanted. The switch port connecting to your AP(s) would be a trunk with the VLANs you want to allow, i.e. the ones you're using as guest, plus the native VLAN set for your untagged SSID if you're going that route. The port that connects to pfSense would get the same setting: trunked with the specific VLANs allowed and the native VLAN set if you're using one.

    As to the switch config: in what possible scenario would you set a switch to all trunked ports with all VLANs allowed? It does not matter whether the switch strips the tags or not. It is not the way to do it, whether or not you think it's common knowledge that dumb switches don't strip tags. That has NOTHING to do with it. The problem is the dumb switch doesn't freaking isolate the traffic. You're running multiple layer 3 networks over the same layer 2 in this scenario, and that is just borked, plain and simple! You use a smart/managed switch so you can create multiple layer 2 networks on the one piece of hardware.

    It does not matter if running multiple layer 3s over the same layer 2 "can" sometimes work: why would anyone with any networking experience at all ever do such a thing??? That you're doing it, or even suggesting there is nothing wrong with it, is just mind-numbing!!



  • Same issues as in everything is unreachable from within the LAN interface. The hardware has 6 Intel 1GbE LAN ports. Strange that it doesn't show up in the MAC databases. If this is a hardware issue, it would be a first (in 20 deployments).

    I will swap the hardware this week and report back.

    Thanks for your input on how to setup the guest networks. I will build a test lab and fiddle around with it.


  • LAYER 8 Global Moderator

    "The hardware has 6 Intel 1GBE lan ports."

    What is the hardware?  What are the nics that provide the ports?

    That is not an Intel MAC. You can look up all the MACs for Intel, and that one is not listed for Intel.



  • I replaced the router with one that I pulled out in a working state.

    Today they had the same issue. For some reason, it always happens after office hours. When it happens, I always get a call from the first person to arrive at the office saying that the 'internet' is not working.

    I just investigated the logs. The firewall log shows that at 07:39, devices fail to obtain an IP address. At 07:33 there is some IPv6 traffic, which I cannot pinpoint (as I disabled IPv6).

    What can you make out of the log file?

    Could any external factor cause this behaviour?

    fw_logs.zip



  • @johnpoz:

    "The hardware has 6 Intel 1GBE lan ports."

    What is the hardware?  What are the nics that provide the ports?

    That is not an Intel MAC. You can look up all the MACs for Intel, and that one is not listed for Intel.

    When I installed Windows on this box, the ports identified themselves as Intel NICs and Windows installed an Intel driver. I'm not sure why the MAC seems to be unregistered.


  • LAYER 8 Netgate

    You are seeing traffic from link-local addresses (169.254.0.0/16). Looks like clients are failing to get DHCP on LAN for some reason. Hard to tell from that. What's in the DHCP logs?
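    To gauge how many clients fell back to self-assigned addresses (what Windows calls APIPA) when DHCP went quiet, the source addresses in an exported log can be filtered for the IPv4 link-local range 169.254.0.0/16. A minimal sketch with Python's ipaddress module; the address list is hypothetical, and extracting addresses from the actual fw_logs.zip export is left out because its format isn't shown in the thread.

```python
import ipaddress


def is_apipa(addr: str) -> bool:
    """True for IPv4 link-local (169.254.0.0/16) addresses, which a client
    typically self-assigns when it gets no DHCP answer."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local


# Hypothetical source addresses pulled from a firewall log export.
sources = ["192.168.1.23", "169.254.17.4", "fe80::1", "169.254.200.9"]
print([s for s in sources if is_apipa(s)])  # ['169.254.17.4', '169.254.200.9']
```

    IPv6 link-local addresses (fe80::/10) are excluded on purpose, since hosts generate those regardless of DHCP.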

