Can't ping anything in LAN, everything else works?



  • I'm sure the fix for my problem is simple, but I've been banging my head against it for a while with no progress. I have two pfSense VMs on ESXi in a failover configuration (both are running on the same server until I get a second server to migrate one of them to).

    Each pfSense VM has five interfaces: one for the WAN connection and four for separate VLANs. The ESXi vSwitch handles the VLANs, so the pfSense VMs don't have VLAN subinterfaces. This is the interface setup:

    PF1:

    vmx0(VLAN0) (WAN): 192.168.1.10

    vmx1(VLAN10) (LAN): 10.1.0.1

    vmx2(VLAN20) (OPT1): 10.2.0.1

    vmx3(VLAN30) (OPT2): 10.3.0.1

    vmx4(VLAN40) (OPT3): 10.4.0.1 (used for pfsync)

    PF2:

    vmx0(VLAN0) (WAN): 192.168.1.11

    vmx1(VLAN10) (LAN): 10.1.0.2

    vmx2(VLAN20) (OPT1): 10.2.0.2

    vmx3(VLAN30) (OPT2): 10.3.0.2

    vmx4(VLAN40) (OPT3): 10.4.0.2 (used for pfsync)

    Right now, almost everything works. I can shut one off and the other takes over the CARP interface IP, they replicate states between each other, and all seems well. But they can't ping each other, or any other device, on the LAN (VLAN10) interface.

    I've been through every setting I can think of in each VM and in ESXi, and I cannot get ping working between them or to other devices on VLAN10, even though all the other interfaces work fine and traffic from devices to the pfSense VMs' LAN interfaces works fine.

    Here's what happens when I ping around:

    Source IP: Dest IP:

    VLAN10: PF1 10.1.0.1 >>> PF2 10.1.0.2: FAIL

    VLAN10: PF1 10.1.0.1 >>> PC 10.1.255.254: FAIL

    VLAN10: PC 10.1.255.254 >>> PF1 10.1.0.1: OK (can access webGui no problem, and reach internet)

    VLAN10: PF2 10.1.0.2 >>> PF1 10.1.0.1: FAIL

    VLAN10: PF2 10.1.0.2 >>> PC 10.1.255.254: FAIL

    VLAN10: PC 10.1.255.254 >>> PF2 10.1.0.2: OK (can access webGui no problem, and reach internet)

    VLAN20: PF1 10.2.0.1 >>> PF2 10.2.0.2: OK (and all other devices)

    VLAN30: PF1 10.3.0.1 >>> PF2 10.3.0.2: OK (and all other devices)

    VLAN40: PF1 10.4.0.1 >>> PF2 10.4.0.2: OK (nothing on this VLAN except the two pfSense VMs)

    So, in short, pinging works everywhere to everything EXCEPT from the pfSense LAN/VLAN10 interfaces to other devices and to each other. Devices using them as the gateway have no problem reaching the internet or other networks. I didn't even realize there was a problem until I noticed that DHCP failover wasn't working on VLAN10, even though it worked on VLAN20 and VLAN30.

    My first thought was that the firewall was blocking this somehow, but the firewall rules are wide open, and there aren't many configuration options between the interfaces and ESXi that could really interfere with this.

    So now I'm wondering: is the "LAN" designated interface treated differently by pfSense than the OPT interfaces? I'm no networking genius by any means, but this really seems like it should be working. Does anyone have any idea what could be causing this?

    Here's a screenshot of the VLAN10 LAN interface firewall rules: screenshot

    The rules on the 10.2 and 10.3 interfaces are exactly the same, and they work just fine for this.


  • Netgate Administrator

    Is it NATing out of the LAN incorrectly maybe?

    Run a pcap on that interface. Are the ping requests actually leaving? With the correct source IP/MAC?

    Steve



  • I'm using hybrid outbound NAT mode and all the mappings are the pfSense generated ones, no changes there. No custom port forwarding, or 1:1 mappings.

    Using the webGui's built-in packet capture, I can capture ICMP echo requests and replies between working interfaces, and from my PC to PF1 (10.1.255.254 >>> 10.1.0.1). But when running the capture on 10.1.0.1 >>> 10.1.255.254, no packets are captured. No echo request appears to be sent from the 10.1.0.1 address.


  • Netgate Administrator

    But the error you see is just no replies, as though it does have a route and is sending?

    Check the routing table.

    Check the state table while pinging. Which interface is it opening a state on for the ping?

    If there is no state, do you have outbound blocking firewall rules there?

    Steve



  • I assumed routing wouldn't really be in the picture, as the 10.1 network is a directly connected interface, but it seems it may be an issue. I just checked the state table as you suggested and something appears to be off: outbound pings are originating from my WAN CARP interface IP. Is that expected behavior for traffic to the LAN network? And if so, why wouldn't the other networks experience the same thing?

    Here's the state table when pinging from 10.1.0.1 >>> 10.1.0.2:
    state_table.jpg

    And here's a working scenario from 10.3.0.1 >>> 10.3.0.2:
    working.jpg

    Here's my routing table:
    routing table.jpg
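
    The check behind this observation can be sketched in a few lines of Python: a state whose source address is not inside the subnet of the interface it leaves on has been translated somewhere. This is only an illustration, not anything pfSense runs. The /16 masks are an assumption (the thread gives host addresses only), `source_was_translated` is a hypothetical helper name, and 192.168.1.12 is the WAN CARP address of this setup.

```python
import ipaddress


def source_was_translated(interface_network: str, state_source: str) -> bool:
    """Return True when a state's source address lies outside the interface's subnet."""
    network = ipaddress.ip_network(interface_network)
    return ipaddress.ip_address(state_source) not in network


# Assumed /16 masks; the thread does not state the actual prefix lengths.
LAN_NET = "10.1.0.0/16"
OPT2_NET = "10.3.0.0/16"

# Failing case: the ping on LAN shows the WAN CARP address as its source.
print(source_was_translated(LAN_NET, "192.168.1.12"))
# Working case: the ping on OPT2 keeps the interface's own address.
print(source_was_translated(OPT2_NET, "10.3.0.1"))
```

    A source outside the interface's own subnet on a directly connected network points at outbound NAT rather than at routing or firewall rules.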


  • Netgate Administrator

    Seems like you have a bad NAT rule. It's NATing traffic leaving LAN to the WAN CARP VIP.

    Post the NAT rules table.

    Steve



  • The NAT rule was the problem. I apologize, I didn't pay enough attention when I looked at it before and told you that the mappings were all auto-generated. There was one rule that NAT-ed the 10.1 network to the 192.168.1.12 WAN CARP address, which I had made when I first set the routers up and was configuring high availability. The guide I used when configuring high availability said it was needed for seamless failover, and I totally forgot about it. The 10.2-10.4 networks were created later on and no rule was made for them, so they were unaffected. I disabled the rule and now everything looks to be working as it should.

    Thanks a lot for your help, this was driving me crazy and I was about to just rebuild them both from scratch.


  • Netgate Administrator

    That would do it if it was on the LAN interface.

    However, you do need to NAT the internal subnets to the WAN CARP VIP on the WAN interface. Without that, when it fails over, the existing states will no longer be valid and new states will have to be created.
    https://docs.netgate.com/pfsense/en/latest/highavailability/configuring-high-availability.html#setup-manual-outbound-nat
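
    To make the difference concrete, here is a rough Python model of interface-scoped outbound NAT. It is a simplified sketch, not pfSense's implementation: `OutboundNatRule` and `translated_source` are hypothetical names, the /16 mask is assumed, first-match evaluation is a simplification, and the "misplaced" rule assumes the old rule was attached to the LAN interface as suggested above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class OutboundNatRule:
    interface: str     # interface the rule is attached to (label only)
    source: str        # source network the rule matches
    translate_to: str  # address the source is rewritten to


def translated_source(rules, out_interface, src_ip):
    """Return the source address a packet carries when leaving out_interface."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        # A rule applies only to traffic leaving the interface it is attached to.
        if rule.interface == out_interface and src in ipaddress.ip_network(rule.source):
            return rule.translate_to
    return src_ip  # no rule matched: source is left untouched


# Assumption: 10.1.0.0/16 for the LAN network; the thread gives no mask.
misplaced = [OutboundNatRule("LAN", "10.1.0.0/16", "192.168.1.12")]
intended = [OutboundNatRule("WAN", "10.1.0.0/16", "192.168.1.12")]

# PF1 pinging PF2 leaves on LAN.
print(translated_source(misplaced, "LAN", "10.1.0.1"))     # 192.168.1.12
print(translated_source(intended, "LAN", "10.1.0.1"))      # 10.1.0.1
# A LAN client heading to the internet leaves on WAN.
print(translated_source(intended, "WAN", "10.1.255.254"))  # 192.168.1.12
```

    With the rule on WAN, LAN-to-LAN traffic keeps its real source while internet-bound traffic is still translated to the CARP VIP, so it survives a failover.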

    Steve

