Unexplained NAT failure.



  • Hi,

    Long story. At the moment I'm in Bangkok, and I have the boss breathing down my neck back home in Australia. Some time yesterday, while I was en route to Thailand, my pfSense box failed, and I have no idea why.

    The configuration is as follows:

    WAN - Internet connectivity. The gateway IP is part of our public /29 subnet, and the interface also has an IP on that subnet.
    LAN - 10.0.0.0/24 network, with the interface address 10.0.0.254.
    OPT1 - Bridged with WAN to create bridge0. No IP address.

    I'm behind a restrictive firewall here, but I can still glean some information…

    1. I can connect to the pfSense web configurator via the WAN interface.
    2. I can SSH to the pfSense box.

    When logged in to the pfSense box via SSH, I can ping Internet hosts, LAN hosts, and OPT1 hosts. When I SSH from the pfSense console into LAN hosts, I can ping the pfSense box (both its LAN and WAN IPs), but not Internet hosts and not OPT1 hosts.

    pfTop displays:
    pfTop: Down State no entries, View: default, Order: none, Cache: 10000

    And there are no states in the states table.

    I have told the firewall to log all traffic on the LAN interface. Nothing shows.

    I need to head to 7-Eleven to buy a power adaptor, 'cause my laptop is about to die.

    Any thoughts will be much appreciated. I'll be able to offer more information when I have electricity.

    Thanks,

    Finch!

    PS. Here's the routing table, edited to hide the WAN subnet...

    Internet:
    Destination        Gateway            Flags    Refs      Use  Netif Expire
    default            1xx.1xx.2xx.5x     UGS         0     1572    re0
    10.0.0.0           link#2             U           0      618    em0
    xxxx               link#2             UHS         0        0    lo0
    localhost          link#7             UH          0      137    lo0
    1xx.1xx.2xx.xx/29  link#1             U           0        0    re0
    xxxx               link#1             UHS         0        0    lo0
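    For anyone reading along: a table like this is consulted by longest-prefix match, so the most specific route that contains a destination wins, and everything else falls through to the default via re0. Below is a rough Python sketch of that lookup. The real public /29 is redacted above, so 203.0.113.0/29 (a documentation range) is a hypothetical stand-in for it, and the 10.0.0.0 entry is assumed to be the LAN /24. It ignores flags, the bridge, and pf itself; it only illustrates which interface a destination would be handed to.

```python
import ipaddress

# Hypothetical stand-in for the redacted routing table above.
# 203.0.113.0/29 (a documentation range) replaces the real public /29.
ROUTES = [
    ("0.0.0.0/0", "re0"),       # default route via the WAN gateway
    ("10.0.0.0/24", "em0"),     # LAN (assumed /24)
    ("203.0.113.0/29", "re0"),  # public /29 on the WAN side
    ("127.0.0.1/32", "lo0"),    # localhost
]


def lookup(dest: str) -> str:
    """Return the interface of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    best_net, best_if = None, None
    for prefix, netif in ROUTES:
        net = ipaddress.ip_network(prefix)
        # Prefer the matching route with the longest prefix.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_if = net, netif
    if best_if is None:
        raise LookupError(f"no route to {dest}")
    return best_if


if __name__ == "__main__":
    for dest in ("10.0.0.20", "203.0.113.3", "8.8.8.8", "127.0.0.1"):
        print(dest, "->", lookup(dest))
```

    With the default route present, every address matches something, so the lookup never fails; the LookupError only matters if you drop the 0.0.0.0/0 entry.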



  • Still no dice.

    I'm weirded out that there's nothing in the states table, nothing in pftop, and nothing in the firewall logs… What's more, I'm confused as to why this box is behaving like this when no changes have been made in months.

    One would think that, while I'm connected via both SSH and HTTPS, there would be at least some states showing in pftop. But there's nothing.

    The behaviour doesn't match any kind of hardware failure that I've encountered before: if a NIC were to fail, I'd be unable to access the networks it's connected to.

    It appears to be a NAT problem, but the states and pftop issues are kinda confusing me...



  • Something I've noticed that may be part of the problem…

    When I reload the NAT configuration and monitor the reload, it does not progress beyond this message:

    Processing early nat rules for package /usr/local/pkg/squid.inc...

    As part of the troubleshooting process I removed Squid. There is no squid.inc in that directory.

    Edit: I'm trying to reinstall Squid. The installation hangs at "Reconfiguring filter... One moment please..." with the message "This operation may take quite some time, please be patient.  Do not press stop or attempt to navigate away from this page during this process."

    It is taking quite some time... it's a fast machine - Q6600 - and it's been taking quite some time for the past ten minutes. The system log displays this:

    Aug 6 15:05:06 squid[17791]: Bungled (null) line 182: http_reply_access allow all
    Aug 6 15:05:06 Squid_Alarm[17117]: Attempting restart…
    Aug 6 15:05:06 Squid_Alarm[16609]: Squid has exited. Reconfiguring filter.
    Aug 6 15:05:01 squid[13956]: Bungled (null) line 182: http_reply_access allow all
    Aug 6 15:05:01 check_reload_status: Reloading filter

    It doesn't tell me which file line 182 is in so I can investigate…
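    The "(null)" is where a file name would normally appear, but the directive itself (`http_reply_access allow all`) is logged, so the config the Squid package generates is the obvious place to look. A small Python sketch that prints the lines around a given line number, with the target marked, is below. The path /usr/local/etc/squid/squid.conf is an assumption about where the package writes its config; point it at whatever file Squid is actually loading on your box.

```python
from pathlib import Path


def context(lines, lineno, radius=3):
    """Return the lines within `radius` of 1-based `lineno`, numbered, target marked with >>."""
    start = max(1, lineno - radius)
    end = min(len(lines), lineno + radius)
    rows = []
    for n in range(start, end + 1):
        marker = ">>" if n == lineno else "  "
        rows.append(f"{marker} {n:5d}: {lines[n - 1]}")
    return rows


if __name__ == "__main__":
    # Assumed location of the config generated by the pfSense Squid package.
    conf = Path("/usr/local/etc/squid/squid.conf")
    if conf.exists():
        for row in context(conf.read_text().splitlines(), 182):
            print(row)
    else:
        print(f"{conf} not found; point this at the file Squid actually loads")
```

    If line 182 turns out to be past the end of that file, nothing is printed, which itself suggests Squid is reading a different (or included) file.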



  • If you remove squid, do you get a working firewall?



  • @podilarius:

    If you remove squid, do you get a working firewall?

    Nearly. I talked someone through a factory reset. Now LAN clients can see everything but the servers on the OPT1 bridge.

    Everyone else - the Internet on the WAN interface - can see those servers. I think I know what the problem is - it's with the KVM configuration for the servers. pfSense is passing the packets.

    I'm struggling to install the OpenVPN export tool because apparently the pfSense box has no Internet connectivity even though I'm connected to it via SSH and HTTP and it can resolve hostnames. But there's an error message when I navigate to the list of packages to install.



  • I have managed to access the box running KVM and the oVirt management tool. Hooray for port forwarding! Unfortunately, rebooting that machine hasn't made a difference.

    At the moment, the situation is this:

    • LAN can see the Internet. This is a good thing.
    • WAN can see the Internet. Very good!
    • Internet can see OPT1. Great success.
    • LAN cannot see servers on OPT1. Very bad.

    Basically, from my hotel in Bangkok I can ping/access servers on OPT1 but I cannot do so from the pfSense box itself, nor from the LAN.

    This is driving me nuts. I really don't want to spend 14 hours getting home to perform a 4 hour fix (building a new box with new hardware) and then head back here… another 14 hours... ugh.

    Any suggestions?


  • Are they on different physical interfaces?



  • Yes, they are. WAN is re0, LAN is em0, and OPT1 (connected to the virtualised servers) is on em1.


  • Do you have rules for em0 -> em1?



  • I have the default LAN to anywhere rule enabled and I have a rule on OPT1 allowing all traffic - OPT1 is essentially unfiltered at the moment.



  • I would check the routes first, then the NAT rules (outbound as well). Are there any other rules besides your allow-all rules on the OPT1 and LAN interfaces?
    Have you run tcpdump on the pfSense machine, first on LAN and then on OPT1, to see whether the traffic is making it through the firewall correctly?



  • Well, it looks as though the problem has resolved itself.

    And by that, I mean that the guy who unplugged a cable from a switch and plugged it back into a different port finally told me he had done so. Unsurprisingly, once that was fixed, the problem "magically" disappeared.

    Thanks for the help, guys :)



  • Ah… the user tried to hide his mistake. Happens all the time. Glad you have the issue resolved and don't have to make crazy flight plans for a 1-minute fix.

