Issues after upgrade to 2.4.4 on all firewalls: Diagnostics > Tables is empty



  • Hi all,
    I need your advice about a problem I encountered after upgrading to version 2.4.4.
    The problem is present on a group of 6 firewalls that I upgraded from various versions (2.3.x and 2.4.x).
    All of these installations run in a VMware ESX environment.

    The upgrade procedure I chose was:
    * save the old configuration
    * fresh installation
    * configuration import
    and all of them went fine.

    The problem is not present at boot: after a random amount of time (there are no time schedules), the firewall suddenly starts blocking connections that are allowed by the rules.

    I will focus on one particular device (upgraded from 2.3.5 → 2.4.4), but the issue is the same on all of them.
    This one is an internal firewall with a single interface: clients connect to the OpenVPN service and get routed to the LAN. It is configured with 2 CPUs and 1 GB RAM (same as before the upgrade).
    The configuration has been in place for a few years; we only add and remove clients to allow their connections.

    So the rule is:
    allow source_group → destination_group destination_port

    Further investigation shows that
    Diagnostics → Tables → source_group
    is empty.

    At the same time,
    pfctl -t source_group -T show
    returns nothing.

    The problem is most evident on this rule because many users need it, but it happens on other tables too, making the corresponding rules useless.

    The only way I found to get connections through the firewall again was to reboot.
    I tried creating a new source_group and a new destination_group, and recreating the rules from scratch.
    The problem is still present.
    Sometimes it happens twice a day, sometimes less often, so it is not easy to keep an eye on it.
    I've seen no related information in the logs.
    Does anyone have the same problem?
    Can someone suggest a way to solve it?
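    Because the table empties at a random time, the `pfctl -t source_group -T show` check mentioned above could be scripted and run periodically (for example from cron) so the failure is caught when it happens rather than when users complain. The sketch below is a hypothetical helper, not part of pfSense: it assumes a Python 3.7+ interpreter is available on the firewall and that it runs as root so `pfctl` can read the tables. The function names are illustrative only.

```python
import subprocess
from typing import Iterable, List


def parse_table_show(output: str) -> List[str]:
    """Turn the output of `pfctl -t <table> -T show` into a list of entries.

    pfctl prints one address or network per line, indented with spaces,
    so blank lines are dropped and surrounding whitespace is stripped.
    """
    return [line.strip() for line in output.splitlines() if line.strip()]


def table_entries(table: str) -> List[str]:
    """Return the current contents of one pf table (needs root)."""
    result = subprocess.run(
        ["pfctl", "-t", table, "-T", "show"],
        capture_output=True,
        text=True,
        check=True,  # raise if pfctl fails, e.g. the table does not exist
    )
    return parse_table_show(result.stdout)


def empty_tables(tables: Iterable[str]) -> List[str]:
    """Return the names of the given tables that currently have no entries."""
    return [name for name in tables if not table_entries(name)]
```

    Called as `empty_tables(["source_group", "destination_group"])` from a small cron job, a non-empty result could be written to the system log (for example with `logger`), giving a timestamp to correlate with other logs.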


  • Netgate Administrator

    What is that table made up of? All IPs? FQDNs? A combination?

    If you go to Status > Filter Reload does it show an error?

    If there are FQDNs in it, check the DNS logs for errors resolving them.

    Steve



  • Hi,
    sorry for the delay in answering.
    The table is made of aliases or groups of aliases.
    Each of those is an IP: no FQDNs and no unresolvable addresses.
    There is no mix of IPs and FQDNs.

    When the problem happens again, I will check Status -> Filter Reload.

    I'm trying to install version 2.4.3 from scratch, but I see that I cannot install some packages, OpenVPN Client Export for example.
    Are we forced to use 2.4.4?
    Is there a way to install packages for 2.4.3?



  • Hi all,
    I can confirm that Status -> Filter Reload shows no errors when the problem arises,
    and even after forcing a filter reload there are no errors, but the problem persists.
    Any ideas?


  • Netgate Administrator

    Can we see how these are defined? Which ones are failing?
    Is it only the nested aliases?

    When we've seen errors like this before, it has often been because something was being resolved as an FQDN that should not have been, due to an odd character or a typo in an IP, for example. But I would expect that to show a DNS error.

    Steve
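    The failure mode described above can be illustrated with a small check: an alias entry that does not parse as an IP address or network would be treated as a hostname and handed to DNS. The sketch below uses Python's standard `ipaddress` module to flag such entries in a list of alias values. It approximates that rule under stated assumptions; it is not pfSense's own parsing code, and it ignores other alias entry types such as address ranges.

```python
import ipaddress
from typing import Iterable, List


def classify_entry(entry: str) -> str:
    """Classify one alias entry as 'ip', 'network' or 'hostname'.

    Assumption: anything that is neither a valid IP address nor a valid
    CIDR network would end up being resolved as a hostname (FQDN).
    """
    try:
        ipaddress.ip_address(entry)
        return "ip"
    except ValueError:
        pass
    try:
        # strict=False accepts host bits set, e.g. 10.0.1.5/24
        ipaddress.ip_network(entry, strict=False)
        return "network"
    except ValueError:
        return "hostname"


def suspect_entries(entries: Iterable[str]) -> List[str]:
    """Return entries that would be treated as hostnames.

    In a table meant to hold only static IPs, every result is suspect
    (a typo, a stray character, a missing dot, and so on).
    """
    return [e for e in entries if classify_entry(e) == "hostname"]
```

    For example, `suspect_entries(["10.0.1.1", "192.16810.10"])` flags only the mistyped second entry.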



  • Hi,
    the whole table is failing: it is completely empty.
    The structure of source_group is something like:
    address01
    address02
    sub_grp1
    sub_grp2

    ...

    where sub_grp1 is
    address11
    address12

    ...

    All of the addressxx entries are static private IP addresses, 10.0.1.x for example.
    The same configuration was working fine before the update to 2.4.4.
    And indeed, I have no DNS errors.
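    To compare what a nested group should contain with what `pfctl` reports, its expected contents can be worked out by expanding each member alias in turn. The sketch below does that over a hypothetical dictionary of alias definitions modeled on the structure above: the alias names come from the post, but the addresses are placeholders in 10.0.1.x, not the real configuration. It illustrates the expansion only; pfSense performs its own expansion when it builds the ruleset, which this does not reproduce.

```python
from typing import Dict, List, Tuple

# Hypothetical alias definitions mirroring the structure above.
# The addresses are placeholders, not the poster's real values.
EXAMPLE_ALIASES: Dict[str, List[str]] = {
    "address01": ["10.0.1.1"],
    "address02": ["10.0.1.2"],
    "address11": ["10.0.1.11"],
    "address12": ["10.0.1.12"],
    "sub_grp1": ["address11", "address12"],
    "source_group": ["address01", "address02", "sub_grp1"],
}


def expand_alias(name: str, aliases: Dict[str, List[str]],
                 _path: Tuple[str, ...] = ()) -> List[str]:
    """Recursively expand a (possibly nested) alias into literal entries.

    Members that are themselves alias names are expanded; anything else is
    kept as a literal value. Duplicates are removed, keeping first order.
    """
    if name in _path:
        # A group that (indirectly) contains itself cannot be expanded.
        raise ValueError(f"alias loop: {' -> '.join(_path + (name,))}")
    entries: List[str] = []
    for member in aliases[name]:
        if member in aliases:
            entries.extend(expand_alias(member, aliases, _path + (name,)))
        else:
            entries.append(member)
    return list(dict.fromkeys(entries))
```

    If the expanded list is non-empty while `pfctl -t source_group -T show` prints nothing, that would suggest the definitions themselves are fine and the table lost its contents at runtime.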



  • @chriva said in Issues after upgrade to 2.4.4 on all firewalls : Diagnostic ->Tables is empty:

    All of the addressxx entries are static private IP addresses, 10.0.1.x for example.

    Then it shouldn't be that difficult to show actual screenshots of them.


  • Netgate Administrator

    Mmm, nothing special there. There must be something different about how they are configured.
    Do you have other aliases that do populate? Do the subgroups still populate correctly?

    Steve



  • @Grimson
    Here are screenshots showing how the groups are built.
    I can confirm that they are all made of static IP addresses.
    @stephenw10
    It is not easy to answer your question: in general, some aliases were populated and some were not.
    My access to the HTTPS GUI comes from a rule with an alias; it usually works, but not always since the upgrade.
    In the same way, some of the subgroups were populated and others were not.

    [Screenshots: ip.png, sub_group.png, group.png]


  • Netgate Administrator

    Hmm, it still looks exactly like the sort of issue we saw previously, where it tries to resolve one of those entries as an FQDN instead of using the alias. Are there definitely no errors in the DNS log when it fails?

    Steve



  • No, I noticed no errors. Just in case, how is the DNS log enabled?
    I have now rolled back all of the installations except one.
    I will try to keep an eye on it, but there is very little traffic through this device. I don't even know whether this firewall ever showed the problem.


  • Netgate Administrator

    It is enabled by default for errors and status information, but you can turn up the logging level to see all DNS requests if required. That's a setting in Services > DNS Resolver > Advanced Settings.

    Steve



  • Hi,
    should I set it to the maximum level? (I've just set it to level 5.)
    What should I look for in clog /var/log/resolver.log?
    Can you give me an example?

    Regards.


  • Netgate Administrator

    I wouldn't expect you to need to turn up the logging. For example here's what I see in the resolver logs if I add an IP to an alias but typo it:

    
    Dec 4 12:53:14 	filterdns 		Adding Action: pf table: Test_Alias_2 host: 192.16810.10
    Dec 4 12:53:14 	filterdns 		Adding host 192.16810.10
    Dec 4 12:53:14 	filterdns 		failed to resolve host 192.16810.10 will retry later again. 
    

    Steve
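    When checking the resolver log for this, the relevant lines can be pulled out automatically. The sketch below pairs each filterdns "Adding Action" line with any later "failed to resolve host" line for the same host, so a failure can be traced back to the alias table it belongs to. The regular expressions are modeled on the three sample lines above and are an assumption about the message format; `clog` is used to read the circular log, as in the `clog /var/log/resolver.log` command mentioned earlier in the thread.

```python
import re
import subprocess
from typing import Dict, Iterable, List

# Patterns modeled on the sample filterdns lines above (assumption: other
# versions, or syslog prefixes such as "filterdns[123]:", look similar).
ACTION_RE = re.compile(r"filterdns\S*\s+Adding Action: pf table: (\S+) host: (\S+)")
FAILED_RE = re.compile(r"filterdns\S*\s+failed to resolve host (\S+)")


def failed_hosts_by_table(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each pf table to the hosts filterdns failed to resolve for it."""
    tables_for_host: Dict[str, List[str]] = {}
    failures: Dict[str, List[str]] = {}
    for line in lines:
        action = ACTION_RE.search(line)
        if action:
            # Remember which table(s) this host was registered for.
            table, host = action.groups()
            tables_for_host.setdefault(host, []).append(table)
            continue
        failed = FAILED_RE.search(line)
        if failed:
            host = failed.group(1)
            for table in tables_for_host.get(host, ["<unknown table>"]):
                hosts = failures.setdefault(table, [])
                if host not in hosts:
                    hosts.append(host)
    return failures


def read_clog(path: str = "/var/log/resolver.log") -> List[str]:
    """Read a pfSense 2.4 circular log through the clog utility."""
    result = subprocess.run(["clog", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()
```

    On the firewall, `failed_hosts_by_table(read_clog())` would return an empty dictionary when filterdns has logged no resolution failures, which matches what was reported in this thread so far.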



  • This post is deleted!

  • Netgate Administrator

    Yes, it should. I deliberately typo'd it to show what happens.
    It sees it as an FQDN because it's not a valid IP address, and tries to resolve it. That of course fails, resulting in the errors shown.

    Steve

    Edit: Replying to a deleted post now. 😉



  • Yeah, I deleted my post as I overlooked the mention of 'not a valid IP address' in your original.



  • Thanks for your support and suggestions.
    I have no filterdns entries in the logs so far.
    I will keep an eye on it.

    Remember that the same configuration on 2.4.3_p1 gives me no issues.


  • Netgate Administrator

    The most likely explanation is that you're hitting something really obscure that passes in PHP 5.6 but not in PHP 7. I would have expected some error, though; it may simply interpret it differently.

    Steve