Narrative: my upgrade to 2.5.0



  • I upgraded an APU2 from 2.4.4_p3 to 2.5.0 built on Sat Mar 14 19:18:01 EDT 2020 via the GUI in System > Update. The reason is because I wanted the new feature that lets me use a low-security API Token for Cloudflare DDNS instead of having to use the super-powerful Global Key that 2.4.4 required.

    The update appeared to work well at first. The GUI listed a bunch of items to be upgraded, and then it said it's rebooting. The firewall never came back up. I waited 10-15 minutes, unplugged and replugged the power, and it still never came back up.

    I had to do a clean install with a flash drive to get the firewall running again. Impressively, the install process somehow automatically applied my old config so I didn't have to restore it manually afterward. Good job with that (though it would be nice if the installer let me know that it was doing that so it wouldn't be a surprise)!

    The freshly installed pfSense had no packages installed, so I installed acme, openvpn-client-export, pfBlockerNG, and Service_Watchdog. When I then clicked Firewall > pfBlockerNG, I got an error about the page being missing, so I had to go back to System > PackageManager and click the reinstall button next to pfBlocker to fix that issue.

    A few times over the next several hours, the unbound service crashed randomly but the watchdog restarted it for me. And my internet would go down for about 5-10 seconds and come back up randomly several times (couldn't get out from LAN to WAN nor ping the WAN port from the internet for those seconds).

    Not sure if this is a new issue or I hadn't noticed it before. This message in the log mentioned 192.168.5.1 which looks related to the OpenVPN Server "IPv4 Tunnel Network" setting which is "192.168.5.0/24". So I'm not sure why a message would be logged about that IP changing.

    /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 192.168.5.1 - Restarting packages.
    

    Also see these messages a few times:

    Mar 16 15:55:04	php-fpm	91655	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1584374104] unbound[82711:0] error: bind: address already in use [1584374104] unbound[82711:0] fatal error: could not open ports'
    Mar 16 11:55:04	dhcpleases	23974	/etc/hosts changed size from original!
    Mar 16 11:55:03	dhcpleases	23974	kqueue error: unknown
    Mar 16 11:55:03	dhcpleases	23974	kqueue error: unknown
    Mar 16 15:55:03	php-fpm	56495	/rc.linkup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1584374103] unbound[58495:0] error: bind: address already in use [1584374103] unbound[58495:0] fatal error: could not open ports'
    Mar 16 11:55:02	dhcpleases	23974	Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    

    Oddly this message says the IP changed but it looks like it the text says the before and after IPs are the same:

    Mar 16 15:55:46	php-fpm	91655	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 47.201.116.57 -> 47.201.116.57 - Restarting packages.
    

    This message seems to say that a cable was plugged in (HOTPLUG) but I'm pretty sure nobody touched the cables:

    Mar 16 15:55:33	php-fpm	20881	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use PIA_VPN_VPNV4.
    Mar 16 15:55:33	php-fpm	76838	/rc.linkup: HOTPLUG: Configuring interface wan
    Mar 16 15:55:33	php-fpm	76838	/rc.linkup: DEVD Ethernet attached event for wan
    Mar 16 11:55:33	kernel		arpresolve: can't allocate llinfo for 47.201.116.1 on igb0
    Mar 16 15:55:32	check_reload_status	384	Reloading filter
    Mar 16 15:55:32	check_reload_status	384	Restarting OpenVPN tunnels/interfaces
    Mar 16 15:55:32	check_reload_status	384	Restarting ipsec tunnels
    Mar 16 15:55:32	check_reload_status	384	updating dyndns PIA_VPN_VPNV4
    Mar 16 11:55:32	rc.gateway_alarm	81404	>>> Gateway alarm: PIA_VPN_VPNV4 (Addr:1.1.1.1 Alarm:1 RTT:113.700ms RTTsd:.521ms Loss:27%)
    Mar 16 11:55:32	kernel		igb0: link state changed to UP
    Mar 16 15:55:32	check_reload_status	384	Linkup starting $igb0
    Mar 16 11:55:32	xinetd	6217	Exiting...
    Mar 16 15:55:30	check_reload_status	384	Reloading filter
    Mar 16 15:55:29	php-fpm	82620	[pfBlockerNG] Starting cron process.
    Mar 16 15:55:29	php-fpm	82620	/rc.start_packages: Restarting/Starting all packages.
    Mar 16 15:55:29	php-fpm	30004	/rc.linkup: DEVD Ethernet detached event for wan
    Mar 16 15:55:28	check_reload_status	384	Starting packages
    Mar 16 15:55:28	php-fpm	20881	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.52.10.6 -> 10.13.10.6 - Restarting packages.
    Mar 16 15:55:28	check_reload_status	384	Linkup starting $igb0
    Mar 16 11:55:28	kernel		igb0: link state changed to DOWN
    Mar 16 15:55:26	php-fpm	20881	/rc.newwanip: Creating rrd update script
    Mar 16 11:55:25	dhcpleases	600	kqueue error: unknown
    Mar 16 11:55:24	dhcpleases	600	Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Mar 16 11:55:24	dhcpleases	54683	/etc/hosts changed size from original!
    Mar 16 15:55:22	php-fpm	20881	/rc.newwanip: IP Address has changed, killing states on former IP Address 10.52.10.6.
    Mar 16 15:55:21	php-fpm	20881	/rc.newwanip: Removing static route for monitor 1.1.1.1 and adding a new route through 10.13.10.5
    Mar 16 11:55:21	xinetd	6217	Reconfigured: new=0 old=7 dropped=0 (services)
    

    Hope that helps with something! Thanks for your great products.



  • I think I've seen most of these messages in 2.4.4 p3 from time to time but they don't generally cause or indicate a problem. Except for that unbound one might be an issue. That looks familiar too. I'll see if I can dig up an old thread on that.

    nor ping the WAN port from the internet for those seconds).

    I thought a good firewall should never respond to ping on the WAN port. Mine doesn't and that's a feature/issue (however you want to look at it) that I like.





  • I have had a few error messages; however, I am extremely happy with pfSense v2.5-dev. I had upgraded from 2.4.4-p3 around the beginning of the year because I wanted to run Snort inline mode. My setup is in production mode for my home-office, and I have Suricata inline mode on WAN, Snort inline mode on LAN, as well as pfBlockerNG, and Squid's Antivirus. I am anticipating FreeBSD 12.1 around May/June and an official release by September. So far stability had been impressive!



  • My internet died for a few seconds again, and this time, pfSense generated a crash report:

    *Crash report begins. Anonymous machine information:

    amd64
    12.0-RELEASE-p10
    FreeBSD 12.0-RELEASE-p10 ce9563d5729(RELENG_2_5) pfSense

    Crash report details:

    PHP Errors:
    [24-Mar-2020 14:52:55 America/New_York] PHP Warning: file_get_contents(/tmp/igb0_router): failed to open stream: No such file or directory in /etc/inc/gwlb.inc on line 1480

    No FreeBSD crash data found.*



  • @Jeremy11one I started PHP errors thread...could you post yours there?



  • First of all, I'm not using 2.5.0-dev.
    2.4.4-p3-RC-last is just fine for me.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    The firewall never came back up. I waited 10-15 minutes, unplugged and replugged the power, and it still never came back up.

    That ok, but implies that you console-up and run the fsck test several times, the pfSense manual has some how-to's.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    omehow automatically applied my old config so I didn't have to restore it manually afterward.

    That does not correspond with - and should have experiences a flawed upgrade :

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    The freshly installed pfSense had no packages installed

    When the 'old' config has been found, and WAN (at least) is up, settings are applied and packages listed as installed, setup and running in the version before will get installed and started one by one.
    This takes the time it needs to download, and while it is in temporary state, it's signalled in the GUI.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    A few times over the next several hours, the unbound service crashed randomly but the watchdog restarted it for me.

    When a critical service like unbound crashes, consider the system as out-of-order.
    You should investigate why it crashes. Having it restarted (automatically) is one of baldest thing you can do.
    In this case, unbound, I know it's good : it's running for years now on millions of FreeBSD and other OS's.
    So, it's hardware or software (== settings) related.
    I would shut down anything that can put a strain on unbound, like pgBlocker-NG - and others if any.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.52.10.6 -> 10.13.10.6 - Restarting packages.

    A log line like that one is very known. "IP was 1.2.3.4 and it became 1.2.3.4 so a service restart is needed" ... ??!!
    Go figure.
    I presume that the log info doesn't seem correct. The reason why the interface was rebuild probably was
    I've seen it ones in a while and is not typically 2.5.0 related.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1584374104] unbound[82711:0] error: bind: address already in use [1584374104] unbound[82711:0] fatal error: could not open ports'

    Most probably :
    unbound was ordered to shut down. Because of the huge load of info it handles (reason : pfBlocker-NG) it needs time to take down the process : memory has to be freed, ports to be released, queues to be flushed, whatever.
    pfSense is willing to wait bit, but after some time, it will destroy the pid file, and restart unbound.
    The original instance is still running, trying to shut down, and the new one is started up (more strain) and this second instance will find ports still being assigned to the fist instance. That's a very classic no-go and seen a lot.
    Again : inspect what unbound is doing. In a out-of-the-box setup with a small processor like the SG1100, it can do it's job just fine. There are also posts on this forum that mention that a quad core I7 isn't enough .... something that can happen if you ask it to DNSBL half the Internet.

    I really advise you to check the file system first.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    PHP Warning: file_get_contents(/tmp/igb0_router): failed to open stream: No such file or directory in /etc/inc/gwlb.inc on line 1480

    That's an easy one.
    Check this page https://github.com/pfsense/pfsense/tree/master/src/etc/inc and you can see that this pfSense core file called gwlib.inc in folder /etc/inc/ must exist. remember : pfSense is mostly open source, so you can compare your system with what should (must !) be there. Good news : pfSEnse isn't that big, maybe 1 % of a vanilla Windows install.
    One thing is sure know : at leats one core file didn't wasn't installed correctly. This smells heavily like file system problems.

    Two things to do :
    First step : backup your config.
    Install pfSense from scratch - a clean install, with disk partitioning etc. Do not let it retrieve and keep a found config.xml (you have your backup anyway) Do the minimal WAN/LAN setup and stop there. Check for some time if all is good, up and running. The more you wait, the less surprises you will have in the future.
    Also : notice how well unbound does it's job, how fast it it.
    Now, import your settings.
    Consider putting pfBlocker-NG-devel on a severe diet.
    And keep watching these logs ....

    Next step : same as first step, but do not import the settings. Use it as a guideline to set up pfSense manually.
    Then install a package - set up that package - and wait - and test - test wait some more, one thing at the time, test more. Be patience.
    Add another package, and so on.
    When things go bad, you know what NOT to install / use to have a stable system. Remove that package, try other ones.

    And remember :
    2.5.0. is beading edge technology. You left known issues for other unknown issues.
    Keep things as simple as possible.



  • @Gertjan said in Narrative: my upgrade to 2.5.0:

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    The firewall never came back up. I waited 10-15 minutes, unplugged and replugged the power, and it still never came back up.

    That ok, but implies that you console-up and run the fsck test several times, the pfSense manual has some how-to's.

    I didn't try that. The Web GUI just stopped making any progress and I didn't try viewing the console of the pfSense box. I just power cycled it to see if that fixed it.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    omehow automatically applied my old config so I didn't have to restore it manually afterward.

    That does not correspond with - and should have experiences a flawed upgrade :

    When I did my clean install to fix the problem, I specifically deleted all the old partitions and let pfSense create new ones. So I know it didn't have any remnants of the old install. I did have a copy of the pfSense backup file on my installer flash drive, and just before I began the clean install, I clicked a button on the installer to "recover config", so I assume one of those is how the installer automatically used my old config. It was a good thing, no complaints there.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    The freshly installed pfSense had no packages installed

    When the 'old' config has been found, and WAN (at least) is up, settings are applied and packages listed as installed, setup and running in the version before will get installed and started one by one.
    This takes the time it needs to download, and while it is in temporary state, it's signalled in the GUI.

    When I did the clean install, I didn't have the WAN cable plugged in. So it may have tried to download the packages and just failed permanently. It would have been nice if it would have tried again the next time the WAN came back up, but it didn't. So I had to manually install my packages.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    A few times over the next several hours, the unbound service crashed randomly but the watchdog restarted it for me.

    When a critical service like unbound crashes, consider the system as out-of-order.
    You should investigate why it crashes. Having it restarted (automatically) is one of baldest thing you can do.

    I just assumed it was a bug with the cutting edge 2.5.0 pfSense build so I didn't spend much time trying to figure it out. It got better once I reinstalled the pfBlocker package and messed with its configuration.

    So, it's hardware or software (== settings) related.
    I would shut down anything that can put a strain on unbound, like pgBlocker-NG - and others if any.

    Good advice, I'll try disabling pfBlocker if Unbound crashes again.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.52.10.6 -> 10.13.10.6 - Restarting packages.

    A log line like that one is very known. "IP was 1.2.3.4 and it became 1.2.3.4 so a service restart is needed" ... ??!!
    Go figure.
    I presume that the log info doesn't seem correct. The reason why the interface was rebuild probably was
    I've seen it ones in a while and is not typically 2.5.0 related.

    This looks like a minor cosmetic bug, but if it causes pfSense to rebuild firewall rules or restart services then it could cause a brief interruption for no good reason.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1584374104] unbound[82711:0] error: bind: address already in use [1584374104] unbound[82711:0] fatal error: could not open ports'

    Most probably :
    unbound was ordered to shut down. Because of the huge load of info it handles (reason : pfBlocker-NG) it needs time to take down the process : memory has to be freed, ports to be released, queues to be flushed, whatever.
    pfSense is willing to wait bit, but after some time, it will destroy the pid file, and restart unbound.
    The original instance is still running, trying to shut down, and the new one is started up (more strain) and this second instance will find ports still being assigned to the fist instance. That's a very classic no-go and seen a lot.
    Again : inspect what unbound is doing. In a out-of-the-box setup with a small processor like the SG1100, it can do it's job just fine. There are also posts on this forum that mention that a quad core I7 isn't enough .... something that can happen if you ask it to DNSBL half the Internet.

    Good insight. It would be nice if there was a way to tell make Service Watchdog and unbound to play together nicely. Maybe unbound could set a flag when it's about to have intentional downtime that tells Watchdog to be patient so Watchdog won't try to restart the service too soon. For now, I removed unbound from the Watchdog list.

    I really advise you to check the file system first.

    Since this was a clean install (I deleted the partitions during the install wizard), I'd guess it's not a filesystem issue.

    @Jeremy11one said in Narrative: my upgrade to 2.5.0:

    PHP Warning: file_get_contents(/tmp/igb0_router): failed to open stream: No such file or directory in /etc/inc/gwlb.inc on line 1480

    That's an easy one.
    Check this page https://github.com/pfsense/pfsense/tree/master/src/etc/inc and you can see that this pfSense core file called gwlib.inc in folder /etc/inc/ must exist. remember : pfSense is mostly open source, so you can compare your system with what should (must !) be there.

    I may be wrong, but I think you're misunderstanding the error. The error seems to be saying that a command inside /etc/inc/gwlb.inc (on line 1480) is trying to access a file, and that file doesn't exist. It isn't saying that gwlb.inc doesn't exist. My box does have gwlb.inc in place where expected. I looked at line 1480 of that file, and it seems to reference a variable that points to a file path, but I don't know how to get the value of that variable to see which file it failed to access. Perhaps troubleshooting could be made easier if pfSense would catch any File Not Found errors and rethrow them with a message that includes the path attempted.

    Two things to do :
    <snip>

    Good, powerful troubleshooting advice there. I won't do it for now since the firewall has gotten better since I tinkered with pfBlocker.

    And remember :
    2.5.0. is beading edge technology. You left known issues for other unknown issues.

    Yep, I realize it's risky to use 2.5.0 already but I wanted the easy access to the new DDNS feature. So I'm glad I did it.

    Thanks for your help :)



  • Since this was a clean install (I deleted the partitions during the install wizard), I'd guess it's not a filesystem issue.

    The best way to clean the drive is remove the drive and connect to another machine and wipe it with software such
    as (Mini-Tool or Easus).

    Just deleting partition's isn't enough sometime's.


Log in to reply