May 2nd Snapshot doesn't work, breaks everything! Beware
-
I am still having issues with the new snapshot (5/4) too. I have 3 WAN connections and the two statics still seem to be working, but the appliance I have screeches to a halt after upgrading. I also cannot get a DHCP address on the third connection. Attached are the errors I get on initial boot; subsequent reboots give the same result.
![IMG_20180504_211145 - Copy.jpg](/public/imported_attachments/1/IMG_20180504_211145 - Copy.jpg)
-
I can confirm this issue is still in the latest 5-04 snapshot
Running the above commands did not help
/etc/rc.reboot
/sbin/reboot
There is no "this issue" in this thread. You need to provide details about exactly what is not working, with console and/or log entries related to the issue.
-
I am still having issues with the new snapshot (5/4) too. I have 3 WAN connections and the two statics still seem to be working, but the appliance I have screeches to a halt after upgrading. I also cannot get a DHCP address on the third connection. Attached are the errors I get on initial boot; subsequent reboots give the same result.
Please at least post the DHCP log and any dhclient entries from it, and anything that looks relevant in the system or routing logs as well.
I can't replicate any DHCP client issues here, mine are all working OK.
-
Unfortunately I had to rebuild it to be able to post this, so I only have my syslog to go back to. I am using the 2.4.4-DEVELOPMENT (amd64) built on Thu Apr 26 14:32:50 CDT 2018 FreeBSD 11.1-STABLE snapshot to restore to and the C2758 board you used to use. When I upgrade to the latest snapshot, I am unable to do much of anything with the appliance. It looks like it just keeps bouncing the interface for that WAN.
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
check_reload_status: Configuring interface wan
php-fpm[87613]: /rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
php-fpm[87613]: /rc.newwanip: rc.newwanip: on (IP address: ) (interface: HOME[wan]) (real interface: igb2).
php-fpm[87613]: /rc.newwanip: rc.newwanip: Info: starting on igb2.
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
dhclient[20017]: exiting.
dhclient[20017]: connection closed
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
php-fpm[43905]: /rc.linkup: HOTPLUG: Configuring interface wan
php-fpm[43905]: /rc.linkup: DEVD Ethernet attached event for wan
dhclient: /sbin/route add default 47.34.34.1
dhclient: Adding new routes to interface: igb2
dhclient: New Routers (igb2): 47.34.34.1
dhclient: New Broadcast Address (igb2): 255.255.255.255
dhclient: New Subnet Mask (igb2): 255.255.254.0
dhclient: New IP Address (igb2): 47.34.X.X
charon: 13[KNL] 47.34.X.X appeared on igb2
charon: 13[KNL] 47.34.X.X disappeared from igb2
dhclient: ifconfig igb2 inet 47.34.X.X netmask 255.255.254.0 broadcast 255.255.255.255
dhclient: Starting add_new_address()
dhclient: REBOOT
kernel: igb2: link state changed to DOWN
check_reload_status: Linkup starting igb2
HOME_DHCP 47.34.34.1: sendto error: 64
-
JimP, I can send you a 4m syslog from the time of upgrade if you would like.
After thumbing through more of the syslog, these lines repeat pretty consistently:
php-fpm[43905]: /rc.linkup: DEVD Ethernet attached event for wan
php-fpm[43905]: /rc.linkup: HOTPLUG: Configuring interface wan
charon: 04[KNL] 47.34.X.X disappeared from igb2
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
dhclient[20017]: connection closed
dhclient[20017]: exiting.
kernel: arpresolve: can't allocate llinfo for 47.34.34.1 on igb2
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
php-fpm[87613]: /rc.newwanip: rc.newwanip: Info: starting on igb2.
php-fpm[87613]: /rc.newwanip: rc.newwanip: on (IP address: ) (interface: HOME[wan]) (real interface: igb2).
php-fpm[87613]: /rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
check_reload_status: Configuring interface wan
dpinger: HOME_DHCP 47.34.34.1: sendto error: 65
-
I will have to roll back to April Build until this is fixed.
My DHCP connection has the same errors as the poster above.
-
I've been trying every new development build (didn't try 5-9) and the issue keeps happening. I too have to roll back to the last April build
What's odd is I ran a virtual appliance and pfSense ran fine in it. I'm starting to wonder if it's a hardware compatibility issue. I'm using a Qotom box
-
I don't doubt there is a problem here but I need a lot more detail than "it's broken" or "the same errors". Post the errors (even if they are duplicates), log entries, route table contents, anything you can come up with. I need to know exactly what isn't working, with detail. For example: interfaces missing addresses, missing or incorrect routes, services not running (exactly which ones are not running, and any relevant logs from them), and so on.
I still can't replicate any issues here in my lab. We might have one person here who is able to replicate this but they're still testing to find out if it's similar, too soon to say if it's related.
-
I've been trying every new development build (didn't try 5-9) and the issue keeps happening. I too have to roll back to the last April build
What's odd is I ran a virtual appliance and pfSense ran fine in it. I'm starting to wonder if it's a hardware compatibility issue. I'm using a Qotom box
Me too.
pfSense starts normally, but there is no internet connection from pfSense itself or from the LAN.
The logs show a lot of "route has not been found" messages.
This started after updating pfSense 2.4 on 05/09. How do you roll back to an older version?
Thanks
-
I don't doubt there is a problem here but I need a lot more detail than "it's broken" or "the same errors". Post the errors (even if they are duplicates), log entries, route table contents, anything you can come up with. I need to know exactly what isn't working, with detail. For example: interfaces missing addresses, missing or incorrect routes, services not running (exactly which ones are not running, and any relevant logs from them), and so on.
I still can't replicate any issues here in my lab. We might have one person here who is able to replicate this but they're still testing to find out if it's similar, too soon to say if it's related.
Hello
I'd be happy to give you log files of the errors. I'm just not sure which ones you want. Can you please tell me the location of the log files? The web GUI is not accessible so I would need to pull them by SSH
-
I would love to send in logs as I have a 4m CSV dump from my syslog server, but still I have not been told where to send them. As they are raw dumps, I am not posting them into the forums but would gladly send them to one of the developers.
-
I don't need 4M worth of records. I don't have time to sort through all of that. Just the last dozen or so lines of each log file is sufficient.
I think we have a lead on part of the problem, I pushed a fix for one potential path that could break it but there is one other that I haven't tracked down yet.
https://redmine.pfsense.org/issues/8504
More interesting to me now than logs are two things:
1. The `<gateways>` section of your configuration(s) before and after upgrade, or at least after. You can redact IP addresses but do not alter anything else.
2. Whether or not you have a default route for IPv4 or IPv6 in "netstat -rnW" after upgrade.
-
OK, there are at least three separate issues here from the looks of it:
0. Harmless route errors spamming the console/logs https://redmine.pfsense.org/issues/8497 (Fixed now)
1. An issue with the upgrade code not converting and handling default gateways properly in some cases https://redmine.pfsense.org/issues/8504 (Also fixed)
2. An issue where certain DHCP WANs (igb interfaces at least) constantly link cycle which leads to all sorts of other symptoms (services not running, IP addresses/routes missing, GUI inaccessible, etc) https://redmine.pfsense.org/issues/8506
We're still working on that last one.
Now what I need to know is:
- What hardware are you running where this is happening?
- What type of network interface is it happening to? (Both systems here, and the logs posted in the thread are all igb, but we don't know if that's a coincidence or not)
- Check "clog /var/log/system.log | grep link" and/or "dmesg | grep link" output to see if the link is flapping
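For anyone who already has the system log exported to a syslog server, the link-flap check above can also be done offline. The following is a minimal Python sketch, not a pfSense tool: the function names and the `threshold` value are assumptions made for illustration, and the pattern only matches kernel lines of the form "kernel: igbN: link state changed to UP/DOWN" as seen in this thread.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: igb0: link state changed to DOWN".
LINK_RE = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")


def count_link_changes(log_text):
    """Return a Counter mapping (interface, state) to number of occurrences."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def flapping_interfaces(log_text, threshold=2):
    """List interfaces that went DOWN at least `threshold` times.

    The threshold is an arbitrary value chosen for illustration.
    """
    counts = count_link_changes(log_text)
    return sorted(
        iface
        for (iface, state), n in counts.items()
        if state == "DOWN" and n >= threshold
    )


# Sample lines in the same format as the kernel entries posted in this thread.
sample = """\
May 11 17:55:37 pfSense kernel: igb0: link state changed to UP
May 11 17:55:37 pfSense kernel: igb0: link state changed to DOWN
May 11 17:55:42 pfSense kernel: igb0: link state changed to UP
May 11 17:55:43 pfSense kernel: igb0: link state changed to DOWN
"""

print(flapping_interfaces(sample))
```

Several UP/DOWN pairs within a few seconds on the same interface would be consistent with the link cycling described in issue 8506.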
-
Updated to the latest beta and still getting issues
I'm using a Qotom box
May 11 17:55:36 pfSense php-fpm[22628]: /rc.linkup: DEVD Ethernet attached event for wan
May 11 17:55:36 pfSense php-fpm[22628]: /rc.linkup: HOTPLUG: Configuring interface wan
May 11 17:55:37 pfSense kernel: igb0: link state changed to UP
May 11 17:55:37 pfSense kernel: igb0: link state changed to DOWN
May 11 17:55:42 pfSense kernel: igb0: link state changed to UP
May 11 17:55:43 pfSense php-fpm[22628]: /rc.linkup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1526086543] unbound[66133:0] error: bind: address already in use [1526086543] unbound[66133:0] fatal error: could not open ports'
May 11 17:55:43 pfSense kernel: igb0: link state changed to DOWN
May 11 17:55:45 pfSense php-fpm[71870]: /rc.linkup: DEVD Ethernet detached event for wan
It's just looping the same thing over and over
-
JimP, let us know when we can begin testing snapshots again as I can't keep rebuilding and restoring my firewall.
-
JimP, let us know when we can begin testing snapshots again as I can't keep rebuilding and restoring my firewall.
Which is why you don't run snapshots on important production firewalls, at least not without proper lab testing first.
No progress since my last post except that an additional issue has been found:
3. Interface MTU being set incorrectly in some cases https://redmine.pfsense.org/issues/8507 – This can lead to what appears to be partially working connectivity. Some sites will load, others will fail, and some may partially work and be partially broken due to resources that can't be fetched. Browsers may return a blank page rather than an error or fail to fetch links at all.
-
JimP, this is not an important firewall. It is only used for my home environment, but I get to listen to my wife complain about not being able to get online. More of an annoyance to reload than it is anything else. Let me know if there are more logs or testing you need on this.
-
I get to listen to my wife complain about not being able to get online.
If it's carrying your wife's traffic then that is THE very definition of an important production firewall :-)
Let me know if there are more logs or testing you need on this.
I think we have an OK grasp of the general issues at the moment but a lack of leads on where the problem lies. So far all I've seen are symptoms and not the root cause yet, but since it's so tricky to reproduce in a lab setup it's a pain to try to dig into it for any length of time.
-
JimP, I think you're on to something with the MTU size. I can tell you that the interface that is connecting (igb2) shows a default gateway and an IP, and then both disappear from the "netstat -rnW" output.
I am also available after 6p CST if you would like remote access. As this appliance is a mirror of the C2758 Atom you used to sell, I am hoping there are not too many people that will experience this issue.
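To make the "default route appears, then disappears" observation easier to capture, saved "netstat -rnW" output can be checked with a short script. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the sample text is made up for illustration (it uses the documentation address 192.0.2.1, not a real capture), and the parser only assumes that each address family starts with an "Internet:" or "Internet6:" header and that the default route line begins with the word "default".

```python
def default_routes(netstat_output):
    """Map address family ('Internet', 'Internet6') to its default gateway."""
    found = {}
    family = None
    for line in netstat_output.splitlines():
        stripped = line.strip()
        # Section headers tell us which routing table the next lines belong to.
        if stripped in ("Internet:", "Internet6:"):
            family = stripped.rstrip(":")
            continue
        fields = stripped.split()
        if family and len(fields) >= 2 and fields[0] == "default":
            # Keep the first default route seen for each family.
            found.setdefault(family, fields[1])
    return found


# Illustrative sample only; not output from any system in this thread.
sample = """\
Routing tables

Internet:
Destination        Gateway            Flags       Use    Mtu      Netif Expire
default            192.0.2.1          UGS           0   1500       igb2
127.0.0.1          link#7             UH            0  16384        lo0

Internet6:
Destination        Gateway            Flags       Use    Mtu      Netif Expire
::1                link#7             UH            0  16384        lo0
"""

routes = default_routes(sample)
print("IPv4 default:", routes.get("Internet", "missing"))
print("IPv6 default:", routes.get("Internet6", "missing"))
```

Running it against output captured a few seconds apart would show whether the IPv4 default route is present in one capture and missing in the next.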
-
The next round of snapshots should be better here. It was related to the MTU. Turns out in 11.2, FreeBSD improved dhclient so it could handle the MTU, but it took the upstream MTU unconditionally and had no way to ignore the value. In each case I've seen so far, the ISP has sent a bogus MTU back which caused two things:
- On e1000 and some other drivers, setting the MTU causes the link to go down and back up, which triggers the interface event scripts, which restarted dhclient, which set the MTU again, which made the link go down and back up, repeat, repeat, repeat, boom.
- On other drivers, the MTU would be set to this value but it may not have been right. In my case and for others, this was a stupid low value like 576 which meant some sites would work and others would fail or be half broken.
We have a patch in the tree now from a FreeBSD dev, which will be in the next set of snapshots, that lets us ignore the incoming MTU with a supersede in the dhclient config (which I also added in the tree). Hopefully all this should return sanity to cases affected by these issues.
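The feedback loop described above can be illustrated with a toy model. This Python sketch is not pfSense or FreeBSD code; the function and parameter names are hypothetical, the starting MTU of 1500 is an assumption, and `max_cycles` simply stands in for a loop that would otherwise never end.

```python
def count_dhclient_restarts(offered_mtu, current_mtu=1500,
                            link_resets_on_mtu_change=True,
                            ignore_offered_mtu=False, max_cycles=10):
    """Toy model of the loop; returns (dhclient restarts, resulting MTU).

    Reaching max_cycles restarts represents the endless link cycling.
    """
    restarts = 0
    for _ in range(max_cycles):
        # dhclient gets a lease. If the offered MTU is ignored (the
        # "supersede" case) or absent, the interface MTU is left alone.
        if ignore_offered_mtu or offered_mtu is None:
            break
        # Otherwise the upstream MTU is applied unconditionally.
        current_mtu = offered_mtu
        if not link_resets_on_mtu_change:
            # Link stays up, but the interface keeps the offered value.
            break
        # Setting the MTU bounces the link; the interface event scripts
        # restart dhclient, which then sets the MTU again.
        restarts += 1
    return restarts, current_mtu


print(count_dhclient_restarts(576))
print(count_dhclient_restarts(576, link_resets_on_mtu_change=False))
print(count_dhclient_restarts(576, ignore_offered_mtu=True))
```

The three calls correspond to the cases in the post: a driver that resets the link on an MTU change loops until the cap, a driver that does not reset ends up with the low MTU of 576, and ignoring the offered MTU leaves the interface at its original value with no restarts.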