creating a vlan causes pfsense to go down
-
Since we upgraded 2.4.5_p1 to 2.5.2 we can no longer create a vlan without pfsense going down for 10 to 15 minutes after which it will recover. Each time we try, pfsense goes down immediately.
An overview of the setup:
2 identical physical servers in a CARP setup
Intel(R) Atom(TM) CPU C2758 @ 2.40GHz
8 CPUs: 1 package(s) x 8 core(s)
AES-NI CPU Crypto: Yes (active)
8 GB RAMA single interface igb0 for wan
LAGG0 with igb1/igb2 for internal networks
46 vlans on lagg0
23 ipsec tunnels to different sitesOn 2.4.5_p1 we never experienced this issue. We could create vlans and interfaces without causing any interruption or downtime.
The upgrade was done by reinstalling pfSense and restoring the xml backup.
The logs during this 10 to 15 minute outage are enormous, literally dozens of MB's of logging. Im not sure how much of it is actually useful. Here are some lines with I hope give an impression of what is happening. A note about the timestamps is that it may look like hours, its actually minutes but some processes log with 1 hour difference. I also think some lines are logged out of order because during this outage, dozens and dozens of megabytes of logging is written out. Sometimes hundreds of lines during the same second.
It starts with this line:
Dec 10 10:33:20 pfs_host01 kernel: vlan45: changing name to 'lagg0.2549'
Followed by about 500 lines repeating this for all interfaces:
Dec 10 10:33:20 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:21 pfs_host01 check_reload_status[301]: Linkup starting igb1 Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:21 pfs_host01 kernel: igb1: link state changed to DOWN Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:21 pfs_host01 kernel: igb2: link state changed to DOWN Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:21 pfs_host01 kernel: lagg0: link state changed to DOWN Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:21 pfs_host01 kernel: carp: 53@lagg0.2535: MASTER -> INIT (hardware interface down)
After this there a large blurp of logging along the lines of:
Dec 10 09:33:22 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(10.0.128.1@lagg0.204): (ACC)" has resumed CARP state "BACKUP" for vhid 12 Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:22 pfs_host01 php-fpm[71411]: /rc.linkup: Hotplug event detected for SOMESITENAME(opt32) static IP (1.2.3.4 ) Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:22 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.118.1.1@lagg0.118): (SOMENETWORKNAME)" has resumed CARP state "BACKUP" for vhid 50 Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:23 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for SOMESITENAME(opt46) static IP (1.2.3.4 ) Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:23 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.0.152.1@lagg0.304): (TST)" has resumed CARP state "BACKUP" for vhid 18 Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:23 pfs_host01 check_reload_status[301]: Reloading filter Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:23 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(10.0.100.1@lagg0.100): (NETWORKNAME)" has resumed CARP state "BACKUP" for vhid 2 Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:24 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.0.98.1@lagg0.98): (NETWORKNAME)" has resumed CARP state "BACKUP" for vhid 49 Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for NETWORKNAME(opt27) static IP (1.2.3.4 ) Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for NETWORKNAME(opt7) static IP (1.2.3.4 ) Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:24 pfs_host01 kernel: igb1: link state changed to UP Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:24 pfs_host01 kernel: igb2: link state changed to UP Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:24 pfs_host01 check_reload_status[301]: Linkup starting igb1 Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 09:33:24 pfs_host01 check_reload_status[301]: Linkup starting igb2 Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
Then some of this (the tripple dots indicate more of the same, removed for readability):
Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Linkup starting lagg0 Dec 10 10:33:27 pfs_host01 kernel: lagg0: link state changed to UP Dec 10 10:33:27 pfs_host01 kernel: carp: 53@lagg0.2535: INIT -> BACKUP (initialization complete) Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 10080 (interface up) Dec 10 10:33:27 pfs_host01 kernel: lagg0.2535: link state changed to UP Dec 10 10:33:27 pfs_host01 kernel: carp: 30@lagg0.2516: INIT -> BACKUP (initialization complete) Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 9840 (interface up) .... Dec 10 10:33:28 pfs_host01 kernel: carp: 44@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 5@igb3: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 45@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 47@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 1@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 4@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 27@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 38@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 39@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 40@igb0: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:28 pfs_host01 kernel: carp: 43@igb0: BACKUP -> MASTER (preempting a slower master) ... Dec 10 10:33:42 pfs_host01 kernel: carp: 68@lagg0.2546: MASTER -> BACKUP (more frequent advertisement received) Dec 10 10:33:42 pfs_host01 kernel: carp: 67@lagg0.2545: MASTER -> BACKUP (more frequent advertisement received) Dec 10 10:33:42 pfs_host01 kernel: carp: 69@lagg0.2547: MASTER -> BACKUP (more frequent advertisement received) Dec 10 10:33:42 pfs_host01 kernel: carp: 66@lagg0.2544: MASTER -> BACKUP (more frequent advertisement received) Dec 10 10:33:42 pfs_host01 kernel: carp: 65@lagg0.2543: MASTER -> BACKUP (more frequent advertisement received)
Then:
Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Carp master event Dec 10 10:33:49 pfs_host01 kernel: carp: 36@lagg0.2514: BACKUP -> MASTER (preempting a slower master) Dec 10 10:33:49 pfs_host01 kernel: carp: 53@lagg0.2535: BACKUP -> MASTER (preempting a slower master) Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket Dec 10 10:33:50 pfs_host01 kernel: carp: 34@lagg0.2517: BACKUP -> MASTER (master timed out) Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket Dec 10 09:33:51 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
And literally thousands and thousands of these repeat (I think >90% of all the logging is this line):
Dec 10 09:34:20 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
All of the above seem to repeat for a while and then:
Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(x.y.0.129@lagg0.2529): (REDACTED)" has resumed CARP state "BACKUP" for vhid 46 Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down Dec 10 10:33:25 pfs_host01 syslogd: sendto: Network is down
And links coming back up again (many more of these):
Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Linkup starting lagg0 Dec 10 10:33:27 pfs_host01 kernel: lagg0: link state changed to UP Dec 10 10:33:27 pfs_host01 kernel: carp: 53@lagg0.2535: INIT -> BACKUP (initialization complete) Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 10080 (interface up) Dec 10 10:33:27 pfs_host01 kernel: lagg0.2535: link state changed to UP Dec 10 10:33:27 pfs_host01 kernel: carp: 30@lagg0.2516: INIT -> BACKUP (initialization complete) Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 9840 (interface up)
Followed by:
Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Carp backup event Dec 10 09:33:28 pfs_host01 check_reload_status[301]: Linkup starting lagg0.2535
For each interface.
Next OpenVPN server is going down after some CARP messages, I think repeatedly (and due to CARP failover?):
Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php. Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:29 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup. Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Reloading filter Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version). Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: XMLRPC versioncheck: 21.7 -- 21.7 Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php. Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2535 Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2516 Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.204 Dec 10 09:33:30 pfs_host01 php-fpm[53770]: OpenVPN terminate old pid: 36104 Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.16 Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2534 Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.118 Dec 10 10:33:30 pfs_host01 xinetd[48241]: Starting reconfiguration Dec 10 10:33:30 pfs_host01 xinetd[48241]: Swapping defaults Dec 10 10:33:30 pfs_host01 xinetd[48241]: readjusting service 6969-udp Dec 10 10:33:30 pfs_host01 xinetd[48241]: readjusting service check_mk Dec 10 10:33:30 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services) Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2517 Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2548 Dec 10 10:33:30 pfs_host01 sshd[33502]: Fssh_packet_write_poll: Connection from user root x.y.z.26 port 36182: Permission denied Dec 10 10:33:30 pfs_host01 kernel: ovpns1: link state changed to DOWN Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2533 Dec 10 09:33:30 pfs_host01 check_reload_status[301]: Carp master event Dec 10 10:33:30 pfs_host01 kernel: carp: 70@lagg0.2548: BACKUP -> MASTER (master timed out) Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2503 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.304 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.100 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2532 Dec 10 09:33:31 pfs_host01 php-fpm[53770]: OpenVPN PID written: 40522 Dec 10 09:33:31 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup. Dec 10 10:33:31 pfs_host01 kernel: ovpns1: link state changed to UP Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting ovpns1 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.101 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2546 Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.98 Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2512 Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2547 Dec 10 09:33:32 pfs_host01 php-fpm[53770]: OpenVPN terminate old pid: 58864 Dec 10 10:33:32 pfs_host01 kernel: ovpns2: link state changed to DOWN Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.201 Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2544 Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2529 Dec 10 10:33:32 pfs_host01 kernel: ovpns2: link state changed to UP Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2514 Dec 10 09:33:33 pfs_host01 php-fpm[53770]: OpenVPN PID written: 71490 Dec 10 09:33:33 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup.
All the above just repeats and repeats for megabytes of logging. Then eventually when everything has come back to life the logging ends with this:
Dec 10 11:38:01 pfs_host01 xinetd[48241]: Starting reconfiguration Dec 10 11:38:01 pfs_host01 xinetd[48241]: Swapping defaults Dec 10 11:38:01 pfs_host01 xinetd[48241]: readjusting service 6969-udp Dec 10 11:38:01 pfs_host01 xinetd[48241]: readjusting service check_mk Dec 10 11:38:01 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services) Dec 10 11:38:02 pfs_host01 kernel: carp: 71@lagg0.2549: BACKUP -> MASTER (preempting a slower master) Dec 10 10:38:02 pfs_host01 check_reload_status[301]: Carp master event Dec 10 10:40:27 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php. Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version). Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC versioncheck: 21.7 -- 21.7 Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php. Dec 10 10:40:34 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.restore_config_section). Dec 10 10:41:37 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:42:55 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:43:46 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:50:34 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:51:30 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:52:18 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:53:09 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:54:56 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:55:29 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:55:40 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:56:20 pfs_host01 check_reload_status[301]: Syncing firewall Dec 10 10:56:25 pfs_host01 check_reload_status[301]: Reloading filter Dec 10 11:56:27 pfs_host01 xinetd[48241]: Starting reconfiguration Dec 10 11:56:27 pfs_host01 xinetd[48241]: Swapping defaults Dec 10 11:56:27 pfs_host01 xinetd[48241]: readjusting service 6969-udp Dec 10 11:56:27 pfs_host01 xinetd[48241]: readjusting service check_mk Dec 10 11:56:27 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services)
I'm sorry if the logging isn't very useful but I'm really at a loss and I tried to gather what I could from the logging. Please ask if you like to know or see something else.
I really appreciate all the help I can get.
Thanks in advance!