Navigation

    Netgate Discussion Forum
    • Register
    • Login
    • Search
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search

    creating a vlan causes pfsense to go down

    HA/CARP/VIPs
    1
    1
    513
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • S
      securvark last edited by

      Since we upgraded 2.4.5_p1 to 2.5.2 we can no longer create a vlan without pfsense going down for 10 to 15 minutes after which it will recover. Each time we try, pfsense goes down immediately.

      An overview of the setup:
      2 identical physical servers in a CARP setup
      Intel(R) Atom(TM) CPU C2758 @ 2.40GHz
      8 CPUs: 1 package(s) x 8 core(s)
      AES-NI CPU Crypto: Yes (active)
      8 GB RAM

      A single interface igb0 for wan
      LAGG0 with igb1/igb2 for internal networks
      46 vlans on lagg0
      23 ipsec tunnels to different sites

      On 2.4.5_p1 we never experienced this issue. We could create vlans and interfaces without causing any interruption or downtime.

      The upgrade was done by reinstalling pfSense and restoring the xml backup.

      The logs during this 10 to 15 minute outage are enormous, literally dozens of MB's of logging. Im not sure how much of it is actually useful. Here are some lines with I hope give an impression of what is happening. A note about the timestamps is that it may look like hours, its actually minutes but some processes log with 1 hour difference. I also think some lines are logged out of order because during this outage, dozens and dozens of megabytes of logging is written out. Sometimes hundreds of lines during the same second.

      It starts with this line:

      Dec 10 10:33:20 pfs_host01 kernel: vlan45: changing name to 'lagg0.2549'
      

      Followed by about 500 lines repeating this for all interfaces:

      Dec 10 10:33:20 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:21 pfs_host01 check_reload_status[301]: Linkup starting igb1
      Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:21 pfs_host01 kernel: igb1: link state changed to DOWN
      Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:21 pfs_host01 kernel: igb2: link state changed to DOWN
      Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:21 pfs_host01 kernel: lagg0: link state changed to DOWN
      Dec 10 10:33:21 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:21 pfs_host01 kernel: carp: 53@lagg0.2535: MASTER -> INIT (hardware interface down)
      

      After this there a large blurp of logging along the lines of:

      Dec 10 09:33:22 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(10.0.128.1@lagg0.204): (ACC)" has resumed CARP state "BACKUP" for vhid 12
      Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:22 pfs_host01 php-fpm[71411]: /rc.linkup: Hotplug event detected for SOMESITENAME(opt32) static IP (1.2.3.4 )
      Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:22 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.118.1.1@lagg0.118): (SOMENETWORKNAME)" has resumed CARP state "BACKUP" for vhid 50
      Dec 10 10:33:22 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:23 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for SOMESITENAME(opt46) static IP (1.2.3.4 )
      Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:23 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.0.152.1@lagg0.304): (TST)" has resumed CARP state "BACKUP" for vhid 18
      Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:23 pfs_host01 check_reload_status[301]: Reloading filter
      Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:23 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(10.0.100.1@lagg0.100): (NETWORKNAME)" has resumed CARP state "BACKUP" for vhid 2
      Dec 10 10:33:23 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:24 pfs_host01 php-fpm[71411]: /rc.carpbackup: HA cluster member "(10.0.98.1@lagg0.98): (NETWORKNAME)" has resumed CARP state "BACKUP" for vhid 49
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for NETWORKNAME(opt27) static IP (1.2.3.4 )
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.linkup: Hotplug event detected for NETWORKNAME(opt7) static IP (1.2.3.4 )
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:24 pfs_host01 kernel: igb1: link state changed to UP
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:24 pfs_host01 kernel: igb2: link state changed to UP
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:24 pfs_host01 check_reload_status[301]: Linkup starting igb1
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 09:33:24 pfs_host01 check_reload_status[301]: Linkup starting igb2
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      

      Then some of this (the tripple dots indicate more of the same, removed for readability):

      Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Linkup starting lagg0
      Dec 10 10:33:27 pfs_host01 kernel: lagg0: link state changed to UP
      Dec 10 10:33:27 pfs_host01 kernel: carp: 53@lagg0.2535: INIT -> BACKUP (initialization complete)
      Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 10080 (interface up)
      Dec 10 10:33:27 pfs_host01 kernel: lagg0.2535: link state changed to UP
      Dec 10 10:33:27 pfs_host01 kernel: carp: 30@lagg0.2516: INIT -> BACKUP (initialization complete)
      Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 9840 (interface up)
      ....
      Dec 10 10:33:28 pfs_host01 kernel: carp: 44@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 5@igb3: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 45@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 47@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 1@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 4@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 27@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 38@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 39@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 40@igb0: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:28 pfs_host01 kernel: carp: 43@igb0: BACKUP -> MASTER (preempting a slower master)
      ...
      Dec 10 10:33:42 pfs_host01 kernel: carp: 68@lagg0.2546: MASTER -> BACKUP (more frequent advertisement received)
      Dec 10 10:33:42 pfs_host01 kernel: carp: 67@lagg0.2545: MASTER -> BACKUP (more frequent advertisement received)
      Dec 10 10:33:42 pfs_host01 kernel: carp: 69@lagg0.2547: MASTER -> BACKUP (more frequent advertisement received)
      Dec 10 10:33:42 pfs_host01 kernel: carp: 66@lagg0.2544: MASTER -> BACKUP (more frequent advertisement received)
      Dec 10 10:33:42 pfs_host01 kernel: carp: 65@lagg0.2543: MASTER -> BACKUP (more frequent advertisement received)
      

      Then:

      Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 10:33:49 pfs_host01 kernel: carp: 36@lagg0.2514: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:33:49 pfs_host01 kernel: carp: 53@lagg0.2535: BACKUP -> MASTER (preempting a slower master)
      Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:49 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
      Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
      Dec 10 10:33:50 pfs_host01 kernel: carp: 34@lagg0.2517: BACKUP -> MASTER (master timed out)
      Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:50 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
      Dec 10 09:33:51 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
      

      And literally thousands and thousands of these repeat (I think >90% of all the logging is this line):

      Dec 10 09:34:20 pfs_host01 check_reload_status[301]: Could not connect to /var/run/php-fpm.socket
      

      All of the above seem to repeat for a while and then:

      Dec 10 09:33:24 pfs_host01 php-fpm[53770]: /rc.carpbackup: HA cluster member "(x.y.0.129@lagg0.2529): (REDACTED)" has resumed CARP state "BACKUP" for vhid 46
      Dec 10 10:33:24 pfs_host01 syslogd: sendto: Network is down
      Dec 10 10:33:25 pfs_host01 syslogd: sendto: Network is down
      

      And links coming back up again (many more of these):

      Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Linkup starting lagg0
      Dec 10 10:33:27 pfs_host01 kernel: lagg0: link state changed to UP
      Dec 10 10:33:27 pfs_host01 kernel: carp: 53@lagg0.2535: INIT -> BACKUP (initialization complete)
      Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 10080 (interface up)
      Dec 10 10:33:27 pfs_host01 kernel: lagg0.2535: link state changed to UP
      Dec 10 10:33:27 pfs_host01 kernel: carp: 30@lagg0.2516: INIT -> BACKUP (initialization complete)
      Dec 10 10:33:27 pfs_host01 kernel: carp: demoted by -240 to 9840 (interface up)
      

      Followed by:

      Dec 10 09:33:27 pfs_host01 check_reload_status[301]: Carp backup event
      Dec 10 09:33:28 pfs_host01 check_reload_status[301]: Linkup starting lagg0.2535
      

      For each interface.

      Next OpenVPN server is going down after some CARP messages, I think repeatedly (and due to CARP failover?):

      Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:29 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup.
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Reloading filter
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version).
      Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: XMLRPC versioncheck: 21.7 -- 21.7
      Dec 10 09:33:29 pfs_host01 php-fpm[71411]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2535
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2516
      Dec 10 09:33:29 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.204
      Dec 10 09:33:30 pfs_host01 php-fpm[53770]: OpenVPN terminate old pid: 36104
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.16
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2534
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.118
      Dec 10 10:33:30 pfs_host01 xinetd[48241]: Starting reconfiguration
      Dec 10 10:33:30 pfs_host01 xinetd[48241]: Swapping defaults
      Dec 10 10:33:30 pfs_host01 xinetd[48241]: readjusting service 6969-udp
      Dec 10 10:33:30 pfs_host01 xinetd[48241]: readjusting service check_mk
      Dec 10 10:33:30 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services)
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2517
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2548
      Dec 10 10:33:30 pfs_host01 sshd[33502]: Fssh_packet_write_poll: Connection from user root x.y.z.26 port 36182: Permission denied
      Dec 10 10:33:30 pfs_host01 kernel: ovpns1: link state changed to DOWN
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2533
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 10:33:30 pfs_host01 kernel: carp: 70@lagg0.2548: BACKUP -> MASTER (master timed out)
      Dec 10 09:33:30 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2503
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.304
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.100
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2532
      Dec 10 09:33:31 pfs_host01 php-fpm[53770]: OpenVPN PID written: 40522
      Dec 10 09:33:31 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup.
      Dec 10 10:33:31 pfs_host01 kernel: ovpns1: link state changed to UP
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting ovpns1
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.101
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2546
      Dec 10 09:33:31 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.98
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2512
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2547
      Dec 10 09:33:32 pfs_host01 php-fpm[53770]: OpenVPN terminate old pid: 58864
      Dec 10 10:33:32 pfs_host01 kernel: ovpns2: link state changed to DOWN
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.201
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2544
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2529
      Dec 10 10:33:32 pfs_host01 kernel: ovpns2: link state changed to UP
      Dec 10 09:33:32 pfs_host01 check_reload_status[301]: rc.newwanip starting lagg0.2514
      Dec 10 09:33:33 pfs_host01 php-fpm[53770]: OpenVPN PID written: 71490
      Dec 10 09:33:33 pfs_host01 php-fpm[53770]: /rc.carpbackup: Stopping OpenVPN server instance on WAN because of transition to CARP backup.
      

      All the above just repeats and repeats for megabytes of logging. Then eventually when everything has come back to life the logging ends with this:

      Dec 10 11:38:01 pfs_host01 xinetd[48241]: Starting reconfiguration
      Dec 10 11:38:01 pfs_host01 xinetd[48241]: Swapping defaults
      Dec 10 11:38:01 pfs_host01 xinetd[48241]: readjusting service 6969-udp
      Dec 10 11:38:01 pfs_host01 xinetd[48241]: readjusting service check_mk
      Dec 10 11:38:01 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services)
      Dec 10 11:38:02 pfs_host01 kernel: carp: 71@lagg0.2549: BACKUP -> MASTER (preempting a slower master)
      Dec 10 10:38:02 pfs_host01 check_reload_status[301]: Carp master event
      Dec 10 10:40:27 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version).
      Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC versioncheck: 21.7 -- 21.7
      Dec 10 10:40:29 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Dec 10 10:40:34 pfs_host01 php-fpm[94311]: /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.restore_config_section).
      Dec 10 10:41:37 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:42:55 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:43:46 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:50:34 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:51:30 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:52:18 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:53:09 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:54:56 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:55:29 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:55:40 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:56:20 pfs_host01 check_reload_status[301]: Syncing firewall
      Dec 10 10:56:25 pfs_host01 check_reload_status[301]: Reloading filter
      Dec 10 11:56:27 pfs_host01 xinetd[48241]: Starting reconfiguration
      Dec 10 11:56:27 pfs_host01 xinetd[48241]: Swapping defaults
      Dec 10 11:56:27 pfs_host01 xinetd[48241]: readjusting service 6969-udp
      Dec 10 11:56:27 pfs_host01 xinetd[48241]: readjusting service check_mk
      Dec 10 11:56:27 pfs_host01 xinetd[48241]: Reconfigured: new=0 old=2 dropped=0 (services)
      

      I'm sorry if the logging isn't very useful but I'm really at a loss and I tried to gather what I could from the logging. Please ask if you like to know or see something else.

      I really appreciate all the help I can get.

      Thanks in advance!

      1 Reply Last reply Reply Quote 0
      • First post
        Last post