Netgate Discussion Forum

    VIP works then fails after upgrade

    2.0-RC Snapshot Feedback and Problems - RETIRED
      dmmincrjr

      I upgraded to 2.0-RC1 on Thursday, and the upgrade seemed to go fine. About 4 hours later I noticed that I could not reach my mail server, which sits on the DMZ, from the WAN interface. I have a VIP set up as Proxy ARP and forward the ports the mail server requires. This worked without issue on 1.2.3.

      When I try to connect to the mail server from the WAN side, connections time out. I changed the rules to log packets to see what was wrong, but nothing seems to be reaching the firewall, as no packets are logged. Nothing in the system logs shows the VIP interface going down or having a problem.

      The only way to resolve it seems to be to add another VIP, which does not work either when I try to access it from the WAN. I then delete that VIP and switch the original VIP to IP Alias and back to Proxy ARP; usually, after doing this a few times, things work as they should. That lasts about 4 to 5 hours, and then it stops working again. I also have to turn off the OpenVPN service to restore the VIP.

      I have two WAN connections on different interfaces, and before upgrading I had them set up as a failover. Thinking this might be the issue, I removed the groups that were created in the routing section and deactivated the rules on the LAN that direct this traffic.

      I'm getting ready to move back to 1.2.3 by reinstalling, as I cannot keep fiddling with this every few hours, but I was hoping someone might have a suggestion to correct the problem. Even rebooting the firewall does not restore connectivity, and I have also upgraded to the latest release.

        eri--

        When the issue happens, check the system logs, run the command ps -ax | grep chop, and post the output here.

          dmmincrjr

          Here is the result of the ps command:

          $ ps -ax | grep chop
          24526  ??  Is    0:00.04 /usr/local/sbin/choparp xl1 auto 173.49.X.XX/32
          59516  ??  S      0:00.00 sh -c ps -ax | grep chop
          59628  ??  S      0:00.00 grep chop
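
The check above can be wrapped in a small script so that the grep process itself never shows up in the result. This is only a sketch: the function names `choparp_lines` and `choparp_status` are made up for illustration, and the interface name `xl1` is taken from the output posted in this thread.

```shell
#!/bin/sh
# Print only the choparp daemon lines from `ps -ax` output read on stdin.
# The bracket pattern '[c]hoparp' matches the text "choparp" but not the
# grep command line itself, which contains the literal "[c]hoparp".
choparp_lines() {
    grep '[c]hoparp'
}

# Report whether a proxy ARP daemon is present for one interface.
# $1 is the interface name; ps output is read on stdin.
choparp_status() {
    if choparp_lines | grep -q "choparp $1 "; then
        echo "choparp running on $1"
    else
        echo "choparp NOT running on $1"
    fi
}

# Live usage on the firewall (hypothetical invocation):
#   ps -ax | choparp_status xl1
```

Note that a running process only shows the daemon exists; the `Is` state in the posted output means it is an idle session leader, which by itself does not say whether it is still answering ARP requests.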

          Here are the last 15 lines from the system log. The connection probably went down at about 13:02.

          Apr 1 12:36:10 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 12:29:13 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 12:27:06 root: rc.update_bogons.sh is ending the update cycle.
          Apr 1 12:27:06 root: Bogons file downloaded: no changes.
          Apr 1 12:27:06 root: rc.update_bogons.sh is beginning the update cycle.
          Apr 1 12:05:54 check_reload_status: syncing firewall
          Apr 1 12:05:53 check_reload_status: reloading filter
          Apr 1 12:05:53 check_reload_status: syncing firewall
          Apr 1 12:05:49 check_reload_status: syncing firewall
          Apr 1 12:05:49 php: /pkg_mgr_install.php: Beginning package installation for File Manager.
          Apr 1 12:05:48 check_reload_status: syncing firewall
          Apr 1 12:05:48 check_reload_status: syncing firewall
          Apr 1 12:05:48 check_reload_status: syncing firewall
          Apr 1 12:02:13 kernel: xl1: tx underrun, increasing tx start threshold to 120 bytes
          Apr 1 12:02:13 kernel: xl1: transmission error: 90

          I am also seeing this in a packet capture and am not sure whether it means anything:

          13:21:51.596989 ARP, Request who-has 173.49.X.XX (00:0a:5e:05:6c:a1) tell 0.0.0.0, length 46
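
For context, `tell 0.0.0.0` in tcpdump output means the sender IP field of the ARP request is all zeros, which is the form RFC 5227 describes for an ARP probe. Whether that matters for this problem is not established in the thread. As a rough sketch, requests of this kind can be pulled out of a text capture like this; the function name `arp_probes` is hypothetical.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and print the timestamp and target IP
# of every ARP request whose sender address is 0.0.0.0.
# In tcpdump's line format the fields used here are:
#   $1 = timestamp, $5 = the IP address being asked about.
arp_probes() {
    awk '/ARP, Request who-has/ && /tell 0\.0\.0\.0,/ { print $1, $5 }'
}

# Live usage on the firewall (hypothetical invocation):
#   tcpdump -n -i xl1 arp | arp_probes
```

Comparing how often these requests arrive while the VIP works against how often they arrive while it is down might show whether the upstream device is still asking for the address and simply not getting an answer.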

          This was the system log while the connection was down.

          Apr 1 13:35:00 check_reload_status: reloading filter
          Apr 1 13:34:59 check_reload_status: syncing firewall
          Apr 1 13:34:32 check_reload_status: reloading filter
          Apr 1 13:34:29 check_reload_status: syncing firewall
          Apr 1 13:33:15 check_reload_status: reloading filter
          Apr 1 13:33:13 check_reload_status: syncing firewall
          Apr 1 13:32:24 check_reload_status: reloading filter
          Apr 1 13:32:22 check_reload_status: syncing firewall
          Apr 1 13:32:13 check_reload_status: syncing firewall
          Apr 1 13:31:54 check_reload_status: reloading filter
          Apr 1 13:31:54 check_reload_status: syncing firewall
          Apr 1 13:31:52 check_reload_status: syncing firewall
          Apr 1 13:31:16 check_reload_status: reloading filter
          Apr 1 13:31:16 check_reload_status: syncing firewall
          Apr 1 13:31:12 check_reload_status: syncing firewall
          Apr 1 13:30:41 check_reload_status: reloading filter
          Apr 1 13:30:38 check_reload_status: syncing firewall
          Apr 1 13:30:17 check_reload_status: reloading filter
          Apr 1 13:30:17 kernel: ovpns1: link state changed to DOWN
          Apr 1 13:29:14 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 13:29:14 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 13:25:16 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 13:23:43 dnsmasq[38066]: read /etc/hosts - 6 addresses
          Apr 1 13:22:27 kernel: xl1: promiscuous mode disabled
          Apr 1 13:21:48 kernel: xl1: promiscuous mode enabled
          Apr 1 13:17:17 kernel: xl1: promiscuous mode disabled
          Apr 1 13:17:13 kernel: xl1: promiscuous mode enabled
          Apr 1 12:36:10 dnsmasq[38066]: read /etc/hosts - 6 addresses
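
Most of the log above is filter reloads, so the kernel messages for the WAN NIC are easy to miss. The following is a minimal sketch for isolating them from log text in the format shown; the function name `kernel_events` is made up for illustration, and it assumes the timestamp occupies the first three fields as it does here.

```shell
#!/bin/sh
# Read system log text on stdin and print the time and message of every
# kernel line that belongs to one interface.
# $1 is the interface name, for example xl1.
# Expected line layout: Month Day HH:MM:SS kernel: <if>: message...
kernel_events() {
    awk -v tag="$1:" '
        $4 == "kernel:" && $5 == tag {
            msg = $6
            for (i = 7; i <= NF; i++) msg = msg " " $i
            print $3, msg
        }'
}

# Usage (hypothetical): paste or pipe the log text into the function, e.g.
#   kernel_events xl1 < saved_system_log.txt
```

On the log above this would leave only the promiscuous mode lines, which likely correspond to the packet captures being started and stopped, plus the earlier tx underrun and transmission error entries.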

            eri--

            Apr 1 12:02:13    kernel: xl1: tx underrun, increasing tx start threshold to 120 bytes
            Apr 1 12:02:13    kernel: xl1: transmission error: 90

            This looks like a driver issue.
            Can you change your NICs easily?

              dmmincrjr

              I cannot change NICs that easily, but I do not think that is the issue. That message appeared an hour or so before I lost connectivity, and searching the logs shows it has only appeared today; it did not appear before the other instances of the problem. The NIC is a 3Com 3c905C-TX Fast Etherlink XL, and I did not have this issue before upgrading to 2.0.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.