VIP works then fails after upgrade

dmmincrjr

I upgraded to 2.0-RC1 on Thursday and the upgrade seemed to go fine. About 4 hours later I noticed I had no connectivity from the WAN interface to my mail server, which sits on the DMZ. I have a VIP set up as Proxy ARP and forward the ports the mail server requires to it. This worked without issue on 1.2.3.

When I try to connect to the mail server from the WAN, the connections time out. I changed the rules to log packets to see what was wrong, but nothing seems to reach the firewall at all, as no packets are logged. There is nothing in the system logs showing the VIP interface going down or having a problem.

The only way to resolve it seems to be to add another VIP, which also cannot be reached from the WAN. I then delete that VIP and switch the original VIP to IP Alias and back to Proxy ARP; usually after doing this a few times things work as they should. This then lasts about 4 to 5 hours before it stops working again. I also have to turn off the OpenVPN service to restore the VIP.

I have two WAN connections on different interfaces and, before upgrading, had them set up as a failover. Thinking this might be the cause, I removed the groups I had created in the routing section and deactivated the LAN rules that directed this traffic.

I'm getting ready to go back to 1.2.3 by reinstalling, as I cannot keep fiddling with this every few hours, but I was hoping someone might have a suggestion to fix the problem. Even rebooting the firewall does not restore connectivity, and I have also upgraded to the latest release.

eri--

When the issue happens, check the system logs, run the command ps -ax | grep chop, and post the output here.

dmmincrjr

Here is the output of the ps command:

$ ps -ax | grep chop
24526 ?? Is 0:00.04 /usr/local/sbin/choparp xl1 auto 173.49.X.XX/32
59516 ?? S 0:00.00 sh -c ps -ax | grep chop
59628 ?? S 0:00.00 grep chop
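The ps output above shows choparp still answering for the VIP on xl1. Because the failure recurs every few hours, a small watcher that writes a syslog entry whenever no choparp process is bound to the interface can help line up the failure time with the rest of the system log. This is a minimal sketch, not part of pfSense: the helper names choparp_running and watch_choparp and the 60-second polling interval are hypothetical choices, and it assumes the stock FreeBSD ps, grep, logger and sleep utilities.

```shell
#!/bin/sh
# Minimal sketch, not part of pfSense: choparp_running, watch_choparp and
# the 60 s polling interval are hypothetical choices for illustration.

# Succeed if `ps -ax` output on stdin contains a choparp process bound to
# the interface given as $1 (e.g. "xl1", as in the ps output above).
choparp_running() {
    # The [c] bracket stops grep from matching its own command line
    # when ps output is piped straight into it.
    grep -q "[c]hoparp $1 "
}

# Poll forever; log to syslog whenever choparp for $1 is missing, so the
# timestamp can be compared with the other system log entries.
watch_choparp() {
    iface="$1"
    while :; do
        if ! ps -ax | choparp_running "$iface"; then
            logger -t choparp-watch "no choparp process for $iface"
        fi
        sleep 60
    done
}
```

Sourcing the file and running watch_choparp xl1 & in the background would leave a choparp-watch entry in the log for each minute in which the process was absent.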

Here are the last 15 lines from the system log. The connection probably went down around 13:02.

Apr 1 12:36:10 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 12:29:13 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 12:27:06 root: rc.update_bogons.sh is ending the update cycle.
Apr 1 12:27:06 root: Bogons file downloaded: no changes.
Apr 1 12:27:06 root: rc.update_bogons.sh is beginning the update cycle.
Apr 1 12:05:54 check_reload_status: syncing firewall
Apr 1 12:05:53 check_reload_status: reloading filter
Apr 1 12:05:53 check_reload_status: syncing firewall
Apr 1 12:05:49 check_reload_status: syncing firewall
Apr 1 12:05:49 php: /pkg_mgr_install.php: Beginning package installation for File Manager.
Apr 1 12:05:48 check_reload_status: syncing firewall
Apr 1 12:05:48 check_reload_status: syncing firewall
Apr 1 12:05:48 check_reload_status: syncing firewall
Apr 1 12:02:13 kernel: xl1: tx underrun, increasing tx start threshold to 120 bytes
Apr 1 12:02:13 kernel: xl1: transmission error: 90

I am also seeing this in a packet capture and am not sure whether it means anything:

13:21:51.596989 ARP, Request who-has 173.49.X.XX (00:0a:5e:05:6c:a1) tell 0.0.0.0, length 46
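In tcpdump's ARP output, the address after "tell" is the sender's IP. A sender IP of 0.0.0.0 means the asking host claims no address, which is the form of an ARP probe (RFC 5227) used to check whether an address is already in use, rather than an ordinary request. The sketch below labels and counts capture lines by kind; arp_kind and arp_summary are hypothetical helper names, and the matching assumes tcpdump's default text format as seen in the line above.

```shell
#!/bin/sh
# Minimal sketch: arp_kind and arp_summary are hypothetical helpers that
# match tcpdump's default ARP text output (the format in the capture above).

# Print "probe", "request", "reply" or "other" for one capture line.
arp_kind() {
    case "$1" in
        # Sender IP 0.0.0.0: the asker claims no address (ARP probe form).
        *"ARP, Request who-has "*" tell 0.0.0.0,"*) echo probe ;;
        *"ARP, Request who-has "*) echo request ;;
        *"ARP, Reply "*) echo reply ;;
        *) echo other ;;
    esac
}

# Read capture lines on stdin and print a count per kind.
arp_summary() {
    while IFS= read -r line; do
        arp_kind "$line"
    done | sort | uniq -c
}
```

For example, tcpdump -ni xl1 -c 200 arp | arp_summary would stop after 200 ARP packets on xl1 and show how many were probes versus ordinary requests and replies.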

Here is the system log from while the connection was down:

Apr 1 13:35:00 check_reload_status: reloading filter
Apr 1 13:34:59 check_reload_status: syncing firewall
Apr 1 13:34:32 check_reload_status: reloading filter
Apr 1 13:34:29 check_reload_status: syncing firewall
Apr 1 13:33:15 check_reload_status: reloading filter
Apr 1 13:33:13 check_reload_status: syncing firewall
Apr 1 13:32:24 check_reload_status: reloading filter
Apr 1 13:32:22 check_reload_status: syncing firewall
Apr 1 13:32:13 check_reload_status: syncing firewall
Apr 1 13:31:54 check_reload_status: reloading filter
Apr 1 13:31:54 check_reload_status: syncing firewall
Apr 1 13:31:52 check_reload_status: syncing firewall
Apr 1 13:31:16 check_reload_status: reloading filter
Apr 1 13:31:16 check_reload_status: syncing firewall
Apr 1 13:31:12 check_reload_status: syncing firewall
Apr 1 13:30:41 check_reload_status: reloading filter
Apr 1 13:30:38 check_reload_status: syncing firewall
Apr 1 13:30:17 check_reload_status: reloading filter
Apr 1 13:30:17 kernel: ovpns1: link state changed to DOWN
Apr 1 13:29:14 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 13:29:14 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 13:25:16 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 13:23:43 dnsmasq[38066]: read /etc/hosts - 6 addresses
Apr 1 13:22:27 kernel: xl1: promiscuous mode disabled
Apr 1 13:21:48 kernel: xl1: promiscuous mode enabled
Apr 1 13:17:17 kernel: xl1: promiscuous mode disabled
Apr 1 13:17:13 kernel: xl1: promiscuous mode enabled
Apr 1 12:36:10 dnsmasq[38066]: read /etc/hosts - 6 addresses
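Most of the log above is check_reload_status and dnsmasq noise; the entries that bear on this problem are the kernel messages for xl1 (the interface choparp serves) and ovpns1 (OpenVPN). A minimal sketch of a filter that keeps only those lines, plus any choparp lines, so the timeline is easier to read; interface_events is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: interface_events is a hypothetical filter. It keeps only
# log lines on stdin that mention choparp or one of the interface names
# given as arguments, matched as "name:" the way kernel messages print them.
interface_events() {
    pattern="choparp"
    for iface in "$@"; do
        pattern="$pattern|$iface:"
    done
    grep -E "$pattern"
}
```

Assuming the system log is a circular clog file, as on pfSense 2.x, something like clog /var/log/system.log | interface_events xl1 ovpns1 would list only the interface events, such as the promiscuous-mode changes on xl1 and the ovpns1 link-down at 13:30:17.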

eri--

Apr 1 12:02:13 kernel: xl1: tx underrun, increasing tx start threshold to 120 bytes
Apr 1 12:02:13 kernel: xl1: transmission error: 90

Seems like a driver issue.
Can you easily change your NICs?

dmmincrjr

I cannot change NICs that easily, but I do not think that is the issue. That message appeared an hour or so before I lost connectivity, and searching the logs shows it has only appeared today; it did not appear before any of the earlier instances of the problem. The NIC is a 3Com 3c905C-TX Fast Etherlink XL, and I did not have this issue before upgrading to 2.0.