WAN connection "freezing"



  • Hello everybody, I'd like to start off by saying that I have been very happy with pfSense, up until recently. I am using the VMware image on my server running VMware ESXi. The server has 2 physical NICs, one connected to my cable modem, and the other to a switch which serves up my home network. The pfSense VM is the only one with access to the WAN NIC.

    Now I've been having this issue with the WAN interface "freezing". By "freezing" I mean pfSense is showing an IP from my ISP, but nothing inside can connect to the outside world, or vice-versa. Now I have had this problem about once a month over the past 2 years, and I have always regained connectivity by "releasing" then "renewing" the connection. The server has been down for a few months because I moved and no decent ISPs service my apartment. Tuesday, however, I talked my mother into hosting it for me at her house (a trade-off for her half of the phone bill). Since then I've had this "freezing" issue once a day, and twice within an hour tonight. It has become a major source of frustration for the both of us, and I would really like to know how to resolve the issue. Can anyone give me some helpful advice on what to do?

    If additional information is necessary from the server, it'll have to wait until my mother renews the connection in the morning. I've provided about all I can think of. Also, I've been getting "Response error. Technical description: 502 Bad Gateway _ Response error, a bad response was received from another proxy server or the destination origin server" when checking my website from my phone, and my mother gets a DNS error when trying to view websites from behind the firewall.



  • @Vl4dim1r:

    The pfSense VM is the only one with access to the WAN NIC.

    By PCI passthrough?

    @Vl4dim1r:

    Now I've been having this issue with the WAN interface "freezing". By "freezing" I mean pfSense is showing an IP from my ISP, but nothing inside can connect to the outside world, or vice-versa.

    Can anything "inside" "connect" to pfSense (web GUI? or by SSH? or to get a ping response?)

    @Vl4dim1r:

    Now I have had this problem about once a month over the past 2 years, and I have always regained connectivity by "releasing" then "renewing" the connection.

    Your pfSense WAN link is PPP? DHCP? static IP?



  • My hardware doesn't support device passthrough. There are 2 networks set up through VMware, WAN and LAN, one for each NIC. pfSense is the only machine on the WAN network.

    Devices on the LAN side have access to all services on the network, including the webGUI and the VMware backend.

    The connection to the ISP is DHCP. The once-a-month freezes could have been my lease renewing with my ISP, though I keep the same IP.



  • @Vl4dim1r:

    The connection to the ISP is DHCP.

    Please post the output of the pfSense shell command```

    clog /var/log/system.log | grep dhclient

    
    @Vl4dim1r:
    
    > The once a month could have been my lease renewing from my ISP, though I keep the same IP.
    
    Lease renewal should normally be completed in well under a few seconds.
    
    What version of pfSense are you running?


  • I will post the output in a few hours when my mother wakes up and renews the lease. I've already woken her up once tonight, and I can't afford to drive out there and do it myself at the moment. If there are no additional comments, I'll edit the results into this post to avoid double-posting.

    
    $  clog /var/log/system.log | grep dhclient
    Jan  3 08:29:38 cerberus dhclient[268]: connection closed
    Jan  3 08:29:38 cerberus dhclient[268]: connection closed
    Jan  3 08:29:38 cerberus dhclient[268]: exiting.
    Jan  3 08:29:38 cerberus dhclient[268]: exiting.
    Jan  3 08:29:43 cerberus dhclient[2445]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  3 08:30:02 cerberus dhclient[2445]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 4
    Jan  3 08:30:02 cerberus dhclient[2445]: DHCPOFFER from 10.35.136.1
    Jan  3 08:30:04 cerberus dhclient[2445]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  3 08:30:05 cerberus dhclient[2445]: DHCPACK from 10.35.136.1
    Jan  3 08:30:05 cerberus dhclient[2445]: bound to 24.35.138.19 -- renewal in 302400 seconds.
    Jan  3 20:01:11 cerberus dhclient[2446]: connection closed
    Jan  3 20:01:11 cerberus dhclient[2446]: connection closed
    Jan  3 20:01:11 cerberus dhclient[2446]: exiting.
    Jan  3 20:01:11 cerberus dhclient[2446]: exiting.
    Jan  3 20:01:14 cerberus dhclient[839]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  3 20:01:15 cerberus dhclient[839]: DHCPACK from 10.35.136.1
    Jan  3 20:01:15 cerberus dhclient[839]: bound to 24.35.138.19 -- renewal in 302400 seconds.
    Jan  5 02:03:34 cerberus dhclient[840]: connection closed
    Jan  5 02:03:34 cerberus dhclient[840]: connection closed
    Jan  5 02:03:34 cerberus dhclient[840]: exiting.
    Jan  5 02:03:34 cerberus dhclient[840]: exiting.
    Jan  5 02:03:37 cerberus dhclient[25158]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  5 02:03:37 cerberus dhclient[25158]: DHCPACK from 10.35.136.1
    Jan  5 02:03:37 cerberus dhclient[25158]: bound to 24.35.138.19 -- renewal in 302400 seconds.
    Jan  5 11:26:27 cerberus dhclient[25159]: connection closed
    Jan  5 11:26:27 cerberus dhclient[25159]: connection closed
    Jan  5 11:26:27 cerberus dhclient[25159]: exiting.
    Jan  5 11:26:27 cerberus dhclient[25159]: exiting.
    Jan  5 11:26:35 cerberus dhclient[12954]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  5 11:26:35 cerberus dhclient[12954]: DHCPACK from 10.35.136.1
    Jan  5 11:26:35 cerberus dhclient[12954]: bound to 24.35.138.19 -- renewal in 302400 seconds.
    Jan  5 19:09:50 cerberus dhclient[12955]: connection closed
    Jan  5 19:09:50 cerberus dhclient[12955]: connection closed
    Jan  5 19:09:50 cerberus dhclient[12955]: exiting.
    Jan  5 19:09:50 cerberus dhclient[12955]: exiting.
    Jan  5 19:09:55 cerberus dhclient[52161]: DHCPREQUEST on em0 to 255.255.255.255 port 67
    Jan  5 19:09:55 cerberus dhclient[52161]: DHCPACK from 10.35.136.1
    Jan  5 19:09:55 cerberus dhclient[52161]: bound to 24.35.138.19 -- renewal in 302400 seconds.
    
    


  • Do those times when dhclient reports it is exiting all correspond to someone attempting to force DHCP lease renewal?

    If not, are there any entries in the pfSense system log at around those times that reference em0 (for example, link up or link down events)?

    Does your cable modem have any event logging facilities? If so, does it report anything "interesting" over the interval the pfSense WAN interface freezes?

    Does VMware report anything relating to the interface to the cable modem in those same intervals?

    When your WAN interface "freezes", can you access anything on the Internet from the pfSense console? Can you access the pfSense GUI over the pfSense LAN interface?



  • @wallabybob:

    Do those times when dhclient reports it is exiting all correspond to someone attempting to force DHCP lease renewal?

    Yes, they do correspond with someone forcing a release/renewal.

    @wallabybob:

    Does your cable modem have any event logging facilities? If so, does it report anything "interesting" over the interval the pfSense WAN interface freezes?

    The only thing I'm seeing in my cable modem log is my mother rebooting it. I'll have to wait until tomorrow to post a full log.

    @wallabybob:

    Does VMware report anything relating to the interface to the cable modem in those same intervals?

    I am seeing the same error for vmnic around the same times it lost connection, the lease renewed, and it lost connection again last night. Server time is behind 10 years, ahead 6 months, behind 2 days and ahead 5 hours and 9 minutes. I don't feel it's too important to work out the exact amount it's off by when the days/hours are what I'm concerned with.

    
    Lost network connectivity on virtual switch 
    "vSwitch1". Physical NIC vmnic1 is down. 
    Affected portgroups:"WAN", "WAN".
    error
    7/3/2003 5:09:47 AM
    
    Lost network connectivity on virtual switch 
    "vSwitch1". Physical NIC vmnic1 is down. 
    Affected portgroups:"WAN", "WAN".
    error
    7/3/2003 4:35:53 AM
    
    Lost network connectivity on virtual switch 
    "vSwitch1". Physical NIC vmnic1 is down. 
    Affected portgroups:"WAN", "WAN".
    error
    7/3/2003 3:46:36 AM
    
    ``` 
    
    @wallabybob:
    
    > When your WAN interface "freezes", can you access anything on the Internet from the pfSense console? Can you access the pfSense GUI over the pfSense LAN interface?
    
    I cannot access the Internet from the pfSense console. I can access the pfSense GUI over the pfSense LAN interface.


  • Server time is behind 10 years, ahead 6 months, behind 2 days and ahead 5 hours and 9 minutes. I don't feel it's too important to work out the exact amount it's off by when the days/hours are what I'm concerned with.

    I'm wondering if your problems with server time could be the issue:

    - You ask for an IP lease from your ISP's DHCP server at 2013-01-06 01:00 AM; the lease lasts for xxxx seconds.
    - Your server time shifts back 1 day.
    - pfSense thinks the lease does not have to be renewed for 1 day + xxxx seconds.
    - The DHCP lease expires after xxxx seconds and pfSense does not renew it ---> connection lost.



  • After attempting to decipher the log times, I corrected the time within VMware, so that might take care of it if you're right. The host's BIOS time is still way off, but I'll have no way of changing it for at least a week.

    Update: Well, time was not the issue. I had my mother correct the time in the BIOS, and the WAN connection has dropped twice today: at 10:36AM and just now at 3:38AM.



  • @Vl4dim1r:

    the WAN connection has dropped twice today. 10:36AM and just now at 3:38AM.

    Please post the extract from the system log for the times 10 minutes BEFORE WAN connection lost to 10 minutes AFTER WAN connection lost for both drops.
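
    For anyone wanting to pull such an extract without scrolling through the whole log, here is a rough Python sketch (not a pfSense tool). It assumes the log has already been dumped to plain text and that each line begins with the usual "Mon DD HH:MM:SS" syslog stamp, as in the dhclient output posted above. The function name lines_near, the year argument (syslog stamps carry no year) and the sample drop time are made up for illustration.

```python
from datetime import datetime, timedelta

def lines_near(log_lines, drop_time, year, window_minutes=10):
    """Return the log lines stamped within window_minutes of drop_time."""
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        # Syslog stamps such as "Jan  5 11:26:27" fill the first 15 characters.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if abs(stamp - drop_time) <= window:
            selected.append(line)
    return selected

# Sample lines taken from the dhclient output earlier in the thread.
sample = [
    "Jan  5 02:03:37 cerberus dhclient[25158]: DHCPACK from 10.35.136.1",
    "Jan  5 11:26:27 cerberus dhclient[25159]: exiting.",
    "Jan  5 11:26:35 cerberus dhclient[12954]: DHCPACK from 10.35.136.1",
    "Jan  5 19:09:50 cerberus dhclient[12955]: exiting.",
]

# Hypothetical drop time; the year 2013 is an assumption.
drop = datetime(2013, 1, 5, 11, 30)
for entry in lines_near(sample, drop, year=2013):
    print(entry)
```

    Run once per drop time; anything mentioning em0 in the selected lines is the interesting part.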



  • Sorry about taking so long to reply. I've been on a 12 hours/day, 6 days a week work schedule, and the connection is down every morning when I get home. It has started dropping sometime between 11:30pm and 11:50pm consistently every night; since I get home at 12am, it's somewhat difficult to check my logs. Is there some reason a large download on my mother's computer would cause the connection to drop? The times seem to coincide with a Windows update or anti-virus update on her system. I had her change the schedules, but something else might be updating. Also, when my cousin was staying at her house last week, it seemed to drop when he watched a large number of YouTube videos. Her desktop is the only personal computer on the network. I will try to post the log in the afternoon if I wake up with enough time before work to do so.



  • Did the BIOS time stay corrected? (I'm thinking CMOS battery)



  • Does the computer shut down or restart due to a crash? Check the uptime on the pfSense home page.

    It is not clear if "the connection is down" means the pfSense VM is down or the pfSense VM is running but is not communicating with the cable modem.



  • @MMacD:

    Did the BIOS time stay corrected? (I'm thinking CMOS battery)

    Yeah it stays corrected, the battery is about a year and a half old.

    @wallabybob:

    Does the computer shut down or restart due to a crash? Check the uptime on the pfSense home page.

    No, nothing shuts down, restarts, or crashes. Current uptime is 2 days, 04:39. The only reason the uptime isn't longer is that I restarted the pfSense VM after updating, which, I might add, was a mistake (the updating, that is). Now no new ports I forward (checked in "NAT" and "Rules") work. Anyway, that's a whole other issue. Uptime before the restart was about 2 weeks.

    @wallabybob:

    It is not clear if "the connection is down" means the pfSense VM is down or the pfSense VM is running but is not communicating with the cable modem.

    "Connection is down" means there is no communication between my network and the cable modem/Internet. The pfSense VM stays running.

    Is there a way I can set pfSense to auto-renew every few hours as a temporary fix? I was also thinking about getting a USB 3G modem for a backup connection.



  • You're getting a 7-day lease. It'll renew at 3.5 days. The fact that something stops working and requires a release/renew far more often than the lease length suggests an ISP problem. It's possible it could be some very complex issue involving your ESX host and cable modem, so I'd put a bare-metal physical firewall in place to get that complexity out of the picture before trying to convince your ISP there's something broken. Or put a network tap between your ESX host and cable modem and see what's happening on the wire at that level, but I'm guessing that's probably not an option.
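
    The lease figures can be checked against the dhclient log posted earlier in the thread. A minimal Python sketch, assuming the client renews at the default T1 point of half the lease time described in RFC 2131 (an ISP can configure a different T1, in which case the estimate is off). The function names are hypothetical.

```python
import re

SECONDS_PER_DAY = 86400

def renewal_seconds(log_line):
    """Pull N out of a dhclient 'renewal in N seconds' message, or return None."""
    match = re.search(r"renewal in (\d+) seconds", log_line)
    return int(match.group(1)) if match else None

def lease_seconds(renewal, t1_fraction=0.5):
    """Estimate the full lease, assuming renewal happens at t1_fraction of it."""
    return renewal / t1_fraction

# Line copied from the dhclient output posted earlier in the thread.
line = ("Jan  5 19:09:55 cerberus dhclient[52161]: "
        "bound to 24.35.138.19 -- renewal in 302400 seconds.")

renewal = renewal_seconds(line)
lease = lease_seconds(renewal)
print(renewal / SECONDS_PER_DAY, "days until renewal")
print(lease / SECONDS_PER_DAY, "days estimated lease")
```

    With 302400 seconds this gives 3.5 days to renewal and an estimated 7-day lease, while the drops in this thread happen about once a day or more.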



  • What is your Monitoring IP set to? I had mine set to monitor the router IP address, and for some reason the WAN wouldn't renew even when it reported the link as down. I changed it to something like 8.8.8.8, which so far has kept my connection working through outages. My ISP was working on their infrastructure, and I wouldn't get the same router address when their systems came back up, which forced my IP address to change completely. For some reason this caused the WAN port to not renew its IP address, and I'd have to come in and manually release/renew it. I haven't seen the problem occur again so far, so changing the Monitoring IP may have helped. The setting is under System->Routing->Gateways.

