How to understand gateway logs for troubleshooting
-
Hi,
Any help would be greatly appreciated, as I am about to have an argument with my ISP about my connection stability.
My Setup:
- ADSL connection in AU.
- D-Link ADSL modem in bridged mode
- pfSense box with ADSL account details configured for PPPoE connection
- 1 x WAN NIC, 1 x LAN NIC
- Various other networking hardware on the LAN side for network distribution
My Issue:
I have been experiencing connection drops at random times of the day, for random lengths of time, sometimes multiple times an hour. When the issue occurs, I notice no internet activity on various computers, laptops, etc. I log into my pfSense box and see on the dashboard that the WAN interface is down. After several minutes it reconnects. The DSL signal on the modem doesn't appear to have been lost at the time (the lights indicate it has a DSL signal).
Logs for gateway for today only:
Jan 1 11:19:49 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 11:19:49 apinger: SIGHUP received, reloading configuration.
Jan 1 11:19:45 apinger: alarm canceled: WAN_PPPOE(...) *** loss ***
Jan 1 11:19:45 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 11:18:40 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 11:18:15 apinger: ALARM: WAN_PPPOE(...) *** loss ***
Jan 1 11:15:47 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 11:13:33 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 11:13:33 apinger: SIGHUP received, reloading configuration.
Jan 1 11:13:29 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 11:13:12 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 11:11:40 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 11:08:06 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 11:08:06 apinger: SIGHUP received, reloading configuration.
Jan 1 11:08:03 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 11:07:45 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 11:06:13 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:38:58 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 10:38:58 apinger: SIGHUP received, reloading configuration.
Jan 1 10:38:54 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 10:37:05 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 10:35:29 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:35:19 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 10:35:19 apinger: SIGHUP received, reloading configuration.
Jan 1 10:35:15 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 10:34:59 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 10:33:30 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:30:29 apinger: alarm canceled: WAN_PPPOE(...) *** delay ***
Jan 1 10:29:55 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:29:33 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 10:29:33 apinger: SIGHUP received, reloading configuration.
Jan 1 10:29:29 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 10:29:11 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 10:27:15 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:27:05 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 10:27:05 apinger: SIGHUP received, reloading configuration.
Jan 1 10:27:01 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 10:25:59 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 10:24:24 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 10:23:52 apinger: alarm canceled (config reload): WAN_PPPOE(...) *** delay ***
Jan 1 10:23:52 apinger: SIGHUP received, reloading configuration.
Jan 1 10:23:48 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
Jan 1 10:23:31 apinger: ALARM: WAN_PPPOE(...) *** down ***
Jan 1 10:21:52 apinger: ALARM: WAN_PPPOE(...) *** delay ***
Jan 1 09:41:43 apinger: alarm canceled: WAN_PPPOE(...) *** delay ***
Jan 1 09:41:35 apinger: ALARM: WAN_PPPOE(...) *** delay ***
NOTE: IP addresses have been changed to ...
I interpret these logs as meaning that the connection to either my ISP's authentication server or its gateway server is being lost (hence the issue is with them).
Any assistance with troubleshooting this before contacting my ISP would be greatly appreciated.
I am hoping to avoid the usual "it's your computer, it's your cables, it's the phone interfering, it's the modem".
-
Those log entries are all from the apinger service, which monitors the WAN connection. It pings the remote gateway at 1s intervals (by default) to ensure the WAN is still up and problem free. It raises an alarm if it sees packet loss or delayed responses beyond certain limits.
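To make the three alarm types concrete, here is a rough Python sketch of that kind of check. It is not apinger's actual code: the function name, the idea of a fixed window of samples and both threshold values are made up for illustration.

```python
def check_alarms(samples, delay_limit_ms=500.0, loss_limit_pct=20.0):
    """Return the set of alarms for a window of ping results.

    samples: round-trip times in ms, with None for a probe that got no reply.
    Both limits are hypothetical values, not apinger defaults.
    """
    samples = list(samples)
    if not samples:
        return set()

    answered = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(answered)) / len(samples)

    alarms = set()
    if not answered:
        # No reply at all in the window: treat the gateway as down.
        alarms.add("down")
    if loss_pct > loss_limit_pct:
        alarms.add("loss")
    if answered and sum(answered) / len(answered) > delay_limit_ms:
        alarms.add("delay")
    return alarms
```

A slow but working link would only ever raise "delay" here, which is why limits that are too tight for a given line can produce alarms without a real outage.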
The logs are showing both delay and packet loss. This could be the result of an actual disconnect, or it could be that your connection's normal latency and loss fall outside the default limits used by apinger, causing false alarms.
You can tune apinger or disable it entirely as a test.
Steve
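If you want to see how long each "down" alarm actually lasted, a small script can pair each ALARM line with the matching "alarm canceled" line. This is a minimal sketch in Python, assuming the log lines look exactly like the ones posted above. The syslog timestamps carry no year, so the `year` argument is just a placeholder, and `down_periods` is a name made up for this example.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Jan 1 11:18:40 apinger: ALARM: WAN_PPPOE(...) *** down ***
#   Jan 1 11:19:45 apinger: alarm canceled: WAN_PPPOE(...) *** down ***
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) apinger: "
    r"(?P<event>ALARM|alarm canceled)(?: \(config reload\))?: "
    r".*\*\*\* (?P<kind>\w+) \*\*\*"
)


def down_periods(log_text, year=2000):
    """Return (start, end, seconds) for each completed 'down' alarm."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["event"], m["kind"]))

    # The GUI shows newest entries first, so sort oldest first.
    events.sort(key=lambda e: e[0])

    periods, start = [], None
    for ts, event, kind in events:
        if kind != "down":
            continue
        if event == "ALARM":
            start = ts
        elif start is not None:
            periods.append((start, ts, (ts - start).total_seconds()))
            start = None
    return periods
```

Short "down" periods that clear by themselves point more towards tight monitoring limits or a congested line; long ones are better evidence of a real drop when talking to the ISP.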
-
Thanks for the info Steve.
So if I understand this correctly, my connection to the ISP may actually be OK, and when these alarms occur pfSense takes the link down to try to restart it, so to speak.
On another note, I have been doing some research on my ISP and there appear to be other people with a similar issue who are most likely not running a pfSense box. So it could still be either way, but for the time being I have put in a different modem that is not in bridge mode (the modem controls the net connection and the PPPoE auth).
This modem does keep some basic connection/re-connection counts, so this will help me narrow down the issue. Unfortunately it does not give the time of disconnection, but it is a start for me.
For future reference, where is the best source of info on how to tune apinger for when I go back to the previous setup?
cheers
Todd
-
Watch the RRD WAN quality graphs to get some idea of your ping times and packet loss rate.
In System: Routing: click the 'e' to edit your default gateway. Click advanced to access the apinger parameters. Try increasing them if they are below values you regularly see on your WAN.
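As a rough way to judge whether the limits are below what the line normally does, you could compare them against a batch of ping samples. The sketch below is only an illustration in Python: the function names, the use of the 95th percentile and the sample numbers are all assumptions, not anything pfSense or apinger provides.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare_to_limits(rtts_ms, delay_limit_ms, loss_limit_pct):
    """Compare observed ping results with configured alarm limits.

    rtts_ms: non-empty list of round-trip times in ms, None for a lost probe.
    """
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    p95 = percentile(answered, 95) if answered else None
    return {
        "p95_ms": p95,
        "loss_pct": loss_pct,
        # True means normal traffic already reaches the limit.
        "delay_limit_too_low": p95 is not None and p95 >= delay_limit_ms,
        "loss_limit_too_low": loss_pct >= loss_limit_pct,
    }


# Made-up samples: mostly 20 ms, some 300 ms spikes, two lost probes.
samples = [20] * 90 + [300] * 8 + [None] * 2
report = compare_to_limits(samples, delay_limit_ms=200, loss_limit_pct=10)
```

With these made-up numbers the delay limit would be flagged as too low while the loss limit would not, which is the situation where raising the latency thresholds makes sense.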
Alternatively you can disable apinger altogether as a test by checking 'Disable Gateway Monitoring'.
Steve
-
I've been experiencing the same issue with my cable modem connection over the past few months. I ended up increasing the gateway monitoring parameters that Stephen is talking about. Once in a while I'll still see some entries in the GW log, but I can live with it; before, it was 4-5 times a day.
On another note, apinger has become really sensitive since 2.1. Before I changed my parameters, a speedtest within my provider's network would trigger an alarm.
Stephen