pfSense email flood issue
-
@stephenw10 yes, my wan2 went down for some time, but it stayed down for at least 10 minutes. Usually I receive 1 email when this happens, but sometimes the emails does not stop until I reboot the firewall.
-
You you do see the gateway logs showing WAN2 going up and down numerous times when this happens? Does the system log show anything? Like maybe the link itself is flapping?
Steve
-
@stephenw10 Sorry, didn't really understand the question the first time.
I have a bunch of "Link: DOWN" (15 in total) until it went up again:May 12 11:56:21 link ppp[99924]: [opt1_link0] LCP: no reply to 1 echo request(s) May 12 11:56:31 link ppp[99924]: [opt1_link0] LCP: no reply to 2 echo request(s) May 12 11:56:41 link ppp[99924]: [opt1_link0] LCP: no reply to 3 echo request(s) May 12 11:56:51 link ppp[99924]: [opt1_link0] LCP: no reply to 4 echo request(s) May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: no reply to 5 echo request(s) May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: peer not responding to echo requestsMay 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: state change Opened --> Stopping May 12 11:57:01 link ppp[99924]: [opt1_link0] Link: Leave bundle "opt1" May 12 11:57:01 link ppp[99924]: [opt1] Bundle: Status update: up 0 links, total bandwidth 9600 bps May 12 11:57:01 link ppp[99924]: [opt1] IPCP: Close event May 12 11:57:01 link ppp[99924]: [opt1] IPCP: state change Opened --> Closing May 12 11:57:01 link ppp[99924]: [opt1] IPCP: SendTerminateReq #4 May 12 11:57:01 link ppp[99924]: [opt1] IPCP: LayerDown May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Down event May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Rename interface pppoe0 to pppoe0 May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Set description "WAN2" May 12 11:57:01 link ppp[99924]: [opt1] IPCP: Down event May 12 11:57:01 link ppp[99924]: [opt1] IPCP: LayerFinish May 12 11:57:01 link ppp[99924]: [opt1] Bundle: No NCPs left. Closing links... May 12 11:57:01 link ppp[99924]: [opt1] IPCP: state change Closing --> Initial May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: SendTerminateReq #2 May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: LayerDown May 12 11:57:03 link ppp[99924]: [opt1_link0] LCP: SendTerminateReq #3 May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: state change Stopping --> Stopped May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: LayerFinish May 12 11:57:05 link ppp[99924]: [opt1_link0] PPPoE: connection closed May 12 11:57:05 link ppp[99924]: [opt1_link0] Link: DOWN event May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: Down event May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: state change Stopped --> Starting May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: LayerStart May 12 11:57:05 link ppp[99924]: [opt1_link0] Link: reconnection attempt 1 in 1 secondsMay 12 11:57:06 link ppp[99924]: [opt1_link0] Link: reconnection attempt 1 May 12 11:57:06 link ppp[99924]: [opt1_link0] PPPoE: Connecting to '' May 12 11:57:15 link ppp[99924]: [opt1_link0] PPPoE connection timeout after 9 seconds May 12 11:57:15 link ppp[99924]: [opt1_link0] Link: DOWN event May 12 11:57:15 link ppp[99924]: [opt1_link0] LCP: Down event May 12 11:57:15 link ppp[99924]: [opt1_link0] Link: reconnection attempt 2 in 3 secondsMay 12 11:57:18 link ppp[99924]: [opt1_link0] Link: reconnection attempt 2 May 12 11:57:18 link ppp[99924]: [opt1_link0] PPPoE: Connecting to '' May 12 11:57:27 link ppp[99924]: [opt1_link0] PPPoE connection timeout after 9 seconds May 12 11:57:27 link ppp[99924]: [opt1_link0] Link: DOWN event May 12 11:57:27 link ppp[99924]: [opt1_link0] LCP: Down event May 12 11:57:27 link ppp[99924]: [opt1_link0] Link: reconnection attempt 3 in 2 secondsMay 12 11:57:29 link ppp[99924]: [opt1_link0] Link: reconnection attempt 3
I don't think this is considered flapping, isn't it?
What's strange is that the lines in the email are repeating, the timestamps are not changing. Instead, new notifications are adding up.
-
Yes, I agree.
Do they just continually add up or is it sending you, say, all the alerts from the last 10mins each time?
-
@stephenw10 they keep adding up, the last mail had:
-pppoe disconnected
-pppoe reconnected
-updated DNS record
-the start of the reboot (for stopping this email flood)
-the completion of the bootup -
That's still all inside the last 10mins though. Does it eventually stop sending the oldest alerts? How long does that take?
-
@stephenw10 I don't know, I had to restart it.
I'm trying to replicate manually what happened today to see if this behavior keeps happening.Another thing I didn't mentioned before: I also have telegram notifications enabled, and they are working normally, they are not being replicated.
EDIT:
it's happening again: unplugged the ups (simulating a blackout), unplugged the wan2 eth, plugged in the ups after 15 seconds and then plugged in the wan2.
It's been 12 min and I received over 60 emails.Is there anything I can look while the problem is ongoing?
-
Do you see those in /tmp/notices?
Do you see a huge list of alerts in the GUI?
-
@stephenw10 sorry I had to reboot it and didn't check this. For a little over 20 min I received the same text 98 times. They kept adding up to the same emails.
Tomorrow I'll do another try and check the tmp folder as you suggested.
-
@stephenw10
I didn't found any notices file(s) or directory under /tmp.
Attached are the tmp list during the problem, after ups and wan2 reconnection and after a complete reboot.
during.txt after_reboot.txt after.txt -
Similar problem here after 2.6.0 update.
Triggered by (NUT) UPS CyberPower on battery and/or DynDNS updated IP Address on WAN (pppoe0).
Sample Gateway Log Entries:
May 18 22:17:54 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:53 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:53 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:52 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:52 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:51 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:51 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:50 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:50 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:49 dpinger 26971 WAN_PPPOE : sendto error: 65
...and on and on.Only a reboot will stop the emails.
Also, the continuous email notification "DynDNS updated IP Address on WAN (pppoe0)" is indicating a stale WAN IP, not the current/updated one.
Also, after the 2.6.0 update, I still require the patch in order for my Dynamic DNS Clients (No-IP) to update.
I have disabled notifications for the time being.
Thanks for your help. -
Hello, any update/advice on this issue?
-
My issue was solved with a fresh install of 2.6.0 and a restore of the configuration. Only took a few minutes and all is well now. The re-install also gave me the opportunity to update the file system to zfs.
-
@dt-0 thank you, definitely going to give it a try the next week
-
Unclear still if you're seeing the same alerts being continually sent or just a massive number of alerts.
Have you been able to check the dashboard when this happens yet?Steve
-
@stephenw10 I'm receiving the same alerts over and over. Even today, my pppoe went down only 1 time in 6 minutes, but I received the same email every 10 seconds or so.
-
Check the file: /var/db/notices_lastmsg.txt
That file should store the last message sent and prevent sending the same message twice.
Steve
-
@stephenw10 at 7:09 I unplugged ups and wan2 and started flooding.
First email I received is from the UPS, followed by the wan2 down:
The file notices_lastmsg.txt is updated with:MONITOR: WAN2_PPPOE has packet loss, omitting from routing group WAN_FAILOVER
The following minutes the same last email is getting repeated every 10 sec or so, here are the next 3:
[Edited]
(date in the email is the same 7:09:43, but I received them up until 7:13 before I rebooted).During the flood, the file notices_lastmsg.txt didn't change unless something new happened, only then it changed with the new notification, like at 7:12, where I plugged back in the wan2:
Now the emails are being sent also with the new notifications, they are adding up.
Afterwards I rebooted and got the last email:
This is the content of notices_lastmsg.txt now:7:09:43 MONITOR: WAN2_PPPOE has packet loss, omitting from routing group WAN_FAILOVER 142.250.184.99|xxx.xxx.xxx.xxx|WAN2_PPPOE|10.029ms|1.699ms|24%|down|highloss 7:12:47 MONITOR: WAN2_PPPOE is available now, adding to routing group WAN_FAILOVER 142.250.184.99|xxx.xxx.xxx.xxx|WAN2_PPPOE|11.602ms|0.092ms|0.0%|online|none 7:12:48 DynDNS updated IP Address on WAN2 (pppoe0) to xxx.xxx.xxx.xxx 7:14:01 pfSense is rebooting now. 7:15:18 Bootup complete
PS:
The ups came back online at 7:12:52, I've got only one email relative to it, ,in the middle of the other wan2 ones:
-
Hmm, this should prevent that happening unless the notices are sent as 'forced':
https://github.com/pfsense/pfsense/blob/RELENG_2_6_0/src/etc/inc/notices.inc#L334Probably time to open a bug report and get some developer eyes on it. I failed to replicate it here though.
Steve
-
@stephenw10 I'll briefly explain my setup and how I can replicate the problem.
-2 PPPoE WANs (tier1 and tier2).
-One ups (by cyberpower, I don't think this matter but the user @DT-0 happened to also have one).Unplug the ups , after about 5 seconds unplug the wan2. Doing so I'm able to replicate the email flood problem.
I'm going to open a bug report for now, thanks