pfSense email flood issue

Samlink

@stephenw10 yes, my wan2 went down for some time, but it stayed down for at least 10 minutes. Usually I receive 1 email when this happens, but sometimes the emails does not stop until I reboot the firewall.

stephenw10

You you do see the gateway logs showing WAN2 going up and down numerous times when this happens? Does the system log show anything? Like maybe the link itself is flapping?

Steve

Samlink

@stephenw10 Sorry, didn't really understand the question the first time.
I have a bunch of "Link: DOWN" (15 in total) until it went up again:

May 12 11:56:21 link ppp[99924]: [opt1_link0] LCP: no reply to 1 echo request(s)       May 12 11:56:31 link ppp[99924]: [opt1_link0] LCP: no reply to 2 echo request(s)       May 12 11:56:41 link ppp[99924]: [opt1_link0] LCP: no reply to 3 echo request(s)       May 12 11:56:51 link ppp[99924]: [opt1_link0] LCP: no reply to 4 echo request(s)       May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: no reply to 5 echo request(s)       May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: peer not responding to echo requestsMay 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: state change Opened --> Stopping    May 12 11:57:01 link ppp[99924]: [opt1_link0] Link: Leave bundle "opt1"                May 12 11:57:01 link ppp[99924]: [opt1] Bundle: Status update: up 0 links, total bandwidth 9600 bps                                                                           May 12 11:57:01 link ppp[99924]: [opt1] IPCP: Close event                              May 12 11:57:01 link ppp[99924]: [opt1] IPCP: state change Opened --> Closing          May 12 11:57:01 link ppp[99924]: [opt1] IPCP: SendTerminateReq #4                      May 12 11:57:01 link ppp[99924]: [opt1] IPCP: LayerDown                                May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Down event                              May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Rename interface pppoe0 to pppoe0       May 12 11:57:01 link ppp[99924]: [opt1] IFACE: Set description "WAN2"                  May 12 11:57:01 link ppp[99924]: [opt1] IPCP: Down event                               May 12 11:57:01 link ppp[99924]: [opt1] IPCP: LayerFinish                              May 12 11:57:01 link ppp[99924]: [opt1] Bundle: No NCPs left. Closing links...         May 12 11:57:01 link ppp[99924]: [opt1] IPCP: state change Closing --> Initial         May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: SendTerminateReq #2                 May 12 11:57:01 link ppp[99924]: [opt1_link0] LCP: LayerDown                           May 12 11:57:03 link ppp[99924]: [opt1_link0] LCP: SendTerminateReq #3                 May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: state change Stopping --> Stopped   May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: LayerFinish                         May 12 11:57:05 link ppp[99924]: [opt1_link0] PPPoE: connection closed                 May 12 11:57:05 link ppp[99924]: [opt1_link0] Link: DOWN event                         May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: Down event                          May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: state change Stopped --> Starting   May 12 11:57:05 link ppp[99924]: [opt1_link0] LCP: LayerStart                          May 12 11:57:05 link ppp[99924]: [opt1_link0] Link: reconnection attempt 1 in 1 secondsMay 12 11:57:06 link ppp[99924]: [opt1_link0] Link: reconnection attempt 1             May 12 11:57:06 link ppp[99924]: [opt1_link0] PPPoE: Connecting to ''                  May 12 11:57:15 link ppp[99924]: [opt1_link0] PPPoE connection timeout after 9 seconds May 12 11:57:15 link ppp[99924]: [opt1_link0] Link: DOWN event                         May 12 11:57:15 link ppp[99924]: [opt1_link0] LCP: Down event                          May 12 11:57:15 link ppp[99924]: [opt1_link0] Link: reconnection attempt 2 in 3 secondsMay 12 11:57:18 link ppp[99924]: [opt1_link0] Link: reconnection attempt 2             May 12 11:57:18 link ppp[99924]: [opt1_link0] PPPoE: Connecting to ''                  May 12 11:57:27 link ppp[99924]: [opt1_link0] PPPoE connection timeout after 9 seconds May 12 11:57:27 link ppp[99924]: [opt1_link0] Link: DOWN event                         May 12 11:57:27 link ppp[99924]: [opt1_link0] LCP: Down event                          May 12 11:57:27 link ppp[99924]: [opt1_link0] Link: reconnection attempt 3 in 2 secondsMay 12 11:57:29 link ppp[99924]: [opt1_link0] Link: reconnection attempt 3

I don't think this is considered flapping, isn't it?

What's strange is that the lines in the email are repeating, the timestamps are not changing. Instead, new notifications are adding up.

stephenw10

Yes, I agree.

Do they just continually add up or is it sending you, say, all the alerts from the last 10mins each time?

Samlink

@stephenw10 they keep adding up, the last mail had:
-pppoe disconnected
-pppoe reconnected
-updated DNS record
-the start of the reboot (for stopping this email flood)
-the completion of the bootup

stephenw10

That's still all inside the last 10mins though. Does it eventually stop sending the oldest alerts? How long does that take?

Samlink

@stephenw10 I don't know, I had to restart it.
I'm trying to replicate manually what happened today to see if this behavior keeps happening.

Another thing I didn't mentioned before: I also have telegram notifications enabled, and they are working normally, they are not being replicated.

EDIT:
it's happening again: unplugged the ups (simulating a blackout), unplugged the wan2 eth, plugged in the ups after 15 seconds and then plugged in the wan2.
It's been 12 min and I received over 60 emails.

Is there anything I can look while the problem is ongoing?

stephenw10

Do you see those in /tmp/notices?

Do you see a huge list of alerts in the GUI?

Samlink

@stephenw10 sorry I had to reboot it and didn't check this. For a little over 20 min I received the same text 98 times. They kept adding up to the same emails.

Tomorrow I'll do another try and check the tmp folder as you suggested.

Samlink

@stephenw10
I didn't found any notices file(s) or directory under /tmp.
Attached are the tmp list during the problem, after ups and wan2 reconnection and after a complete reboot.
during.txt after_reboot.txt after.txt

DT 0

Similar problem here after 2.6.0 update.

Triggered by (NUT) UPS CyberPower on battery and/or DynDNS updated IP Address on WAN (pppoe0).

Sample Gateway Log Entries:
May 18 22:17:54 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:53 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:53 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:52 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:52 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:51 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:51 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:50 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:50 dpinger 26971 WAN_PPPOE : sendto error: 65
May 18 22:17:49 dpinger 26971 WAN_PPPOE : sendto error: 65
...and on and on.

Only a reboot will stop the emails.
Also, the continuous email notification "DynDNS updated IP Address on WAN (pppoe0)" is indicating a stale WAN IP, not the current/updated one.
Also, after the 2.6.0 update, I still require the patch in order for my Dynamic DNS Clients (No-IP) to update.
I have disabled notifications for the time being.
Thanks for your help.

Samlink

Hello, any update/advice on this issue?

DT 0

My issue was solved with a fresh install of 2.6.0 and a restore of the configuration. Only took a few minutes and all is well now. The re-install also gave me the opportunity to update the file system to zfs.

Samlink

@dt-0 thank you, definitely going to give it a try the next week

stephenw10

Unclear still if you're seeing the same alerts being continually sent or just a massive number of alerts.
Have you been able to check the dashboard when this happens yet?

Steve

Samlink

@stephenw10 I'm receiving the same alerts over and over. Even today, my pppoe went down only 1 time in 6 minutes, but I received the same email every 10 seconds or so.

stephenw10

Check the file: /var/db/notices_lastmsg.txt

That file should store the last message sent and prevent sending the same message twice.

Steve

Samlink

@stephenw10 at 7:09 I unplugged ups and wan2 and started flooding.
First email I received is from the UPS, followed by the wan2 down:

The file notices_lastmsg.txt is updated with:

MONITOR: WAN2_PPPOE has packet loss, omitting from routing group WAN_FAILOVER

The following minutes the same last email is getting repeated every 10 sec or so, here are the next 3:
[Edited]
(date in the email is the same 7:09:43, but I received them up until 7:13 before I rebooted).

During the flood, the file notices_lastmsg.txt didn't change unless something new happened, only then it changed with the new notification, like at 7:12, where I plugged back in the wan2:

Now the emails are being sent also with the new notifications, they are adding up.
Afterwards I rebooted and got the last email:

This is the content of notices_lastmsg.txt now:

7:09:43 MONITOR: WAN2_PPPOE has packet loss, omitting from routing group WAN_FAILOVER
142.250.184.99|xxx.xxx.xxx.xxx|WAN2_PPPOE|10.029ms|1.699ms|24%|down|highloss
7:12:47 MONITOR: WAN2_PPPOE is available now, adding to routing group WAN_FAILOVER
142.250.184.99|xxx.xxx.xxx.xxx|WAN2_PPPOE|11.602ms|0.092ms|0.0%|online|none
7:12:48 DynDNS updated IP Address on WAN2 (pppoe0) to xxx.xxx.xxx.xxx
7:14:01 pfSense is rebooting now.
7:15:18 Bootup complete

PS:
The ups came back online at 7:12:52, I've got only one email relative to it, ,in the middle of the other wan2 ones:

stephenw10

Hmm, this should prevent that happening unless the notices are sent as 'forced':
https://github.com/pfsense/pfsense/blob/RELENG_2_6_0/src/etc/inc/notices.inc#L334

Probably time to open a bug report and get some developer eyes on it. I failed to replicate it here though.

https://redmine.pfsense.org/

Steve

Samlink

@stephenw10 I'll briefly explain my setup and how I can replicate the problem.
-2 PPPoE WANs (tier1 and tier2).
-One ups (by cyberpower, I don't think this matter but the user @DT-0 happened to also have one).

Unplug the ups , after about 5 seconds unplug the wan2. Doing so I'm able to replicate the email flood problem.

I'm going to open a bug report for now, thanks