Just happened again. Devices using the native WAN interface as a Gateway stay unaffected.
Logs (System --> General) show ntopng crashing:
May 7 17:38:10 kernel pid 15404 (ntopng), uid 0: exited on signal 11 (core dumped)
May 7 17:38:10 kernel igb2: promiscuous mode disabled
May 7 17:38:10 kernel igb3: promiscuous mode disabled
May 7 17:38:32 ntopng [HTTPserver.cpp:924] ERROR: [HTTP] set_ports_option: cannot bind to 3000s: Address already in use
May 7 17:38:32 ntopng [mongoose.c:4584] ERROR: set_ports_option: cannot bind to 3000s: No error: 0
May 7 17:38:32 ntopng [HTTPserver.cpp:1104] ERROR: Unable to start HTTP server (IPv4) on ports 3000s
May 7 17:38:32 ntopng [HTTPserver.cpp:1110] ERROR: Either port in use or another ntopng instance is running (using the same port)
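The "Address already in use" error suggests that something was still holding port 3000 when ntopng tried to restart 22 seconds after the crash. A quick way to confirm this is to attempt a bind on the same port. This is a minimal sketch, not part of ntopng or pfSense: the helper name `port_in_use` is hypothetical, and it assumes Python is available on the box.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other failure (e.g. permissions) is not what we are testing for.
            raise
    # The bind succeeded, so nothing else was holding the port.
    return False
```

Calling `port_in_use(3000)` right after a crash would show whether an old ntopng instance (or anything else) is still bound to the web UI port.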
Logs (System --> Gateways):
May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "WAN "
May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "VPN1 "
May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "SITETOSITE1 "
May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "SITETOSITE2 "
May 7 17:37:55 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr REMOVED bind_addr REMOVED identifier "VPN2 "
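All five dpinger entries share the same timestamp and the same settings, and differ only in the identifier. To compare entries like these without reading them by eye, the lines can be split into key/value pairs. This is only a sketch based on the format pasted above: `parse_dpinger` is a hypothetical helper, not something shipped with dpinger or pfSense.

```python
import re

# A "key value" pair; the value is either a quoted string or a run of non-space characters.
PAIR = re.compile(r'(\w+) ("[^"]*"|\S+)')


def parse_dpinger(line: str) -> dict:
    """Split one dpinger log line (format as pasted above) into a dict of settings."""
    # Everything after the process name holds the parameters.
    _, _, params = line.partition("dpinger ")
    # Strip surrounding quotes and padding, e.g. '"WAN "' becomes 'WAN'.
    return {key: value.strip('" ') for key, value in PAIR.findall(params)}
```

Parsing each line and comparing the dicts with the `identifier` key removed would confirm that every gateway monitor was restarted with identical parameters.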
The values marked REMOVED were edited by me.
Edit: ntopng is the problem. Every time I restart a gateway tunnel, ntopng crashes and NAT stops working.
Here is what the ntopng logs are filled with:
[Mutex.cpp:46] WARNING: pthread_mutex_lock() returned 11 [Resource deadlock avoided][errno=0]
RAM had ~1600M free, so it was not running out of memory. As I said, CPU was at 100% on one of four cores when this happened.
I uninstalled ntopng for now as it was unusable.
Edit 2: Totally not fixed. It seems to happen when I restart VPN2, though I think not every time. The WAN and VPN1 gateways always register as Down in Status --> Gateways even when they are up. So ntopng is not the problem!
VPN2 has a NAT port forward rule with its corresponding Firewall rule. I will try disabling that to see if anything changes, then investigate more and report back.
Edit 3: Seems to be fixed by selecting System --> Advanced --> Misc --> Reset states on Gateway down. I also had to set the VPN1 gateway explicitly as the Gateway in the LAN Firewall Rules, as it still would not work with the Gateway left at default. I would appreciate input from someone on whether this is correct.