TNSR as NAT: problem with setting up a forward rule
-
I set up TNSR as a router and so far I'm having some fun with it. Unfortunately, I ran into a problem.
When I first set up a NAT rule with "nat static mapping tcp local 192.168.2.253 8443 external WAN 443 out-to-in-only" everything works fine (even the rule works)... until I reboot.
After a reboot (with or without saving the configuration) I lose my complete configuration and nothing works anymore. Even the interfaces aren't set up anymore. With other NAT commands I get CLI errors after rebooting. Luckily I can just restore the VM snapshot to get back to the previous configuration.
Does anyone have an idea how to debug this or what could be wrong?
Thanks
-
Without the logs it's difficult to say what might be happening. If you can reproduce it that easily, next time check the output of
sudo journalctl -xleu clixon-backend
and scroll up a bit and see what it logged as the reason it couldn't load the configuration. You might also look at
sudo journalctl -xleu vpp
but the error is more likely to be in clixon-backend than in VPP.
-
Something else I thought of, is your WAN static or DHCP?
It's possible it is choking on the rule if your WAN is DHCP and it doesn't have an address yet when the configuration gets loaded.
I have static NAT rules here in my edge TNSR that work fine, but they have my static external address hardcoded. That said, I'm also not using out-to-in-only, so it's possible, though unlikely, that it is a factor as well.
-
Thank you Jimp!
clixon shows:
Jun 16 22:51:26 tnsrrouter clixon_backend[1180]: VPP stop detected
Jun 16 22:51:29 tnsrrouter clixon_backend[1180]: VPP start detected
Jun 16 22:51:32 tnsrrouter clixon_backend[1180]: VPP stop detected
Jun 16 22:51:35 tnsrrouter clixon_backend[1180]: VPP start detected
Jun 16 22:51:38 tnsrrouter clixon_backend[1180]: VPP stop detected
Jun 16 22:51:41 tnsrrouter clixon_backend[1180]: VPP start detected
Jun 16 22:51:44 tnsrrouter clixon_backend[1180]: VPP stop detected
Jun 16 22:51:47 tnsrrouter clixon_backend[1180]: VPP start detected
and vpp shows:
Jun 16 22:53:07 tnsrrouter vnet[2850]: #1 0x00007f743eb9a3c0 0x7f743eb9a3c0
Jun 16 22:53:07 tnsrrouter vnet[2850]: #2 0x00007f743eee3912 ip4_interface_first_address + 0x22
Jun 16 22:53:07 tnsrrouter vnet[2850]: #3 0x00007f73fa793d38 0x7f73fa793d38
Jun 16 22:53:07 tnsrrouter vnet[2850]: #4 0x00007f73fa794c47 0x7f73fa794c47
Jun 16 22:53:07 tnsrrouter vnet[2850]: #5 0x00007f743eee40f2 0x7f743eee40f2
Jun 16 22:53:07 tnsrrouter vnet[2850]: #6 0x00007f73faa465d1 0x7f73faa465d1
Jun 16 22:53:07 tnsrrouter vnet[2850]: #7 0x00007f73faa4a81f 0x7f73faa4a81f
Jun 16 22:53:07 tnsrrouter vnet[2850]: #8 0x00007f73faada8ee 0x7f73faada8ee
Jun 16 22:53:07 tnsrrouter vnet[2850]: #9 0x00007f73faa71d0e 0x7f73faa71d0e
Jun 16 22:53:07 tnsrrouter vnet[2850]: #10 0x00007f73faad75c3 nl_cache_parse + 0x63
Jun 16 22:53:07 tnsrrouter vnet[2850]: #11 0x00007f73faadc2cb nl_msg_parse + 0x7b
Jun 16 22:53:07 tnsrrouter vnet[2850]: #12 0x00007f73faa4cdb1 0x7f73faa4cdb1
Jun 16 22:53:07 tnsrrouter vnet[2850]: #13 0x00007f743ebe4f1b 0x7f743ebe4f1b
Jun 16 22:53:07 tnsrrouter vnet[2850]: #14 0x00007f743eb7331c 0x7f743eb7331c
Jun 16 22:53:07 tnsrrouter systemd-coredump[2914]: Process 2850 (vpp_main) of user 0 dumped core.
-- Subject: Process 2850 (vpp_main) dumped core
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- Documentation: man:core(5)
-- Process 2850 (vpp_main) crashed and dumped core.
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Jun 16 22:53:08 tnsrrouter systemd[1]: vpp.service: Main process exited, code=dumped, status=6/ABRT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- An ExecStart= process belonging to unit vpp.service has exited.
-- The process' exit code is 'dumped' and its exit status is 6.
Jun 16 22:53:08 tnsrrouter echo[2929]: VPP stopped, modifying TNSR startup mode
Jun 16 22:53:08 tnsrrouter echo[2930]: TNSR startup mode switch : using running DB
Jun 16 22:53:08 tnsrrouter systemd[1]: vpp.service: Failed with result 'core-dump'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- The unit vpp.service has entered the 'failed' state with result 'core-dump'.
I guess that is where the problem lies. Yes, my external interface is on DHCP. That's something I can't really change. As mentioned, if I try other NAT rules (for example without out-to-in-only) I get different effects. I can't even run CLI commands anymore.
Thank you,
Alex
-
I wouldn't expect VPP to crash like that from a NAT rule, that is a bit odd.
If you scroll up (up arrow) in the journalctl output from clixon-backend and vpp, was there anything more before those parts?
Sometimes clixon-backend has several sets of log entries at boot as it tries/retries to do things when it encounters a problem so you may have to scroll up a bit to find the relevant part, but you can typically tell by the timestamp.
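If paging through the output by hand gets tedious, something like the following might help narrow it down. This is only a minimal sketch: the function names crash_summary and vpp_restart_count are made up for illustration and are not part of TNSR, and the patterns assume the log format shown in the output posted earlier in this thread. The -b (current boot only) and --no-pager options in the usage comments are standard journalctl options.

```shell
# Hypothetical helpers, not part of TNSR. Both read journal text on stdin.

# Keep the signal line plus only those backtrace frames that resolved to a
# symbol name (e.g. "#2 0x... ip4_interface_first_address + 0x22").
crash_summary() {
  grep -E 'received signal|#[0-9]+ 0x[0-9a-f]+ [A-Za-z_][A-Za-z0-9_]* \+ 0x[0-9a-f]+'
}

# Count how many times clixon-backend logged a VPP stop.
vpp_restart_count() {
  grep -c 'VPP stop detected'
}

# Usage on the router (assumes the unit names used earlier in this thread):
#   sudo journalctl -b -u vpp --no-pager | crash_summary
#   sudo journalctl -b -u clixon-backend --no-pager | vpp_restart_count
```

Limiting the journal to the current boot keeps older retries out of the way, and the named frames are usually the most useful part of a backtrace to include in a bug report.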
-
Thanks for your time. Even if I scroll up I can only see this repeatedly:
Jun 17 07:33:05 tnsrrouter vpp[2369]: /usr/bin/vpp[2369]: perfmon: skipping source 'intel-uncore' - intel_uncore_init: no uncore units found
Jun 17 07:33:05 tnsrrouter vpp[2369]: /usr/bin/vpp[2369]: perfmon: skipping source 'intel-core' - intel_core_init: not a IA-32 CPU
Jun 17 07:33:05 tnsrrouter /usr/bin/vpp[2369]: perfmon: skipping source 'intel-uncore' - intel_uncore_init: no uncore units found
Jun 17 07:33:05 tnsrrouter /usr/bin/vpp[2369]: perfmon: skipping source 'intel-core' - intel_core_init: not a IA-32 CPU
Jun 17 07:33:12 tnsrrouter vnet[2453]: dpdk/cryptodev: dpdk_cryptodev_init: Failed to configure cryptodev
Jun 17 07:33:12 tnsrrouter vnet[2453]: vat-plug/load: vat_plugin_register: oddbuf plugin not loaded...
Jun 17 07:33:12 tnsrrouter vnet[2453]: vat-plug/load: vat_plugin_register: ikev2 plugin not loaded...
Jun 17 07:33:16 tnsrrouter vnet[2453]: received signal SIGSEGV, PC 0x7fc1e2e52912, faulting address 0x0
Jun 17 07:33:16 tnsrrouter vnet[2453]: #0 0x00007fc1e2baa9b8 0x7fc1e2baa9b8
Jun 17 07:33:16 tnsrrouter vnet[2453]: #1 0x00007fc1e2b093c0 0x7fc1e2b093c0
Jun 17 07:33:16 tnsrrouter vnet[2453]: #2 0x00007fc1e2e52912 ip4_interface_first_address + 0x22
Jun 17 07:33:16 tnsrrouter vnet[2453]: #3 0x00007fc19e702d38 0x7fc19e702d38
Jun 17 07:33:16 tnsrrouter vnet[2453]: #4 0x00007fc19e703c47 0x7fc19e703c47
Jun 17 07:33:16 tnsrrouter vnet[2453]: #5 0x00007fc1e2e530f2 0x7fc1e2e530f2
Jun 17 07:33:16 tnsrrouter vnet[2453]: #6 0x00007fc19e9b55d1 0x7fc19e9b55d1
Jun 17 07:33:16 tnsrrouter vnet[2453]: #7 0x00007fc19e9b981f 0x7fc19e9b981f
Jun 17 07:33:16 tnsrrouter vnet[2453]: #8 0x00007fc19ea498ee 0x7fc19ea498ee
Jun 17 07:33:16 tnsrrouter vnet[2453]: #9 0x00007fc19e9e0d0e 0x7fc19e9e0d0e
Jun 17 07:33:16 tnsrrouter vnet[2453]: #10 0x00007fc19ea465c3 nl_cache_parse + 0x63
Jun 17 07:33:16 tnsrrouter vnet[2453]: #11 0x00007fc19ea4b2cb nl_msg_parse + 0x7b
Jun 17 07:33:16 tnsrrouter vnet[2453]: #12 0x00007fc19e9bbdb1 0x7fc19e9bbdb1
Jun 17 07:33:16 tnsrrouter vnet[2453]: #13 0x00007fc1e2b53f1b 0x7fc1e2b53f1b
Jun 17 07:33:16 tnsrrouter vnet[2453]: #14 0x00007fc1e2ae231c 0x7fc1e2ae231c
Jun 17 07:33:16 tnsrrouter systemd-coredump[2519]: Process 2453 (vpp_main) of user 0 dumped core.
Before the NAT rule + reboot, the vpp log seems to be completely empty.
Just for completeness, this is not an Intel system (AMD Epyc).
-
It is even more odd now. I have it working; the only thing I needed to change was the number of CPUs assigned. The VM is using 4 threads, so I did "dataplane cpu main-core 3", and this prevents the crash when I reboot after adding the NAT rule.
I still have this message after a reboot in the log though:
Jun 18 07:11:40 tnsrrouter vpp[1044]: /usr/bin/vpp[1044]: perfmon: skipping source 'intel-uncore' - intel_uncore_init: no uncore units found
Jun 18 07:11:40 tnsrrouter vpp[1044]: /usr/bin/vpp[1044]: perfmon: skipping source 'intel-core' - intel_core_init: not a IA-32 CPU
Jun 18 07:11:40 tnsrrouter /usr/bin/vpp[1044]: perfmon: skipping source 'intel-uncore' - intel_uncore_init: no uncore units found
Jun 18 07:11:40 tnsrrouter /usr/bin/vpp[1044]: perfmon: skipping source 'intel-core' - intel_core_init: not a IA-32 CPU