Issue after setting dataplane workers 1>
-
Hi there,
We were experimenting with some of the features and wanted to see what multiple cores would do. After setting the dataplane workers to 2, the vpp configuration kind of lost it.
At this point we are not able to use the CLI anymore (tnsr.sock is not present), and VPP does not seem to pick up the interfaces anymore.
Is this a known thing? Am I doing something wrong?
Regards,
Robert
-
We have seen some issues with >1 worker but nothing like that.
How did you set the worker count? Are you certain you used the correct syntax?
We have TNSR CLI commands to set the workers in 19.05, which is due out shortly. Before that, you'd have to modify the VPP startup config by hand, which is easy to get wrong.
-
I set the workers with:
dataplane cpu workers 2
Funnily enough, that spawns 3 processes. But which process creates tnsr.sock?
-
The worker threads run in addition to the main thread, so 2 workers means 3 threads in total. The main thread is always there, handling the main set of tasks, and workers can be used to spread the load.
In 19.05 the functionality has been improved a bit: you can set CPU affinity and you have more control over how workers are allocated.
With no workers set, you'll see this:
tnsr# show dataplane cpu threads
Threads master
tnsr# show dataplane cpu threads
ID Name     Type PID   LCore Core Socket
-- -------- ---- ----- ----- ---- ------
 0 vpp_main      27301     1    0      0
tnsr#
And then if you increase the worker count:
tnsr(config)# dataplane cpu workers 2
tnsr(config)# service dataplane restart
tnsr(config)# show dataplane cpu threads
ID Name     Type    PID   LCore Core Socket
-- -------- ------- ----- ----- ---- ------
 0 vpp_main         28921     1    0      0
 1 vpp_wk_0 workers 28943     0    2      0
 2 vpp_wk_1 workers 28944     2    8      0
The docs for 19.05 also cover this more thoroughly.
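As a rough cross-check from the shell, the number of VPP threads should be the worker count plus one for the main thread. This is only a minimal sketch: it assumes the threads show up under the names seen in the output above (vpp_main, vpp_wk_N), and the function name is made up for illustration.

```shell
# Count VPP threads by name from a list of thread names on stdin.
# Assumption: threads are named vpp_main and vpp_wk_<n>, as in the
# "show dataplane cpu threads" output above.
count_vpp_threads() {
    grep -cE '^(vpp_main|vpp_wk_[0-9]+)$'
}

# Possible usage on a live system (one thread name per line):
#   ps -eLo comm= | count_vpp_threads
```

With 2 workers configured, a count of 3 would be expected.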
I haven't seen it behave as you describe, however. Can you share the contents of your /etc/vpp/startup.conf?
-
I just checked and my version is tnsr-v19.02.1-1. I guess that explains some of the weirdness.
My current startup config:
unix {
  nodaemon
  log /tmp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
}
statseg {
  socket-name /run/vpp/stats.sock
}
api-trace {
  on
}
api-segment {
  gid vpp
}
nat {
  endpoint-dependent
}
-
19.05 isn't out yet, but will be very soon :-)
That config looks fine. Does it look the same with the extra workers? It should only differ by the addition of a CPU stanza with the workers line.
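To check that quickly, something like the following would print just the cpu stanza from the startup config. It is a sketch under assumptions: the function name is made up, and it expects the stanza to open with "cpu {" and close with "}" on separate lines, like the other stanzas in the file posted above. In stock VPP syntax the stanza would be expected to hold a "workers 2" line.

```shell
# Print the cpu stanza of a VPP startup config.
# Assumption: the stanza starts on a line "cpu {" and ends on a line "}".
show_cpu_stanza() {
    awk '/^[[:space:]]*cpu[[:space:]]*\{/,/^[[:space:]]*\}/' "${1:-/etc/vpp/startup.conf}"
}

# Possible usage:
#   show_cpu_stanza                    # reads /etc/vpp/startup.conf
#   show_cpu_stanza /path/to/copy.conf
```

If nothing is printed, the file has no cpu stanza and VPP runs with the main thread only.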
-
@jimp said in Issue after setting dataplane workers 1>:
does it look the same with the extra workers? It should only differ by the addition of a CPU stanza with the workers line.
Yes, that's the only difference. But I can't figure out why tnsr.sock and tnsr.pid are gone. I also removed the workers configuration from the running and startup files. After a reboot, it still shows the same behaviour :(
-
Do you get any errors when stopping or starting any of the individual TNSR services? See https://docs.netgate.com/tnsr/en/latest/basics/starting-tnsr.html for more info.
Any errors in the logs? (sudo journalctl -xe)
-
There are definitely some errors.
-- Unit clixon-backend.service has begun starting up. May 29 22:34:10 packetblaster clixon_backend[20449]: Version: tnsr-v19.02.1-1 May 29 22:34:10 packetblaster clixon_backend[20449]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT May 29 22:34:10 packetblaster clixon_backend[20449]: Git Commit: 0x8b47d140 May 29 22:34:10 packetblaster clixon_backend[20449]: Expires on: Fri Jul 26 21:00:12 2019 May 29 22:34:10 packetblaster clixon_backend[20449]: This TNSR instance is not configured for package updates. May 29 22:34:10 packetblaster clixon_backend[20449]: For information see http://www.netgate.com/docs/tnsr/updating/index.html May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_event_init: Config event processing is active May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_event_init: Config event processing is active May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowMay 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: master: Preserved capabilities May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: besd_init: plugin state data initialized May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_backend_check_start_time: system boot: 1559042601, VPP start: 1559042615, May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_backend_check_start_time: clixon_backend start state: system already has rMay 29 22:34:10 packetblaster clixon_backend[20449]: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,caMay 29 22:34:10 packetblaster clixon_backend[20449]: master: Preserved capabilities May 29 22:34:10 packetblaster clixon_backend[20449]: besd_init: plugin state data initialized May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_backend_check_start_time: system boot: 1559042601, VPP start: 1559042615, cfg backend last May 29 
22:34:10 packetblaster clixon_backend[20449]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuratMay 29 22:34:10 packetblaster clixon_backend[20449]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TeMay 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: clixon_backend: 20449 Terminated retval:-1 May 29 22:34:10 packetblaster clixon_backend[20449]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, ObjectMay 29 22:34:10 packetblaster clixon_backend[20449]: clixon_backend: 20449 Terminated retval:-1 May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_event_shutdown: Config event processing has stopped May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_event_shutdown: Config event processing has stopped May 29 22:34:10 packetblaster clixon_backend[20449]: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255 May 29 22:34:10 packetblaster systemd[1]: Failed to start Clixon backend. -- Subject: Unit clixon-backend.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit clixon-backend.service has failed.
The interfaces are there, but it seems that the UIO driver is not being loaded.
[root@packetblaster ~]# lshw -class network -businfo
Bus info          Device  Class    Description
pci@0000:01:00.0          network  Ethernet Controller X710 for 10GbE SFP+
pci@0000:01:00.1          network  Ethernet Controller X710 for 10GbE SFP+
pci@0000:04:00.0  eno2    network  I210 Gigabit Network Connection
pci@0000:03:00.0  eno1    network  I210 Gigabit Network Connection
-
Unfortunately the formatting there makes things hard to read, but it also looks like the important part of the error(s) is not visible.
For example, this line cuts off:
May 29 22:34:10 packetblaster clixon_backend[20449]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1
Can you check the logs again and see if you can use sudo journalctl -xe | less to get the full text? When you post it here, use the code button (</>) and put the log text inside to make it easier to read.
-
Hi, here is the better formatted version.
May 31 09:29:57 packetblaster clixon_backend[16334]: Version: tnsr-v19.02.1-1 May 31 09:29:57 packetblaster clixon_backend[16334]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT May 31 09:29:57 packetblaster clixon_backend[16334]: Git Commit: 0x8b47d140 May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_init: Config event processing is active May 31 09:29:57 packetblaster clixon_backend[16334]: Expires on: Fri Jul 26 21:00:12 2019 May 31 09:29:57 packetblaster clixon_backend[16334]: This TNSR instance is not configured for package updates. May 31 09:29:57 packetblaster clixon_backend[16334]: For information see http://www.netgate.com/docs/tnsr/updating/index.html May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_init: Config event processing is active May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: Preserved capabilities May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: besd_init: plugin state data initialized May 31 09:29:57 packetblaster clixon_backend[16334]: master: current caps: = 
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1) May 31 09:29:57 packetblaster clixon_backend[16334]: master: Preserved capabilities May 31 09:29:57 packetblaster clixon_backend[16334]: besd_init: plugin state data initialized May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791 May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1) May 31 09:29:57 packetblaster clixon_backend[16334]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found May 31 09:29:57 packetblaster clixon_backend[16334]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found. 
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found. May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: clixon_backend: 16334 Terminated retval:-1 May 31 09:29:57 packetblaster clixon_backend[16334]: clixon_backend: 16334 Terminated retval:-1 May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_shutdown: Config event processing has stopped May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_shutdown: Config event processing has stopped May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0 May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0 May 31 09:29:57 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255 May 31 09:29:57 packetblaster systemd[1]: Failed to start Clixon backend. -- Subject: Unit clixon-backend.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit clixon-backend.service has failed.
Edit : Looks like one long line right now :D
-
How is this?
May 31 09:29:57 packetblaster clixon_backend[16334]: Version: tnsr-v19.02.1-1
May 31 09:29:57 packetblaster clixon_backend[16334]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT
May 31 09:29:57 packetblaster clixon_backend[16334]: Git Commit: 0x8b47d140
May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_init: Config event processing is active
May 31 09:29:57 packetblaster clixon_backend[16334]: Expires on: Fri Jul 26 21:00:12 2019
May 31 09:29:57 packetblaster clixon_backend[16334]: This TNSR instance is not configured for package updates
May 31 09:29:57 packetblaster clixon_backend[16334]: For information see http://www.netgate.com/docs/tnsr/updating/index.html
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_init: Config event processing is active
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: Preserved capabilities
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: besd_init: plugin state data initialized
May 31 09:29:57 packetblaster clixon_backend[16334]: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)
May 31 09:29:57 packetblaster clixon_backend[16334]: master: Preserved capabilities
May 31 09:29:57 packetblaster clixon_backend[16334]: besd_init: plugin state data initialized
May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791
May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)
May 31 09:29:57 packetblaster clixon_backend[16334]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found
May 31 09:29:57 packetblaster clixon_backend[16334]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: clixon_backend: 16334 Terminated retval:-1
May 31 09:29:57 packetblaster clixon_backend[16334]: clixon_backend: 16334 Terminated retval:-1
May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_shutdown: Config event processing has stopped
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_shutdown: Config event processing has stopped
May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0
May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0
May 31 09:29:57 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255
May 31 09:29:57 packetblaster systemd[1]: Failed to start Clixon backend.
-- Subject: Unit clixon-backend.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Unit clixon-backend.service has failed.
Learning to format is your friend ;)
You had a HUGE amount of spaces in there vs returns.. I fixed it.
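For a log this noisy it can also help to pull out only the failure lines. A minimal sketch, assuming the interesting entries are the ones tagged tnsr_err_report and startup_mode_startup as in the log above; the function name is made up for illustration:

```shell
# Keep only the backend failure lines from log text on stdin.
# Assumption: failures are logged by tnsr_err_report and
# startup_mode_startup, as seen in the log above.
backend_errors() {
    grep -E 'tnsr_err_report|startup_mode_startup'
}

# Possible usage (unit name taken from the log above; --no-pager avoids
# the pager cutting lines at the terminal width):
#   sudo journalctl -u clixon-backend --no-pager | backend_errors
```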
-
@johnpoz said in Issue after setting dataplane workers 1>:
May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.
So it can't find that interface, which is why it fails to start up. When you can get into the CLI again you might try explicitly configuring the network interfaces in the dataplane, https://docs.netgate.com/tnsr/en/latest/setup/setup-vpp-interfaces.html -- The automatic whitelisting may not be working right on your hardware.
Though 19.05 is out now, you should probably get that and use it.
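One way to cross-check is to map the interface name from the error back to a PCI address and compare it with the lshw output posted earlier. This is a sketch under assumptions: it relies on the usual VPP DPDK naming, where the numbers after the type name are bus/slot/function in hex, it assumes PCI domain 0000, and the function name is made up for illustration.

```shell
# Convert a VPP interface name such as TenGigabitEthernet1/0/0 into a
# PCI address such as 0000:01:00.0.
# Assumptions: the name ends in "...Ethernet<bus>/<slot>/<func>" with hex
# fields, and the PCI domain is 0000.
vpp_name_to_pci() {
    suffix=${1##*Ethernet}      # "1/0/0"
    bus=${suffix%%/*}           # "1"
    rest=${suffix#*/}           # "0/0"
    slot=${rest%%/*}            # "0"
    func=${rest#*/}             # "0"
    printf '0000:%02x:%02x.%x\n' "0x$bus" "0x$slot" "0x$func"
}

# Possible usage:
#   lshw -class network -businfo | grep "$(vpp_name_to_pci TenGigabitEthernet1/0/0)"
```

For TenGigabitEthernet1/0/0 this yields 0000:01:00.0, which matches the first X710 port in the lshw listing, so the hardware is present and it is the dataplane that is not attaching to it.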
-
@jimp said in Issue after setting dataplane workers 1>:
When you can get into the CLI again you might try explicitly configuring the network interfaces in the dataplane,
Hi,
@johnpoz thanks for that ;)
@jimp I tried that (setting the interfaces again in the CLI). But as soon as I try to set something, it complains that tnsr.sock is missing. Basically with every command.
May 31 16:53:48: clicon_rpc_connect_unix: 409: Protocol error: /var/tnsr/tnsr.sock: config daemon not running?: No such file or directory
Protocol error: /var/tnsr/tnsr.sock: config daemon not running?: No such file or directory
-
You might need to clear out the config manually (sudo rm /var/tnsr/*) and then restart the services manually again: https://docs.netgate.com/tnsr/en/latest/basics/starting-tnsr.html#manual-tnsr-service-operations -- you could grab the contents of those db files if you want to keep the old config.
Something else besides the worker count had to have changed if it's failing to find an interface that used to be there.
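If you want to keep the old config, something like this would copy the files aside before clearing the directory. It is only a sketch: the function name and the backup location are made up for illustration, and the paths are parameters so it can be tried on a scratch directory first.

```shell
# Copy everything in a state directory into a timestamped backup
# directory and print the backup path.
# $1 = directory to back up, $2 = parent directory for the backup.
backup_state_dir() {
    src=$1
    dest="$2/tnsr-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" && cp -a "$src"/. "$dest"/ && printf '%s\n' "$dest"
}

# Possible usage as root, before clearing the directory:
#   backup_state_dir /var/tnsr /root
#   rm /var/tnsr/*
```

Check that the printed backup directory actually holds the db files before running the rm.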
-
@jimp This did the trick. I cleared out the /var/tnsr/ and restarted manually. Now the CLI is usable again. Thanks!
Btw, how can I get the latest version? I requested a new trial version in order to get it, but I'm not sure if this is the way to go.
-
If you submit a request for a new trial, someone should be in touch with you to work out the details. New trials are available, but the automated request mechanism was deactivated.
-
Thank you @jimp . I'll just wait for a reaction :)