Issue after setting dataplane workers 1>



  • Hi there,

    We were experimenting with some of the features and wanted to see what multiple cores would do. After setting the dataplane workers to 2, the VPP configuration kind of lost it.

    At this point we are not able to use the CLI anymore (tnsr.sock is not present), and VPP does not seem to pick up the interfaces anymore.

    Is this a known thing? Am I doing something wrong?

    Regards,

    Robert


  • Rebel Alliance Developer Netgate

    We have seen some issues with >1 worker but nothing like that.

    How did you set the worker count? Are you certain you used the correct syntax?

    We have TNSR CLI commands to set the workers in 19.05, which is due out shortly. Before that, you'd have to modify the VPP startup config by hand, which is easy to do incorrectly.



  • I set the workers by doing :

    dataplane cpu workers 2

    Which, funnily enough, spawns 3 processes. But which process creates the tnsr.sock?


  • Rebel Alliance Developer Netgate

    The workers are extra threads in addition to the main thread. The main thread is always there, handling the main set of tasks, and workers can be used to spread the load.

    On 19.05 the functionality has been improved a bit: you can set CPU affinity and you have more control over how workers are allocated.

    With no workers set, you'll see this:

    tnsr# show dataplane cpu threads
    ID Name     Type PID   LCore Core Socket
    -- -------- ---- ----- ----- ---- ------
     0 vpp_main      27301     1    0      0 
    tnsr# 
    

    And then if you increase the worker count:

    tnsr(config)# dataplane cpu workers 2
    tnsr(config)# service dataplane restart
    tnsr(config)# show dataplane cpu threads
    ID Name     Type    PID   LCore Core Socket
    -- -------- ------- ----- ----- ---- ------
     0 vpp_main         28921     1    0      0 
     1 vpp_wk_0 workers 28943     0    2      0 
     2 vpp_wk_1 workers 28944     2    8      0 
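
The relationship between the configured worker count and the thread list above (2 workers plus the always-present vpp_main) can be checked with a small script. This is only a minimal Python sketch, not part of TNSR: `parse_threads` is a hypothetical helper, and it assumes the column layout shown in the output above, with an empty Type column for vpp_main.

```python
def parse_threads(output: str) -> list[dict]:
    """Parse the table printed by `show dataplane cpu threads`.

    Assumes the columns ID, Name, Type, PID, LCore, Core, Socket,
    where Type may be empty (as it is for vpp_main above).
    """
    threads = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric thread ID; skip prompts, headers, rulers.
        if not fields or not fields[0].isdigit():
            continue
        # Rows without a Type value have 6 fields; pad them to 7.
        if len(fields) == 6:
            fields.insert(2, "")
        if len(fields) != 7:
            continue
        tid, name, kind, pid, lcore, core, socket = fields
        threads.append({
            "id": int(tid), "name": name, "type": kind, "pid": int(pid),
            "lcore": int(lcore), "core": int(core), "socket": int(socket),
        })
    return threads


SAMPLE = """\
ID Name     Type    PID   LCore Core Socket
-- -------- ------- ----- ----- ---- ------
 0 vpp_main         28921     1    0      0
 1 vpp_wk_0 workers 28943     0    2      0
 2 vpp_wk_1 workers 28944     2    8      0
"""

threads = parse_threads(SAMPLE)
workers = [t for t in threads if t["type"] == "workers"]
print(len(threads), len(workers))  # prints: 3 2
```

With `dataplane cpu workers 2` the sketch reports three threads in total, which matches the "3 processes" observation earlier in the thread.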
    

    The docs for 19.05 also cover this more thoroughly.

    I haven't seen it behave as you describe, however. Can you share the contents of your /etc/vpp/startup.conf?



    I just checked and my version is tnsr-v19.02.1-1. I guess that explains some of the weirdness.

    My current startup config:

    unix {
    nodaemon
    log /tmp/vpp.log
    full-coredump
    cli-listen /run/vpp/cli.sock
    gid vpp
    }

    statseg {
    socket-name /run/vpp/stats.sock
    }

    api-trace {
    on
    }

    api-segment {
    gid vpp
    }

    nat {
    endpoint-dependent
    }


  • Rebel Alliance Developer Netgate

    19.05 isn't out yet, but will be very soon :-)

    That config looks fine. Does it look the same with the extra workers? It should only differ by the addition of a CPU stanza with the workers line.
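
One way to confirm that two copies of startup.conf differ only by an added stanza is to parse both and compare them. The following is a rough Python sketch under stated assumptions: `parse_stanzas` and `added_stanzas` are hypothetical helpers, the parser only handles flat `name { ... }` blocks with one entry per line (as in the file posted above, no nesting), and the `cpu { workers 2 }` text in the "after" sample is an assumption about what the file would contain, not output captured from TNSR.

```python
def parse_stanzas(text: str) -> dict[str, list[str]]:
    """Parse flat `name { ... }` stanzas, one entry per line, no nesting."""
    stanzas: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            current = line[:-1].strip()
            stanzas[current] = []
        elif line == "}":
            current = None
        elif current is not None:
            stanzas[current].append(line)
    return stanzas


def added_stanzas(before: str, after: str) -> dict[str, list[str]]:
    """Return stanzas present in `after` but missing from `before`."""
    old, new = parse_stanzas(before), parse_stanzas(after)
    return {name: body for name, body in new.items() if name not in old}


BEFORE = """
unix {
nodaemon
cli-listen /run/vpp/cli.sock
}
nat {
endpoint-dependent
}
"""

# Hypothetical "after" file: the cpu stanza is assumed, not taken from the thread.
AFTER = BEFORE + """
cpu {
workers 2
}
"""

print(added_stanzas(BEFORE, AFTER))  # prints: {'cpu': ['workers 2']}
```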



  • @jimp said in Issue after setting dataplane workers 1>:

    does it look the same with the extra workers? It should only differ by the addition of a CPU stanza with the workers line.

    Yes, that's the only difference. But I can't figure out why the tnsr.sock and tnsr.pid are gone. I also removed the workers configuration from the running and startup files. After a reboot, still the same behaviour :(


  • Rebel Alliance Developer Netgate

    Do you get any errors when stopping or starting any of the individual TNSR services? See https://docs.netgate.com/tnsr/en/latest/basics/starting-tnsr.html for more info.

    Any errors in the logs? (sudo journalctl -xe)



    There are definitely some errors.

    -- Unit clixon-backend.service has begun starting up. May 29 22:34:10 packetblaster clixon_backend[20449]: Version: tnsr-v19.02.1-1 May 29 22:34:10 packetblaster clixon_backend[20449]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT May 29 22:34:10 packetblaster clixon_backend[20449]: Git Commit: 0x8b47d140 May 29 22:34:10 packetblaster clixon_backend[20449]: Expires on: Fri Jul 26 21:00:12 2019 May 29 22:34:10 packetblaster clixon_backend[20449]: This TNSR instance is not configured for package updates. May 29 22:34:10 packetblaster clixon_backend[20449]: For information see http://www.netgate.com/docs/tnsr/updating/index.html May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_event_init: Config event processing is active May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_event_init: Config event processing is active May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowMay 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: master: Preserved capabilities May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: besd_init: plugin state data initialized May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_backend_check_start_time: system boot: 1559042601, VPP start: 1559042615, May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_backend_check_start_time: clixon_backend start state: system already has rMay 29 22:34:10 packetblaster clixon_backend[20449]: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,caMay 29 22:34:10 packetblaster clixon_backend[20449]: master: Preserved capabilities May 29 22:34:10 packetblaster clixon_backend[20449]: besd_init: plugin state data initialized May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_backend_check_start_time: system boot: 1559042601, VPP start: 1559042615, cfg backend last May 29 
22:34:10 packetblaster clixon_backend[20449]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuratMay 29 22:34:10 packetblaster clixon_backend[20449]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TeMay 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: clixon_backend: 20449 Terminated retval:-1 May 29 22:34:10 packetblaster clixon_backend[20449]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, ObjectMay 29 22:34:10 packetblaster clixon_backend[20449]: clixon_backend: 20449 Terminated retval:-1 May 29 22:34:10 packetblaster clixon_backend[20449]: cfg_event_shutdown: Config event processing has stopped May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: cfg_event_shutdown: Config event processing has stopped May 29 22:34:10 packetblaster clixon_backend[20449]: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster clixon_backend[20449]: May 29 22:34:10: os_priv_change: changing uid from 0 to 0 May 29 22:34:10 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255 May 29 22:34:10 packetblaster systemd[1]: Failed to start Clixon backend. -- Subject: Unit clixon-backend.service has failed -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel -- -- Unit clixon-backend.service has failed.

    The interfaces are there, but it seems that the UIO driver is not being loaded.

    [root@packetblaster ~]# lshw -class network -businfo
    Bus info          Device  Class    Description
    pci@0000:01:00.0          network  Ethernet Controller X710 for 10GbE SFP+
    pci@0000:01:00.1          network  Ethernet Controller X710 for 10GbE SFP+
    pci@0000:04:00.0  eno2    network  I210 Gigabit Network Connection
    pci@0000:03:00.0  eno1    network  I210 Gigabit Network Connection
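
To see which kernel driver, if any, is bound to the two X710 ports listed above, the `driver` symlink under sysfs can be read. This is a minimal Python sketch rather than a TNSR tool: `bound_driver` is a hypothetical helper, and it relies only on the standard Linux layout where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver. It returns `None` when no driver is bound or the device is not present.

```python
import os


def bound_driver(pci_addr: str, sysfs_root: str = "/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or no such device
    # The symlink target ends with the driver's directory name.
    return os.path.basename(os.readlink(link))


# PCI addresses taken from the lshw output above.
for addr in ("0000:01:00.0", "0000:01:00.1"):
    print(addr, bound_driver(addr))
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real system the default is what you want.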


  • Rebel Alliance Developer Netgate

    Unfortunately the formatting there makes things hard to read, but it also looks like the important part of the error(s) is not visible.

    For example, this line cuts off:

    May 29 22:34:10 packetblaster clixon_backend[20449]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1
    

    Can you check the logs again and see if you can use sudo journalctl -xe | less to get the full text? When you post it here, use the code button </> and put the log text inside to make it easier to read.



  • Hi, here is the better formatted version.

    May 31 09:29:57 packetblaster clixon_backend[16334]: Version: tnsr-v19.02.1-1                                                                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT                                                                                            May 31 09:29:57 packetblaster clixon_backend[16334]: Git Commit: 0x8b47d140                                                                                                                   May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_init: Config event processing is active                                                                                        May 31 09:29:57 packetblaster clixon_backend[16334]: Expires on: Fri Jul 26 21:00:12 2019                                                                                                     May 31 09:29:57 packetblaster clixon_backend[16334]: This TNSR instance is not configured for package updates.                                                                                
May 31 09:29:57 packetblaster clixon_backend[16334]: For information see http://www.netgate.com/docs/tnsr/updating/index.html                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_init: Config event processing is active                                                                       May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep                                                                                                                                                                    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: Preserved capabilities                                                                                          May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: besd_init: plugin state data initialized                                                                                May 31 09:29:57 packetblaster clixon_backend[16334]: master: current caps: = 
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep                                                                                                                                                                                     May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791        May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)          May 31 09:29:57 packetblaster clixon_backend[16334]: master: Preserved capabilities                                                                                                           May 31 09:29:57 packetblaster clixon_backend[16334]: besd_init: plugin state data initialized                                                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791                         May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)                           May 31 09:29:57 packetblaster clixon_backend[16334]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error 
message: Invalid interface, Error info: Interface not found                                                                                                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.                                                                                                                           May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found                                                                                                                                May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.                                                                                                          
May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: clixon_backend: 16334 Terminated retval:-1                                                                              May 31 09:29:57 packetblaster clixon_backend[16334]: clixon_backend: 16334 Terminated retval:-1                                                                                               May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_shutdown: Config event processing has stopped                                                                                  May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_shutdown: Config event processing has stopped                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0                                                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0                                                                                May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0                                                                                                 May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0                                                                                May 31 09:29:57 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255                                                                              May 31 09:29:57 packetblaster systemd[1]: Failed to start Clixon backend.                                                                                                                     
-- Subject: Unit clixon-backend.service has failed                                                                                                                                            -- Defined-By: systemd                                                                                                                                                                        -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel                                                                                                                       --                                                                                                                                                                                            -- Unit clixon-backend.service has failed.                                         
    

    Edit : Looks like one long line right now :D


  • LAYER 8 Global Moderator

    How is this?

    May 31 09:29:57 packetblaster clixon_backend[16334]: Version: tnsr-v19.02.1-1
    May 31 09:29:57 packetblaster clixon_backend[16334]: Build timestamp: Thu Mar 28 14:00:12 2019 CDT
    May 31 09:29:57 packetblaster clixon_backend[16334]: Git Commit: 0x8b47d140
    May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_init: Config event processing is active
    May 31 09:29:57 packetblaster clixon_backend[16334]: Expires on: Fri Jul 26 21:00:12 2019
    May 31 09:29:57 packetblaster clixon_backend[16334]: This TNSR instance is not configured for package updates
    May 31 09:29:57 packetblaster clixon_backend[16334]: For information see http://www.netgate.com/docs/tnsr/updating/index.html
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_init: Config event processing is active
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: master: Preserved capabilities
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: besd_init: plugin state data initialized
    May 31 09:29:57 packetblaster clixon_backend[16334]: master: current caps: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)
    May 31 09:29:57 packetblaster clixon_backend[16334]: master: Preserved capabilities
    May 31 09:29:57 packetblaster clixon_backend[16334]: besd_init: plugin state data initialized
    May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: system boot: 1559042602, VPP start: 1559042616, cfg backend last start: 1559287791
    May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_backend_check_start_time: clixon_backend start state: system already has running configuration applied (1)
    May 31 09:29:57 packetblaster clixon_backend[16334]: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found
    May 31 09:29:57 packetblaster clixon_backend[16334]: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: clixon_backend: 16334 Terminated retval:-1
    May 31 09:29:57 packetblaster clixon_backend[16334]: clixon_backend: 16334 Terminated retval:-1
    May 31 09:29:57 packetblaster clixon_backend[16334]: cfg_event_shutdown: Config event processing has stopped
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: cfg_event_shutdown: Config event processing has stopped
    May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0
    May 31 09:29:57 packetblaster clixon_backend[16334]: os_priv_change: changing uid from 0 to 0
    May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: os_priv_change: changing uid from 0 to 0
    May 31 09:29:57 packetblaster systemd[1]: clixon-backend.service: control process exited, code=exited status=255
    May 31 09:29:57 packetblaster systemd[1]: Failed to start Clixon backend.
    -- Subject: Unit clixon-backend.service has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- Unit clixon-backend.service has failed.    
    

    Learning to format is your friend ;)

    You had a HUGE amount of spaces in there vs returns.. I fixed it.
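
The same cleanup can be done mechanically. Below is a minimal Python sketch of one possible approach, not what was actually used here: `split_records` is a hypothetical helper that starts a new line wherever a `Mon DD HH:MM:SS host unit[pid]:` prefix begins, and it assumes every record starts with that journal prefix. Trailer lines such as `-- Subject:` do not match the prefix and stay attached to the preceding record.

```python
import re

# Whitespace followed by the start of a journal record:
# "May 31 09:29:57 host unit[pid]:"
RECORD_START = re.compile(
    r"\s+(?=[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ \S+\[\d+\]:)"
)


def split_records(blob: str) -> list[str]:
    """Split pasted journal text into one record per list item."""
    records = RECORD_START.split(blob.strip())
    # Collapse leftover padding or wrapped newlines inside each record.
    return [" ".join(r.split()) for r in records if r.strip()]


BLOB = (
    "May 31 09:29:57 packetblaster clixon_backend[16334]: Git Commit: 0x8b47d140      "
    "May 31 09:29:57 packetblaster clixon_backend[16334]: May 31 09:29:57: "
    "master: Preserved capabilities        "
    "May 31 09:29:57 packetblaster systemd[1]: Failed to start Clixon backend."
)

for record in split_records(BLOB):
    print(record)
```

The inner `May 31 09:29:57:` timestamp that clixon_backend repeats inside its own messages is followed by a colon rather than a hostname, so it does not trigger a split.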


  • Rebel Alliance Developer Netgate

    @johnpoz said in Issue after setting dataplane workers 1>:

    May 31 09:29:57: startup_mode_startup: Commit of startup failed, exiting: Plugin: vpp, Module: interface, Object: TenGigabitEthernet1/0/0, Operation: add, Error message: Invalid interface, Error info: Interface not found.

    So it can't find that interface, which is why it fails to start up. When you can get into the CLI again you might try explicitly configuring the network interfaces in the dataplane; see https://docs.netgate.com/tnsr/en/latest/setup/setup-vpp-interfaces.html for how. The automatic whitelisting may not be working right on your hardware.

    Though 19.05 is out now, you should probably get that and use it.
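
For anyone scanning a long journal for the same failure, the fields of the "Config error" line quoted above can be pulled out with a short script. This is a minimal Python sketch, assuming the `Key: value, Key: value` layout seen in this log stays the same; `parse_config_error` is a hypothetical helper, not a TNSR or clixon API.

```python
import re

KEYS = "Plugin|Module|Object|Operation|Error message|Error info"

# A key, then the shortest value that is followed by the next key or end of line.
FIELD = re.compile(rf"({KEYS}): (.*?)(?=, (?:{KEYS}): |$)")


def parse_config_error(line: str) -> dict[str, str]:
    """Extract the key/value fields that follow 'Config error: ' in a log line."""
    _, _, tail = line.partition("Config error: ")
    return {key: value.rstrip(".") for key, value in FIELD.findall(tail)}


LINE = (
    "tnsr_err_report: 236: Config error: Plugin: vpp, Module: interface, "
    "Object: TenGigabitEthernet1/0/0, Operation: add, "
    "Error message: Invalid interface, Error info: Interface not found"
)

error = parse_config_error(LINE)
print(error["Object"], "->", error["Error info"])
# prints: TenGigabitEthernet1/0/0 -> Interface not found
```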



  • @jimp said in Issue after setting dataplane workers 1>:

    When you can get into the CLI again you might try explicitly configuring the network interfaces in the dataplane,

    Hi,

    @johnpoz thanks for that ;)

    @jimp I tried that (setting the interfaces again in the CLI). But as soon as I try to set something, it complains about the tnsr.sock missing, basically with every command.

    May 31 16:53:48: clicon_rpc_connect_unix: 409: Protocol error: /var/tnsr/tnsr.sock: config daemon not running?: No such file or directory
    Protocol error: /var/tnsr/tnsr.sock: config daemon not running?: No such file or directory
    

  • Rebel Alliance Developer Netgate

    You might need to clear out the config manually (sudo rm /var/tnsr/*) and then restart the services manually again: https://docs.netgate.com/tnsr/en/latest/basics/starting-tnsr.html#manual-tnsr-service-operations -- you could grab the contents of those db files if you want to keep the old config.

    Something else besides the worker count had to have changed if it's failing to find an interface that used to be there.
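
If you want to keep the old config before clearing the directory, copying the files aside first is enough. The following is a minimal Python sketch of that step, not an official procedure: `backup_dir` is a hypothetical helper, the `tnsr-backup-...` folder name is made up for illustration, and it copies only regular files, so a socket such as tnsr.sock would be skipped.

```python
import shutil
import time
from pathlib import Path


def backup_dir(src: str, dest_root: str) -> Path:
    """Copy every regular file in `src` into a new timestamped folder under `dest_root`."""
    target = Path(dest_root) / time.strftime("tnsr-backup-%Y%m%d-%H%M%S")
    target.mkdir(parents=True)
    for item in Path(src).iterdir():
        if item.is_file():  # skips sockets and subdirectories
            shutil.copy2(item, target / item.name)
    return target


# Example (needs root, run before `sudo rm /var/tnsr/*`):
#   backup_dir("/var/tnsr", "/root")
```

The function returns the path of the backup folder so it can be printed or logged before the originals are removed.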



  • @jimp This did the trick. I cleared out the /var/tnsr/ and restarted manually. Now the CLI is usable again. Thanks!

    Btw, how can I get the latest version? I requested a new trial version in order to get it, but I'm not sure if this is the way to go.


  • Rebel Alliance Developer Netgate

    If you submit a request for a new trial, someone should be in touch with you to work out the details. New trials are available, but the automated request mechanism was deactivated.



    Thank you @jimp. I'll just wait for a reaction :)

