Suricata nm_txsync_prologue vtnet0 TX0: fail
-
I have Suricata running, inline mode, on two vtnet interfaces. Underlying hypervisor and VM network config is similar for both interfaces. Suricata config is also similar for each interface, other than rule selection and SID drop config.
I see some netmap errors on one interface only. They occur on the interface with the higher traffic, when that interface is passing the most packets, typically when streaming in the evenings.
This sequence of two log lines occurs between 0 and 10 times per day:

Feb 13 21:10:16 fw kernel: 616.140790 [1684] nm_txsync_prologue vtnet0 TX0: fail 'head > kring->rtail && head < kring->rhead' h 82 c 82 t 81 rh 83 rc 83 rt 81 hc 83 ht 81
Feb 13 21:10:16 fw kernel: 616.142794 [1787] netmap_ring_reinit called for vtnet0 TX0
AIUI the second entry is the attempt at recovery, and the error itself is most commonly seen with unsynchronised multithreaded access to a single netmap ring buffer. That is as far as my research has got so far.
I am not noticing any problems as such. The errors have been present since I enabled inline mode. I'd like to address the error if it's possible.
I am still on suricata 5.0.6.
-
I most definitely suggest you upgrade to the latest Suricata package. There were a number of changes in the netmap code within Suricata between version 5.0.6 and version 6.0.4. Netmap is the technology used when Inline IPS Mode is enabled.
WARNING: Do NOT upgrade Suricata until you first upgrade pfSense to the latest version. The 2.6.0 CE and 22.01 pfSense Plus releases were just announced.
The biggest changes made the netmap module in Suricata much more multithreading friendly. They also added the ability to use multiple host stack rings. Prior to the 6.0.3 Suricata release, netmap host connections were limited to a single ring pair, and there was also the potential for multiple threads to access the single ring pair and step on each other's data. The error message you logged above from the nm_txsync_prologue function is a prime indicator of that happening.
-
@bmeeks Many thanks for helping out (again). I had high cpu utilisation issues with suricata 6 (which I also posted about a couple of months back). I'd reinstated this pfsense installation from a backup in order to revert to v5.
I thought I'd wait and see if any upstream changes or new config options emerged to address the increased CPU use, in suricata 6, on old hardware.
But now that 2.6.0 is here, I may just go ahead with the upgrades and live with the higher CPU. If it bugs me too much I will restore an older pfSense. Our energy prices are set to go up over 50% in a month's time, so it's not the time to be burning more kWh!
Thanks for the caution re upgrade order. I'd forgotten that one.
-
@darcey said in Suricata nm_txsync_prologue vtnet0 TX0: fail:
@bmeeks Many thanks for helping out (again). I had high cpu utilisation issues with suricata 6 (which I also posted about a couple of months back). I'd reinstated this pfsense installation from a backup in order to revert to v5.
Yes, that increased CPU utilization is due to changes made upstream in the flow manager code beginning with the 6.x branch. There is, as of yet, no fix for it. It is a complicated issue.
The change is helpful on some architectures, but performs quite the opposite on a few others. If I am remembering the last notes correctly from that Redmine ticket on the upstream Suricata site, the change is beneficial to more hardware than it adversely impacts. So I sense no great urgency from the upstream team to address the issue.
-
@bmeeks Yes, that was the impression I got when I last checked on it. I also recall there was mention of exposing a new configuration variable (used as a parameter in the calls to usleep()). IIRC it was suggested this would allow Suricata to be made more adaptable to low- and high-performance HW without the end user resorting to recompiling.
-
@bmeeks said in Suricata nm_txsync_prologue vtnet0 TX0: fail:
Yes, that increased CPU utilization is due to changes made upstream in the flow manager code beginning with the 6.x branch. There is, as of yet, no fix for it. It is a complicated issue.
The change is helpful on some architectures, but performs quite the opposite on a few others. If I am remembering the last notes correctly from that Redmine ticket on the upstream Suricata site, the change is beneficial to more hardware than it adversely impacts. So I sense no great urgency from the upstream team to address the issue.
I came across this post, on the ipfire mailing list, regarding benefits of setting suricata cpu affinity. I hope to give it a try soon. Though I think I will need more than the two cores currently allocated to the guest on which pfsense runs.
-
@darcey said in Suricata nm_txsync_prologue vtnet0 TX0: fail:
I came across this post, on the ipfire mailing list, regarding benefits of setting suricata cpu affinity. I hope to give it a try soon. Though I think I will need more than the two cores currently allocated to the guest on which pfsense runs.
Just note that there is no provision within the pfSense package GUI to make changes to CPU affinity. You will need to make those edits directly in the source code files. The suricata.yaml file for each configured interface is built from scratch and written to disk each time you start/restart Suricata in the GUI. The file is created from information stored within config.xml on the firewall. A template file is used to set basic Suricata parameters, and then the GUI-adjusted configuration parameters are merged with the template file info to produce the final suricata.yaml for an interface.

So to play around with CPU affinity settings, you need to make edits to the file /usr/local/pkg/suricata/suricata_yaml_template.inc and save them. Specifically, you would need to make changes in this section of that file:

# Suricata is multi-threaded. Here the threading can be influenced.
threading:
  set-cpu-affinity: no
  detect-thread-ratio: 1.0
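For illustration only (the cpu numbers here are placeholders I chose, and the worker-cpu-set entry follows the cpu-affinity structure documented in the stock suricata.yaml, not anything from this thread), the edited section might look something like:

```yaml
# Suricata is multi-threaded. Here the threading can be influenced.
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ "0" ]          # pin management threads to core 0
    - worker-cpu-set:
        cpu: [ "1" ]          # pin worker threads to core 1
        mode: "exclusive"
  detect-thread-ratio: 1.0
```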
So once edited, the changes you supply in the template are merged into the final suricata.yaml produced for each configured Suricata interface. Be careful editing that file! Suricata is very picky about the exact formatting of lines, including leading spaces for indentation. Be sure to make a backup of the original file before editing, so if you blow it up you can restore the original.
-
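As an aside, the backup-then-edit step described above could be scripted. A minimal sketch, operating on a scratch copy for illustration (on pfSense the real file is /usr/local/pkg/suricata/suricata_yaml_template.inc, and you would back that up first):

```shell
# Work on a temp copy of the threading section rather than the live template.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Suricata is multi-threaded. Here the threading can be influenced.
threading:
  set-cpu-affinity: no
  detect-thread-ratio: 1.0
EOF
cp "$tmp" "$tmp.bak"   # backup before editing, as the post advises
# Flip the affinity switch without disturbing indentation.
sed 's/set-cpu-affinity: no/set-cpu-affinity: yes/' "$tmp" > "$tmp.new"
grep 'set-cpu-affinity' "$tmp.new"   # prints:   set-cpu-affinity: yes
rm -f "$tmp" "$tmp.bak" "$tmp.new"
```

Using sed like this preserves the leading spaces Suricata is picky about, which a hand edit can easily disturb.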
@bmeeks Thanks again for the help and the directions. I will certainly back up before trying the changes.
I guess, with multiple suricata processes/interfaces, statically specifying cpu affinity in suricata_yaml_template.inc will generate identical config WRT affinity for each suricata.yaml. Could that be counterproductive?
How might I first test this by editing suricata.yaml directly and restarting each suricata process, without config getting rebuilt by the package? It would be good to have the option of testing independent cpu-affinity for each process. I first need to make sure I can throw some more cpus at the vm!
-
@darcey said in Suricata nm_txsync_prologue vtnet0 TX0: fail:
@bmeeks Thanks again for the help and the directions. I will certainly back up before trying the changes.
I guess, with multiple suricata processes/interfaces, statically specifying cpu affinity in suricata_yaml_template.inc will generate identical config WRT affinity for each suricata.yaml. Could that be counterproductive?
How might I first test this by editing suricata.yaml directly and restarting each suricata process, without config getting rebuilt by the package? It would be good to have the option of testing independent cpu-affinity for each process. I first need to make sure I can throw some more cpus at the vm!

You are correct that editing the template will put the change on all processes.
The only way to start/restart a Suricata process and NOT rewrite the config is by using the CLI shell script at /usr/local/etc/rc.d/suricata.sh. You can run it with the "start", "stop" or "restart" option.

You will find the appropriate suricata.yaml for each running interface in a unique subdirectory for each interface under /usr/local/etc/suricata/. The directory names contain the physical NIC name plus a UUID identifier.
-
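To make that sequence concrete, a shell sketch (the restart script and base path are from the post above; the per-interface directory names below are mocked up for illustration, since the real ones embed a UUID):

```shell
# On pfSense the actual sequence would be roughly:
#   /usr/local/etc/rc.d/suricata.sh stop
#   (edit each /usr/local/etc/suricata/*/suricata.yaml by hand)
#   /usr/local/etc/rc.d/suricata.sh start
# Demonstrating the per-interface layout with a mock tree:
base=$(mktemp -d)
mkdir -p "$base/suricata_1234_vtnet2" "$base/suricata_5678_vtnet3"  # names illustrative
for d in "$base"/suricata_*; do touch "$d/suricata.yaml"; done
for f in "$base"/*/suricata.yaml; do echo "$f"; done  # one config per interface
rm -rf "$base"
```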
Thanks @bmeeks. I will try to determine if independent configs make a difference. If not, but some sort of explicit cpu-affinity still helps, I will make changes in the template. I will report my findings back here.
-
I performed a quick test of suricata running on 1 and 2 (virtually idle) interfaces, to compare default config vs explicit management cpu affinity.
The pfSense guest is assigned 2 CPUs and 4 GB RAM.
For each combination, I waited for suricata to settle down, then noted:
- CPU use of the VM process on the proxmox hypervisor: top -c -p $(pgrep -d',' -f 'name fw')
- CPU use of an individual suricata process on the pfsense VM: top -aSH
#   suricata     interfaces   hypervisor       suricata per process
    cpu config                (qemu proxmox)   (pfsense vm)
--------------------------------------------------------------------
1)               0            13%              0%
2)  default      1            45%              6.5%
3)  default      2            50%              5.4%
4)  A            1            26%              4.4%
5)  A            2            40%              3.6%
6)  B            2            40%              4.4%
config A)
vtnet*/suricata.yaml (same for each interface):

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ "0" ]
config B)
vtnet2/suricata.yaml:

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ "0" ]

vtnet3/suricata.yaml:

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ "1" ]
CPU use of each suricata process in pfsense remains fairly consistent across all configurations. However, setting a suricata management cpu affinity seems to have a marked effect on the cpu use reported for the pfsense guest (qemu process on proxmox hypervisor).
This is most noticeable with 1 suricata interface, less so with two. Also, specifying a different management cpu affinity for each of the two processes had little effect, though this guest is only assigned 2 CPUs anyway. Assigning 3 CPUs, I'm guessing I would see a similar improvement in 6) vs 3) as I do in 4) vs 2).

Just to add, I wonder if cpu pinning in the underlying VM config on proxmox might achieve similar results?
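Possibly relevant to that last question (this is an assumption on my part, not something covered in the thread): newer Proxmox releases, 7.3 and later if I remember right, expose hypervisor-side CPU pinning as an affinity option in the VM config. A hypothetical fragment of /etc/pve/qemu-server/<vmid>.conf:

```
# hypothetical example; the file name would use your actual VM ID
cores: 2
# pin this guest's vCPUs to host cores 0-1 (Proxmox 'affinity' option, PVE 7.3+)
affinity: 0-1
```

That would pin the whole guest rather than individual suricata threads, so it is a coarser control than the suricata.yaml cpu-affinity settings tested above.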