Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Softflowd: 2 instances on every interface after a while + v5 logs in the future

    Traffic Monitoring
    1
    1
    1.0k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • A
      apu
      last edited by

      Installed and configured softflowd to send logs to a graylog system, using NetFlow version 5 graylog collector and version 5 configured in pfSense softflowd GUI config.

      Both pfSense and softflowd are current (2.3.2_1 and pfSense-pkg-softflowd-1.2.1_2/softflowd-0.9.8_2 respectively)

      Discovered 2 issues:

      1. After some time has passed (a day, maybe less, still testing), each interface has 2 softflowd processes running on it, which doubles up all the logs (but with slight differences in the timestamps, grrr!)
      2. The date recorded by 'nf_stop' is sometimes in the future (note we are using V5, not V9 format logs [others have reported this issue with V9])

      Details:

      1. Initially only configured WAN for testing. Worked perfectly.

      Later on, added the remaining interaces. Still (mostly - see end) ok.

      Apart from the fact that randomly it will launch 2 instances on every interface, leading to double up of logs - a day or so ago noticed this:

      ps ax | grep soft | sort -k7

      42786  -  Is      0:10.64 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      43381  -  Is      0:10.70 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      40849  -  Is      0:02.95 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      41966  -  Is      0:02.96 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      36436  -  Is      0:01.88 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      38743  -  Is      0:01.89 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      38207  -  Is      0:03.29 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      40541  -  Is      0:03.31 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      40097  -  Is      0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      41030  -  Is      0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      41787  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      43037  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      35570  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      36342  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      36099  -  Is      0:09.37 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

      So I logged into the pfsense gui, went to Services->softflowd, made no changes except to press save and then it returned to proper operation:

      ps ax | grep soft | sort -k7

      15178  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      14648  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      13353  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      13965  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      14308  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      14829  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      13020  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      12740  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

      Today I check graylog netflow reports and it has done the same thing again and checking pfsense:

      ps ax | grep soft | sort -k 7

      73519  -  Is      0:07.32 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      73998  -  Is      0:07.34 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      72773  -  Is      0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      73068  -  Is      0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      70361  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      71276  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      71587  -  Is      0:05.99 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      72014  -  Is      0:06.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      72350  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      72799  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      73045  -  Is      0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      73703  -  Is      0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      70282  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      70512  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      70118  -  Ss      0:02.82 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

      So logged into GUI again, made no changes (again) except to click save (again) and all ok again:

      ps ax | grep soft | sort -k 7

      42316  -  Is      0:00.04 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
      41617  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
      40609  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
      40836  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
      41215  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
      41962  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
      40176  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
      39828  -  Ss      0:00.06 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

      Is anyone else seeing this weird behaviour? Can we not run it on multiple interfaces simultaneously? Is this a known bug?

      1. Regarding the "mostly ok": we built a python interrogator script for graylog/elasticsearch and grabbed the nf_stop field to use as 'last seen' date.

      This was working very well (or so we thought), until 2 days ago (10th January) when the report indicated - according to the log - packets left this network 3 days into the future on 13-January????? Huh?

      At this time have abandoned the nf_stop date since it clearly is not always valid and have reverted to using the graylog received date as 'last seen'.

      Is anyone else seeing this weird behaviour from softflowd?

      Thanks in advance for any help/suggestions/ideas.

      UPDATE: It appears that one interface (igb1 - the one with all the vlans on it) has been flapping a few times over the last few days.  Connected to low end vlan switch. Suspect this may be related:
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1
      Jan 11 21:46:57 pfsense kernel: igb1: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan3 : link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan4: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan5: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan6: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan9: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlan10: link state changed to DOWN
      Jan 11 21:46:57 pfsense kernel: igb1_vlanZ: link state changed to DOWN
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan3
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan4
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan5
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan6
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan9
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan10
      Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlanZ
      Jan 11 21:46:58 pfsense check_reload_status: Reloading filter
      Jan 11 21:46:58 pfsense php-fpm[95604]: /rc.linkup: Hotplug event detected for igb1_vlan3 (opt3) static IP (internal IP)
      Hotplug events for other interfaces not shown

      Note 1: vlanZ (Z is a number) is the log interface, excluded from softflow config
      Note 2: All vlans are distinct /24 subnets, no bridges of any kind. The other 2 interfaces are not vlanned, one is static IP /24 internal and the other is WAN (DHCP to a bridged cable modem)

      Oddly though, the 2 non-vlanned interfaces were adversely affected in that they also ended up with 2 copies of softflowd running on them, even though neither of those interfaces flapped.

      So it appears to be a bug(?) in the interface "up" code that fails to kill running softflowd processes before starting new ones when the interface is brought up again - this is a guess at this time.

      System specs:
      Other packages installed include snort (one minor version behind at this moment), pfBlockerNG (installed but not enabled yet) and Status_Traffic_Totals - latter 2 both current.
      Hardware: APU2, 4GB RAM, embedded install on msata 16GB (this APU is a rocket compared to the old Alix2D3 it replaced - highly recommended, esp if snort is a requirement)

      3GB free memory:
      last pid:  3652;  load averages:  0.04,  0.06,  0.01                                                                                                                                up 5+09:32:40  17:31:39
      62 processes:  1 running, 61 sleeping
      CPU:  0.0% user,  0.2% nice,  0.3% system,  0.4% interrupt, 99.1% idle
      Mem: 118M Active, 428M Inact, 297M Wired, 996K Cache, 83M Buf, 3083M Free

      Spends most of it's time idling at 75MHz, load barely reaches 0.25 (a poor man's monitoring script runs on the serial console permanently for at-a-glance view - it's in color and mostly green - the APU is well within temp limits even at 64 degrees celsius/147 Fahrenheit. Below updates are 10 second intervals):

      Temps: Current (Min/Max)            Pathetic excuse for temp and clock speed graph                                CPU Frequency  Load Averages
      Temp: 54.3C (51.2/64.2)                          ==============  ….........                                        300 MHz  0.04 0.06 0.01
      Temp: 53.8C (51.2/64.2)                          ==============  ......                                              150 MHz  0.04 0.06 0.01
      Temp: 55.0C (51.2/64.2)                          ===============  ............                                        300 MHz  0.03 0.06 0.01
      Temp: 54.3C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.02 0.05 0.01
      Temp: 54.5C (51.2/64.2)                          ===============  ...                                                  75 MHz  0.02 0.05 0.01
      Temp: 55.6C (51.2/64.2)                        ================  ........................................            1000 MHz  0.02 0.05 0.01
      Temp: 54.5C (51.2/64.2)                          ===============  ...............                                      375 MHz  0.01 0.05 0.01
      Temp: 54.0C (51.2/64.2)                          ==============  ......                                              150 MHz  0.01 0.05 0.00
      Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.01 0.04 0.00
      Temp: 54.3C (51.2/64.2)                          ==============  ............                                        300 MHz  0.01 0.04 0.00
      Temp: 54.5C (51.2/64.2)                          ===============  ...                                                  75 MHz  0.00 0.04 0.00
      Temp: 55.3C (51.2/64.2)                        ================  ........................................            1000 MHz  0.00 0.04 0.00
      Temp: 55.1C (51.2/64.2)                        ================  .....................                                525 MHz  0.00 0.04 0.00
      Temp: 54.1C (51.2/64.2)                          ==============  ......                                              150 MHz  0.00 0.04 0.00
      Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
      Temp: 53.8C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
      Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
      Temp: 55.6C (51.2/64.2)                        ================  ........................................            1000 MHz  0.08 0.04 0.01
      Temp: 55.3C (51.2/64.2)                        ================  ........................                            600 MHz  0.07 0.04 0.01

      Edits: Added UPDATE section, typos

      1 Reply Last reply Reply Quote 0
      • First post
        Last post
      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.