Softflowd: 2 instances on every interface after a while + v5 logs in the future



  • Installed and configured softflowd to send logs to a graylog system, using NetFlow version 5 graylog collector and version 5 configured in pfSense softflowd GUI config.

    Both pfSense and softflowd are current (2.3.2_1 and pfSense-pkg-softflowd-1.2.1_2/softflowd-0.9.8_2 respectively)

    Discovered 2 issues:

    1. After some time has passed (a day, maybe less, still testing), each interface has 2 softflowd processes running on it, which doubles up all the logs (but with slight differences in the timestamps, grrr!)
    2. The date recorded by 'nf_stop' is sometimes in the future (note we are using V5, not V9 format logs [others have reported this issue with V9])

    Details:

    1. Initially only configured WAN for testing. Worked perfectly.

    Later on, added the remaining interaces. Still (mostly - see end) ok.

    Apart from the fact that randomly it will launch 2 instances on every interface, leading to double up of logs - a day or so ago noticed this:

    ps ax | grep soft | sort -k7

    42786  -  Is      0:10.64 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    43381  -  Is      0:10.70 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    40849  -  Is      0:02.95 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    41966  -  Is      0:02.96 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    36436  -  Is      0:01.88 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    38743  -  Is      0:01.89 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    38207  -  Is      0:03.29 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    40541  -  Is      0:03.31 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    40097  -  Is      0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    41030  -  Is      0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    41787  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    43037  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    35570  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    36342  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    36099  -  Is      0:09.37 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

    So I logged into the pfsense gui, went to Services->softflowd, made no changes except to press save and then it returned to proper operation:

    ps ax | grep soft | sort -k7

    15178  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    14648  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    13353  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    13965  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    14308  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    14829  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    13020  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    12740  -  Ss      0:00.00 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

    Today I check graylog netflow reports and it has done the same thing again and checking pfsense:

    ps ax | grep soft | sort -k 7

    73519  -  Is      0:07.32 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    73998  -  Is      0:07.34 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    72773  -  Is      0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    73068  -  Is      0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    70361  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    71276  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    71587  -  Is      0:05.99 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    72014  -  Is      0:06.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    72350  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    72799  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    73045  -  Is      0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    73703  -  Is      0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    70282  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    70512  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    70118  -  Ss      0:02.82 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

    So logged into GUI again, made no changes (again) except to click save (again) and all ok again:

    ps ax | grep soft | sort -k 7

    42316  -  Is      0:00.04 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
    41617  -  Is      0:00.02 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
    40609  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
    40836  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
    41215  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
    41962  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
    40176  -  Is      0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
    39828  -  Ss      0:00.06 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctl

    Is anyone else seeing this weird behaviour? Can we not run it on multiple interfaces simultaneously? Is this a known bug?

    1. Regarding the "mostly ok": we built a python interrogator script for graylog/elasticsearch and grabbed the nf_stop field to use as 'last seen' date.

    This was working very well (or so we thought), until 2 days ago (10th January) when the report indicated - according to the log - packets left this network 3 days into the future on 13-January????? Huh?

    At this time have abandoned the nf_stop date since it clearly is not always valid and have reverted to using the graylog received date as 'last seen'.

    Is anyone else seeing this weird behaviour from softflowd?

    Thanks in advance for any help/suggestions/ideas.

    UPDATE: It appears that one interface (igb1 - the one with all the vlans on it) has been flapping a few times over the last few days.  Connected to low end vlan switch. Suspect this may be related:
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1
    Jan 11 21:46:57 pfsense kernel: igb1: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan3 : link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan4: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan5: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan6: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan9: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlan10: link state changed to DOWN
    Jan 11 21:46:57 pfsense kernel: igb1_vlanZ: link state changed to DOWN
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan3
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan4
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan5
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan6
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan9
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan10
    Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlanZ
    Jan 11 21:46:58 pfsense check_reload_status: Reloading filter
    Jan 11 21:46:58 pfsense php-fpm[95604]: /rc.linkup: Hotplug event detected for igb1_vlan3 (opt3) static IP (internal IP)
    Hotplug events for other interfaces not shown

    Note 1: vlanZ (Z is a number) is the log interface, excluded from softflow config
    Note 2: All vlans are distinct /24 subnets, no bridges of any kind. The other 2 interfaces are not vlanned, one is static IP /24 internal and the other is WAN (DHCP to a bridged cable modem)

    Oddly though, the 2 non-vlanned interfaces were adversely affected in that they also ended up with 2 copies of softflowd running on them, even though neither of those interfaces flapped.

    So it appears to be a bug(?) in the interface "up" code that fails to kill running softflowd processes before starting new ones when the interface is brought up again - this is a guess at this time.

    System specs:
    Other packages installed include snort (one minor version behind at this moment), pfBlockerNG (installed but not enabled yet) and Status_Traffic_Totals - latter 2 both current.
    Hardware: APU2, 4GB RAM, embedded install on msata 16GB (this APU is a rocket compared to the old Alix2D3 it replaced - highly recommended, esp if snort is a requirement)

    3GB free memory:
    last pid:  3652;  load averages:  0.04,  0.06,  0.01                                                                                                                                up 5+09:32:40  17:31:39
    62 processes:  1 running, 61 sleeping
    CPU:  0.0% user,  0.2% nice,  0.3% system,  0.4% interrupt, 99.1% idle
    Mem: 118M Active, 428M Inact, 297M Wired, 996K Cache, 83M Buf, 3083M Free

    Spends most of it's time idling at 75MHz, load barely reaches 0.25 (a poor man's monitoring script runs on the serial console permanently for at-a-glance view - it's in color and mostly green - the APU is well within temp limits even at 64 degrees celsius/147 Fahrenheit. Below updates are 10 second intervals):

    Temps: Current (Min/Max)            Pathetic excuse for temp and clock speed graph                                CPU Frequency  Load Averages
    Temp: 54.3C (51.2/64.2)                          ==============  ….........                                        300 MHz  0.04 0.06 0.01
    Temp: 53.8C (51.2/64.2)                          ==============  ......                                              150 MHz  0.04 0.06 0.01
    Temp: 55.0C (51.2/64.2)                          ===============  ............                                        300 MHz  0.03 0.06 0.01
    Temp: 54.3C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.02 0.05 0.01
    Temp: 54.5C (51.2/64.2)                          ===============  ...                                                  75 MHz  0.02 0.05 0.01
    Temp: 55.6C (51.2/64.2)                        ================  ........................................            1000 MHz  0.02 0.05 0.01
    Temp: 54.5C (51.2/64.2)                          ===============  ...............                                      375 MHz  0.01 0.05 0.01
    Temp: 54.0C (51.2/64.2)                          ==============  ......                                              150 MHz  0.01 0.05 0.00
    Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.01 0.04 0.00
    Temp: 54.3C (51.2/64.2)                          ==============  ............                                        300 MHz  0.01 0.04 0.00
    Temp: 54.5C (51.2/64.2)                          ===============  ...                                                  75 MHz  0.00 0.04 0.00
    Temp: 55.3C (51.2/64.2)                        ================  ........................................            1000 MHz  0.00 0.04 0.00
    Temp: 55.1C (51.2/64.2)                        ================  .....................                                525 MHz  0.00 0.04 0.00
    Temp: 54.1C (51.2/64.2)                          ==============  ......                                              150 MHz  0.00 0.04 0.00
    Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
    Temp: 53.8C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
    Temp: 54.1C (51.2/64.2)                          ==============  ...                                                  75 MHz  0.00 0.03 0.00
    Temp: 55.6C (51.2/64.2)                        ================  ........................................            1000 MHz  0.08 0.04 0.01
    Temp: 55.3C (51.2/64.2)                        ================  ........................                            600 MHz  0.07 0.04 0.01

    Edits: Added UPDATE section, typos


Log in to reply