Softflowd: 2 instances on every interface after a while + v5 logs in the future
-
Installed and configured softflowd to send logs to a graylog system, using NetFlow version 5 graylog collector and version 5 configured in pfSense softflowd GUI config.
Both pfSense and softflowd are current (2.3.2_1 and pfSense-pkg-softflowd-1.2.1_2/softflowd-0.9.8_2 respectively)
Discovered 2 issues:
- After some time has passed (a day, maybe less, still testing), each interface has 2 softflowd processes running on it, which doubles up all the logs (but with slight differences in the timestamps, grrr!)
- The date recorded by 'nf_stop' is sometimes in the future (note we are using V5, not V9 format logs [others have reported this issue with V9])
Details:
- Initially only configured WAN for testing. Worked perfectly.
Later on, added the remaining interaces. Still (mostly - see end) ok.
Apart from the fact that randomly it will launch 2 instances on every interface, leading to double up of logs - a day or so ago noticed this:
ps ax | grep soft | sort -k7
42786 - Is 0:10.64 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
43381 - Is 0:10.70 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
40849 - Is 0:02.95 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
41966 - Is 0:02.96 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
36436 - Is 0:01.88 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
38743 - Is 0:01.89 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
38207 - Is 0:03.29 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
40541 - Is 0:03.31 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
40097 - Is 0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
41030 - Is 0:00.06 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
41787 - Is 0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
43037 - Is 0:00.02 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
35570 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
36342 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
36099 - Is 0:09.37 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctlSo I logged into the pfsense gui, went to Services->softflowd, made no changes except to press save and then it returned to proper operation:
ps ax | grep soft | sort -k7
15178 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
14648 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
13353 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
13965 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
14308 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
14829 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
13020 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
12740 - Ss 0:00.00 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctlToday I check graylog netflow reports and it has done the same thing again and checking pfsense:
ps ax | grep soft | sort -k 7
73519 - Is 0:07.32 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
73998 - Is 0:07.34 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
72773 - Is 0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
73068 - Is 0:01.51 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
70361 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
71276 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
71587 - Is 0:05.99 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
72014 - Is 0:06.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
72350 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
72799 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
73045 - Is 0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
73703 - Is 0:00.01 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
70282 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
70512 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
70118 - Ss 0:02.82 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctlSo logged into GUI again, made no changes (again) except to click save (again) and all ok again:
ps ax | grep soft | sort -k 7
42316 - Is 0:00.04 /usr/local/sbin/softflowd -i igb0 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb0.pid -c /var/run/softflowd.igb0.ctl
41617 - Is 0:00.02 /usr/local/sbin/softflowd -i igb1_vlan10 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan10.pid -c /var/run/softflowd.igb1_vla
40609 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan3 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan3.pid -c /var/run/softflowd.igb1_vlan3
40836 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan4 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan4.pid -c /var/run/softflowd.igb1_vlan4
41215 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan5 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan5.pid -c /var/run/softflowd.igb1_vlan5
41962 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan6 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan6.pid -c /var/run/softflowd.igb1_vlan6
40176 - Is 0:00.00 /usr/local/sbin/softflowd -i igb1_vlan9 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb1_vlan9.pid -c /var/run/softflowd.igb1_vlan9
39828 - Ss 0:00.06 /usr/local/sbin/softflowd -i igb2 -n graylog:2055 -v 5 -T full -t general=86400 -t maxlife=172800 -p /var/run/softflowd.igb2.pid -c /var/run/softflowd.igb2.ctlIs anyone else seeing this weird behaviour? Can we not run it on multiple interfaces simultaneously? Is this a known bug?
- Regarding the "mostly ok": we built a python interrogator script for graylog/elasticsearch and grabbed the nf_stop field to use as 'last seen' date.
This was working very well (or so we thought), until 2 days ago (10th January) when the report indicated - according to the log - packets left this network 3 days into the future on 13-January????? Huh?
At this time have abandoned the nf_stop date since it clearly is not always valid and have reverted to using the graylog received date as 'last seen'.
Is anyone else seeing this weird behaviour from softflowd?
Thanks in advance for any help/suggestions/ideas.
UPDATE: It appears that one interface (igb1 - the one with all the vlans on it) has been flapping a few times over the last few days. Connected to low end vlan switch. Suspect this may be related:
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1
Jan 11 21:46:57 pfsense kernel: igb1: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan3 : link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan4: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan5: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan6: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan9: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlan10: link state changed to DOWN
Jan 11 21:46:57 pfsense kernel: igb1_vlanZ: link state changed to DOWN
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan3
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan4
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan5
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan6
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan9
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlan10
Jan 11 21:46:57 pfsense check_reload_status: Linkup starting igb1_vlanZ
Jan 11 21:46:58 pfsense check_reload_status: Reloading filter
Jan 11 21:46:58 pfsense php-fpm[95604]: /rc.linkup: Hotplug event detected for igb1_vlan3 (opt3) static IP (internal IP)
Hotplug events for other interfaces not shownNote 1: vlanZ (Z is a number) is the log interface, excluded from softflow config
Note 2: All vlans are distinct /24 subnets, no bridges of any kind. The other 2 interfaces are not vlanned, one is static IP /24 internal and the other is WAN (DHCP to a bridged cable modem)Oddly though, the 2 non-vlanned interfaces were adversely affected in that they also ended up with 2 copies of softflowd running on them, even though neither of those interfaces flapped.
So it appears to be a bug(?) in the interface "up" code that fails to kill running softflowd processes before starting new ones when the interface is brought up again - this is a guess at this time.
System specs:
Other packages installed include snort (one minor version behind at this moment), pfBlockerNG (installed but not enabled yet) and Status_Traffic_Totals - latter 2 both current.
Hardware: APU2, 4GB RAM, embedded install on msata 16GB (this APU is a rocket compared to the old Alix2D3 it replaced - highly recommended, esp if snort is a requirement)3GB free memory:
last pid: 3652; load averages: 0.04, 0.06, 0.01 up 5+09:32:40 17:31:39
62 processes: 1 running, 61 sleeping
CPU: 0.0% user, 0.2% nice, 0.3% system, 0.4% interrupt, 99.1% idle
Mem: 118M Active, 428M Inact, 297M Wired, 996K Cache, 83M Buf, 3083M FreeSpends most of it's time idling at 75MHz, load barely reaches 0.25 (a poor man's monitoring script runs on the serial console permanently for at-a-glance view - it's in color and mostly green - the APU is well within temp limits even at 64 degrees celsius/147 Fahrenheit. Below updates are 10 second intervals):
Temps: Current (Min/Max) Pathetic excuse for temp and clock speed graph CPU Frequency Load Averages
Temp: 54.3C (51.2/64.2) ============== …......... 300 MHz 0.04 0.06 0.01
Temp: 53.8C (51.2/64.2) ============== ...... 150 MHz 0.04 0.06 0.01
Temp: 55.0C (51.2/64.2) =============== ............ 300 MHz 0.03 0.06 0.01
Temp: 54.3C (51.2/64.2) ============== ... 75 MHz 0.02 0.05 0.01
Temp: 54.5C (51.2/64.2) =============== ... 75 MHz 0.02 0.05 0.01
Temp: 55.6C (51.2/64.2) ================ ........................................ 1000 MHz 0.02 0.05 0.01
Temp: 54.5C (51.2/64.2) =============== ............... 375 MHz 0.01 0.05 0.01
Temp: 54.0C (51.2/64.2) ============== ...... 150 MHz 0.01 0.05 0.00
Temp: 54.1C (51.2/64.2) ============== ... 75 MHz 0.01 0.04 0.00
Temp: 54.3C (51.2/64.2) ============== ............ 300 MHz 0.01 0.04 0.00
Temp: 54.5C (51.2/64.2) =============== ... 75 MHz 0.00 0.04 0.00
Temp: 55.3C (51.2/64.2) ================ ........................................ 1000 MHz 0.00 0.04 0.00
Temp: 55.1C (51.2/64.2) ================ ..................... 525 MHz 0.00 0.04 0.00
Temp: 54.1C (51.2/64.2) ============== ...... 150 MHz 0.00 0.04 0.00
Temp: 54.1C (51.2/64.2) ============== ... 75 MHz 0.00 0.03 0.00
Temp: 53.8C (51.2/64.2) ============== ... 75 MHz 0.00 0.03 0.00
Temp: 54.1C (51.2/64.2) ============== ... 75 MHz 0.00 0.03 0.00
Temp: 55.6C (51.2/64.2) ================ ........................................ 1000 MHz 0.08 0.04 0.01
Temp: 55.3C (51.2/64.2) ================ ........................ 600 MHz 0.07 0.04 0.01Edits: Added UPDATE section, typos