My problems with 2.0-RC1
-
Hello.
I've recently switched to pfSense on my router. It's a multi-LAN/multi-WAN setup with one master and one backup pfSense box, both running 2.0-RC1. Below are a few of my problems, questions and suggestions after the first week of using it :-)
1. When I make changes on my master router, e.g. adding new aliases or firewall rules, CPU load and memory usage increase dramatically on my backup router. More and more PHP processes appear, and after approximately an hour 4 GB of RAM and 8 GB of swap are fully used and the system dies. I thought it might be related to the 64-bit architecture and reinstalled with the 32-bit version, but the problem was still there. Now I'm going back to 64-bit, so once I'm done with the reinstall and reconfiguration I'll check exactly which script is being run.
2. I have a problem with failover. When I disconnect my master router, the backup immediately takes over and the network continues to work. However, when the master comes back online and reclaims master state (the backup returns to slave mode), the master immediately hangs for 4-5 minutes. That of course causes all network connections to be terminated after a timeout. After a few minutes bootup continues on the master and the network starts working again. Is such a delay normal in this case? Or maybe the backup router switches to slave mode too early? Any other possible reasons?
3. I'm using BIND as the DNS server on pfSense. It's installed from the official FreeBSD packages, plus my own init script that I put in /usr/local/etc/rc.d. The script starts fine, but I was surprised that several instances of the DNS server were running after a reboot. A little investigation showed that each "something.sh" I put into that directory is run several times during bootup, varying from 2 to 5 times in my case. Is this normal or is it a bug? If it's normal, where should I put my init script so it runs only once? I tried using /etc/rc.conf (with the original init script that comes with the package), but it's wiped out after each reboot. /etc/defaults/rc.conf, on the other hand, is overwritten after each upgrade.
4. I'd welcome a little change to the web interface :-) When I add a reject or drop rule to the firewall, I usually want it to be logged. Could the logging option be checked automagically when reject or drop is selected while adding a rule? Of course not every user may want such behavior, so maybe it would be better to add an option, e.g. in System / Advanced: "Mark logging option automatically while adding reject / drop rules"?
-
For 1. and 2. please provide more input.
For 3., it's not supported. You have to make sure that every time your script is called, it handles restarting/reconfiguring itself rather than starting a new instance.
-
About my first problem: I've reinstalled my backup router and it still dies. However, now I have more info. Whatever change I make on my master router (e.g. adding, deleting or modifying an alias, firewall rule or VPN configuration, or even just changing the web interface skin), it causes high load on the backup router. It spawns between 30 and 150 (the lowest and highest values I've noted) PHP processes:
/usr/local/bin/php /etc/rc.filter_configure_sync
Each process takes between 115 and 150 MB of RAM. 3 to 5 of them are in the "accept" state; the rest are in the "lockf" state. System load goes up to 8 or 10, and CPU load to above 90%. I tried the x86 and x86_64 versions with the same result, and different hardware with the same result. If more input is needed to resolve this problem, let me know what I should check.
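To collect those numbers in one go rather than reading them off top, something like the following could be fed the output of `ps -ax -o wchan,rss,command` on the backup router. It is a sketch: the function name is made up, and it assumes the "accept"/"lockf" states are the wait channels that FreeBSD's `ps` reports under the `wchan` keyword, with `rss` in kilobytes.

```sh
#!/bin/sh
# Hypothetical helper: summarise rc.filter_configure_sync processes
# grouped by wait channel.
# Input:  lines of "WCHAN RSS COMMAND...", e.g. from
#           ps -ax -o wchan,rss,command
# Output: one line per wait channel: "<wchan> <count> <summed RSS in MB>"
summarize_sync_procs() {
    awk '/rc\.filter_configure_sync/ {
            count[$1]++        # processes per wait channel
            kb[$1] += $2       # RSS is the second column, in KB
        }
        END {
            for (w in count)
                printf "%s %d %d\n", w, count[w], int(kb[w] / 1024)
        }' | sort
}
```

Usage: `ps -ax -o wchan,rss,command | summarize_sync_procs`. Keep in mind that summed RSS counts shared pages (such as the PHP binary itself) once per process, so the MB column overstates the real memory use.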
Additionally, it seems like every change in the pfSense web interface (even one as trivial as a skin change) causes a full pfSense reload and drops network connections for a minute or so. For example, all my VoIP calls die, remote SSH sessions freeze, and the Jabber server and a few other services log that the connection to remote peers has been terminated. That happens after clicking "Save" and/or "Apply changes". It's really annoying and also a real problem, since I'm unable to change anything on the router during work hours :-(
Now problem #2… it kind of disappeared after going back to 64-bit pfSense. There is still a delay when the master router comes back online, but it's around 30 seconds, which is acceptable for me.
Problem #3: yes, I'm handling things properly in my init script. I was just wondering whether it's normal for an init script to be called more than once as "somescript.sh start" during bootup. To me it looks like a bug, but maybe there is some pfSense-specific reason that requires calling "start" on a service more than once.
And two small bugs I've spotted today:
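A script that copes with repeated "start" calls, as cmb suggested, can be as simple as a pidfile check. The following is a minimal sketch under stated assumptions, not the BIND package's own script: the pidfile path and the `DAEMON_CMD` default are hypothetical placeholders, and it assumes the daemon stays in the foreground (as `named -f` does) so that `$!` is its PID.

```sh
#!/bin/sh
# Sketch of an idempotent /usr/local/etc/rc.d/*.sh script.
# Hypothetical defaults: adjust PIDFILE and DAEMON_CMD to your install.
# Assumes DAEMON_CMD stays in the foreground, so $! is the daemon's PID.
PIDFILE="${PIDFILE:-/var/run/named-local.pid}"
DAEMON_CMD="${DAEMON_CMD:-/usr/local/sbin/named -f}"

# Succeeds only if the pidfile names a process that is still alive.
rc_is_running() {
    [ -s "$PIDFILE" ] || return 1
    kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Start the given command unless an instance is already running.
rc_start() {
    if rc_is_running; then
        echo "already running (pid $(cat "$PIDFILE")), not starting another"
        return 0
    fi
    "$@" </dev/null >/dev/null 2>&1 &
    echo "$!" > "$PIDFILE"
}

# Stop the recorded instance (if any) and remove the pidfile.
rc_stop() {
    if rc_is_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
}

case "${1:-}" in
    start)   rc_start $DAEMON_CMD ;;   # unquoted on purpose: split into words
    stop)    rc_stop ;;
    restart) rc_stop; rc_start $DAEMON_CMD ;;
    *)       echo "usage: $0 {start|stop|restart}" >&2 ;;
esac
```

Calling it with "start" several times in a row leaves a single instance running. A production script would probably also wait for the old process to exit in "restart" before launching the new one.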
1. When I add a new firewall rule based on an existing rule, and the existing rule is e.g. UDP only, the newly created rule defaults to TCP only. The protocol type should be copied from the existing rule as well.
2. After the last upgrade (done this evening), when I'm logged in via SSH there is a list of network interfaces. Before the upgrade it showed the correct interface state there; now it shows NONE/NONE instead of UP or DOWN.
-
1. When I add a new firewall rule based on an existing rule, and the existing rule is e.g. UDP only, the newly created rule defaults to TCP only. The protocol type should be copied from the existing rule as well.
I can't replicate that; the protocol of the original is always retained.
2. After the last upgrade (done this evening), when I'm logged in via SSH there is a list of network interfaces. Before the upgrade it showed the correct interface state there; now it shows NONE/NONE instead of UP or DOWN.
This is odd too; there is no interface state shown at the console. It'll show NONE where there is no IP on an interface.
I'm not sure what you're seeing there, or what you're even running, given the issues you're having. We have a bunch of CARP setups in production and have never seen any of the issues you are seeing. Maybe it's some package you have installed, or some modification you've made to the system.
-
@cmb:
I can't replicate that; the protocol of the original is always retained.
It seems to be gone after uninstalling the widescreen package, but I'm not sure whether that was the cause.
@cmb:
This is odd too; there is no interface state shown at the console. It'll show NONE where there is no IP on an interface.
It looks like the example below. If that's normal then OK, no problem here. I'm pretty sure it was displayed differently before the upgrade, but I don't have a screenshot or screen dump of how it looked.
WLAN (opt3) -> lagg0_vlan21 -> 192.168.3.253/24 NONE/NONE
# ifconfig lagg0_vlan21
lagg0_vlan21: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=3<RXCSUM,TXCSUM>
	ether 00:15:17:1d:b1:4c
	inet6 fe80::21b:78ff:fec3:8304%lagg0_vlan21 prefixlen 64 scopeid 0xd
	inet 192.168.3.253 netmask 0xffffff00 broadcast 192.168.3.255
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
	media: Ethernet autoselect
	status: active
	vlan: 21 parent interface: lagg0
@cmb:
I'm not sure what you're seeing there, or what you're even running, given the issues you're having. We have a bunch of CARP setups in production and have never seen any of the issues you are seeing. Maybe it's some package you have installed, or some modification you've made to the system.
My backup router was reinstalled and updated yesterday and is now a clean pfSense 2.0-RC1 install without any modifications. My configuration is pretty simple as well: just CARP, a bunch of aliases and firewall rules, a few OpenVPN tunnels… that's all. It still behaves as I described above, forking dozens of PHP processes after any change is saved on the master router. If there are enough processes to eat up all the memory, the machine simply dies.
My master router has BIND (as DNS server), the Bacula client, bash and mc installed, all from official FreeBSD packages. Both BIND and Bacula use my own init scripts to start during bootup and don't interact with pfSense in any way. The router doesn't use itself as its DNS server. Those are all the modifications I've made.
-
Just FYI about problem #2 from my first post: when the master router comes back up, it freezes for a while after displaying "Starting syslog…done.", and the backup router goes back to slave mode at the very same moment.
-
Start over with a 100% stock master too, no packages or anything. It's almost certainly a package or some other modification on your primary that is somehow hosing the secondary by screwing up the sync process.
-
@cmb:
Start over with a 100% stock master too, no packages or anything. It's almost certainly a package or some other modification on your primary that is somehow hosing the secondary by screwing up the sync process.
I reinstalled the master router today and restored config.xml. Everything went fine, so I added a few new firewall rules. After I clicked "Save" for each rule, more and more PHP processes were forked on the backup router. Finally I clicked "Apply changes". The whole process took me less than a minute. In that time, system load on the backup router went up to 8.42, CPU usage to 96%, RAM (4 GB) usage to 85% and swap (8 GB) usage to 94%…
When I checked, there were 328 rc.filter_configure_sync processes. 3 of these 328 processes were in the accept state, the rest in the lockf state. The machine was barely usable for a few minutes. After that, system load dropped to ~1.7 and the PHP processes slowly disappeared, but RAM and swap usage stayed high. After about 30 minutes everything was back to normal (~20% CPU usage, ~12% memory usage and 0 swap usage). The positive change is that the machine was sluggish for only a few minutes and didn't crash, but such behaviour still isn't normal.
-
Which part of the PHP code is responsible for launching rc.filter_configure_sync on the backup router when I click "Save"/"Apply changes" on the master router? I guess I'll have to solve this issue myself.
FYI: I found why items in /usr/local/etc/rc.d were started many times instead of once. It's caused by /etc/rc.newwanip. Whenever a new IP is added to a WAN interface during startup, this script calls /etc/rc.start_packages, which calls each *.sh with the "start" option. I see that calling each *.sh with "stop" was moved out of /etc/rc.start_packages into /etc/rc.stop_packages some time ago. "stop" should be issued before "start", so /etc/rc.newwanip should call /etc/rc.stop_packages first to actually restart the services, rather than only trying to start them again, which won't work for most software (it would just log "already running" instead of restarting).
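The stop-then-start ordering proposed above can be illustrated in shell. pfSense's own /etc/rc.start_packages and /etc/rc.stop_packages are PHP, so this is not a patch for them; it is only a hypothetical sketch (the function name is made up) of running every local *.sh script with "stop" first and "start" second, as a call to rc.stop_packages followed by rc.start_packages would.

```sh
#!/bin/sh
# Hypothetical sketch, not pfSense code: restart every executable *.sh in
# a local rc.d directory by stopping all of them, then starting all of them.
restart_local_rcd() {
    dir="${1:-/usr/local/etc/rc.d}"
    # Pass 1: the "stop" half (what /etc/rc.stop_packages is described as doing).
    for script in "$dir"/*.sh; do
        if [ -x "$script" ]; then      # also skips the literal glob if no *.sh exists
            "$script" stop
        fi
    done
    # Pass 2: the "start" half (what /etc/rc.start_packages is described as doing).
    for script in "$dir"/*.sh; do
        if [ -x "$script" ]; then
            "$script" start
        fi
    done
}
```

Stopping everything before starting anything means a script that refuses a second "start" (as a well-behaved one should) actually gets restarted instead of just logging "already running".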