[ER] pfSense box unreachable after configuring LAGG failover interfaces for LAN/DMZ

rcfa

Skip to the bottom of the thread for what's going on. Partially me being stupid, partially pfSense acting stupid.
------- original post -------
I'm trying to figure out if I'm doing something stupid, or if there's an issue with drivers, or pfSense.

So I have this LANNER box with 6 ethernet interfaces from em0 through em5.
I configure em0/1 as a LAGG failover and assign it to the WAN.
Configuration is through the WAN interface.

Next I create LAGG failover interfaces for em2/3 and em4/5. I then assign these to LAN and OPT1 (which I want to use for a DMZ/guest network). The main reason for the LAGG setup is that if an ethernet port ever breaks, I just have to replug the cable without changing the config (or something like that). In essence I have more ethernet ports than I need, so that seemed a reasonable thing to do.

LAN and OPT1 have no network address assigned to them (yet). Within moments of getting that far, the web interface of the pfSense becomes unreachable. I also can't ping the pfSense box, however I can ping other computers FROM the pfSense box.

The way to regain control of the box is to use the revert to recent configuration option from the console menu. As soon as I go back far enough that the LAGG interfaces for LAN and OPT1 are no longer defined, the box becomes reachable again.

Is there a setting somewhere that blocks the web configuration on the WAN port as soon as there is a LAN port defined, even if the LAN port isn't configured (no IP address, no real network connection)? Is there an issue with these network drivers, e.g. do I have to enable polling support or disable some of the hardware features, like e.g. checksum or something like this?

When this first happened, I was also in the process of downloading packages, so I thought I did something stupid there, but now I can reliably recreate this just by creating these interfaces in an otherwise pretty much factory settings configuration.

Anyone got ideas? Anything I can look at to track this down? (specific log files?)

Thanks!

podilarius

First … WAN blocks by default and you will need to add the rule. As you will need to add a LAGG rule to allow traffic. Please search the forums. I think I have seen this chicken/egg problem in here before.

rcfa

@podilarius:

First … WAN blocks by default and you will need to add the rule. As you will need to add a LAGG rule to allow traffic. Please search the forums. I think I have seen this chicken/egg problem in here before.

Wait, I thought that was what the anti-lockout rule was all about?
Also, why can I have a LAGG-failover on WAN, and everything works, and then it stops working as soon as I define a LAN? Is that when the anti-lockout rule gets kicked out or something?

I'll see what I can find…

wallabybob

It has been my experience that a reboot is sometimes necessary after major interface or IP address changes.

It is not clear exactly what you started with and what you have done. For example, if you have moved the pfSense LAN interface from a physical interface with an IP address to a LAGG interface without an IP address, it is not immediately clear to me who will claim ownership of incoming packets whose destination address is the former IP address of the LAN interface. Since the former LAN interface is no longer the LAN interface, a "less friendly" set of firewall rules presumably will be applied.

I think you should be working on the "OPT1 group" first to work out what needs to be done then apply it to the "LAN group" rather than trying to do both together. I suspect what should be done will include something like creating an OPTx interface for the LAGG then from the Interfaces -> (assign) page assigning the LAGG interface to OPT1 and (say) the former physical interface of OPT1 to OPTx.

rcfa

Well, the thing is: nothing is hooked up to the LAN and OPT1 groups. They were set up in preparation of what the box is supposed to do in the future. Right now, only the WAN link is hooked up and connected to my real LAN which acts as WAN for now as far as pfSense is concerned.

So all web configurator access goes through the WAN port, and that WAN port is already set up as LAGG interface with failover.

So what puzzles me is why the WAN stops working, when I don't change anything about the WAN/LAGG setup, and simply add unused (!) OPT1 and LAN LAGG-failover interfaces.

Reboot doesn't help, that's what I tried first. Neither does a change in IP address on the WAN interface help. Only reverting to any state before the OPT1 and LAN LAGG-failover interfaces were defined, and instantly I can get back in.

podilarius

@rcfa:

Wait, I thought that was what the anti-lockout rule was all about?
Also, why can I have a LAGG-failover on WAN, and everything works, and then it stops working as soon as I define a LAN? Is that when the anti-lockout rule gets kicked out or something?

I'll see what I can find…

This is what I am thinking. The Anti lockout rule is for the LAN interface. When you add it to a LAGG, it might not adhere to this anti-lockout rule.
Again, what version of pfSense are you running?

wallabybob

@rcfa:

Reboot doesn't help, that's what I tried first. Neither does a change in IP address on the WAN interface help. Only reverting to any state before the OPT1 and LAN LAGG-failover interfaces were defined, and instantly I can get back in.

What are the steps you take to revert to the old state?

Any chance you inadvertently mess with a WAN related interface when you are configuring the LAGG interfaces? (Perhaps the interfaces are not what you think they are, for example you might think they are 0, 1, 2, 3, … left to right when in fact they are 2, 3, 0, 1 ... or maybe an interface is in multiple LAGG groups.)

What responses do you get when you ping each of the pfSense IP addresses?

rcfa

@wallabybob:

@rcfa:

Reboot doesn't help, that's what I tried first. Neither does a change in IP address on the WAN interface help. Only reverting to any state before the OPT1 and LAN LAGG-failover interfaces were defined, and instantly I can get back in.

What are the steps you take to revert to the old state?

From the console menu, item 15 "Revert to recent…."

@wallabybob:

Any chance you inadvertently mess with a WAN related interface when you are configuring the LAGG interfaces? (Perhaps the interfaces are not what you think they are, for example you might think they are 0, 1, 2, 3, … left to right when in fact they are 2, 3, 0, 1 ... or maybe an interface is in multiple LAGG groups.)

Well, since at first only the WAN interface is configured, and I can access the box through that LAGG-failover-WAN interface, the mapping of the em0-em5 matches what I expect. Even if it should be different, since I don't move the ethernet cable to a different plug, things should continue working, particularly since no interface is in more than one group. There are (when it stops working) three LAGG-failover groups, all identical in configuration, except that one is em0/em1, the next is em2/em3, and the third is em4/em5.
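
The "no interface is in more than one group" condition is easy to check mechanically. Here is a minimal Python sketch, assuming the three groups described above. The group names `lagg0` to `lagg2` are assumed, and `find_shared_members` is a hypothetical helper, not part of pfSense.

```python
from collections import defaultdict

def find_shared_members(groups):
    """Return {nic: [lagg names]} for every NIC that appears in more than one LAGG."""
    seen = defaultdict(list)
    for lagg_name, members in groups.items():
        for nic in members:
            seen[nic].append(lagg_name)
    return {nic: names for nic, names in seen.items() if len(names) > 1}

# The three failover groups described in this post (group names are assumed).
groups = {
    "lagg0": ["em0", "em1"],  # WAN
    "lagg1": ["em2", "em3"],  # LAN
    "lagg2": ["em4", "em5"],  # OPT1
}

print(find_shared_members(groups))  # {} means no NIC is shared between groups
```

An empty result supports the claim that the groups do not overlap; it says nothing about whether the physical port order matches the em0-em5 numbering.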

The first of these is assigned to the WAN interface, and that works. Only once I start creating the other two LAGG-failover groups, do things go haywire. In other words, I have no problem with a WAN interface that's a LAGG-failover group made up from em0/em1.

As a matter of fact, that's what I have right now, and I can access the web configurator just fine. But I'm fairly certain that the moment I add the other two groups, I'll be locked out again, even before I get a chance to assign an IP address to the LAN and OPT1 interfaces.

What I will try now is whether just defining the LAGG-failover group, without assigning it to an interface name, is sufficient to lock me out, too…

I'll try to narrow down exactly when my access gets killed.

@wallabybob:

What responses do you get when you ping each of the pfSense IP addresses?

None. I can ping FROM the pfSense box using the console menu, but I can no longer ping the pfSense box from elsewhere, once I'm locked out.

rcfa

diff /Users/rcfa/Downloads/pfSense/config-20120523073304.xml /Users/rcfa/Downloads/config-20120523074005.xml 
224a225,233
> 		<lan>
> 			
> 			<if>lagg1</if>
> 		</lan>
> 		<opt1>
> 			
> 			<if>lagg2</if>
> 			<spoofmac></spoofmac>
> 		</opt1>
426,427c435,436
< 		<time>1337757306</time>
< 		
---
> 		<time>1337758770</time>
> 		
448a458,469
> 		<lagg>
> 			<members>em2,em3</members>
> 			
> 			<laggif>lagg1</laggif>
> 			<proto>failover</proto>
> 		</lagg>
> 		<lagg>
> 			<members>em4,em5</members>
> 			
> 			<laggif>lagg2</laggif>
> 			<proto>failover</proto>
> 		</lagg>
462c483,484
< 	<ppps/>
---
> 	<ppps>
> 	</ppps>

This is the diff between a config that works, and one that doesn't work.
Note: once I make the changes reflected in this config file, I still have a little bit of time before the web interface locks up. Thanks to that delay I was able to download the config file; after that I only visited one or two pages of the config interface, without making any changes or hitting any save or apply button other than the one to download the configuration. (This is the first time I have used that short window between making the changes and the configurator locking up to download the config file, which is how I was able to make this diff.)

Now can anyone tell me why these changes should make the difference between the configurator working and not working…
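
For anyone who wants to inspect such a config without eyeballing a diff, here is a minimal Python sketch that lists the LAGG definitions and which logical interface uses each one. The element names `<lagg>`, `<members>`, `<laggif>`, `<proto>` and `<if>` are taken from the diff above. The surrounding `<pfsense>`, `<interfaces>` and `<laggs>` wrappers, the WAN/lagg0 entries, and the helper name `summarize_laggs` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; only the lan/opt1/lagg1/lagg2 parts come from the diff.
SAMPLE = """
<pfsense>
  <interfaces>
    <wan><if>lagg0</if></wan>
    <lan><if>lagg1</if></lan>
    <opt1><if>lagg2</if><spoofmac></spoofmac></opt1>
  </interfaces>
  <laggs>
    <lagg><members>em0,em1</members><laggif>lagg0</laggif><proto>failover</proto></lagg>
    <lagg><members>em2,em3</members><laggif>lagg1</laggif><proto>failover</proto></lagg>
    <lagg><members>em4,em5</members><laggif>lagg2</laggif><proto>failover</proto></lagg>
  </laggs>
</pfsense>
"""

def summarize_laggs(xml_text):
    """Return a list of (laggif, proto, members, assigned interface or None)."""
    root = ET.fromstring(xml_text)
    # Map device name (e.g. "lagg1") to logical interface (e.g. "lan").
    assigned = {}
    interfaces = root.find("interfaces")
    if interfaces is not None:
        for iface in interfaces:
            assigned[iface.findtext("if")] = iface.tag
    summary = []
    for lagg in root.iter("lagg"):
        name = lagg.findtext("laggif")
        members = lagg.findtext("members", default="").split(",")
        summary.append((name, lagg.findtext("proto"), members, assigned.get(name)))
    return summary

for name, proto, members, role in summarize_laggs(SAMPLE):
    print(f"{name}: {proto} over {members}, assigned to {role or 'nothing'}")
```

To run it against a real backup, read the downloaded config file into a string and pass it to `summarize_laggs` instead of `SAMPLE`.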

The attached screen shots show exactly what changed, and it shows the pre-existing anti lockout rule on the WAN interface.

![Screen Shot 2012-05-23 at 03.34.11.jpg](/public/imported_attachments/1/Screen Shot 2012-05-23 at 03.34.11.jpg)
![Screen Shot 2012-05-23 at 03.35.22.jpg](/public/imported_attachments/1/Screen Shot 2012-05-23 at 03.35.22.jpg)
![Screen Shot 2012-05-23 at 03.36.16.jpg](/public/imported_attachments/1/Screen Shot 2012-05-23 at 03.36.16.jpg)
![Screen Shot 2012-05-23 at 03.37.34.jpg](/public/imported_attachments/1/Screen Shot 2012-05-23 at 03.37.34.jpg)
![Screen Shot 2012-05-23 at 03.39.49.jpg](/public/imported_attachments/1/Screen Shot 2012-05-23 at 03.39.49.jpg)

podilarius

If I recall correctly, once you create a LAN interface, the anti-lockout rule on the WAN shifts to the LAN, in which case you will lose configuration access. Try creating manual rules on the WAN to allow ports 22, 80, and 443 to the WAN address before starting to configure the LAN and OPT LAGG interfaces.

stephenw10

@rcfa:

Is there a setting somewhere that blocks the web configuration on the WAN port as soon as there is a LAN port defined?

Yes.

Starting with 2.0 it has been possible to configure pfSense with only one interface; that interface must be WAN. In that scenario pfSense will default to allowing webgui access on the WAN interface.
If you have two or more interfaces webgui access is via LAN only, by default, and WAN is blocked.

Steve
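
The default Steve describes boils down to a simple decision. The Python sketch below models it purely for illustration; it is not pfSense's actual code, and `default_webgui_interface` is a hypothetical name. Note that it keys only on how many interfaces are assigned, not on whether the LAN is enabled or has an address.

```python
def default_webgui_interface(assigned):
    """Model of the default described above: which interface allows webgui access.

    `assigned` is the list of assigned logical interfaces, e.g. ["wan", "lan"].
    """
    if len(assigned) <= 1:
        # Single-interface setup: that interface must be WAN, and webgui is allowed on it.
        return "wan"
    # Two or more interfaces: webgui via LAN only, WAN blocked by default.
    return "lan"

print(default_webgui_interface(["wan"]))                 # wan
print(default_webgui_interface(["wan", "lan", "opt1"]))  # lan
```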

rcfa

Hm, then I would say this is a bug, because as you can see, the so-called LAN interface isn't configured: it has no IP address family assigned to it, nor does it have any IP address. In short, it's essentially a placeholder.

Heck, I'm not even sure the system would give me enough time to properly configure the LAN interface after I set up the LAGG-failover group and assign it to the LAN interface. For all I know, I might be in the midst of setting up IPv4 and IPv6 properties when I get locked out. These interfaces aren't even marked as active!

So I really think this needs to change. Also, a firewall rule, once established, shouldn't go away without explicit (rather than implicit) user action. If the understanding is that it's a security risk to have admin access on the WAN port, then maybe after creating a LAN interface a message should be displayed that advises the user to remove the lockout rule from the WAN interface, but it shouldn't just happen behind the user's back.

Only the OPT1 interface was activated long enough to rename it to DMZ and then it was immediately deactivated again. The LAN interface was never activated at all.

When I have a rule in there, I rely on it sticking…

podilarius

I see where you are coming from, and manual rules hold to that. The auto-rules like the anti-lockout might need to change a little. This is going to be up to the developers and maintainers of this project.

For the problem at hand. If you create the manual lockout rules on the WAN before starting in on the LAN and DMZ LAGG setups, are you able to get to where you can finish the config or still access the GUI after adding the LAN and DMZ LAGGs?

rcfa

@podilarius:

I see where you are coming from, and manual rules hold to that. The auto-rules like the anti-lockout might need to change a little. This is going to be up to the developers and maintainers of this project.

For the problem at hand. If you create the manual lockout rules on the WAN before starting in on the LAN and DMZ LAGG setups, are you able to get to where you can finish the config or still access the GUI after adding the LAN and DMZ LAGGs?

I will have to test that, because for some reason restoring with menu item 15 didn't get me far enough back. So right now I'm back to factory settings, downloading the latest snapshot. Then I can restore the last working settings (which will download a ton of packages, too). Once I'm done with that, I can test again. Hopefully without creating another lock-out, because this gets rather time consuming fast… (each cycle takes hours between downloading packages, etc.)

Would be cool if packages could persist through updates, but that's a different issue, and not likely going to happen in the short term.

stephenw10

Packages should persist through an update. When you update everything should be reinstalled automagically. However some don't as you've probably found. ::) Packages have to be kept up to date by their maintainers.

Steve

rcfa

@stephenw10:

Packages should persist through an update. When you update everything should be reinstalled automagically. However some don't as you've probably found. ::) Packages have to be kept up to date by their maintainers.

When I say "persist", I mean "persist", not "automatic reinstall".

If I update Mac OS X, I don't have to reinstall Photoshop or even third-party device drivers afterwards, either. Yes, things are (pretty much all, from what I can tell) reinstalled. But it takes about 3-4h for a few dozen packages to re-download and re-install, particularly if you don't have a blazing fast drive, CPU and connection.

For something like a firewall, that's a damn long time. That's a few hours without backup DNS server, without e-mail filtering, without web server, without phone service, etc. (depending on what packages are installed and used.)

Right now, updates of pfSense are closer to a re-install than to a true update. IMO that is one of the biggest weak spots of pfSense when compared to most commercial boxes: there an update is more or less the time it takes to upload the firmware, plus a reboot, with the only down-time for any service being the reboot.

The good thing is, the stable releases require such an upgrade fairly rarely (still a hassle), but it becomes a bit of a grind when dealing with snapshot releases… So this is something that should at some point be addressed. Maybe the new package system makes this better, which would be awesome. Otherwise, it's maybe something for a 2.5 or 3.0 release of pfSense, but certainly something that should be on the radar in the long term.

rcfa

OK, when I manually add firewall rules to let 443 and 22 pass, then I'm seemingly not getting locked out.
I can however confirm, that the anti-lockout rule does get nuked behind the user's back (because it's missing now), and that despite the fact that both the LAN and the DMZ(opt1) interface are disabled, which particularly in combination I consider dubious.
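
A quick way to check from another machine whether such manual rules are doing their job is to probe the ports directly. This is a minimal Python sketch; `port_open` and `check_ports` are hypothetical helper names, and any address passed in is whatever your own WAN address happens to be.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False

def check_ports(host, ports=(22, 443), timeout=3.0):
    """Return {port: True/False} for each port probed on `host`."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling `check_ports("192.168.1.1")` (a placeholder address) returns a dict such as `{22: True, 443: True}` when both rules pass traffic. A `False` only means the connection did not succeed; it cannot distinguish a firewall block from a service that is not listening.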

If we're concerned, it would be better to never open port 80 on the WAN and force https for the web configurator unless access is through the LAN or another trusted network.

I'd consider passwords going in the clear over the net a bigger problem than keeping encrypted access to the configuration interface open from public networks.

stephenw10

Do you not get redirected to https if you try to connect to port 80?

I think I agree with you that switching the webgui access to LAN should only happen once LAN is enabled with a valid IP; that seems like an oversight. I have done similar things before, but not including a LAGG setup. You can usually get around it by making all the required changes before applying them.
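
The behaviour suggested here (only move webgui access to LAN once LAN is enabled and has a valid IP) could look roughly like the following Python sketch. It is an illustration of the idea, not pfSense code; `lan_ready` and `webgui_interface` are hypothetical names.

```python
import ipaddress

def lan_ready(enabled, address):
    """True only if the LAN is enabled and `address` parses as an IPv4 or IPv6 address."""
    if not enabled or not address:
        return False
    try:
        ipaddress.ip_address(address)
    except ValueError:
        return False
    return True

def webgui_interface(lan_enabled, lan_address):
    # Keep webgui access on WAN until the LAN is actually usable.
    return "lan" if lan_ready(lan_enabled, lan_address) else "wan"

print(webgui_interface(False, None))          # wan
print(webgui_interface(True, "192.168.1.1"))  # lan
```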

@rcfa:

it takes about 3-4h for a few dozen packages to re-download and re-install

3-4h! :o What packages are you running that take that long?
My own box takes, maybe, a few minutes to reinstall the packages. I am only running a few small packages though.

Steve

podilarius

Squid and Snort alone can take minutes for each … a slow connection would extend even that.

rcfa

@stephenw10:

Do you not get redirected to https if you try to connect to port 80?

AFAIK only IFF you have https as the default AND you enable redirection. But if the web configurator is set to use http, then you can happily send all the info in the clear as long as the default anti-lockout rule allows port 80.

My point however is, that https should ALWAYS be the default, and http on a WAN link should essentially never happen. Someone sniffing the password, and you might as well not have a firewall.
I'd hike the restrictions for allowing http to be enabled:

- system must have active LAN link
- no anti-lockout rule on the WAN link that opens up port 80 to the system
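
The two conditions proposed above could be expressed as a single check. This Python sketch only illustrates the suggestion and is not something pfSense implements. The rule representation (a list of dicts with `interface` and `port` keys) and the name `http_webgui_allowed` are made up for the example.

```python
def http_webgui_allowed(lan_link_active, pass_rules):
    """Proposed policy: plain-http webgui only with an active LAN link
    and no pass rule on WAN that opens port 80."""
    wan_opens_80 = any(
        rule["interface"] == "wan" and rule["port"] == 80 for rule in pass_rules
    )
    return lan_link_active and not wan_opens_80

rules = [{"interface": "wan", "port": 443}, {"interface": "wan", "port": 22}]
print(http_webgui_allowed(True, rules))   # True
print(http_webgui_allowed(False, rules))  # False
```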

Even in a LAN it's naive to assume that you're in a friendly environment, so I don't know why the console keeps asking me if I want to revert the web configurator to http whenever I make a minor change in the interface setup or something like that. Things should default to https, and people should have to jump through hoops if they really want to expose their sensitive passwords etc. to the public.

@rcfa:

it takes about 3-4h for a few dozen packages to re-download and re-install

@stephenw10:

3-4h! :o What packages are you running that take that long?
My own box takes, maybe, a few minutes to reinstall the packages. I am only running a few small packages though.

Let's start with snort, squid3, then let's continue with pfBlocker, mailscanner, dansguardian, imspector, anti-virus (havp), vhosts, sipproxy, radius, postfix, pfflowd, arpwatch, mailreport, nmap, and a few more. And I haven't even thought of freeswitch or asterisk yet ;)

Admittedly, I don't have the fastest connection yet (pfSense is supposed to help, because all my WAN traffic needs to get tunneled out to the internet due to a numbskull ISP called Verizon), and a CF Card is not the fastest disk drive, either…