Routing issue related to dynamic nature of OpenVPN interface (I think)
I have a configuration with multiple paths to the internet. First there is a primary WAN and a Backup (LTE), in a Gateway group. Both of these gateways are Bridged DHCP - the provider dynamically supplies the IP.
Consequently, my Static IP space is in the cloud and I tunnel that routed network in using OpenVPN. The OpenVPN server is in the cloud and the pfSense side acts as the client - this enables pfSense to move the tunnel between the Primary and Backup interfaces when fail-over/recovery occurs. There is also a Gateway Group for the OpenVPN tunnel - which has not helped the problem, BTW.
For the most part, egress for outbound traffic is through the Primary Gateway group. There are some servers though which must use a Static IP for egress, so those are routed to the tunnel.
All ingress is to a Static IP and arrives through the tunnel.
This all works fine for the most part, except when it doesn't. I believe what is happening is the OpenVPN interface goes down, in which case I believe it must dynamically disappear, so when traffic arrives for the tunnel it gets sent instead over the Primary Gateway group. I assume this is because when the tunnel goes away so does the gateway, which leaves only one viable route.
So now two things happen. First, the traffic that goes out over the wrong gateway is hopelessly broken, but second, 4 states get established which now pin the traffic to the wrong gateway, so when the tunnel does come back up I have to realize that a bunch of services are broken, go into pfSense, and drop the 4 bad states so that new, correct states can be established.
So, the question is, how can I pin traffic to a specific route such that if the OpenVPN link fails, as you would expect, the route fails and doesn't try to find another way out? The other requirement is that the failure is graceful, so the state table doesn't get poisoned and normal operation can resume when the link comes back up. Is this even possible because of the nature of OpenVPN and how it's integrated in pfSense? Is my only alternative to set up a separate OpenVPN server in a DMZ?
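For reference, the manual cleanup described above (finding the states pinned to the wrong gateway and dropping them) can also be done from a shell. This is only a sketch: 192.0.2.10 is a made-up address standing in for one of the servers that must egress through the tunnel, and states_for_host is a hypothetical helper name. pfctl -ss and pfctl -k are standard pfctl options; the live commands are left commented out because they need root on the firewall.

```shell
#!/bin/sh
# Placeholder address of a server that must egress through the tunnel (assumption).
SERVER_IP="192.0.2.10"

# Read a state listing (the format printed by "pfctl -ss") on stdin and keep
# only the lines that mention the given address as a whole token, so that
# 192.0.2.10 does not also match 192.0.2.100.
states_for_host() {
    grep -F -w -- "$1"
}

# On the firewall, as root:
#   pfctl -ss | states_for_host "$SERVER_IP"   # review the states for that host
#   pfctl -k "$SERVER_IP"                      # kill states originating from it
#   pfctl -k 0.0.0.0/0 -k "$SERVER_IP"         # kill states destined to it
```

Note that pfctl -k with a single address kills every state originating from that host, not just the 4 bad ones, so other connections from that server will be reset as well.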
I assume you do policy based routing using the VPN gateway to direct the specific traffic over to the remote site.
So set a check at System > Advanced > Miscellaneous > "Do not create rules when gateway is down".
You may also add a floating block rule on WAN, direction "out", matching the appropriate traffic, so it can't leave that way and create states there, but this shouldn't be necessary.
Thanks! The checkbox won't solve the problem directly, but that completely explains why it's doing what it's doing.
I do have a rule that creates a gateway exception for some traffic, and that checkbox is just selecting two different ways for there to be no gateway exception when the gateway goes down. The options are use the default gateway OR fall through and ultimately use the default gateway. So, there also needs to be a block rule immediately following the gateway exception rule, AND the checkbox needs to be checked, so the traffic falls through to an explicit block of the traffic if the gateway is down.
Sadly, there really ought to be greater choice in handling traffic in exception cases. What I need is "when a rule has a gateway specified and this gateway is down, the rule is STILL created AS WRITTEN" OR "when a rule has a gateway specified and this gateway is down, the rule is STILL created BUT AS A BLOCK RULE".
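One way to see which of those behaviors is currently loaded is to look for the route-to option on the rule in the running ruleset. A rough sketch, assuming the policy rule matches on a source address: 192.0.2.10, the interface names and the sample rule text are placeholders, and unrouted_pass_rules_from is a hypothetical helper name. It reads "pfctl -sr" output on stdin and prints the pass rules for that source which carry no route-to, i.e. rules that would let the traffic follow the default route.

```shell
#!/bin/sh
# Read a rule listing (the format printed by "pfctl -sr") on stdin and print
# pass rules for the given source address that have no route-to option.
unrouted_pass_rules_from() {
    grep '^pass' | grep -F -w -- "from $1" | grep -v -F -- 'route-to'
}

# On the firewall, as root, while the tunnel is down:
#   pfctl -sr | unrouted_pass_rules_from 192.0.2.10
# Any line printed is a pass rule that would send this host's traffic out
# without the tunnel gateway. No output means no such rule is loaded.
```

If this prints the rule while the tunnel is down, the traffic is being passed without the gateway; with the checkbox set and an explicit block rule right after it, nothing should be printed.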
I'm not terribly familiar with the low-level pf stuff and pfctl, but is there more information from the state table than what is shown in the pfSense GUI? I'd like to figure out what the pattern is in my situation in my thread that's causing this issue. I reenabled pfSync and I'm not so sure now if that was really solving the problem.
In my case, anything TCP related works fine in all failure situations... failing between onsite pfSense nodes or taking down a remote gateway node. UDP still works, somewhat. It either works fine, or it seems to only work in one direction, i.e. inbound. Let me explain.
FreePBX has this tech called Responsive Firewall which basically opens up SIP ports to the internet with three tiers of service. First, initially all inbound registration attempts are permitted. However, if too many of these registrations fail within a certain period, that remote endpoint is first rate limited, and if registrations continue to fail, the endpoint is banned for 24 hours.
In my case, I'm seeing inbound registrations, but my remote endpoints aren't getting the reply and completing the registration process...so FreePBX sees it as a failure and eventually bans that IP.
All the while, TCP traffic works fine.
I've usually set block & log rules at the end of a chain, so traffic that's falling through gets logged. Also, look for packets that are getting dropped that shouldn't have been. You can dump the rules with pfctl and correlate the rule numbers with those in the log. You can also log the PASS rules. I've sometimes found my traffic getting passed by a different rule than expected, so it passed without the correct policy being applied. pfSense does insert a few "surprise rules" which you can't see in the GUI, so your traffic might get handled by something very unexpected. I also had a case where there were some system rules being inserted at the end of the chain after the user rules, and my "block all and log" rule was preventing the system rules from ever being reached - not sure if it's been fixed but that was killing the tftp proxy.
TCP traffic is stateful, whereas UDP is pseudo-stateful, on account of the state table which makes UDP sort-of behave like TCP, but not exactly. With SIP especially I've found you can get a race condition: either endpoint could initiate a "conversation", but the firewall has ingress and egress rules, so depending on which end is first to start the "conversation" a different rule may allow it. UDP is then allowed bidirectionally, whereas TCP is allowed unidirectionally because of its stateful nature. As I recall I ran into trouble when the ingress rule had a problem, but an outbound packet would pass and allow traffic bidirectionally. In time the state expires and gets removed, and the inbound packets start failing because they can't trigger the ingress rule and had only been working because the egress rule had fired.
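If you suspect an expiring state is what flips the behavior, it can help to check which UDP state timeouts pf is actually using. A minimal sketch: pfctl -st (show timeouts) is a standard pfctl option, while udp_timeouts is just a hypothetical helper name that filters that listing down to the udp.* entries. The live command is commented out since it needs to run on the firewall.

```shell
#!/bin/sh
# Read a timeout listing (the format printed by "pfctl -st") on stdin and keep
# only the UDP entries (udp.first, udp.single, udp.multiple).
udp_timeouts() {
    grep '^udp\.'
}

# On the firewall, as root:
#   pfctl -st | udp_timeouts
# Compare the values against how often the phones re-REGISTER; a state that
# expires between registrations has to be re-created by whichever side
# sends the next packet.
```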
@tlum Can you elaborate on what you mean by block and log at the end of a chain, are you suggesting to create a block rule at the very bottom of a ruleset? In my testing situation, I've got a wide open pass on all traffic, so there shouldn't be anything in the ruleset that is blocking traffic
Well, it's a general best practice to put a "block all" rule at the end of a filter chain in a firewall. In the event that traffic falls through all of the rules in the chain you want to guarantee that traffic not matching a rule gets explicitly handled - usually rejected - in a consistent way; not relying on an implicit block or reject. A block all rule is just a rule set to match ANYTHING, and set to block or reject rather than pass. You also have the option to set a rule to be logged.
In my experience though, a final block all WILL BREAK some pfSense functionality because some system rules are inserted after the user rules. It's still safe to log all traffic that hits the end of chain without being handled.
You may find these to be useful:
The filter rules
pfctl -vvs rules OR -vvsr
The NAT rules
pfctl -vvs nat OR -vvsn
The state table
pfctl -vvs states OR -vvss
The tables
pfctl -vvs Tables OR -vvsT
To dump the bogons table, for example
pfctl -t bogons -T show
Well, it's a general best practice to put a "block all" rule at the end of a filter chain in a firewall.
There is no need for this - this is default. And common for most any firewall to be default deny. Unless you turn off the logging of the default blocks. There is little reason to put an extra rule at the bottom.
Only reason you might do this - is if you turn off logging of the default block rule, and want to have a rule that blocks and logs in addition to the default block that is not logging.
I do this for example. Where I don't log default, and have rules to block and log only tcp syn traffic, and then another that blocks and logs common/interesting udp traffic.
All the other stuff that gets blocked, I don't see any reason to log that. It's not interesting to me.. Random udp ports from p2p stray traffic and the like..
@vbman213 said in Routing issue related to dynamic nature of OpenVPN interface (I think):
I've got a wide open pass on all traffic, so there shouldn't be anything in the rule set that is blocking traffic
Right, but there is an implicit block at the end of the chain anyway. The problem is you don't know if your traffic got handled by a rule as expected or fell through and got dropped. So, at least place a rule at the end which matches all traffic, with log checked, so you see all traffic which got there in your log... then look for legitimate traffic that was expected to be handled, but fell through.
Also, watch the byte and packet counts on the filters. You wrote a rule, you expected it to handle certain traffic, yet the count is zero (0)... then something isn't right. Conversely, if you've got count on rules that aren't expected to fire, you've got an issue there too.
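The counter check can be scripted rather than eyeballed. A sketch under stated assumptions: it expects the "pfctl -vvsr" layout where each rule line starts with @N and is followed by a "[ Evaluations: ... Packets: ... ]" line, and zero_hit_rules is a hypothetical helper name. It prints every rule whose packet counter is 0.

```shell
#!/bin/sh
# Read "pfctl -vvsr" output on stdin and print the rules whose packet counter
# is zero. Rule lines start with "@<number>"; the counters follow on the next
# bracketed line.
zero_hit_rules() {
    awk '
        /^@[0-9]+/ { rule = $0; next }          # remember the current rule
        /Packets:/ {
            for (i = 1; i < NF; i++)
                if ($i == "Packets:" && $(i + 1) == 0)
                    print rule                  # counter is 0: report it
        }'
}

# On the firewall, as root:
#   pfctl -vvsr | zero_hit_rules
```

A rule you wrote for specific traffic showing up in this list is the "count is zero" case described above. Counters reset when the ruleset is reloaded, so check after the traffic in question has had a chance to flow.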
As a troubleshooting aid?
I don't see the point of looking for something I am not interested in.
You would see any traffic that fell through your rules by the default block - that out of the box logs.. Only if the person turned off said logging would they need such a rule.
And if they are troubleshooting say a port forward - easier to just sniff to see if the traffic gets there.
Sniffing is much better than looking for a block log, which again is there by default.. So unless they on purpose turned that off.. Which they could just turn on again, vs creating some special rule.
Deny all rules at the end of chains usually have little to no cost given the fact that high value traffic should not be getting that far. While most chains should deny by default, one little configuration error can change that to accept; so in keeping with the "security is an onion" principle, best practice says don't rely on a single layer or single assumption. In security there is no such thing as "there is no need for this". It's always a question of, "can I afford all the layers of paranoia, or does it come at too high of a cost or burden".
That said, there is Reject (active, via ICMP) and Block (stealthily), the latter generally being preferred in order to mitigate attempts to survey the attack surface. Explicitly defining a Block rule adds an additional layer of insurance that traffic won't be actively rejected, giving away valuable intelligence.
Then there is logging which is not generally a default condition, and that audit trail is a best practice.
...while I'm at it, an unredacted screenshot of your attack surface is an anti-pattern!
I don't see the point of looking for something I am not interested in.
Neither did the guys at Starwood between at least 2014, and 2018 when Marriott took them over.
In InfoSec you capture everything and all of it... then let IDS sort it out.
Deny all rules at the end of chains
Again dude - it's THERE already... You're telling him to do something that is already there!! And it's already being logged.
My point was not to tell him to turn off logging.. My point was you're telling him to put a default deny in his rules.. When it's there already..
In my experience though, a final block all WILL BREAK some pfSense functionality because some system rules are inserted after the user rules
Is not true at all..
There are hidden rules - like allowing dhcp enabled when you turn on dhcp.. But they are not at the end of the user rules.. They are before the rules - you posted how to look at the full tables.. Where are these system rules AFTER user rules?? That you could break?
My point was your telling him to put a default deny in his rules.. When its there already..
I'm not going to continue arguing about something because I made an offhand comment about a "best practice" which is off topic. And, I think I was clear that when it comes to security it's better to duplicate when the cost of the duplication is bearable. I never said you were wrong but I am saying you're arguing a false dichotomy.
Where are these system rules AFTER user rules?? That you could break?
This is one I know of for sure. I don't know if it was ever resolved, nor do I know whether it was a red herring or a common occurrence. What I do know is that the actual filter chains are opaque since what you can see gets merged in with things you can't see.
I did recommend that it be made less opaque by displaying the system generated rules statically so that users can understand what's going on under the covers and be in a better position to work around it. There could even be a "show hidden rules" toggle, to provide "optional" transparency.
So, following this experience I have zero trust in not running into unexpected counterintuitive behavior, so I'll always advise people to use caution and to use pfctl -vvsn to validate any assumptions.
So a thread with you talking to yourself about some problem.. From 2012? And then a redmine you submit with zero validation or any comments even..
Yeah clearly you were on to something there with some hidden rule after your rules that breaks stuff <rolleyes>
Offhand comment about best practice? You're the one that brought it up
I've usually set block & log rules at the end of a chain, so traffic that's falling through gets logged
Which again what I am saying is DEFAULT!!! So you're confusing the user with something that is already there - telling him to create rules to do something that gets done by default..
Which you're saying "could" break stuff.. <rolleyes>
So..... Any idea what the problem is here in the OP lol
Take a deep breath and open your mind. Confrontational moderators do the community a disservice. Start by finding what you agree with then go from there.
I did start a thread in 2012. I also found my own thread in a search 5 years later while trying to fix the same problem - which still hadn't been fixed - and was quite embarrassed to be taking my own advice, which I'd completely forgotten.
Who cares if it was initially discovered in 2012 if it still exists today? Who cares if no one else could help with the issue; all that's germane is whether the assessment was correct or not. And, if you weren't in such a hurry to invalidate anyone who challenges you, you might have noticed that the redmine ticket was opened in Jan 2016 and was marked "confirmed" in July of the same year. So the message you're sending is that giving condescending attitude has a higher priority than accuracy... and I'm guessing that's not the message a "global moderator" wants to be sending.
From my experience you have to:
- See if you can isolate the larger setup into a simpler test case with a reproducible failure condition. The full blown implementation usually causes a lot of distracting noise you'll end up chasing into dead ends. Try to break it down into the simplest possible test case.
- Isolate the failure condition; What leads to it. If it's easily reproducible that's a bonus.
- I'd usually start with assessing rules. You wrote them for a reason and you expect them to be doing something specific. Do you see anomalies? Rules that were expected to fire that didn't, or that did but shouldn't have. That's a very opportunistic step; sometimes you get lucky and something obvious jumps out, other times not. That's a low cost, high value, first step.
- Then, trace the transition from the working state to the failure state with a capture and analyze that. That ends up being a learning experience first and foremost; it forces you to reconcile various misconceptions but will eventually lead to the solution. SIP is annoyingly complex, although wireshark has a lot of great tools to help with the analysis.
I don't know if this will help any. I've been running SIP through pfSense for more than 12 years now and have come across various difficulties that I've mostly beaten into submission at this point... although, the maturity of all the moving pieces has helped a lot along the way. The following laundry list is not specific to your issue, just something to review and consider what the impact would be, if any, in your specific setup.
- IP layer addresses/ports are written in Application Layer headers.
- PBX may be trying to mitigate problems by predicting the public address of a server on a private subnet. It’s important to understand what behavior is enabled and determine if it’s suitable to the architecture, including all edge cases.
- SIP application layer header rewriting rules may be required, and if so, care must be taken to mitigate issues with edge cases
- When STUN is in use there will be problems if it’s used incorrectly; i.e. reached through wrong path, returns wrong answer. Care must be taken to avoid race conditions if/when failovers can dynamically change the public IP.
- Out-of-Band SIP application protocol negotiates a transport stream protocol with specific IP layer address/port pairs.
- Usually this is implemented by opening a static block to be utilized for S/RTP port assignments, rather than synchronizing the negotiation with rules dynamically. This pool MUST be synchronized with the PBX.
- You need to be clear on which side can initiate the S/RTP stream so the rules are in the right place.
- Packet filters generally can’t distinguish separate application layer streams over UDP, so a SIP REGISTER (outbound) will enable a SIP INVITE (inbound) to PASS even when there is no functional inbound rule, as long as the State continues to exist (assuming both sides are using port 5060).
- SIP should be implemented over TCP since the control protocol benefits from being reliable.
- S/RTP streams should be implemented over UDP because reliability is undesirable. In real time applications, packets arriving late or out of order have no value. You can’t play audio packets out of order, and can’t hold up real-time streams for retransmission of lost packets without introducing an unrecoverable delay.
- Reliable tunnels can exacerbate the problem, and certainly will contribute to judder. Real-time traffic needs to basically follow a now or never pattern.
- Silence is not golden. In some configurations, silence, like muting a phone while on a conference call, will cause no S/RTP packets to be sent.
- I’ve had issues with silence lasting more than 5-minutes causing the call to terminate. It appears that something in the path decides to timeout if no S/RTP packets are received for 5-minutes, and declares the call to be abandoned.
- Firewalls may expire state table entries considered stale even though the call is active from the application perspective.
- There is usually an option to “generate silence packets” which actually creates packets with pseudo-background noise, which then maintains a consistent S/RTP stream and lessens the likelihood calls will drop due to S/RTP timeouts. FYI: I’ve seen some phones that were too smart for their own good; generating silence packets when the audio was silent, but as soon as Mute was activated the S/RTP stream stopped anyway.