Prioritizing WAN gateway monitoring ICMP traffic

ternarybit

I'd like to prioritize ICMP traffic so my gateway monitoring stats aren't skewed when the link saturates. As it stands, the gateway pings are getting queued, creating spikes in my quality graphs whenever my upload bandwidth is saturated. The outpass traffic graphs exactly match the "high latency" spikes in my monitoring graphs.

I can see it in the state table:

WAN icmp [–MY-WAN-IP--]:13324 -> [–WAN-MONITOR-IP]:13324 0:0 707 / 711 20 KiB / 20 KiB

Then I create a firewall rule:

Floating rule
Action: match
WAN interface
Direction: out
Protocol: ICMP
Source: any
Destination: any
Queue: qACK

Then I kill the state above and check the firewall rule. It catches 1-2 pings then shows no traffic.

I've tried variations including both directions, selecting LAN/OPT interfaces as well as WAN in the rule, and creating a normal rule on the WAN interface, but I can't seem to catch this traffic. Any help appreciated!

Nullity

I think it makes more sense for ICMP to be prioritized the same as the majority of your traffic. Otherwise, you will be seeing pings that report latency & packet drops that do not match most of your traffic.

ternarybit

@Nullity:

I think it makes more sense for ICMP to be prioritized the same as the majority of your traffic. Otherwise, you will be seeing pings that report latency & packet drops that do not match most of your traffic.

I want my quality graphs to represent, as accurately as possible, the actual quality of the link between my router and the ISP (which is a WISP, quite prone to erratic latency and loss). If my quality graph goes way into the red, I want to be able to squarely point the finger at my ISP, not a normal spike in traffic that caused my pings to sit in my own router's queue for 500ms.

If ICMP is prioritized and I still see high latency/loss, I know for sure it wasn't anything on my end, and it was actually the fault of my gateway.

NogBadTheBad

Cisco equipment tends to drop ICMP when the links get busy.

If your link is busy wouldn't you rather the important stuff gets through?

johnpoz

"Cisco equipment tends to drop ICMP when the links get busy."

Is that really just Cisco? Isn't that a normal thing.. For a router/switch it's: should I route/switch this packet, or spend time answering that ping ;) Its job of routing/switching should be of more importance than answering a stupid ping.. If busy, it will get to that ping when it has time, sort of thing.. While yes, Cisco for sure does that, I figure that should be normal across the board.

NogBadTheBad

@johnpoz:

"Cisco equipment tends to drop ICMP when the links get busy."

Is that really just Cisco? Isn't that a normal thing.. For a router/switch it's: should I route/switch this packet, or spend time answering that ping ;) Its job of routing/switching should be of more importance than answering a stupid ping.. If busy, it will get to that ping when it has time, sort of thing.. While yes, Cisco for sure does that, I figure that should be normal across the board.

Exactly John :)

ternarybit

@NogBadTheBad:

Cisco equipment tends to drop ICMP when the links get busy.

If your link is busy wouldn't you rather the important stuff gets through?

This ICMP traffic is important stuff to me. I don't care if a few bytes of "real" traffic are queued or even dropped every few seconds during peak times. It's worth it to me to have accurate quality graphs that ACTUALLY represent what my ISP's link quality is. I have the queue and queuedrops graphs if I want to know the state of MY router.

I know that the typical pfSense demographic is all about bleeding-edge performance and max throughput, but that is not my use case.

Is anyone going to actually answer my question instead of questioning my motives?

johnpoz

Which is why I brought up that it's possible it's just ping, and doesn't for sure mean actual packet loss of real data ;) But if you see it all the time, then it does point to either the ISP device not being properly sized or a network issue. I would for sure investigate such an issue..

ternarybit

@johnpoz:

Which is why I brought up that it's possible it's just ping, and doesn't for sure mean actual packet loss of real data ;) But if you see it all the time, then it does point to either the ISP device not being properly sized or a network issue. I would for sure investigate such an issue..

I see random latency spikes, 200-500% higher than baseline latency, specifically during times of maxed-out upload. I know this because I overlay the traffic graph over my latency graph and they exactly match. It makes perfect sense that an outgoing ICMP packet might get queued during such times, creating a falsified quality graph that implies that my wireless ISP just took a dump again, when REALLY all that happened was the pings got stuck in MY OWN queue.
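To put rough numbers on that effect, here is a minimal sketch of how long a ping would sit behind queued upload data in a single FIFO queue versus a priority queue. Every figure in it (uplink speed, backlog size, baseline RTT) is a made-up assumption for illustration, not a measurement from this setup, and it models plain serialization delay only, not pf or ALTQ internals.

```python
# Toy model: how long a small ping waits on a saturated uplink.
# All numbers below are hypothetical, chosen only for illustration.

def serialization_delay_ms(size_bytes: int, link_bps: float) -> float:
    """Time to put size_bytes on the wire at link_bps, in milliseconds."""
    return size_bytes * 8 / link_bps * 1000

UPLINK_BPS = 5_000_000    # assumed 5 Mbit/s upload
BACKLOG_BYTES = 150_000   # assumed bulk data already queued ahead of the ping
MTU_BYTES = 1500          # one full-size packet that may already be on the wire
BASELINE_RTT_MS = 40.0    # assumed idle round-trip time to the monitor IP

# Single FIFO queue: the ping waits for the whole backlog to drain first.
fifo_wait_ms = serialization_delay_ms(BACKLOG_BYTES, UPLINK_BPS)

# Priority queue: the ping waits at most for the packet currently being sent.
prio_wait_ms = serialization_delay_ms(MTU_BYTES, UPLINK_BPS)

print(f"FIFO ping RTT:        {BASELINE_RTT_MS + fifo_wait_ms:.1f} ms")
print(f"Prioritized ping RTT: {BASELINE_RTT_MS + prio_wait_ms:.1f} ms")
```

Under these assumed numbers the FIFO case adds hundreds of milliseconds that have nothing to do with the ISP, while the prioritized ping stays within a few milliseconds of baseline.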

I don't WANT my quality graphs to show when MY queues get full. I want it to show when my ISP has trouble.

Edit: And again, none of this has to do with my OP, which is asking for advice on how to write a firewall rule, not on the merits/demerits of my motives for doing so.

johnpoz

If your packets are getting queued up on your own device, is that not just bufferbloat?

Off the top I do not know how, or even whether it's possible, to queue or set priority on traffic sent by the firewall itself (pfSense).. I have never had need or want to look into such a thing. The firewall really doesn't generate any traffic that should take higher priority than any other traffic it was actually routing.

The only thing that would come close maybe is DNS queries from either the forwarder or resolver wanting to take higher priority than other traffic generated by clients that it's actually routing..

ternarybit

@johnpoz:

If your packets are getting queued up on your own device, is that not just bufferbloat?

Off the top I do not know how, or even whether it's possible, to queue or set priority on traffic sent by the firewall itself (pfSense).. I have never had need or want to look into such a thing. The firewall really doesn't generate any traffic that should take higher priority than any other traffic it was actually routing.

The only thing that would come close maybe is DNS queries from either the forwarder or resolver wanting to take higher priority than other traffic generated by clients that it's actually routing..

If my uplink is saturated, why wouldn't my packets get queued? Isn't that the point of queues/buffers, to hold packets in line waiting for available bandwidth to leave the interface?

I used the traffic shaper wizard to prioritize DNS traffic using floating rules, similar to the one in my OP. It's catching plenty of traffic, but for some reason, my ICMP rule doesn't see the traffic.

johnpoz

If your uplink is saturated then sure, stuff is going to get queued. I am just not aware of how you can mark the traffic generated by the firewall itself to take higher priority in the queue, which is what you're asking, right?

You want the ICMP traffic that is being generated by pfSense to have higher priority than any other traffic either from clients or from the firewall. In this case you just want the firewall's own pings to be able to jump in line..

ternarybit

@johnpoz:

If your uplink is saturated then sure, stuff is going to get queued. I am just not aware of how you can mark the traffic generated by the firewall itself to take higher priority in the queue, which is what you're asking, right?

You want the ICMP traffic that is being generated by pfSense to have higher priority than any other traffic either from clients or from the firewall. In this case you just want the firewall's own pings to be able to jump in line..

At least at first glance, it would seem really straightforward. I have an ICMP "state" in my firewall table between my WAN IP and a given destination (see OP)

Why can't I put a floating rule on my WAN interface, matching outbound ICMP traffic, to put the traffic in a specific QoS queue?

Nullity

IIRC (from setting NTP packets originating from pfSense to high priority), you use a floating rule with source as "self" to catch the packets then assign them to the appropriate queue.

You might also need to tag LAN originating packets so that they are excluded.

I forget what I did exactly but I think I confirmed its functionality a year ago when I set it up… at least, I hope I did because the rules & queues are still there. ???

I'll look at my rules later and try to help if you have trouble with the above guidelines.

johnpoz

That is exactly the sort of post I hang around here for, and put up with the same questions over and over and over again for ;) I most likely would have never run into such a tidbit of info.. I was not aware you could pick self as a source in the floating rules…

Oh, I just looked: you cannot pick it as an interface, but if you pick it as the source in an outgoing rule (i.e. floating..) then sure, you could then place the traffic into a queue - nice!

But isn't it a bit late at that point?? Since if it's leaving, wouldn't it already be pointless to place it in a queue? And it wouldn't be entering the interface with that source?

ternarybit

@Nullity:

IIRC (from setting NTP packets originating from pfSense to high priority), you use a floating rule with source as "self" to catch the packets then assign them to the appropriate queue.

You might also need to tag LAN originating packets so that they are excluded.

I forget what I did exactly but I think I confirmed its functionality a year ago when I set it up… at least, I hope I did because the rules & queues are still there. ???

I'll look at my rules later and try to help if you have trouble with the above guidelines.

I'm not in front of my webUI right now, but I will definitely try this and report back. I didn't look at the Source field, so I wasn't aware that self was an option.

Nullity

@johnpoz:

That is exactly the sort of post I hang around here for, and put up with the same questions over and over and over again for ;) I most likely would have never run into such a tidbit of info.. I was not aware you could pick self as a source in the floating rules…

Oh, I just looked: you cannot pick it as an interface, but if you pick it as the source in an outgoing rule (i.e. floating..) then sure, you could then place the traffic into a queue - nice!

But isn't it a bit late at that point?? Since if it's leaving, wouldn't it already be pointless to place it in a queue? And it wouldn't be entering the interface with that source?

Regarding traffic-shaping queues, packets are only queued when leaving the interface, so assigning them at that point should work perfectly fine. I think that's exactly how the traffic-shaping wizard sets up its floating rules.

I forget the quirks with NAT and floating rules. Isn't the source changed to the WAN IP when going out the WAN? If so, a "self" source wouldn't work unless matched when coming in on the WAN?

ternarybit

@Nullity:

IIRC (from setting NTP packets originating from pfSense to high priority), you use a floating rule with source as "self" to catch the packets then assign them to the appropriate queue.

You might also need to tag LAN originating packets so that they are excluded.

I forget what I did exactly but I think I confirmed its functionality a year ago when I set it up… at least, I hope I did because the rules & queues are still there. ???

I'll look at my rules later and try to help if you have trouble with the above guidelines.

So far, it looks like this works, but with one weird caveat: I have to set the rule to action Pass. It doesn't work with rule action Match.

But still, really glad to have a solution. Thank you!

raitd

@ternarybit
I agree, pfSense needs to prioritize its ICMP packets used for WAN monitoring. None of its default WAN monitoring settings are helpful until you prioritize this traffic. In our use case with multi-WAN failover (not load sharing, due to the high cost of traffic on our backup internet connection) this is a critical function to get right.

We have found that unless you prioritize the WAN monitoring traffic the multi-WAN dpinger process will decide that the connection is down when it is only busy, and then it decides to cut over to the backup connection. Then it is a race between the backup connection getting flooded, vs how quickly pfSense realizes the main connection is actually perfectly fine.
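A toy sketch of that false failover, assuming a monitor that raises an alarm when the rolling average RTT over a window of probes crosses a threshold. The window size, threshold, and latencies are hypothetical values picked for illustration, and `first_alarm` is a made-up helper; this is not dpinger's actual algorithm or its defaults.

```python
from collections import deque

def first_alarm(samples_ms, window, threshold_ms):
    """Return the index of the first probe at which the rolling average
    RTT over `window` probes exceeds `threshold_ms`, or None if never."""
    recent = deque(maxlen=window)
    for i, rtt in enumerate(samples_ms):
        recent.append(rtt)
        if len(recent) == window and sum(recent) / window > threshold_ms:
            return i
    return None

BASELINE_MS = 30.0      # assumed idle RTT to the monitor IP
QUEUE_DELAY_MS = 700.0  # assumed extra wait in the local queue when saturated
THRESHOLD_MS = 500.0    # hypothetical "gateway down" latency threshold
WINDOW = 10             # hypothetical number of probes averaged

# 20 probes on an idle link, then 20 probes while the upload is saturated.
unprioritized = [BASELINE_MS] * 20 + [BASELINE_MS + QUEUE_DELAY_MS] * 20
prioritized = [BASELINE_MS] * 20 + [BASELINE_MS + 3.0] * 20

print("unprioritized alarm at probe:", first_alarm(unprioritized, WINDOW, THRESHOLD_MS))
print("prioritized alarm at probe:  ", first_alarm(prioritized, WINDOW, THRESHOLD_MS))
```

In this model the unprioritized probes trip the alarm a few probes into the saturated period even though the link itself is healthy, while the prioritized probes never do.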

We normally bundle this ICMP along with DNS into one queue. Having the infrastructure working is MUCH more important than some user trying to email their latest cat video to their buddy. Given the size and frequency of the monitoring packets I see no problem with them jumping to the front of the queue. Sure, this means the quality graph now shows how the ISP is performing instead of what your users are experiencing, but the Queue and Queuedrop graphs will give guidance on whether the user experience is acceptable or not.

FYI, sometime between when this topic was last active and now, a floating rule with "Match" started working OK; it does for us on 2.4.4. Note we also use high availability, and run on ESXi for extra fun times. Also note, the CBQ implementation in the 2.4.x versions seems totally broken, so we've had to switch to HFSC. We still see 1.5~2.0x higher latency during times when our internet connection is flooded, due to us using the KISS method for our HFSC implementation.

gff1stof3

So glad to see this thread. I also have a dual WAN setup and I needed to prioritize the ICMP for WAN monitoring just like you did. It simply won't work reliably the other way around, as once there is some congestion the traffic jumps over to the backup link, which then kills it and makes for a real mess. Thanks.