Packet Loss issue



  • I have some serious issues with continuous packet loss. The web GUI becomes almost unresponsive during this time. I have been using pfSense for well over 2 years and never had this issue with my earlier ISP. I checked with the ISP and they report no issue. I replaced my cable modem today with a DOCSIS 3 modem, but that has not resolved the issue. The ISP sees no problems with pings and RTT from their end. I even changed the network cable running between the modem and pfSense.

    Here is my config..
    2.0.1-RELEASE (amd64)
    Intel Dual Port Gigabit network card (never had any issues with this card, replaced with another Intel card to rule this out)

    WAN - VLAN 10 (em0) – tried with direct input to WAN port too.
    LAN - VLAN 1 (em1)
    VoIP - VLAN 2 (em1)
    Video - VLAN 3 (em1)

    Here is the system log..

    Feb 25 21:44:54
    apinger: alarm canceled: WAN(x.x.x.x) *** loss ***
    Feb 25 21:44:09
    check_reload_status: Reloading filter
    Feb 25 21:43:59
    apinger: ALARM: WAN(x.x.x.x) *** loss ***
    Feb 25 21:43:22
    check_reload_status: Reloading filter
    Feb 25 21:43:12
    apinger: alarm canceled: WAN(x.x.x.x) *** loss ***
    Feb 25 21:42:05
    check_reload_status: Reloading filter
    Feb 25 21:41:55
    apinger: ALARM: WAN(x.x.x.x) *** loss ***
    Feb 25 21:38:44
    check_reload_status: Reloading filter
    Feb 25 21:38:34
    apinger: alarm canceled: WAN(x.x.x.x) *** loss ***
    Feb 25 21:37:54
    check_reload_status: Reloading filter
    Feb 25 21:37:44
    apinger: ALARM: WAN(x.x.x.x) *** loss ***
    Feb 25 21:33:57
    check_reload_status: Reloading filter
    Feb 25 21:33:47
    apinger: alarm canceled: WAN(x.x.x.x) *** loss ***
    Feb 25 21:29:25
    check_reload_status: Reloading filter
    Feb 25 21:29:15
    apinger: ALARM: WAN(x.x.x.x) *** loss ***
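
    To see how long each loss alarm lasts, the log above can be paired up by script. This is only a rough sketch: it assumes the layout shown here (a timestamp line followed by a message line, newest first), and the year is an assumption because the log does not include one. The name alarm_windows is made up for this example.

```python
from datetime import datetime

# A few entries copied from the log above (newest first).
SAMPLE = """\
Feb 25 21:33:57
check_reload_status: Reloading filter
Feb 25 21:33:47
apinger: alarm canceled: WAN(x.x.x.x) *** loss ***
Feb 25 21:29:25
check_reload_status: Reloading filter
Feb 25 21:29:15
apinger: ALARM: WAN(x.x.x.x) *** loss ***
"""


def alarm_windows(log_text, year=2012):
    """Pair each apinger ALARM with the following 'alarm canceled' entry.

    Returns a list of (alarm start, duration in seconds) tuples.
    The year is an assumption; the log lines carry no year.
    """
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    events = []
    # Lines alternate: timestamp, message, timestamp, message, ...
    for stamp, message in zip(lines[0::2], lines[1::2]):
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, message))
    events.sort(key=lambda event: event[0])  # oldest first

    windows = []
    start = None
    for when, message in events:
        if "apinger: ALARM" in message:
            start = when
        elif "apinger: alarm canceled" in message and start is not None:
            windows.append((start, (when - start).total_seconds()))
            start = None
    return windows


for start, seconds in alarm_windows(SAMPLE):
    print(start.strftime("%H:%M:%S"), f"loss alarm lasted {seconds:.0f} s")
```

    Run against the full log, this would show whether the alarms follow a regular pattern or come at irregular intervals.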



  • Anyone?



  • Check whether you are reaching the firewall's maximum number of states.

    The dashboard shows the current state table usage. If you want to raise the limit, go to System -> Advanced -> Firewall/NAT.



  • This is what it shows.

    State table size 21/783000
    MBUF Usage 2966/25600
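
    For reference, those readings work out to a tiny fraction of the limits. A quick sketch of the arithmetic, assuming the dashboard values are simply "used/limit" (usage_percent is a made-up helper name):

```python
def usage_percent(reading):
    """Turn a 'used/limit' dashboard reading into a percentage of the limit."""
    used, limit = (int(part) for part in reading.split("/"))
    return 100.0 * used / limit


print(f"state table: {usage_percent('21/783000'):.4f}%")
print(f"mbufs:       {usage_percent('2966/25600'):.2f}%")
```

    Both are far below 100%, so neither the state table nor the mbuf pool looks exhausted.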


  • Netgate Administrator

    So are you saying that this problem started when you switched ISPs?

    Just because they aren't seeing a problem does not mean they don't have a problem!  ;)

    Try connecting to their service directly with another machine and run some pings.

    Steve



  • I did that already. The cable line seems perfectly fine. I hooked up the cable modem directly to my testing laptop. Internet was working perfectly for over an hour without a single issue.

    Could there be a problem with the VLAN driver? The internal LAN, which carries 3 VLANs, is not having any such issues. This is driving me nuts, as the internet comes to a grinding halt every few minutes. I see the pfSense dashboard (after a lot of lag) showing packet loss in yellow, well above 30%, and then after a few minutes it goes back to normal.

    I have disabled IP Monitoring in the Routing options for now. This has helped, as pfSense no longer flushes old states the way it does after every WAN failure alarm.


  • Netgate Administrator

    Hmm,
    So is it a new ISP? (may not be relevant)

    Why are you using a VLAN on em0 when you only have a single interface on it?

    Steve



  • I have my network on a 48-port patch panel and switch. It's easier to work that way, as I am planning to add an additional WAN in the months ahead. I removed the VLAN on em0 and attached the modem cable directly to pfSense, but it didn't make a difference.



  • Is your monitor IP your gateway IP? You may have to change that to something else. Some ISP routers will not reliably respond to pings even when they're passing traffic just fine.



  • I already tried that. I used Google DNS (8.8.8.8) as the monitor IP. Same apinger issue.


  • Netgate Administrator

    So, have you recently changed ISP? Did it coincide with this problem?

    Steve



  • Yes, I recently moved. But I didn't see this issue for the first 3 months. It has been going on for the last 3 weeks.



  • I do not see the apinger issue with this build

    2.1-DEVELOPMENT (amd64)
    built on Sun Feb 26 13:39:54 CET 2012

    Unfortunately I cannot use this build as Squid fails to work on it.



  • @asterix:

    I do not see the apinger issue with this build

    2.1-DEVELOPMENT (amd64)
    built on Sun Feb 26 13:39:54 CET 2012

    Almost certainly a coincidence, nothing at all has changed with apinger.


  • Netgate Administrator

    Is that an 8.3 build?
    The drivers will be different.
    What NICs are you using?

    Steve



  • Yes, that's an 8.3 build.

    Intel dual gigabit. It has worked flawlessly till 2.0. Something changed in 2.0.1?

    If it were a driver issue, then why don't I see this within the network? It's only on the WAN. Though I suppose apinger is only monitoring the WAN.


  • Netgate Administrator

    Exactly.
    It may be some incompatibility with the upstream equipment, or some change in the ISP's network.
    Which Intel chipset is it?

    Steve



  • I think I found the root cause. I have snort installed (sorry, I failed to mention that).

    Something changed in the snort.org rules that I usually select.

    Last night, the snort.org site was down, and with a clean new install it failed to update the snort.org rule set, but the emerging rules were applied. I kept an eye on the system logs and didn't see the apinger issue. I had also rebooted my 48-port L2 managed switch, so I thought that might have been the cause (though I had rebooted it earlier too).

    Throughout the night and the whole day today, there was not a single apinger issue reported in the system log. Just now I did a manual snort rules update and the snort.org rule sets were added. I selected the usual snort.org rules (I do the same with the emerging rules too). The moment snort restarted with the new rules applied, the system log started to fill up with apinger issues again.

    Now I have deselected all snort.org rules and will have to select one rule set at a time to pinpoint exactly which rule set is causing this apinger issue.
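
    The one-rule-set-at-a-time approach can be sketched as a simple loop. This is only an illustration of the procedure, not something that talks to pfSense or snort: find_offending_rule_sets, triggers_loss, and the rule set names are all hypothetical, and the actual check (enable one rule set, restart snort, watch the system log for apinger alarms) still has to be done by hand or by whatever tooling you have.

```python
def find_offending_rule_sets(rule_sets, triggers_loss):
    """Test rule sets one at a time and collect those that bring the alarms back.

    triggers_loss is a caller-supplied (hypothetical) check. It should enable
    only the given rule sets, restart snort, watch the system log for a while,
    and return True if the apinger loss alarms reappear.
    """
    offenders = []
    for name in rule_sets:
        if triggers_loss([name]):
            offenders.append(name)
    return offenders


# Stand-in for the manual check, for illustration only:
# pretend the rule set called "set-b" is the culprit.
def fake_check(enabled):
    return "set-b" in enabled


print(find_offending_rule_sets(["set-a", "set-b", "set-c"], fake_check))
```

    Testing each rule set on its own is slow but unambiguous. With many rule sets, enabling half of them at a time and narrowing down would need fewer restarts, at the cost of missing the case where two rule sets only cause trouble together.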


  • Netgate Administrator

    I would not have guessed that, though it makes sense. Good to know.
    Hopefully this will help anyone else with a similar issue.

    Steve

