Netgate Discussion Forum

    {done} for someone to figure out my problem - figured out my own problem

    10 Posts 4 Posters 6.1k Views
      GoldServe

      Here is my background. I've used NET4801 and WRAP boards, each with three Ethernet ports and a wireless card. Running 1.2rc2 to 1.2rc4, they have all died in the same way when torrenting or under very high network traffic. Because of the high load from the wireless interrupts, I concluded that the NET4801 and WRAP boards were too slow for my purpose.

      Now I have a 1 GHz mini-ITX Eden board with two NIC ports and a PCI slot, to which I've added a mini-PCI card for wireless. I have everything set up properly with dual-WAN load balancing and failover. Everything works perfectly except when I run torrents at high throughput. The CPU does not go over 20%, so that can't be the problem. miniupnpd complained in the system logs that it had run out of buffer space (even though I've got 512 MB of RAM), so I disabled miniupnpd. Now the box drops all traffic, WAN and LAN (Wi-Fi), for a few seconds and then recovers by itself. The common denominator between all the boxes is the wireless card. Could the Atheros driver somehow be leaking memory or something?

      This is what a ping from a local host to the box looks like:

      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time=410ms TTL=64
      Reply from 192.168.1.1: bytes=32 time=1756ms TTL=64
      Reply from 192.168.1.1: bytes=32 time=913ms TTL=64
      Reply from 192.168.1.1: bytes=32 time=388ms TTL=64
      Reply from 192.168.1.1: bytes=32 time=100ms TTL=64
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Request timed out.
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
      Reply from 192.168.1.1: bytes=32 time<1ms TTL=64
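A ping log like the one above can be summarized to show how long each outage lasts. The following is only a minimal sketch, assuming Windows-style ping output; the function name `summarize_ping` and the 100 ms "slow" threshold are illustrative choices, not anything from pfSense or the original post.

```python
import re

# Matches Windows ping reply lines such as "time<1ms" or "time=410ms".
REPLY_RE = re.compile(r"time[=<](\d+)ms")


def summarize_ping(lines, slow_ms=100):
    """Count replies, slow replies, timeouts, and the longest run of timeouts."""
    replies = slow = timeouts = run = longest_run = 0
    for line in lines:
        line = line.strip()
        match = REPLY_RE.search(line)
        if match:
            replies += 1
            # "time<1ms" is treated as 1 ms, an upper bound.
            if int(match.group(1)) >= slow_ms:
                slow += 1
            run = 0  # any reply ends the current outage
        elif line.startswith("Request timed out"):
            timeouts += 1
            run += 1
            longest_run = max(longest_run, run)
    return {
        "replies": replies,
        "slow": slow,
        "timeouts": timeouts,
        "longest_outage": longest_run,
    }


# A shortened excerpt in the same format as the log above.
sample = [
    "Reply from 192.168.1.1: bytes=32 time<1ms TTL=64",
    "Reply from 192.168.1.1: bytes=32 time=410ms TTL=64",
    "Request timed out.",
    "Request timed out.",
    "Reply from 192.168.1.1: bytes=32 time<1ms TTL=64",
]
print(summarize_ping(sample))
```

Since Windows ping sends roughly one request per second by default, the longest run of timeouts gives a rough outage length in seconds.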

      I am offering $50 to anyone who can guide me through debugging this really annoying issue (the one thing keeping pfSense from being perfect, in my opinion) and getting it fixed.

      This has happened on three different setups of mine, so I seriously think there is a bug somewhere.

      Thanks for looking.

        GoldServe

        I think I've solved my problem, and I've isolated it to the Atheros wireless card.

        From another machine on another network, I pinged the box and found it was still alive. The only interface that seemed to have died was the Atheros interface. After reading up on the net, I ran athstats and found I had TX buffer underrun errors. That is a bad, bad thing: it resets the card, which causes the timeouts. From the remote machine, I SSHed into the box and confirmed this was the case, but ifconfig -v ath0 did not show the card with the OACTIVE flag. I guess the card was reset before I could see the flag.

        Anyway, my solution was to increase hw.ath.txbuf from 200 to 2000, though maybe this only delays when the interface dies. With 400 and 800, the interface took longer and longer to die.

        If anyone knows why this happens or has a permanent fix, please post it here! Hope this helps some people with wireless cards.

          jahonix

          What does your state table look like?
          Running torrents is likely to exceed the preset of 10,000 states. Just increase it and test again.

          Furthermore, I would change the "Firewall Optimization Options" setting under "System: Advanced functions" to "aggressive".

          Please report back what you find!

            GoldServe

            Upped the state table to 30,000 and tried aggressive. No help. My states are nowhere near 30,000, and it seems like ath0 is what drops out. I can still ping the router from the WAN side with no dropped packets.

              eri--

              Please post /tmp/rules.debug; pfctl -vvsr; dmesg; what services you are running and athstats output.

                GoldServe

                rules.debug: http://www.pastebin.ca/892342

                pfctl -vvsr: http://www.pastebin.ca/892343

                dmesg: http://www.pastebin.ca/892348

                services:
                dnsmasq DNS Forwarder: Running
                dhcpd DHCP Service: Running
                miniupnpd UPnP Service: Running
                athstats:
                via:/tmp#  athstats
                1135628 tx management frames
                8274 tx frames discarded prior to association
                1164 tx discarded empty frame
                39 tx failed 'cuz FIFO underrun
                23325 tx failed 'cuz bogus xmit rate
                2389 tx frames with rts enabled
                965 tx frames with 11g protection
                12618 rx failed 'cuz of FIFO overrun
                1122744 rx management frames
                84760 beacon setup failed 'cuz no mbuf
                815798812 beacons transmitted
                307 periodic calibration failures
                1 rate control checks
                1 tx used alternate antenna
                Antenna profile:
                [2] tx  1143860 rx  1150871

                I've sort of fixed my problems now by raising the ath TX and RX buffers to:
                hw.ath.txbuf: 2000
                hw.ath.rxbuf: 2000

                Is there an upper limit, and are other people with wireless cards on pfSense experiencing this problem under high load?

                Thanks!

                  eri--

                  I would say you most likely have signal problems or interference on your channel.

                    GoldServe

                    I have very good signal. I'm running another Atheros card in the laptop, on the 5.9 GHz A band. The two are no more than 20 feet apart.

                    Pretty sure it is the TX underrun error, because when I get ping timeouts I check athstats from the WAN side and see that number shoot up. I'm guessing the card then goes into reset, and a few seconds later my connection comes back. Windows never re-associates, so I did not lose the connection completely, but the card does not send out data for a few seconds.
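One way to check which counters "shoot up" is to save the athstats output before and during an outage and compare the two. This is a rough sketch, assuming every counter line has the form `<number> <description>` as in the output posted earlier in the thread; the helper names and the second snapshot's values are made up for illustration.

```python
import re

# Counter lines look like "39 tx failed 'cuz FIFO underrun".
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_athstats(text):
    """Map each counter description to its integer value; other lines are skipped."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    deltas = {}
    for name, value in after.items():
        increase = value - before.get(name, 0)
        if increase > 0:
            deltas[name] = increase
    return deltas


# First snapshot uses values from the posted output; the second is hypothetical.
before = parse_athstats(
    "39 tx failed 'cuz FIFO underrun\n"
    "12618 rx failed 'cuz of FIFO overrun\n"
    "Antenna profile:\n"
)
after = parse_athstats(
    "57 tx failed 'cuz FIFO underrun\n"
    "12618 rx failed 'cuz of FIFO overrun\n"
    "Antenna profile:\n"
)
print(counter_deltas(before, after))
```

If the FIFO underrun counter is the only one that grows during a dropout, that would be consistent with the reset theory described above, though it does not prove it.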

                      cybrsrfr

                      I have a Netgear wg311t that has the Atheros chipset. It also shows great signal, works for a short while, and then goes up and down. I've reproduced the same issue on a friend's pfSense machine with a different wg311t.

                      I have used other Atheros wireless devices with pfSense that work great with no issues, so I think the problem is specific to the Atheros chipset that the wg311t uses.

                        GoldServe

                        It's possible… but you should take a look at athstats when it does go down. If you see many TX underruns, then you've got the same problem. You've got to hit it with heavy traffic, though.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.