Upgrade from 2.2 to 2.4.4 was close but no cigar.



  • Hi All, first and foremost let me say that I'm a big fan of pfSense. I've used it for the last 4 years and love it. I'm fairly well versed in most of it and not afraid to get down and dirty with it. So here's the bother I'm having.
    I had 2.2-something working well on a Dell R610 for the last 2 years or so, with another R610 of the exact same spec waiting in the wings as a backup box should the live one fail. I tested the backup box a few times and it was fine as a full-blown stand-in. However, the time came to upgrade to the latest version, so I flattened the backup box, loaded up 2.4.4 from scratch and restored the backup config from the 2.2 box.
    Everything loaded up fine and all interfaces came up as expected, but when I put it in place I immediately found we were receiving data in but nothing was going out. The WAN is a PPPoE connection, which hooked up fine to the ISP and looked good.
    After much head scratching and testing I found the only way to get things working was to change the default gateway to the WAN PPPoE rather than the WAN interface itself. This kicked everything back into action and I was happy for an hour or so, until I noticed our 3CX VoIP phones, which live on one of the LANs, were not ringing and were generally messing about. As the phones were perfectly fine before the upgrade, I figure something's amiss with the pfSense config. The LAN that the phones are on also has PCs on it, which continue to work fine. My first instinct is to troubleshoot why the default gateway had to be changed in order to get traffic flowing out again, so this is where I wish to start. I would normally get stuck in and sort this myself, but nowadays I don't have the luxury of time to spend days figuring it out :-( so I'm kinda hoping, and would be very grateful, if someone can throw me some pointers!


  • Netgate Administrator

    There were a number of changes to the default gateway behaviour going to 2.4.4. A new feature allows you to set a gateway group there, so traffic from the firewall itself can make use of failover gateways. If you didn't have a gateway set as default previously, it will use 'automatic' in 2.4.4, and unfortunately that can choose the wrong gateway in some cases. A number of bugs have been opened and fixed for that, so a 2.4.5 snapshot should behave far better, but just setting a default gateway as you did should also correct it.
    That may be completely unrelated to the VoIP issue unless somehow they are connecting out over the wrong WAN or something similar.
    Is it just phones behind the firewall and an external PBX?
    If they are registering correctly and you can call out and get audio in both directions, then the only cause there would be that they are somehow registering the wrong public IP for incoming SIP traffic.
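
    One way to check that idea is to capture a REGISTER request from a phone and look at the address it advertises in its Contact header. The snippet below is only a rough sketch: the sample message, the addresses and the helper names (contact_hosts, unexpected_contacts) are made up for illustration and are not taken from this setup or from 3CX.

```python
import re

# Hypothetical REGISTER request, trimmed to the headers that matter here.
# All names and addresses are illustrative, not values from this thread.
SAMPLE_REGISTER = "\r\n".join([
    "REGISTER sip:pbx.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.1.50:5060;branch=z9hG4bK776asdhds",
    "To: <sip:101@pbx.example.com>",
    "Contact: <sip:101@192.168.1.50:5060>",
    "",
])

# Matches a sip:/sips: URI and captures the host part, skipping an
# optional "user@" prefix and stopping before any port or parameters.
CONTACT_URI = re.compile(r"sips?:(?:[^@>\s;]+@)?([^:;>\s]+)")


def contact_hosts(sip_message):
    """Return the hosts advertised in the Contact header(s) of a SIP message."""
    hosts = []
    for line in sip_message.splitlines():
        if line.lower().startswith("contact:"):
            hosts.extend(CONTACT_URI.findall(line))
    return hosts


def unexpected_contacts(sip_message, expected_public_ip):
    """Return advertised Contact hosts that differ from the expected public IP."""
    return [h for h in contact_hosts(sip_message) if h != expected_public_ip]


# 203.0.113.7 stands in for the WAN address the PBX should be seeing.
print(unexpected_contacts(SAMPLE_REGISTER, "203.0.113.7"))
```

    An empty list means the phone advertises the expected address. Anything else is worth comparing against what the PBX shows for that extension.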

    Steve



  • Thanks for the reply Stephen. I think you are correct in thinking the gateway issue is nothing to do with the VoIP issues. Just to clarify the gateway situation, on the 2.2 box I had 2 gateways:
    Gateway 1: Name- WANGW, Interface- WAN, Gateway- <external ip>, Monitor IP- <external ip>, Description- ISP Gateway
    Gateway 2: Name- WAN_PPPOE, Interface- WAN, Gateway- <isp gateway ip>, Monitor IP- <isp gateway ip>, Description- Interface WAN_PPPOE Gateway

    Gateway 1 was set as default and everything was hunky dory

    Then I put the 2.4.4 box in with the same config, but it didn't allow any traffic out. So I set gateway 2 as default and everything started working again (except the phones).

    If I try to set gateway 1 back as default, same problem: no outbound traffic. I'm not hell bent on figuring out why this is the case, as it may not have anything to do with the VoIP phone issue and everything else seems to work fine. It just struck me as the only thing I changed from the original working config on the 2.2 box.

    I think the VoIP issue could be related to an IP registration issue like you say, or to port rewriting. My problem is that logic seems to have been replaced with randomness in solving this, because I have one phone out of 6 which works perfectly fine and continues to work fine. All the other phones register to the external 3CX PBX and can call out with 2-way audio. I actually factory reset the reception phone today and reprovisioned it, and now that has started working as it should. I did the same to another non-ringing phone and it had no effect.
    As you can imagine, with such random behaviour it's difficult to narrow down the cause. I think with some deeper analysis of the 3CX verbose activity log I should be able to spot the issue. It just takes loads of time, which I don't have in abundance as I wear many hats during the working day!


  • Netgate Administrator

    Ah, well, if the phones normally connect via Gateway 1, is there a chance they are still sending that IP? Many VoIP setups can see the actual source and ignore bad values like that, but not all by any means.

    Is WAN 1 DHCP? Do you have any advanced options set there at all? It may be pulling a bad MTU value from the provider that pfSense previously ignored. That changed in 2.4.4: it should no longer ignore those unless told to.
    Does the gateway monitor show up? Can you ping out of that WAN? Pings are small and will work with a bad MTU.
    https://redmine.pfsense.org/issues/8507
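
    To illustrate why small pings can succeed across a bad MTU, here is a minimal sketch of the arithmetic, assuming IPv4 with no IP options and the standard 8 bytes of PPPoE/PPP overhead on a 1500-byte Ethernet link. The function names are just for illustration.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
ICMP_HEADER = 8      # bytes, ICMP echo header
PPPOE_OVERHEAD = 8   # bytes, PPPoE (6) + PPP (2) carried inside the Ethernet payload


def pppoe_mtu(ethernet_mtu=1500):
    """MTU left for IP packets on a PPPoE link over the given Ethernet MTU."""
    return ethernet_mtu - PPPOE_OVERHEAD


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small to carry an ICMP echo")
    return mtu - IPV4_HEADER - ICMP_HEADER


print(pppoe_mtu())                    # 1492
print(max_ping_payload(pppoe_mtu()))  # 1464
print(max_ping_payload(1500))         # 1472
```

    A default ping carries only a few dozen bytes of payload, so it fits under almost any MTU. To actually exercise the limit you would send a payload of the size computed above with the don't-fragment bit set (on FreeBSD-based systems that is typically something like ping -D -s 1464 <host>) and then try a slightly larger one.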

    Steve



  • Do you even have a multi-WAN setup, and if not, where does Gateway 1 come from?



  • OK, so it appears to have been an issue with our Zyxel GS1500 PoE switch. I took my phone home with me and tested it from there. It worked perfectly. I brought it back to the office this morning and it just started working. Maybe it just wanted a change of scenery or something ;-) but anyhow it's working now. My colleague's phone on the desk next to me (we're all plugged into the same PoE switch) still wouldn't play ball, however. So I unplugged it, walked over to another desk and plugged it back in. Lo and behold, it started working. I unplugged it, went back to his desk and plugged it back in: didn't work. I walked over to the PoE switch and moved his connection to a different port on the same switch, which got him up and running. I repeated the procedure for the other phones (all Yealink T46Gs) and now all is back to normal.
    So something in that managed switch (which we don't actually manage, it was left as it came out of the box) was interfering with traffic somehow. Incidentally, there are no PCs on that PoE switch, only phones. That aspect sadly didn't occur to me until now, otherwise I might have rebooted it for the sake of it. So thanks for the help guys, but it turned out to be something basic in the end.