PPPoE reconnection fix - mpd fix ($100)
-
Here's another trace from the vr1 interface. The changes I made were:
- set a custom reset period
- changed the service name under WAN to WE1 (no idea if this makes a difference or not)
The purpose of the trace was to test whether PPPoE reconnects fine on a custom reset period set in pfSense, and the result was that it did reconnect fine.
The command run to capture the trace was
tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture
Trace file:
http://www.mediafire.com/?osabfdk0189hgai
-
The problem with running the same command and taking a trace is that once the ISP modem is switched off or reset, all activity stops on vr1 and no more packets are traced. Even closing the trace and rerunning the command doesn't yield any more packets, so it seems the trace effectively stops if the port is switched off or reset. Even once the modem has come back online, rerunning the trace shows no packets whatsoever.
So this could be the actual issue: the web GUI keeps showing reconnection attempts, but the trace doesn't yield any packets at all. It could mean that if the vr1 port is brought down prematurely, mpd isn't able to reuse it even after it comes back up, since the previous trace showed that a custom reset brings the interface down and up again successfully.
-
I am looking at the trace with 138.943 Bytes (the smaller one of the last 2 traces).
What I can see from that is that the PPPoE Server/BRAS is NOT aware that the session has been brought down. The modem has been reset at frame 528 I guess, where we can see a PADT. We have no LCP Terminate-Request and Terminate-Ack, probably because the modem is booting at that time. Also, that PADT probably never reached the PPPoE Server on the other side.
So while the Firewall/PC is already aware that the session is down and is trying to reestablish the PPPoE connection, the BRAS is not, and keeps a zombie PPPoE session active. We can see that in frames 588 and 649, because the BRAS is still sending LCP echoes for the old session.
So what? The BRAS or intermediate access switches probably have some DDoS countermeasures configured which allow only 1 session per MAC (or modem or port or whatever), and because the old session is still up, flood protection kicks in and the PADR is ignored.
It should be only a matter of time until the BRAS considers the zombie session to be stale, discard that session and let you reestablish the connection.
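To make the zombie-session reasoning concrete, here is a minimal Python sketch that flags server-side LCP echoes arriving for a session the client has already ended with a PADT. The `Frame` record, the `find_zombie_echoes` helper and the session ID 0x1234 are made up for illustration; only the frame numbers 528, 588 and 649 come from the trace discussed above, and the frames are assumed to be already decoded (for example by hand from Wireshark).

```python
from typing import NamedTuple


class Frame(NamedTuple):
    number: int       # frame number in the capture
    from_bras: bool   # True if the frame was sent by the PPPoE server side
    kind: str         # e.g. "PADT" or "LCP_ECHO_REQ"
    session_id: int   # PPPoE session ID carried in the frame


def find_zombie_echoes(frames):
    """Return frame numbers of server LCP echoes for sessions the client already ended."""
    terminated = set()
    zombies = []
    for f in frames:
        if f.kind == "PADT" and not f.from_bras:
            # The client side considers this session finished.
            terminated.add(f.session_id)
        elif f.kind == "LCP_ECHO_REQ" and f.from_bras and f.session_id in terminated:
            # The server still treats the terminated session as alive.
            zombies.append(f.number)
    return zombies


# Hypothetical decode of the three frames mentioned above; 0x1234 is a made-up session ID.
sample = [
    Frame(528, False, "PADT", 0x1234),
    Frame(588, True, "LCP_ECHO_REQ", 0x1234),
    Frame(649, True, "LCP_ECHO_REQ", 0x1234),
]
zombie_frames = find_zombie_echoes(sample)
```

If the list comes back non-empty, the server side is still holding the old session, which would fit the ignored PADR described above.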
Did you try to let the Firewall run for 10 - 15 minutes and see if it changed anything?
Are you sure that you fixed this by downgrading your FW, or could it be that your provider changed something on their network in the same period you upgraded your FW?
You told us that you have the same problem when terminating the PPPoE connection on the PC. Are you using mpd to establish the connection on your PC?
What we need is a complete trace that covers everything over a longer timespan.
You should put a switch/hub between the firewall and the modem; this way the port on the firewall stays up and the capture keeps running. We also need at least 10 minutes of connection uptime. Only then reset the modem, and leave the capture running for another 10 - 15 minutes.
Only that way we can understand the whole context of the Control Session packets.
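For reference, the control packets that matter here are the PPPoE discovery frames (Ethertype 0x8863) and the LCP frames (PPP protocol 0xc021) carried inside PPPoE session frames (Ethertype 0x8864). A rough Python sketch of that selection, assuming each frame has already been parsed down to its Ethertype and PPP protocol number; the `is_control_frame` helper is hypothetical:

```python
PPPOE_DISCOVERY = 0x8863  # Ethertype of PADI/PADO/PADR/PADS/PADT frames
PPPOE_SESSION = 0x8864    # Ethertype of frames inside an established session
PPP_LCP = 0xC021          # PPP protocol number of LCP


def is_control_frame(ethertype, ppp_protocol=None):
    """True for PPPoE discovery frames and for LCP carried in a PPPoE session."""
    if ethertype == PPPOE_DISCOVERY:
        return True
    return ethertype == PPPOE_SESSION and ppp_protocol == PPP_LCP
```

Ordinary user traffic inside the session (for example IPv4, PPP protocol 0x0021) is rejected by this check, which keeps the dump small.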
If you want to remove your own internet traffic from the dump, you can open the capture in Wireshark and set the display filter to:
```
pppoed || (pppoes && ppp.protocol == 0xc021)
```
-
I believe I am experiencing the same issue*. If Ermal** is willing, I can provide him a live system for troubleshooting/testing with ssh and/or web access through a separate WAN. This is a production system, so there are some conditions:
1. I must be notified and present at all times when you are logged in.
2. The main WAN can only be down from 4-6 am Mountain time. If I have to reflash the system and restore my config, that's fine, but it will be happening by 6 am.
PM me if you want to take me up on it.
*http://redmine.pfsense.org/issues/1943
**If somebody besides Ermal wants to take this on then you'll need somebody that I trust to vouch for you. -
I guess it's clear from luky37's post that the PPPoE server doesn't allow more than one session, which I think is true in my region because the ISP doesn't allow the same account to reconnect from an alternate location either.
I'll try connecting the WAN through a switch and getting a trace. I can't guarantee it will run for 15 minutes, but at least it will capture the events around the link reset.
Would it be possible to make pfSense remember the last settings from connection negotiation and, once the link is up, send a connection termination packet and then restart the whole PPPoE negotiation?
-
Here is the trace with a switch between pfSense and the ISP modem, to keep the trace running in spite of the modem being reset:
http://www.mediafire.com/?9m8g65a5v975qc1
It seems true: the modem remembers the last active connection and so doesn't allow a new one. A way to overcome this, I noticed, is to unplug the cable and replug it. Unplugging signals link down to the modem, which then tells the PPPoE server to erase the previous connection info; once I plug the cable back in, pfSense renegotiates a connection successfully. (The unplug and replug isn't part of the trace because I can't keep the connection down too long, as it's used in production.)
-
Are you saying that the modem, in bridge mode, is remembering an active session and preventing a new one? If that's the case, why does rebooting pfsense re-establish the connection? There is a switch between my modems and pfsense, so the modem will never see the link as down.
-
and to answer previous questions:
-
I have left the firewall on for more than 60 minutes in the past but it never reconnects. The reason is that with no switch in between, once the port is in a half-dead state no packets pass between the firewall and the modem, and it never recovers from that state, which the trace without the switch also showed. Bringing the port down and then up, or unplugging and replugging the cable, gets it back into a working state in which packets can go from the firewall to the modem.
-
It could be that the ISP has changed something, but to my knowledge there hasn't been any firmware upgrade of the modem or on any side to date, and until almost 2 months ago I had tried the older pfSense with the old mpd and it worked fine.
-
On the PC I never used mpd; I just used the Windows built-in RAS dial-up connection.
Would it be possible to implement this:
Once the link goes down and pfSense goes into the endless reconnection loop, try about 5 times and, if it still fails, set vr1 on the firewall to down, then after a few seconds to up, and reattempt. Keep doing this until the ISP modem has come back alive after a reset or reboot, after which the vr1 down will at least once trigger a connection end on the PPPoE server, and the next attempt would get it reconnected.
OR
Fix the detection of the half-dead vr1 port state in pfSense.
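The first proposal above could be sketched roughly as follows. This is only an illustration of the retry-then-bounce idea, not something mpd or pfSense is known to do: `reconnect_plan`, its parameters and the action labels are hypothetical, and the `ifconfig ... down` / `ifconfig ... up` strings are merely listed, not executed.

```python
def reconnect_plan(outcomes, max_failures=5, interface="vr1", pause_seconds=3):
    """Walk a sequence of attempt outcomes (True = connected) and list the actions taken.

    After max_failures consecutive failed attempts the interface is bounced
    (down, pause, up) and the failure counter starts again.
    """
    actions = []
    failures = 0
    for connected in outcomes:
        actions.append("pppoe-attempt")
        if connected:
            actions.append("connected")
            break
        failures += 1
        if failures >= max_failures:
            # Simulate pulling and replugging the cable in software.
            actions += [
                f"ifconfig {interface} down",
                f"sleep {pause_seconds}",
                f"ifconfig {interface} up",
            ]
            failures = 0
    return actions


# Five failed attempts, one interface bounce, then a successful attempt.
plan = reconnect_plan([False] * 5 + [True])
```

Whether a software down/up actually makes the modem report link loss the way a physical unplug does is an open question in this thread, so this would need testing on the real hardware.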
-
-
Are you saying that the modem, in bridge mode, is remembering an active session and preventing a new one? If that's the case, why does rebooting pfsense re-establish the connection? There is a switch between my modems and pfsense, so the modem will never see the link as down.
The port on the ISP modem needs to be brought down by unplugging the cable, to make it send a link down to its remote PPPoE server. When it's brought up again with a replug, pfSense renegotiates a new connection successfully.
As for how it's able to reconnect on pfSense boot, I seriously don't understand how that happens. There is definitely some difference between the way mpd connects on boot and on link loss.
-
The key to connecting on boot versus on link loss lies in the state of the vr1 port, and if that is studied we could first pin down the issue.
-
So you have the very same issue with the RAS client under Windows, and the test with the old mpd/FW release was 2 months ago. That doesn't sound like an mpd/pfSense issue to me then…
You will not be aware of configuration changes on the Service Provider side.
What happens if you just disconnect the ethernet cable between pfSense and modem and reconnect it after some seconds, while the reconnect loop is occurring?
-
I'm still sure that if I configure an old pfSense box it will work just fine.
If I disconnect and reconnect the cable while a reconnect loop is occurring after a fresh reboot of pfSense, it connects fine. But if the system had been running for a while and the link was then brought down by a modem reset, unplugging and replugging the cable during that reconnect loop doesn't help, provided the cable runs direct and not through a switch.
-
If I'm not the only one with this issue, something has definitely changed in mpd 5 compared to mpd 4, and it can't be an ISP fault either, because 2 ISPs don't make the same mistake.
The mystery remains what's different between connecting on boot and reconnecting on link loss. Why does mpd connect perfectly fine when rebooting?
-
Have a look at this http://serverfault.com/questions/163811/pfsense-possible-to-traffic-capture-the-actual-wan-port
-
The guy on serverfault talks about a 'faulty ATM switch with MAC 00:90:1a:a0:a1:f4 (Unisphere Solutions)', but:
- an ATM switch has no MAC address, as it has nothing to do with Ethernet
- Unisphere Solutions produces (or produced) BRAS technology, which in this case terminates the PPPoE session
Both the serverfault guy's and xbipin's PPPoE sessions are terminated on a Unisphere BRAS, as we can clearly see from the traces. It may be that there is a common software defect on that BRAS.
I still don't believe in an mpd defect, as the same issue is seen on the Windows box.
-
I guess a way around this, if it's possible, is to bring the vr1 interface down and up again in software between reconnect attempts rather than pulling the plug, similar to how we disable and then enable an interface in Windows. This would replicate pulling the plug and at least be a workaround.
-
Can clarknova test the same using the older pfSense with the older mpd? That would prove whose fault it is: mpd, FreeBSD or the BRAS.
-
Out of desperation, I shut down my ALIX, reflashed pfSense 1.2.3 and quickly set up the WAN and LAN connections to see how it behaved. The first thing I noticed was that as soon as my ISP modem was reset, pfSense triggered a vr1 - DOWN, and when the modem came up again it triggered a vr1 - UP. This doesn't happen in the new pfSense 2.0; it considers the port to be alive for some reason. Bear in mind there is no switch between pfSense and the modem.
Once the modem came up and vr1 showed as UP, it didn't connect, but it didn't show any reconnection attempts either, so I guess in my hurry I forgot to set the dial-on-demand option. If I went to Interfaces and clicked Disconnect and then Connect, it connected perfectly fine, and if I went to the WAN page and clicked Save it also reconnected fine, so I guess it works fine compared to the newer mpd.
Cable unplug and replug also works fine, except that due to the dial-on-demand misconfiguration I had to go to Interfaces and click Disconnect, or if it had disconnected automatically, just click Connect, and the WAN would be up again.
I tried multiple scenarios and all I can say is that I didn't have to reboot pfSense to make it reconnect, which I usually have to do in pfSense 2 with the newer mpd.
So I guess that vr1 DOWN and vr1 UP during a modem reset or power cycle is key to this. I usually test with a reset only, because the chances of a power cycle are almost nil. In the worst case the modem would reset itself, or reboot after a firmware update, which is much like a reset but triggered automatically from the ISP end, and that's when pfSense needs to be able to reconnect once the modem is up.
I wasn't able to take traces or put a switch on the WAN side because I had to get the device up ASAP.
-
So you have the very same issue with the RAS client under Windows, and the test with the old mpd/FW release was 2 months ago. That doesn't sound like an mpd/pfSense issue to me then…
You will not be aware of configuration changes on the Service Provider side.
What happens if you just disconnect the ethernet cable between pfSense and modem and reconnect it after some seconds, while the reconnect loop is occurring?
It's true it happens with the RAS client, but if I unplug and replug, or simply disable and enable the interface, and reattempt, it connects just fine.
-
My pppoe0 interface is mlppp with 6 vlan members. After disconnecting then reconnecting the cable to the interface that is parent to these vlans, I disabled all 6 parent vlans in the GUI, hit Apply, then re-enabled all 6 parent vlans in the GUI and hit Apply. pppoe0 did not reconnect, and I had to reboot pfsense to get the WAN back.