PPPoE reconnection fix - mpd fix ($100)

wallabybob

correct me if im wrong, below is the command i run

tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture

Looks fine to me though -vvvX seems to apply just to decoding and displaying captured data.

here is another capture, it's the same, once the isp modem is reset the packet count never goes up in spite of mpd retrying to connect

Capture ran from 16:46:56.457199 to 16:47:16.647419 and shows pings and responses for the whole duration (about 20 seconds). Perhaps you ran the capture for much longer than 20 seconds and there was no traffic after the last entry. This doesn't seem consistent with mpd's claim to be making reconnection attempts.

xbipin

the capture was running for more than 2-3 mins. when i say no traffic after the last entry, i mean that once i reset the modem, the last captured packet is from before the reset. after that, in spite of mpd showing reconnect attempts in the logs, no packets (control or any other) appear in the trace. so maybe mpd just keeps saying it's reconnecting but no actual attempt is made, or maybe the capture doesn't pick up the pppoe control packets at all.

anyway, i'll run the trace again for about 5 mins now, but i'm sure the capture won't have any packets related to the reconnect attempts

xbipin

i ran the trace and it's the same. it takes about 20 secs for me to start the trace and then reset the modem, and after that there are no new packets in the capture for the next 10 mins either

xbipin

as an alternative approach, i took the wan cable and plugged it into my windows PC and started wireshark. i first created a pppoe connection and dialed it, and all was fine. then i reset the modem, and windows won't redial either until the LAN card is disabled and enabled again, after which the same connection redials. i hope this trace will help as it lists the pppoe control packets as well

http://www.mediafire.com/?sp6n1weermra3y8

xbipin

what i noticed from the trace was that after the modem reset the PC sends a PADI and the server replies with a PADO, then the PC sends a PADR, but the server for some reason doesn't send the PADS. the only way to make this work is to disable and re-enable the LAN card. so i guess one solution would be for mpd, between reconnection requests, to actually disable the port first, then enable it, and then dial out. no idea how, but the older mpd v4 used to and still does work like a charm, though for that i would have to revert back to pfsense 1.2.3
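
For anyone following along in the trace, the discovery exchange described above (PADI, PADO, PADR, PADS) can be picked out of raw frames with a few lines of Python. This is only a sketch: the EtherTypes and code values are the ones from RFC 2516, but `classify_frame` and `padr_unanswered` are made-up names for illustration, and it assumes plain untagged Ethernet frames (no VLAN header).

```python
import struct

ETHERTYPE_PPPOE_DISCOVERY = 0x8863
ETHERTYPE_PPPOE_SESSION = 0x8864

# PPPoE discovery codes as defined in RFC 2516.
DISCOVERY_CODES = {
    0x09: "PADI",
    0x07: "PADO",
    0x19: "PADR",
    0x65: "PADS",
    0xA7: "PADT",
}


def classify_frame(frame: bytes):
    """Return (kind, session_id) for a PPPoE frame, or None for anything else.

    Assumption: the frame starts with an untagged 14-byte Ethernet header.
    """
    if len(frame) < 20:  # 14-byte Ethernet header + 6-byte PPPoE header
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype not in (ETHERTYPE_PPPOE_DISCOVERY, ETHERTYPE_PPPOE_SESSION):
        return None
    _ver_type, code, session_id, _length = struct.unpack_from("!BBHH", frame, 14)
    if ethertype == ETHERTYPE_PPPOE_DISCOVERY:
        return DISCOVERY_CODES.get(code, "unknown"), session_id
    return "session", session_id


def padr_unanswered(kinds):
    """True when the most recent PADR in the sequence has no PADS after it."""
    waiting = False
    for kind in kinds:
        if kind == "PADR":
            waiting = True
        elif kind == "PADS":
            waiting = False
    return waiting
```

Feeding the kinds from a capture into `padr_unanswered` in order would flag the situation above, where the last PADR never gets its PADS.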

xbipin

any suggestions?

dhatz

Ermal is probably away, since according to this, both he and cmb will be doing a presentation at EuroBSDCon11 on 6-Oct-2011.

Have you contacted mpd's developers at http://sourceforge.net/projects/mpd/ about this?

xbipin

no i havent checked with the mpd developers, it would be like repeating a year long story from start :)

xbipin

no progress yet

luky37

Can you please re-upload the last 2 capture files, they have been removed from mediafire.

marcelloc

you can check the pptp connection via icmp and, if the ping fails x times, restart your pptp connection.
It could run from cron or be included in the patch that Ermal sent in this thread.

I have a similar situation with cisco vpn and this check solved my problem.
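
A minimal sketch of that kind of check, assuming Python is available on the box (it may not be on a stock install). `ping_once` and `watch_link` are hypothetical names, and the actual restart command is deliberately left as a callable you supply, since it depends on the setup.

```python
import subprocess
import time


def ping_once(host: str, timeout: float = 5.0) -> bool:
    """Send a single ICMP echo with the system ping; True if it was answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def watch_link(ping_ok, restart, max_failures=3, checks=10, interval=0.0):
    """Run `checks` probes; call restart() after max_failures failures in a row.

    Returns the number of times restart() was called.
    """
    failures = 0
    restarts = 0
    for _ in range(checks):
        if ping_ok():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        if interval:
            time.sleep(interval)
    return restarts
```

From cron you would run something like `watch_link(lambda: ping_once("192.0.2.1"), my_restart, checks=3)`, where the address and `my_restart` are placeholders for your own gateway and restart routine.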

xbipin

i would like to mention again, restarting etc doesn't solve it, because even if i manually stop and start it, it won't connect. it's something in the pppoe protocol, and for some reason the modem doesn't send back the PADS which it's supposed to


xbipin

here is the last trace link
http://www.mediafire.com/?v3o0wbz4e74cwqq

xbipin

heres another trace from the vr1 interface, changes i made were
set a custom reset period
change service name under WAN to WE1 (no idea if this makes a difference or no)

purpose of trace was to test if pppoe reconnects fine on a custom reset period set in pfsense and result was it did reconnect fine

command run to trace was
tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture

trace file
http://www.mediafire.com/?osabfdk0189hgai

xbipin

the problem with running the same command as above and taking a trace is that once the isp modem is switched off or reset, all activity stops on vr1 and no more packets are traced. even closing the trace and rerunning the command doesn't yield any more packets at all, which could mean a trace actually stops if the port is switched off or reset. even once the modem has come back online, rerunning the trace shows no packets whatsoever. this could be the actual issue, as the web gui keeps showing reconnection attempts but the trace doesn't yield any packets at all. it could mean that if the vr1 port is prematurely brought down then, even if it comes back up, mpd isn't able to reuse it at all, because the previous trace showed that a custom reset brings the interface down and up again successfully.

luky37

I am looking at the trace with 138.943 Bytes (the smaller one of the last 2 traces).

What I can see from that is that the PPPoE Server/BRAS is NOT aware that the session has been brought down. The modem has been reset at frame 528 i guess, where we can see a PADT. We have no LCP Termination request and Termination Ack, probably because the modem is booting at that time. Also that PADT probably never reached the PPPoE Server on the other side.

So while the Firewall/PC is already aware that the session is down and trying to reestablish the PPPoE connection, the BRAS is not, having a zombie pppoe session active. We can see that in frame 588 and 649, because the BRAS is still sending LCP echos for the old session.

So what? The BRAS or intermediate access-switches probably have some DDoS countermeasures configured, which allow only 1 session per MAC (or modem or port or whatever), and because the old session is still up, flood protection kicked in and the PADR is ignored.

It should be only a matter of time until the BRAS considers the zombie session to be stale, discard that session and let you reestablish the connection.
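
To check a capture for this pattern, something along these lines could work. It is only a rough sketch under a few assumptions: frames are raw, untagged Ethernet frames in capture order, `zombie_sessions` is a made-up name, and the constants are the standard ones (PADT code 0xa7 from RFC 2516, LCP protocol 0xc021 and Echo-Request/Echo-Reply codes 9/10 from RFC 1661).

```python
import struct

ETHERTYPE_PPPOE_DISCOVERY = 0x8863
ETHERTYPE_PPPOE_SESSION = 0x8864
CODE_PADT = 0xA7
PPP_PROTO_LCP = 0xC021
LCP_ECHO_CODES = (9, 10)  # Echo-Request, Echo-Reply


def zombie_sessions(frames):
    """Return session IDs that still carry LCP echoes after a PADT was seen.

    `frames` is an iterable of raw Ethernet frames (bytes) in capture order.
    """
    terminated = set()
    zombies = set()
    for frame in frames:
        if len(frame) < 20:  # Ethernet header + PPPoE header
            continue
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype not in (ETHERTYPE_PPPOE_DISCOVERY, ETHERTYPE_PPPOE_SESSION):
            continue
        _ver_type, code, session_id, _length = struct.unpack_from("!BBHH", frame, 14)
        if ethertype == ETHERTYPE_PPPOE_DISCOVERY:
            if code == CODE_PADT:
                terminated.add(session_id)
            continue
        # Session frame: 2-byte PPP protocol, then the LCP code byte.
        if len(frame) < 23:
            continue
        (ppp_proto,) = struct.unpack_from("!H", frame, 20)
        if (
            ppp_proto == PPP_PROTO_LCP
            and frame[22] in LCP_ECHO_CODES
            and session_id in terminated
        ):
            zombies.add(session_id)
    return zombies
```

A non-empty result means one side has already sent a PADT for a session while echoes for that same session ID keep showing up, which is the zombie-session situation described above.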

Did you try to let the Firewall run for 10 - 15 minutes and see if it changed anything?
Are you sure that you fixed this by downgrading your FW, or may it be that your provider changed something on their network in the same period you upgraded your FW?
You told us that you have the same problem when terminating the PPPoE connection on the PC. Are you using mpd to establish the connection on your PC?

What we need is a complete trace that includes everything in a larger timespan.

You should put a switch/hub between the firewall and the modem; this way the port on the firewall stays up and the capture keeps running. Also we need at least 10 minutes of connection uptime; only then reset the modem, and leave the capture running for another 10 - 15 minutes.

Only that way we can understand the whole context of the Control Session packets.

If you want to remove your own internet traffic from the dump, you can open the capture in wireshark and set the display filter to:
```
pppoed || (pppoes && ppp.protocol == 0xc021)
```

clarknova

I believe I am experiencing the same issue*. If Ermal** is willing, I can provide him a live system for troubleshooting/testing with ssh and/or web access through a separate WAN. This is a production system, so there are some conditions:

1. I must be notified and present at all times when you are logged in.
2. The main WAN can only be down from 4-6 am Mountain time. If I have to reflash the system and restore my config, that's fine, but it will be happening by 6 am.

PM me if you want to take me up on it.

*http://redmine.pfsense.org/issues/1943
**If somebody besides Ermal wants to take this on then you'll need somebody that I trust to vouch for you.

xbipin

i guess its clear from luky37 's post that the pppoe server doesnt allow more than one session which i think is true in my region coz the isp doesnt allow the same account to reconnect from an alternate location as well.

i'll try connecting the wan through a switch and getting a trace. can't guarantee it will be for 15 mins though, but at least it will capture the events in the link reset situation.

would it be possible to make pfsense remember the last settings during connection negotiation and once the link is up, send a connection termination packet then restart the whole pppoe protocol?

xbipin

here is the trace with a switch between pf and the isp modem to keep the trace running in spite of modem being reset
http://www.mediafire.com/?9m8g65a5v975qc1

it seems true, the modem remembers the last active connection and so doesn't allow a new one. a way to overcome this, from what i noticed, is to unplug the cable and then replug it. this tells the modem the link is down, and it in turn tells the pppoe server to erase the previous connection info. once i plug the cable back in, pfsense renegotiates a successful connection (this unplug and replug isn't part of the trace because i can't keep the connection down too long as it's used in production)

clarknova

Are you saying that the modem, in bridge mode, is remembering an active session and preventing a new one? If that's the case, why does rebooting pfsense re-establish the connection? There is a switch between my modems and pfsense, so the modem will never see the link as down.