PPPoE reconnection fix - mpd fix ($100)
-
Sorry, I haven't had a lot of time to look into this.
I have looked at your packet captures. Unfortunately they don't show the PPP control packets. Can you provide another set with the capture interface being the physical device used for the PPP link rather than the PPP link? For example, if pppoe0 uses physical interface vr1 I would like the capture on interface vr1 rather than pppoe0.
capture1 initially shows pings and replies, then just pings (no replies). No PPP control packets.
capture2 just looks like "normal" traffic: no PPP control packets shown and no obvious sign of a "down" link. (I expected to see repeated PPP initialisation attempts.) Capture2 timestamps go from 18:57:31.x to 18:57:42.x, about 11 seconds. Your previously posted log of a failed reinitialisation covers a time span of nearly two minutes. Are you sure you provided a capture taken when mpd was reporting repeated initialisation attempts?
Your previous posts have repeatedly mentioned a modem reset. What is a modem reset and (since it appears to cause so much grief) why do you do it?
-
I just ran the command given to me for both pppoe0 and vr1. When I said mpd goes into a reconnect loop, that is what I see in the web GUI. I also fail to understand why no PPPoE control packets are seen in the trace for vr1; I hope I'm running the right command.
The issue is not just the modem reset. If the modem goes down, drops an active connection, reboots after a firmware update, or even if I set a periodic reset for PPPoE in pfSense, in all these situations it never reconnects until I reboot pfSense.
I'll run a trace for vr1 again and send it.
-
Here is another capture, and it's the same: once the ISP modem is reset, the packet count never goes up, in spite of mpd retrying to connect. Correct me if I'm wrong; below is the command I run:
tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture
The capture can be downloaded from
http://www.mediafire.com/?41s858ze4bri7c9
-
Correct me if I'm wrong; below is the command I run:
tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture
Looks fine to me, though -vvvX seems to apply just to decoding and displaying captured data.
Here is another capture, and it's the same: once the ISP modem is reset, the packet count never goes up, in spite of mpd retrying to connect.
Capture ran from 16:46:56.457199 to 16:47:16.647419 and shows pings and responses for the whole duration (about 20 seconds). Perhaps you ran the capture for much longer than 20 seconds and there was no traffic after the last entry. This doesn't seem consistent with mpd's claim to be making reconnection attempts.
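To double-check what a capture file really contains, something like the following might help. It is only a minimal sketch, assuming a classic pcap file with microsecond timestamps and Ethernet framing (which, as far as I know, is what tcpdump -w writes by default) and untagged frames (no VLAN header). The name summarize_pcap is made up for illustration.

```python
import struct
from collections import Counter

PPPOE_DISCOVERY = 0x8863  # EtherType for PADI/PADO/PADR/PADS/PADT
PPPOE_SESSION = 0x8864    # EtherType for LCP, authentication and data


def summarize_pcap(data: bytes):
    """Return (first_ts, last_ts, Counter of EtherTypes) for a classic pcap file."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond pcap file")

    # The link type is the last field of the 24-byte global header.
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("expected Ethernet link type (1)")

    offset = 24
    first = last = None
    ethertypes = Counter()
    while offset + 16 <= len(data):
        # Per-record header: seconds, microseconds, captured length, original length.
        sec, usec, caplen, _origlen = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        frame = data[offset + 16:offset + 16 + caplen]
        offset += 16 + caplen

        ts = sec + usec / 1_000_000
        if first is None:
            first = ts
        last = ts
        if len(frame) >= 14:
            # EtherType is always big-endian on the wire.
            ethertypes[struct.unpack("!H", frame[12:14])[0]] += 1
    return first, last, ethertypes
```

Feeding it the bytes of /tmp/pcap.capture and comparing last minus first with the time the capture was left running, plus the counts for 0x8863 and 0x8864, would show whether any PPPoE frames were recorded after the reset.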
-
The capture was running for more than 2-3 minutes. When I say no traffic after the last entry, I mean that the last captured packet was from before I reset the modem; after that, in spite of mpd showing reconnect attempts in the logs, no packets (control or otherwise) appear in the trace. So maybe mpd just keeps saying it is reconnecting but no actual attempt is made, or maybe the capture doesn't include the PPPoE control packets at all.
Anyway, I'll run the trace again for about 5 minutes now, but I'm sure the capture won't have any packets related to reconnect attempts.
-
I ran the trace and it's the same. It takes about 20 seconds for me to start the trace and then reset the modem, and after that there are no new packets in the capture for the next 10 minutes either.
-
As an alternative approach, I took the WAN cable and plugged it into my Windows PC, started Wireshark, then created a PPPoE connection and dialed it, and all was fine. Then I reset the modem, and Windows too won't redial at all until the LAN card is disabled and enabled again; after that the same connection redialed. I hope this trace will help, as it lists the PPPoE control packets as well:
http://www.mediafire.com/?sp6n1weermra3y8
-
What I noticed from the trace is that after the modem reset the PC sends a PADI and the server replies with a PADO; the PC then sends a PADR, but for some reason the server doesn't send the PADS. The only way to make this work is to disable and re-enable the LAN card. So I guess one solution would be for mpd, between reconnection requests, to actually disable the port first, then enable it, and then dial out. I have no idea how, but the older mpd v4 used to work, and still works, like a charm; to use it, though, I would have to revert to pfSense 1.2.3.
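For anyone who wants to check the same thing in their own capture, here is a rough sketch of how the discovery handshake could be followed programmatically. The EtherType (0x8863) and the code values come from the PPPoE specification (RFC 2516); the function names and the idea of feeding in raw Ethernet frames as bytes are my own assumptions for illustration.

```python
import struct

ETH_PPPOE_DISCOVERY = 0x8863
DISCOVERY_CODES = {
    0x09: "PADI",  # client broadcast: looking for a server
    0x07: "PADO",  # server offer
    0x19: "PADR",  # client request
    0x65: "PADS",  # server confirms and assigns a session ID
    0xA7: "PADT",  # either side terminates the session
}
HANDSHAKE = ("PADI", "PADO", "PADR", "PADS")


def discovery_code(frame: bytes):
    """Return the PPPoE discovery packet name, or None for any other frame."""
    if len(frame) < 20:  # 14-byte Ethernet header + 6-byte PPPoE header
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_PPPOE_DISCOVERY:
        return None
    return DISCOVERY_CODES.get(frame[15])  # byte 14 is version/type, byte 15 the code


def first_missing_stage(frames):
    """Return the handshake stage the most recent attempt never reached, or None."""
    stage = 0
    for frame in frames:
        name = discovery_code(frame)
        if name == "PADI":
            stage = 1  # every PADI starts a new attempt
        elif stage < len(HANDSHAKE) and name == HANDSHAKE[stage]:
            stage += 1
    return HANDSHAKE[stage] if stage < len(HANDSHAKE) else None
```

For the situation described above (PADI, PADO and PADR seen, but no reply to the PADR) this would report PADS as the missing stage.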
-
Any suggestions?
-
Ermal is probably away, since according to this, both he and cmb will be doing a presentation at EuroBSDCon11 on 6-Oct-2011.
Have you contacted mpd's developers at http://sourceforge.net/projects/mpd/ about this?
-
No, I haven't checked with the mpd developers; it would be like repeating a year-long story from the start :)
-
no progress yet
-
Can you please re-upload the last 2 capture files? They have been removed from mediafire.
-
You can check the pptp connection via ICMP and, if ping fails x times, restart your pptp connection.
It could be run from cron or included in the patch that Ermal sent in this thread. I have a similar situation with Cisco VPN and this check solved my problem.
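Roughly what I mean, as a hedged sketch rather than the script I actually use: probe the link, count consecutive failures, and call a restart hook once the count reaches a threshold. The names ping_once and check_link are made up, and the restart action is left as a callback because the right command depends on your setup.

```python
import subprocess
from typing import Callable


def ping_once(host: str) -> bool:
    """Send one ICMP echo using the system ping; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_link(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int,
    failures: int = 0,
) -> int:
    """Run one probe and return the updated count of consecutive failures.

    restart() is called once the count reaches max_failures, after which
    the count starts again from zero.
    """
    if probe():
        return 0
    failures += 1
    if failures >= max_failures:
        restart()  # hypothetical hook: whatever restarts the connection
        return 0
    return failures
```

From cron you would have to keep the failure count between runs (for example in a small file); in a long-running loop you would just pass the returned value back in on the next round.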
-
I would like to mention again that restarting doesn't solve it, because even if I manually stop and start it, it won't connect. It's something in the PPPoE protocol: for some reason the modem doesn't send back the PADS which it's supposed to.
-
Here is the last trace link:
http://www.mediafire.com/?v3o0wbz4e74cwqq
-
Here's another trace from the vr1 interface. The changes I made were:
set a custom reset period
change the service name under WAN to WE1 (no idea if this makes a difference or not)
The purpose of the trace was to test whether PPPoE reconnects fine on a custom reset period set in pfSense, and the result was that it did reconnect fine.
The command run to trace was
tcpdump -i vr1 -s 0 -vvvX -w /tmp/pcap.capture
Trace file:
http://www.mediafire.com/?osabfdk0189hgai
-
The problem with running the same command and taking a trace is that once the ISP modem is switched off or reset, all activity stops on vr1 and no more packets are traced. Even closing the trace and rerunning the command doesn't yield any more packets, which could mean that a trace actually stops if the port is switched off or reset. Even once the modem has come back online, rerunning the trace shows no packets whatsoever.
So this could be the actual issue: the web GUI keeps showing reconnection attempts but the trace doesn't yield any packets at all. It could mean that if the vr1 port is prematurely brought down, then even when it comes back up, mpd isn't able to reuse it, because the previous trace showed that a custom reset brings the interface down and up again successfully.
-
I am looking at the trace with 138.943 Bytes (the smaller one of the last 2 traces).
What I can see from that is that the PPPoE server/BRAS is NOT aware that the session has been brought down. The modem has been reset at frame 528, I guess, where we can see a PADT. We have no LCP Terminate-Request and Terminate-Ack, probably because the modem is booting at that time. Also, that PADT probably never reached the PPPoE server on the other side.
So while the firewall/PC is already aware that the session is down and is trying to reestablish the PPPoE connection, the BRAS is not, and it keeps a zombie PPPoE session active. We can see that in frames 588 and 649, because the BRAS is still sending LCP echoes for the old session.
So what? The BRAS or intermediate access switches probably have some DDoS countermeasures configured which allow only 1 session per MAC (or modem or port or whatever), and because the session is still up, flood protection kicked in and the PADR is ignored.
It should be only a matter of time until the BRAS considers the zombie session to be stale, discards that session and lets you reestablish the connection.
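If you want to look for this pattern in a capture without clicking through frames, the check can be sketched like this. It is only an illustration under assumptions: frames are raw untagged Ethernet frames as bytes, the function name zombie_sessions is invented, and direction (who sent the echo) is not examined. The constants are the standard ones: EtherTypes 0x8863/0x8864, PADT code 0xA7, PPP protocol 0xC021 for LCP, and LCP code 9 for Echo-Request.

```python
import struct

ETH_DISCOVERY = 0x8863
ETH_SESSION = 0x8864
PADT = 0xA7
PPP_LCP = 0xC021
LCP_ECHO_REQUEST = 9


def zombie_sessions(frames):
    """Return session IDs that still see LCP Echo-Requests after a PADT."""
    terminated = set()
    zombies = set()
    for frame in frames:
        if len(frame) < 20:  # Ethernet header (14) + PPPoE header (6)
            continue
        # EtherType, PPPoE version/type, PPPoE code, session ID.
        ethertype, _ver_type, code, session_id = struct.unpack("!HBBH", frame[12:18])
        if ethertype == ETH_DISCOVERY and code == PADT:
            terminated.add(session_id)
        elif ethertype == ETH_SESSION and len(frame) >= 23:
            # The PPP protocol field follows the PPPoE header, then the LCP code.
            (protocol,) = struct.unpack("!H", frame[20:22])
            if (
                protocol == PPP_LCP
                and frame[22] == LCP_ECHO_REQUEST
                and session_id in terminated
            ):
                zombies.add(session_id)
    return zombies
```

A non-empty result would match what frames 588 and 649 suggest: one side has torn the session down while the other still treats it as alive.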
Did you try to let the Firewall run for 10 - 15 minutes and see if it changed anything?
Are you sure that downgrading your FW fixes this, or could it be that your provider changed something on their network in the same period you upgraded your FW?
You told us that you have the same problem when terminating the PPPoE connection on the PC. Are you using mpd to establish the connection on your PC?
What we need is a complete trace that includes everything over a larger timespan.
You should put a switch/hub between the firewall and the modem; this way the port on the firewall stays up and the capture keeps running. Also, we need at least 10 minutes of connection uptime; only then reset the modem, and leave the capture running for another 10 - 15 minutes.
Only that way we can understand the whole context of the Control Session packets.
If you want to remove your own internet traffic from the dump, you can open the capture in wireshark and set the display filter to:
```
pppoed || (pppoes && ppp.protocol == 0xc021)
```
-
I believe I am experiencing the same issue*. If Ermal** is willing, I can provide him a live system for troubleshooting/testing with ssh and/or web access through a separate WAN. This is a production system, so there are some conditions:
1. I must be notified and present at all times when you are logged in.
2. The main WAN can only be down from 4-6 am Mountain time. If I have to reflash the system and restore my config, that's fine, but it will be happening by 6 am.
PM me if you want to take me up on it.
*http://redmine.pfsense.org/issues/1943
**If somebody besides Ermal wants to take this on then you'll need somebody that I trust to vouch for you.