Weird setup issue
-
I've just set up a satellite office at work and bought a Netgate M1n1wall pre-loaded with pfSense to ship there. I've been using pfSense both on Netgates as well as virtualized for a few years now but this is the first time that I've seen this.
I configured the unit before shipping it out. I gave it a static IP from my fiber provider, configured my rules/NAT/etc, and built my IPsec tunnel. Everything worked just fine. Just before I shipped it out, I changed the static IP and gateway to those given to me by the ISP for the remote site. The remote site has Ethernet over Copper supplied by Cox Communications in California. Now that the unit has arrived, one of the staff members down there connected the Netgate to the EoC equipment and the LAN. They were able to access the WebGUI and confirm that the WAN port is up but are unable to ping the ISP gateway address via the WAN from the WebGUI.
I walked the user through disconnecting the Netgate, reconfiguring a local desktop with the same static information supplied by the ISP, and testing. The desktop connected just fine using all of the same settings. Putting the Netgate back in place shows the same thing as before. The WAN port shows as "up" and physically connected to the EoC equipment, but I have no Internet access. I've walked the user through confirming the IP, gateway, and CIDR mask over and over and everything looks just fine.
I don't understand what I'm missing. Can anyone offer any suggestions as to what is different now that this unit is connected at my remote site as compared to when it was here at my site?
-
Is the gateway at the remote site able to respond to pings? Some are not and in that case pfSense will see the connection as down shortly after it's connected. You would see that reported by apinger in the logs though. The solution to that is to choose a different monitor IP.
Is the remote EoC equipment locked to a MAC address, cable modem style? Perhaps a simple power cycle of that could help. Maybe the ISP monitors IPs against MACs and won't talk to one that changes without some sort of authorisation. That doesn't explain why the desktop machine was able to connect without a problem though. What hardware is the m1n1wall replacing?
Steve
-
Is the gateway at the remote site able to respond to pings?
Yes, I'm able to ping the gateway for the remote site from my office. I was also able to ping that gateway once I had the user put his Win7 PC in place of pfSense.
Is the remote EoC equipment locked to a MAC address, cable modem style? Perhaps a simple power cycle of that could help. Maybe the ISP monitors IPs against MACs and won't talk to one that changes without some sort of authorisation. That doesn't explain why the desktop machine was able to connect without a problem though. What hardware is the m1n1wall replacing?
It's my understanding that the EoC setup isn't locked to a MAC. When pfSense didn't connect I then swapped in the PC before putting the Netgate back in place. I would expect the Netgate to have been the first MAC that the EoC equipment saw, unless there is some reason that it can't see pfSense, and then the desktop wouldn't have worked.
This is a new installation so I don't have any older hardware to fall back on. My EoC has only been in a couple of days and my gear just arrived at the new location yesterday.
-
Is the gateway outside the WAN subnet?
That is something that has caught out a few people. It's a configuration that should never exist because it breaks the rules. Since it's outside the IP specification FreeBSD doesn't support it, but Windows has some sort of kludge that allows it to work. It's beyond my memory, but since it's not the first time this has come up there may be a workaround if that's the case.
Steve
E.g.: http://forum.pfsense.org/index.php?topic=37301.0
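If you want to rule this out quickly, the check can be sketched in a few lines of Python. This is only an illustration: the addresses come from the 192.0.2.0/24 documentation range rather than anyone's real config, and `gateway_in_subnet` is a made-up helper name.

```python
import ipaddress


def gateway_in_subnet(wan_ip_with_prefix: str, gateway: str) -> bool:
    """Return True if the gateway is a usable host address inside the WAN subnet."""
    iface = ipaddress.ip_interface(wan_ip_with_prefix)
    gw = ipaddress.ip_address(gateway)
    net = iface.network
    # The network and broadcast addresses are inside the block but not usable as a gateway.
    if gw in (net.network_address, net.broadcast_address):
        return False
    return gw in net


# Hypothetical WAN address in a /29, with a gateway inside and one outside the block.
print(gateway_in_subnet("192.0.2.170/29", "192.0.2.169"))
print(gateway_in_subnet("192.0.2.170/29", "192.0.2.161"))
```

The first call checks a gateway inside the same /29 as the WAN address, the second one a gateway that sits in a neighbouring block.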
-
Nope, the gateway is part of our /29 block.
IP Block: XXX.XXX.XXX.168/29
Gateway: XXX.XXX.XXX.169
Subnet: 255.255.255.248
First Usable: XXX.XXX.XXX.170
Last Usable: XXX.XXX.XXX.174
-
Hmm, Do you have access to the logs? Anything in them? :-\
Steve
-
I'm waiting for the onsite guy to turn up for work so I can have another look. With the time difference between here and California, I won't have access to my remote hands for another hour. :)
It's frustrating. Having set it all up and tested it before shipping, I anticipated an easy-peasy setup without needing the onsite guy to do anything other than plug it together for me. I've done this numerous times without any issue. I can't for the life of me figure out what's different this time.
-
I can't say as I can see anything out of the ordinary in the logs. Below are the log entries from the time my local guy got onsite this morning. I had him restart the ISP equipment first thing but that didn't get us anywhere.
Sep 4 08:38:13 check_reload_status: Linkup starting vr1
Sep 4 08:38:13 kernel: vr1: link state changed to DOWN
Sep 4 08:38:16 php: :Hotplug event detected for wan but ignoring since interface is configured with static IP
Sep 4 08:41:52 check_reload_status: Linkup starting vr1
Sep 4 08:41:52 kernel: vr1: link state changed to UP
Sep 4 08:41:55 php: :Hotplug event detected for wan but ignoring since interface is configured with static IP
Sep 4 08:41:56 check_reload_status: rc.newwanip starting vr1
Sep 4 08:42:00 php: :rc.newwanip: Informational is starting vr1.
Sep 4 08:42:00 php: :rc.newwanip: on (IP address: {MyStaticIPHere}) (interface: wan) (real interface: vr1)
Sep 4 08:42:00 php: ROUTING: setting default route to MyISPGatewayIP
Sep 4 08:42:00 apinger: Exiting on signal 15
Sep 4 08:42:01 apinger: Starting Alarm Pinger, apinger(48687)
Sep 4 08:42:01 check_reload_status: Reloading filter
Sep 4 08:42:11 apinger: ALARM: COXGW (MyISPGatewayIP) down
Sep 4 08:42:21 check_reload_status: Reloading filter
Sep 4 08:49:18 dnsmasq[23098]: reading /etc/resolv.conf
Sep 4 08:49:18 dnsmasq[23098]: using nameserver MyDNS#53
Sep 4 08:49:18 dnsmasq[23098]: using nameserver MyISPDNS1
Sep 4 08:49:18 dnsmasq[23098]: using nameserver MyISPDNS1
Sep 4 09:01:18 php: /index.php: Successful webConfigurator login for user 'ANTech' from 10.2.100.2
-
Apinger is showing the gateway as down. Even if you can ping it remotely I would try changing the monitor IP to, say, 8.8.8.8. Can your man ping the gateway from the Win7 box?
Steve
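For a longer log it can help to pull out just the apinger alarms. A rough Python sketch follows; the regular expression is an assumption based solely on the single ALARM line quoted above, and `find_gateway_alarms` is a hypothetical helper, not part of pfSense.

```python
import re

# Pattern assumed from the one alarm line in the posted log:
#   "apinger: ALARM: COXGW (MyISPGatewayIP) down"
ALARM_RE = re.compile(
    r"apinger: ALARM: (?P<name>\S+) \((?P<target>[^)]+)\) (?P<state>.+)$"
)


def find_gateway_alarms(log_lines):
    """Return (gateway name, monitor target, state) for every apinger alarm line."""
    alarms = []
    for line in log_lines:
        match = ALARM_RE.search(line)
        if match:
            alarms.append(
                (match.group("name"), match.group("target"), match.group("state").strip())
            )
    return alarms


sample = [
    "Sep 4 08:42:01 apinger: Starting Alarm Pinger, apinger(48687)",
    "Sep 4 08:42:11 apinger: ALARM: COXGW (MyISPGatewayIP) down",
    "Sep 4 08:42:21 check_reload_status: Reloading filter",
]
print(find_gateway_alarms(sample))
```

Other apinger message formats may well differ, so treat the pattern as a starting point only.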
-
We were unable to ping either the ISP gateway or Google's DNS from the webConfigurator.
When the Win7 box was put in place of pfSense, both of the above were pingable.
-
I assume you are using 2.0.3 32bit? Some people have had some odd IPv6 routing issues recently with 2.1RC.
Other than that I'm out of suggestions. :( Other than contacting Netgate, who may have some insight specific to your ISP. I'm the wrong side of the pond for that. ;)
Steve
-
Perhaps this is something more fundamental. The Netgate box has been proven when you configured it initially (assuming it wasn't damaged in transit). The EoC box has been proven by connecting the Win7 box.
The interface reports being UP but is it really? So far you have seen no traffic at all from vr1, yes?
This is the sort of thing that can be caused by some rare hardware mismatch. Is the Netgate box connecting at 100Mbps full duplex? Can you try putting a switch between the Netgate box and EoC equipment?
Steve
-
I've asked my onsite guy to pull everything out of the site switch, connect it in between the Netgate and Cox and then configure a static in his PC and plug it into the LAN port of the Netgate to check on that.
I've also purchased support and opened a ticket with the pfSense gang. I suspect one way or another this will be resolved shortly. Support is so rarely required with pfSense that I generally only purchase it as a last resort. Sounds like we're there. ;)
-
You could have your man take a laptop, broadcast wifi from his phone, and connect the laptop's Ethernet port to pfSense and its wifi to the phone.
You could then use his laptop to see what's up with pfSense via TeamViewer yourself.
I suspect you are dealing with a fat fingered typo in settings or something very simple like that.
(I did this for two of the forum members recently - Fat finger their settings I mean… :P )
-
Unfortunately we don't have any laptops onsite and my local guy hasn't been in the U.S. long enough to have re-purchased the basic amenities for himself. :)
He's actually surprisingly good. There's no way I could have achieved this with any of our other warehouse managers. If I was going to have an issue like this, I'm glad that it happened with this site. I expected a typo as well but he helped rule that out very quickly (several times…just to be sure). The only things that were changed from when it was in a working state were the static IP and the gateway address really so it was a short list of things to confirm.
I sent a copy of my config as well as some basic command line results to pfSense support yesterday and they confirmed that all was well. So far, they appear to be as stumped as I've been but are narrowing the options down.
We pulled everything apart yesterday and put a switch in between pfSense and the ISP equipment but that didn't do anything either.
-
I've set one of these up on cable before. What version of pfSense are you running and is it 64 or 32 bit?
-
I've used pfSense quite successfully on Cable, DSL, and Fiber in the past. While Ethernet over Copper isn't hugely different, it's definitely not a cable line. The ISP's hardware does still present me with a modem-like device that sits behind the EoC bonding device.
I'm currently running pfSense 2.0.3. While I didn't think to look before I shipped it out, the ALIX board has an AMD Geode LX800 which I believe should mean that it's likely running the amd64 NanoBSD build but I don't know that for certain. I'll have the local guy confirm that when he gets on site today. I have a number of ALIX boards identical to this one running on both DSL and Cable installs but this is my first on EoC. Given how easily a PC connects with a static IP, I can't imagine the differences are significant…and yet here we are. :) If I hadn't already tested for it so many times, I would feel inclined to say that it's a bad cable based upon the behaviour that I'm seeing.
-
Hmmmm - Maybe try backing up your install configuration and then reinstalling the same box with the 32 bit version of 2.1?
See if results are better. Once, when 2.0.3 didn't work for me with lots of IPs, 2.1 did.
-
That's certainly an option but I'll probably leave it as a last resort. Given that I have several implementations of 2.0.3 in the field that have been trouble free, and given that this one is such a basic setup (and so far away), I'm hoping to avoid deploying an RC in production unless I have to.
-
I agree it's very strange - I can only imagine a few reasons for this to happen.
There was a router of some sort previously connected and you should clone its MAC to get an IP. (This is some BS I encounter occasionally)
There is something different about this setup than your others.
I've actually not had much luck with 2.0.3 and more than 4 IPs. (Others maybe have. My experience with it is limited)
Lastly, maybe it's not a router problem at all. Maybe the ISP made an error either in what it allocated you or the info they provided you?
-
I don't think the Geode supports 64bit. The amd64 architecture is so named because AMD introduced a 64 bit CPU before Intel did.
The switch between the EoC and Netgate box didn't help but did it change anything?
What does 'ifconfig vr1' report? If you don't have console access you can run that from Diagnostics: Command Prompt.
I bet the support guys will solve this; it will be interesting to know what we overlooked here.
@kejianshi I thought of most of those things but all of that is ruled out by the fact that the Win7 machine connects with no problems using the same details. That's what made me think subnet.
Steve
-
I'll read back. Did the Win7 machine grab an IP with DHCP or was that connection manually configured?
-
The setup is a new installation so I'm pretty sure we're not dealing with a MAC issue. Especially given that we've repeatedly connected a Win7 box directly to the EoC and been able to test successfully.
I've had good luck with 2.0.3 in several different sizes of installation, from single IPs up to 10 or 12 IPs with several LANs and QoS. It's just always treated me well.
Since I ordered the NetGate preloaded with pfSense, I feel comfortable that it's loaded correctly. Although I haven't actually checked what architecture this one is loaded with, this is the 5th Netgate that I've ordered in the last year or so and all have been rock solid.
I was suspicious of the ISP having made a mistake as well but using the same settings in that Win7 box proved to work fine so I have to assume that my connection is capable of working. In addition the onsite tech that the ISP sent out had no problems with it during setup.
I'm still waiting for my onsite guy to arrive at work this morning. I'll run some of the new tests that pfSense Support has requested when he gets in.
Here are the ifconfig results that I sent to pfSense Support yesterday. They confirmed that they were happy with what they saw there. I have supplied them with the information that the ISP supplied to me as well to ensure that they're looking at the same data that I am.
$ ifconfig -a
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8280b<RXCSUM,TXCSUM,VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
        ether 00:0d:b9:2f:8d:a8
        inet 10.2.0.1 netmask 0xffff0000 broadcast 10.2.255.255
        inet6 fe80::20d:b9ff:fe2f:8da8%vr0 prefixlen 64 scopeid 0x1
        nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
vr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8280b<RXCSUM,TXCSUM,VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
        ether {myMAC}
        inet {ISPSuppliedIP} netmask 0xfffffff8 broadcast {BROADCASTIP}
        inet6 {ISPSuppliedInfo} prefixlen 64 scopeid 0x2
        nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
vr2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8280b<RXCSUM,TXCSUM,VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
        ether 00:0d:b9:2f:8d:aa
        media: Ethernet autoselect (none)
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
pfsync0: flags=0<> metric 0 mtu 1460
        syncpeer: 224.0.0.240 maxupd: 128 syncok: 1
enc0: flags=41<UP,RUNNING> metric 0 mtu 1536
pflog0: flags=100<PROMISC> metric 0 mtu 33200
-
I'll read back. Did the Win7 machine grab an IP with DHCP or was that connection manually configured?
The connection requires static IPs so the Win7 machine was manually configured with the ISP supplied info as it was entered into pfSense.
-
I'd wipe the box, reinstall and reload configuration and try again - assuming this hasn't also been tried.
But yeah - Seems like everyone knows exactly what they are doing.
-
Ethernet autoselect (100baseTX <full-duplex>)
It's not Gigabit, huh?
It's probably nothing, but I'm sure the laptop was auto-negotiating a gigabit connection with the modem.
Have you already tried inserting a VLAN switch in between the modem and the pfSense box, using 2 untagged VLAN ports?
1 plugged into the modem and 1 into the pfSense box?
It's unlikely, but I've seen in the past where some gigabit equipment didn't play well with some 100base equipment (I've seen it here only once).
Anyway - I'm sure the super expert paid support guys will fix it. They are pretty good.
-
Nope, the Netgate m1n1wall is an ALIX board with vr(4) 10/100 NICs. Otherwise, auto-negotiating to 1000Mbps with the cable not being up to it would be high on my list of suspects.
Definitely check that it's negotiating the speed/duplex correctly with the EoC box, ifconfig or Status: Interfaces: will show that.
I have seen some odd connection issues where the negotiation fails or the interface refuses to come up unless the remote end is already up. That can be a problem with two boxes directly connected, which is why I suggested the switch.
Is the Win7 box using a Gigabit NIC?
Steve
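If you end up checking this on several boxes, the negotiated media can be picked out of the ifconfig output with a short script. A Python sketch, assuming the `media:` line looks like the ones posted earlier in this thread; `parse_media` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   media: Ethernet autoselect (100baseTX <full-duplex>)
#   media: Ethernet autoselect (none)
MEDIA_RE = re.compile(
    r"media:\s+Ethernet\s+(?P<mode>\S+)"
    r"(?:\s+\((?P<speed>\S+)(?:\s+<(?P<duplex>[^>]+)>)?\))?"
)


def parse_media(line):
    """Return the configured mode, negotiated speed and duplex from a media line."""
    match = MEDIA_RE.search(line)
    if match is None:
        return None
    return {
        "mode": match.group("mode"),
        "speed": match.group("speed"),
        "duplex": match.group("duplex"),
    }


print(parse_media("media: Ethernet autoselect (100baseTX <full-duplex>)"))
print(parse_media("media: Ethernet autoselect (none)"))
```

A speed of `none` or a missing duplex value is the thing to look for; on a link that negotiated properly both fields are filled in.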
-
Given that the Netgate and the modem were both willing to autonegotiate I'm certain that they agreed on an acceptable speed. I've run into fiber media convertors that insist upon Gigabit, but that's because they won't autonegotiate. I would be impressed if that modem was Gigabit capable given that EoC's capabilities end around 1/10th of that. The ALIX board definitely isn't Gigabit.
Yeah, I've plunked a switch in between the two but that didn't solve it either. (thankfully!)
Yes, the Win7 box is Gigabit capable.
-
OK - This is one for the experts I guess.
I do have a small stash of this stuff. If you need, I can FEDEX overnight some to you. For a price.
http://www.youtube.com/watch?v=3nbEeU2dRBg
-
I've set one of these up on cable before. What version of pfSense are you running and is it 64 or 32 bit?
I just got word from my guy onsite, that Netgate is running 32bit.
Status: Dashboard - Version:
2.0.3-RELEASE (i386)
built on Fri Apr 12 10:22:18 EDT 2013
FreeBSD 8.1-RELEASE-p13
-
I'd try either a reload of the firmware or fresh install. (Often fixes things where you think it wouldn't)
-
It turned out that the gear supplied by my ISP was tagging all packets sent inward to my router's WAN port but wouldn't accept packets tagged with the same VLAN ID. Several calls to my ISP got me through to someone who knew what I was talking about and fixed it in just a few seconds.
I'm ashamed to say that I didn't catch that during my troubleshooting and it was the superexperts at pfSense support that ultimately found the problem.
-
ISP equipment problems?
(I'd have never guessed) ;D
-
Hmm, yet the Win7 machine had no problems. :-\
Was it revealed by a packet capture on WAN?
Ah well, glad you got it sorted.
Steve
-
Yep. We ran a packet capture on the WAN and found the VLAN ID on the inbound packets. Thinking we were clever, we tried setting the WAN port to the same VLAN ID and then nothing returned at all. Even the ISP tech was stumped as to how that hardware got set up and sent out that way.
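For anyone who lands here with the same symptom, this is roughly what we were looking for in the capture. An 802.1Q tagged frame carries the value 0x8100 where the EtherType normally sits, followed by a 16-bit field whose low 12 bits are the VLAN ID. A minimal Python sketch of that check; the frames and the VLAN ID of 100 are made up for illustration, since the real ID isn't part of this thread, and `vlan_id` is just a name for the example.

```python
import struct

TPID_8021Q = 0x8100  # value in the EtherType position that marks an 802.1Q tag


def vlan_id(frame: bytes):
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if it is untagged."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    # Bytes 0-5 are the destination MAC, 6-11 the source MAC, 12-13 the EtherType/TPID.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    # The tag control field holds priority (3 bits), DEI (1 bit) and VLAN ID (12 bits).
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF


# Made-up frames: zeroed MAC addresses, IPv4 EtherType, no payload.
tagged = bytes(12) + struct.pack("!HHH", TPID_8021Q, 100, 0x0800)
untagged = bytes(12) + struct.pack("!H", 0x0800)

print(vlan_id(tagged))
print(vlan_id(untagged))
```

If inbound frames on a WAN port that is supposed to be untagged come back with a VLAN ID instead of `None`, that is the same situation we ran into.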
-
I wish I knew enough to install this stuff for a living… Sounds like fun. :D
I hope you got paid by the hour?