Some hosts aren't connecting to the internet but others are
-
OK. So the VM settings are OK. You don't have to run a full-blown memtest; that will take days to complete with 64GB of RAM. Just run a quick test from pure DOS mode, which will test the CPU/RAM memory controller. As for storage, the Windows check only verifies the integrity of files and the file system, and in rare cases it will run a surface test if Windows itself suspects that there are bad sectors on the drive. Anyway, that's not enough. Since you have a nice Samsung NVMe drive, just download and install the Samsung Magician SSD tool from the Samsung website. That tool will read the S.M.A.R.T. parameters and perform a quick test to see whether your drive controller and the drive itself are OK.
In your last post, you mentioned high CPU usage and sluggishness. None of those things can be caused by a default pfSense installation. It all points to an issue with either the hardware or the host operating system. Try removing the Windows network loopback driver and reinstalling it.
When we say "bare metal" we mean pure hardware, no virtualization. This is to rule out any potential misconfiguration or issues with hardware or software. Hyper-V is just virtualization software that Microsoft built into Windows 10. It's similar to VirtualBox. You can try it, but as @stephenw10 already said, it's better to try on bare metal.
-
@mechtheist said in Some hosts aren't connecting to the internet but others are:
it started listing an ip6 entry for the WAN along with the other interfaces
A real IPv6 address? That didn't exist before?
That's the sort of change that could be introduced by your ISP and, if not set up right (or completely disabled), can make the experience at a client seem very bad. Though it usually presents as a delay before anything starts loading rather than slow throughput.
Steve
-
Days for a memtest? That's depressing, because my other main PC IS having a subtle problem that is likely a memory issue, and I've been putting off testing because of how long it takes. It's also 64GB, and I thought maybe it was 8-12 hrs; it's been a really long time since I've done one of those. I'll try a DOS-mode test.
I used to have Magician installed but removed it (I forget why). I tried to reinstall it with the latest version I just downloaded and got this:
Not helpful. I can't even figure out who's objecting: Windows doesn't mention it in its virus security settings, nor does Malwarebytes. I put in an exception in both of those and I'm still getting it, so it must be part of the install program itself (probably, as it has that 'Setup' bit showing). I'm not sure what to do about that; maybe there's a log entry. I'll have to deal with it later. On a positive note, I have CrystalDiskInfo running from startup and it isn't reporting any errors, so there shouldn't be any SMART issues.
I wasn't sure about bare metal. I think I've read that some of the higher-end VM tech is much better at bringing your OSes closer to 'bare metal', so I thought maybe that's what you meant. I have like 4 or 5 older PCs I might drag back out and try. There is an issue I will mention in my next post in answer to stephenw10. Thanks again for your efforts!
-
Not sure what 'real' ip6 implies; I should have taken a screenshot. It was there only when I was first dicking with the new pfSense install and didn't have the WAN connected, so it can't be an ISP thing.
I had an idea that made me want to facepalm: maybe it's the NIC after all. I'm using an Intel PRO/1000 PT Quad Port Low Profile Server Adapter, which uses an 82571EB Gigabit controller. I searched the forum for any info and found one of your posts from just a couple of months ago here:
PCIe 1 vs 2. Some virtualisation support in the 82580. But the 82571 is an em(4) NIC, whereas the 82580 is an igb(4) multi-queue NIC. If you have multiple CPU cores that is going to load them far more efficiently.
There is a non-low-profile version of this NIC that only supports PCIe 1, but the low-profile version is supposed to be able to work with PCIe 2 through some kind of add-on chip or emulation or ??? Could that be the problem? With the PC connected to the LAN through the quad NIC it can do 100Mbps just fine. That's from the internet with the Ookla speedtest. I ran a local test and got this:
...and that's while using RDP to run the test, so it had that traffic to compete with. The FreeBSD manual page for the em driver doesn't actually list the low-profile version, though it does list a LOT of similar cards. Mine is the 'PT' variety; I'm not sure what the difference is between the PT, MT, PF, MF, etc. versions. Does any of that mean anything to you? Is it a possible reason for what I'm seeing; could the low-profile kludge be the culprit? I appreciate all the help, thanx.
-
@mechtheist said in Some hosts aren't connecting to the internet but others are:
Not sure what 'real' ip6 implies
I mean a globally routable public IP as opposed to a link-local IP based on the MAC address, which every interface has.
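For illustration, here is a minimal Python sketch of how such a MAC-based link-local address is formed (the modified EUI-64 scheme), using only the standard `ipaddress` module. The function names are hypothetical. The example MAC `a0:36:9f:29:18:28` is the one reported for `em1` in the bare-metal `ifconfig` output later in this thread, and the derived address matches the `fe80::` entry listed there. Some operating systems generate randomized interface IDs instead, so not every link-local address is built this way.

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 link-local address from a 48-bit MAC (modified EUI-64)."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as a0:36:9f:29:18:28")
    octets[0] ^= 0x02                               # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe between the halves
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


def scope(address: str) -> str:
    """Rough classification: link-local vs. globally routable vs. anything else."""
    ip = ipaddress.IPv6Address(address)
    if ip.is_link_local:
        return "link-local"
    if ip.is_global:
        return "global"
    return "other"


if __name__ == "__main__":
    ll = eui64_link_local("a0:36:9f:29:18:28")
    print(ll, scope(str(ll)))  # fe80::a236:9fff:fe29:1828 link-local
```

A 'real' address in Steve's sense is one where `scope()` would report `global`.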
An 82571 NIC will easily pass 1Gbps with any vaguely recent processor. Unless it's something ultra low power maybe, but any system designed for that wouldn't be using that NIC.
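To put "easily pass 1Gbps" into numbers, here is a back-of-the-envelope Python sketch of what a full-rate gigabit link asks of a NIC with a 1500-byte MTU. It assumes untagged Ethernet frames (8 bytes of preamble/SFD, a 14-byte header, a 4-byte FCS and a 12-byte inter-frame gap) and 40 bytes of IPv4 + TCP headers with no TCP options; the function name is illustrative.

```python
# Per-frame overhead for untagged Ethernet on the wire, in bytes.
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12


def line_rate(link_bps: float, mtu: int = 1500, ip_tcp_headers: int = 40):
    """Return (frames per second, TCP payload bits per second) at full line rate."""
    wire_bytes = PREAMBLE_SFD + ETH_HEADER + mtu + FCS + INTERFRAME_GAP  # 1538 at MTU 1500
    frames_per_s = link_bps / (wire_bytes * 8)
    goodput_bps = frames_per_s * (mtu - ip_tcp_headers) * 8
    return frames_per_s, goodput_bps


if __name__ == "__main__":
    fps, goodput = line_rate(1e9)
    print(f"{fps:,.0f} frames/s, {goodput / 1e6:.1f} Mbit/s of TCP payload")
    # 81,274 frames/s, 949.3 Mbit/s of TCP payload
```

Roughly 81k full-size frames per second is a modest load for any recent CPU, so a ~1Mbps ceiling is nowhere near what the NIC itself should manage.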
I wouldn't expect any issue with that.
Steve
-
OK, I switched the LAN and WAN to the mobo's onboard NICs: no change. Then I did a new install of pfSense: no change; if anything it was even slower.
I used to drool at the thought of getting a 1Mbps internet connection, a 'T1' I think I always heard them called, but that was a long time ago. So it's not the NIC; in this setup the quad NIC isn't connected in any way. Immediately after the new install I was also getting anomalous CPU usage again:
That's what it looks like: the right two are from earlier examples and the left one is from this new install. I saw a number of 300-400% figures, I just didn't catch them with the screenshot.
The ip6 address popped up again and I see now what happened: pfSense seems to have a lot more IPv6 stuff going on and set up by default. There were at least 2 or 3 settings I had to switch off; I kept getting 'you can't do that because X is turned on' kinds of messages. I really don't recall having to do any of that before.
One strange thing I noticed was that the syslog was full of this:
Nov 16 14:29:34 nginx 2021/11/16 14:29:34 [error] 33672#100684: *11813 open() "/usr/local/www/cgi-bin/luci/;stok=df714b094f1a294eb62bd685e32d868d/admin/status" failed (2: No such file or directory), client: 10.0.0.40, server: , request: "POST /cgi-bin/luci/;stok=df714b094f1a294eb62bd685e32d868d/admin/status?form=wan_speed HTTP/2.0", host: "10.0.0.1", referrer: "https://10.0.0.1/webpages/index.html?t=0093dbde"
Nov 16 14:29:38 nginx 2021/11/16 14:29:38 [error] 33899#100585: *13843 open() "/usr/local/www/cgi-bin/luci/;stok=df714b094f1a294eb62bd685e32d868d/admin/status" failed (2: No such file or directory), client: 10.0.0.40, server: , request: "POST /cgi-bin/luci/;stok=df714b094f1a294eb62bd685e32d868d/admin/status?form=wan_speed HTTP/2.0", host: "10.0.0.1", referrer: "https://10.0.0.1/webpages/index.html?t=0093dbde"
There were 4 of the first line followed by 496 of the second one. 10.0.0.40 is my main PC that I'm on now, not the PC with the VMs. So the whole log file was just this crap which means absolutely nothing to me.
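When a log is flooded like this, it can help to collapse it into counts per client and request path before reading it. A minimal Python sketch, assuming the nginx error-line layout shown above (`client: ..., server: ..., request: "METHOD path ..."`); the regex, function name and script name are illustrative, not part of pfSense.

```python
import re
from collections import Counter

# Captures the client address, HTTP method and path (query string dropped)
# from nginx "open() ... failed" error lines like the ones quoted above.
LINE_RE = re.compile(
    r'client: (?P<client>[^,]+), .*?request: "(?P<method>[A-Z]+) (?P<path>[^ ?"]+)'
)


def summarize(lines):
    """Count failed requests per (client, method, path)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["client"], match["method"], match["path"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_nginx.py exported-syslog.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for (client, method, path), n in summarize(log).most_common(10):
                print(f"{n:6d}  {client}  {method} {path}")
```

On a log like this one, every entry collapses to a single row: 10.0.0.40 POSTing to the same `/cgi-bin/luci/...` path, which points at one client rather than at pfSense itself.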
I realized trying it bare metal means running pfSense on the same PC I have it virtualized on; it makes no sense to use another PC. I've been thinking of doing that to 'solve' this problem, but that wouldn't help figure out what's going on with the VMs. I could just try running it live, which shouldn't be too intrusive. Installing it would make me really nervous that I'd break something; I'm not that good with boot loaders and the like, and ZFS is utterly opaque to me.
-
Hmm, that's a lot of awking! I'd try running
ps -auxwwd
to see what is calling that command.
Those errors are from something on 10.0.0.40 trying to open a page that does not exist on pfSense.
And the page it's looking for, /cgi-bin/luci/, is from OpenWRT, so I suspect you had another router at 10.0.0.1 running that at some time.
A restriction to ~1Mbps almost has to be some link issue or something in the VM setup.
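Following up on the `ps -auxwwd` suggestion: once you have a process listing, the question is which parent started the busy `awk`. Here is a minimal Python sketch that walks the parent-PID chain. It assumes a listing with PID, PPID, %CPU and COMMAND columns, for example from FreeBSD's `ps -axo pid,ppid,%cpu,command` (that exact command is an assumption, not something suggested in this thread). The sample rows are made up for illustration and do not come from this system.

```python
def parse_ps(text):
    """Parse 'PID PPID %CPU COMMAND' output into {pid: (ppid, cpu, command)}."""
    procs = {}
    for line in text.strip().splitlines()[1:]:         # skip the header row
        pid, ppid, cpu, command = line.split(None, 3)  # COMMAND may contain spaces
        procs[int(pid)] = (int(ppid), float(cpu), command)
    return procs


def ancestry(procs, pid):
    """Commands from pid up through its parents until the chain leaves the listing."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:            # 'seen' guards against loops
        seen.add(pid)
        ppid, _cpu, command = procs[pid]
        chain.append(command)
        pid = ppid
    return chain


def who_started(procs, program):
    """Map each PID whose executable ends with `program` to its parent chain."""
    return {
        pid: ancestry(procs, pid)
        for pid, (_ppid, _cpu, command) in procs.items()
        if command.split()[0].endswith(program)
    }


# Hypothetical sample listing, for illustration only.
SAMPLE = """\
  PID  PPID %CPU COMMAND
    1     0  0.0 /sbin/init
  400     1  0.0 /bin/sh /path/to/hypothetical_script.sh
  401   400 99.0 /usr/bin/awk {print $1}
"""

if __name__ == "__main__":
    for pid, chain in who_started(parse_ps(SAMPLE), "awk").items():
        print(pid, " <- ".join(chain))
```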
Steve
-
I appreciate your patience and help. You're probably right about the clogging of the log. I think it must be because when I switched from the router to pfSense, I just took the WAN cable from the router, plugged it into the NIC for pfSense and then unplugged the router from the LAN, so my PC was probably still trying to do something with the router. I thought the same thing about the awk CPU usage. The one with awk at 99% also has 'unslogd' right under it at 11%, so I was thinking it had something to do with diddling with the log file archiving. I couldn't find anything on unslogd, but it seems a plausible guess at what unlogging might do ;)
You've been a great help and I'm hoping you can give me a little more: what would I look for if it's a link issue? Are we talking packet capture? Are there any kinds of errors you think I should look for? The VM settings thing is really frustrating, and it's why I keep repeating that I've been doing this for a long time. As far as I know, I'm setting up the VMs the same way I always have, except for switching to virtio for the networks, and I only did that when I was already having the problem, hoping it might improve the performance. It did, by maybe 5-10% at best. I ordered one of the newer NICs with the 82580 controller. It's not the NIC causing the problem, but maybe something will pop up by switching to the newer card.
In case anyone can make sense of it, this is the setup info from VirtualBox:
C:\Users\rob>vboxmanage showvminfo "pfsense_octo"
Name: pfsense_octo
Groups: /
Guest OS: FreeBSD (64-bit)
UUID: e73eb791-860c-40f9-94a5-b02b4073ec44
Config file: H:\VirtualBoxVMs\pfsense_octo\pfsense_octo.vbox
Snapshot folder: H:\VirtualBoxVMs\pfsense_octo\Snapshots
Log folder: H:\VirtualBoxVMs\pfsense_octo\Logs
Hardware UUID: e73eb791-860c-40f9-94a5-b02b4073ec44
Memory size: 10121MB
Page Fusion: disabled
VRAM size: 126MB
CPU exec cap: 100%
HPET: disabled
CPUProfile: host
Chipset: piix3
Firmware: BIOS
Number of CPUs: 8
PAE: enabled
Long Mode: enabled
Triple Fault Reset: disabled
APIC: enabled
X2APIC: disabled
Nested VT-x/AMD-V: disabled
CPUID Portability Level: 0
CPUID overrides: None
Boot menu mode: message and menu
Boot Device 1: DVD
Boot Device 2: HardDisk
Boot Device 3: Not Assigned
Boot Device 4: Not Assigned
ACPI: enabled
IOAPIC: enabled
BIOS APIC mode: APIC
Time offset: 0ms
RTC: local time
Hardware Virtualization: enabled
Nested Paging: enabled
Large Pages: enabled
VT-x VPID: enabled
VT-x Unrestricted Exec.: enabled
Paravirt. Provider: Default
Effective Paravirt. Prov.: None
State: powered off (since 2021-11-16T16:34:52.162000000)
Graphics Controller: VMSVGA
Monitor count: 1
3D Acceleration: disabled
2D Video Acceleration: disabled
Teleporter Enabled: disabled
Teleporter Port: 0
Teleporter Address:
Teleporter Password:
Tracing Enabled: disabled
Allow Tracing to Access VM: disabled
Tracing Configuration:
Autostart Enabled: disabled
Autostart Delay: 0
Default Frontend:
VM process priority: default
Storage Controller Name (0): AHCI
Storage Controller Type (0): IntelAhci
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0): 30
Storage Controller Port Count (0): 2
Storage Controller Bootable (0): on
AHCI (0, 0): Empty
AHCI (1, 0): H:\VirtualBoxVMs\pfsense_octo\pfsense-2nd-replacement-11-11-2021--1333.vhd (UUID: a60260db-d839-44c6-8434-345e0f64826c)
NIC 1: MAC: 080027B1449A, Attachment: Bridged Interface 'Intel(R) I211 Gigabit Network Connection', Cable connected: on, Trace: off (file: none), Type: virtio, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: allow-all, Bandwidth group: none
NIC 2: MAC: 08002719A76D, Attachment: Bridged Interface 'Intel(R) Ethernet Connection (2) I218-V', Cable connected: on, Trace: off (file: none), Type: virtio, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: allow-all, Bandwidth group: none
NIC 3: disabled
NIC 4: disabled
NIC 5: disabled
NIC 6: disabled
NIC 7: disabled
NIC 8: disabled
Pointing Device: USB Tablet
Keyboard Device: PS/2 Keyboard
UART 1: disabled
UART 2: disabled
UART 3: disabled
UART 4: disabled
LPT 1: disabled
LPT 2: disabled
Audio: enabled (Driver: DSOUND, Controller: AC97, Codec: STAC9700)
Audio playback: enabled
Audio capture: disabled
Clipboard Mode: disabled
Drag and drop Mode: disabled
VRDE: enabled (Address 0.0.0.0, Ports 33891, MultiConn: on, ReuseSingleConn: off, Authentication type: null)
Video redirection: disabled
VRDE property : TCP/Ports = "33891"
VRDE property : TCP/Address = <not set>
VRDE property : VideoChannel/Enabled = <not set>
VRDE property : VideoChannel/Quality = <not set>
VRDE property : VideoChannel/DownscaleProtection = <not set>
VRDE property : Client/DisableDisplay = <not set>
VRDE property : Client/DisableInput = <not set>
VRDE property : Client/DisableAudio = <not set>
VRDE property : Client/DisableUSB = <not set>
VRDE property : Client/DisableClipboard = <not set>
VRDE property : Client/DisableUpstreamAudio = <not set>
VRDE property : Client/DisableRDPDR = <not set>
VRDE property : H3DRedirect/Enabled = <not set>
VRDE property : Security/Method = <not set>
VRDE property : Security/ServerCertificate = <not set>
VRDE property : Security/ServerPrivateKey = <not set>
VRDE property : Security/CACertificate = <not set>
VRDE property : Audio/RateCorrectionMode = <not set>
VRDE property : Audio/LogPath = <not set>
OHCI USB: disabled
EHCI USB: disabled
xHCI USB: enabled
USB Device Filters: <none>
Bandwidth groups: <none>
Shared folders:
Name: 'shared', Host path: 'H:\VirtualBoxVMs\pfsense_octo\shared' (machine mapping), writable, auto-mount, mount-point: '/root/octo'
Capturing: active
Capture audio: not active
Capture screens: 0
Capture file: H:\VirtualBoxVMs\pfsense_octo\pfsense_octo.webm
Capture dimensions: 1024x768
Capture rate: 512kbps
Capture FPS: 25kbps
Capture options: vc_enabled=true,ac_enabled=false,ac_profile=med
Guest:
Configured memory balloon size: 0MB
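If you want to pull the network-adapter details out of a dump like this (or compare two VMs), a small Python sketch can do it. It assumes the human-readable `showvminfo` layout shown above, where each enabled adapter reads `NIC n: MAC: ..., Type: ..., Bandwidth group: ...`; it is not an official VirtualBox API, and the layout may differ between VirtualBox versions. Run against the two adapters above, it simply confirms that both are bridged `virtio` devices.

```python
import re

# An enabled adapter entry starts with "NIC <n>: MAC:" and ends with
# "Bandwidth group: <name>"; disabled slots ("NIC 3: disabled") don't match.
NIC_RE = re.compile(r"NIC (\d+): (MAC: .*?Bandwidth group: \S+)")


def parse_nics(showvminfo_text):
    """Return {adapter number: {field: value}} for every enabled NIC."""
    nics = {}
    for number, description in NIC_RE.findall(showvminfo_text):
        fields = {}
        for item in description.split(", "):
            key, _, value = item.partition(": ")
            fields[key] = value
        nics[int(number)] = fields
    return nics


SAMPLE = (
    "NIC 1: MAC: 080027B1449A, Attachment: Bridged Interface 'Intel(R) I211 Gigabit "
    "Network Connection', Cable connected: on, Trace: off (file: none), Type: virtio, "
    "Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: allow-all, "
    "Bandwidth group: none\n"
    "NIC 2: MAC: 08002719A76D, Attachment: Bridged Interface 'Intel(R) Ethernet "
    "Connection (2) I218-V', Cable connected: on, Trace: off (file: none), "
    "Type: virtio, Reported speed: 0 Mbps, Boot priority: 0, "
    "Promisc Policy: allow-all, Bandwidth group: none\n"
    "NIC 3: disabled\n"
)

if __name__ == "__main__":
    for number, fields in parse_nics(SAMPLE).items():
        print(number, fields["Type"], fields["Attachment"])
```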
-
I'm not really very familiar with VBox, but nothing there jumps out. I do run it here occasionally, but in Linux. What is the host processor? It shouldn't make that much difference, but it would be a difference.
You should be able to run it in Hyper-V on the same host relatively easily. If you can't run it bare metal, that's the next thing I would try.
https://docs.netgate.com/pfsense/en/latest/recipes/virtualize-hyper-v.html
Steve
-
@stephenw10
I was actually thinking of using the Linux VM I use for a web server to run a pfSense VM on; there's probably more than enough CPU power for it to work adequately. The PC is Win10 on an i7-5960X w/ 64GB RAM and an NVIDIA 2080 GPU, so hopefully that will be enough that it's not too painful to work with.
I wrote that yesterday and I'm still working on getting a VM in a VM going. I'm definitely going to try Hyper-V; one reason for the hesitation is that you have to turn all of it off in Windows or it interferes with VirtualBox. The last time I used it was, I think, back in Win8, when Windows basically came with an XP VM. The new card should get here Monday.
-
Hmm, VM in a VM is double the potential issues IMO.
If there's any way you can test it bare metal however temporarily I would do that.
At least moving to Hyper-V changes the potential issues so if you still see it the issue is probably on the host or external.
Steve
-
@stephenw10 Yeah, I'm going to do a live instance and see how that goes. Thanks for helping out and being patient.
-
The deed is done. Not too many hiccups, but one BIG one: pfSense doesn't do Live anymore. It's been a while and I had no idea. So I used a USB drive, and it worked pretty much flawlessly. Weirdly, I couldn't log on with either Chrome or Firefox: Chrome just took the ID and password, ignored them and presented a blank login page again, and Firefox simply wouldn't connect; it timed out every time, even after making sure I wasn't using https. Speedtest:
I kinda forgot how zippy it feels when it's not virtualized. I was looking at the pfSense documentation and it discusses Type 1 and Type 2 virtualization engines; Type 1 is what it calls 'bare metal'.
So it's sitting there all zippy and everything, and I realized I had no idea what I should look for. I'm an idiot. The only thing I did was look at ifconfig, and it showed:
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=81009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER>
	ether a0:36:9f:29:18:28
	inet6 fe80::a236:9fff:fe29:1828%em1 prefixlen 64 scopeid 0x2
	inet6 ::a236:9fff:fe29:1828 prefixlen 64 autoconf
	inet6 ::cbda:70cf:b28a:8ad5 prefixlen 128
	inet 192.168.0.7 netmask 0xffffff00 broadcast 192.168.0.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
The one thing that stands out to me: if you look all the way up to my first post, I mention how it was reporting a 10GbaseT connection in virtualized operation, but bare metal it's showing the correct 1000baseT. I changed the properties of the NICs in Windows to 1000 from 'auto', but that didn't change anything. Could this be the problem? I'd assume there's a setting in pfSense for this, but it's never been an issue before, I didn't think of changing it there, and I'm back on the router now.
So much fun.
-
@mechtheist said in Some hosts aren't connecting to the internet but others are:
I changed the properties of the NICs in windows to 1000 from 'auto'
You had set the NICs to 1G fixed? That could definitely cause a problem. When using Gigabit Ethernet you almost always should have the speed/duplex set to autoselect. The only time you would not is if the other end is also set fixed and that should only ever happen on a 100M link. Gigabit requires autoselect.
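The reason for that advice is easier to see in a simplified model of copper autonegotiation. This Python sketch is an illustration under stated assumptions, not the full IEEE 802.3 Clause 28 state machine: two auto sides pick the highest mode both advertise (standard priority order, minus the rarely seen 100BASE-T4/T2); a fixed side doesn't negotiate, so an auto partner falls back to parallel detection, which can sense the speed but not the duplex and therefore assumes half duplex. Fixed 1000BASE-T is deliberately left out, because what a driver's "1 Gbps fixed" setting actually does varies by driver. The function names are hypothetical.

```python
# Simplified copper autonegotiation priority, highest first (IEEE 802.3
# Annex 28B order, omitting 100BASE-T4 and 100BASE-T2).
PRIORITY = [
    (1000, "full"), (1000, "half"),
    (100, "full"), (100, "half"),
    (10, "full"), (10, "half"),
]


def _fixed_vs_auto(fixed_mode, auto_modes):
    """Fixed side vs. auto side: parallel detection finds the speed, not the duplex."""
    speed = fixed_mode[0]
    if (speed, "half") not in auto_modes:
        return None                     # auto side can't run that speed: no link
    return fixed_mode, (speed, "half")  # auto side falls back to half duplex


def resolve(side_a, side_b):
    """Each side is ("auto", set_of_modes) or ("fixed", (speed, duplex)).

    Returns (mode used by A, mode used by B), or None if no link comes up.
    """
    for kind, cfg in (side_a, side_b):
        if kind == "fixed" and cfg[0] == 1000:
            raise ValueError("fixed 1000BASE-T is driver-specific; not modelled here")
    (kind_a, cfg_a), (kind_b, cfg_b) = side_a, side_b
    if kind_a == kind_b == "auto":
        common = cfg_a & cfg_b
        best = next((mode for mode in PRIORITY if mode in common), None)
        return (best, best) if best else None
    if kind_a == kind_b == "fixed":
        # No negotiation at all: each side runs exactly what it was told.
        return (cfg_a, cfg_b) if cfg_a[0] == cfg_b[0] else None
    if kind_a == "fixed":
        return _fixed_vs_auto(cfg_a, cfg_b)
    result = _fixed_vs_auto(cfg_b, cfg_a)
    return (result[1], result[0]) if result else None


if __name__ == "__main__":
    everything = set(PRIORITY)
    print(resolve(("auto", everything), ("auto", everything)))
    print(resolve(("fixed", (100, "full")), ("auto", everything)))  # duplex mismatch
```

The second case is the classic duplex mismatch: the link comes up and passes traffic, but throughput typically collapses under load. That is why fixing only one end is worse than leaving both on autoselect, and why fixed settings only make sense when both ends are fixed to the same mode.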
I suspect that is the problem here and the virtualisation is hiding it. VBox always presents as a 10G NIC to pfSense, so it can't see the real link speed.
Steve
-
I only set it to 1G fixed recently to see if that made a difference, and it didn't. I think the 10G thing must be a virtio thing; I never saw that until recently, and I only switched to using the virtio interfaces recently. If everything is set to 1G fixed, is it still an issue? That's the first time I ever remember changing that setting; I'll put them all back on auto. This is really frustrating. It's a x50 reduction in bandwidth FFS; it should be something glaringly obvious. Running top on that PC showed the top running processes mostly below 0.1% most of the time; I'm not used to seeing numbers like 0.04% in top.
-
It's weird: I messed up the quote somehow, and when I tried to edit it (which only involved inserting a carriage return after the last line of the quote), my saved edit got flagged as spam!
-
Mmm, the forum hates too many edits! I made that edit.
Yeah, VBox with virtio always appears as 10G.
Yes, there would be no problem if both ends of both links are set to 1G fixed. However, the issue is likely to be with a modem etc. where it cannot be set fixed. You should really always leave it as autoselect for a Gig link.
Steve
-
@stephenw10 Well, I got Hyper-V working, kinda sorta. There were a few learning-curve issues with the network settings (using switches, how that affects which interface is doing what, etc.), and there were some weird glitches that I can't fathom, but that's kinda SOP. But it looks good:
Thanks for the help. I'll probably never know what the hell went wrong with pfSense on VirtualBox, but gotta move on.
-
Cool. Yeah, looks like an issue in VBox then, somehow.