Where to look to find issue

flat4

My previous setup was a beige box with an i5 from 2012. It was not rack friendly, so I purchased a used SUPERMICRO X9DRD-LF-TW008 from eBay, and it's OK. It had been working just fine until about a week and a half ago, when it started sending messages all day and night about my WAN being available. So it seems that it disconnects and reconnects constantly. Full disclosure: the box sits in my garage, which is not cool, but it has not overheated. Rebooting corrects the issue for several hours, but then it returns. I still have my old system, which sat in the same garage for nearly 3 years with no issues, and I am going to put it back in to see if it does the same thing. I called the ISP and they do not see any issues from their end. Below is a snippet of the errors I see and the messages I get. I need to find out whether the problem is the onboard NICs failing, the disk, or something else.

Any help would be appreciated.

Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:45 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:46 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:47 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:47 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:47 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:47 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:47 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:48 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0
Aug 8 10:57:48 	kernel 		arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0


Pfsense_comache, [08.08.22 07:28]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|10.836ms|0.317ms|2%|online|none

Pfsense_comache, [08.08.22 07:58]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|10.8ms|0.287ms|3%|online|none

stephenw10

@flat4 said in Where to look to find issue:

Aug 8 10:57:45 kernel arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0

This implies that there is no longer a subnet on igb0 that the gateway IP can be added in. In other words the WAN has lost its IP address.
The logs below that show the WAN coming back up and the monitoring IP starting to respond again so it can be added back to the gateway group.

I expect to see other logs before that showing why it was removed. Does igb0 actually go down?

How is it connected? What's it connected to?

Steve
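
To find the moment the WAN lost its address, it can help to collapse the flood of identical arpresolve lines into a count per second and note the first one, then read the system log just before that timestamp. Below is a minimal Python sketch, assuming the log has been exported as plain text in the same layout as the lines quoted above. The function name `summarize_arp_errors` and the sample data are hypothetical, not part of pfSense.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this thread, e.g.
# "Aug 8 10:57:45  kernel  arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0"
ARP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+kernel\s+"
    r"arpresolve: can't allocate llinfo for (?P<ip>\S+) on (?P<iface>\S+)"
)


def summarize_arp_errors(lines):
    """Return (first_timestamp, Counter keyed by (timestamp, interface))."""
    counts = Counter()
    first = None
    for line in lines:
        match = ARP_RE.match(line.strip())
        if not match:
            continue  # ignore unrelated log lines
        # Collapse runs of spaces so "Aug  8" and "Aug 8" compare equal.
        ts = " ".join(match.group("ts").split())
        if first is None:
            first = ts
        counts[(ts, match.group("iface"))] += 1
    return first, counts


# Hypothetical sample built from the lines posted above.
sample = [
    "Aug 8 10:57:45 \tkernel \t\tarpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0",
    "Aug 8 10:57:45 \tkernel \t\tarpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0",
    "Aug 8 10:57:46 \tkernel \t\tarpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0",
    "Aug 8 10:57:46 \tsome unrelated line",
]

first_seen, per_second = summarize_arp_errors(sample)
print("first error at:", first_seen)
for (ts, iface), n in sorted(per_second.items()):
    print(ts, iface, n)
```

The first timestamp is the useful part: whatever the DHCP client or the link logged immediately before it is the likely reason the gateway was removed.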

flat4

@stephenw10 said in Where to look to find issue:

@flat4 said in Where to look to find issue:

Aug 8 10:57:45 kernel arpresolve: can't allocate llinfo for xx.xx.xx.xx on igb0

This implies that there is no longer a subnet on igb0 that the gateway IP can be added in. In other words the WAN has lost its IP address.
The logs below that show the WAN coming back up and the monitoring IP starting to respond again so it can be added back to the gateway group.

I expect to see other logs before that showing why it was removed. Does igb0 actually go down?

No, it does not.

How is it connected?
It's connected to an ONT.
What's it connected to?

1 Gig Fiber

Steve

I connected my old box that I had been using as my pfSense box, and it lasted about 2 hours before the messages popped up again.

I have a Cradlepoint as backup, so I disconnected the ONT and left the Cradlepoint connected. The messages stopped, other than high latency every once in a while.

I think it may be an ONT or fiber issue. I had an old Apple router and connected it to the ONT. I plugged in a Mac and have been running a ping session to Google since last night, and it has had about 30% drops.

I will be contacting my isp.

stephenw10

Yeah, that sounds like an upstream issue somewhere. It just presents differently in pfSense.

flat4

Just wanted to update this.

I worked with my ISP, which meant being forced to use their router so they could pull logs.
(The ISP is using a Nokia ONT and their beacons; you have to register with Nokia so they can pull logs from their web dashboard.)

A tech came out and stated they have been having problems with these ONTs overheating. (Mine sits in my garage cooking most of the afternoon.)

They replaced the ONT and opened the one I had, and sure enough it had burnt pins.

I'm now running on the new ONT, same model, and it's hot, so I am getting errors again.

So it must be getting hot and connecting and disconnecting.

The ONT has no vents whatsoever. I'm thinking about what I can do to keep it cooler.

stephenw10

Add a fan? Move it somewhere cooler?

Not that many options if the ISP insists you use that. I assume they'd frown on you adding vents!

Steve

flat4

@stephenw10

When fiber got installed I was one of the early adopters in my neighborhood, so I was able to get them to leave me a good amount of slack on the fiber.

The techs were very open and said that if it kept happening they may switch me to their business ONT, which has ventilation.

I will let it run for a while before I open a new ticket and press for that.

flat4

Update number whatever

A new ONT of the same model was installed. It lasted a few hours and then pfSense started doing its flip-flop. Prior to this we had 110F weather, but a cold front came in and we dropped into the upper 70s, so the ONT should not be overheating. I called the ISP and they went ahead and switched me to a unit they use at businesses. It has a built-in 5-port switch, POTS lines, and plenty of ventilation. I'll paste the notifications from last night, when I plugged it straight into pfSense.

What I find odd is that I had to use their router during initial troubleshooting with the ISP, so I set it up and placed my pfSense box in the DMZ of the ISP router, and it would run night and day without dropping or notifying me of lost connectivity to the WAN.

So I am wondering if the ISP has it set up so that a router not provided by them is de-prioritized, and that is why I am getting the flopping.

The ISP router does have a bridge mode, so I can test that theory.
Thanks for allowing me to just rant.

Pfsense_comache, [8/19/2022 12:54 AM]
pfSense.local.lan
MONITOR: WAN_DHCP has packet loss, omitting from routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.927ms|0.074ms|25%|down|highloss

Pfsense_comache, [8/19/2022 12:54 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.901ms|0.078ms|20%|online|loss

Pfsense_comache, [8/19/2022 12:54 AM]
pfSense.local.lan
MONITOR: WAN_DHCP has packet loss, omitting from routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.927ms|0.074ms|25%|down|highloss

Pfsense_comache, [8/19/2022 12:54 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.901ms|0.078ms|20%|online|loss

Pfsense_comache, [8/19/2022 12:54 AM]
pfSense.local.lan
DynDNS updated IP Address on WAN (igb0) to xx.xx.xx.xx

Pfsense_comache, [8/19/2022 1:01 AM]
pfSense.local.lan
DynDNS updated IP Address on WAN (igb0) to xx.xx.xx.xx

Pfsense_comache, [8/19/2022 1:25 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.867ms|0.137ms|4%|online|none

Pfsense_comache, [8/19/2022 1:55 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.869ms|0.065ms|4%|online|none

Pfsense_comache, [8/19/2022 2:25 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.866ms|0.135ms|3%|online|none

Pfsense_comache, [8/19/2022 2:55 AM]
pfSense.local.lan
MONITOR: WAN_DHCP is available now, adding to routing group WAN_Group
1.1.1.1|xx.xx.xx.xx|WAN_DHCP|11.534ms|17.767ms|4%|online|none
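
For anyone comparing many of these notifications, the pipe-separated status line can be split into named fields so loss and latency are easy to track over time. This is a minimal Python sketch. The field names are my reading of the lines quoted in this thread (monitor IP, a redacted address I assume is the source IP, gateway name, latency, latency deviation, loss, status, sub-status) and are not taken from pfSense documentation. `GatewayStatus` and `parse_status` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GatewayStatus:
    monitor_ip: str    # e.g. 1.1.1.1
    source_ip: str     # assumed: the WAN address (redacted in the thread)
    gateway: str       # e.g. WAN_DHCP
    rtt_ms: float      # average latency, milliseconds
    rtt_sd_ms: float   # assumed: latency standard deviation, milliseconds
    loss_pct: float    # packet loss, percent
    status: str        # e.g. online / down
    substatus: str     # e.g. none / loss / highloss


def _strip_unit(value, unit):
    """Remove a trailing unit such as 'ms' or '%' and return a float."""
    if value.endswith(unit):
        value = value[: -len(unit)]
    return float(value)


def parse_status(line):
    """Parse one status line as it appears in the notifications above."""
    parts = line.strip().split("|")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    monitor_ip, source_ip, gateway, rtt, rtt_sd, loss, status, substatus = parts
    return GatewayStatus(
        monitor_ip=monitor_ip,
        source_ip=source_ip,
        gateway=gateway,
        rtt_ms=_strip_unit(rtt, "ms"),
        rtt_sd_ms=_strip_unit(rtt_sd, "ms"),
        loss_pct=_strip_unit(loss, "%"),
        status=status,
        substatus=substatus,
    )


s = parse_status("1.1.1.1|xx.xx.xx.xx|WAN_DHCP|9.927ms|0.074ms|25%|down|highloss")
print(s.gateway, s.status, s.loss_pct)
```

Read this way, the messages above show loss sitting at 2 to 4% even while the gateway is reported online, which fits an upstream problem rather than a clean link-down event.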

stephenw10

@flat4 said in Where to look to find issue:

So I am wondering if the ISP has it set up so that a router not provided by them is de-prioritized, and that is why I am getting the flopping.

It could be they have some sort of telemetry enabled and are resetting the line when they don't see their own router on the connection. That would be quite unusual though.

Steve

flat4

One last reply and this can be put to rest.
Using their device and setting it to bridge mode only granted me a day with no notifications and no loss of internet.

My guess is that they must have recently implemented something that checks for the Nokia router. I only say this because while it's set up as a router and I put my pfSense box in the DMZ of the Nokia router, there are no issues whatsoever.

Thanks @stephenw10 for replying