DHCP Server Keeps Failing

MrVining · Dec 10, 2012, 8:08 AM

First of all I'm not very router savvy. I just started learning how subnetting works, but I'm still pretty green.

I used pfSense pre version 2 for a few years and it worked flawlessly. Then I moved and didn't have access to PPPoE info, so I was stuck using the routing functions of the DSL router for a while. Recently I got myself a new little SuperMicro Atom server board to run pfSense. I've got:

Atom 525 CPU
4 GB RAM
30GB SSD

I installed pfSense; I don't remember doing a software update, and the current version is "2.0.1-RELEASE (amd64)". The problem I'm having is with the DHCP server. Last week all of the clients on the network lost internet access and were not refreshing their IP addresses. As I'm a newb and didn't really have much configured other than a couple of ports forwarded and squid, I ended up doing a factory restore today. The only thing I have changed is the LAN IP, from 192.168.1.1 to 192.168.1.254. The DHCP server worked great for a few hours, and now it's not working again. I tried restarting the router but still no luck. I looked through a few recent posts and didn't see anything like this happening to others, so I'm guessing I messed something up.

One thing I did find odd is the Dashboard
CPU usage: 0%
Memory usage: 4%
Swap usage: 0%
Disk usage: 100%

Disk usage 100%??? Could this be the root of my problem? I'm sure it's not actually using 30 gigs… I guess if I had a setting wrong when I was using squid that could have happened, but I was thinking the "factory defaults" would fix that if it was the issue.

Thank you for your time,

wallabybob · Dec 10, 2012, 10:46 AM

Check the DHCP log (see Status -> System Logs, click on DHCP tab) for a possible explanation.

pfSense shell command:

```
ps lax | grep dhcp
```

squid seems to have a reputation for being hungry for disk space. It is probably not a good idea to assume the squid defaults will "work" in all environments.
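
To see which filesystem is actually full, something like the following might help. It is only a sketch: the helper name `full_filesystems` and the 90% threshold are made up for illustration, and it assumes the usual `df -k` column layout (capacity in the fifth column, mount point in the sixth). It is written for plain `sh`, so run it with `sh` if your login shell is something else.

```sh
#!/bin/sh
# Hypothetical helper: read `df -k` style output on stdin and print the
# mount point and capacity of every filesystem at or above a threshold.
full_filesystems() {
    threshold="$1"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "100%" -> "100"
            if (pct + 0 >= t) print $6, $5
        }'
}

# List every mounted filesystem that is at least 90% full.
df -k | full_filesystems 90
```

If squid is installed, `du -sh /var/squid/cache` would then show how much the cache itself is holding. That path is an assumption about where the package keeps its cache, so check your squid settings first.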

esnakk · Dec 10, 2012, 4:16 PM

@MrVining:

Disk usage 100%??? Could this be the root of my problem?

Yes it could.
Maybe your logs fill up the disk. We run a couple of systems very similar to yours and I have experienced a few problems; maybe you are suffering from one or several of these:

Some Supermicro BIOS versions need patching. Some of the older BIOS versions seem to cause trouble; typically the systems just hang and won't reboot. Sometimes they get stuck in some kind of "checking BIOS" loop that can take forever (nothing happened for over a week when I left a system alone just to see what would happen).
Some Intel NICs that SM uses do NOT work with "Hardware Checksum Offloading" (see System -> Advanced -> Networking and disable "Hardware Checksum Offloading"). Even the most expensive Intel NICs from SM need to have this disabled! All kinds of weird stuff happens if you don't.
RAM. Some servers shipped in 2012 can only see 2GB of RAM even though they are supposed to be able to use up to 4GB. BIOS patching etc. does not help; the server will never be able to use more than 2GB. If you use a RAM disk for logging on your SSD system (you should, or log to a different server, so you don't wear out your disk; also consider using the nanobsd version if you are not already doing so), you are likely to run out of RAM if you only have 2 gigs and use a lot of modules, squid, etc.
RAM #2. It seems only the recommended Micron DIMMs work well with some of the 525 servers from SM. For example, Kingston are supposed to work, but I have seen at least two systems that behaved strangely (crashes, reboots etc.) with Kingston; after a change to the slightly more expensive Micron DIMMs (with a veeeery long waiting list, it took us over a month to get our last shipment of these small buggers!) everything worked like a charm.
CARP. If you use failover and synchronize DHCP settings, sometimes things go wrong. It seems the only fix is to completely remove all DHCP settings, synchronize, remove sync for DHCP, synchronize, wait, go back and re-enter synchronization for DHCP, and re-enter DHCP for one interface at a time while waiting for sync between all changes. Wait a bit more (or force synchronization) and voila, it works again. Something seems to happen (like a corrupt conf or similar) when syncing. Every now and then this problem recurs in our firewalls and we have to go through this procedure to get it working again.

Often when the problem occurs the CPU usage goes up very high, and you would find something in the DHCP log file saying something like dhcp crashed. If you check services (Status -> Services) you would also see that DHCP is not running (most of the time; on a few occasions DHCP keeps running but does not actually work, i.e. a client cannot receive an IP address).

If you decide to do a reinstall I would strongly suggest you consider using the 4GB nanobsd version; it could possibly cause less wear on your SSD disk(s). I think you have a RAID1-capable controller on all (most?) of the 525 servers from Supermicro, so you could probably do a mirrored install if you want to improve the lifetime and MTBF of the system a little more.

cheers,
E

MrVining · Dec 10, 2012, 7:35 PM

Dec 10 13:28:37 dhcpd: write_lease: unable to write lease 192.168.1.216
Dec 10 13:28:37 dhcpd: DHCPREQUEST for 192.168.1.216 (192.168.1.254) from x (android-x) via em1: database update failed
Dec 10 13:28:38 dhcpd: Wrote 15 leases to leases file.
Dec 10 13:28:38 dhcpd: write_lease: unable to write lease 192.168.1.215
Dec 10 13:28:38 dhcpd: DHCPREQUEST for 192.168.1.215 (192.168.1.254) from x (Apple-TV) via em1: database update failed
Dec 10 13:28:38 dhcpd: DHCPDISCOVER from x (storage01) via em1
Dec 10 13:28:38 dhcpd: DHCPOFFER on 192.168.1.208 to x (storage01) via em1
Dec 10 13:28:39 dhcpd: Wrote 15 leases to leases file.
Dec 10 13:28:39 dhcpd: write_lease: unable to write lease 192.168.1.215
Dec 10 13:28:39 dhcpd: DHCPREQUEST for 192.168.1.215 (192.168.1.254) from x (Apple-TV) via em1: database update failed
Dec 10 13:28:40 dhcpd: Wrote 15 leases to leases file.
Dec 10 13:28:40 dhcpd: write_lease: unable to write lease 192.168.1.208
Dec 10 13:28:40 dhcpd: DHCPREQUEST for 192.168.1.208 (192.168.1.254) from x (storage01) via em1: database update failed
Dec 10 13:28:41 dhcpd: Wrote 15 leases to leases file.
Dec 10 13:28:41 dhcpd: write_lease: unable to write lease 192.168.1.215
Dec 10 13:28:41 dhcpd: DHCPREQUEST for 192.168.1.215 (192.168.1.254) from x (Apple-TV) via em1: database update failed
Dec 10 13:28:43 dhcpd: DHCPDISCOVER from x (WAP01) via em1

wallabybob · Dec 10, 2012, 8:50 PM

My guess is that DHCP is not issuing new leases because it can't write lease files because the disk is full. Make some room. I have been running pfSense for over 3 years on a 1GB DOM, but I don't run squid. I suggest you take a detailed look at your squid configuration.
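
To confirm it from the log, a rough sketch like this counts the failed lease writes per address. The function name `lease_write_failures` is hypothetical, and the sample lines are copied from the log posted earlier in this thread; the only assumption about the format is that the address is the last field of each `write_lease` line.

```sh
#!/bin/sh
# Hypothetical helper: summarise lease-write failures from dhcpd log text
# on stdin. Prints "count address" per affected lease, most frequent first.
lease_write_failures() {
    grep 'write_lease: unable to write lease' |
        awk '{ print $NF }' | sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Sample input: a few lines taken from the log in this thread.
lease_write_failures <<'EOF'
Dec 10 13:28:37 dhcpd: write_lease: unable to write lease 192.168.1.216
Dec 10 13:28:38 dhcpd: Wrote 15 leases to leases file.
Dec 10 13:28:38 dhcpd: write_lease: unable to write lease 192.168.1.215
Dec 10 13:28:39 dhcpd: write_lease: unable to write lease 192.168.1.215
Dec 10 13:28:40 dhcpd: write_lease: unable to write lease 192.168.1.208
EOF
```

On a live box you would pipe the real DHCP log into it instead of the sample. On 2.0.x I believe the logs are circular files read with `clog /var/log/dhcpd.log`, but treat that as an assumption about your install. If the count keeps climbing while `df -h` shows the filesystem at 100%, the full disk is the likely cause.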

MrVining · Dec 11, 2012, 6:47 PM

I quit running squid after I reset to defaults, but by that time the disk was full. I ended up reinstalling because I was already at near zero configuration.

I am 100% sure that squid was the culprit, or my configuration of squid to be exact. I'm guessing I added a digit when I was setting up the amount of disk space it could use for caching.
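
For anyone wanting to sanity-check the same setting, here is a minimal sketch of the arithmetic. The helper names `cache_dir_mb` and `cache_fits`, the 80% margin, and the 3000/30000 MB figures are all hypothetical (I don't know what value was actually entered); the only thing taken from squid itself is that the fourth field of a `cache_dir` line is the cache size in megabytes.

```sh
#!/bin/sh
# Print the size in MB of each cache_dir line in squid.conf-style text on stdin.
# Squid's syntax is: cache_dir <type> <directory> <Mbytes> <L1> <L2>
cache_dir_mb() {
    awk '$1 == "cache_dir" { print $4 }'
}

# Succeed if a cache of cache_mb MB stays within max_pct percent of a
# filesystem of disk_mb MB. The 80 percent default is an arbitrary margin.
cache_fits() {
    cache_mb="$1"
    disk_mb="$2"
    max_pct="${3:-80}"
    [ $((cache_mb * 100)) -le $((disk_mb * max_pct)) ]
}

# Hypothetical values: a 30 GB disk (about 30000 MB), with 3000 MB
# intended for the cache and 30000 MB typed by mistake.
for size in 3000 30000; do
    if cache_fits "$size" 30000; then
        echo "$size MB cache: fits"
    else
        echo "$size MB cache: too large for this disk"
    fi
done
```

One extra digit is enough to take the cache from a tenth of the disk to all of it, and squid's own logs and index need room on top of that, which is why the sketch leaves a margin.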