Thursday, 2 April 2009

DHCP automated failover

Today I had one of those better days that I'd like to share with you. There is a nice tool called dhcpcmd that you can get from Microsoft. It was released with NT4 and later with Windows 2000, and it still works on Vista and 2008. The nice thing about it is that it can do something simple called "GetVersion". That might not seem like a really important thing, but let me explain what it can be used for.

There are three basic ways to set up DHCP. The first is two servers with half the scope on each. If one fails, you remove the excluded range and continue to serve the whole IP range from the surviving server. This works but needs manual effort.

The second is to set up a cluster resource for your DHCP. This works quite well, but your DHCP Jet database is not cluster aware, so sometimes you need to restart the DHCP Server service to get it working after it fails over. Again, that's manual effort.

The third option is two servers set up, one of them with the DHCP Server service stopped until the first server fails. Again, it takes manual effort to start it.

So far you can start to see a theme, and it is a lot of manual effort. Like all manual effort, you will surely end up doing this failover in the early morning, because that's how it goes in the IT world when something breaks.

Now when I came across DHCPCMD, even just its ability to GetVersion was enough. Let me show you with the first option, where I have the scope on two servers with excluded ranges. I have the following in a script file on one server (it doesn't even have to be one of the nodes), and it is scheduled to run every 5 minutes.

As you'll see, I've put some basic responses in for a failure.

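@echo off
rem Probe each DHCP server; dhcpcmd sets a non-zero errorlevel if it gets no answer.
dhcpcmd 192.168.2.2 GetVersion
if errorlevel 1 goto Server1_Failed
dhcpcmd 192.168.2.3 GetVersion
if errorlevel 1 goto Server2_Failed

rem Both servers answered: put the split-scope exclusions (back) in place.
rem If a range is already excluded, netsh may just report an error.
netsh dhcp server \\winserver-2 scope 192.168.2.0 add excluderange 192.168.2.10 192.168.2.128
netsh dhcp server \\winserver-1 scope 192.168.2.0 add excluderange 192.168.2.128 192.168.2.254
goto All_Done

:Server1_Failed
rem --- alert (net send is not available on Vista/2008 and later)
net send Administrator "Warning: DHCP server 1 failure, failing over to second server"
rem Let server 2 hand out the whole range.
netsh dhcp server \\winserver-2 scope 192.168.2.0 delete excluderange 192.168.2.10 192.168.2.128
goto All_Done

:Server2_Failed
rem --- alert
net send Administrator "Warning: DHCP server 2 failure, failing over to first server"
rem Let server 1 hand out the whole range.
netsh dhcp server \\winserver-1 scope 192.168.2.0 delete excluderange 192.168.2.128 192.168.2.254
goto All_Done

:All_Done
rem /b ends the script without closing a console it was started from.
exit /b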
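
For anyone who would rather keep this logic somewhere easier to test than a batch file, here is a rough Python sketch of the same decision. It is only my reading of the script above, not a replacement for it. The function names (is_up, netsh, plan) are made up for the example, and it assumes dhcpcmd and netsh are on the PATH of the machine that runs it. Like the batch file, it treats a dhcpcmd that cannot be started the same as a server that does not answer.

```python
import subprocess

SCOPE = "192.168.2.0"

# Exclusion each server holds while its partner is healthy,
# copied from the batch file above.
EXCLUSIONS = {
    r"\\winserver-1": ("192.168.2.128", "192.168.2.254"),
    r"\\winserver-2": ("192.168.2.10", "192.168.2.128"),
}


def is_up(ip, run=subprocess.call):
    """True if `dhcpcmd <ip> GetVersion` exits with code 0."""
    try:
        return run(["dhcpcmd", ip, "GetVersion"]) == 0
    except OSError:
        # dhcpcmd itself could not be started; count that as a failure,
        # which is also what `if errorlevel 1` does in the batch file.
        return False


def netsh(server, action):
    """Build the netsh argv that adds or deletes a server's exclusion."""
    start, end = EXCLUSIONS[server]
    return ["netsh", "dhcp", "server", server, "scope", SCOPE,
            action, "excluderange", start, end]


def plan(server1_up, server2_up):
    """Return the netsh commands the batch file would run, in order."""
    if not server1_up:
        # Server 1 is down: let server 2 hand out the whole range.
        return [netsh(r"\\winserver-2", "delete")]
    if not server2_up:
        # Server 2 is down: let server 1 hand out the whole range.
        return [netsh(r"\\winserver-1", "delete")]
    # Both healthy: put the split back, server 2 first as in the script.
    return [netsh(r"\\winserver-2", "add"), netsh(r"\\winserver-1", "add")]
```

To act on it you would call plan(is_up("192.168.2.2"), is_up("192.168.2.3")) and hand each returned list to subprocess.call. Keeping the decision separate from the commands means you can check what it would do without touching a live DHCP server.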


The second and third options are almost the same: you want to start a service, restart one, or both. Here is an example.

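@echo off
rem Probe the active DHCP server; a non-zero errorlevel means it did not answer.
dhcpcmd 192.168.2.2 GetVersion
if errorlevel 1 goto Server1_Failed
goto All_Done

:Server1_Failed
rem --- alert (net send is not available on Vista/2008 and later)
net send Administrator "Warning: DHCP server 1 failure, failing over to second server"
rem Stop the service on server 1 if it is still reachable, then start the standby.
psexec \\winserver-1 net stop dhcpserver
psexec \\winserver-2 net start dhcpserver
goto All_Done

:All_Done
rem /b ends the script without closing a console it was started from.
exit /b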
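
One thing worth noticing in that script is that the stop on winserver-1 may well fail when the whole box is down, and the start on winserver-2 still has to run. Here is a small Python sketch of that same sequence, under the assumption that dhcpcmd and psexec are on the PATH. The function names are hypothetical, and the run parameter exists only so the steps can be tried out without real servers.

```python
import subprocess

# The two steps from the batch file above, in the same order.
FAILOVER_STEPS = [
    ["psexec", r"\\winserver-1", "net", "stop", "dhcpserver"],
    ["psexec", r"\\winserver-2", "net", "start", "dhcpserver"],
]


def fail_over(run=subprocess.call):
    """Run every failover step, even if an earlier one fails.

    Returns a list of (argv, exit_code); exit_code is None when the
    command could not be started at all.
    """
    results = []
    for argv in FAILOVER_STEPS:
        try:
            code = run(argv)
        except OSError:
            code = None
        results.append((argv, code))
    return results


def check_and_fail_over(primary_ip="192.168.2.2", run=subprocess.call):
    """Probe the primary with GetVersion; fail over if it does not answer."""
    try:
        healthy = run(["dhcpcmd", primary_ip, "GetVersion"]) == 0
    except OSError:
        healthy = False
    if healthy:
        return []  # nothing to do
    return fail_over(run)
```

The returned list tells you afterwards which step failed, which is handy to put in the alert message.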


You can set up more complex responses to not being able to get something as simple as version information, and you can do this with almost anything that gives you an output. I have some nice ones for monitoring servers using just simple scripts.
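
The general shape is always the same: run a cheap probe, look at its exit code, and run your responses if it is not zero. A minimal Python sketch of that pattern follows. The check function is my own name for it, and the probe and response commands are whatever you choose to pass in.

```python
import subprocess
import sys


def check(probe, on_failure, run=subprocess.call):
    """Run `probe`; if it exits non-zero or cannot start, run each response.

    `probe` is one argv list, `on_failure` is a list of argv lists.
    Returns True when the probe succeeded.
    """
    try:
        ok = run(probe) == 0
    except OSError:
        ok = False
    if not ok:
        for argv in on_failure:
            run(argv)
    return ok
```

For the DHCP case the probe would be ["dhcpcmd", "192.168.2.2", "GetVersion"] and the responses would be the net send and psexec lines from the scripts above.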

My hope is that after reading this you will think of another three or more services that you can do something similar with. Then you won't have to fix it in the night, and you can wait till morning.
