The most common questions that I get asked when talking about application failover are… “how does the IP change work”, or “how do I update DNS”, or “what about my firewall rules”.
They are all valid questions, but they are normally all from “Server people”, not “Network people”. So my new question back will be… why change the IP when you could just move the whole application including its networking and everything it needs to run as one big unit? In fact it’s pretty easy to do if you work with your network team…
Basically what I want to demonstrate is how to move a web server from one site to another, while maintaining the same IP address on that server as well as the same firewall rules, and the same DNS settings. I will accomplish this via a private subnet that is portable between sites, and to help make it portable I will leverage some dynamic routing protocols along the way.
Now for the fine print… I am not advocating you use pfSense or RIP or OSPF or anything else that I used here to do this in the real world. This test was done in my lab, in a controlled environment with a non production workload. So basically kids, don’t try this at home unless you consult with your network team and put together a solid plan. Remember you could lose your job, so you better make sure you know what you’re doing… and oh yeah I’m not responsible if you break your stuff.
How it looks
So to accomplish this we really only need to add a couple of things to a typical application:
- we need a virtual router/firewall that can move along with the other VMs; this router will sit in front of our web application.
- we need a private VLAN where our subnet for the application will be “hidden”, so our VMs will live on this new hidden subnet behind the router.
- we need to know what routing protocols are used on our physical network so that the virtual router can work with the physical world. Most corporate networks will have something in place like EIGRP, OSPF, RIP v2, etc.
At “Site A” I created a new VLAN (#205) for my private application network and presented it to my VMware cluster. There is no routing being done by my physical Cisco gear to access this VLAN; instead I created a pfSense virtual firewall that has two network cards. Its WAN interface is in my management VLAN (10.10.1.0/24) and its LAN interface is in the private VLAN #205 (192.168.199.0/24), so it acts as the default gateway for the 192.168.199.0/24 subnet. My web server will then live in this 192.168.199.0/24 subnet behind the router.
On the pfSense box I have also installed the RIP routing protocol package and have enabled RIP v2 (I could have used OSPF but RIP was faster to configure on pfSense). On my Site A Cisco hardware I have also enabled RIP v2 on my management network (10.10.1.0/24) and redistributed those routes into the OSPF routing protocol that routes between Site A and Site B. (If you don’t know what that means it’s OK… your network guy will.)
So at this point I should have a routing table on my routers that shows the 192.168.199.0/24 subnet advertised, and I can ping in and out of the subnet from the rest of my network… both Site A and Site B. When I ping 192.168.199.10 (my web server) the packet hits my Cisco router and is then forwarded over to the pfSense router, which delivers it to the proper web server VM.
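If the forwarding decision feels abstract, here is a minimal Python sketch of what the Cisco router is doing when it picks a next hop: it chooses the most specific route that covers the destination (longest-prefix match). The 192.168.199.0/24 route and the 10.10.1.89 next hop come from my lab (that is the WAN address pfSense pulls at Site A); the default route entry and the `next_hop` helper are made up for illustration and are not taken from any real router config.

```python
import ipaddress

# Toy view of the Site A router's table once RIP has converged.
# 192.168.199.0/24 via 10.10.1.89 is from the post; the default
# route entry is a hypothetical placeholder.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway (hypothetical)"),
    (ipaddress.ip_network("10.10.1.0/24"), "directly connected"),
    (ipaddress.ip_network("192.168.199.0/24"), "10.10.1.89"),
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route always matches, so matches is never empty here.
    _, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


print(next_hop("192.168.199.10"))  # the web server, reached via pfSense
print(next_hop("10.10.1.50"))      # a host on the management network
```

The point is that nothing about the web server itself appears in this table. The router only needs to know which neighbor currently claims the /24.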
So at this point we have a working web server at 192.168.199.10 and are able to use it from anywhere on our corporate network. Next up we need to prepare Site B so that if we need to failover or move this web server to that site it is ready.
In preparation for the web application being moved or failed over to the secondary site we need to create another private VLAN at Site B… I used VLAN 11 this time and then presented it to VMware. Remember this is a non-routed VLAN, so it is completely isolated from traffic until we do a failover of our web application. Everything else we need for our web app to live here is already in place as long as we have a management network. (Which I do… in this case it’s 192.168.10.0/24, and Zerto and VMware are already configured as well.)
I then set up Zerto to replicate my web application to Site B. The only thing that I did differently was to include my App2-Firewall VM as well as the normal App2-WebApp VM. (So instead of 1 protected VM I have 2.)
In the NIC section of the Zerto VPG creation wizard I can select my private VLAN (App2-Private) for App2-WebApp and the “LAN” interface of the firewall. The WAN interface of the firewall will be attached to the Site B management network, which is just VM Network. That management network is 192.168.10.0/24 by the way. (Also, this screenshot is actually of the VPG after it was failed over… no big deal, just as long as you see that my private VLANs map to each other and my management network VLANs map to each other.)
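One sanity check worth doing before you go any further: the portable subnet must not overlap with anything that is already routed at either site, otherwise the advertised /24 will fight with an existing route. Here is a small Python sketch of that check using the subnets from my lab. The `overlapping_sites` helper is just something I made up for illustration; in the real world your network team’s IPAM tool is the source of truth.

```python
import ipaddress

# Subnets taken from the lab described in this post.
PORTABLE = ipaddress.ip_network("192.168.199.0/24")  # private app subnet
SITE_NETWORKS = {
    "Site A management": ipaddress.ip_network("10.10.1.0/24"),
    "Site B management": ipaddress.ip_network("192.168.10.0/24"),
}


def overlapping_sites(portable, site_networks):
    """Return the names of site networks that overlap the portable subnet."""
    return [name for name, net in site_networks.items() if portable.overlaps(net)]


conflicts = overlapping_sites(PORTABLE, SITE_NETWORKS)
print(conflicts or "no overlap: the subnet can be advertised from either site")
```

If I had picked something greedy like 192.168.0.0/16 for the application, the same function would flag the Site B management network as a conflict.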
Notice that I am not having Zerto change any of the IP addresses. This is because the pfSense Firewall pulls DHCP on its WAN interface, and because of the dynamic routing protocol I can leave the LAN interface alone (which means I can then also leave the Web server alone too).
I didn’t narrate this video because I have a cold and you wouldn’t want to listen to me cough the whole time, but let me explain what I’m going to walk through in the video.
- First off I show you that while my WebApp is at Site A my Cisco gear has a route to the 192.168.199.0/24 subnet via the pfSense router, which pulled a DHCP WAN address of 10.10.1.89 on my Site A management network.
- A Zerto move operation is initiated to move both App2-Firewall (the pfSense router) as well as App2-WebApp (the Apache web server) to Site B. There is no fast forwarding in this video so it’s all real-time.
- You see the VMs shut down on the left of the video (while they are at Site A), and then they power up in the right vSphere interface (which is Site B).
- I then show you that the Cisco gear has removed the route to 192.168.199.0/24 and I am unable to ping the router or the web app from anywhere on my network. This is normal and important so that all the routers realize that something has changed.
- The VMs boot up at Site B and we monitor their startup progress. Once pfSense is booted and pulls its DHCP WAN address from Site B’s management network I do a ‘show ip route’ on the Cisco gear at Site A to show that indeed a new route has been put into the routing table. This time the route is via 192.168.10.106 (a Site B management IP).
- I log in to the web server via phpMyAdmin to demonstrate that the IP is the same as before, and everything is working as if nothing happened (because to the web server nothing but a reboot happened… it has no idea it has moved to a new site).
- Lastly I show that Zerto is delta syncing the data back to Site A so that another move operation could take place as soon as it’s done and in sync.
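The three routing states from the walkthrough above (route via Site A, no route, route via Site B) can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how RIP or OSPF are actually implemented: the `RouteTable` class and its `learn`/`withdraw`/`lookup` methods are hypothetical names, and real protocols age routes out with timers rather than an explicit withdraw call. The addresses are the ones from my lab.

```python
import ipaddress


class RouteTable:
    """Toy routing table keyed by prefix; models only learn and withdraw."""

    def __init__(self):
        self.routes = {}

    def learn(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, destination: str):
        """Return the next hop for destination, or None if unreachable."""
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


WEB_SERVER = "192.168.199.10"    # never changes during the move
APP_SUBNET = "192.168.199.0/24"

table = RouteTable()
table.learn(APP_SUBNET, "10.10.1.89")      # pfSense WAN address at Site A
before = table.lookup(WEB_SERVER)

table.withdraw(APP_SUBNET)                 # VMs shut down, route disappears
during = table.lookup(WEB_SERVER)

table.learn(APP_SUBNET, "192.168.10.106")  # pfSense boots at Site B
after = table.lookup(WEB_SERVER)

print(before, during, after)
```

Notice that `WEB_SERVER` is a constant through all three phases. The only thing that changes is the next hop for the subnet, which is the whole trick.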
Here is a video of this process actually working. I encourage you to view this in full screen mode, and in HD as it was recorded in 1080p and will look best at that resolution.
So in conclusion I am not telling you to put a pfSense router in front of your applications, and I’m not telling you to put a single web server in a /24 subnet. Planning will be needed before you do anything like this… “Measure twice, cut once” comes to mind here…
What I am saying though is that SDN is here to stay, so you might as well leverage it. I’m not talking about Cisco ACI or VMware NSX, which are both expensive; you can get virtual ASA firewalls, pfSense routers, and all sorts of other cool stuff that will enable you to plan your applications as if they were going to live somewhere new every day (and most of these are cheap compared to the other SDN players). Using tools like this you can plan for your applications to be mobile and to bring their firewall and routing rules with them. Once you do that, DR and failover are really no big deal, because the actual workload doesn’t change at all… the only “moving part” is a dynamic routing protocol update.
Now of course this does mean that you will need to provision some extra VLANs on your physical network, and will probably need to purchase a few extra Zerto licenses in order to protect those virtual firewalls… but if it makes the process a million times easier it might just be worth it.
Disclaimer – I work for Zerto, however they have no clue that I’m writing about them, and therefore did not have a chance to review this post. Clearly that means they are not responsible for the content or views expressed in this post. Take that for what it’s worth.
IMHO the challenge is not in setting this up for an actual failover, but for the isolated testing you should be doing every so often. You need to have all your NSX “micro-segments” not only exist as routable destinations dynamically updated upstream in a failover, but ALSO exist as a second completely isolated “set” with which you test WHILE production continues to run at the primary site.
An entry/video showing the replicated VMs running in that isolated “bubble” at the DR site at the same time as the “live” systems are running in the primary site would be great.