Azure Scale Sets – A HUGE boost for Zerto 7 RTOs

Zerto 7 was officially released on April 16th, 2019, and most of the headlines were about the new integrated backup technologies. However, there is another story that hasn’t made as many headlines but is equally awesome!

Zerto has been able to replicate and recover VMs into Azure for several years now, and to do that we use a Zerto Cloud Appliance (ZCA). The ZCA combines Zerto Virtual Manager (ZVM, the manager) and Virtual Replication Appliance (VRA, the worker) technology into a single Windows appliance. However, as everyone knows, there is only so much a single VM can do.

The Zerto development team went to work and created something really awesome (again!). Instead of building out a massive infrastructure of always-on appliances, they decided to leverage cloud-native scale-out technology.

Explaining it to my kids

My kids love McDonald’s… why, I don’t know… anyhow. When you walk in, you get in line to order. Normally the restaurant has one person at the counter taking orders. But if the line gets too long, hopefully they will add more order takers to the counter.

That lets them move people through the line faster and, hopefully, get more work done in the same amount of time.

McDonald’s doesn’t want to hire five order takers and keep them all at the counter all day; it would cost too much. Likewise, if they hire only one, people would leave during the lunch rush because getting lunch would take too long. This is the simplest way to explain scale sets that I can think of.

What are Azure Scale Sets?

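Azure Virtual Machine Scale Sets let you run a group of identical VMs and can automatically add or remove instances as demand changes. In Zerto 7.0 this scale-out technology is available only for Azure, so we use native Azure Scale Sets with a Linux OS image.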

But what about the line of people? Well, to put our new scale set to work, we have also added a work queue to the mix. It holds all the work that needs to be done during a failover (or migration).
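To picture the “line of people”, here is a minimal Python sketch that breaks a failover into individual work items and queues them. The granularity (one item per VM disk) and the names `WorkItem` and `build_failover_queue` are hypothetical; the post does not describe how Zerto actually splits up a failover.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    # Hypothetical unit of work: recover one disk of one protected VM.
    vm_name: str
    disk_index: int


def build_failover_queue(vms: dict[str, int]) -> deque[WorkItem]:
    """Queue one work item per disk for every VM being failed over.

    `vms` maps a VM name to its number of disks (illustrative input only).
    """
    queue: deque[WorkItem] = deque()
    for vm_name, disk_count in vms.items():
        for disk_index in range(disk_count):
            queue.append(WorkItem(vm_name, disk_index))
    return queue


if __name__ == "__main__":
    q = build_failover_queue({"web01": 2, "db01": 3})
    print(len(q))  # 5 work items waiting for workers
```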

When the queue is empty, the scale set has only a single worker instance. The image below shows a scale set with nothing to do.

Scale Set with an empty work queue.

When the queue starts to fill up, we start adding instances. By default, the scale set will grow to as many as 41 worker instances!

Scale Set … scaling! (This is what it looks like about 2-3 minutes after you click Failover.)
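A scale-out rule like this can be sketched as a function from queue depth to a desired instance count. The floor of 1 and the ceiling of 41 come from the post; the `items_per_worker` ratio is an assumption for illustration, and this is not the actual autoscale configuration Zerto applies in Azure.

```python
import math

MIN_WORKERS = 1   # an idle scale set keeps a single worker (from the post)
MAX_WORKERS = 41  # default upper bound mentioned in the post


def desired_workers(queue_depth: int, items_per_worker: int = 4) -> int:
    """Return how many worker instances to run for a given queue depth.

    `items_per_worker` is a hypothetical tuning knob: roughly how many
    queued items one worker should be responsible for at a time.
    """
    if queue_depth <= 0:
        return MIN_WORKERS
    wanted = math.ceil(queue_depth / items_per_worker)
    # Clamp between the idle floor and the default ceiling.
    return max(MIN_WORKERS, min(MAX_WORKERS, wanted))
```

With these assumed numbers, an empty queue keeps one worker, 10 queued items ask for 3 workers, and 1,000 items hit the 41-instance cap. This is the McDonald’s manager deciding how many order takers to put at the counter.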

Once the instances have scaled out, they ask the queue for work and then process it. When they finish, they go back to the queue and ask for more work, repeating the process until there is no more work to do.
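The pull-based loop each worker runs can be sketched in Python, with threads standing in for scale set instances. This is only an analogy under stated assumptions: the real workers are separate Linux VMs pulling from a queue the post does not name, and `worker` and `run_workers` are hypothetical names.

```python
import queue
import threading


def worker(work: "queue.Queue[str]", done: list[str], lock: threading.Lock) -> None:
    """Ask the queue for work, process it, and repeat until the queue is empty."""
    while True:
        try:
            item = work.get_nowait()  # ask the queue for the next piece of work
        except queue.Empty:
            return  # nothing left: this worker is finished
        # Placeholder for the real recovery work on `item` (hypothetical).
        with lock:
            done.append(item)
        work.task_done()


def run_workers(items: list[str], worker_count: int) -> list[str]:
    """Fill a queue with `items` and let `worker_count` workers drain it."""
    work: "queue.Queue[str]" = queue.Queue()
    for item in items:
        work.put(item)
    done: list[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(work, done, lock))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Because each worker pulls its own next item, faster workers naturally handle more items than slower ones, and no central scheduler has to hand out assignments.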

Once there is no more work, it’s time to shut down the extra instances. The scale set manages all of this automatically for Zerto.

Scale Set deleting unneeded workers after the failover has completed.
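The scale-in decision can be approximated by a rule like the hypothetical one below. It is not Azure’s autoscale logic or API, just a sketch of “shut down the extras once the queue is drained”; which instances get removed here (all but the first) is an arbitrary choice for illustration.

```python
def instances_to_remove(
    instance_ids: list[str], queue_depth: int, in_flight: int, keep: int = 1
) -> list[str]:
    """Pick the extra instances to delete once all work is finished.

    Scale-in only happens when nothing is queued and nothing is still being
    processed; `keep` instances (one by default) stay up for the next failover.
    """
    if queue_depth > 0 or in_flight > 0:
        return []  # still working: do not shrink yet
    return instance_ids[keep:]
```

Checking `in_flight` as well as the queue depth matters: an empty queue does not mean the last workers have finished the items they already pulled.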

This can reduce Zerto RTOs by a lot! (Very scientific, I know!… think future post.) Another bonus is that each instance has only 1 vCPU and minimal RAM by default, so running a single instance when you aren’t doing failovers is super cheap.
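Until real measurements are in, a back-of-the-envelope model shows why more workers shrink RTO. The inputs below (200 work items at 2 minutes each, 3 minutes to scale out) are made-up illustrative numbers, not Zerto benchmarks, and the model ignores storage, network, and API throttling limits.

```python
import math


def estimated_rto_minutes(
    work_items: int,
    minutes_per_item: float,
    workers: int,
    scale_out_minutes: float = 0.0,
) -> float:
    """Rough wall-clock estimate when identical items are spread across workers.

    Assumes every item takes the same time and workers never wait on each
    other; `scale_out_minutes` approximates the time to add instances.
    """
    rounds = math.ceil(work_items / workers)  # batches each worker must get through
    return scale_out_minutes + rounds * minutes_per_item


if __name__ == "__main__":
    print(estimated_rto_minutes(200, 2, 1))                        # 400 (one worker)
    print(estimated_rto_minutes(200, 2, 41, scale_out_minutes=3))  # 13 (41 workers)
```

Even after paying a few minutes to scale out, spreading the work across 41 workers turns hours of serial processing into minutes in this simplified model.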

We’re not done either!

Rob Strechay has shared the Zerto roadmap on numerous occasions, and it has included something to the effect of “cloud scale” or “cloud scale-out.” This technology is the first iteration of that work, and there is still a LOT more to do. In coming releases, customers can expect us to apply this technology to other parts of the product and to bring scale-out technology to our AWS platform as well.

Stay tuned: I’ll be testing Zerto 6.5 and Zerto 7.0 head-to-head to see how much RTO has improved under real-world workloads!
