Exagrid Notes / Support

My Experience with ExaGrid Support

Let me start out by saying that the support and customer service from ExaGrid is top notch. I get to work with a lot of different vendors at my job, and I must say that when I have to call one I usually plan for a crappy day. However, I don’t have that feeling when I have to call ExaGrid, because they know their shit and don’t try to transfer you 100 times before it’s fixed. Actually, I should say when they call me, because normally they know something is wrong and are on top of it before I am.

Anyhow, now that I have a few weeks of backups out there with Veeam and an ExaGrid box, I have started to run into issues as described in Part 4 of my ExaGrid post. After talking with Tom from ExaGrid, I have to say that I have learned a lot, not only about how their product works but also about why my backups started to fail. Tom did a great job of explaining how files are presented back to a backup server and what back-end work the ExaGrid is doing… because of this, it wasn’t hard to put my finger on why the backup job failed. When a vendor goes the extra mile to help me understand why there might be issues… and doesn’t just fix them and stay 100% reactive… then they are pretty damn awesome in my book.

Synthetic full backups and ExaGrid

One of the best features that Veeam offers is the ability to pull only the blocks of data that have changed from a VMware (and now Hyper-V) datastore. It then takes those blocks, combines them with a full backup (VBK file), and creates a new full backup. They call this a synthetic full backup.

At one customer I was having two problems:

  1. Synthetic full backups were taking 6-10x longer than they did on a non-deduplicated backup repository.
  2. Synthetic full backup jobs were failing with errors such as “failed to write to …” with the destination being on the ExaGrid file share.

Today, with the help of Tom at ExaGrid, I learned why these things are happening. So if you are already a customer and you are running into errors with synthetic fulls, keep reading, or if you’re a potential customer, keep this info in mind when planning out your Veeam jobs.

First let’s draw out how the ExaGrid is designed to work:

[Diagram: how the ExaGrid is designed to work]

Basically what the ExaGrid tries to do is store the last backup in a fully hydrated form. This allows you to do file restores, instant recovery and all other stuff from the latest backup without waiting for the appliance to rehydrate the files. (This is the normal operation provided the unit is sized properly and space permits the files to remain hydrated)

However… if the unit is undersized, or you just have weeks and weeks of retention on it, and a full week of backups no longer fits in the landing pad, then things get a little hairy. First, synthetic full backups start taking a really long time… then, if things grow a little bit larger… they start to fail. This all comes back to the size of the box. While the landing pad is an elastic area and will dynamically expand if needed, once the “cold storage” area gets full the landing pad cannot expand.

Why does it need to expand, you ask?

Well, if a file is requested by the Veeam server, the ExaGrid finds the file and checks to see if it’s in the landing pad (i.e. hydrated)… if it’s not, it puts all the pieces back together and re-hydrates the file. This puts the file into the landing pad, which takes up space. Then the Veeam server can read the file (note that entire files do not always need to be re-hydrated… the ExaGrid is smart enough to do just what is needed), but since deduplication is done after IO is stopped on the share, the re-hydrated file will stay in the landing pad until a predetermined amount of time passes once IO stops.

So if you have a 1TB ExaGrid and your full Veeam backup is about 484GB on the disk, the landing pad is now 1TB – 484GB… this leaves approximately 516GB of space in the landing pad. Now let’s say that each daily incremental is 50GB, and you do an incremental Monday-Friday (synthetic full on Saturday… which is the default). Five incrementals at 50GB each take up another 250GB, so now you are left with about 266GB of free space in the landing pad… this of course is assuming worst case, where all files for the previous week must be re-hydrated.

(Side note… the 1st problem I listed should have an obvious explanation now. It takes a lot of CPU cycles to produce a re-hydrated file… so if you have to re-hydrate 500+GB you can expect to wait a while)

So now it’s Saturday and Veeam starts a synthetic full backup… the ExaGrid has re-hydrated all the files Veeam has asked for and we have 266GB free in the landing pad for the 484GB synthetic full file. By now I think you see the problem… we need more space than the 1TB that is advertised by the ExaGrid.
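To make that arithmetic concrete, here is a small Python sketch using the same numbers from my example (the 1TB unit treated as 1000GB, a 484GB full, five 50GB incrementals). The function and variable names are made up for illustration, and it assumes the new synthetic full is roughly the same size as the old full; this is just back-of-the-napkin math, not anything ExaGrid or Veeam actually runs.

```python
def landing_pad_free_gb(capacity_gb, full_gb, incremental_gb, incrementals):
    """Worst case: the full and every incremental sit hydrated in the landing pad."""
    return capacity_gb - full_gb - incremental_gb * incrementals


capacity_gb = 1000    # 1TB unit, treated as 1000GB
full_gb = 484         # size of the full backup on disk
incremental_gb = 50   # each daily incremental
incrementals = 5      # Monday through Friday

free_gb = landing_pad_free_gb(capacity_gb, full_gb, incremental_gb, incrementals)

# Assumption: the new synthetic full is about as big as the existing full.
shortfall_gb = max(0, full_gb - free_gb)

print(f"Free in landing pad before Saturday: {free_gb}GB")
print(f"Extra space needed for the synthetic full: {shortfall_gb}GB")
```

With these numbers the landing pad has 266GB free and comes up 218GB short of what the synthetic full needs.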

At this point two things could happen:

1.) If free space is available: Free space on the ExaGrid is allocated to the landing pad dynamically and brings the landing pad up to whatever size is needed by the backup files… your job completes fine. Then later the previous week’s full backup, as well as the synthetics, are deduplicated and put back into “cold storage”.

[Diagram: free space is available and the landing pad expands]

2.) If free space is not available: The ExaGrid cannot expand the landing pad to accommodate the new synthetic full backup and therefore tells Veeam that writes to the shared path have failed. Then Veeam gets pissed and your job fails to run.

[Diagram: free space is not available and the landing pad cannot expand]
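Those two outcomes boil down to one comparison. Here is a rough Python sketch of that decision; the function name, its parameters, and the 300GB and 100GB free-space figures are hypothetical and only model the behavior as I described it above, not how the appliance is actually implemented.

```python
def synthetic_full_outcome(landing_pad_free_gb, synthetic_full_gb, unit_free_gb):
    """Toy model: can the landing pad hold the new synthetic full?"""
    shortfall_gb = max(0, synthetic_full_gb - landing_pad_free_gb)
    if shortfall_gb == 0:
        return "fits in the landing pad as-is"
    if unit_free_gb >= shortfall_gb:
        # Outcome 1: free space elsewhere on the unit is handed to the landing pad.
        return "landing pad expands, job completes"
    # Outcome 2: nothing left to hand over, so the write to the share fails.
    return "landing pad cannot expand, write fails"


# 266GB free and a 484GB synthetic full come from my example;
# the third argument is a made-up amount of free space left on the unit.
print(synthetic_full_outcome(266, 484, 300))
print(synthetic_full_outcome(266, 484, 100))
```

The first call lands in outcome 1 and the second in outcome 2, which is the one that makes your Veeam job fail.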

 

How can I fix this?

Well, you have two options. The first is simple… buy another ExaGrid (let them know I sent you… maybe I’ll get commission LOL). The second way is to stop using synthetic full backups inside of Veeam. Instead, tell Veeam to “periodically” do full backups and not build them synthetically. This will put more of a load on your SAN without a doubt, because Veeam will now transfer ALL BLOCKS OF DATA each time a full backup runs. So if your SAN is already overworked, or if your backup window is too short to transfer all the blocks, then you’re probably better off just buying an additional unit.

 

Before posting this article I let the fellas over at ExaGrid proofread it, since I’m still fairly new to the technology, and they had these points:

1. Consider using the term “repository space” in place of “cold storage”.  That is what you’ll see in our GUI/screen shots.

2. When Veeam deletes no-longer-needed save points, the ExaGrid gets to work removing that un-needed data from the repository as quickly as possible – no need to wait for another backup, etc.  We know it’s been deleted, so we just purge it from the repository decently and in an optimal order.

3. The sizing calculator we use for Veeam customers is exactly the same as Veeam uses – 1TB of provisioned VM space requires 1TB of disk storage which would be an EX1000.  So “potential customers” should not run into this same situation.

Thanks for the tips guys!


4 Responses to "Exagrid Notes / Support"

  1. Hi Justin – have to agree with you on your ExaGrid support comments. I have spoken to 3-4 different folks there & they ALL seem to know everything about a) their product and b) my configurations.
    Robbie
    ps…great blog BTW!!
