This month Exagrid released an update for their backup storage appliances that makes them fully compatible with Veeam. For those who have no idea what Exagrid is, check out www.exagrid.com, but basically an Exagrid is a hardware appliance with several terabytes of raw storage that is presented in different ways via CIFS.
The idea is to have a “landing pad” of a certain size (determined by the amount of data you need to back up, so if you need to back up 1TB of data you can buy the 1TB appliance). After your backup job completes, the data in the landing pad is processed and moved into a “cold storage”-like area. To Veeam the files still look like they are there at their original full size, but what has happened is that the Exagrid has deduplicated and compressed the data in the landing pad and pushed it back to the main storage area. This leaves the landing pad free for the next backup.
So why consider an Exagrid? Well, Veeam alone will do a great job of compressing and deduplicating data inside of a job… but what it cannot do is deduplicate and compress across jobs. So your Monday backup is not deduplicated against your Sunday or Tuesday backups. Also, if you are doing weekly full backups and retaining 30 days of backups, you still need a pretty decent amount of storage:
Backup Storage Requirement = (Full backup size * 4) + (Average Incremental * 26)
So if a full backup is 500GB you will need at a minimum 2TB of disk space to retain 4 weekly full backups, and if your average daily incremental is 50GB then you also need 1300GB for incrementals. So in total, for 30 days of backup retention on a standard file system, you would need 3.3TB of disk space.
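For anyone who wants to plug in their own numbers, the formula above can be sketched in a few lines of Python. The function and parameter names here are made up for illustration; only the formula itself (4 fulls plus 26 incrementals) comes from the post.

```python
def retention_storage_gb(full_gb, incremental_gb, fulls=4, incrementals=26):
    """Raw disk needed for 30 days of retention without cross-backup dedupe.

    Defaults follow the formula in the post: 4 weekly fulls + 26 incrementals.
    """
    return full_gb * fulls + incremental_gb * incrementals


# The example from the post: 500GB fulls, 50GB average daily incremental.
total_gb = retention_storage_gb(500, 50)
print(f"{total_gb}GB ({total_gb / 1000:.1f}TB)")
```

Changing `fulls` and `incrementals` lets you model a different retention policy, though the split between the two is something you would need to work out for your own schedule.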
Here is where the Exagrid magic comes in… the Exagrid will see the blocks from all 4 weekly full backups and deduplicate them… so you might only have to store 600GB of data total for all 4 weekly fulls. Plus it will deduplicate all incrementals and full backups together… so it’s looking at every block of data in a CIFS share, not just what is in a single session. So LOTS of space savings to be had!
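To make the difference concrete, here is a toy sketch of deduplicating within each backup separately versus across all backups at once. This is not how the Exagrid (or Veeam) is actually implemented; the fixed 4-byte block size, the sample data, and all of the function names are assumptions chosen purely to keep the example small.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically tiny, just for illustration


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_block_count(backups):
    """Count distinct blocks (by SHA-256 hash) across the given backups."""
    seen = set()
    for data in backups:
        for block in split_blocks(data):
            seen.add(hashlib.sha256(block).hexdigest())
    return len(seen)


# Two hypothetical full backups that differ only in their last block.
sunday = b"AAAABBBBCCCCDDDD"
monday = b"AAAABBBBCCCCEEEE"

# Deduplicating each backup on its own: nothing is shared between them.
separate = unique_block_count([sunday]) + unique_block_count([monday])

# Deduplicating across both backups: the three common blocks are stored once.
combined = unique_block_count([sunday, monday])

print(f"blocks stored separately: {separate}, combined: {combined}")
```

The more the backups have in common (as repeated weekly fulls usually do), the bigger the gap between the two counts.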
So the bottom line is use what you have more effectively instead of continually buying disk space.
Stay tuned, as I will be explaining how to set up the Exagrid for Veeam backups as well as the best practices for configuring your Veeam backup jobs to work with the Exagrid appliances.
It may depend on how you have your Veeam jobs set up. I have my jobs set up per folder in my ESX environment.
So I don’t get dedupe across different jobs, but backups within the same job are deduped. For example, I have a job with 20 VMs in it for business A, which is backed up every weekend for 6 weeks.
All of those backups are deduped across the 6 weeks. I just don’t get dedupe against the job that has 20 VMs for business B.
I agree Exagrid would dedupe across all of the jobs, but it is VERY expensive compared to just buying more disk and using Veeam dedupe.
If I understand correctly, the Exagrid performs global deduplication, meaning all incoming data is processed regardless of the source or destination on the Exagrid appliance, whereas Veeam performs local deduplication, meaning it is per-job.
There are advantages to both methods. Taking Veeam out of the picture for a second, with local dedupe you do not have to send the entire backup job across the LAN or WAN, because the dedupe happens on the client. You can then send the compressed job to a central NAS or SAN, saving bandwidth. This is great when backing up roaming laptops or remote servers over a slow WAN. Of course, you’ll want global deduplication for the servers in your datacenter to reduce the storage footprint.
I would imagine most organizations will need a combination of both options. Great article!
I agree, the Exagrid is definitely not a bottom dollar solution. But where they really shine (and I’ll go through this in a later post 😉) is when you are replicating over the WAN. If I need to replicate backup data offsite and a full backup is 300+ GB, I am going to need a pretty decent connection. But if I have an Exagrid on both ends, it knows about all the blocks, and I’m getting good ratios, I will reduce my WAN pipe costs tremendously, and then we can use that extra cash to offset the cost of the Exagrid.
If you have the gear on-site to play with, I’d love to hear what kind of add’l de-dupe rates you are seeing. I’ve wondered whether the de-dupe that Veeam does within a job compresses the data to a degree that there is no commonality across data sets from multiple jobs (akin to folks who use a software package to compress DB backups, and when they send it to DataDomain, ExaGrid, etc. they see no de-dupe results). I’ve heard mixed recommendations when using Veeam to back up to a dedicated de-dupe appliance. Some people say leave Veeam de-dupe on and others say turn it off, but I haven’t seen any sample numbers on what the impact actually is.
We have Veeam’s dedupe still turned on, but compression is turned off. In Veeam, on my one job, I’m seeing a 73% savings from the Veeam dedupe on full backups and a 1% savings from Veeam dedupe on incrementals. On the compression side, though, there is no savings from inside of Veeam. Then, on top of the savings from Veeam, I am also seeing a 5:1 ratio on the Exagrid, meaning Veeam has sent 830GB of Veeam vbk and vbi files to the Exagrid, but it is only taking up 164.25GB on the Exagrid… I attribute this to Veeam deduping at the block size of your VMFS and the Exagrid deduping at a much smaller block size… The smaller the blocks, the more likely you will have identical blocks.
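As a quick check of the arithmetic, the ratio quoted above can be recomputed from the two figures in the comment (830GB sent, 164.25GB stored). The variable names are just for illustration.

```python
sent_gb = 830.0      # vbk/vbi data Veeam wrote to the Exagrid
stored_gb = 164.25   # space actually consumed on the Exagrid

ratio = sent_gb / stored_gb          # dedupe ratio, expressed as N:1
savings = 1 - stored_gb / sent_gb    # fraction of space saved

print(f"ratio: {ratio:.2f}:1, savings: {savings:.1%}")
```

This comes out to roughly 5:1, or about 80% saved on top of whatever Veeam had already removed before the data reached the appliance.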
Thanks for the info on what you’re seeing. Have you tried turning off the de-dupe within Veeam and seeing if that raises the de-dupe considerably on the Exagrid box? I’m curious if Exagrid’s de-dupe on native un-deduped Veeam data could be greater than what you get if you have Veeam do some de-dupe and then Exagrid tries to de-dupe the de-duped data.
I received two test Exagrids last week for testing such as this, so I will test this out and post my findings. But since the Exagrid best practices doc says to leave it on, I have to think that they already tested this. After all, they want to be able to show off a high dedupe ratio.
Good point, I agree if that’s what Exagrid recommends in the best practices guide, then one would hope that setting will give you the best result. The other advantage of keeping that setting on is it will reduce the volume of data hitting the Landing Zone, hence your Landing Zone does not need to be as big.
With encryption enabled, Exagrid can’t dedupe the backups anyway.
Good to know, Dan! Thanks.