One of the most misunderstood and misused things that I run into is snapshots. Regardless of whether they are at the SAN level or at the VMware level… most people just don’t understand what they are really doing when they take a snapshot. Most of the time they have also only heard the positives about snapshotting and none of the negatives. So I thought I would do a post on how I like to explain snapshots to people who have never heard of them before, or who are using them but don’t really understand them. My goal is to keep this post relevant to a normal person, so that even if you are chatting with your non-techie mom, dad, or wife, you might actually get them to understand what you do all day.
Disk Drives Explained
So let’s think of a SAN (or a virtual hard drive, if you’re doing this at the VM level) as a table… for right now we will say it’s just a normal table with a flat surface. (A great example would be the coffee table in your living room.) The flat surface represents the disk… and more importantly, it is the part of the disk that stores the data. Let’s assume that the only things we put on our table are magazines and other things we can look at, like pictures. When we want to read something, we simply stand over the table and look down; we can see a magazine on the table and read it (just like reading from a drive). If we want to add a magazine, we just lay it down on the table (like writing to the drive). If we want to toss out a magazine, we pick it up off the table and throw it in the trash (deleting data). Pretty simple so far, right?
Adding a snapshot
OK, so now that we have explained what a disk is in layman’s terms, let’s add a snapshot to the mix. Think of a snapshot as that same table… with all its magazines (data) in place… and now add a piece of glass across the top of it. When we try to read anything on the table, we can look straight through the glass and see what is below… but we can no longer pick up a magazine and toss it out (delete data permanently from the drive). What we can do is change that data (overwrite it) by laying a new magazine on top of the glass, right over the old one. Or we can take some paint and paint the glass where the magazine sits underneath (think of this as deleting a file, but only to the point where it is no longer accessible)… we can’t see it, so technically that data is “deleted”. Either way, we no longer have access to the old magazine, because we overwrote or deleted it, even though it is still there under the glass. When we look down we can still see through the glass to the magazines that have not been overwritten, and we can see the new additions because they are on top of the glass. So when we write to anything, even a file that is under the glass… all those changes land on top of the glass, and the originals underneath stay untouched. This is how you can “roll back” if you don’t like one of those changes.
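If it helps to see the glass as code, here is a toy sketch in Python. It is only a model of the analogy, not how any particular SAN or VMware product implements snapshots, and every name in it (`SnapshottedDisk`, `table`, `glass`, `PAINT`) is made up for illustration.

```python
# Toy model of the table-and-glass analogy. All names here are hypothetical;
# this is not the API or on-disk format of any real SAN or hypervisor.

PAINT = object()  # marker for a spot that has been "painted over" (deleted)


class SnapshottedDisk:
    def __init__(self, table):
        self.table = dict(table)  # magazines under the glass (frozen at snapshot time)
        self.glass = {}           # everything written or painted since the snapshot

    def write(self, spot, magazine):
        # New and changed data always lands on top of the glass.
        self.glass[spot] = magazine

    def delete(self, spot):
        # Painting the glass hides the magazine; it is still on the table below.
        self.glass[spot] = PAINT

    def read(self, spot):
        # Look at the glass first, then through it to the table.
        if spot in self.glass:
            value = self.glass[spot]
            return None if value is PAINT else value
        return self.table.get(spot)

    def roll_back(self):
        # Lift the glass off: every change made since the snapshot is discarded.
        self.glass.clear()


disk = SnapshottedDisk({"left": "Time", "right": "Wired"})
disk.write("left", "Newsweek")  # overwrite: covers "Time" but does not remove it
disk.delete("right")            # paint over "Wired"
print(disk.read("left"))        # Newsweek
print(disk.read("right"))       # None
disk.roll_back()
print(disk.read("left"))        # Time
print(disk.read("right"))       # Wired
```

Notice that `table` is never modified after the snapshot is taken. That is the whole trick, and it is also why the space used by `glass` only ever grows until the snapshot goes away.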
Rolling back to a snapshot
As mentioned in that last sentence, we can roll back the changes that have taken place since a snapshot was taken. Think of it this way: you get your table just the way you want it… and then a 3-year-old comes in and, between some crayons, markers, and paint… covers the glass with a nice Jackson Pollock. To get your magazines (i.e. your data) back, you can’t just grab the Windex and clean the glass… because that would be like formatting the drive. Ahh, but what you can do is lift that piece of glass off the top of the table, and all your stuff is just like it was before disaster struck. The only downside is that if you had a magazine on top of the glass when the artwork started… it is gone. But at least your RPO (recovery point objective) is better than if you had to go look for another copy of all your stuff at the bookstore…
So what is bad about a snapshot?
I’m not a carpenter and I wasn’t that great at geometry, but let’s give it a whirl. Your table is 2 feet by 4 feet… giving us a surface area of 8 square feet. Let’s say those 8 square feet are equal to 8 gigabytes. When you take a snapshot and lay that piece of glass over the table, you still have the 8 sq ft under the glass, and it will be there until you take the glass off, right? Well, now you also have a piece of glass that gives you the potential to fill up another 8 sq ft of surface area. In simple terms, you could potentially DOUBLE your disk space usage if you were to completely rearrange the table (defrag, format, or just change every file on it). Obviously, if you don’t make changes above the glass, then it is invisible to you and doesn’t take up space… but if you make changes and then take another snapshot (i.e. stack a second piece of glass on top of the changes sitting on the first piece 🙂), then you have the potential to take up 8GB more… see what the problem could be? If you just keep stacking snapshots, eventually you will run out of space.
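Here is that back-of-the-napkin math as a few lines of Python. It is a simplified worst case that assumes every spot on the table gets changed after every snapshot; the function name `worst_case_gb` is just something I made up for this example, and real systems track changes at a block level, so actual usage is usually lower.

```python
def worst_case_gb(disk_gb, snapshots):
    """Upper bound on space used if the whole disk changes after each snapshot.

    Simplified model: the original table plus one full layer of glass
    per snapshot. Real growth depends on how much data actually changes.
    """
    if snapshots < 0:
        raise ValueError("snapshots must be zero or more")
    return disk_gb * (1 + snapshots)


for count in range(4):
    print(count, "snapshot(s):", worst_case_gb(8, count), "GB")
# 0 snapshot(s): 8 GB
# 1 snapshot(s): 16 GB
# 2 snapshot(s): 24 GB
# 3 snapshot(s): 32 GB
```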
Applying to your daily job
So now let’s say that you’re the system admin at a company with a SAN or some VMs, and before you do any program upgrades you always take a quick snapshot for rollback purposes… great! BUT make sure you delete those snapshots after the upgrade has completed successfully. What you do when you delete a snapshot is basically lift the glass up from the table and (in a grid-like fashion) put whatever is on top of the glass into the same spot below the glass. When you are done moving everything off the glass, you remove it completely, leaving you with all your latest data on the table top. (This process overwrites whatever was in the spots where you put the stuff from above the glass.) Now you are reading and writing directly to the table again, so your disk usage cannot grow any larger than the 8GB (2 x 4 ft) area.
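As a rough sketch, deleting a snapshot looks something like the Python below. This is only the analogy in code, with made-up names (`consolidate`, `PAINT`); it is not what a real SAN or hypervisor literally runs.

```python
# Hypothetical sketch of deleting (consolidating) a snapshot in the
# table-and-glass analogy. Not any vendor's actual merge routine.

PAINT = object()  # marker for a spot that was painted over (deleted) on the glass


def consolidate(table, glass):
    """Move everything on the glass down to the table and return the new table."""
    merged = dict(table)
    for spot, magazine in glass.items():
        if magazine is PAINT:
            merged.pop(spot, None)   # the painted-over magazine is thrown out for real
        else:
            merged[spot] = magazine  # the newer magazine replaces whatever was below
    return merged


table = {"left": "Time", "middle": "Wired", "right": "Popular Mechanics"}
glass = {"left": "Newsweek", "middle": PAINT, "corner": "Make"}
print(consolidate(table, glass))
# {'left': 'Newsweek', 'right': 'Popular Mechanics', 'corner': 'Make'}
```

The loop touches every spot on the glass, so the more you changed while the snapshot existed, the more work the delete has to do.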
Don’t get me wrong, snapshots… when used properly… are not a bad thing. Some of the problems with increased disk usage are eased by features like thin-provisioned disks, but there is always a downside, so just be aware of what the downsides are and try not to use snapshots carelessly. You don’t want to be the customer who calls their SAN vendor or VMware VAR because you’re out of disk space 🙂 I have run into customers who didn’t know what the ill effects were and just assumed that they could keep a bunch of snapshots around like daily backups. The problem came when they ran out of space and we went to condense the snapshots back down into a single disk… a process that can take hours, depending on how much data changed from the time the snap was taken until the present time (or until the next snap).
Nice post – thanks. Well explained. I’ll be back!
Excellent explanation of snapshots. What I wanted to know is whether there is a company that holds the patent to the process, and if it has stock available to the public.
No clue, I’m sure someone does… everything is patented.
Isn’t performance another important thing to be aware of – once you take a snapshot, the system has to figure out what layer of glass (in your analogy) the data is on, when reading. Writes are always written to the ‘top layer’ but data to be read could be on any one of the layers of glass, and the reading system has to figure that out. Does this have an appreciable impact?
It certainly can impact performance. Sometimes you don’t see it, sometimes you do. And the more layers you have, the more impact it has.
Very informative and well explained. Thanks!