Unable to delete VNXe Volume or inactive checkpoints/snapshots

NOTE: I recommend getting EMC support on the phone before trying any of the commands below. I take no responsibility if you lose data because you used them without fully knowing what they do. As always, if your job is on the line, don’t be a cowboy: call support.

While helping a customer reconfigure their VNXe systems, we encountered an issue that would not allow us to delete a LUN. We were canceling the VNXe replication between their HQ and DR sites and planned to use Veeam in its place. To make a long story short, we deleted all of the replication jobs on the two SANs, and then at the DR site we proceeded to delete all of the LUNs that had been replication targets, in order to make room for the new VMware datastores. However, one of the old LUNs could not be deleted; the VNXe warned that checkpoint (snapshot) files were still present.

So I moved over to the snapshot area of the LUN, deleted all of the snapshots, and then tried to delete the LUN again. We received the same message. Hmmm…

The customer said that he had seen this error before and that EMC support had been able to resolve it quickly. So we started a Webex and let the technician do his thing. Basically, he said that there were “INACTIVE” checkpoints present for that LUN and that they were causing the issue. He requested that we start SSH and log in to the VNXe as the service account. After logging in, we started the service shell and set our PATH:

svc_service_shell
PATH=$PATH:/nas/bin
export PATH

Next we checked the LUN in question to verify that inactive snapshots were indeed the root cause.

root@spa:/cores/service>fs_ckpt  -l -a
id    ckpt_name                creation_time           inuse fullmark   total_savvol_used  ckpt_usage_on_savvol
96    root_rep_ckpt_94_965516_ 06/01/2012-14:12:24-UTC   y   90%        INACTIVE           N/A
97    root_rep_ckpt_94_965516_ 06/01/2012-14:12:30-UTC   y   90%        INACTIVE           N/A
376   root_rep_ckpt_94_1433097 10/26/2012-14:51:31-UTC   n   90%        INACTIVE           N/A
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.

id    wckpt_name               inuse fullmark total_savvol_used  base  ckpt_usage_on_savvol
*** WARNING *** VNXe service shell activated! *** WARNING ***
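If the listing is long, it can help to save it to a text file and filter it off the array before deciding what to ask support about. Below is a minimal Python sketch of that idea, not anything EMC provides. It assumes the column layout is exactly the one shown in the output above (seven whitespace-separated fields, with INACTIVE in the total_savvol_used column) and that timestamps are MM/DD/YYYY in UTC. The function name inactive_checkpoints is my own.

```python
from datetime import datetime, timezone

# Rows copied from the fs_ckpt listing shown above.
SAMPLE = """\
id    ckpt_name                creation_time           inuse fullmark   total_savvol_used  ckpt_usage_on_savvol
96    root_rep_ckpt_94_965516_ 06/01/2012-14:12:24-UTC   y   90%        INACTIVE           N/A
97    root_rep_ckpt_94_965516_ 06/01/2012-14:12:30-UTC   y   90%        INACTIVE           N/A
376   root_rep_ckpt_94_1433097 10/26/2012-14:51:31-UTC   n   90%        INACTIVE           N/A
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.
"""


def inactive_checkpoints(listing):
    """Return (id, name, creation time) for every row marked INACTIVE.

    Assumes the seven-column layout seen in the sample; header, Info and
    WARNING lines are skipped because they do not start with a numeric id.
    """
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        ckpt_id, name, created, _inuse, _fullmark, savvol_used, _usage = fields
        if savvol_used != "INACTIVE":
            continue
        # Assumed timestamp format: MM/DD/YYYY-HH:MM:SS-UTC
        when = datetime.strptime(created, "%m/%d/%Y-%H:%M:%S-UTC")
        rows.append((int(ckpt_id), name, when.replace(tzinfo=timezone.utc)))
    return rows


if __name__ == "__main__":
    for ckpt_id, name, when in inactive_checkpoints(SAMPLE):
        print(ckpt_id, name, when.date())
```

Because the creation time is parsed into a real datetime, sorting by age or flagging anything older than a given cutoff is a small extension.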

The listing showed three checkpoints (snapshots) left over from the replication process that were stuck. We could tell because all three were marked INACTIVE and two of them were almost six months old. To remove the snapshots we used the following syntax:

root@spa:/cores/service>/nas/sbin/rootnas_fs -d id=96 -o umount=yes -ALLOW_REP_INT_CKPT_OP
id        = 96
name      = root_rep_ckpt_94_965516_1
acl       = 0
in_use    = False
type      = ckpt
worm      = off
volume    =
rw_servers=
ro_servers=
rw_vdms   =
ro_vdms   =
deduplication   = unavailable
*** WARNING *** VNXe service shell activated! *** WARNING ***
root@spa:/cores/service>/nas/sbin/rootnas_fs -d id=97 -o umount=yes -ALLOW_REP_INT_CKPT_OP
id        = 97
name      = root_rep_ckpt_94_965516_2
acl       = 0
in_use    = False
type      = ckpt
worm      = off
volume    =
rw_servers=
ro_servers=
rw_vdms   =
ro_vdms   =
deduplication   = unavailable
*** WARNING *** VNXe service shell activated! *** WARNING ***
root@spa:/cores/service>/nas/sbin/rootnas_fs -d id=376 -o umount=yes -ALLOW_REP_INT_CKPT_OP
id        = 376
name      = root_rep_ckpt_94_1433097_1
acl       = 0
in_use    = False
type      = ckpt
worm      = off
volume    =
rw_servers=
ro_servers=
rw_vdms   =
ro_vdms   =
deduplication   = unavailable
*** WARNING *** VNXe service shell activated! *** WARNING ***
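Since the same command is repeated once per checkpoint id, one way to avoid a typo is to generate the command lines and review them (ideally with support on the phone) before pasting anything into the service shell. The sketch below only builds and prints strings; it never runs anything. The command template is copied verbatim from the session above, and the function name build_delete_commands is my own invention.

```python
# Command template copied from the session above; only the id changes.
TEMPLATE = "/nas/sbin/rootnas_fs -d id={ckpt_id} -o umount=yes -ALLOW_REP_INT_CKPT_OP"


def build_delete_commands(ckpt_ids):
    """Return one command string per checkpoint id, for review only."""
    commands = []
    for ckpt_id in ckpt_ids:
        # Refuse anything that is not a positive integer id.
        if isinstance(ckpt_id, bool) or not isinstance(ckpt_id, int) or ckpt_id <= 0:
            raise ValueError(f"not a valid checkpoint id: {ckpt_id!r}")
        commands.append(TEMPLATE.format(ckpt_id=ckpt_id))
    return commands


if __name__ == "__main__":
    # The three ids that were stuck in this particular case.
    for command in build_delete_commands([96, 97, 376]):
        print(command)
```

Keeping this as a dry run is deliberate: the ids are different on every system, and deleting the wrong checkpoint is exactly the mistake you want a second pair of eyes to catch.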

After removing them, we verified that they were gone by running the fs_ckpt command again:

root@spa:/cores/service>fs_ckpt  -l -a
id    ckpt_name                creation_time           inuse fullmark   total_savvol_used  ckpt_usage_on_savvol
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.

id    wckpt_name               inuse fullmark total_savvol_used  base  ckpt_usage_on_savvol
*** WARNING *** VNXe service shell activated! *** WARNING ***
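With only three checkpoints the empty listing is easy to eyeball, but on a busier system you may want to confirm that the specific ids you removed are really gone. Here is a small Python sketch of that check. It assumes, as in the output above, that every checkpoint row starts with its numeric id; the function name remaining_ids is hypothetical.

```python
# The "after" listing shown above, saved as text.
AFTER = """\
id    ckpt_name                creation_time           inuse fullmark   total_savvol_used  ckpt_usage_on_savvol
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.

id    wckpt_name               inuse fullmark total_savvol_used  base  ckpt_usage_on_savvol
*** WARNING *** VNXe service shell activated! *** WARNING ***
"""


def remaining_ids(listing, deleted_ids):
    """Return the deleted ids that still appear as rows in the listing."""
    present = set()
    for line in listing.splitlines():
        fields = line.split()
        # Checkpoint rows are assumed to begin with a numeric id.
        if fields and fields[0].isdigit():
            present.add(int(fields[0]))
    return sorted(present & set(deleted_ids))


if __name__ == "__main__":
    leftover = remaining_ids(AFTER, [96, 97, 376])
    print("still present:", leftover if leftover else "none")
```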

As you can see, it’s not a complex process if you know what you are doing. If you have this issue, I recommend that you contact EMC technical support so they can verify that your issue is the same as the one described here, and make sure they delete the proper checkpoints and not ones that you may need.

10 Responses to "Unable to delete VNXe Volume or inactive checkpoints/snapshots"

  1. Have you found any good use for VNXe replication? I’m the same as you: I used to use it, and now I use Veeam. I never could make VNXe replication make sense for me.

  2. This is the first encounter I have had with it. Personally I find that the cost of adding the Remote Protection Suite more than outweighs the cost of Veeam… plus there is no dedup or compression on the VNXe replication that I know of.

  3. We purchased it when we bought our pair of VNXe3100s, but I have yet to find a use case for it.

    At first we were using VNXe replication to replicate all our datastores. I created a test datastore to try and fail over, and nothing happened, the VM blocked. I’m thinking it was trying to sync the changes before it failed over, but I couldn’t tell.

    The next thing I tried was to just replicate our Veeam backup datastore. I tried to do some DR tests, but never could access the replicated datastore.

    Also, it seems replication is very sensitive. I’ve had to get EMC support to remove the replication connection just about every time I wanted to remove a replication destination. Changing IPs was a pain too. I think you blogged about that before. It’s much easier with some of the recent OS releases though.

    Like I said, I’ve yet to find a solid use case for it. Maybe replicating within a DataCenter?

  4. Hi, when I SSH in as the service account, I don’t have the “svc_service_shell” command available. What code level is this available in?

  5. This is what I get. What does it mean?

    service@spa spa:~> svc_service_shell
    ERROR: service tool has expired!
    Error executing the service-tool. Exiting…

  6. Hey Vic,

    It’s hard to say… I wrote this article a while ago, and they may have changed the tools around since then. I would just call support; they should be able to help you out.
