How-To: Migrate MS SQL Cluster to a New SAN

During a recent SAN installation I ran into a Microsoft SQL cluster that needed to be migrated from an old SAN to a new one. After a fair amount of reading and testing I developed the plan below. As it turned out, the cluster in question was using shared VMDK files instead of RDMs, so I was able to simply migrate the VMDKs, but I thought I would share the plan for migrating a cluster that uses RDM disks in case someone else runs into this situation.

UPDATE:

For an even easier way to do a migration or move, check out my new article; it’s automated and only takes about 30 minutes.

The Process

Part 1. Present the new LUNs

Because the SQL servers are virtual machines using RDMs, I needed to create three new LUNs on the new SAN and present them to the VMware hosts. These three LUNs would be used for: 1. the cluster quorum disk, 2. the MSDTC disk, and 3. the SQL data disk. I won’t dive deep into this step because it differs for each SAN vendor, but in summary: create your new LUNs as needed and add them to the storage group that is presented to your VMware hosts, then rescan all of your VMware HBAs and verify that the hosts can see the new LUNs.
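
If you want to script the rescan rather than clicking through the vSphere client, a minimal PowerCLI sketch is below. The vCenter name is a placeholder, and property names may vary slightly between PowerCLI versions.

```powershell
# Rescan all HBAs and VMFS volumes on every host, then list the LUNs each host can see.
# "vcenter01" is a placeholder; assumes VMware PowerCLI is installed.
Connect-VIServer -Server vcenter01

Get-VMHost | Get-VMHostStorage -RescanAllHba -RescanVmfs | Out-Null

# Spot-check that the three new LUNs show up on every host (match on NAA ID or size).
Get-VMHost | Get-ScsiLun -LunType disk |
    Select-Object VMHost, CanonicalName, CapacityGB |
    Sort-Object CanonicalName | Format-Table -AutoSize
```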

Part 2. Add New RDMs to Primary Cluster Node

Next we will add each of the new RDM disks to our primary cluster node. Technically we would not have to mount them to the primary node first, but I’m doing it that way just to keep things organized. Here are the steps for this section (a PowerCLI sketch of the same task follows the list):

  1. Open Edit Settings of Node 1
  2. Click Add, then Select Disk
  3. Pick Raw Device Mapping as the new disk type
  4. Select the Raw LUN that you want to use
  5. Tell it to store the information about the RDM with the VM
  6. Select Physical Compatibility Mode
  7. Select a Virtual SCSI Node device that is unused (And is on a controller that is in physical mode)
  8. Complete the Wizard
  9. Repeat Steps 2 – 8 to add as many new RDMs as you need
  10. Now click OK on the Edit Settings box to commit the changes
  11. After committing, go back into Edit Settings of node 1 and note the file names of the RDMs (mine were SQL1_6.vmdk and SQL1_7.vmdk); we will need these to configure node 2
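
If you prefer PowerCLI to the Edit Settings wizard, the sketch below does roughly the same thing. The VM name and LUN canonical names are placeholders, and it assumes the physical-mode SCSI controller does not exist yet; if your node already has one (for example, for existing RDMs), attach the new disks to that controller instead.

```powershell
# Attach new physical-mode RDMs to the first cluster node.
# "SQL1" and the naa.* device names are placeholders for your VM and new LUNs.
$vm   = Get-VM -Name "SQL1"
$luns = "naa.60000000000000000000000000000001",
        "naa.60000000000000000000000000000002",
        "naa.60000000000000000000000000000003"

# Create the RDM pointer files in the VM's directory and collect the new disks.
$disks = foreach ($lun in $luns) {
    New-HardDisk -VM $vm -DiskType RawPhysical -DeviceName "/vmfs/devices/disks/$lun"
}

# Put the new disks on a dedicated controller with physical bus sharing.
New-ScsiController -HardDisk $disks -Type VirtualLsiLogicSAS -BusSharingMode Physical

# Note the descriptor file names; you will need them for node 2.
Get-HardDisk -VM $vm | Select-Object Name, Filename, DiskType
```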

Part 3. Add Existing RDMs to Secondary Cluster Node

  1. Open Edit Settings of Node 2
  2. Click Add, then Select Disk
  3. Pick Existing Virtual Disk as the disk type
  4. Browse to where the config files of Node 1 are on the SAN and select the VMDK file that you made note of in step 11 of Part 2
  5. Select a Virtual SCSI Node device that is unused (and is on a controller that is in physical mode; it should probably be the same as on the first node)
  6. Complete the Wizard
  7. Repeat steps 2 – 6 for the remaining RDMs that you need to add to the second node
  8. Now click OK on the Edit Settings box to commit the changes
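
A rough PowerCLI equivalent for the second node is below; the VM name, datastore name, and descriptor file names are placeholders based on the example above, and the same caveat about the physical-mode controller applies.

```powershell
# Attach the existing RDM pointer files (created on node 1) to the second node.
# "SQL2", "Datastore01" and the vmdk paths are placeholders.
$vm2   = Get-VM -Name "SQL2"
$paths = "[Datastore01] SQL1/SQL1_6.vmdk",
         "[Datastore01] SQL1/SQL1_7.vmdk"

$disks2 = foreach ($path in $paths) {
    New-HardDisk -VM $vm2 -DiskPath $path
}

# As on node 1, the shared disks must sit on a physical bus-sharing controller.
New-ScsiController -HardDisk $disks2 -Type VirtualLsiLogicSAS -BusSharingMode Physical
```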

Part 4. Prepare the new RDMs in Windows

Note: these steps are performed only on node 1

  1. Open Disk Management and Rescan the server for new disks
  2. Right click on the first new drive and select “Online”
  3. Right click again on the first new disk and select “Initialize”
  4. Now right click the unallocated area on the right of the first new disk and pick “Create Volume”
  5. Complete the new volume wizard and assign a temporary drive letter
  6. Repeat Steps 2 – 5 for each new drive
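
If you have several disks to prepare, the same steps can be scripted with diskpart, as in the sketch below. The disk number, volume label, and temporary drive letter are placeholders; confirm the disk number with “list disk” before running anything destructive.

```powershell
# Build a diskpart script that onlines, initializes, partitions, and formats one new disk,
# then run it. Disk 2 and letter T are placeholders - verify with "list disk" first.
$dpScript = @"
select disk 2
online disk
attributes disk clear readonly
create partition primary
format fs=ntfs label="SQL-Data-New" quick
assign letter=T
"@

Set-Content -Path C:\Temp\prep-disk2.txt -Value $dpScript
diskpart /s C:\Temp\prep-disk2.txt
```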

Part 5. Add the new drives to the cluster

  1. Open “Failover Cluster Manager”
  2. Expand out the cluster you are working on and select the Storage item in the left tree.
  3. In the Actions pane on the right, click “Add a disk”
  4. Make sure there are check marks beside all of the new drives you wish to add as a cluster disk
  5. Click OK
  6. Verify that the new disks now appear under Available Storage in the middle column
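
With the Failover Clustering PowerShell module (Windows 2008 R2 and later) the same thing can be done in a couple of lines; this is just a sketch of the equivalent.

```powershell
# Add every disk that is eligible for clustering (online, visible to all nodes, not yet clustered).
Import-Module FailoverClusters
Get-ClusterAvailableDisk | Add-ClusterDisk

# Confirm the new disks landed in Available Storage.
Get-ClusterResource | Where-Object { $_.ResourceType -like "Physical Disk" } |
    Format-Table Name, State, OwnerGroup -AutoSize
```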

Part 6. Move the Cluster Quorum Disk

  1. Open “Failover Cluster Manager” if you don’t still have it open
  2. Right click the cluster you want to modify and select “More actions -> Configure Quorum Settings”
  3. Select “Node and Disk Majority” (or whatever you already have selected)
  4. Select the new disk that you want to use from the list (it should say “Available Storage” in the right column)
  5. Click next on the confirmation page
  6. Click Finish on the final step after the wizard has completed
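
The equivalent PowerShell, assuming the Node and Disk Majority model used above; “Cluster Disk 5” is a placeholder for whatever name your new quorum disk was given.

```powershell
# Point the quorum at the new disk. "Cluster Disk 5" is a placeholder for the
# new quorum disk's resource name under Available Storage.
Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 5"

# Verify the change took effect.
Get-ClusterQuorum | Format-List QuorumResource, QuorumType
```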

Part 7. Move the SQL Data Disk

  1. Open “Failover Cluster Manager”
  2. Expand out the cluster you’re working on and select “SQL Server” under Services and Applications
  3. Select “Add storage” from the menu on the right
  4. Select the new drive from the list, and click OK
  5. In the middle column right click “Name: YourClusterNameHere” and select “Take this resource offline”
  6. Confirm that you want to take SQL offline
  7. Verify that SQL Server and SQL Server Agent are offline
  8. Open Windows Explorer and copy the SQL data from the old drive to the new drive
  9. Back in Failover Cluster Manager right click on the old disk in the middle column and select “Change drive letter”
  10. Give the old drive a temporary drive letter other than its current one, then click OK
  11. Confirm that you want to change the drive letter
  12. Next right click the new drive and select change drive letter, set the new drive’s letter to what the old drive was
  13. Again, confirm you want to change the drive letter
  14. Right click on SQL Server and select “Bring this resource online”, do the same for SQL Server Agent
  15. Right Click “Name: YourClusterNameHere” and select “Bring this resource online” in the middle column
  16. Verify that SQL starts and is accessible
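
Here is a rough scripted version of the offline/copy/online portion. The drive letters and resource names are placeholders, and, as a commenter points out below, robocopy with /COPYALL preserves NTFS permissions that a plain Explorer copy would lose.

```powershell
# Take SQL offline, copy the data with NTFS permissions intact, then bring it back.
# S: (old data drive) and T: (new drive's temporary letter) are placeholders.
Stop-ClusterResource "SQL Server"
Stop-ClusterResource "SQL Server Agent"

# /E copies all subfolders, /COPYALL keeps ACLs, owners, and timestamps.
robocopy S:\ T:\ /E /COPYALL /R:1 /W:1

# Swap the drive letters in Failover Cluster Manager (steps 9-13 above), then:
Start-ClusterResource "SQL Server"
Start-ClusterResource "SQL Server Agent"
```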

Part 8. Move the MSDTC Witness Disk

From what I have read, the MSDTC witness disk cannot be moved the way the SQL data can. Instead, you simply delete the DTC instance and then recreate it using the disk that you want to use.

  1. Make sure SQL is shut down
  2. Next, take the DTC instance offline
  3. Make sure to note the IP address of the DTC and the name
  4. Right click and delete the DTC instance
  5. Now right click on “Services and Applications” and select add new
  6. Pick DTC from the list and click next
  7. Fill in the information that you noted from the old instance, but select the new disk this time.
  8. Finish the wizard and make sure that the new instance is online
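
Before deleting the old DTC instance (step 3 above), you can dump its resources and their parameters so nothing is lost when you recreate it; “MSDTC” is a placeholder for the name of your DTC group.

```powershell
# Record the DTC group's resources and their parameters (network name, IP address, disk)
# before deleting it, so the new instance can be created identically on the new disk.
Get-ClusterGroup "MSDTC" | Get-ClusterResource |
    ForEach-Object {
        $_ | Format-List Name, ResourceType, State
        $_ | Get-ClusterParameter | Format-Table Name, Value -AutoSize
    }
```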

Part 9. Verify Operational Status

  1. Verify that SQL Server and SQL Agent are online
  2. Verify that MSDTC is online
  3. Login to SQL using a client application and verify functionality

This part is just a sanity check. Before we remove the old drives, you need to make sure that SQL is back online and that the client applications it serves are working properly.
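
A quick PowerShell sanity check for the same thing; anything the second command returns deserves a closer look.

```powershell
# Every group should be Online on its expected owner node.
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize

# List any resource that is not Online (the old disks in Available Storage may
# legitimately show here once you take them out of service).
Get-ClusterResource | Where-Object { $_.State -ne "Online" }
```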

Part 10. Remove old disks from Cluster

  1. Open Failover Cluster Manager
  2. Select Storage
  3. Verify that the disks under “Available Storage” are the old drives
  4. Right click each old drive and select “Delete”
  5. Confirm that you wish to delete the cluster disk
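
Or from PowerShell; “Cluster Disk 1” is a placeholder for the old drive’s resource name, so double-check it with Get-ClusterResource before deleting anything.

```powershell
# Remove an old disk from the cluster once nothing depends on it.
# "Cluster Disk 1" is a placeholder - confirm the name first with Get-ClusterResource.
Remove-ClusterResource -Name "Cluster Disk 1"
```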

Part 11. Remove Old Disks from VM settings

This part would seem simple, but you must make sure you remove the correct RDMs, otherwise you will have problems. The best way I found to be absolutely sure was to make a note of how big the RDMs to be removed were; then we can browse the datastore of the primary node and see which VMDK descriptor files show that size. Of course this only works if they are different sizes; otherwise you will have to go by the order they appear in Windows and the order of their SCSI bus numbers in the VM settings. (The PowerCLI sketch after the list below can also help match each RDM to its backing LUN.)

After determining which disks need to be removed (that is, which VMDK descriptor files belong to them):

  1. On the secondary node go into Edit Settings and find which RDM drives have the same file name as the ones identified earlier
  2. Select the Remove button at the top of the hardware information page.
  3. Leave it set to “remove from vm” and don’t select delete from datastore
  4. Click OK to commit the changes
  5. Now go to the primary node’s Edit Settings dialog box
  6. Repeat Steps 2 – 4, but this time tell it to delete them from disk, as we no longer need the descriptor VMDK files for those RDMs
  7. Now that nothing else should be using those RDMs, you can delete them from your old SAN or unmask those LUNs from your VMware hosts.
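
The PowerCLI sketch below, referenced above, lists each RDM with its size and backing LUN so you can match them up, then detaches the old ones. The VM names and descriptor file name are placeholders.

```powershell
# Match each RDM to its backing LUN and size before removing anything.
# "SQL1"/"SQL2" and the vmdk file name are placeholders.
Get-HardDisk -VM "SQL1" -DiskType RawPhysical |
    Select-Object Name, Filename, CapacityGB, ScsiCanonicalName | Format-Table -AutoSize

# On the secondary node: detach only, keeping the descriptor file.
$old = Get-HardDisk -VM "SQL2" | Where-Object { $_.Filename -like "*SQL1_6.vmdk" }
Remove-HardDisk -HardDisk $old

# On the primary node: detach and delete the descriptor file, since it is no longer needed.
$old = Get-HardDisk -VM "SQL1" | Where-Object { $_.Filename -like "*SQL1_6.vmdk" }
Remove-HardDisk -HardDisk $old -DeletePermanently
```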

33 Responses to "How-To: Migrate MS SQL Cluster to a New SAN"

  1. Outstanding article. This is exactly what I needed. One thing you should change, though: in Part 7 step 8, use robocopy with the /copyall switch to copy the data over. If you have changed the default permissions in your source location (for example, if you use a specific account to start the SQL services), the permissions do not copy using Windows Explorer. Robocopy keeps the source permissions.

  2. Thank you so much for this article, and the incredible detail! Those in this situation will likely also have a virtual disk (the C: drive of the guest) in addition to the RDMs for the SQL cluster.

    At what point/how is it recommended to migrate that disk? Is it as simple as an SvMotion after the RDM data is moved over (allowing the physical-mode RDM pointers to migrate because they’re already pointed to the correct location), or is it more of a best-practice/requirement to step the cluster over after RDM migration by downing the passive node, removing the RDMs, migrating the virtual disk to the new datastore, re-adding the RDMs, powering up, failing over, downing the other node, and repeat…

  3. I am very curious about the use case for this.

    Maybe I am missing something. Since it is going to a new SAN, all the disks have to be moved. Wouldn’t migrating to a new cluster accomplish the same with a little less work?

  4. Sure… but then you would need to create a new cluster name and IP and reconfigure your apps. Most customers that I have helped would rather just move the back end and not call the software vendor, who will surely ask for a PO to do anything.

  5. Thank You Justin,

    I was just running possible use cases through my head, and avoiding changes to the app is what I was thinking was a possible fit.

    Have you experienced or thought of any inherent risks to this approach?

  6. One step that is worth mentioning after you’re done with the migration:

    Ensure that after you have migrated everything, you right-click the “SQL Server” resource in the Failover Cluster Manager, go to properties, the dependencies tab, and then add your SQL disk (or multiple disks if you’re splitting up Data and Logs on different luns, or have multiple SQL LUNs) with an “AND” operator. Hit OK.

    If you don’t do this, you might not notice anything right away, as the databases will mount normally; however, if you go to attach a database (or even detach, sometimes), you’ll start getting ACL errors about not being able to access the path to your database files.

    Adding these dependencies does not require a services-cycle, so you can do it any time without taking down your SQL cluster to fix this.

  7. Thanks for your Great article,

    I just want to make sure that there is no special configuration for the system databases, as some articles recommend rebuilding the system database using setup instead of copying it.

    Should I repeat Part 7 if I have a separate LUN for logs?

    Thanks, have a nice day

  8. Thank you for this article, it was very useful to me this weekend. Following these steps worked flawlessly for me.

  9. Thanks for the article!
    I ran through this list while moving data for my SQL cluster. I saw where you said “note: these steps are performed only on node 1.”
    I copied all my data and changed drive letters on just node 1, and started everything up. It all looks fine, but none of my drives are online for host 2, not even the quorum disk.

    I guess I needed to change drive letters on node 2 as well? Now none of them have drive letters. This is going to be messy.

  10. scratch that last comment. I found that as long as clustering services are running, the drive letter change does take place on both nodes as long as you change the drive letter in cluster manager. I got thrown off because the customer also had a mapped drive letter (by login script) that was conflicting with one of the actual drives. After removing it, and the old volumes, everything was able to shake out just fine.

    Thanks again!

  11. Justin,

    Your Documentation/instructions were fantastic!

    I seem to have run into a small issue: my SQL cluster is up and SQL services are up. However, there is a “File Server” section in the middle pane directly below the “Server Name”, and that service/resource can’t start, mostly because I removed the old disks. Is there a way to repoint the File Server service/resource to the new disks, since I copied all the data from the old disks? Or do I need to delete and recreate it, similar to the DTC? Any instructions would be greatly appreciated.

    thanks for your help!

  12. Thanks for the article.

    After I copied the SQL data drive to another drive with the same drive letter, I could not bring the SQL cluster online. I thought it was related to file permissions after the move operation.

    Then:
    1) I re-copied the SQL data via “robocopy Source Destination /MIR /SEC /SECFIX”
    2) Under Roles -> SQL Server -> Other Resources -> SQL Server, right click SQL Server, select Properties, click “Dependencies”, and change the cluster disk value to the new disk.

  13. Great article.
    SQL Server is working fine with all resources online, but only on SQL-01.
    When I try to fail over to SQL-02 it fails bringing the “SQL Server” and “Agent” resources online. All disks move fine to the second server and are accessible.
    Any advice?

  14. Thank you for these clear, concise instructions; they worked a treat. Amazing how hard they were to find 🙂 We are mid SAN migration and have moved most data by virtualising the old array behind the new one and mirroring LUNs. However, we can’t do it that way for cluster storage, and this was my answer. Thanks again.

  15. I want to thank you for this tutorial, which helped me enormously. I migrated from one SAN to another SAN and all of it was successful with the steps you detailed.

  16. Justin,

    Thanks for the article.

    I’m in a situation where our organization wants to migrate data from one SAN to another SAN (array-to-array migration).

    I have a DB server which is configured with a high-availability solution (DB mirroring) for a couple of databases. Are there any steps or things to keep in mind while performing the migration if the databases are configured with high availability? Any advice?

  17. Hi Justin,

    I have a requirement to migrate the SAN from Compellent (legacy storage) to Hitachi for a two-node VM MS cluster running Windows 2003 with SQL 2005. The above post looks good for a Windows 2008 failover cluster, but I would like to check whether you can share some details for a Windows 2003 / SQL 2005 SAN migration. (We currently cannot migrate the cluster to 2008/2012 due to some legacy app dependencies.)
    Thanks for your input on this

  18. I have a 2-node cluster running a DB. My requirement is that we are migrating the data from one SAN to another SAN.
    We are migrating the data from CLARiiON to Unity SAN using the PowerMig tool.

    My query is:

    Firstly, currently the quorum disk is running on one of the partitions, /emcpowera1.

    If I use powermig src emcpower1 to emcpower(say b) -hostcopy,

    will it move the quorum disk to the new SAN?

    2. Secondly, if I assign a new quorum disk (say disk1) separately, say 2 GB, once I have created it and changed the entries in cluster.conf,
    during this period I have to stop qdisk on all nodes. At this time will my existing cluster work or not, or will I lose the DB?

    Thanks
    D

  19. @Justin Paul
    Your explanation is great. However, I just want to understand what we do with the SQL Server binaries. They are installed on the local drive. This local drive will also be moved to the new storage. How should we handle this?
