Today I was troubleshooting some performance issues between two Open-E SANs and found some pretty interesting information for anyone who is considering this product. Before I get into what I found, let me give some background and explain what caused me to spend the afternoon testing things.
We use an Open-E iSCSI SAN at our main office as a cost-effective block-level storage appliance to house data that has been replicated from various sources to our datacenter (let’s call this Node-A). We also have an identical Open-E SAN at another facility about 30 miles away (let’s call it Node-B), which we use as a synchronous mirror of Node-A. Between the two datacenters is a point-to-point 20Mbps fiber connection. Earlier this week I brought a drive back to seed a replication onto the SAN. I plugged the drive into a server and started a file copy to the SAN, and I was surprised when Windows informed me that the copy was going to take about 27 hours (the file was pretty big, but it should have taken a few hours at most). I tried several other methods to get the data onto the SAN, but each one seemed to be capped at 20Mbps.
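As a rough sanity check on that 27 hour estimate, here is a small back-of-the-envelope sketch. It assumes the copy ran at exactly the 20Mbps cap with no protocol overhead, and the file size it prints is only what those two numbers imply, not a measured value. The function names are made up for illustration.

```python
def size_for_duration_gb(hours: float, link_mbps: float) -> float:
    """Decimal gigabytes that fit through a link of link_mbps in the given hours."""
    bits = hours * 3600 * link_mbps * 1_000_000
    return bits / 8 / 1_000_000_000


def transfer_hours(size_bytes: float, link_mbps: float) -> float:
    """Hours needed to move size_bytes at a sustained link_mbps (decimal megabits/s)."""
    seconds = size_bytes * 8 / (link_mbps * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    # Size implied by a 27 hour copy at a 20 Mbps cap (assumption: no overhead).
    implied_gb = size_for_duration_gb(27, 20)
    print(f"implied file size: {implied_gb:.0f} GB")
    # Same amount of data at gigabit line rate, for comparison.
    print(f"at 1000 Mbps: {transfer_hours(implied_gb * 1e9, 1000):.2f} hours")
```

Real-world copies run below line rate, so the gigabit figure is a best case, but it is in the same ballpark as the "few hours at most" I was expecting.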
Normally I would expect a SAN that is doing synchronous replication between two nodes to keep a queue of data that needs to be replicated to the other node, so that incoming traffic is not waiting on acknowledgments from the remote node. Basically, that lets the nodes keep things running and catch up later on. This is what I’m used to with the HP P4000 line of SANs, but then again the P4000 also requires a Gigabit connection for synchronous replication. It is also how Open-E behaves when it cannot contact the remote node: if the primary node is servicing IO to a server but the secondary node is offline, Open-E will just keep track of the blocks that have changed since the other node went offline. Then, when the secondary is back online, it will replicate the changes so that both nodes are consistent.
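To picture the "keep track of changed blocks" behavior, here is a toy model. This is not Open-E's implementation, which I have not seen. The class name, the 4096 byte block size, and the in-memory set are all assumptions chosen to illustrate the idea.

```python
class DirtyBlockTracker:
    """Toy model: remember which blocks changed while the peer node was offline."""

    def __init__(self, block_size=4096):
        self.block_size = block_size   # assumed block size, purely illustrative
        self.dirty = set()             # block numbers written during the outage
        self.peer_online = True

    def write(self, offset, length):
        """Record a write; blocks are only marked dirty while the peer is offline."""
        if self.peer_online or length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def resync(self):
        """Peer is back: return the blocks to replicate and clear the list."""
        blocks = sorted(self.dirty)
        self.dirty.clear()
        self.peer_online = True
        return blocks


if __name__ == "__main__":
    tracker = DirtyBlockTracker()
    tracker.peer_online = False        # secondary node drops off the network
    tracker.write(4096, 4097)          # spans blocks 1 and 2
    tracker.write(0, 1)                # touches block 0
    print(tracker.resync())            # blocks that need replicating
```

The point of tracking blocks rather than individual writes is that a block overwritten many times during the outage only has to be sent once.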
OK so here is what I found.
If the link between the two nodes is X Mbps, then IO operations to the primary node are also limited to X Mbps. So in our case we are limited to 20Mbps to our primary node; above 20Mbps the SAN rate limits incoming data so that the two nodes are kept in sync. To test this I set up two servers, both running Open-E V6, and plugged both into a gigabit Cisco switch along with a VMware ESXi server. I then created an iSCSI LUN and a VMware datastore on that LUN, and installed a Debian VM from which I could run the Linux command ‘dd’.
While everything was running on gigabit ports I ran ‘dd if=/dev/zero of=temp.img bs=1M count=1024’, which writes 1GB of data to the drive. While I waited I watched esxtop’s network screen and saw speeds as high as 200-300 Mbps. The command took 83 seconds to write the 1GB to the Open-E SAN.
The next thing I did was configure the switch port that the second node was plugged into for 10Mbps full duplex. I then reran the ‘dd’ command and monitored esxtop; this time the speeds I saw never went over 9Mbps. So because Open-E could only replicate to its second node at 10Mbps, it was rate limiting incoming IO to that same speed. With the second node on a 10Mbps port, the exact same command took 1620 seconds to complete.
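For anyone who wants to turn those two timings into average throughput, the arithmetic can be sketched like this. The only inputs are the numbers from the two runs: 1024 blocks of 1M (1,048,576 bytes each in dd), 83 seconds, and 1620 seconds. The function name is just for illustration.

```python
ONE_GIB = 1024 * 1024 * 1024  # bytes written by dd with bs=1M count=1024


def average_mbps(bytes_written: int, seconds: float) -> float:
    """Average throughput in decimal megabits per second."""
    return bytes_written * 8 / seconds / 1_000_000


if __name__ == "__main__":
    fast = average_mbps(ONE_GIB, 83)     # second node on a gigabit port
    slow = average_mbps(ONE_GIB, 1620)   # second node on a 10Mbps port
    print(f"gigabit run: {fast:.1f} Mbps average")
    print(f"10Mbps run:  {slow:.1f} Mbps average")
    print(f"slowdown:    {1620 / 83:.1f}x")
```

These are averages over the whole run, so they come out lower than the peak figures I saw in esxtop.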
Here is the VMware graph of disk usage while I was running these tests. From 10:20 to 10:30 I was installing Debian, which is what caused the random disk usage. At 10:35 I ran the first test with gigabit networking between everything, and at about 10:47 I started the same test with the second node on the 10Mbps port. The second run is clearly rate limited because of the synchronous replication.
The last thing I wanted to test was a what-if situation: what if I’m using a node at site A to store data and maybe run a few VMs, I’m replicating to another node on the same campus (at gigabit speed), and for whatever reason that node is unreachable for a period of time?
To test this I ran the test again, but while I was writing data to the primary node I unplugged the secondary node from the network. I let the transfer finish and then went into the Open-E web interface to make sure it knew it was in an inconsistent state (it did). I then plugged the secondary node back in so it could replicate about a gigabyte of data.
What I found is that while the SAN is replicating data from one node to the other, it is unable to service IO requests that the primary node receives from my VMware host. Check out the Linux screenshot below: the host is getting delayed write failures, which is definitely not a good thing. There is also a screenshot of the web interface showing the SANs resyncing and the IO of the Linux VM paused while it tries to write another gig of data.
Bottom line: if you are doing synchronous replication between two Open-E nodes and the secondary is unavailable for some time, then when it comes back online with a large amount of changed data to replicate, the workload attached to the primary node will not be able to read or write to its LUNs until the SAN nodes are synced up.
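To get a feel for how long that stall could last, here is a rough estimate. It assumes the resync runs at the full speed of the replication link and that nothing else is using it, so treat the results as a best case rather than a measurement. The 50Mbps write rate and one hour outage in the second example are hypothetical numbers, not something I tested.

```python
def backlog_bytes(write_mbps: float, outage_seconds: float) -> float:
    """Bytes of changed data built up at a steady write rate during an outage."""
    return write_mbps * 1_000_000 / 8 * outage_seconds


def resync_seconds(backlog: float, link_mbps: float) -> float:
    """Best-case seconds to push the backlog over the replication link."""
    return backlog * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # About a gigabyte of changes, as in my unplug test.
    print(f"1 GB over gigabit: {resync_seconds(1e9, 1000):.0f} s")
    print(f"1 GB over 20Mbps:  {resync_seconds(1e9, 20):.0f} s")
    # Hypothetical: 50Mbps of steady writes during a one hour outage.
    pending = backlog_bytes(50, 3600)
    print(f"1 h outage at 50Mbps, gigabit link: {resync_seconds(pending, 1000):.0f} s")
```

The estimate ignores blocks that were overwritten more than once during the outage, which would shrink the backlog.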
Obviously, I do not recommend using this product for mission-critical applications.