RecoverPoint 4.0 – Unable to join clusters (Gateways and Maintenance Mode)

Hey folks!

I have been working with EMC RecoverPoint for a while now, and I really love the product. It is one of those products I get excited about because once it is installed, it just works!

For those who aren’t familiar, EMC’s RecoverPoint is a replication appliance (physical or virtual) that can be used to provide point-in-time recovery for your applications. Once configured, you create consistency groups of volumes that can be replicated to a remote copy. What makes this different from most replication technologies is that RecoverPoint uses a separate journal to keep track of every single write operation to the production volume. Similar to your television’s DVR, you can move backwards and forwards to any point in time that exists in the RecoverPoint journal.

With the launch of RecoverPoint 4.0, EMC added the ability to support multiple site configurations. Now you can replicate data between a production site, a disaster recovery site, and a tertiary site. In fact, a production source can have up to four replica copies.

The support for multiple clusters has changed the way we build the RecoverPoint system. A system was previously comprised of 1 (RP CDP) or 2 (RP CRR) clusters, and we had to define each cluster’s environment (IP/DNS/Zoning/Storage) upfront. Now, we do this for each cluster, then ‘Connect’ the clusters together.

Admittedly, I ran into a few issues during the install, and I want to document them here.

Here is the basic process:

  • Define the IP configuration (LAN/WAN) and gateway settings for each cluster.
  • Run the RecoverPoint Deployment Manager (DM) against each cluster.
    • A cluster is defined as a collection of RecoverPoint appliances that are co-located and function together logically.
  • Rerun the DM, select “Connect cluster wizard”, and choose the “Prepare new cluster for connection” option.
  • Finally, rerun the DM, select “Connect cluster wizard”, and choose “Connect new cluster to an existing system”.

Whoops! You only need to prepare one site:

Here is where I ran into my first issue. I ran the “Prepare new cluster for connection” step against both clusters in the system I was trying to create. This is a problem because the ‘Prepare’ operation puts the cluster into maintenance mode. If both clusters are in maintenance mode, you will receive an error when you attempt to join them. The error states that the cluster is in maintenance mode, but doesn’t mention how to undo this.

Solution (Finish_Maintenance_Mode)

Use the following procedure to exit maintenance mode on one of the clusters.

  1. SSH into the RecoverPoint Appliance for the cluster you want to take out of maintenance mode.
  2. Log in using the ‘admin’ account to access the console. (Using the boxmgmt account will not work.)
  3. Enter the following command to exit maintenance mode: “finish_maintenance_mode”

– Now the clusters can be connected using the RecoverPoint Deployment Manager’s “Connect Cluster Wizard”, and selecting the “Connect new cluster to an existing system” option.

– On the following page, remember to select the cluster that has NOT been prepared. In our example this is Cluster1. The virtual IP (VIP) of the cluster is “1.1.1.10”
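To make the ordering problem concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how RecoverPoint is implemented: the `Cluster` class, the `connect` function, and the name “Cluster2” are all made up for illustration, and the rule that the new cluster must be prepared before joining is my assumption.

```python
class Cluster:
    """Toy stand-in for a RecoverPoint cluster (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.maintenance_mode = False

    def prepare_for_connection(self):
        # Per the post, the 'Prepare' step puts the cluster into maintenance mode.
        self.maintenance_mode = True

    def finish_maintenance_mode(self):
        # Mirrors the effect of the CLI command of the same name.
        self.maintenance_mode = False


def connect(existing, new):
    """Join `new` to `existing`, failing the way the wizard did for me."""
    if existing.maintenance_mode:
        raise RuntimeError(f"{existing.name} is in maintenance mode")
    if not new.maintenance_mode:
        # Assumption: only a prepared cluster can be joined.
        raise RuntimeError(f"{new.name} has not been prepared")
    return f"{new.name} connected to {existing.name}"


cluster1 = Cluster("Cluster1")
cluster2 = Cluster("Cluster2")  # hypothetical name for the second cluster

# The mistake: preparing both clusters.
cluster1.prepare_for_connection()
cluster2.prepare_for_connection()
try:
    connect(cluster1, cluster2)
except RuntimeError as error:
    print("Join failed:", error)

# The fix: take the existing cluster out of maintenance mode, then join.
cluster1.finish_maintenance_mode()
print(connect(cluster1, cluster2))
```

The takeaway is simply that exactly one side should be in the “prepared” state when you run the connect step.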

Default Routes (Gateways) in RecoverPoint 4.0

Here is where I ran into the second issue. Depending on the network configuration, the local area network (LAN) and wide area network (WAN) interfaces may be in one of two configurations:

  • LAN and WAN are on the same subnet
  • LAN and WAN are on different subnets

In our example, the LAN/WAN are on the same subnet as seen below:

If the LAN and WAN interfaces are on the same subnet, they should use the same default gateway. If either of the cluster’s interfaces needs to reach something outside of the local subnet (the other cluster in this case), it sends the data to the default gateway, which routes it along to the destination.
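If you want to sanity-check that reasoning for your own addresses, here is a minimal Python sketch using the standard `ipaddress` module. The addresses come from this post, but the /24 netmask is my assumption (the mask isn't stated anywhere above), and `next_hop` is just a helper name I made up.

```python
import ipaddress

# Addresses from the post; the /24 prefix length is an assumption.
LOCAL_IFACE = ipaddress.ip_interface("1.1.1.10/24")
DEFAULT_GATEWAY = ipaddress.ip_address("1.1.1.1")
REMOTE_WAN = ipaddress.ip_address("2.2.2.11")


def next_hop(local_iface, gateway, destination):
    """Return the address a host would hand the packet to first."""
    if destination in local_iface.network:
        return destination  # same subnet: deliver directly
    return gateway          # different subnet: send to the default gateway


# The remote cluster is off-subnet, so traffic should go via the gateway.
print(next_hop(LOCAL_IFACE, DEFAULT_GATEWAY, REMOTE_WAN))  # 1.1.1.1

# A neighbor on the local subnet is reached directly.
print(next_hop(LOCAL_IFACE, DEFAULT_GATEWAY, ipaddress.ip_address("1.1.1.20")))  # 1.1.1.20
```

This is how things should behave on paper. It says nothing about what the appliance actually does.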

When I attempted to join the two clusters, it failed, stating that the clusters did not have an IP connection between them. To verify this wasn’t a bug in the DM, I SSH’d into a RecoverPoint appliance at each site and ran some connectivity tests (see “Testing Site Connectivity” below). Sure enough, when I tried to ping the remote RecoverPoint appliance’s WAN address from the RPA, it failed. The interesting thing was that ALL the LAN/WAN IP addresses could be reached from a client on either subnet. This meant the WAN interface was using the default gateway “sometimes”, but not when the RecoverPoint cluster was initiating the traffic. When the cluster attempted to connect to the remote site, it didn’t know how to route the traffic.

Here is the broken gateway list. This gateway should apply to both the LAN and WAN interfaces, since 1.1.1.1 is the default gateway for both.

From the menu, select:

Setup->Enter cluster Richmond details->Connectivity settings->Gateways configuration-> View cluster gateways

After contacting support, we identified that we had to create a route for the remote subnet. We did this by adding another gateway. This effectively told the RecoverPoint cluster to use the default gateway for any traffic destined for the remote cluster’s subnet.
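Here is a small Python sketch of the check I now do on paper before joining clusters: does the gateway list contain an explicit (non-default) entry covering the remote cluster’s addresses? The list format, the 2.2.2.0/24 remote subnet, and the `has_explicit_route` helper are all assumptions for illustration. RecoverPoint presents its gateway list differently, and this does not explain why the default gateway alone wasn’t enough.

```python
import ipaddress


def has_explicit_route(gateways, remote_ip):
    """Return True if a non-default gateway entry covers remote_ip.

    `gateways` is a list of (target_network, gateway_ip) string pairs,
    a made-up representation of the cluster's gateway list.
    """
    remote = ipaddress.ip_address(remote_ip)
    for target, _gateway in gateways:
        network = ipaddress.ip_network(target)
        # A prefix length of 0 is the catch-all default entry; skip it.
        if network.prefixlen > 0 and remote in network:
            return True
    return False


# Before the fix: only the default gateway is listed.
broken = [("0.0.0.0/0", "1.1.1.1")]

# After the fix: an extra entry sends the remote subnet (assumed /24)
# through the same gateway.
working = broken + [("2.2.2.0/24", "1.1.1.1")]

print(has_explicit_route(broken, "2.2.2.11"))   # False
print(has_explicit_route(working, "2.2.2.11"))  # True
```

Run the same check from the other side too, swapping in the remote cluster’s gateway list and the local WAN addresses.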

From the menu, select:

“Setup->Enter cluster Richmond details->Connectivity settings->Gateways configuration->Configure default gateways->Add a gateway”

Here is the working gateway list:

Now you should be able to reach the remote WAN interfaces. Remember, you will probably have to repeat this for the remote cluster’s gateway configuration.

Testing Site Connectivity

Here are two ways you can test site connectivity.

Ping a particular host:

  1. SSH into an RPA using the boxmgmt account.
  2. From the menu, select “Diagnostics->IP Diagnostics->Cluster Connectivity Tests->Other Hosts”
  3. Enter the IP address of the remote RPA

The results should look similar:

Successful Connectivity:

WAN Connectivity Issues:

You can also specify the interface to ping from:

  1. SSH into an RPA using the boxmgmt account.
  2. From the menu, select “Setup->Advanced Options->System Internal Operations->Run Internal Command”
  3. Enter “ping -I eth0 2.2.2.11”, where ‘2.2.2.11’ is the IP address of the remote RecoverPoint appliance’s WAN interface. In this example, we are specifying eth0, which is the WAN interface of the local RecoverPoint appliance.
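If you have several remote appliances to test, it can help to generate the ping commands up front and paste them in one at a time. Below is a minimal Python sketch that does this on your workstation (not on the RPA). The `build_ping_commands` helper and the second address, 2.2.2.12, are hypothetical. Note that the option is an ASCII hyphen followed by a capital “I”, since a typographic dash copied from a web page will not be recognized by ping.

```python
import ipaddress


def build_ping_commands(interface, remote_addresses):
    """Build one 'ping -I <interface> <address>' command per remote address."""
    commands = []
    for address in remote_addresses:
        # Raises ValueError early if an address is malformed.
        ipaddress.ip_address(address)
        commands.append(f"ping -I {interface} {address}")
    return commands


# eth0 is the WAN interface in this post's example; 2.2.2.12 is a made-up
# second remote RPA used only to show the loop.
for command in build_ping_commands("eth0", ["2.2.2.11", "2.2.2.12"]):
    print(command)
```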

Thanks for visiting!

Comments 11

  1. Good work!! This write up has helped to get past the cluster wizard for RP 4.0, Thank You.

    -nikhil

    1. Thanks for the feedback hitman. Hopefully you take this into consideration if you should find me listed as your next target.

      Thanks again for visiting!

      Martin @Ubergiek

  2. Hi,
    Is there a way in RP where we can run a DR test on a particular CG while the replication is still on?

    Reg
