VMware SRM FAQ part 1
Written by Administrator   
Tuesday, 04 May 2010 09:05

SRM FAQ - part 1

Michael created a great VMware SRM document and this FAQ is part of it. I want to thank Michael for sharing it with the rest of the world.

Generic

I want to install SRM, what do I need to do?
It is important to understand the SRM installation overview. You must install in the order of operations shown
in the lab section of this document.
Do this on the protected site first, followed by the recovery site. Here is the outline:

  1. SRM application installed at Protected Site
  2. SRM application plug-in installed in VI clients that connect with the Protected Site
  3. SRA installed at the Protected Site
  4. SRM application installed at Recovery Site
  5. SRM application plug-in installed in VI clients that connect with the Recovery Site
  6. SRA installed at the Recovery Site
  7. SRM configured at the Protected Site
  8. SRM server pairing
    1. Array Configured – both Protected Site and Recovery Site
    2. Inventory Mapping
    3. Protection Group
  9. SRM configured at the Recovery Site
    1. Recovery Plan created

You should now test and tweak SRM. Remember the goal is to have the required VMs running at the recovery site
in the least amount of time.

What does an SRM lab require?
The ideal SRM lab requires the following:

  • Two VirtualCenter servers
  • Each VirtualCenter server requires at least one ESX server, and the Recovery site should have two to
     show the integration with DRS as part of a recovery plan.
  • Each of the two sites requires shared storage that replicates, and it needs to be on the compatibility list.
     This shared storage can be the NetApp simulator, HP / LeftHand VSA, or the EMC Simulator.
     It can also be actual hardware-based shared storage that can replicate.

Some of the activities that can be shown would include:

  • Test failover
  • Actual failover
  • Failover with IP customization
  • Failover where multiple VMs start on various ESX servers
  • The use of a virtual switch that can connect VMs on different ESX servers to a private network.
     This is very useful for testing: with a VLAN, you can test without impacting the public network.
     Remember that the test bubble network that can be used in SRM only provides communication on a per ESX host basis.

When does SRM raise VC events?
SRM will raise VC events for the following conditions:

  • Disk space low
  • CPU use exceeded limit
  • Memory low
  • Remote Site not responding
  • Remote Site heartbeat failed
  • Recovery Plan Test started, ended, succeeded, failed, or cancelled
  • Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning

What are the recommended minimum alarm notifications?
We suggest the following alarm notifications. You can set them on the Alarm tab of the SRM status summary page.
Most organizations will use email notifications, but there are other choices as well.
Remember to set these suggested alarm notifications at both sites.

  • Remote Site Down
  • Remote Site Ping Failed
  • Replication Group Removed
  • Recovery Plan Destroyed
  • License Server Unreachable

How do I plan for disk utilization due to SRM database?
Recently we brought out the database sizing tool. Find it at
http://www.vmware.com/files/pdf/Site_Recovery_Manager_1.0U1_Database_Sizing_Calculator.xls.

Where can I find help for installing different array products?
The obvious option is to visit the array vendor for help in the form of documents.
You can also find information elsewhere. A VMware support person has written how-to guides for a variety of different arrays. They can be found at
http://viops.vmware.com/home/people/chogan?view=overview . He has done an excellent job and I hope that his guides help you out.

How can I capture the log and configuration information for support to work with?
This is most easily done after Update 1 by using the “Generate Site Recovery Manager Log Bundle” command in the VMware
 \ VMware Site Recovery Manager Start Menu folder. Run this command on the SRM server.
 It produces a zipped file on your desktop, named in MM-DD-YYYY-HH-MM.zip
format (month, day, year, hours, minutes). Always provide the logs with your request for help!
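
If several bundles pile up on the desktop, the naming format described above can be parsed to pick the newest one. The following is only a small sketch: the helper names are made up for illustration, and it assumes the file names follow the MM-DD-YYYY-HH-MM.zip pattern exactly.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

# MM-DD-YYYY-HH-MM, the bundle naming format described in the FAQ.
BUNDLE_FORMAT = "%m-%d-%Y-%H-%M"


def bundle_timestamp(filename: str) -> Optional[datetime]:
    """Return the timestamp encoded in a bundle name, or None if it does not match."""
    path = Path(filename)
    if path.suffix.lower() != ".zip":
        return None
    try:
        return datetime.strptime(path.stem, BUNDLE_FORMAT)
    except ValueError:
        return None


def newest_bundle(filenames: Iterable[str]) -> Optional[str]:
    """Pick the most recent log bundle, ignoring files that are not bundles."""
    dated = []
    for name in filenames:
        stamp = bundle_timestamp(name)
        if stamp is not None:
            dated.append((stamp, name))
    return max(dated)[1] if dated else None
```

Parsing matters here because the month comes first in the name, so a plain alphabetical sort would rank a December 2009 bundle above a May 2010 one.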

What is the account that is asked for during install used for?
The 1.0 installer prompts for a username during installation. This is the account SRM uses to communicate with the local VC server.
Since SRM constantly monitors the local VC inventory, this user will be constantly logged into the local VC server.
Changing the password for this account will make it impossible to use SRM. Please note that this should be an account in the Administrators group.
By default, when you install SRM 1.0 or SRM 1.0 U1, all accounts in the Administrators group have complete access to SRM managed objects;
this has not changed with U1. Please try to use AD accounts when you install SRM, and when you log into SRM. Using local accounts can work, but it is a little tricky.
If you need some guidance on using local accounts I can help. This account is NOT the account the SRM service runs under; the service uses the Local System account.

Can I change the IP information for the SRM server?
I would like to change the IP info for the SRM server once it is installed. Is this safe, or is there a specific way to do this without issues? To change the IP information for the SRM server, or the credentials (account or password), you need to use a special utility. Once the change is done you will also need to pair the two sites again. You can find detailed information on how to do this on page 85, in Appendix C of the
SRM Admin Guide.

How do I add a script to a Recovery Plan in a call out?
When you add a script to a call out in a recovery plan, the dialog is empty. Use the information below to add a script that will work as expected. It is important to understand that the scripts or commands must be in the path of the VirtualCenter.

  • Use full paths to all executables – for example “c:\windows\system32\cmd.exe” instead of “cmd.exe”.
  • You can use .exe or .com files only! Command line scripts can only call executables.
  • To run a batch file you should start the shell command with “c:\windows\system32\cmd.exe”. So it would look like “c:\windows\system32\cmd.exe /c c:\scripts\alarmscript.bat”.
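
The three rules above can be sketched as a small helper that turns a script path into the text to type into the call out dialog. This is an illustration only: `callout_command` is a made-up name, and the sketch assumes paths without spaces.

```python
from pathlib import PureWindowsPath

# Full path to the shell, as the FAQ recommends (never just "cmd.exe").
CMD_EXE = r"c:\windows\system32\cmd.exe"


def callout_command(script: str) -> str:
    """Build the command line for a recovery plan call out.

    Hypothetical helper: executables (.exe, .com) are used directly,
    batch files are wrapped in "cmd.exe /c", anything else is rejected.
    """
    path = PureWindowsPath(script)
    if not path.is_absolute():
        raise ValueError(f"use a full path, not {script!r}")
    extension = path.suffix.lower()
    if extension in (".exe", ".com"):
        return str(path)
    if extension == ".bat":
        return f"{CMD_EXE} /c {path}"
    raise ValueError(f"only .exe, .com or .bat files are supported: {script!r}")


print(callout_command(r"c:\scripts\alarmscript.bat"))
```

Rejecting relative names such as “cmd.exe” up front mirrors the first bullet and catches the most common mistake before the plan is ever run.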

How do I change the value for a script timeout?

You can increase or decrease this value by editing the SRM configuration file (vmware-dr.xml). Look for the following section:

<calloutCommandLineTimeout>600</calloutCommandLineTimeout>

Change the value as appropriate.
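
If you would rather script the change than edit the file by hand, something like the sketch below could work. Treat it as an assumption-laden example: `set_element_text` is a made-up helper, the `<Config>` root element in the sample is hypothetical, and the real vmware-dr.xml contains much more than this. Back up the file first, since rewriting XML this way drops comments and may change formatting.

```python
import xml.etree.ElementTree as ET


def set_element_text(xml_text: str, tag: str, value: str) -> str:
    """Set the text of the first element named `tag` and return the new XML.

    Hypothetical helper; it searches the whole document because the
    element's exact position in vmware-dr.xml is not stated in the FAQ.
    """
    root = ET.fromstring(xml_text)
    element = root if root.tag == tag else root.find(f".//{tag}")
    if element is None:
        raise KeyError(f"<{tag}> not found")
    element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample input; the <Config> wrapper is an assumption for illustration only.
sample = "<Config><calloutCommandLineTimeout>600</calloutCommandLineTimeout></Config>"
print(set_element_text(sample, "calloutCommandLineTimeout", "900"))
```

The same helper would apply to other single-value settings in this file, such as `<CommandTimeout>`.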

During the configuration of SRM I receive a timeout after 300 seconds, how do I change the value for this timeout?

You can increase or decrease this value by editing the SRM configuration file (vmware-dr.xml). Look for the following section:

<CommandTimeout>300</CommandTimeout>

Change the value as appropriate.

I would like to use trusted certificates with SRM – help!
You can use your own trusted certificates with SRM but it is more complicated than you might expect. There is some excellent information to help you be successful at
http://viops.vmware.com/home/docs/DOC-1261 .

What happens if you move one of the protected VMs to a datastore that is not part of the VM’s current protection group?
Protection will be revoked for the VM. It will have a small yellow triangle associated with it in its protection group. This is true even if you move the VM (for example with Storage VMotion) to a different datastore that is also replicated to the recovery site.

Can network customization work for operating systems other than Windows?
Yes. This includes operating systems from Novell, and Red Hat. The specific version information can be found in the SRM Compatibility Matrix document.

Understanding order of operation for bringing VMs back online.
During recovery, the order in which VMs start is not as obvious as you might expect. VMs in Normal and Low priority protection groups are started in parallel, one VM per ESX host at a time. So you could have a number of Normal priority VMs starting at the same time, spread across various ESX servers. High priority VMs, however, start serially regardless of how many hosts are involved. Misconfigured security on the storage arrays may also affect the start order. For example, if the array's security settings prevent it from talking to a particular ESX host, then that host will not be used to start VMs during a recovery plan. It is possible to see this without any obvious error messages!
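
To make the start order concrete, here is a simplified model of it. It is not how SRM is implemented: it assumes that High priority VMs go first, then Normal, then Low, and that each "wave" finishes before the next begins. The function name and the VM names are made up.

```python
from typing import Dict, List


def start_waves(vms: Dict[str, str], host_count: int) -> List[List[str]]:
    """Group VMs into waves of simultaneous power-ons.

    Simplified model (an assumption, not SRM's actual scheduler):
    high priority VMs start one at a time; normal and low priority VMs
    start in parallel, at most one per available ESX host.
    """
    if host_count < 1:
        raise ValueError("at least one ESX host is required")
    waves: List[List[str]] = []
    for priority, width in (("high", 1), ("normal", host_count), ("low", host_count)):
        names = [name for name, level in vms.items() if level == priority]
        for start in range(0, len(names), width):
            waves.append(names[start:start + width])
    return waves


plan = {"db": "high", "dc": "high", "web1": "normal",
        "web2": "normal", "web3": "normal", "test": "low"}
for wave in start_waves(plan, host_count=2):
    print(wave)
```

In this model, a host that the array cannot talk to simply lowers `host_count`, which is why a storage security problem shows up as slower, narrower waves rather than as an error.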

Can I fail-over VMs which have disks on two different arrays, for instance NetApp and EMC?
No. Although you can install SRAs from multiple vendors, failing over a VM that has disks on both arrays will not work.

What does the Repair button do?
The repair button is used when the protected site is not available and some array reconfiguration is required. Normally this would be done at the protected site, but if it is not available then the repair button can be used.

Is it all over when the recovery plan fails?
You can have a recovery plan fail with some sort of error, but it will complete anything that it can complete. You can then address and solve the error and run the recovery plan again; if you have correctly addressed the error, your test may in fact complete correctly this time. It will not redo things that it has already done correctly. Once I had a problem with a VM starting, so I let the replication finish, did a manual HBA refresh, and tried again. The two VMs that had already started were not touched, but the third VM, which had now finished replicating, was in fact started.

Troubleshooting

Where are the new Run and Test privileges?
After you update to Update 1 you should see a Run and a Test privilege in the roles and privileges area, but you may not. Restart VC and you will see them.

Where are the SRM server logs stored?
They can be found in:

C:\Documents and Settings\All Users\Application Data\VMware\VMware Site Recovery Manager\Logs

You will need to check the vmware-dr-index file to see what is the current log file.

I see a lot of recomputed datastore failures in my mixed 2.5 / 3.5 environment, what’s happening?
If you have ESX 2.5 hosts accessing a protected datastore you will see recomputed datastore failures. Remove the ESX 2.5 host from the datastore.

I’m having pairing issues and it fails at specific %, why?
If you have an issue at approximately 24% it could be related to the license file not being live or installed. Reread the license file or restart the license service.

If you have an issue at approximately 82 or 84% you should make sure that the account you used to connect to the Recovery site has both VC and SRM admin rights. The specific role for SRM is Protected Site Administrator and on the Recovery Site it is called Recovery Site Administrator. This issue occurs most in a Microsoft domain world. The Administrator role includes both the Protected and Recovery site admin roles.

When troubleshooting pairing issues, also check the firewalls between the sites and confirm that VC is running successfully at the recovery site.

I’m configuring the SRDF SRA and although we replicated storage and it contains VMs I still don’t see “replicated LUNs”.
After checking all the configuration settings on the SRA side, the SRM side and the SAN, we noticed that the SPC-2 bit was not enabled. This setting is mandatory according to the
FC SAN Config Guide (page 57), and enabling it solved our issues.

“Failed to connect to the management system address when executing the discoverArrays command.”
You should not see this often, but it can be addressed by making sure the SRA is in fact installed on the recovery side. You may also need to check routing between the sites (in particular to the Recovery side SRA / storage management interface).

How do I change the SRM change of power state time out values?
The default value is 120 seconds, which might not be long enough and could lead to issues when a VM power-off is forced. You can increase or decrease this value by editing the SRM configuration file (vmware-dr.xml). Look for the following section:

<Recovery>
<powerStateChangeTimeout>120</powerStateChangeTimeout>
</Recovery>

If this section is not in the .xml file add it. Don’t forget to restart the SRM Service.
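
Because the section may or may not already exist, a scripted edit has to handle both cases. The sketch below shows one way to do that; it assumes `<Recovery>` sits directly under the root element of vmware-dr.xml, which the FAQ does not state, and both the function name and the `<Config>` root in the sample are hypothetical. Back up the file before trying anything like this.

```python
import xml.etree.ElementTree as ET


def set_power_state_timeout(xml_text: str, seconds: int) -> str:
    """Set <powerStateChangeTimeout>, creating the <Recovery> section if needed.

    Assumption: <Recovery> is a direct child of the document root.
    """
    root = ET.fromstring(xml_text)
    recovery = root.find("Recovery")
    if recovery is None:
        # Section missing: add it, as the FAQ instructs.
        recovery = ET.SubElement(root, "Recovery")
    timeout = recovery.find("powerStateChangeTimeout")
    if timeout is None:
        timeout = ET.SubElement(recovery, "powerStateChangeTimeout")
    timeout.text = str(seconds)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal file without a <Recovery> section.
print(set_power_state_timeout("<Config></Config>", 300))
```

The change only takes effect once the SRM service has been restarted.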

Error: Failed to recover datastore:
This error usually indicates that the recovery side cannot communicate with the array on the recovery side. In the SRM logs on the recovery side you can find one or more Mapped LUN lines that show what the protected side is mapped to on the recovery side. This will sometimes help you fix this error.

We noticed a “SRM unlicensed error” in the logs but we have a good license installed.
If you change the SRM license file(s) you may have a small issue, as it is not the same process as changing an ESX or VC license. You would follow the normal steps of dropping the file in the license folder and rereading the license folder in the license tool. This would be enough for VC or ESX but is not enough for SRM. You could after these steps see the license in the VC Admin License view, but would still see the unlicensed errors in the SRM log. You need to restart the SRM service for the new license change to occur.

I cannot uninstall SRM successfully – what can I do?
Uninstalling SRM will normally require access to the VC that it is paired with. If you do not have that VC running it is hard to uninstall SRM. If you don’t cleanly uninstall SRM you cannot install it again. It is possible to uninstall with no VC if you read the screens carefully and answer appropriately, but I have seen where that doesn’t work. Use one of the ideas below to help if you need it. It is always best to use the Add Remove programs method to uninstall but if that doesn’t work the ideas below should.

msiexec.exe /qn /x {35A202EA-1549-4592-97A5-65F5E4CCDEC9}

Microsoft’s uninstall utility: http://support.microsoft.com/kb/29031

Only three Recovery Plans can run at the same time.
I am not sure what the error message is if you try to run more than three, but at least you now know that only three should be executed at the same time. This limit reflects the level of QA testing and will be significantly improved in the future.

Can I automatically rename my datastore back to its original name?

Edit the vmware-dr.xml file in the C:\Program Files\Site Recovery Manager\Config directory and look for a line that reads:

  • <fixRecoveredDatastoreNames>false</fixRecoveredDatastoreNames>

Change it to:

  • <fixRecoveredDatastoreNames>true</fixRecoveredDatastoreNames>

Can I change the administrator’s email address after the installation?
Extension.xml is the configuration xml file where you can change the Administrator Email:

<adminEmail>administrator e-mail address</adminEmail>

Why is Port 80 used in the install but port 443 later?
During install of SRM port 80 is specified and you cannot type in 443, but once the install is complete SRM seems to talk to VC on 443, so why is 80 specified in the install? Even though SRM uses SSL when it communicates with VC, it does not use port 443. SRM establishes a TCP connection to port 80, then uses an HTTP CONNECT request to establish a tunnel to the VC server, and then does an SSL handshake with VC over that tunneled connection. The SRM installation enforces these semantics.
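
The connect, tunnel, handshake sequence described above is ordinary HTTP CONNECT tunneling, and can be sketched in a few lines. This is not SRM code: the function names are made up, and the target that SRM actually names in its CONNECT request is not given in this FAQ, so it is left as a parameter.

```python
import socket
import ssl
from typing import Optional


def build_connect_request(target_host: str, target_port: int) -> bytes:
    """Build a plain HTTP CONNECT request for the given tunnel target."""
    authority = f"{target_host}:{target_port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """Return True if the response status line reports a 2xx result."""
    parts = response_head.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].startswith(b"2")


def open_tunnel(vc_host: str, target_host: str, target_port: int,
                context: Optional[ssl.SSLContext] = None) -> ssl.SSLSocket:
    """TCP to port 80, HTTP CONNECT, then an SSL handshake over the tunnel."""
    sock = socket.create_connection((vc_host, 80), timeout=30)
    sock.sendall(build_connect_request(target_host, target_port))
    head = b""
    while b"\r\n\r\n" not in head:
        chunk = sock.recv(4096)
        if not chunk:
            break
        head += chunk
    if not tunnel_established(head):
        sock.close()
        raise ConnectionError("tunnel was refused")
    context = context or ssl.create_default_context()
    # The handshake happens here, inside the connection opened on port 80.
    return context.wrap_socket(sock, server_hostname=vc_host)
```

The point of the sketch is that the traffic is still encrypted even though only port 80 is ever opened, which is why the installer insists on 80.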

I need to rescan my storage twice before I actually see my LUNs. Can SRM also do this?
To enable the additional rescan, edit the vmware-dr.xml file at both the protected and recovery sites to add a <hostRescanRepeatCnt> element within the <SanProvider> element. Set the value of <hostRescanRepeatCnt> to 2, as shown in the following example:

<SanProvider>
.
.
.
<hostRescanRepeatCnt>2</hostRescanRepeatCnt>
</SanProvider>

For SQL server use, does the SRM DB user need the DB_OWNER permission?
For SQL server, the SRM DB user does not need the DB_OWNER permission. As long as the schema has the same name as the username, is the default schema for that user, and is owned by that user, you are ok.

Unexpected MethodFault (dr.san.fault.ManagementSystemNotFound)
This error occurs after you upgrade the EqualLogic PS Series Interface SRA adapter to the Dell EqualLogic PS Series Interface. You can uninstall the new SRA and install the old one as a workaround, but there is another option: locate the manifest.xml file in the SRA installation directory, modify the SRA name in it, and restart the SRM service.

The password of my SRM account has changed how do I change the password for SRM?
You can have some issues with changing account passwords after everything is working. In theory you can use the installcreds.exe file, but it has been reported to not always work. In the near future there will be an update to make this process easier, but for now you must use the srm-config.exe command. When it is complete you will be able to restart the SRM service and restore communication between the SRM servers (you will need to repair the communication by doing the pairing again). The format of this command is complex. You must run it twice: the first time to obtain a thumbprint, and the second time to actually make the change. Below is a sample command line. This utility is found in the bin directory of the c:\program files\VMware\VMware Site Recovery Manager\config folder. You can find parameter values (such as the value for -sitename) in the vmware-dr.xml file found in the config folder.

srm-config.exe -cmd confuserbased -sitename <local site name> -cfg <SRM configuration file> -u <username> -vc <host[:port]> [-thumbprint <sha-1 server certificate thumbprint>]
srm-config.exe -cmd confuserbased -sitename srm-primary -cfg vmware-dr-primary.xml -u administrator -vc 10.10.10.10 -thumbprint 96:E0:E8:F5:59:1C:BF:6D:81:6C:A2:AB:51:76:24:DE:31:D1:E8

Without the password you will need to use the thumbprint. So run this command the first time without the thumbprint parameter and you will be shown the thumbprint, then run it again with the thumbprint.

If your site name contains spaces enclose the name in quotes.
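
The two-pass usage and the quoting rule can be sketched as follows. The flags are the ones shown in the sample command line above; the helper name and the site name "SRM Primary Site" are made up for illustration, and the sketch only builds the command text rather than running anything.

```python
import subprocess
from typing import List, Optional


def srm_config_args(site_name: str, cfg_file: str, username: str,
                    vc_address: str, thumbprint: Optional[str] = None) -> List[str]:
    """Build the srm-config.exe argument list (hypothetical helper).

    Leave `thumbprint` out for the first pass, which reports the thumbprint,
    and pass it in for the second pass, which makes the change.
    """
    args = ["srm-config.exe", "-cmd", "confuserbased",
            "-sitename", site_name, "-cfg", cfg_file,
            "-u", username, "-vc", vc_address]
    if thumbprint:
        args += ["-thumbprint", thumbprint]
    return args


first_pass = srm_config_args("SRM Primary Site", "vmware-dr-primary.xml",
                             "administrator", "10.10.10.10")
# list2cmdline applies Windows quoting, so the site name with spaces gets quotes.
print(subprocess.list2cmdline(first_pass))
```

Keeping the arguments as a list and letting `list2cmdline` do the quoting avoids hand-placed quotes around a site name that contains spaces.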

Last Updated on Tuesday, 04 May 2010 09:19