OpsMgr R2 XPlat Agent Deployment - Field Notes

I was recently at a customer site to implement a fresh installation of Operations Manager R2 in their production environment. What made this deployment even more interesting for me was that management of the UNIX servers was required.  So I have put together some pointers that will hopefully help you if you have any problems rolling out the x-plat agent.  The primary flavours of UNIX I was working with were Solaris 8/9/10 and Red Hat 5.
 
Firstly, I must say that overall the deployment of x-plat agents went well, and if it wasn't for some patching requirements on the Solaris servers the roll out would have been as smooth as the Windows agent roll out.  This was a great experience and I think MS have done really well at this first attempt at x-plat management.
 
So, here are some of the minor issues I came across during the roll out, along with their fixes.
 
1.  Certificate errors
There are a number of different certificate-based errors you can receive, but most of them will be caused by one of two issues.
  • An incorrect certificate already exists
Check the certificate path to see if a certificate already exists.
/etc/opt/microsoft/scx/ssl
If a certificate already exists but is not required, delete the ssl directory and re-attempt the agent installation.

Check the certificate has been created with the correct details.  Run the commands below and check the subject contains the correct server host name information as well as the cert issuer and the notBefore and notAfter dates.

openssl x509 -noout -in /etc/opt/microsoft/scx/ssl/scx.pem -subject -issuer -dates

/opt/microsoft/scx/bin/tools/scxsslconfig -v
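
If you have a lot of servers to check, the subject comparison above could be scripted roughly as follows. This is only a sketch under a few assumptions: it needs a POSIX shell such as bash or ksh (the old Solaris /bin/sh will not accept the $( ) syntax), the function name subject_mentions_host is made up for illustration, and a simple substring match against the output of uname -n is a crude stand-in for reading the subject yourself.

```shell
#!/bin/sh
# Sketch: does the agent certificate subject mention this server's host name?
# CERT is the certificate path given above; everything else is illustrative.
CERT=/etc/opt/microsoft/scx/ssl/scx.pem

# Hypothetical helper: returns 0 if the subject text ($1) contains the
# host name ($2), ignoring case. This is a substring match, nothing smarter.
subject_mentions_host() {
    _subj=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    _host=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$_subj" in
        *"$_host"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r "$CERT" ]; then
    cert_subject=$(openssl x509 -noout -in "$CERT" -subject)
    # Use the first argument if given, otherwise the node name of this server
    expected=${1:-$(uname -n)}
    if subject_mentions_host "$cert_subject" "$expected"; then
        echo "OK: certificate subject mentions $expected"
    else
        echo "MISMATCH: '$cert_subject' does not mention $expected"
    fi
    # Still print the issuer and validity dates for a manual look
    openssl x509 -noout -in "$CERT" -issuer -dates
else
    echo "No readable certificate at $CERT"
fi
```

Passing the fully qualified name as the first argument gives a stricter check than the default short name, which is worth doing if a missing domain name is what you suspect.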


If the certificate has been created incorrectly you can create a new certificate and force it to overwrite the old one.  Then restart the agent service.

 

/opt/microsoft/scx/bin/tools/scxsslconfig -f
/opt/microsoft/scx/bin/tools/scxadmin -restart

If the certificate is still incorrect, it will probably be due to a host name configuration issue, which can usually be resolved by checking and editing the resolv.conf file (see Host name configuration issues below).

  • Host name configuration issues
A common issue is that the host name configuration of the UNIX/Linux server is not set correctly.  It is quite possible for this information to be incomplete or not fully correct while the server still functions normally; however, OpsMgr requires it to be correct in order to generate a good certificate.  All host name config issues I have experienced have been related to incorrect details in the resolv.conf file, such as an incorrect name or, more commonly, a missing domain name.  Resolve host name issues and you'll get a lot further with your agent deployment.

/etc/resolv.conf
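
A quick way to spot the missing domain name case is to look for a domain or search line in the file. The sketch below assumes a POSIX shell and a POSIX grep (on older Solaris releases that means /usr/xpg4/bin/grep rather than /usr/bin/grep); the function name resolv_has_domain and the RESOLV variable are made up for illustration. It only tells you that an entry is present, not that the value is the right one.

```shell
#!/bin/sh
# Sketch: show the node name and whether resolv.conf names a domain.
# RESOLV can be overridden to point at a copy of the file.
RESOLV=${RESOLV:-/etc/resolv.conf}

# Hypothetical helper: returns 0 if the file ($1) has a "domain" or
# "search" line with at least one value after the keyword.
resolv_has_domain() {
    grep -Eq '^[[:space:]]*(domain|search)[[:space:]]+[^[:space:]]' "$1" 2>/dev/null
}

echo "Node name: $(uname -n)"
if resolv_has_domain "$RESOLV"; then
    # Print the matching lines so the values can be checked by eye
    grep -E '^[[:space:]]*(domain|search)[[:space:]]' "$RESOLV"
else
    echo "No domain or search entry found in $RESOLV"
fi
```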

 

2.  Prerequisite issues
Sometimes when pushing out the agent you get very helpful messages back stating that the target server does not meet the required prerequisites; however, sometimes the error isn't quite so friendly.  I have found that it is quite common, when an agent deployment fails, to receive a message stating that multiple logins for root authentication failed, but this isn't strictly the reason for the failure.  If the target server does not have the OpenSSL prerequisite installed, an SSH connection may not be able to be established, which means the discovery script or agent installation kit cannot be securely copied over to the server.  At this point OpsMgr does not know that OpenSSL is not installed, so it returns the authentication error.

I discovered the prerequisite issue by attempting a manual install of the agent, which provided much better error information, so that might be worth trying if you are experiencing agent deployment issues that you can't get to the bottom of.  The manual agent install guide is available here: http://technet.microsoft.com/en-us/library/dd789016.aspx.
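
Before going as far as a manual install, it may be worth logging on to the target server and confirming that OpenSSL is actually there. The sketch below assumes a POSIX shell; have_cmd is a made-up helper name. It only looks on the current PATH, so a copy of openssl installed somewhere else will be reported as missing, and it says nothing about the other prerequisites or patch levels.

```shell
#!/bin/sh
# Sketch: is the openssl command available on this server's PATH?

# Hypothetical helper: returns 0 if the named command can be found.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd openssl; then
    echo "openssl found at: $(command -v openssl)"
    # Print the version so it can be compared with the agent requirements
    openssl version
else
    echo "openssl not found on PATH: $PATH"
fi
```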


3.  Agent deployment failed -2147024809
I'm actually still not sure about this one.  I have received it a few times on the first attempt of a push agent installation, but have resolved it every time by pushing the agent out again.  I am still investigating and will update my blog if I have any findings.
 
If you are experiencing different issues or my fixes do not work for you, try enabling OpsMgrModuleLogging.  This is easily done and provides much more detailed information as to why an agent deployment failed.  All you need to do is place a file named EnableOpsMgrModuleLogging in the %windir%\temp directory on the management server that will manage the x-plat server.  Once the file is in place, attempt to deploy the agent again and log files will appear in the %windir%\temp directory as if by magic ;-).  I will be doing a web cast of this in the next few days and will also post it on my blog.

Hopefully this post will be helpful to you if you are experiencing issues with your x-plat deployment.

David