Archive for the ‘Management’ Category

Application Troubleshooting: An Error Message, Without Context, is Worthless


Note: This blog contains free stock photos that were too hilarious to pass up.

I was inspired to write this article by an internal discussion in which someone requested a comprehensive list of all possible event log messages delivered in Windows Server. I have always been interested in the reasoning behind an “ask” such as this because it could be misguided. Where to begin? First of all, it is quite a loaded “ask”: it does not specify whether we are talking only about the in-box default error messages of Windows, the operating system itself, or also about all of the roles and features that contain their own event providers. Second of all, what is the purpose? Where would all of these go? What would this be used for?

iStock-Unfinished-Business-10

The answers always baffle me: “Oh, for the help desk.” “As a reference.” “For our in-house troubleshooting guide.” The last example often leads to a bi-directional, obtuse conversation in which I ask again what the use cases and contexts of these error messages are. Will the person requesting the error messages also attempt to follow up on each one to determine all of the possible situations in which it may occur? That would be quite an insurmountable task. In most cases, no: the messages will simply be copied and pasted into a “guide”, sometimes even printed out as a very thick, but mostly useless, reference.

HRESULTS vs. Event Logs

First of all, the event log message tells you everything you need to know about the ERROR itself. This has evolved significantly over the past decade with advancements in Windows. Often, you will get errors/exceptions thrown from an application, and the level of description will depend on the generosity of the programmer/developer of the application. One thing is for sure – at the very minimum, in most cases, an integer-based or hex-based response will occur. The most common of these are HRESULTs. A few years ago, due to the overlap between operating system components and external software, a tool called ERR.EXE was made available on the Microsoft Download Center (http://www.microsoft.com/en-us/download/details.aspx?id=985). This tool parsed all of the known header files for Windows and its applications to provide the corresponding strings and descriptions of known HRESULTs and error codes within the Microsoft software ecosystem. The utility would yield all known matches, such as the example below:

err1

This utility was especially useful because it could also automatically translate the decimal-based equivalent of an HRESULT:

err2
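For readers who want to see what that decimal-to-hex translation involves, here is a minimal Python sketch. It is not how ERR.EXE works internally (that is not shown here); it only applies the standard HRESULT bit layout (severity bit, 11-bit facility, 16-bit code). The function name decode_hresult is my own, made up for illustration.

```python
def decode_hresult(value: int) -> dict:
    """Split an HRESULT into its standard bit fields (illustrative helper)."""
    # Logs and scripts often show the signed 32-bit decimal form;
    # masking to 32 bits yields the familiar unsigned hex form.
    unsigned = value & 0xFFFFFFFF
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # severity bit (1 = failure)
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility code
        "code": unsigned & 0xFFFF,             # 16-bit status code
    }


if __name__ == "__main__":
    # -2147024891 is the signed decimal form of 0x80070005
    print(decode_hresult(-2147024891))
```

Facility 7 is FACILITY_WIN32 and code 5 is the Win32 error ERROR_ACCESS_DENIED, which is why 0x80070005 reads as “Access Denied.” The decoding tells you what the error is, not why it happened.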

But . . . Here’s the Thing. Where do you go from Here?

Well, how did you get there? What was occurring when you encountered that error? For example: Let’s say you get an HRESULT 0x80070005. You use the ERR tool (or you know from memory) to find that it is “Access Denied.” What was occurring when this happened? Let’s say this occurred within an application. You could then leverage Process Monitor or another tool to trace the issue and see which file, registry entry, etc. the program was trying to access. There is no way to give anything more than a generic recommendation without additional context related to the error.

In the case of an event log entry, there is much more information: the source component, Event ID, description, and more, along with additional XML detail. In the example below, what else would you gather by simply looking up Event ID 36888, other than what you see in the dialog boxes?

event1

That is where the scenario in which it happened plays an important role in determining the root cause of the error. In some cases, only one situation warrants a particular error and resolution. These are the ones we love – yet they are rare. In most cases there will be more than one scenario.

Why are Some Event Messages More Descriptive than Others?

As with application errors, event log entries vary in their degree of detail. In many cases, during the development and evolution of a component, particular errors are hypothetically conceived during the development cycle and are laid out with the event tracing framework. These get further nailed down in the test and beta phases, and the events are adjusted accordingly. At release time, as many clear-cut, known issues as possible are mapped out through release notes and knowledge base articles. This process continues through the lifecycle of the component or software.

iStock-Unfinished-Business-9

But again, these are only events. On all of the machines I have ever worked with, I would venture to say that I have only ever encountered a small fraction of all of the potentially available Windows events which warrant errors or warnings. There are easily over 200,000 ETW and legacy Windows events. Why reinvent the wheel and create a huge list for searching when you already have Bing at your fingertips? And with Bing, you can search the event\error together with additional context.

What is probably the most beneficial troubleshooting assistant is the known resolution – in the form of a knowledge base article. The “Symptom(s)” section of most types of knowledge base articles is where the context is mapped out with as much detail as possible. It is usually in the form of “You are running [SOFTWARE\COMPONENT] and you are performing [ACTION]” or “You are attempting to [CONFIGURE\LAUNCH\RUN\CLICK ACTION] and you encounter the following [ERROR\MESSAGE].”

iStock-Unfinished-Business-2

The Symptom(s) section is usually followed by the “Cause” section. Often this section is prefaced with “This issue may be caused by . . .” indicating that this may not be the only cause of the issue. Yet this particular cause will be remediated\resolved by the “Resolution” section. That format is what makes the knowledge base article so great. It cuts right down to the break-fix scenario. Symptom-Error-Cause-Resolution.

Building knowledge bases “from the error up” is not an optimal way of building out a framework for a help desk. Knowledge bases are built and driven by the overall experience of the issue itself. Having worked in support for many years, I found that one of the many elements that separated the true escalation engineers from what could be automated with a diagnostic utility or handled by front-line support was the ability to resolve an issue for the first time. While the first question was usually “What’s the Error Message?” it was the second question that was most important – “What were you doing?”

iStock-Unfinished-Business-3

Manageability Lingo, Standards, & Acronyms, Oh My!

January 10, 2015

acronyms

Since the dawn of the 21st century (and even before), you have been hearing many acronyms used interchangeably to describe manageability features within Microsoft products (as well as others). For example, WMI has been at the heart of most Microsoft manageability products and solutions, given that it is one of the primary interfaces within the Windows operating systems. While Microsoft’s WMI ties mostly to its products, it is based upon a series of open, universal standards. And this is the heart of deciphering how acronyms and standards can be interchangeably used to describe the same entity.

So let’s weave through the sometimes confusing relationship between these manageability acronyms for protocols, interfaces, and standards: WBEM, WMI, CIM, DMI, DMTF, WSMAN, WinRM, and SNMP. In this little game, I will try to go through these acronyms within the average blog post attention span. WMI is Microsoft’s implementation of the open Web-Based Enterprise Management (WBEM) standard, which comes from the Distributed Management Task Force (DMTF, an industry organization). WBEM relies on protocols – which can come from legacy standards such as RPC (Remote Procedure Call) or DCOM (Distributed Component Object Model), or from more modern HTTP-based SOAP protocols such as WinRM (Windows Remote Management), which is based on the WS-MAN (Web Services Management) standard. SOAP (Simple Object Access Protocol) itself is a messaging protocol designed to be used with HTTP (Hypertext Transfer Protocol – or the web) or SMTP (Simple Mail Transfer Protocol – or internet email).

The WMI interface – based upon the WBEM standard – is built upon an infrastructure centered upon the Common Information Model (CIM) and its respective Object Manager (CIMOM), which is what links management applications and providers. The infrastructure also serves as the object-class store and, in many cases, as the storage manager for persistent object properties. WMI implements the store, or repository, as an on-disk database named the CIMOM Object Repository. As part of its infrastructure, WMI supports several APIs through which management applications access object data and providers supply data and class definitions.

Beyond WMI, WBEM’s architecture extends to a variety of underlying technologies besides WMI and Win32, because not everything is or will always be on Microsoft technologies. These include the Desktop Management Interface (DMI) and the Simple Network Management Protocol (SNMP). Some of these standards define data storage schemas as well as interfaces. Some define commands within communication protocols. Some are more modernized. SNMP has been deprecated in the most recent versions of Windows in favor of technologies such as WinRM.

I like to sum up the relationship of WinRM and WMI (alongside their open counterpart standards WS-MAN and WBEM) by stating that WinRM is a management protocol and WMI is a management interface.
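If it helps to see those relationships as data, here is a small Python sketch of a glossary. The Term type, the GLOSSARY table, and the describe function are all made up for illustration, and the one-word “role” labels follow this post’s simplified framing rather than any formal taxonomy.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    expansion: str
    role: str                  # "protocol", "interface", or "model" (this sketch's labels)
    implements: Optional[str]  # open standard a Microsoft technology implements, if any


GLOSSARY = {
    "WBEM":   Term("Web-Based Enterprise Management", "interface", None),
    "CIM":    Term("Common Information Model", "model", None),
    "WS-MAN": Term("Web Services Management", "protocol", None),
    "SNMP":   Term("Simple Network Management Protocol", "protocol", None),
    "WMI":    Term("Windows Management Instrumentation", "interface", "WBEM"),
    "WinRM":  Term("Windows Remote Management", "protocol", "WS-MAN"),
}


def describe(acronym: str) -> str:
    """Render one glossary entry as a sentence."""
    term = GLOSSARY[acronym]
    text = f"{acronym} ({term.expansion}) is a management {term.role}"
    if term.implements:
        text += f"; it is Microsoft's implementation of {term.implements}"
    return text


if __name__ == "__main__":
    for name in ("WMI", "WinRM"):
        print(describe(name))
```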

industry

To Read more, check out the standards themselves:

WMI Explorer 2.0 is now on Codeplex!

November 11, 2014

When I worked in support, I troubleshot WMI quite a bit using many tools. One tool whose ongoing development I kept an eye on – and still do – is the WMI Explorer utility. I am happy to report that a new version of this excellent troubleshooting tool for WMI is now available:

WMI Explorer 2.0 is now available for download:

https://wmie.codeplex.com/

Requirements:

Microsoft .NET Framework 4.0 Full or .NET Framework 4.5.1
Minimum display resolution: 1024×768
Administrator rights to view some WMI objects
(Optional) Internet access for automatic update check

This is a very intuitive tool for visually troubleshooting WMI issues. It gives you a direct view into the WMI namespace.

New Features include:

New: Asynchronous mode for enumeration of classes and instances in the background.
New: Method execution.
New: SMS (System Center Configuration Manager) Mode.
New: Property tab showing properties of selected class.
New: Input & Output parameter information in Methods tab with Help information.
New: List View output mode for Query Results.
New: Update Notifications when a new version of WMI Explorer is available.
New: Connect to multiple computers at the same time.
New: Quick Filter for Classes and Instances.
New: User Preferences.
New: View WMI Provider Process Information.
Improved: UI display on higher scaling levels and resolution.
Improved: Connect As option to provide alternate credentials.
Improved: Display of embedded object names in Property Grid.

NOTE: This is not an official Microsoft tool, and is available “AS IS” with NO support.

Categories: Management, WMI

Are you *sure* you need to rebuild the WMI Repository?

February 5, 2014

To continue the subject of WMI troubleshooting: I am always frustrated at the quick, shotgun method of rebuilding the WMI repository as a rote, rudimentary troubleshooting step. This is very dangerous and risky. Rebuilding the WMI repository manually has resulted in some 3rd-party products not working until they are reinstalled, and IN SOME CASES even this does not work. This is especially a shame since a rebuild may not always be necessary. Even if it were necessary, with severe WMI corruption it may almost be better to do an “in-place” upgrade of the OS or a complete reinstallation of the operating system and software, as incomplete repositories can yield lingering problems.

Before you go down that road, ask yourself the following:

1.) Have I properly troubleshot the error to the point that the only possible source could be corruption?

2.) Have I researched and installed all of the latest service packs and hotfixes related to WMI? (Hint: Go to http://support.microsoft.com and search WMI, hotfix, and <OS>)

3.) Have I gone through the rudimentary WMI checklist so I don’t get burned by simple things such as firewall rules? (Hint – https://madvirtualizer.wordpress.com/2014/01/22/the-importance-of-troubleshooting-wmi-part-2/)

4.) Have I run WMIDiag (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=7684) and received really bizarre errors like 0x80041010 WBEM_E_INVALID_CLASS, or does it fail to connect at all?

5.) Do I fail to connect to WMI Control from Computer Management?

If you answered “yes” to 4 out of the 5 above, here are some steps you can do to safely troubleshoot the WMI repository using “soft” actions:

Instructions for Operating Systems Prior to Windows Vista (XP/2003)

1. Open the Services MSC console and stop the Windows Management Instrumentation service.

2. You will be prompted to stop dependent services. Click “Yes” on the prompt.

3. Once the service is stopped, browse to %SystemRoot%\System32\wbem. Rename the Repository folder to Repository.old.

4. Once the folder has been renamed, restart the WMI and dependent services in the following order:

Windows Management Instrumentation

SMS Agent Host (If SCCM/SMS is installed)

Windows Firewall / Internet Connection Sharing (ICS)

Any other services that were detected as dependencies

5. Once services have restarted, verify the Repository folder has been recreated. Now bear in mind, it could take up to one hour for WMI to fully rebuild.

Instructions for Windows Vista and Later (2008/Win7/2008 R2/Win8)

1.  Open an elevated command prompt.

2.    Verify the WMI repository is not corrupt by running the following command:

winmgmt /verifyrepository

If the repository is not corrupted, a “WMI Repository is consistent” message will be returned. If you get something else, go to step 3. If the repository is consistent, you need to troubleshoot more granularly. The repository is not the problem.

3. Run the following commands to repair WMI:

winmgmt /salvagerepository

If the repository salvage fails to work, then run the following command to see if it resolves the issue:

winmgmt /resetrepository

After the last command, there should be a “WMI Repository has been reset” message returned that verifies the command was successful.

Even the above commands should be a last resort if you are getting an error related to “Access Denied” or “RPC Server unavailable.”
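The Vista-and-later sequence above can be sketched as a small decision helper. This is only an illustration in Python: the function names are mine, and the matching assumes the verify output contains the phrase “repository is consistent” as described above. Only the three winmgmt switches already shown in this post are used.

```python
import subprocess
import sys
from typing import List

# Phrase expected in the verify output when all is well (assumption based on
# the message quoted in this post; compared case-insensitively).
CONSISTENT_MARKER = "repository is consistent"


def repository_is_consistent(verify_output: str) -> bool:
    """True if the text from 'winmgmt /verifyrepository' reports consistency."""
    return CONSISTENT_MARKER in verify_output.lower()


def soft_repair_plan(verify_output: str) -> List[str]:
    """Return the 'soft' commands to try, in order.

    An empty list means the repository is not the problem.
    Run the second command only if the first does not resolve the issue.
    """
    if repository_is_consistent(verify_output):
        return []
    return ["winmgmt /salvagerepository", "winmgmt /resetrepository"]


def run_verify() -> str:
    """Run the verify step (Windows only, from an elevated prompt)."""
    result = subprocess.run(
        ["winmgmt", "/verifyrepository"], capture_output=True, text=True
    )
    return result.stdout + result.stderr


if __name__ == "__main__" and sys.platform == "win32":
    plan = soft_repair_plan(run_verify())
    print(plan or "Repository is consistent; troubleshoot more granularly.")
```

The helper deliberately only prints a plan rather than running the salvage or reset for you, since those steps should stay a conscious decision.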

Categories: Management, WMI

The Importance of Troubleshooting WMI Part 2

January 22, 2014

To continue my discussion regarding the importance of troubleshooting WMI, I want to move the focus to devising a targeted approach when troubleshooting so you can minimize the time it takes to zero in on the issue.

WMI issues generally fall into the following areas:

Configuration Issues: These are issues relating to the configuration of WMI on the local (or mostly remote) machine including:

•    DCOM Security\Permissions or Configuration
•    Firewall Configuration
•    WMI namespace security

Infrastructure Issues: These are issues related to WMI components including:

•    WMI service setup
•    DCOM registration problems
•    Missing WMI classes
•    Improper WMI provider registration
•    Missing System files
•    WMI repository corruption (*GASP*)
•    Deleted WMI repository (*HEADDESK*)

WMI Managed Entity Issues: These may be issues related to the extensible WMI components including:

•    Security requirements
•    Not running (e.g., service, application) or uninstalled application
•    External dependencies

As I mentioned in my last article, you obviously want to verify your firewall rules (which have been built into versions of Windows since Windows XP).

WMI (ASync) Properties – In
Program: %SYSTEMROOT%\System32\WBEM\unsecapp.exe

WMI (DCOM) – In
Port: TCP 135
Program: %SYSTEMROOT%\System32\svchost.exe

WMI (WMI) In-Out
Program: %SYSTEMROOT%\System32\svchost.exe

Then you will want to zero in on the error itself.

0x800706BA – RPC Server Unavailable

When this error appears during connecting to a WMI namespace:

•    The machine does not exist.
•    The machine cannot respond because the appropriate firewall exceptions have not been made. Check firewall settings.

When this error appears during operation it could be:

•    The client machine doesn’t have correct firewall settings for asynchronous call backs.
•    Connecting to a machine which doesn’t exist.

0x80070005 – E_ACCESSDENIED

When this error occurs during connecting to a WMI namespace –
•    The username/password does not exist.
•    The user does not have the remote launch or remote activation options set.
•    Check dcomcnfg.exe under the COM Security Tab.

When this error occurs during operation –
•    The specific user does not have the DCOM permissions.
•    Minimum authentication level needed for the namespace is more than what is used.
•    Check the settings on the Default Properties tab of DCOMCNFG.EXE.

0x80041003 – WMI Access Denied

During connecting to a WMI namespace – The user does not have the appropriate WMI permissions on a namespace.  Check WMIMGMT.MSC and permissions for that namespace.

During operation – Specific user doesn’t have WMI access permissions.

0x80041001 – Unknown Error

Ah, the UNKNOWN ERROR. Often this is caused by a 3rd-party provider or non-OS software that extends the repository and has either been removed from the environment, leaving WMI subscriptions behind in the repository, or is malfunctioning.
Enable WMI Verbose logging on the server and review the WMI logs in %SYSTEMROOT%\system32\wbem\logs.   The Wbemess log will show which WMI subscription was sending notifications when the criteria were met.

You will need to follow the steps below to remove the WMI subscriptions once you isolate them:

1. Click Start, click Run, and type Wbemtest. Then type root\cimv2\applications\ as the namespace and click the “Connect” button.
2. Click on ‘Enum Classes’, click the Recursive radio button, click OK.
3. Scroll down until you see the __FilterToConsumerBinding class.  Double-click on it.
4. Click the “Instances” button on the right hand side.
5. Choose those you isolated and click on the delete button.

When you retrieve a managed resource in a WMI script, the CIMOM (WMI service) looks for the managed resource’s blueprint (class definition) in the default namespace if no namespace is specified. If the CIMOM cannot find the managed-resource class definition in the default namespace, a WBEM_E_INVALID_CLASS (0x80041010) error is generated.

0x8007000E – Not enough Storage is available to complete this process

This usually indicates a problem with a provider, handle leak, memory leak, or other problem tied to WMI functionality.
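To make the “error plus context” idea concrete, here is a rough Python sketch that keys the likely causes on both the error code and the phase in which it appeared. The table only restates the causes listed above; the names CAUSES and likely_causes and the phase labels "connect" and "operation" are made up for this example.

```python
from typing import Dict, List, Tuple

# (error code, phase) -> likely causes, restating the lists in this post.
CAUSES: Dict[Tuple[int, str], List[str]] = {
    (0x800706BA, "connect"): [
        "The machine does not exist.",
        "Firewall exceptions have not been made on the target machine.",
    ],
    (0x800706BA, "operation"): [
        "Client firewall settings block asynchronous callbacks.",
        "Connecting to a machine which does not exist.",
    ],
    (0x80070005, "connect"): [
        "The username/password does not exist.",
        "Remote launch or remote activation is not granted (COM Security tab).",
    ],
    (0x80070005, "operation"): [
        "The user does not have the DCOM permissions.",
        "The namespace requires a higher authentication level than was used.",
    ],
    (0x80041003, "connect"): [
        "The user lacks WMI permissions on the namespace.",
    ],
    (0x80041003, "operation"): [
        "The user does not have WMI access permissions.",
    ],
}


def likely_causes(code: int, phase: str) -> List[str]:
    """Look up causes for an error code seen during 'connect' or 'operation'."""
    code &= 0xFFFFFFFF  # accept the signed decimal form as well
    fallback = ["No context-specific guidance recorded; gather more detail."]
    return CAUSES.get((code, phase), fallback)


if __name__ == "__main__":
    for cause in likely_causes(0x800706BA, "connect"):
        print(cause)
```

Notice that the code alone is never the key. The same 0x800706BA points at different suspects depending on what you were doing when it appeared.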

Troubleshooting Checklist

1.    Use the WMI Control to ensure that the service is working on the local system.
2.    If the problem involves communicating with a remote system then use the WMI Control to test the ability to connect to the remote system
3.    If the service appears to be working, use verbose logging to see the activity (queries) that is being processed by the service and to identify any problems. You can also use WMICHK and WMIDIAG to check the health of the service and the hosted providers.
4.    For Access Denied type issues verify that the DCOM and WMI Service settings are at default values, and the Network Service account has been granted impersonation rights.
5.    Check the service settings if the WMI service fails to start or if client programs cannot communicate with the service. In some cases you may need to reregister all the modules to recover the service.
6.    If queries appear to be returning an incomplete results set, try increasing the buffer thresholds.
7.    If problems persist, make a backup copy of the existing WMI database (repository), and then try building a new one.

Categories: Management, WMI

Why is it important to Become Familiar with WMI Troubleshooting? Pt. 1

January 19, 2014

Often in Virtualization and Management Products like SCVMM, MED-V, Config Manager, UE-V, and App-V, the symptom of an issue appears in the respective System Center or MDOP product, but the root cause is an anomaly in an underlying operating system component. Often that component is WMI. For this reason, it is invaluable to have a solid understanding of WMI and WMI troubleshooting. WMI can cause problems due to one or more of the following WMI issues:

  • Corrupted repository
  • Incomplete namespace
  • Access Denied
  • Invalid String in WMI property/data
  • Unexpected value
  • Memory leak
  • Code Defect by WMI Provider

One of the most common errors encountered is error 0x800706BA – RPC Server Unavailable.

This error has context. If it is during connecting to a WMI namespace, it is usually because:

  • The machine does not exist.
  • The machine cannot respond because the appropriate firewall exceptions have not been made. Check firewall settings.

If it is during operation, it is likely because:

  • The client machine doesn’t have correct firewall settings for asynchronous call backs.
  • Connecting to a machine which doesn’t exist.

First I would verify the firewall rules. I would make sure the following rules are set:

  • WMI (ASync) Properties – In Program: %SYSTEMROOT%\System32\WBEM\unsecapp.exe
  • WMI (DCOM) – In Port: TCP 135 Program: %SYSTEMROOT%\System32\svchost.exe
  • WMI (WMI) In-Out Program: %SYSTEMROOT%\System32\svchost.exe
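As a quick first check of the DCOM rule above, you could probe TCP 135 from the client side. The Python sketch below is only a rough reachability test, and tcp_port_open is a name I made up. A successful connection shows that the RPC endpoint mapper is reachable, not that WMI itself will work, since DCOM goes on to use dynamically assigned ports.

```python
import socket


def tcp_port_open(host: str, port: int = 135, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the machine you are testing.
    print(tcp_port_open("localhost"))
```

A False result here fits both of the “during connecting” causes listed earlier: the machine does not exist, or a firewall is in the way.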

I deal with WMI problems all the time. I generally follow this little troubleshooting checklist for RPC errors:

  1. Use the WMI Control MMC (WMIMGMT.MSC) to ensure that the service is working on the local system.
  2. If the problem involves communicating with a remote system then use the WMI Control to test the ability to connect to the remote system
  3. For Access Denied type issues verify that the DCOM and WMI Service settings are at default values, and the Network Service account has been granted impersonation rights.
  4. Check the service settings if the WMI service fails to start or if client programs cannot communicate with the service. In some cases you may need to reregister all the modules to recover the service.
Categories: Management, WMI