
Archive for March, 2015

Application Troubleshooting: An Error Message, Without Context, is Worthless


Note: This blog contains free stock photos that were too hilarious to pass up.

I was inspired to write this article by an internal discussion in which someone requested a comprehensive list of all possible event log messages delivered in Windows Server. I have always been interested in the reasoning behind an “ask” such as this because it could be misguided. Where to begin? First of all, it is quite a loaded “ask” because it does not specify whether we are talking only about the in-box default error messages relating to Windows, the operating system specifically, or also about all of the roles and features that contain their own event providers. Second of all, what is the purpose? Where would all of these messages go, and what would the list be used for?

iStock-Unfinished-Business-10

The answers always baffle me: “Oh, for the help desk.” “As a reference.” “For our in-house troubleshooting guide.” The last example often leads to an obtuse, two-way conversation in which I ask again what the use cases and contexts of these error messages are. Will the person requesting the error messages also follow up on each one to determine all of the possible situations in which it may occur? That would be an insurmountable task. In most cases, no: the messages will simply be copied and pasted into a “guide”, sometimes even printed out as a very thick but mostly useless reference.

HRESULTS vs. Event Logs

First of all, the event log message tells you everything you need to know about the ERROR itself. This has evolved significantly over the past decade with advancements in Windows. Often, you will get errors or exceptions thrown from an application, and the level of description will depend on the generosity of the developer of the application. One thing is for sure: at the very minimum, in most cases, an integer-based or hex-based response will occur. The most common of these are HRESULTs. A few years ago, due to the overlap between operating system components and external software, a tool called ERR.EXE was made available on the Microsoft Download Center (http://www.microsoft.com/en-us/download/details.aspx?id=985). This tool parsed all of the known header files for Windows and applications to provide the corresponding strings and descriptions of known HRESULTs and error codes within the Microsoft software ecosystem. Running the utility would yield all known matches, as in the example below:

err1

This utility was especially useful because it could also automatically translate the decimal-based equivalent of an HRESULT:

err2
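The decimal-to-hex translation, and the fields packed inside an HRESULT, can be sketched in a few lines of Python. This is a minimal illustration, not how ERR.EXE itself works; the function name `decode_hresult` is hypothetical. It relies only on the documented HRESULT layout (severity in the top bit, an 11-bit facility, a 16-bit code).

```python
def decode_hresult(value: int) -> dict:
    """Split an HRESULT into its standard bit fields (illustrative helper)."""
    # Logs and scripts often show the signed 32-bit decimal form;
    # masking to 32 bits recovers the familiar unsigned hex form.
    unsigned = value & 0xFFFFFFFF
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # severity bit: 1 means failure
        "facility": (unsigned >> 16) & 0x7FF,  # 11-bit facility (7 = FACILITY_WIN32)
        "code": unsigned & 0xFFFF,             # 16-bit code (5 = ERROR_ACCESS_DENIED)
    }


# The signed decimal form of E_ACCESSDENIED as it might appear in a log.
print(decode_hresult(-2147024891))
```

Even fully decoded, the value only says that a Win32 "access denied" occurred. It says nothing about which resource was denied or what the application was doing at the time.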

But . . . Here’s the Thing. Where do you go from Here?

Well, how did you get there? What was occurring when you encountered that error? For example, let’s say you get an HRESULT of 0x80070005. You learn from the ERR tool (or you know from memory) that it means “Access Denied.” What was occurring when this happened? Let’s say this occurred within an application. You could then leverage Process Monitor or another tool to trace the issue and see which file, registry entry, or other resource the program was trying to access. There is no way to give anything more than a generic recommendation without additional context related to the error.

In the case of an event log entry, there is much more information: the source component, Event ID, description, and more, along with additional XML detail. In the example below, what else would you gather by simply looking up Event ID 36888 beyond what you already see in the dialog boxes?

event1
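To make the “additional XML detail” concrete, here is a minimal Python sketch that pulls the main fields out of an event’s XML view. The sample XML is hand-written for illustration: the computer name, timestamp, and data values are assumptions, not output from a real machine, and `summarize_event` is a hypothetical helper name. Only the element layout and namespace follow the standard Windows event schema.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an event's XML view; values are illustrative.
SAMPLE_EVENT_XML = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Schannel" />
    <EventID>36888</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2015-03-01T12:00:00.000000000Z" />
    <Computer>server01.example.com</Computer>
  </System>
  <EventData>
    <Data Name="AlertDesc">10</Data>
    <Data Name="ErrorState">1203</Data>
  </EventData>
</Event>
"""

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event(xml_text: str) -> dict:
    """Extract the source, ID, level, and named data items from event XML."""
    root = ET.fromstring(xml_text.strip())
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "level": int(system.findtext("e:Level", namespaces=NS)),
        "computer": system.findtext("e:Computer", namespaces=NS),
        "data": {
            d.get("Name"): d.text
            for d in root.findall("e:EventData/e:Data", NS)
        },
    }


print(summarize_event(SAMPLE_EVENT_XML))
```

Every field extracted here describes the error itself. None of them records what the user or application was attempting when the event fired, which is the context a lookup list cannot supply.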

That is where the scenario in which it happened plays an important role in determining the root cause of the error. In some cases, only one situation warrants a particular error and resolution. These are the ones we love, yet they are rare. In most cases there will be more than one scenario.

Why are Some Event Messages More Descriptive than Others?

As with application errors, event log entries vary in their degree of detail. In many cases, particular errors are conceived hypothetically during the development cycle of a component and are laid out with the event tracing framework. These get further nailed down in the test and beta phases, and the events are adjusted accordingly. At release time, as many clear-cut, known issues as possible are mapped out through release notes and knowledge base articles. This process continues through the lifecycle of the component or software.

iStock-Unfinished-Business-9

But again, these are only events. Across all of the machines I have ever worked with, I would venture to say that I have encountered only a small fraction of the potentially available Windows events which warrant errors or warnings. There are easily over 200,000 ETW and legacy Windows events. Why reinvent the wheel and create a huge list for searching when you already have Bing at your fingertips? And with Bing, you can search for the event or error along with additional context.

What is probably the most beneficial troubleshooting assistant is the known resolution, in the form of a knowledge base article. The “Symptom(s)” section of most types of knowledge base articles is where the context is mapped out with as much detail as possible. It is usually in the form of “You are running [SOFTWARE\COMPONENT] and you are performing [ACTION]” or “You are attempting to [CONFIGURE\LAUNCH\RUN\CLICK ACTION] and you encounter the following [ERROR\MESSAGE].”

iStock-Unfinished-Business-2

The Symptom(s) section is usually followed by the “Cause” section. Often this section is prefaced with “This issue may be caused by . . .” indicating that this may not be the only cause of the issue. Yet this particular cause will be remediated\resolved by the “Resolution” section. That format is what makes the knowledge base article so great. It cuts right down to the break-fix scenario. Symptom-Error-Cause-Resolution.

Building knowledge bases “from the error up” is not an optimal way of building out a framework for a help desk. Knowledge bases are built and driven by the overall experience of the issue itself. In my many years working in support, one of the elements that separated the true escalation engineers from what could be automated with a diagnostic utility or handled by front-line support was the ability to resolve an issue for the first time. While the first question was usually “What’s the error message?”, it was the second question that was most important: “What were you doing?”

iStock-Unfinished-Business-3


App-V 5: Do you Still have to Run Process Monitor within the App-V Bubble when Troubleshooting Applications?


If by that question you want to know whether you must start an instance of Process Monitor within the virtual application like you did in 4.x, the answer is no. You could run Process Monitor inside the virtual bubble, but it will not yield many more results. The reason behind this is simple: unlike previous versions of App-V, App-V 5 uses the REAL registry as well as the native file system, NTFS.

In App-V 4.6 and earlier, if you did not launch Process Monitor in the virtual application’s environment (usually through a command prompt), all you would capture related to the virtual application would be operations on file and registry resources outside the virtual environment. In version 5, running Process Monitor as normal will capture access to the actual locations, including the registry, the package store, and the VFS (Virtual File System) COW (Copy-on-Write) locations. Why just that? Because that’s where things “actually” are located. What you will have to understand is that once an operation against the location where the application “thinks” it is located has been hooked in the file system and/or registry, every subsequent operation continues against the real locations only. These include the:

  • Package Store

  • Integration junction points to the Package Store

  • Actual Package Store paths

  • VFS COW (Copy-on-Write) checks and locations

You can see examples of these in the following screenshots from Process Monitor:

Procmon and File Operations:

As you can see above, the initial query goes to the location the virtual application thinks it is browsing (C:\Program Files (x86)\Java), but the content does not natively exist there. The App-V engine compensates for this (through relationships in memory), and the operations are redirected to the appropriate converged locations in both the User VFS COW and the Package Store.
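The redirection described above can be pictured with a small Python sketch. It is purely illustrative and is not how the App-V client is implemented: the real client resolves these mappings from package metadata in memory. The base locations, the single known-folder entry, the `candidate_locations` helper, and the `<PackageGuid>`/`<VersionGuid>` placeholders are all assumptions chosen to mirror typical default paths.

```python
import ntpath

# Assumed default locations for illustration; real deployments can differ.
PACKAGE_STORE = r"C:\ProgramData\App-V"
USER_VFS_COW = r"%LOCALAPPDATA%\Microsoft\AppV\Client\VFS"

# Hypothetical subset: a known-folder prefix and its assumed VFS folder name.
KNOWN_FOLDERS = {
    r"C:\Program Files (x86)": "ProgramFilesX86",
}


def candidate_locations(virtual_path: str, package_id: str, version_id: str) -> list:
    """Return the real locations a virtual path might be redirected to."""
    for prefix, vfs_name in KNOWN_FOLDERS.items():
        if virtual_path.lower().startswith(prefix.lower() + "\\"):
            relative = virtual_path[len(prefix) + 1:]
            return [
                # The per-user copy-on-write location is listed first.
                ntpath.join(USER_VFS_COW, package_id, vfs_name, relative),
                # Then the read-only content in the package store.
                ntpath.join(PACKAGE_STORE, package_id, version_id,
                            "Root", "VFS", vfs_name, relative),
            ]
    return []  # not a path this sketch knows how to map


for location in candidate_locations(
    r"C:\Program Files (x86)\Java\bin\java.exe", "<PackageGuid>", "<VersionGuid>"
):
    print(location)
```

Paths of this shape are what a Process Monitor trace shows after the first query to the path the application “thinks” it is using.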

Procmon and Registry Operations

You will see similar operations when tracing specific activities against registry entries: first an operation against the location where the application thinks it is supposed to be, followed by subsequent operations against the actual state-separated locations in the real registry.

Why does Process Monitor not show every Single Operation to “Virtual Paths”?

The answer is quite simple: the App-V Client takes care of all of this behind the scenes. That is why you will need access to, and an understanding of, the FileSystemMetadata.xml file, as it contains all of the file system mappings for both non-tokenized and tokenized paths. The easiest and most automatic relationships are the KNOWNFOLDERID paths, which automatically resolve to App-V tokenized paths in memory. Non-tokenized paths are handled differently, on process creation.

Altitude Adjustment of ProcMon Driver

When you look at the altitudes of the App-V file system drivers and how they relate to the altitude of the Procmon driver, you can see that the Procmon driver sits at a lower altitude by default.


This might make you explore the possibility of raising the altitude to see if Process Monitor will capture more information. Please be careful doing this, as it could create problems and system instability. Altitudes are managed and allocated by Microsoft (https://msdn.microsoft.com/en-us/library/windows/hardware/dn641617). When developers want to register an altitude for their filter driver, they fill out a special request form. That is how tightly altitudes are controlled in order to prevent instability.