etl | The Mad Virtualizer at Microsoft

On Debugging Virtual Applications: Part 2: Types and Modes

February 21, 2016 madvirtualizer 3 comments

Productive virtual application debugging requires an understanding of the basic fundamentals of debugging compiled software code. For this part of my series on debugging virtual applications, I will be focusing exclusively on these fundamentals. If you are already familiar with these concepts, please allow me to quickly recap these to those readers which may be either not familiar, or only somewhat and looking to solidify these concepts.

Types of Debugging

There are several categories of debugging and the descriptions will vary by vendor, publication, and academic degrees of description. There is almost a guaranteed point of view when it comes to applying it to a specific product or series of products. Being that my discussion primarily revolves around products that run on top of the Windows operating system, my point of view, or slant, is obviously geared towards the types and toolsets that come with Windows.

Live Debugging

Live debugging refers to the mechanism of attaching to a running program or process either invasively or non-invasively. A debugger may attach to a process and wait for exceptions or set a specific breakpoint. The debugger can insert those breakpoints in once attached to the process. The easiest way to think of a breakpoint is to understand its most basic definition: a breakpoint is a place or time at which an interruption or change is made. More information on breakpoints and different breakpoint types within the Windows context can be found here: https://msdn.microsoft.com/en-us/library/windows/hardware/ff538928(v=vs.85).aspx. In addition, live debugging is also commonly used to troubleshoot and analyze code within the developer environment. In those situations the types of breakpoints will vary. For example, you can refer to the examples of breakpoints that are available within the Visual Studio development environment here: https://msdn.microsoft.com/en-us/library/bb161312.aspx. Once attached to the process, a debugger can then step through threads and functions as the application is live.

Print or Trace Debugging

This is the most common method for troubleshooting software applications and operating systems as technically, this can cover a wide scope of methods. An application can run at specific diagnostic levels generating additional output and information that can be collected into a file or database that can be used to isolate and issue. Event traces, log files, debug output all fall into this category. Strictly speaking within Windows, applications can leverage the OutputDebugString or ODS to have an application, service, or operating system component generate what is referred to as “debug spew” and you can use various tools to collect or view this debug trace information. The most popular tool for viewing ODS traces is the Debug View utility (DBGVIEW) from the Sysinternals suite (https://technet.microsoft.com/en-us/sysinternals/debugview.aspx) although this is not the only one. More information on the OutPutDebugString can be found here (https://msdn.microsoft.com/en-us/library/windows/desktop/aa363362(v=vs.85).aspx.)

In addition, there are tools that can hook into the Windows operating system to capture Win32 API and other application functions through the use of a simple user mode monitor (like the API Monitor tool) are even deeper through the use of a kernel-level filter driver (Like Process Monitor.) Literally troubleshooting outside the box and on to the wire – you can use network trafficprotocol analysis tools like Wireshark or Message Monitor (https://www.microsoft.com/en-us/download/details.aspx?id=44226) to capture network traces. These are all forms of trace debugging.

Windows Integrated Tracing and Instrumentation

Prior to Windows Vista, there were Event Logs, ODS tracing, text-based log files, etc. all within Windows each requiring their own tools and APIs. Starting with Windows 2000, Microsoft began incorporating Event Tracing for Windows (ETW) into the operating system and soon, applications and windows components were using this common engine for enabling diagnostics and collecting detailed debug tracing. Viewing of these traces was soon integrated into the Windows Event Viewer and users of App-V 5 are able to often resolve issues using this very mechanism.

The instrumentation mechanism is discussed here ( https://msdn.microsoft.com/en-us/library/zs6s4h68(v=vs.110).aspx) and probably the greatest technical reference on ETW can be found here (https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803(v=vs.85).aspx)

Remote Debugging

Remote Debugging is a form of live debugging where the process of debugging occurs on a system that is different from the debugger. In most Windows cases, this is where there is an issue that needs to be debugged at the kernel mode level prior to the completion of an operating system boot or a system level crash. To start remote debugging, a debugger connects to a remote computer over a network or via a serial cable. The debugger can then control the execution of the program on the remote system and retrieve information about its state. In Windows, this is often done serially or via Firewire.

Post Mortem Debugging

Post Mortem debugging is a very common method of troubleshooting problems within software because it involves viewing a historical point-in-time snapshot of a hang, system, or application crash. This is where a debugger will read in a snapshot of debugging data called a “dump file” which contains existing memory and instruction pointers. The degrees of debugging depend on how much data is collected in the dump file as dump files can vary in what they collect. When it comes to application and system dumps, these can be controlled by the operating system’s default handlers (once called Dr. Watson for user mode applications) as to what information is collected in the dump file.

I was first introduced to the concept of post-mortem debugging reading an article by Matt Pietrek way back in 1992 in Dr. Dobbs Journal. Matt historically is one of the earliest writers on the subject of Windows debugging going back nearly three decades. The amazing thing is you can still read this article I am citing as it is available online: http://www.drdobbs.com/tools/postmortem-debugging/184408832

Execution Modes of Debugging in Windows

When we are speaking of execution modes in Windows, were talking about code that runs either in user mode or kernel mode. The execution mode affects the methodologies and tools you will leverage in order to properly debug the issue. Software is ultimately driven by the processor (CPU.) For a computer running Windows, the CPU runs in two different modes – user mode and kernel mode. The CPU switches between the two depending on the code.

Kernel Mode

The kernel and other operating system components run in kernel mode, hence the term. Rather than a macrokernel like other operating systems, Windows runs a smaller microkernel that runs as process SYSTEM. Like an application loads and uses DLL (dynamically linked libraries) the kernel also loads special modules called executive components and/or filter drivers alongside device drivers. There is essentially only one process running and that is what shows up in the Windows task Manager as “System” and if this application crashes . . . well . . . so does the entire computer. With debugging, when we are debugging in kernel mode, we are essentially debugging this process – however, it also serves as the governor of all of the other processes running on the system in user mode. All code that runs in kernel mode runs in a single virtual address space. This means that a faulty kernel-mode driver is not isolated from other drivers and the operating system itself.

User Mode

Regular applications, middleware, plug-ins, and most services run in user mode. When you start a user-mode application, Windows creates a process for the application. This process will execute one or more threads. I use the description of the process itself being innate in nature. It just owns a private virtual address space, a private handle table, and contains at least one primary thread for execution. This description of a process comes from Jeffrey Richter who has written many books on Win32 programming. Because these processes are isolated from each other, an application is unable to screw up the operation of another separated process if it crashes. Other applications and the operating system are not affected by the crash. Data can be exchanged between these processes through interprocess communication mechanisms but they cannot directly write to address spaces directly. Limiting the virtual address space of a user-mode application prevents the application from altering, and possibly damaging, critical operating system data.

App-V Tie-ins

The App-V product is especially complex when using it as an example because it contains code at both kernel and user mode. The App-V client engine consist of kernel level drivers, a primary service, and user-mode DLL’s that are injected into the processes of virtualized applications.

Next Up: Debugging Misbehaving Application Scenarios

Categories: Uncategorized Tags: AppV, appv5, dbgview, Debugging, etl, etw, procmon, Win32, windbg

App-V: On Operational Troubleshooting of the V5 Client

November 13, 2013 madvirtualizer 10 comments

Update 12/6/2014: The App-V 5 logs have been consolidated and are organized differently as of App-V 5 SP3 – but the event and event formats are still the same. Please read more about it here: http://technet.microsoft.com/en-us/library/dn858700.aspx#BKMK_event_logs_moved

Whenever I am doing customer engagements, I find it always valuable to ensure that the customer has a plan in place for ongoing support. Having my origins in support myself, I can relate to the customers concerns with impact to their helpdesk. Especially when a new product has been introduced in their ecosystem.

In the world of App-V, we divide up troubleshooting problems in two ways: 1) troubleshooting the operational client engine and 2) troubleshooting the application itself. IT departments relegate level 1 operational troubleshooting to their helpdesk team. This requires training your helpdesk with FAQs, CubeNotes, flowcharts, etc. to resolve basic issues and to at the very least, gather the necessary data to help higher level escalation teams isolate the problem.

Training or Retraining your HelpDesk on App-V 5

The App-V Client will be a very different experience for your help desk in terms of toolkit. Conceptually, they will be doing a lot of the same thing but the way they do it will be different.

Package Repair

Repairing a package will still be needed, but instead of using the client UI (which is deprecated in App-V 5.0 SP2) or SFTMIME, you can use the Repair-AppVClientPackage to reset user and or package state. You get can get specifics on this by reading Ryan Cobb’s blog post on the subject here:

http://blogs.technet.com/b/appv/archive/2013/10/28/how-to-quickly-repair-published-app-v-packages.aspx

Also NOTE: I have seen information out there stating that you can also simply delete files directly in the native locations. That is true but this command is much cleaner and useful for the help desk professional. You should also never, ever, ever, ever, ever, ever, ever delete the VFS directories manually as this will also create problems by potentially breaking the user mode VFS and copy-on-write mappings for the application. If you suspect issues with those VFS directories, use the Repair-AppVClientPackage command instead.

Connection Group Cleanup

The same options available for an individual package are also available for a connection group using the Repair-AppVClientConnectionGroup command. In fact, you should be advised that if you repair the connection group instead of the individual packages when you are experiencing issues where you would normally repair the application, this will purge out all of the settings for all applications within the connection group. If there are applications currently in use and it is preventing you from running these repair operations you will need stop all of the package processes. Stop-AppVClientPackage will stop all virtual processes in a package and Stop-AppVClientConnectionGroup will stop all virtual processes in a group.

App-V Client Logs

The SFTLOG.TXT is gone. So are all of the other text-based logs. Knowing how to collect good log data is essential to every help desk professional especially if escalations will be involved. The good news is everything is now logged to event logs. You can view these by opening up the event viewer and navigating to Applications and Services Logs – Microsoft – AppV – Client. The default view will show the three basic event logs: Admin, Operational, and Virtual Applications.

If you would like to view the additional logs that can enabled, navigate to the view menu and select “Show Analytic and Debug Logs.” In many cases, you may need to take advantage of these.

As you will see, there are a tremendous amount of additional logs. Most of these are not enabled by default. You will have to expand the folder, right-click the log and select “Enable Log” to enable it. Be careful turning on a lot of these as it can potentially affect performance.

Which Log do I Use?

Well now that will depend on the type of issue that is occurring be it an error upon publishing, streaming, launching, client service start-up, etc. often, the basic logs will give you what you need to know to fix the issue, but when it doesn’t, it is time to investigate further and these additional logs can come in very handy. The question is, which log or logs should I enable? The actual error message or error code may quickly provide some insight.

The App-V 5 Error Code

Like I did in V4, I am not necessarily concerned with the entire error code. I usually only have to pay attention to the last 10 digits when looking at V5 errors ONCE I have the error in the correct Hex format. Often time, in the event logs, the error code is actually in decimal format. One of the first things you will want to do is get the error in its correct format. For example, if you see the following error in the event log:

64 notification failed with error 5746720685053444123, Package <GUID>, version <GUID>, pid
<PID>, ve id 0

The first thing I do is convert the decimal error over to hex.

5746720685053444123 translated to hex is 0x4FC074040000001B.

If you break it down first separating the last ten digits (04-000000001B) you diagnose this further by resolving the first two digits to the component using the table below:

Hex Code	Category	Where you might further Info
01	Integration	Client-Integration Log
02	Orchestration	Client-Orchestration Log
03	Client Service	Client-Service Log
04	Virtualization Manager	Client-VirtualizationManager Log or Client-Vemgr Log
05	Shared Components	Admin and Operational Logs
06	Catalog	Client-Catalog
07	Publishing	Client-Publishing Log, Client-PackageConfig Log, or FileSystemmetadataLibrary Log
08	Client Common	Admin and Operational Logs
09	Policy	PolicyLibrary Log
0A	Virtualization Subsystem	May need to diagnose using the individual virtual subsystem debug logs.
0B	Subsystem Controller	Client SubsystemController Log
0C	Streaming Manager	Any of the five streaming debug logs
0E	Client Host
0F	Client Configuration	Registry/PowerShell
10	Scripting	Client Scripting Log
11	Client User Interface	Client-StreamingUX Log
12	Sequencer	Only on Sequencer
13	Reporting	Client Reporting Log
14	Manifest	ManifestLibrary Log

Then the last 8 digits will usually just be a standard Windows HRESULT code. In the example above, I know the issue is somewhere in the virtualization manager and the last eight digits are 0x0000001B. The HRESULT alone does not really lead me anywhere so I would likely need to dive further into the Client-VirtualizationManager debug log.

Sometimes, you will get really lucky. For example, if the error is 0xFC03E0480070003. Well, again we know it is an issue with the virtualization manager but the HRESULT 0x80070003 is well-known – ERROR_PATH_NOT_FOUND. You know where I would then go next – Process Monitor of course!

Update 12/6/2014: The App-V 5 logs have been consolidated and are organized differently as of App-V 5 SP3 – but the event and event formats are still the same. Please read more about it here: http://technet.microsoft.com/en-us/library/dn858700.aspx#BKMK_event_logs_moved

Categories: Uncategorized Tags: app-v, AppV, appv5, etl, etw, PoSh, troubleshooting

The Mad Virtualizer at Microsoft

Archive

On Debugging Virtual Applications: Part 2: Types and Modes

App-V: On Operational Troubleshooting of the V5 Client

Author

Twitter Feed

Top Posts & Pages

Archives

MadVirtualizer