Introduction

Remote software support is widely considered a laborious and time-consuming process. When your software does not work correctly on a customer site on the other side of the world, and the problem is difficult to reproduce in the lab, what can you do? How do you get all those logs, crash dumps, and repro details quickly, and get back to the customer with a solution as soon as possible?

DebugLive debugger is a new-generation remote debugging solution developed by DebugLive.com Inc. It can help you in situations where effective debugging and troubleshooting were not possible before. With DebugLive debugger, you can follow your application to any place in the world, however remote, and either debug it in real time or collect the information you need and continue debugging offline.

DebugLive debugger is based on real-life debugging experience. Its designers are debugging consultants who often have to troubleshoot software on remote sites around the world. DebugLive debugger is a very pragmatic solution. It is based on a simple idea - identify the obstacles and stumbling blocks that make remote software support so complicated, and remove them one by one.

This document describes a number of remote debugging scenarios, from simple to more complicated, in which DebugLive debugger can help you solve issues quickly and effectively.

Scenario 1: Application crashes on the customer site

Sometimes it happens - your application crashes on a customer site. But you are a professional. Your application is built with symbols, which are properly preserved for every build released to customers. You also know how to debug a crash dump. You just need to ask the customer to create a crash dump and send it to you, preferably with additional information about the circumstances that led to the problem.

And here is the stumbling block. How do you explain to the customer how to create the crash dump? First of all, which operating system is she using? Is it XP or Vista? Or something else? Different operating systems have different ways to create crash dumps. Email round-trip number one, please (and what if your corporate policy prevents developers from talking to customers directly? More emails, more entries in the bug tracking system). OK, she finally replies, and now you know that it is Vista. Now you have to explain to her how to use WERCON to get to the crash dumps. She needs to open Registry Editor? Oh no, please. Well, she can use Task Manager, select the process and ... OK, OK. She is a very professional accountant / architect / whatever, but Task Manager is just too much.

Of course, you can send her a tool that creates the crash dump and saves it on disk. This tool might even find the process itself when necessary. She just needs to run it at the right time, when the application has crashed. Let's assume she is finally able to do that. We have a dump - on her disk. How does it get to you? We can use email, of course, but what if the dump is 20 MB or more? You know, small dumps are good in simple cases, but more complicated problems require larger dumps. How can she send it to you? By FTP? Burn a CD and send it by mail? It's just getting more and more complicated. And what if you would like additional information? Log files, Registry settings, Event Log entries? A video recording of the repro?

OK, let's see how it can be done with DebugLive debugger. After the customer reports a problem, you send her a link to the DebugLive website (or to your own website - the DebugLive server can be hosted on your server, too). She opens the link in her browser, presses a button on the web page, and enters the name and password you sent her. Then she types in some information that describes the problem, and the remote debugging session starts. Still complicated? Please see the paragraph after next.

Now you can also navigate to the DebugLive website with your browser, or open the DebugLive add-in in Visual Studio. You press a button, join the remote debugging session, and now you are in control. You can list processes, debug them in real time, and create crash dumps at any moment. You can collect various kinds of troubleshooting information - log files, Registry settings, Event Log entries. You can capture screenshots and record video of the repro, if necessary. All the information you have collected can be automatically uploaded to the DebugLive website and stored there for further analysis. And your customer can have a coffee break.

Of course, this can be even simpler if the DebugLive application is hosted on your server - even more steps can be removed (the customer might not have to enter the user name and password, for example). How about the corporate policy mentioned above? It can fit into the process nicely - the collaborative nature of DebugLive debugger allows support engineers and application developers to communicate while solving issues on customer sites - under strict management supervision, if necessary.

If your customer has more technical experience (a selected beta user or a system administrator, for example), it is even better. DebugLive debugger can be automated with batch files. You send the customer a small file with debugger commands, the customer opens DebugLive debugger (again, in the browser) and runs the batch file. All the necessary data is automatically collected and uploaded to the DebugLive server.

What about more complicated situations? Other stumbling blocks? What if, for example, the build released to the customer for some reason does not have debugging symbols? How do you analyze the crash dump without symbols? Rebuild the application and generate new symbols? Visual Studio debugger will not load such symbols for the crash dump, because they do not match the build installed on the customer's machine. WinDbg? Yes, it can load unmatched symbols. But if you know how to do it in WinDbg, you also know how fragile and error-prone this process is. Unlike other debuggers, DebugLive debugger can use unmatched symbols without extra effort. Just enter the path to the symbol file, and it will be loaded. The debugger still warns you about unmatched symbols, of course, but it will obey your order and use them.

Scenario 2: Application does not start on a small number of end user machines

OK, what about the following situation? Your application works fine on 95% of end user machines, but on some machines it fails to start. The operating system tells you that maybe reinstalling the application could help. But it does not help, really. Of course, you know that there is something wrong with the Side-by-Side components (like the Visual C++ libraries) installed on the customer's machine.

What can you do to debug this problem? Of course, there should be an Event Log entry that tells you which component of your application is in trouble. But you do not want to ask the customer to open the Event Log, right? And the information in the Event Log does not actually help much - we know that probably some redistributable component is not installed on the system, or that the side-by-side policy files contain something unexpected. But which of the existing tools can help you obtain this information?

Of course, DebugLive debugger can help here, too. Again, the same simple steps are needed to start the remote debugging session, and then you can use the debugger's system information commands to get the information you need. Event Log entries? Of course. The list of components installed in the Side-by-Side cache? Sure. The contents of the components' manifests and policy files? Easily. With this information at hand, you will be able to determine the cause of the problem in minutes.

Scenario 3: Installation of a server application on the customer site does not go smoothly

Let's move to more complicated cases. Or are they really that complicated? Let's see. What if your company is installing a new server application on a customer site on the other side of the world, and a support engineer has been sent there to perform the installation? He installs the product, but it does not seem to work correctly. It takes a long time to start, or hangs, or simply does something strange. It's understandable - it's version 1, after all.

How do you debug this problem? Of course, your server probably has logging embedded. The support engineer can collect the log and send it to you by email. But log files do not always help to understand the problem. We might need a crash dump or two, or, even better, a live debugging session. How to do it? Visual Studio's remote debugger is not an option - it needs a direct connection to the remote site. WinDbg? It could be used, but again, if you know how to do it with WinDbg, you also know that it is not easy, and a visit to the customer's network administrator is probably unavoidable. So in most cases it is not worth the effort.

What about DebugLive? Here it is really simple. The support engineer navigates to the DebugLive website and starts a remote debugging session. You join the session and debug the remote application using your browser or the Visual Studio add-in. Again, not only can you debug the remote application in real time, but you can also collect various kinds of troubleshooting information - Event Log entries, Registry settings, contents of various configuration files - everything that might be instrumental in finding a problem that could not be reproduced in your test lab.

Scenario 4: Mission-critical server application crashes on the customer site from time to time

Next problem. A mission-critical server crashes on the customer site from time to time, leaving poor consumers without their TV signal, their favorite online game, or something else bad enough. You cannot reproduce the problem in the lab, but you have a hypothesis. Maybe there is a memory leak, and the server application is slowly running out of memory? Or maybe there is no leak, but memory usage patterns result in address space fragmentation over time? How to verify this hypothesis?
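
To make the difference between the two hypotheses concrete, here is a small Python sketch. It is only an illustration: the function names and the numbers are hypothetical and have nothing to do with DebugLive commands. It shows why a fragmented address space can fail a large allocation even though plenty of memory is free in total.

```python
# Hypothetical illustration only; these helpers are not part of any debugger.

def fragmentation_ratio(free_block_sizes):
    """Return 1 - largest_free_block / total_free.

    0.0 means all free space is one contiguous block; values close to 1.0
    mean the free space is scattered across many small blocks.
    """
    total = sum(free_block_sizes)
    if total == 0:
        return 0.0
    return 1.0 - max(free_block_sizes) / total


def can_allocate(free_block_sizes, request):
    """A contiguous request succeeds only if a single free block can hold it."""
    return any(block >= request for block in free_block_sizes)


# Same total free space (256 MB), very different outcomes for a 100 MB request.
contiguous = [256]
fragmented = [64, 64, 64, 64]

print(fragmentation_ratio(contiguous), can_allocate(contiguous, 100))
print(fragmentation_ratio(fragmented), can_allocate(fragmented, 100))
```

With a leak, total free memory shrinks over time. With fragmentation, total free memory may stay almost constant while the largest free block keeps shrinking, so the two hypotheses call for different measurements.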

With existing tools, there are a number of ways. You can use Perfmon to collect memory-related performance counters while the application is running, and save this information in a file. You can also use the ADPlus tool (or a custom tool) to create crash dumps from time to time, to capture the state of your application at different moments. The problem is, who is going to do that for you on the remote customer site?

Yes, it can be solved (there are VPNs and remote access tools, after all), but there is an easier and faster way. Of course, with DebugLive debugger. This time you do not actually have to perform a remote debugging session, because application monitoring tasks are good candidates for automation. Probably there is a system engineer working on the customer site who does not have the time or desire to learn how to use Perfmon or ADPlus, but is qualified enough to start DebugLive debugger and run a batch file you sent him. He only needs to specify the directory in which to store the log file with the output of the debugger commands and the other data collected.

The batch file you send to the system engineer will contain a small number of debugger commands that ask DebugLive debugger to collect memory usage information periodically and save it in the log file, as well as create crash dumps at some intervals. If necessary, the batch file can also ask the debugger to record other aspects of the application's behavior, ADPlus-style.
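
Once the periodic memory samples have come back from the customer site, checking the leak hypothesis is mostly arithmetic. The Python sketch below is one possible way to do it, assuming the samples have already been parsed into (hours since start, memory in MB) pairs. The log format, the function names and the 2048 MB limit are assumptions made for the illustration, not part of DebugLive.

```python
# Hypothetical post-processing of collected samples; not a DebugLive feature.

def growth_per_hour(samples):
    """Least-squares slope of memory usage (MB) against time (hours).

    samples: list of (hours_since_start, memory_mb) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def hours_until_limit(samples, limit_mb):
    """Extrapolate from the last sample; None if memory is not growing."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    _, last_mb = samples[-1]
    return max(0.0, (limit_mb - last_mb) / slope)


# Made-up samples: usage grows by 10 MB every hour.
samples = [(0, 100), (1, 110), (2, 120), (3, 130)]
print(growth_per_hour(samples))          # slope in MB per hour
print(hours_until_limit(samples, 2048))  # assumed 2048 MB limit
```

A steady positive slope over many hours supports the leak hypothesis. A flat slope combined with allocation failures points toward fragmentation instead.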

DebugLive debugger can also be configured to send emails when certain events occur. Thus, when the server application finally crashes, the debugger can send an email to the system engineer. He will package the files created by the debugger, upload them to the DebugLive server, and restart the application (and if you want, your batch file can upload the files to the server automatically, to save him time).

The email notification feature of DebugLive debugger can be handy in many application monitoring scenarios. It allows you to monitor applications without babysitting them. You no longer have to come and check the debugger from time to time. Instead, the debugger can notify you when it needs attention.
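
For readers who have not built this kind of notification before, the general idea can be sketched in a few lines of Python using the standard library. This is not how DebugLive debugger implements the feature. The function name, addresses, host and file names below are all made up for the illustration.

```python
from email.message import EmailMessage

# Hypothetical sketch of a crash notification; all names are placeholders.

def build_crash_notification(app_name, host, dump_path, sender, recipient):
    """Compose (but do not send) an email describing a crash."""
    msg = EmailMessage()
    msg["Subject"] = f"[crash] {app_name} on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"{app_name} has crashed on {host}.\n"
        f"A crash dump was saved to: {dump_path}\n"
        "Please upload the collected files and restart the application.\n"
    )
    return msg


msg = build_crash_notification(
    app_name="billing-server",
    host="srv01",
    dump_path=r"C:\dumps\billing-server.dmp",
    sender="monitor@example.com",
    recipient="sysadmin@example.com",
)
print(msg["Subject"])

# Sending is left out on purpose. With a reachable mail server it would be
# something like: smtplib.SMTP("mail.example.com").send_message(msg)
```

The point of the pattern is the same as in the text: the monitoring side pushes a message when something happens, so nobody has to poll the machine.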

Summary

DebugLive debugger can help you in situations where effective troubleshooting was not possible before. You can use it to reach your application anywhere in the world, debug your applications in real time or offline, collect troubleshooting data, and stop relying on inaccurate problem descriptions to solve software issues. If necessary, DebugLive debugger can be hosted on your server, adapted to your needs, and adjusted to your corporate processes.