It frequently surprises me when joining a software development organisation how difficult people find debugging customer problems.
When a program crashes on a developer's machine, the developer is perfectly happy running their debugger of choice to find out why. However, when that same program crashes on someone else's box, or at a customer site, it's back to reading sparse logs and static code analysis (staring at the monitor).
Part of this is down to ignorance - there are several ways to debug a remote issue, and most developers seem not to know about them. My personal favourites are crash dumps and remote debugging, and we'll go over each.
Crash Dumps
In Windows, if you don't have a debugger installed, when an app crashes, it writes a crash dump file. This dump file is written after you click "Don't Send" in the Windows error reporting box, and it's written by a tool called Dr. Watson. Occasionally, you'll find a killed application takes a long time to exit, and it's because it's writing a crash dump then, too.
This crash dump file is written to C:\Documents and Settings\All Users\Application Data\Microsoft\Dr Watson\user.dmp
. This file hangs around; it represents a crash dump of the last application that crashed.
You can load this file into Visual Studio. Open it as you would open a project file. Then click Run, and VS will pretend that the app is running and just crashed - so you can see the stacks of all threads, locals and so on.
In order to make this useful, you'll need to ensure you have symbols for everything. The customer's machine is highly unlikely, in my experience, to have exactly the same set of Windows DLLs as are on your development machine, so installing the Microsoft Symbol Server is important. You also need to have built your app with debug information.
It's vital that you keep a copy of every EXE and DLL you ship to a customer, along with their associated PDBs. If you're not generating PDBs for some reason, then change your release builds now to do so (/Zi and /DEBUG). After all, the only difference in the resulting .EXE file is a path to the original PDB, and a couple of hundred bytes extra in the EXE file is not going to kill you.
Once you have all these, when you load the user.dmp file into Visual Studio, you should get source lines and call stacks for all threads. You may have to put the EXE and DLL files into the same locations on your machine as they are on the crashing machine. This is a peculiarity of Visual Studio, and can be circumvented using WinDbg (see below).
If the user has a debugger installed - sometimes people have old versions of Visual C++, for example - it won't write the crash dump file. This is irritating. To fix it, run the following command from Start - Run:
drwtsn32 -i
which replaces the current debugger with Dr. Watson again.
WinDbg
Visual Studio has a nice interface, but sometimes you need an extra few features, and most of those are present in a little tool called WinDbg.
WinDbg is part of the Debugging Tools for Windows. It's a free debugger, like VS only a lot less friendly. I tend to use a small subset of the features, because I mostly turn to WinDbg when Visual Studio lacks the specials.
First, ensure you've configured the Microsoft Symbol Server. To load a crash dump into WinDbg, simply load it from the File menu. WinDbg will then show you its Command window - yes, we're in command-line territory here.
The next thing to do is ensure you have symbols for your own EXE and DLL modules. Type "!sym noisy" to turn on symbol logging, then go to the Debug - Modules menu item. Select your EXE and DLL files and click Reload for each. Close the dialog box and you'll see information about each of your modules as it tries (and fails) to load the original modules. Close inspection of this information will reveal where (inside your symbol download directory) it is looking for these modules. Note the 16-digit hex number, which is a hash of the executable to identify the particular version. Copy your copy of the modules into the correct places.
Debug - Modules and Reload again will this time fail to load the symbol files. Again, inspect the log for the correct location in which you can place your PDB files. Reload once more should load all your symbols and you're ready to go.
Like VS, you can open a Processes and Threads view, and a call stack. My experience is that WinDbg is slightly better at call stacks. More powerful is the Memory window, in which you can type "esp" at the top to show the raw stack contents. Remember that the stack grows from high memory addresses to low, so as you scroll downwards, you are seeing earlier stack entries. If you set the data format to "Pointers and Symbols", you get a symbol name after each stack entry that might be a symbol.
This data format alone is worth using WinDbg for. You can scroll down the stack, identifying the return addresses of each of the functions shown in the Call Stack view. If there are entries missing from the Call Stack view, you may well be able to find them in the memory view. In addition, you can identify parameters (which will come immediately below the return address) and local variables (which will come immediately above).
If you're really canny, you can load both WinDbg and VS together on the same crash dump. Then you can identify memory locations in WinDbg and view them in VS.
More surprises come from typing !address and !vadump in the command window. These commands will show you the virtual memory contents in various ways - useful for finding out just what all that memory is for, and why you're running out of it.
Recently, I was debugging an app with a variable buffer size. The number of devices that could be connected to this app actually went down with a larger buffer size. However, there was only the one buffer...
Debugging with VS, I discovered an out-of-memory situation - it was actually failing to create the threads, failing with ERROR_OUT_OF_MEMORY. Where was all that memory going?
WinDbg found it pretty quickly - !address displayed a number of 10Mb chunks of memory marked as "stack". It turned out that CMake (considered harmful) had set the thread stack size to 10Mb - and this particular app had about 5 threads per device... so plugging in 20 devices took up 1Gb of virtual memory!
Remote Debugging
Remote debugging is not nearly as useful at customer sites as simply collecting crash dumps - but if you have an in-house test facility, even if it's just a second machine for you, it can be invaluable.
VS' remote debugging is usually installed from the CD-ROM, but if you don't have that handy, you can just copy the Common7\Packages\Debugger folder from your Visual Studio installation to the target machine. I then usually run:
msvcmon -anyuser -tcpip -timeout -1
It's not secure - but it's certainly the most convenient way. You can then go to Tools - Debug Processes in Visual Studio, change the Transport to TCP/IP and type in the destination machine's IP address. Click Refresh to show the processes on the remote machine, click one and Attach to start debugging.