Event Tracing for Windows and Network Monitor

May 13, 2009, 12:44 pm


Event Tracing for Windows (ETW) has been around for quite a while now; it was introduced in Windows 2000. It's basically instrumented logging that describes what a component is doing. Conceptually, it’s something like the proverbial printf("here1") used by programmers, but it is present in retail builds. When you enable logging in a component, the result is an ETL (Event Trace Log) file. What’s new is that Network Monitor can read any ETL file. And with the supplied parsers, many network-oriented ETW providers can be decoded.

How ETW Works

The idea was to standardize tracing so that it could be turned on for any component with a consistent interface. Before ETW, it was common practice to place DEBUG statements that would output to the debugger. But this often required checked builds of the binaries, special registry entries, and super secret knowledge that sometimes required code access. Now the consumer/provider model, with built-in enumeration, advances and standardizes logging.

Unfortunately, it's not all perfect. As the OS has evolved, so has the story for collecting the ETW data. But the great news is that in Windows 7, you can collect data by scenarios and even include the raw network traffic data, all using ETW tracing.

Subscribing to a Provider

ETW uses a subscription model. Subscribing tools provide a GUID or provider name to receive the logging from that component. Network Monitor is not a subscriber at this point, so we'll have to use other tools to gather the ETW data.

Different versions of Windows offer different ways of getting ETW traces, but one that seems to work on most of them is Logman.exe (http://technet.microsoft.com/en-us/library/bb490956.aspx). When you run Logman, you supply a GUID which identifies a particular provider. To list the providers, use the query providers command.

logman query providers
Provider GUID
-------------------------------------------------------------------------------
.NET Common Language Runtime {E13C0D23-CCBC-4E12-931B-D9CC2EEE27E4}
ACPI Driver Trace Provider {DAB01D4D-2D48-477D-B1C3-DAAD0CE6F06B}
Active Directory Domain Services: SAM {8E598056-8993-11D2-819E-0000F875A064}
Active Directory: Kerberos Client {BBA3ADD2-C229-4CDB-AE2B-57EB6966B0C4}
Active Directory: NetLogon {F33959B4-DBEC-11D2-895B-00C04F79AB69}
...
Microsoft-Windows-Winsock-AFD {E53C6823-7BB8-44BB-90DC-3F86090D48A6}
Microsoft-Windows-Winsock-WS2HELP {D5C25F9A-4D47-493E-9184-40DD397A004D}
...

You'll see many providers listed, but we only provide ETW parsers for a subset of these. This is mainly because our focus for this feature is Windows 7 network troubleshooting. As we extend the parser set on http://www.codeplex.com/nmparsers, we will add support for new providers. In fact, as we speak we are looking to publish an ETW parser for USB 2.0.
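
If you want to look up a provider's GUID from a script rather than by eye, the listing is easy to parse. Here's a minimal Python sketch. It assumes each row is a provider name followed by a GUID in braces, as in the output above; the function name and the sample text are made up for illustration, and the real column layout may differ between Windows versions.

```python
import re

# One row of `logman query providers` output: a provider name, some
# whitespace, then a GUID in braces at the end of the line (assumed layout).
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<guid>\{[0-9A-Fa-f-]{36}\})\s*$")


def parse_providers(text):
    """Return {provider name: GUID} for every line that looks like a provider row."""
    providers = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator, and "..." lines simply do not match
            providers[match.group("name").strip()] = match.group("guid")
    return providers


# Sample text copied from the listing above (not a complete provider list).
SAMPLE = """\
Provider GUID
-------------------------------------------------------------------------------
.NET Common Language Runtime {E13C0D23-CCBC-4E12-931B-D9CC2EEE27E4}
Microsoft-Windows-Winsock-AFD {E53C6823-7BB8-44BB-90DC-3F86090D48A6}
...
"""

if __name__ == "__main__":
    for name, guid in parse_providers(SAMPLE).items():
        print(name, guid)
```

On a real machine you would feed it the captured output of the logman command instead of SAMPLE.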

Capturing an ETW Trace with Logman

To capture ETW data, use Logman along with the GUID or provider name and some other parameters. Winsock is the layer that applications use to communicate over TCP. Let's capture all Winsock related ETW events.

logman start my_winsock_log -p Microsoft-Windows-Winsock-AFD -ets

We send it the start command and the name of our session and log file my_winsock_log. We pass it the name of the provider with -p. Alternatively, we could use the {GUID} instead. Finally we pass "-ets" which says to start logging now.

logman stop my_winsock_log -ets

This command stops the logging, again based on the name you specified when you started. When this command completes, you should have a my_winsock_log.etl file that can be opened with Network Monitor 3.3 or 3.2.
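
It's easy to forget the stop command, especially when the repro fails halfway through. As a rough sketch, the start/stop pair can be wrapped in a Python context manager so the session is always stopped. The logman arguments are the ones shown above; the helper name etw_session and the injectable run parameter are my own inventions for illustration, and the .etl path assumes logman writes the file to the current directory.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def etw_session(session, provider, run=subprocess.check_call):
    """Start an ETW session with logman and always stop it afterwards.

    `run` is whatever executes a command given as an argument list; it
    defaults to subprocess.check_call and can be swapped for a fake in tests.
    """
    run(["logman", "start", session, "-p", provider, "-ets"])
    try:
        # Assumption: the log lands in the current directory as <session>.etl
        yield f"{session}.etl"
    finally:
        run(["logman", "stop", session, "-ets"])


# Usage (on Windows, from an elevated prompt):
#
#     with etw_session("my_winsock_log", "Microsoft-Windows-Winsock-AFD") as etl:
#         input("Reproduce the problem, then press Enter...")
#     print("Open", etl, "in Network Monitor")
```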

Windows 7 Scenario based ETW Tracing

The story for ETW becomes much more interesting in the Windows 7 and Server 2008 R2 world. While Logman still works, Netsh becomes the tool of choice for collecting ETW traces. Also incredibly useful is the addition of tracing by scenario and the ability to capture network traffic via NDIS with ETW tracing. But more on this a little later.

Netsh to Collect an ETW Trace.

Netsh used to be component centric with regards to tracing. But now tracing is a top level command that you can use to start and stop tracing. So here's an example that works like our previous Logman example.

netsh trace start provider=Microsoft-Windows-Winsock-AFD TraceFile=my_winsock_log2.etl

Trace configuration:
-------------------------------------------------------------------
Status: Running
Trace File: my_winsock_log2
Append: Off
Circular: On
Max Size: 250 MB
Report: Off

netsh trace stop
Correlating traces ... done
Generating data collection ... done
...

Netsh and Scenarios

Scenarios are things like "InternetClient" or "AddressAcquisition". A tracing scenario is defined as a collection of selected event providers. Providers are the individual components of the network protocol stack, such as WinSock, TCP/IP, Windows Filtering Platform and Firewall, Wireless LAN Services, or NDIS.

One of the primary goals for improved network tracing is to allow collection of all relevant information in one step, and then easily view all events associated with a specific action across the network stack. Network tracing provides a quick method for collecting information and diagnosing networking issues by logging events from all providers in the scenario, and then correlating these events by activity. In other words, related events & network packets are grouped together for given activity across different components in the networking stack, from Winsock down to NDIS.

Let's look at a list of possibilities for Windows 7.

netsh trace show scenarios

Available scenarios (18):
-------------------------------------------------------------------
AddressAcquisition : Troubleshoot address acquisition-related issues
DirectAccess : Troubleshoot DirectAccess related issues
FileSharing : Troubleshoot common file and printer sharing problems
InternetClient : Diagnose web connectivity issues
InternetServer : Troubleshoot server-side web connectivity issues
L2SEC : Troubleshoot layer 2 authentication related issues
LAN : Troubleshoot wired LAN related issues
Layer2 : Troubleshoot layer 2 connectivity related issues
MBN : Troubleshoot mobile broadband related issues
NDIS : Troubleshoot network adapter related issues
NetConnection : Troubleshoot issues with network connections
P2P-Grouping : Troubleshoot Peer-to-Peer Grouping related issues
P2P-PNRP : Troubleshoot Peer Name Resolution Protocol (PNRP) related issues
RemoteAssistance : Troubleshoot Windows Remote Assistance related issues
RPC : Troubleshoot issues related to RPC framework
WCN : Troubleshoot Windows Connect Now related issues
WFP-IPsec : Troubleshoot Windows Filtering Platform and IPsec related issues
WLAN : Troubleshoot wireless LAN related issues

To enable a scenario and stop it you type the following commands:

netsh trace start scenario=InternetClient

netsh trace stop

Again, an ETW trace file is created but now there are multiple providers within the same trace file. This can help you correlate tracing from multiple places as the problem moves from one component to another.

Icing is "capture=yes"

One of the coolest parts of this new tracing in Windows 7 is that you can capture NDIS network traffic using Netsh. By enabling the capture parameter, you capture network traffic. Not only is this useful for correlating to other component events, but it provides another way to get a network capture.

Since Netsh is remote-able, you could also use this to start a capture on another machine! And using the persistent=yes parameter, you can enable logging during boot to troubleshoot those nagging start up issues.

By adding the capture=yes parameter to any scenario or specific provider collection, network traffic is captured, as well. To get a simple trace, use this command:

netsh trace start capture=yes

The resulting trace contains TCP, IPv4, Ethernet, and so on, just like a trace taken from the network interface. However, the data has extra headers on the front. If you have other provider information mixed in, and the latest parsers from http://www.CodePlex.com/nmparsers, a simple filter to show you just the network traffic is as follows:

NDISPacCap_MicrosoftWindowsNDISPacketCapture

We also update the standard filters on http://www.codeplex.com/nmparsers. As new requests and ideas for standard filters come up, we'll add to the current set.

Netsh Reference

Here's a small table to summarize the Netsh commands referenced:

Start	Starts a trace session
Stop	Stops a trace session
Capture=yes	Turns on raw network capture from NDIS
Provider=ProviderName	Enables tracing for a specific provider
Show Providers	Shows a list of providers
Scenario=ScenarioName	Enables tracing for a specific scenario
Show Scenarios	Shows a list of scenarios
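
If you end up scripting these, the options in the table combine freely on one start command. Here's a small Python sketch that only assembles the argument list from the parameters mentioned in this post (scenario, provider, capture, persistent, TraceFile). The function name netsh_trace_start is hypothetical, and it doesn't run anything; hand the result to subprocess yourself if you want to execute it.

```python
def netsh_trace_start(scenario=None, provider=None, capture=False,
                      persistent=False, tracefile=None):
    """Build the argument list for a `netsh trace start` command.

    Only the options discussed in this post are covered; netsh itself
    accepts more.
    """
    args = ["netsh", "trace", "start"]
    if scenario:
        args.append(f"scenario={scenario}")
    if provider:
        args.append(f"provider={provider}")
    if capture:
        args.append("capture=yes")      # raw network capture from NDIS
    if persistent:
        args.append("persistent=yes")   # keep tracing across a reboot
    if tracefile:
        args.append(f"TraceFile={tracefile}")
    return args


# The matching stop command takes no options in the examples above.
NETSH_TRACE_STOP = ["netsh", "trace", "stop"]

if __name__ == "__main__":
    print(" ".join(netsh_trace_start(scenario="InternetClient", capture=True)))
```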

Conversations and Built-in Standard Filters

We've extended the idea of conversations to ETW logs. The conversation tree is populated with activities, with providers listed under each one, which helps you narrow down interesting traffic. We also provide some standard filters to help you do some basic searches. And as usual, we try to propagate the important information to the top of the tree.

And finally, the default columns that show up in Network Monitor 3.3 are different when you open an ETW trace. We add a "UT Process Name" column (UT stands for Unified Tracing), which displays the process name, when available, as well as the process ID. This replaces the "Process Name" column that shows up when you open capture files.

Where to Go Next?

When assisted support or further troubleshooting is necessary, the improved network trace logs can enable developers and support professionals to quickly isolate network activities and view the combined event data and packet captures in a single file, grouped by a network task and the related network activity.

On MSDN there's a great reference that discusses Win7 troubleshooting with Event Tracing (http://go.microsoft.com/fwlink/?LinkID=145404). This resource explains in greater detail how to troubleshoot problems, create filters, and use conversations with ETW tracing and Network Monitor. And while it's Windows 7 specific, there is still some applicability to older operating systems. And Network Monitor can read any ETL file. While we don't necessarily have parsers for every provider, that doesn't mean a parser can't be written.


Windows 7 and ISA Remote Windows Sockets Parsers Available

June 4, 2009, 12:17 pm


If you don't already know, we have been updating our parsers for Network Monitor on http://www.CodePlex.com/NMParsers every month. Most recently we have updated the Windows parser set to support Windows 7 protocol updates. In the June parser release on CodePlex we have support for the Remote Windows Sockets (RWS) protocol, which is used to proxy TCP and UDP traffic from Winsock applications. So now with the new parser set you can decode this traffic into the upper level protocols that ride on top of RWS.

These parsers rely on Network Monitor 3.3, so please upgrade first if you haven't already. Please visit NMParsers on CodePlex and download the latest parser set so you can get the most up-to-date parsing experience. Enjoy!


Circling In Shark Waters

June 25, 2009, 3:42 pm


Last week I attended Sharkfest 09 at Stanford CA and I had a wonderful time. It was great to talk to other network geeks like me to better understand this community and see how various tools can be used to illuminate the cloaked world that is your network.

Each day started with a keynote and then there were 3 tracks: Developer, Basic, and Advanced. The Developer track focused on parser development and capturing. For the most part I stuck to the Basic/Advanced tracks, but I did attend the Developer session on creating parsers (or dissectors as they call them). This gave me some insight into alternate ways protocol parsers can be architected. It was also great to hear from the master brain, chief Wireshark architect, Gerald Combs.

The SSL session by Sake Blok was interesting because it exposed the details of a protocol I've had little experience with. It's obvious this is a very important skill moving forward as the world moves to protect its information on the wire. He also provided some cases to explain where things might go wrong.

I found the case study sessions the most useful for me. I love to see how different people attack a problem and what features of a tool they use to get the important information. Especially enlightening were the presentations by Hansang Bae and Laura Chappell. In each case they tackled real world problems with real traces and provided details of how they troubleshoot network issues using a protocol analyzer. Laura was especially entertaining as she described her "Butt Ugly Color Filter" techniques and real world experiences with networking.

While there's no equivalent to being there in person, most of the presentations are available on http://www.cacetech.com/sharkfest.09/. Some of them include traces, which is great for learning on your own.

As I roamed beautiful Stanford, and roaming is what you do on such a vast campus, I thought about all the cool people I met and things I learned. I hope I will be there next year and encourage you to attend if you want to hone your networking skills. Whether you are a developer, beginner or advanced user there's always something to be learned.


TCP Analyzer Expert: Make Your Network Run Faster

June 30, 2009, 5:24 pm


Performance problems suck...time! But years of "Where's Waldo" have trained our brains in preparation for this moment. The TCP Analyzer expert, available from our Experts Download Page (http://go.microsoft.com/fwlink/?LinkID=133950), takes advantage of that training by graphically representing TCP traffic. By looking at normal traffic, or comparing the presented graph to some known TCP issues, you can easily diagnose performance problems.

With the TCP Analyzer Expert you can load a trace, use the conversation tree to locate a TCP stream, and run the expert. If you don't have anything selected, the expert will use the first TCP conversation in the trace. Once it's run, it presents you with a UI which allows you to graph the TCP traffic, analyze Round Trip Time, and do some high-level diagnosis based on some known issues.

How to Analyze Traffic

Say you suspect a problem or want to analyze some traffic. The first thing you need to do is collect a trace using Network Monitor. TCP Analyzer can try to "guess" the general problem and describe the issue. But for this to work properly you will need to take the trace from the machine initiating the connection. Also it helps to have the entire TCP connection as the window size is negotiated during the TCP 3-way handshake.

Once you start a trace, you then reproduce the performance test and stop the capture. Then save the capture, as Experts can only be run on saved traces. Go back to the start page where you'll see the file you just saved in the recent capture list and open it up.

Finding the TCP Conversation

The next trick is locating which TCP stream you want to run the expert on. In this case I copied a file using Explorer and I knew the name of the file I copied. So I created the following filter.

ContainsBin(FrameData, UTF16BE, "myfile")

It could have potentially been ASCII as well, but with SMB I knew it would probably occur as Unicode. BTW, UTF16BE stands for Unicode 16 Big Endian. These days Unicode has many flavors, but UTF16BE is the most common one for Windows machines.

This filter located a bunch of SMB frames, which meant I was on the right track. I right-clicked a frame, selected Find Conversation, and chose TCP. This locates all other frames in the same conversation, which the TCP Analyzer will use to determine which stream to use when it runs. Remember, to see the full stream in Network Monitor, remove the display filter you used to find the frame originally.

Now with the correct conversation selected, I run the TCP Analyzer Expert from the Experts menu. This runs the expert, but in order to get a graph to show up I have to press the graphing button on the toolbar.

Since there is traffic flowing in both directions, you need to determine which direction you want to concentrate on. You can use the port or IP address to figure this part out. Once you make this determination, click the graph. This displays the graph in the main window, where you can zoom in and out with the mouse wheel and drag the graph around to pan.

You can also analyze the Round Trip Time, which is the graph in the middle. However there are some restrictions that have to be met before any information will be available. We won't cover RTT in this blog, but you can see the help for the expert for more information.

Decoding the Graph

The Axis

The Y axis shows the sequence numbers for the given direction. These are defined by TCP when a session initializes. Each sequence number represents the number of bytes transmitted. So sequence 1000-2000 represents 1000 bytes.

The X axis is time and is measured in milliseconds (ms). This matches the time offset as displayed in Network Monitor.

Legend Details

On the time-sequence graphs there are various symbols which can occur. Here's a list of what they mean.

· Receiver Window - Receiver is telling the sender it is currently willing to receive up to this point in the data stream.

· Acknowledged - Receiver is telling the sender it has successfully received all the data up to this point in the data stream.

· Data - The point in the data stream the sender is currently sending.

· SYN - The SYNchronize packet sent at the start of the connection.

· FIN - The FINish packet sent at the end of the connection.

· Discontinuity - Any break in the data stream where the data in the indicated packet doesn't sequentially follow the data in the previous packet. Out-of-order, lost, or retransmitted packets can all cause discontinuities, as can gaps in the capture.

· Presumed Lost - A packet that was later retransmitted (if a sequential group of packets are all later retransmitted, only the first one will be indicated this way).

· Retransmission - A packet that is a retransmission of another packet in the capture.
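
To make the last few legend entries concrete, here's a deliberately simplified Python sketch that labels the data packets of one direction using nothing but their sequence numbers and lengths. This is not how the expert is implemented (I'm not describing its internals here); it just applies the definitions above literally. The function name and the (seq, length) tuple format are assumptions for illustration, and it ignores sequence number wraparound and partial overlaps.

```python
def classify_segments(segments):
    """Label each (seq, length) data packet, in capture order, for one direction.

    "retransmission": the exact same range was already seen in the capture
    "discontinuity":  the packet does not start where the previous one ended
    "data":           the packet follows the previous one sequentially
    """
    labels = []
    seen = set()      # (seq, length) ranges already observed
    prev_end = None   # sequence number just past the previous packet
    for seq, length in segments:
        if (seq, length) in seen:
            label = "retransmission"
        elif prev_end is not None and seq != prev_end:
            label = "discontinuity"
        else:
            label = "data"
        labels.append(label)
        seen.add((seq, length))
        prev_end = seq + length
    return labels


if __name__ == "__main__":
    capture = [(1000, 1000), (2000, 1000), (4000, 1000), (2000, 1000)]
    for segment, label in zip(capture, classify_segments(capture)):
        print(segment, label)
```

In the sample, the jump from 3000 to 4000 is a break in the stream, and the second copy of the packet at 2000 is a retransmission of a packet already in the capture.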

Understanding Bandwidth-Delay Product

The speed at which you can send data in TCP is dependent on both the bandwidth of your network and the delay. The bandwidth is often referred to in terms like 10Mbps or 100Mbps, which is in bits per second. The delay is how long it takes for data to travel from one place to another and back. While this is related to the speed of light, other things like routers and the computers that are communicating can increase this delay as it takes time to process packets.

By multiplying bandwidth and delay together, we get the maximum amount of data that can be "in flight" over one connection between two computers. As you'll see, whether this maximum is utilized depends on how well TCP is tuned. It's important to understand that as the delay gets longer, it becomes more important to fill the available window.
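
Here's the arithmetic as a small Python sketch. The link speed and delay are made-up example values, and the function names are mine. The second function turns the relationship around: with a fixed window, the best throughput you can hope for is one window per round trip. The 65,535-byte figure is the largest window TCP can advertise without window scaling.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes that can be in flight on one connection."""
    return bandwidth_bps * rtt_ms / 1000 / 8


def window_limited_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    # Example values only: a 100 Mbps path with a 50 ms round trip time.
    print(bdp_bytes(100_000_000, 50))        # bytes needed in flight to fill the pipe
    print(window_limited_bps(65_535, 50))    # bits per second with a 65,535-byte window
```

With these example numbers, the pipe holds 625,000 bytes, while a 65,535-byte window tops out at roughly 10.5 Mbps, about a tenth of the link. That gap is why the window matters more as the delay grows.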

Pictures of Wrong Behavior

In TCP there are some typical problems that creep up over and over. Sometimes these are configuration issues with the client/server TCP stack or application. And in some cases, the problems can be easily fixed by adjusting the application or TCP window size. Of course, this may also be caused by your network which may require more drastic measures.

The best way to understand right from wrong is to base-line your network when it is working properly. This way you can look at the bandwidth numbers alone and understand if you have degraded. But in absence of this data, you can use the following pictures as a reference in order to identify some common problems.

Bandwidth Limited:

In this case you see that the sent data fills up the window as the data packets (blue X) approach the receive window (red X). The packets are sent at a regular interval, so the only thing limiting your throughput is the available bandwidth. This is normally what you want to see, as your throughput will always be limited by something.

Receiver Limited:

The packets fill the receiver window, but they go out in bursts as fresh acknowledgement packets arrive and open up the window. This burstiness is an indicator that the window is smaller than the bandwidth-delay product, and thus the protocol can't keep the data stream flowing smoothly.

Sender Limited:

This indicates that one end's window size is less than the bandwidth-delay product. However, unlike the receiver-limited case above, the data packets fall well short of filling the receiver's advertised window. This is a good indicator that the sender's window was the limiting factor. In some cases this is because the application doesn't fill the window completely. As this often does not show up under low latency, a developer might not detect this type of problem in testing.

Congestion Limited:

The earlier data points (lower left) look like a bandwidth-limited connection, until two lost packets cause TCP to severely limit the sender's congestion window after recovering from the losses. Note that the last data points (upper right) show the data packets aren't filling the receiver's advertised window as TCP is limiting the sender to a smaller congestion window.

It's important to note that the pictures were created in test environments. Real world applications tend to be more conversational, and you'll often have to narrow down the part of the picture you need to focus on. For instance, when you start a file copy with Explorer, there's a lot of traffic that goes back and forth as you browse for the folder, select the file, and then finally drag and drop it on the destination folder. You'll have to learn how to differentiate the actual transfer from the rest of the traffic.

Power of the Picture

TCP Analyzer does an awesome job of taking a lot of information and summarizing it in a picture that gives a good overview of your network’s performance. It can take practice to learn how to read the graphs and to recognize the scenarios presented here, as well as others. But as you learn, you'll find that this is a powerful tool in your tool belt.


I Can't View My Windows Home Server at Home

August 14, 2009, 4:00 pm


I have a friend who just received his Windows Home Server. Home Server allows you to access it remotely so you can share photos, use Remote Desktop, and back up documents. The provided documentation includes details on how to set up your router, open ports, and set up an external name like “myhomesrv.homeserver.com.” The problem was, when he went to test this out by typing the address in his web browser, he was shown his router's administrative web page instead of his Windows Home Server web page. Yet, I was able to access the web page fine from my work machine.

Collecting Evidence

I told my friend to download Network Monitor and get a trace. I also asked that he clear his local DNS cache by typing "ipconfig /flushdns". This is important because if a name is already cached it won't try and resolve the name again. This step ensures the resolution traffic will be captured when we reproduce the problem. In just a few minutes he sent me the capture file, and I opened it up.

Filtering on the External Name

I start by opening the trace and looking for DNS traffic by applying the display filter "DNS". In this particular trace there's a bunch of DNS traffic, but by looking at the summary line I can see the name my friend was trying to resolve.

192.168.2.2	192.168.2.1	DNS:QueryId = 0x847E, QUERY (Standard query), Query for myhomesrv.homeserver.com of type Host Addr on class Internet
192.168.2.1	192.168.2.2	DNS:QueryId = 0x847E, QUERY (Standard query), Response - Success, Array[xxx.143.174.204,yyy.46.154.126]

I see the query for "myhomesrv.homeserver.com" and then look for the matching response. In this case it was the next frame, but if you had a lot of traffic you could do a search for a DNS frame with the matching Query ID. And if you didn't know how to create a filter for the QueryID, you could right click on it in the frame details and “add to display filter” to understand how it should look.
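
The matching itself is simple enough to show in a few lines. Here's a hedged Python sketch that pairs queries with responses by Query ID. The frame dictionaries (with query_id, is_response, and summary keys) are a made-up stand-in for whatever you export from the trace, not a Network Monitor format.

```python
def pair_dns(frames):
    """Pair each DNS response with the earlier query that has the same Query ID."""
    pending = {}   # query_id -> query frame still waiting for its response
    pairs = []
    for frame in frames:
        qid = frame["query_id"]
        if frame["is_response"]:
            query = pending.pop(qid, None)
            if query is not None:
                pairs.append((query, frame))
        else:
            pending[qid] = frame
    return pairs


if __name__ == "__main__":
    # Hypothetical frames modeled on the two summary lines above.
    frames = [
        {"query_id": 0x847E, "is_response": False,
         "summary": "Query for myhomesrv.homeserver.com"},
        {"query_id": 0x1234, "is_response": False,
         "summary": "Query for some other name"},
        {"query_id": 0x847E, "is_response": True,
         "summary": "Response - Success"},
    ]
    for query, response in pair_dns(frames):
        print(hex(query["query_id"]), query["summary"], "->", response["summary"])
```

Any query left unpaired at the end never got an answer in the capture, which is worth knowing in its own right.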

Without even having to dig into the frame, you can see the response has all IP address info bubbled to the summary line. (By the way, I've obscured the address with xxx and yyy, but normally these would show as real numbers.) The proof I was looking for was to make sure the name, myhomesrv.homeserver.com, was being resolved to the external IP address of the router. Indeed the IP addresses matched, so I know that the name is resolving properly.

Next, I looked for the TCP setup and HTTP request that should occur since we were trying to browse his personal page. This occurs right after the DNS traffic as well.

192.168.2.2	myhomesrv.homeserver.com	TCP:Flags=......S., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385604, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
myhomesrv.homeserver.com	192.168.2.2	TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434048, Ack=2533385605, Win=5840 ( Negotiated scale factor 0x0 ) = 5840
192.168.2.2	myhomesrv.homeserver.com	TCP:Flags=...A...., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385605, Ack=113434049, Win=16425 (scale factor 0x2) = 65700
192.168.2.2	myhomesrv.homeserver.com	HTTP:Request, GET /
myhomesrv.homeserver.com	192.168.2.2	TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434049, Ack=2533386251, Win=7106 (scale factor 0x0) = 7106
myhomesrv.homeserver.com	192.168.2.2	HTTP:Response, HTTP/1.0, Status Code = 200, URL: /

We see that the client connects to myhomesrv.homeserver.com, the same name we saw DNS resolve in the traffic before. The Network Monitor parsers automatically resolve names for you when they see name resolution traffic, but you can always add different columns or simply dig into the frame to verify the IP address.

Now, we see that the traffic is going to the right address. It appears that the name resolution is working correctly and doing what we want. However, the response shows information that looks like my friend’s router’s web page.

Of course this isn't a surprise because this is what we see in the browser as well. Then what happened? Why did the web page from his router appear instead of his home server?

Doing Some Homework

We've identified some strange behavior, what next? A trace from the ISP might give us more information. Personally, I can't even get my ISP to answer simple billing questions so asking for a trace would probably be fruitless. But maybe we can see if other people are experiencing the same problem. After doing some Bing searches, I came across this blog (http://www.myhomeserver.com/?page_id=67). In particular in Step 7 it mentions the "loopback issue".

It appears that some routers don't know what to do with an external address when sent from the inside. As we see, this matches the behavior in the trace. The DNS request returns the address we expect, and the following HTTP request is also sent to the right place. However, we see that the response from the router comes back with the router’s web page. Instead we should have seen the HTTP request get bounced to our Home Server’s internal address.

Buy a New Router?

Well maybe that's extreme. I would suggest checking for a firmware upgrade first. A less expensive simple solution is to use the Home Server machine name in these circumstances. In any case my friend is now able to access his Home Server’s website internally by using http://myhomesrv and externally with the address http://myhomesrv.homeserver.com.


Chained Captures and Stitching Them Back Together

September 9, 2009, 10:49 am


When you use NMCap to capture data, you have an option to save the capture files as a chain. As the current capture file format has a limited size, this option allows you to continually capture the data in successive files. This also gives you some flexibility to limit the size. If you are sending files to another person for analysis, you could send only the files that relate to the time period where a problem occurred. After using this feature, however, it might be useful to filter and re-stitch these capture files back together.

Capturing Chained Files with NMCap

You can capture to chained files with NMCap by naming the file with a .chn extension. The resulting files are named .cap, but there'll be a "capfile(#).cap" for every chained capture file after the first one. For instance, the following command:

NMCap /network * /capture ipv4.address==1.2.3.4 /file foo.chn:1M

will produce capture files which are 1 MB in size and have the following names in this order: foo.cap, foo(1).cap, foo(2).cap and so on. I've also provided a capture filter to limit the traffic to just one address. However, for the best performance I would leave any filtering out.
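
If you need to predict which file names a chain will produce, say to pick out the files covering a certain period, the naming pattern is easy to generate. A small Python sketch follows; the helper names are hypothetical, and it only reproduces the pattern described above.

```python
def chain_names(base, count):
    """Names of the first `count` files in a chain: base.cap, base(1).cap, ..."""
    return [f"{base}.cap" if i == 0 else f"{base}({i}).cap" for i in range(count)]


def files_needed(total_bytes, file_bytes):
    """How many chained files it takes to hold total_bytes (rounded up)."""
    return -(-total_bytes // file_bytes)


if __name__ == "__main__":
    print(chain_names("foo", 3))
    print(files_needed(3_500_000, 1_000_000))
```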

Combining Captures with NMCap

Using NMCap, you can recombine these to create one large capture file. To do this use the /InputCapture option as follows:

NMCap /InputCapture foo.cap foo(1).cap foo(2).cap /Capture /File out.cap

You could additionally add a filter to limit the information that gets transferred. For instance, say I only wanted to see port 80 traffic in the resulting trace. In that case the following NMCap will get the job done.

NMCap /InputCapture foo.cap foo(1).cap foo(2).cap /Capture tcp.port==80 /File out.cap

Using a Script to Combine Many Capture Files

Now, this gets tedious the more files you have. We can solve this problem by using a simple CMD script to collect all the files for us. Just create a file called stitch.CMD using Notepad and place these contents in it:

REM Usage: stitch InCapFileBaseName OutCapFile.cap [Filter]

REM Creates a flat list of capture files ordered by date
dir /b /od %1*.cap > %TEMP%\captures.txt

REM Stores ordered file list in environment variable
SET INCAP=/InputCapture
for /f %%c in (%TEMP%\captures.txt) do call :addCap %%c

REM Calls NMCap to combine files
NMCap %INCAP% /capture %3 /file %2.chn:500M
goto :eof

REM Routine to append a file to the environment variable
:addCap
SET INCAP=%INCAP% %1
goto :eof

The CMD script file takes three parameters: the first is the original file name without the .cap extension, the second is the output capture file, and the third is the filter, which is optional. You'll also want to run the script in the directory where all your captures are. Since it searches for .cap files that start with the base name, make sure there aren't any extraneous captures that match.
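
If you'd rather not fight with CMD quoting, the same idea can be sketched in Python. This version only builds the NMCap argument list, using the /InputCapture, /Capture, and /File options shown earlier; the function name and its parameters are my own for illustration. It orders files by modification time, which is my approximation of what dir /od does in the script.

```python
import glob
import os


def build_stitch_command(base, out_file, capture_filter=None, directory="."):
    """Build an NMCap command that combines base*.cap, oldest file first."""
    pattern = os.path.join(glob.escape(directory), glob.escape(base) + "*.cap")
    files = sorted(glob.glob(pattern), key=os.path.getmtime)
    if not files:
        raise FileNotFoundError(f"no capture files match {pattern}")
    cmd = ["NMCap", "/InputCapture", *files, "/Capture"]
    if capture_filter:
        cmd.append(capture_filter)   # for example "tcp.port==80"
    cmd += ["/File", out_file]
    return cmd


# Usage: pass the list to subprocess.run(cmd, check=True) on a machine
# with Network Monitor installed. Keep the output file out of the input
# directory, or give it a different base name, so it isn't picked up
# as an input on the next run.
```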


Delayed Write Failure Trace Study

September 21, 2009, 11:01 am


In this "Trace Study", we'll look at a case where the customer is seeing delayed write failures logged in the event log. Delayed write failures are reported when a file being written over the network is inaccessible for a time. Based on a trace taken at the same time as the error was logged, we will determine the cause.

Zooming In

Since we know the file name reported in the event log error, we'll use that name to find where in the trace we are accessing this file. We start by building a filter that uses a property we set for any SMB frame which references a file.

Property.SMBFileName.Contains("dir.txt")

This displays a bunch of frames that reference the "dir.txt" file, but this does not represent the entire conversation. To get the entire conversation, right click any frame and select Find Conversation->SMB. Then remove your display filter and now you will see all the frames associated with this particular SMB conversation. An SMB conversation is usually all operations involving a single file.

The next step is to look for an error of some kind. We do this by creating a color filter (http://blogs.technet.com/netmon/archive/2007/06/28/color-filtering-error-messages.aspx) to make SMB error frames stand out. We'll use this color filter:

(smb.DOSError.Error != 0 AND smb.DOSError.Error != 22)

OR

(smb.NTStatus.Code != 0 AND smb.NTStatus.Code != 22)

I made my color filter have a red background and a white foreground, a color scheme I use to identify errors.

With this color filter enabled, I simply scroll through the trace looking for a red frame to stand out. As they pop up, you'll have to look at the specific error and see if it applies. In my case I see a STATUS_NETWORK_SESSION_EXPIRED. Right after this frame I see a Session Setup, with SMB Writes continuing both before and after it.

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

SMB:C; Transact2, Query File Info, Query File Standard Info, FID = 0x400C (\files\dir.txt@#1644)

SMB:R; Transact2, Query File Info, FID = 0x400C (\files\dir.txt@#1644) - NT Status: System - Error, Code = (860) STATUS_NETWORK_SESSION_EXPIRED

SMB:C; Session Setup Andx, Krb5ApReq (0x100)

SMB:R; Session Setup Andx, Krb5ApRep (0x200)

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

Obviously this is not normal traffic for SMB. Session Setups occur when you first make a connection to a share, but not in the middle of a file transfer. What caused this session to expire?
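As an aside, the "Code = (860)" in the error frame is just the low 16 bits of the 32-bit NT status value. Here is a minimal sketch of that split, assuming the standard NTSTATUS layout (severity in bits 31-30, facility in bits 27-16, code in bits 15-0). The names NtStatusParts and decodeNtStatus are mine and not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 32-bit NTSTATUS value.
struct NtStatusParts
{
    unsigned severity;  // 0 = success, 1 = informational, 2 = warning, 3 = error
    unsigned facility;  // 0 is the facility Network Monitor displays as "System"
    unsigned code;      // the number Network Monitor shows in parentheses
};

// Splits an NTSTATUS into severity (bits 31-30), facility (bits 27-16)
// and code (bits 15-0).
NtStatusParts decodeNtStatus(std::uint32_t status)
{
    NtStatusParts parts;
    parts.severity = (status >> 30) & 0x3;
    parts.facility = (status >> 16) & 0xFFF;
    parts.code     = status & 0xFFFF;

    // Sanity check: the three fields cover every bit except the customer
    // and reserved bits (29-28).
    assert(((parts.severity << 30) | (parts.facility << 16) | parts.code)
           == (status & 0xCFFFFFFFu));
    return parts;
}
```

Feeding it 0xC000035C, which to my knowledge is the value the Windows headers define for STATUS_NETWORK_SESSION_EXPIRED (the trace itself only shows the decoded code), gives severity 3 (error), facility 0 and code 860, matching what the frame summary displays.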

Zooming Out

When we used the "Find Conversation->SMB" above, we narrowed down the traffic to just one SMB conversation. But something happened on another network conversation in between our Session Setup and the last error. To figure out where to go next, we'll have to zoom out and look at the rest of the traffic around the error in question. I'll select the error frame to keep my context and then click on "All Traffic" at the top of the conversation tree to remove the SMB conversation filter. When I do, I see the following traffic:

SMB:C; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes at Offset 32780

SMB:R; Write Andx, FID = 0x400C (\files\dir.txt@#1644), 1 bytes

SMB:C; Transact2, Query File Info, Query File Standard Info, FID = 0x400C (\files\dir.txt@#1644)

SMB:R; Transact2, Query File Info, FID = 0x400C (\files\dir.txt@#1644) - NT Status: System - Error, Code = (860) STATUS_NETWORK_SESSION_EXPIRED

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: cifs/c01e3n01ads.corp1.local

TCP:Flags=...A...., SrcPort=1162, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=1084491174, Ack=239237167, Win=4163

KerberosV5:TGS Response Cname: Kevin

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:TGS Response Cname: Kevin

KerberosV5:AS Request Cname: Kevin Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:AS Response Ticket[Realm: CORP1.LOCAL, Sname: krbtgt/CORP1.LOCAL]

KerberosV5:TGS Request Realm: CORP1.LOCAL Sname: krbtgt/CORP1.LOCAL

KerberosV5:TGS Response Cname: Kevin

SMB:C; Session Setup Andx, Krb5ApReq (0x100)

Kerberos Ticket Expired

Once the UI has completed updating the frame summary, my current selection remains on the SMB Error frame which keeps my place. But now some new Kerberos frames show up. This information together with the "Session Expired" message tells us the whole story.

The expired SMB session means we need to re-authenticate. In this case the Kerberos ticket expired and a new ticket had to be issued to us by the Kerberos Key Distribution Center (KDC), which is what the AS and TGS exchanges above show. If we had the original setup traffic, we would be able to see the initial Kerberos ticket with its expiration time. Once this Kerberos negotiation completes, the SMB session is reset using the new Kerberos ticket and the SMB traffic continues where it left off. This authentication interruption in the traffic is what caused our "Delayed Write Failure" event log error message in the first place.

Getting to the Bottom of Things

In this case the Delayed Write failure is easily explained. But there are many ways a delayed write failure can be triggered. You can use these same steps to zoom in and zoom out of a trace to understand this type of problem. Next time you see a Delayed Write failure in your event log, I hope you can use these steps to figure out why it occurred.


SMB Opportunistic Locking Behavior

September 22, 2009, 10:10 am


Behold the mysterious world of OpLocks (Opportunistic Locking). Often OpLocks will be disabled by a user or system administrator in order to help address a performance problem. And this practice might not always be the best course of action. Understanding how OpLocks behave in a trace can provide you more information so you can properly diagnose an OpLock issue.

What is an OpLock

OpLocks are used to enhance performance on a network where multiple people are accessing the same file. By the way, these are somewhat different from the notion of "optimistic locking" in databases. Imagine that you are the only person editing a file on a server. Because nobody else has the file open, you could cache your changes locally for both reads and writes. This would improve your performance because you wouldn't have to go over the network for any of this cached information.

Now imagine somebody else opens the file after you do. If you have changes in your local cache, this new user won't see those changes. OpLocks, or more specifically a break of an OpLock in this case, is how your computer is told to flush its local cache.

In general there are different levels of OpLocks, like Batch, Exclusive, and Level 2 which define how a file can be shared with respect to this local caching. But rather than go into a lot of detail about the specifics, let me point you to some references which do a good job of describing more detail.

Example OpLocks in a Trace

In this example we have two clients - Windows XP (SMB) and Windows Vista (SMB2) viewing the same directory on a 3rd computer using explorer. As explorer reads the data, file collisions occur which cause various OpLock traffic. We will focus on a piece of this traffic and describe how the OpLock behavior is working. Once you see what normal traces look like, you can use this information to troubleshoot issues with OpLocks.

Setting up the Trace in Network Monitor

One nice feature I like to use is aliases. This gives me the ability to change IP addresses to something I can better recognize, especially when working with 3 machines as in this case. By right clicking on an address in the source or destination column, I can select "Create Alias for..." and then provide a friendly name. In my case I will call them SRV for the server, and Vista and XP for each client.

The second thing I'll do is add the display filter "SMB or SMB2" so that I only see these protocols. This will get rid of any TCP or unrelated traffic for this demonstration.

Finally, I also added comments to this particular trace. Comments are an easy way to document the traffic so that others can learn from it. By adding the "Comment Title" as a column, these comments show up and provide some commentary about what is going on. By the way, the # next to the frame number signifies which frames have a comment. Alternatively, you can keep the comment tab open to see each comment as you click on a frame. The latter method lets you see more detail in the description column.

Traffic Analysis

I copied and pasted the data from the Network Monitor summary view. Here is the traffic that occurs between the 3 machines:

Frame Number	Source	Destination	Description	Comment Title
3110#	Vista	SRV	SMB2:C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = ...\Documents\desktop.ini@#3110	Vista Client Opens desktop.ini, request oplock batch
3111	XP	SRV	SMB:C; Transact2, Query Path Info, Query File Basic Info, Pattern = \...\Documents\desktop.ini
3112	SRV	XP	SMB:C; Locking Andx, FID = 0x400E (\...\Documents\desktop.ini@#2519)
3113	SRV	XP	SMB:R; Transact2, Query Path Info, Query File Basic Info
3114#	SRV	Vista	SMB2:R CREATE (0x5) Interim Response, FileName = ...\Documents\desktop.ini@#3110	Server response that this command is Pending
3116#	XP	SRV	SMB:C; Close, FID = 0x400E , FileName=\...\Documents\desktop.ini@#2519	XP Client closes desktop.ini
3117	SRV	XP	SMB:R; Close, FID = 0x400E , FileName=\...\Documents\desktop.ini@#2519
3118#	SRV	Vista	SMB2:R CREATE (0x5), Context=MxAc,Maximal Access, Context=DHnQ,Create Durable Open Handle, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000C5(...\Documents\desktop.ini@#3110)	Server responds to the Vista client with batch oplock granted
3119#	XP	SRV	SMB:C; Nt Create Andx, FileName = \...\Documents\desktop.ini	XP client wants to open desktop.ini again
3120#	SRV	Vista	SMB2:N OPLOCK BREAK (0x12), Oplock Level II Notification, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Server send Oplock break to Level 2 Notification to Vista client
3122	Vista	SRV	SMB2:C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = ...\Links@#3122
3123	SRV	Vista	SMB2:R CREATE (0x5), Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000CD(...\Links@#3122)
3124#	Vista	SRV	SMB2:A OPLOCK BREAK (0x12), Oplock Level II Acknowledgment, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Vista Client sends Oplock Level 2 Acknowledge to Server
3125#	SRV	Vista	SMB2:R OPLOCK BREAK (0x12), Oplock Level II Response, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Server sends break OpLock break to Level 2 response
3126	SRV	XP	SMB:R; Nt Create Andx, FID = 0x8008 (\...\Documents\desktop.ini@#3119)

As we start in frame 3110, we see that the Vista client opens desktop.ini and requests a Batch OpLock. Since the OpLock request is part of the SMB Create, the actual request is buried in the frame details.

Frame: Number = 3110, Captured Frame Length = 386, MediaType = ETHERNET

...

+ SMBOverTCP: Length = 264

- SMB2: C CREATE (0x5), Context=DHnQ,Create Durable Open Handle, Context=MxAc,Maximal Access, Context=QFid,Request Unique File ID , FileName = paullo\Documents\desktop.ini@#3110

SMBIdentifier: SMB

+ SMB2Header: C CREATE (0x5),TID=0x0009, MID=0x04F2, PID=0xFEFF, SID=0x0001

- CCreate: 0x1

StructureSize: 57 (0x39)

SecurityFlags: 0 (0x0)

RequestedOplockLevel: SMB2_OPLOCK_LEVEL_BATCH - A batch oplock is requested.

...

Frames 3111-3113 contain other traffic our XP client is doing which also happens to touch desktop.ini.

In frame 3114 the server returns a STATUS_PENDING because the server is not yet ready to respond.

Frame: Number = 3114, Captured Frame Length = 194, MediaType = ETHERNET 

...

+ SMBOverTCP: Length = 73

- SMB2: R CREATE (0x5) Interim Response, FileName = paullo\Documents\desktop.ini@#3110 

SMBIdentifier: SMB

- SMB2Header: R CREATE (0x5),TID=0x0000, MID=0x04F2, PID=0x0000, SID=0x0001

StructureSize: 64 (0x40)

Epoch: 0 (0x0)

+ Status: 0x103, Facility = FACILITY_SYSTEM, Severity = STATUS_SEVERITY_SUCCESS, Code = (259) STATUS_PENDING

Command: CREATE (0x5)

...


The XP client is closing desktop.ini, so the server waits for that to complete first. This way it can grant the Batch OpLock the Vista client is requesting. If the XP client had kept the file open, the OpLock might have been denied. Once the close completes, the SMB2 Create response is finally returned and the Batch OpLock is granted in frame 3118.

Frame: Number = 3118, Captured Frame Length = 394, MediaType = ETHERNET 

...

+ SMBOverTCP: Length = 272

- SMB2: R CREATE (0x5), Context=MxAc,Maximal Access, Context=DHnQ,Create Durable Open Handle, Context=QFid,Request Unique File ID, FID=0xFFFFFFFF002000C5(paullo\Documents\desktop.ini@#3110) 

SMBIdentifier: SMB

+ SMB2Header: R CREATE (0x5),TID=0x0000, MID=0x04F2, PID=0x0000, SID=0x0001

- RCreate: 0x1

StructureSize: 89 (0x59)

OplockLevel: SMB2_OPLOCK_LEVEL_BATCH - A batch oplock was granted.

...


Next, another create request for desktop.ini appears in frame 3119 as the XP client wants to open the file again. Since this is a second open of the same file, the server has to notify the Vista client to break its OpLock to Level 2 in frame 3120.

3119#	XP	SRV	SMB:C; Nt Create Andx, FileName = \...\Documents\desktop.ini	XP client wants to open desktop.ini again
3120#	SRV	Vista	SMB2:N OPLOCK BREAK (0x12), Oplock Level II Notification, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Server send Oplock break to Level 2 Notification to Vista client

The exact algorithm for breaking an OpLock is explained in the system documents referenced above and is related to the file system, so I won't go over those specifics. But in general, since two clients have the same file open, the local client caching algorithm has to change. The Vista client can no longer assume the file won't be changed and therefore can't cache the file locally.

In frame 3124, the "notify" is acknowledged and now the server can respond back to the Vista client in frame 3125 that the OpLock was broken to level 2. Finally, frame 3126 is the response back to the XP client that the open on desktop.ini has been completed.

3124#	Vista	SRV	SMB2:A OPLOCK BREAK (0x12), Oplock Level II Acknowledgment, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Vista Client sends Oplock Level 2 Acknowledge to Server
3125#	SRV	Vista	SMB2:R OPLOCK BREAK (0x12), Oplock Level II Response, FID=0xFFFFFFFF002000C5,FileName=...\Documents\desktop.ini@#3110	Server sends break OpLock break to Level 2 response
3126	SRV	XP	SMB:R; Nt Create Andx, FID = 0x8008 (\...\Documents\desktop.ini@#3119)
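To make the sequence in frames 3118 through 3126 easier to remember, here is a toy model of the server side for a single file. It is a deliberate simplification and not the real server algorithm: the class and message strings are invented for illustration, only Batch and Level II are modeled, and the acknowledgment round trip (frames 3124 and 3125) is not modeled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class OplockLevel { None, LevelII, Batch };

// Toy model of one file on the server. It only tracks who has the file open
// and which oplock the first opener holds; a real server tracks far more.
class ToyOplockFile
{
public:
    // A client opens the file. Returns the messages the server would send,
    // in order, as plain text for illustration.
    std::vector<std::string> open(const std::string& client)
    {
        assert(!client.empty());
        std::vector<std::string> messages;
        if (openers_.empty())
        {
            // Sole opener: it is safe to let this client cache locally.
            holder_ = client;
            level_ = OplockLevel::Batch;
            messages.push_back("CREATE response to " + client + ": batch oplock granted");
        }
        else
        {
            if (level_ == OplockLevel::Batch)
            {
                // A second opener arrives: the holder must drop to Level II first.
                level_ = OplockLevel::LevelII;
                messages.push_back("OPLOCK BREAK to " + holder_ + ": break to Level II");
            }
            messages.push_back("CREATE response to " + client + ": open completed");
        }
        openers_.push_back(client);
        return messages;
    }

    // A client closes the file. If the oplock holder closes, the oplock is gone.
    void close(const std::string& client)
    {
        auto it = std::find(openers_.begin(), openers_.end(), client);
        if (it == openers_.end())
        {
            return;  // this client did not have the file open
        }
        openers_.erase(it);
        if (client == holder_)
        {
            holder_.clear();
            level_ = OplockLevel::None;
        }
    }

    OplockLevel level() const { return level_; }

private:
    std::vector<std::string> openers_;
    std::string holder_;
    OplockLevel level_ = OplockLevel::None;
};
```

Opening the file as "Vista" and then as "XP" yields a batch grant for the first open, followed by a break to Level II and a completed open for the second, which is the same order the trace shows.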

Troubleshooting Performance and OpLocks

The previous example worked smoothly as it usually does. But in some instances an OpLock request does not get a response in a timely fashion. In those cases you might see a 35 second delay which is the default timeout for an OpLock. This could cause application timeouts or what seems like a hanging application from the user’s perspective. Also this 35 second delay is a sure sign OpLocks are involved in a performance issue. Just remember that as shown in the example above, multiple clients are probably involved. And it's this type of interaction you must learn to recognize in order to troubleshoot a performance problem with OpLocks.
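If you have exported request and response times from a trace, spotting that tell-tale delay can be automated. The sketch below assumes you already have matched request/response pairs with timestamps in seconds. The Exchange structure and findSlowResponses are hypothetical names, and the 35 second default simply reuses the timeout mentioned above.

```cpp
#include <cassert>
#include <vector>

// One request/response pair taken from a trace. Times are in seconds from
// the start of the capture.
struct Exchange
{
    unsigned requestFrame;
    double requestTime;
    double responseTime;
};

// Returns the request frame numbers whose response took at least
// thresholdSeconds to arrive.
std::vector<unsigned> findSlowResponses(const std::vector<Exchange>& exchanges,
                                        double thresholdSeconds = 35.0)
{
    std::vector<unsigned> slow;
    for (const Exchange& e : exchanges)
    {
        assert(e.responseTime >= e.requestTime);  // a response cannot precede its request
        if (e.responseTime - e.requestTime >= thresholdSeconds)
        {
            slow.push_back(e.requestFrame);
        }
    }
    return slow;
}
```

Any frame number it returns is a good place to start zooming out, since the client that failed to answer the break is usually on a different conversation.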


Using NMAPI to Access TCP Payload

October 7, 2009, 9:31 am


The TCP Payload often carries data that you want to access directly using the Network Monitor API. Below I will detail how to do this using a simple C++ example and the NMAPI.

Why Not add a TCP.Payload Field?

The TCP Payload can carry all types of payloads depending on the protocol that rides on top of TCP. Most often these represent other protocols, but you might not care about the protocol and instead want to see the payload size or payload data directly. You might think that you could access TCP.Payload to access this data, as this is a valid data field. However, TCP.Payload is only instantiated when no other protocol consumes the data. And in most cases, our parsers are complete enough to attempt to parse the data further. This is a limitation of how NPL works, and means we need to find another way to get the payload data.

Why Not use Property.TCPPayload?

Now there is a property, see this blog for more info on properties, called Property.TCPPayload that you could potentially use. The limitation is that it only works with ASCII or UNICODE data. So for binary information the data does not read properly into the property.

The Solution

The solution is to find the TCP payload depending on the TCP header location and size. We can use Property.TCPPayloadLength to obtain the total length of the payload. And to get the offset into the frame we use the TCP header length (TCP.DataOffset.DataOffset). Finally to get the start of the TCP frame we use the offset of TCP.SrcPort which is the first field in a TCP frame. With these pieces of information, we can use NmGetPartialRawFrame API to grab the raw data from the frame.

So here's the code snippet:

void
GetFramePayload(HANDLE ParsedFrame, HANDLE FrameParser, HANDLE RawFrame)
{
    ULONG ret;
    UINT32 PayloadLen = 0;
    ULONG retlen;
    NmPropertyValueType PropType;

    UINT8 TCPHeaderSize;
    ULONG TCPSrcOffset, TCPSrcSize;


    // Get Payload Length
    ret = NmGetPropertyValueById(FrameParser, TCPPayloadLengthID, sizeof(PayloadLen), (PBYTE)&PayloadLen, &retlen, &PropType);
    if(ret != ERROR_SUCCESS)
    {
        wprintf(L"Error retrieving TCP Payload Length Property, err=%d\n", ret);
        return;
    }

    if(PayloadLen > 0)
    {
        // Get the Data Offset, used to determine the TCP header size
        ret = NmGetFieldValueNumber8Bit(ParsedFrame, TCPDataOffsetID, &TCPHeaderSize);
        if(ret != ERROR_SUCCESS)
        {
            wprintf(L"Error retrieving TCP Header Length Field, err=%d\n", ret);
            return;
        }

        // Get the Offset of TCP.SrcPort which is the first field in TCP.
        ret = NmGetFieldOffsetAndSize(ParsedFrame, TCPSrcPortID, &TCPSrcOffset, &TCPSrcSize);
        if(ret != ERROR_SUCCESS)
        {
            wprintf(L"Error retrieving TCP SRC Header/Offset, err=%d\n", ret);
            return;
        }

        wprintf(L"Offset: %d, Length: %d, HeaderLen: %d\n", TCPSrcOffset/8, PayloadLen, TCPHeaderSize*4);

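        // Allocate a zero-filled buffer based on the Payload Length Property, plus
        // one byte so the data is always NUL terminated when printed as a string.
        PBYTE buf = (PBYTE)calloc(PayloadLen + 1, 1);
        if(buf == NULL)
        {
            wprintf(L"Error allocating buffer for the payload\n");
            return;
        }

        // Read in the partial frame.  The Offset is in bits, so divide by 8.
        // TCPHeaderSize is in 32-bit words, so multiply by 4.
        ret = NmGetPartialRawFrame(RawFrame, TCPSrcOffset/8 + TCPHeaderSize*4, PayloadLen, buf, &retlen);
        if(ret != ERROR_SUCCESS)
        {
            wprintf(L"Error retrieving partial raw frame, err=%d\n", ret);
            free(buf);
            return;
        }

        // Do whatever you want with buf now.  I'll assume it's ASCII and print it.
        wprintf(L"%S", buf);
        free(buf);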
    }
}
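The offset arithmetic passed to NmGetPartialRawFrame is the part that is easiest to get wrong, so here it is on its own with no API calls. This is a small sketch. tcpPayloadByteOffset is a name I made up, and the example numbers assume a plain Ethernet plus IPv4 frame with no IP options.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of the TCP payload from the start of the frame.
// srcPortBitOffset: offset of TCP.SrcPort in bits, as the field offset is reported.
// dataOffsetWords:  value of TCP.DataOffset.DataOffset, in 32-bit words.
std::size_t tcpPayloadByteOffset(std::uint32_t srcPortBitOffset, std::uint8_t dataOffsetWords)
{
    // A TCP header is at least 5 words (20 bytes) and the field is 4 bits wide.
    assert(dataOffsetWords >= 5 && dataOffsetWords <= 15);

    // Bits to bytes for the start of TCP, words to bytes for the header length.
    return srcPortBitOffset / 8 + static_cast<std::size_t>(dataOffsetWords) * 4;
}
```

With a 14 byte Ethernet header and a 20 byte IPv4 header, TCP.SrcPort sits at bit 272 (byte 34). A data offset of 5 then puts the payload at byte 54, and a data offset of 8 (TCP options present) puts it at byte 66.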

And here is the initialization code for our frame parser, which shows how each data field and property was added:

HANDLE
MyLoadNPL(void)
{
    HANDLE myFrameParser = INVALID_HANDLE_VALUE;
    ULONG ret;

    // Use NULL to load default NPL set.
    ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &NplParser);

    if(ret == ERROR_SUCCESS){
        ret = NmCreateFrameParserConfiguration(NplParser, MyParserBuild, 0, &FrameParserConfig);

        if(ret == ERROR_SUCCESS)
        {

            ret = NmAddProperty(FrameParserConfig, L"Property.TCPPayloadLength", &TCPPayloadLengthID);
            if(ret != 0)
            {
                wprintf(L"Failed to add Property.TCPPayloadLength, error 0x%X\n", ret);
            }

            ret = NmAddField(FrameParserConfig, L"TCP.SrcPort", &TCPSrcPortID);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to add field, TCP.SrcPort, error 0x%X\n", ret);
            }

            ret = NmAddField(FrameParserConfig, L"TCP.DataOffset.DataOffset", &TCPDataOffsetID);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to add field, TCP.DataOffset, error 0x%X\n", ret);
            }

            ret = NmCreateFrameParser(FrameParserConfig, &myFrameParser);

            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
                NmCloseHandle(FrameParserConfig);
                NmCloseHandle(NplParser);
                return INVALID_HANDLE_VALUE;
            }
        }
        else
        {
            wprintf(L"Unable to load parser config, error 0x%X\n", ret);
            NmCloseHandle(NplParser);
            return INVALID_HANDLE_VALUE;
        }

    }
    else
    {
        wprintf(L"Unable to load NPL\n");
        return INVALID_HANDLE_VALUE;
    }

    return(myFrameParser);
}

By using TCP.SrcPort, we get rid of any dependency on the rest of the stack. This will work on IPv4, IPv6 or any tunneled protocols. Also, Property.TCPPayloadLength is computed by the parsers, which again makes it agnostic to the carrying protocols.

Party on Your Payload

Now that you have your payload in a BYTE buffer, you can do whatever you want with it. For instance, if you wanted to create an expert to show each payload and response as text, you could simply take the frame number that is referenced and use that to determine the conversation key for the TCP conversation, for example by using the property Conversation.ID.TCP. Then you can use this to filter all other packets in the same trace with the same TCP Conversation ID. This would give you a high level view of text based traffic like HTTP and FTP. Of course there is a little more work to deal with fragmented data, but the API gives you all the tools to accomplish this.


Network Monitor Videos on Channel 9

October 12, 2009, 7:19 am


We posted some videos to Channel 9 in the last 6 months or so, and I wanted to let everybody know about them.

We have one set of videos that provide some insight into the Network Monitor API and the process of creating experts. This series provides an overview of the API and dives deeply into various aspects like Live Capturing, Parser Engine, and API Overview as well as a general Expert Story to understand the big picture. We plan to release a few more in the upcoming months, so stay tuned.

We also have some videos from our Plug Fests, which is where we invite partners to get information on specific technologies with which they want to interoperate.

So please visit the Channel9 (http://channel9.msdn.com/tags/Netmon/) site and learn more about Network Monitor.


Reassembling Packets with the Network Monitor API

October 12, 2009, 11:00 am


Network traffic by nature is fragmented. Limits of various network packet sizes force protocols to chop up data into multiple frames. When you capture data or read it from a trace with the API (NMAPI) you see only the fragments by default. But as the engine is collecting packets, it can be configured to pass up the reassembled payloads as well. For an intro to how reassembly works in the UI, please see the video on reassembly. We also released a recent video on Channel 9 which has some information about the API and reassembly. I would also recommend reading the "Introduction to the Network Monitor API" in the help file for a general background.

Configuring the Parser

The first step is to configure your parser to reassemble. Your API tool for breaking apart a frame is called the Frame Parser object. But to create a frame parser, you start by creating a Frame Parser Configuration. This configuration allows you to add data fields and properties. But it also allows you to configure your parser for Reassembly and Conversations. In this case Reassembly might depend on Conversations, so we will enable them both. Here's how I set up my Parser Configuration and Frame Parser.

// Returns a frame parser with conversations and reassembly enabled and one property.
// INVALID_HANDLE_VALUE indicates failure.
HANDLE
MyLoadNPL(void)
{
    HANDLE myFrameParser = INVALID_HANDLE_VALUE;
    ULONG ret;

    // Use NULL to load default NPL set.
    ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &g_NplParser);

    if(ret == ERROR_SUCCESS){
        ret = NmCreateFrameParserConfiguration(g_NplParser, MyParserBuild, 0, &g_FrameParserConfig);

        if(ret == ERROR_SUCCESS)
        {
            // Order is important here, must turn on Conversations before Reassembly.
            ret = NmConfigConversation(g_FrameParserConfig, NmConversationOptionNone , TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config conversations, error 0x%X\n", ret);
            }

            ret = NmConfigReassembly(g_FrameParserConfig, NmReassemblyOptionNone , TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config reassembly, error 0x%X\n", ret);
            }

            // Property so we can show the highest protocol description.
            ret = NmAddProperty(g_FrameParserConfig, L"property.Description", &g_DescPropID);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to add property, error 0x%X\n", ret);
            }

            ret = NmCreateFrameParser(g_FrameParserConfig, &myFrameParser, NmParserOptimizeNone);

            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
                NmCloseHandle(g_FrameParserConfig);
                NmCloseHandle(g_NplParser);
                return INVALID_HANDLE_VALUE;
            }
        }
        else
        {
            wprintf(L"Unable to load parser config, error 0x%X\n", ret);
            NmCloseHandle(g_NplParser);
            return INVALID_HANDLE_VALUE;
        }

    }
    else
    {
        wprintf(L"Unable to load NPL\n");
        return INVALID_HANDLE_VALUE;
    }

    return(myFrameParser);
}

After creating your Frame Parser Configuration Object, you'll want to set any options first. This will let the engine optimize properly when adding other things like properties and data fields. It's also important that you turn on conversations before reassembly. Placing them in the wrong order will turn off Reassembly due to a bug in our API.

Above we also added a property so that I can show the description of the current frame. This is not necessary for reassembly to work, but it helps us understand the example.

Parsing the Frames

It is up to the parsers (NPL) to mark each frame's fragment type: First=1, Middle=2, Last=3 or None=0. The engine tracks these fragments and returns a new inserted raw frame once a Last fragment is detected for a specific protocol.
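To picture what the engine is doing with those fragment types, here is a toy reassembler for a single stream. It is my own simplification and not the engine's implementation: it ignores conversations, ordering and multiple protocols, and ToyReassembler is an invented name. The enum values match the fragment types listed above.

```cpp
#include <cassert>
#include <string>

enum class FragmentType { None = 0, First = 1, Middle = 2, Last = 3 };

// Toy reassembler for one stream of payloads.
class ToyReassembler
{
public:
    // Feeds one frame's payload. Returns true and fills 'reassembled' only
    // when a Last fragment completes a message, which is the moment the real
    // engine would hand back an inserted raw frame.
    bool add(FragmentType type, const std::string& payload, std::string& reassembled)
    {
        switch (type)
        {
        case FragmentType::First:
            pending_ = payload;       // start a new message
            inProgress_ = true;
            return false;
        case FragmentType::Middle:
            if (inProgress_)
            {
                pending_ += payload;  // keep collecting
            }
            return false;
        case FragmentType::Last:
            if (!inProgress_)
            {
                return false;         // never saw the First fragment
            }
            pending_ += payload;
            reassembled.swap(pending_);
            pending_.clear();
            inProgress_ = false;
            return true;
        case FragmentType::None:
            break;                    // unfragmented frame: nothing to reassemble
        }
        return false;
    }

private:
    std::string pending_;
    bool inProgress_ = false;
};
```

Feeding it a First, a Middle and a Last fragment in that order returns true only on the Last one, with the three payloads concatenated.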

When you parse a raw frame using NmParseFrame, the last parameter passed is a pointer to a HANDLE that will contain an InsertedRawFrame if one is present. Otherwise this value will be set to INVALID_HANDLE_VALUE for any frame that doesn't return a reassembled payload. For frames that do have a reassembled payload, the handle returned will contain a raw frame. You can now use your frame parser to parse this raw frame.

The main part of my code simply retrieves frames from the capture file iteratively and calls MyParseFrame, which does all the work. If an inserted frame is found, the function calls itself. The function is recursive because the handles for a RawFrame, ParsedFrame and InsertedRawFrame have to be closed in the order they were opened. There are other ways to do this, but for this example a recursive routine was the easiest. You will also want to ensure the frames are in order. For instance, you could use NmOpenCaptureFileInOrder to make sure the TCP frames are ordered correctly.

In my case I parse and display all the frames so that you can get a feel for the pattern that occurs as frame fragments are marked by the engine. It also helps to show how fragmentation looks at different protocol layers. If you were interested in only the reassembled frames or frames that are not fragmented to begin with, you could identify those as having a fragment type of None and no InsertedRawFrame.

Here's the recursive frame parsing routine:

// Recursive parsing routine.  If an inserted frame is found, the recursive routine is called again.  This
// allows us to close our handles in the order they were created.
void
MyParseFrame(HANDLE frameParser, HANDLE rawFrame, ULONG curFrame, PULONG reassembleFrames, int reassembleCount)
{
    ULONG ret;
    HANDLE ParsedFrame = INVALID_HANDLE_VALUE;
    HANDLE InsRawFrame = INVALID_HANDLE_VALUE;

    // NmUseFrameNumberParameter and valid unique frame numbers are necessary for Reassembly to work properly.
    ret = NmParseFrame(frameParser, rawFrame, curFrame + *reassembleFrames, NmFieldDisplayStringRequired | NmUseFrameNumberParameter, &ParsedFrame, &InsRawFrame); 
    if(ret == ERROR_SUCCESS)
    {
        // Returns the highest level protocol description just to show which
        // frame we are working on.
        PBYTE buf = GetDescription(frameParser);

        // Get the fragment information which helps understand what is happening,
        // but not needed for reassembly to work.
        NM_FRAGMENTATION_INFO FragInfo;
        GetFragType(ParsedFrame, &FragInfo);

        wprintf(L"%5d-%d: %5d      %-5.5s-%d    %-.45s\n", curFrame+1, reassembleCount, curFrame+(*reassembleFrames)+1, FragInfo.FragmentedProtocolName, FragInfo.FragmentType, buf);

        free(buf);

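        if(InsRawFrame != INVALID_HANDLE_VALUE)
        {
            (*reassembleFrames)++;
            MyParseFrame(frameParser, InsRawFrame, curFrame, reassembleFrames, reassembleCount+1);

            // Close the inserted frame once, and mark it closed so it is not closed again below.
            NmCloseHandle(InsRawFrame);
            InsRawFrame = INVALID_HANDLE_VALUE;
        }
    }

    // Only close handles that are still open.
    if(ParsedFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(ParsedFrame);
    }
    if(InsRawFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(InsRawFrame);
    }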
}

When doing reassembly you must add the Frame Number parameter. It must also be unique, so you have to remember to increment it when adding and parsing the reassembled frames. GetFragType uses the NmGetFrameFragmentInfo API call to determine the fragment type and protocol. You can look at the full example below to see how it works in detail, but those ancillary pieces are pretty straightforward.
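The frame numbering is easier to follow with the API stripped away. The sketch below mirrors only the recursion and the counter arithmetic of MyParseFrame. ToyFrame and its insertedDepth field are stand-ins I invented for "parsing this frame produced that many nested inserted frames", and each label is formatted as the original frame number, a dash, the recursion depth, a colon, and then the unique number passed to the parser (plus one for display).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a capture frame: insertedDepth says how many nested inserted
// (reassembled) frames parsing this frame would produce.
struct ToyFrame
{
    int insertedDepth;
};

// Mirrors the shape of MyParseFrame: record a label for the current frame,
// then recurse for an inserted frame, bumping the shared counter first.
void toyParseFrame(int remainingInserted, unsigned long curFrame,
                   unsigned long* reassembleFrames, int reassembleCount,
                   std::vector<std::string>& labels)
{
    assert(reassembleFrames != nullptr);

    // Same arithmetic as the wprintf call in MyParseFrame.
    labels.push_back(std::to_string(curFrame + 1) + "-" +
                     std::to_string(reassembleCount) + ": " +
                     std::to_string(curFrame + *reassembleFrames + 1));

    if (remainingInserted > 0)
    {
        (*reassembleFrames)++;
        toyParseFrame(remainingInserted - 1, curFrame, reassembleFrames,
                      reassembleCount + 1, labels);
    }
}

// Walks a capture where frames[i] is zero-based capture frame i.
std::vector<std::string> toyParseCapture(const std::vector<ToyFrame>& frames)
{
    std::vector<std::string> labels;
    unsigned long reassembleFrames = 0;
    for (unsigned long i = 0; i < frames.size(); ++i)
    {
        toyParseFrame(frames[i].insertedDepth, i, &reassembleFrames, 0, labels);
    }
    return labels;
}
```

With eight frames where only the eighth yields one inserted frame, the last two labels are "8-0: 8" and "8-1: 9". Every frame after that is shifted by one, which is why the shared counter has to be carried through the whole capture.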

Looking at an Example

Below is the partial output for an example capture. In my notation, the Frame# contains a number after the dash that shows when multiple iterations occur on a frame. The Reassem# is the frame number that would appear in a reassembled trace in the UI and is what is used to seed each frame with a unique frame number.

Frame# Reassem# FragType Description

5-0: 5 TCP -1 HTTP:Response, HTTP/1.1, Status: Bad gateway,

6-0: 6 TCP -2 TCP:[Continuation to #5]Flags=...A...., SrcPo

7-0: 7 -0 TCP:Flags=...A...., SrcPort=49382, DstPort=HT

8-0: 8 TCP -3 TCP:[Continuation to #5]Flags=...AP..., SrcPo

8-1: 9 -0 HTTP:Response, HTTP/1.1, Status: Bad gateway,

...

In original frames 5-8, you can see a typical TCP fragmentation. Frame 5 is a TCP First fragment. Frame 6 is a Middle fragment, and frame 7 is traveling in the opposite direction so it's not part of this reassembly stream. Frame 8 is the last frame in the reassembled TCP payload and is marked as the Last fragment. This is where the Inserted Raw Frame is valid and the recursive call to parse the frame occurs. Frame 8-1 is the parsed inserted frame, which you can see matches the description of frame #5, but if you looked at it, there would be two differences.

First, since it's an inserted frame it will have a PayloadHeader structure as its top protocol. This is a protocol we manufactured to take the place of the carrying protocol, in this case TCP. Having a duplicate TCP frame would confuse our parsers and perhaps the user as well. So this header takes its place and calls HTTP directly.

Second, this frame will have a larger payload. It will consist of all the payload data from frames 5, 6, and 8.

Two Level Reassembly

In this next example, both TCP and HTTP have fragmented data.

...

33-0: 36 TCP -1 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt

34-0: 37 TCP -2 TCP:[Continuation to #36]Flags=...A...., SrcP

35-0: 38 -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT

36-0: 39 TCP -3 TCP:[Continuation to #36]Flags=...AP..., SrcP

36-1: 40 HTTP -1 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt

37-0: 41 TCP -1 HTTP:HTTP Payload, URL: http://www.google.com

38-0: 42 -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT

39-0: 43 TCP -2 TCP:[Continuation to #41]Flags=...A...., SrcP

40-0: 44 TCP -2 TCP:[Continuation to #41]Flags=...A...., SrcP

41-0: 45 -0 TCP:Flags=...A...., SrcPort=49384, DstPort=HT

42-0: 46 TCP -3 TCP:[Continuation to #41]Flags=...AP..., SrcP

42-1: 47 HTTP -3 HTTP:HTTP Payload, URL: http://www.google.com

42-2: 48 -0 HTTP:Response, HTTP/1.1, Status: Ok, URL: htt

...

Frames 33-36 make up the first HTTP fragment. As you can see, the inserted frame at 36-1 is a First fragment, but the protocol is now HTTP. Frames 37-42 make up the next HTTP fragment, which is inserted at frame 42-1. This inserted frame is the HTTP Last fragment, so now there is yet another inserted raw frame that we must iterate through and parse. Frame 42-2 is the final reassembled frame and contains the original HTTP Response in its entirety. The description matches frame 33 because the data starts with the payload in that frame, but it also includes the payloads from frames 34, 36, 37, 39, 40, and 42. From the engine's point of view, however, it really collects the payloads from frames 36-1 and 42-1, each of which is made up of the fragmented frames mentioned above.
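If the numbering in the two columns is hard to follow, this small simulation reproduces it without touching the API. It is only a sketch: Row and NumberFrames are hypothetical names, and the count of inserted frames per original frame is supplied by hand rather than discovered by parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One printed row: the original frame number, the iteration on that frame,
// and the unique number handed to the parser (the Reassem# column).
struct Row {
    unsigned frame;
    unsigned iteration;
    unsigned reassem;
};

// Hypothetical simulation of the numbering scheme used in the output.
// insertedPerFrame[i] is how many inserted frames are chained onto the i-th
// original frame, firstFrame is the 1-based number of the first frame, and
// priorInserts is the count of inserted frames seen earlier in the trace.
std::vector<Row> NumberFrames(const std::vector<unsigned>& insertedPerFrame,
                              unsigned firstFrame, unsigned priorInserts)
{
    assert(firstFrame >= 1);    // frame numbers in the output are 1-based
    std::vector<Row> rows;
    unsigned inserts = priorInserts;
    for (std::size_t i = 0; i < insertedPerFrame.size(); ++i) {
        unsigned frame = firstFrame + static_cast<unsigned>(i);
        rows.push_back({frame, 0, frame + inserts});
        for (unsigned n = 1; n <= insertedPerFrame[i]; ++n) {
            ++inserts;  // every inserted frame consumes one more unique number
            rows.push_back({frame, n, frame + inserts});
        }
    }
    return rows;
}
```

Calling NumberFrames({0,0,0,1,0,0,0,0,0,2}, 33, 3) describes frames 33 through 42 above: one inserted frame on 36, two on 42, and three inserted frames earlier in the trace. The rows it returns match the listing, ending with 42-2 at Reassem# 48.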

The Whole Shebang

Below I've placed the entire source code for the example described in this blog. While it depends on which protocols you are interested in, having access to the reassembled data can provide you with the big picture especially when focusing on application layer traffic.

#include "stdafx.h"
#include "windows.h"
#include "stdio.h"
#include "stdlib.h"
#include "objbase.h"
#include "ntddndis.h"
#include "NMApi.h"

HANDLE g_NplParser = INVALID_HANDLE_VALUE;
HANDLE g_FrameParserConfig = INVALID_HANDLE_VALUE;

ULONG g_DescPropID = 0;    // Global Description Property ID.

// Callback for parser building messages
void __stdcall
MyParserBuild(PVOID Context, ULONG StatusCode, LPCWSTR lpDescription, ULONG ErrorType)
{
    wprintf(L"%s\n", lpDescription);
}

// Returns a frame parser with a filter and one data field.
// INVALID_HANDLE_VALUE indicates failure.
HANDLE
MyLoadNPL(void)
{
    HANDLE myFrameParser = INVALID_HANDLE_VALUE;
    ULONG ret;

    // Use NULL to load default NPL set.
    ret = NmLoadNplParser(NULL, NmAppendRegisteredNplSets, MyParserBuild, 0, &g_NplParser);

    if(ret == ERROR_SUCCESS){
        ret = NmCreateFrameParserConfiguration(g_NplParser, MyParserBuild, 0, &g_FrameParserConfig);

        if(ret == ERROR_SUCCESS)
        {
            // Order is important here, must turn on Conversations before Reassembly.
            ret = NmConfigConversation(g_FrameParserConfig, NmConversationOptionNone , TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config conversations, error 0x%X\n", ret);
            }

            ret = NmConfigReassembly(g_FrameParserConfig, NmReassemblyOptionNone , TRUE);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to config reassembly, error 0x%X\n", ret);
            }

            // Property so we can show the highest protocol description.
            ret = NmAddProperty(g_FrameParserConfig, L"property.Description", &g_DescPropID);
            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to add field, error 0x%X\n", ret);
            }

            ret = NmCreateFrameParser(g_FrameParserConfig, &myFrameParser, NmParserOptimizeNone);

            if(ret != ERROR_SUCCESS)
            {
                wprintf(L"Failed to create frame parser, error 0x%X\n", ret);
                NmCloseHandle(g_FrameParserConfig);
                NmCloseHandle(g_NplParser);
                return INVALID_HANDLE_VALUE;
            }
        }
        else
        {
            wprintf(L"Unable to load parser config, error 0x%X\n", ret);
            NmCloseHandle(g_NplParser);
            return INVALID_HANDLE_VALUE;
        }

    }
    else
    {
        wprintf(L"Unable to load NPL\n");
        return INVALID_HANDLE_VALUE;
    }

    return(myFrameParser);
}

void
UnLoadNPL(void)
{
    NmCloseHandle(g_NplParser);
    NmCloseHandle(g_FrameParserConfig);
}

ULONG
GetFragType(HANDLE parsedFrame, NM_FRAGMENTATION_INFO *FragInfo)
{
    ULONG ret;

    // Size of the structure itself, not of the pointer to it.
    FragInfo->Size = sizeof(*FragInfo);
    ret = NmGetFrameFragmentInfo(parsedFrame, FragInfo);

    return ret;
}

PBYTE
GetDescription(HANDLE frameParser)
{
    ULONG ret;
    NM_PROPERTY_INFO PropInfo;

    // Find out the size of the description property so we can allocate a buffer.
    // MUST initialize the size and name pointer or NmGetPropertyInfo will fail.
    PropInfo.Size = sizeof(PropInfo);
    PropInfo.Name = NULL;
    ret = NmGetPropertyInfo(frameParser, g_DescPropID, &PropInfo);
    if(ret != ERROR_SUCCESS)
    {
        wprintf(L"Error calling NmGetPropertyInfo, %d\n", ret);
        return NULL;
    }

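    ULONG retlen = 0;
    NmPropertyValueType propType;
    // Add size of WCHAR for null terminator.  calloc zero-fills the buffer,
    // so the string is terminated whatever length comes back.
    PBYTE buf = (PBYTE)calloc(1, PropInfo.ValueSize + sizeof(WCHAR));
    if(buf == NULL)
    {
        wprintf(L"Out of memory allocating description buffer\n");
        return NULL;
    }
    ret = NmGetPropertyValueById(frameParser, g_DescPropID, PropInfo.ValueSize, buf, &retlen, &propType);
    if(ret != ERROR_SUCCESS)
    {
        wprintf(L"Error calling NmGetPropertyValueById, %d\n", ret);
        free(buf);    // Don't leak the buffer on failure.
        return NULL;
    }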

    return buf;
}

// Recursive parsing routine.  If an inserted frame is found, the recursive routine is called again.  This
// allows us to close our handles in the order they were created.
void
MyParseFrame(HANDLE frameParser, HANDLE rawFrame, ULONG curFrame, PULONG reassembleFrames, int reassembleCount)
{
    ULONG ret;
    HANDLE ParsedFrame = INVALID_HANDLE_VALUE;
    HANDLE InsRawFrame = INVALID_HANDLE_VALUE;

    // NmUseFrameNumberParameter and valid unique frame numbers are necessary for Reassembly to work properly.
    ret = NmParseFrame(frameParser, rawFrame, curFrame + *reassembleFrames, NmFieldDisplayStringRequired | NmUseFrameNumberParameter, &ParsedFrame, &InsRawFrame); 
    if(ret == ERROR_SUCCESS)
    {
        // Returns the highest level protocol description just to show which
        // frame we are working on.
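        PBYTE buf = GetDescription(frameParser);

        // Get the fragment information which helps understand what is happening,
        // but not needed for reassembly to work.  Zero the structure first so the
        // output is well defined even if the call fails.
        NM_FRAGMENTATION_INFO FragInfo = {0};
        GetFragType(ParsedFrame, &FragInfo);

        // GetDescription returns NULL on failure, so print a placeholder in that case.
        wprintf(L"%5d-%d: %5d      %-5.5s-%d    %-.45s\n", curFrame+1, reassembleCount, curFrame+(*reassembleFrames)+1, FragInfo.FragmentedProtocolName, FragInfo.FragmentType, buf ? (LPCWSTR)buf : L"(no description)");

        free(buf);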

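        if(InsRawFrame != INVALID_HANDLE_VALUE)
        {
            (*reassembleFrames)++;
            MyParseFrame(frameParser, InsRawFrame, curFrame, reassembleFrames, reassembleCount+1);
        }
    }

    // Close each handle exactly once, after any recursion has finished.
    if(ParsedFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(ParsedFrame);
    }
    if(InsRawFrame != INVALID_HANDLE_VALUE)
    {
        NmCloseHandle(InsRawFrame);
    }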
}

int __cdecl wmain(int argc, WCHAR* argv[])
{
    ULONG ret = ERROR_SUCCESS;
    // The first parameter should be a file.
    if(argc <= 1){
        wprintf(L"Expect a file name as the only command line parameter\n");
        return -1;
    }

    // Open the specified capture file.
    HANDLE myCaptureFile = INVALID_HANDLE_VALUE;
    if(ERROR_SUCCESS == NmOpenCaptureFile(argv[1], &myCaptureFile))
    {
        // Initialize the parser engine and return a frame parser.
        HANDLE myFrameParser = MyLoadNPL();
        if(myFrameParser != INVALID_HANDLE_VALUE)
        {
            ULONG myFrameCount = 0;
            ret = NmGetFrameCount(myCaptureFile, &myFrameCount); 
            if(ret == ERROR_SUCCESS)
            {
                ULONG totReassembledFrames = 0;
                HANDLE myRawFrame = INVALID_HANDLE_VALUE;

                wprintf(L"Frame#   Reassem#  FragType    Description\n");
                for(ULONG i = 0; i < myFrameCount; i++)
                {
                    HANDLE myParsedFrame = INVALID_HANDLE_VALUE;
                    ret = NmGetFrame(myCaptureFile, i, &myRawFrame); 
                    if(ret == ERROR_SUCCESS)
                    {
                        MyParseFrame(myFrameParser, myRawFrame, i, &totReassembledFrames, 0);

                        NmCloseHandle(myRawFrame);
                    }
                    else
                    {
                        // Print an error, but continue to loop.
                        wprintf(L"Errors getting raw frame %d\n", i+1);
                    }
                }
            }

            NmCloseHandle(myFrameParser);
        }
        else
        {
            wprintf(L"Errors creating frame parser\n");
        }

        NmCloseHandle(myCaptureFile);
    }
    else
    {
        wprintf(L"Errors opening capture file\n");
    }

    // Release global handles.
    UnLoadNPL();

    return 0;
}

↧

Adapters Are Missing After Upgrading to Windows 7

October 23, 2009, 2:35 pm

If you have just upgraded to Windows 7, you might notice that you no longer see any adapters listed in your Select Networks selection. There is a very simple way to fix this problem.

First, run CMD as administrator. If you have not done this before, you can use the search option in the Start menu to find CMD. Then right click it and select "Run as Administrator". Now type "nmconfig /install" and press Enter. This will re-bind the Network Monitor Driver to the adapters. Next time you run Network Monitor, the adapters should show up again.

For more information, please see this KB article.

↧

When You Can't Save Frames From the UI

November 16, 2009, 11:34 am

You might have run into an occasion when doing a capture from the UI where you are unable to save your capture. You might receive a message like "Not enough storage is available to process this command". The UI tends to eat up a lot of resources as it saves conversation information and builds the conversation tree. This is why we recommend you use NMCap, the command line capture utility included with Network Monitor, if you are going to capture a considerable amount of data. But if you do get into this situation, there might be a way to save the trace using the Frame Buffer Manager.

Frame Buffer Manager to Save the Day

Frame Buffer Manager is a tool in Network Monitor that allows you to select frames in any order and from multiple capture files and add them to a new capture file. You can sometimes use this feature to get around this problem. Just follow these steps:

1. Go to File, Frame Buffer Manager
2. Select the New File button
3. In the file save dialog, type in a capture file name and hit Save
4. Hit OK to exit the Frame Buffer Manager window
5. Select all frames in Frame Summary (Ctrl+A)
6. Go to File, Frame Buffer Manager again
7. Select the file you created in step 3 and hit OK
8. Go to File, Frame Buffer Manager again
9. Select the file you created in step 3
10. Select the Close File button to save the file

The file will now be saved. If the capture size is larger than 20 MB, it will be split into multiple files labeled, for instance, out.cap, out(1).cap, out(2).cap and so on. You can stitch these back together using NMCap. Please refer to this blog on stitching chained captures back together.

What You Learned

First, use NMCap if you need to capture for long periods of time. Second, if you do happen to find yourself in this situation with the UI, Frame Buffer Manager can possibly provide a way to save your data.

↧

No Frames Captured Due to Disk Quota

November 23, 2009, 2:34 pm

In certain instances, you start a capture and no frames are captured. Or perhaps the UI suddenly stops displaying new frames. The display doesn't indicate any dropped frames and you've already verified that your selected adapter is the one that should see the traffic. Mysteriously, this worked in the past, or maybe it never worked at all. What could be wrong?

Disk Quota Comes Into Play

We have a concept of a disk quota with Network Monitor. The idea is to protect you from filling up your disk drive. In some cases, a user might not be prepared for the fire hose of traffic that can flood the disk drive when capturing from a 1 Gbps network. By default the quota is set to 2% of your disk space, which means with a 100MB disk, we try to leave you 2MB free. For example, on a 1 Gbps network, the amount of data you are capturing could easily be 100 Megs a second or more. So our intent is to protect the user from a low disk situation. This is especially critical on servers where low disk space can cause havoc.
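To get a feel for how quickly a capture can run into the quota, here is a small back-of-the-envelope sketch. ReservedBytes and SecondsUntilQuota are hypothetical helpers, not anything Network Monitor exposes, and the calculation assumes a constant write rate and decimal units (1 MB = 1,000,000 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Bytes that a percentage quota keeps free on a disk of the given size.
std::uint64_t ReservedBytes(std::uint64_t diskBytes, unsigned percent)
{
    assert(percent <= 100);
    // Split the multiplication so very large disks cannot overflow.
    return diskBytes / 100 * percent + (diskBytes % 100) * percent / 100;
}

// Seconds of capturing before free space drops to the reserved amount,
// assuming a constant write rate. Returns 0 if the quota is already reached.
double SecondsUntilQuota(std::uint64_t freeBytes, std::uint64_t reservedBytes,
                         double bytesPerSecond)
{
    assert(bytesPerSecond > 0.0);
    if (freeBytes <= reservedBytes) return 0.0;
    return static_cast<double>(freeBytes - reservedBytes) / bytesPerSecond;
}
```

With these assumptions, a 500 GB disk reserves 10 GB at the default 2%. If that disk has 60 GB free and the capture writes 100 MB a second, the quota is reached after 500 seconds, a bit over eight minutes.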

An unintended outcome of the disk quota is that frames in the UI and NMCap won't get captured once this quota is met. Furthermore no appropriate error message is displayed leaving you befuddled. In the UI the conversation tree will state "waiting for network traffic...", but no frames ever appear. For NMCap with a filter you will see the same kind of behavior and the saved frame count never increments. For NMCap and no filter, the symptom is somewhat different. Instead, once you reach the limit, we will continue to process the remaining frames. However, the pending frame count never returns to zero.

Changing the Disk Quota

You can change the disk quota. In some circumstances, 2% can represent a large amount of disk space since it is a percentage of your total disk size. We allow you to set the quota based on an absolute disk value as well as a percentage. In the UI this can be done by going to the Tools menu, Options, and clicking on the Capture tab.

If you are using NMCap, there is a command line option for either choice: /MinDiskQuotaPercentage and /MinDiskQuota. The default here is also 2%.

Wrap Up

So, if you are taking a capture and find the display is not showing any new frames, in addition to making sure you have the correct adapter selected, check that more than 2% of your disk space shows as free. If not, free up some space, or adjust the disk quota setting if the default is not appropriate for your disk size.

↧

Capturing a Trace at Boot Up

January 4, 2010, 11:45 am

Capturing a trace during a boot is a common task that can be difficult to accomplish. In fact, the most foolproof way to capture all traffic at boot is to capture the traffic from a 3rd party capturing machine in promiscuous mode. But this requires you to mirror or span a port on your switch, or insert a simple hub into your network so that you can see the traffic from the booting machine. For Windows 7 and Windows Server 2008 R2, you might be better off using the Netsh /Capture=yes option (see Windows and Network Monitor Event Tracing). But there is another possibility using NMCap as a service, which I will unveil to you now.

SRVANY and INSTSRV

These two old resource kit utilities can be used to start any application as a service. And while they were designed for XP and Win2003, I successfully installed and ran my tests on Vista as well. Keep in mind that there isn't much support available for these tools, and your mileage might vary. The Windows 2003 resource kit which contains these tools is available here.

Generic instructions are available in this KB article which is what I used as a template. So you can reference it for more details.

The first step is to create a batch file that starts NMCap. I stored my batch file in c:\bootcap and configured NMCap to store the capture files in 5 Mbyte chunks in this same location. This way I can access each new capture as it is created. If you don't use chained captures, accessing them becomes tricky as you might not want, or be able, to stop NMCap when running as a service. I will talk about that a bit more in the next section.

My batch file consists of this one line:

"c:\Program Files\Microsoft Network Monitor 3\nmcap" /network * /capture /file c:\bootcap\bootcap.chn:5M > "c:\bootcap\out.txt"

Feel free to test and see that it works properly by running at a command prompt before moving to the next steps.

Now that I have a working NMCap batch file, we follow the instructions in the referenced KB and set up this batch file as a service. I followed these steps.

1. At an elevated command prompt type the following, where path points to the location of the resource kit tools:

path\INSTSRV.EXE NmCapBoot path\SRVANY.EXE

2. Edit the Registry to add the application path. I'll echo the warning in the KB about messing with the Registry. If you don't know what you are doing, be careful and do a backup if you are fond of this machine.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NmCapBoot

3. From the Edit menu, click “Add Key”. Type the following and click OK:

Key Name: Parameters
Class : <leave blank>

4. Select the Parameters key.

5. From the Edit menu, click “Add Value”. Type the following and click OK:

Value Name: Application
Data Type : REG_SZ
String : c:\bootcap\c.cmd

6. Close Registry Editor.

You now have a service that will start automatically on reboot. You can also manually start it right now by typing "Net Start NmCapBoot" at the command prompt. This is a good step to prove everything is working. Check the output of c:\bootcap\out.txt to make sure there aren't any problems. Also verify that a capture file gets created. You will see the bootcap.cap appear, but it is not accessible until you fill the first 5 Mbytes and the next chained capture file is created. I find viewing a video on the web is one easy way to achieve this. Once you see bootcap(1).cap get created, you should be able to access the first file, bootcap.cap.

Properly Stopping your Capture

One problem with this method is that shutting the service down with "Net Stop NmCapBoot" doesn't properly close NMCap. No capture file will be created for the last chunk of trace data in the buffer that hasn't already been written to a capture file. In fact, stopping the service with "Net Stop" will leave NMCap running. So you'll have to use Task Manager, select "show processes from all users", and stop NMCap manually.

If there's no easy way to kill the capture, then you might need to trigger the capture to stop. However, using a filter requires you to load the parsers, which won't work because our installed parsers can't be compiled and loaded under the non-user account running the service. Also, filtering is expensive, as we have to parse the frame, so frames might be dropped on a speedy network and/or with low disk space.

One simple solution to both problems is to create some simple parser code with NPL and provide an Offset, Length, Pattern type of match instead. This parsing is much quicker and the parser code to support this is trivial. So create a new sparser.npl text file and place this in c:\bootcap which will be accessed by using the /SetNPLPath parameter with NMCap.


UnsignedNumber blob(n)
{
    Size = n;
}

Protocol Frame
{
}

This lets you type a filter like "blob(FrameData, 30, 4)==0x01020304". This particular offset on my network is the IPv4 Destination address for ICMP. Now you can stop NMCap by running "ping 1.2.3.4", though keep in mind you still need to stop the service with the Net Stop command. Mind you, you could come up with other patterns to stop the trace if the ping doesn't work for you.
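To see why offset 30 lands on the destination address, here is a minimal sketch of the same Offset, Length, Pattern comparison in plain C++. BlobEquals and MakeIpv4Frame are hypothetical helpers and not NPL or Network Monitor code. The sketch assumes Ethernet II framing (14 bytes) and an IPv4 header without options, where the destination address sits 16 bytes into the header, and it assumes the bytes are compared as one big-endian number, which is what the filter values in this post suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Offset, Length, Pattern match: reads `length`
// bytes at `offset` as one big-endian number and compares it to `pattern`.
// Returns false when the frame is too short to contain the field.
bool BlobEquals(const std::vector<std::uint8_t>& frame, std::size_t offset,
                std::size_t length, std::uint64_t pattern)
{
    assert(length >= 1 && length <= 8);
    if (offset > frame.size() || length > frame.size() - offset) return false;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < length; ++i)
        value = (value << 8) | frame[offset + i];
    return value == pattern;
}

// Builds a minimal Ethernet II + IPv4 + ICMP-sized frame (all other bytes
// zero) with the given destination address, for trying out offsets.
std::vector<std::uint8_t> MakeIpv4Frame(std::uint8_t a, std::uint8_t b,
                                        std::uint8_t c, std::uint8_t d)
{
    std::vector<std::uint8_t> frame(14 + 20 + 8, 0);
    frame[12] = 0x08; frame[13] = 0x00;   // EtherType: IPv4
    frame[14] = 0x45;                     // version 4, header length 5 words
    frame[23] = 1;                        // protocol: ICMP (14 + 9)
    frame[30] = a; frame[31] = b;         // destination address (14 + 16)
    frame[32] = c; frame[33] = d;
    return frame;
}
```

BlobEquals(MakeIpv4Frame(1, 2, 3, 4), 30, 4, 0x01020304) is true, which mirrors the filter above. On a network with VLAN tags, IP options, or IPv6 the offsets shift, so check them against a real trace first.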

Your new batch file, c.cmd, looks like this:

"c:\Program Files\Microsoft Network Monitor 3\nmcap" /SetNPLPath c:\bootcap
"c:\Program Files\Microsoft Network Monitor 3\nmcap" /network * /capture /file c:\bootcap\bootcap.chn:5M /stopwhen /frame blob(FrameData, 30, 4)==0x01020304> "c:\bootcap\out.txt"

This actually calls NMCap twice. The difference is that the first time we set a new NPL path to our sparser.npl which we put in the root of the c:\bootcap directory. This tells NMCap to rebuild and save this as the parser set to use.

Consequently, this changes the parser path for all instances of the UI and NMCap, so you'll have to revert this change if you need to use the normal parsers on this machine. In the UI, you can do this by going to Tools, Options from the menu and opening the Parsers tab. Then select the Restore Defaults button and follow that by pressing the Save and Reload Parsers button to rebuild the restored default parser set.

Using Alternate Stop Patterns

In the example above, we use the blob type to specify the frame data as the first parameter and then the offset and size. So you can provide a different offset and size if you need to use a different type of frame to stop your trace. The easiest way to do this is to look at an existing trace and use the Offset and Selected Bytes displayed in the Hex details. Then you can create a display filter using the notation above to test and make sure you are triggering the right frame.

Here are a few more examples with offsets. Keep in mind these offsets are specific to my network which is IPv4 on Ethernet for these examples.

Pattern Description	Blob Filter	Command to Stop
ICMPv4 Length - Use the length of IPv4 and the fact that IPv4.NextProtocol is ICMP	"blob(framedata, 16, 2)==0x97 AND blob(framedata, 23, 1)==1"	ping /l 123 /n 1 1.2.3.4
ICMPv4 Data - Search for the "abcd" pattern	blob(framedata, 42, 4)==0x61626364	ping /n 1 1.2.3.4
DNS Name Pattern Match - look for the name "stopme" at a particular offset	blob(framedata, 55, 6)==0x73746F706D65	Nslookup stopme
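The constants in the table are easy to get wrong by hand, so here is a small sketch that derives them. All of the function names are hypothetical, and the header sizes are assumptions that match my setup: Ethernet II, IPv4 without options, an 8 byte ICMP echo header, an 8 byte UDP header, and a 12 byte DNS header.

```cpp
#include <cassert>
#include <string>

// Hex constant for an ASCII pattern, for the right side of a blob comparison.
std::string AsciiToHexPattern(const std::string& text)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "0x";
    for (unsigned char ch : text) {
        out += digits[ch >> 4];
        out += digits[ch & 0x0F];
    }
    return out;
}

// Assumed header sizes in bytes.
const unsigned kEthernet = 14, kIpv4 = 20, kIcmpEcho = 8, kUdp = 8, kDnsHeader = 12;

// IPv4 Total Length of an echo request sent with "ping /l <dataBytes>".
unsigned PingIpv4TotalLength(unsigned dataBytes)
{
    assert(dataBytes <= 65500);     // largest size the Windows ping accepts
    return kIpv4 + kIcmpEcho + dataBytes;
}

// Frame offsets used in the table.
unsigned Ipv4TotalLengthOffset() { return kEthernet + 2; }
unsigned Ipv4ProtocolOffset()    { return kEthernet + 9; }
unsigned IcmpDataOffset()        { return kEthernet + kIpv4 + kIcmpEcho; }
// First character of the first DNS label: skip the one-byte label length.
unsigned DnsFirstLabelOffset()   { return kEthernet + kIpv4 + kUdp + kDnsHeader + 1; }
```

PingIpv4TotalLength(123) gives 151, which is the 0x97 in the first row, and AsciiToHexPattern("stopme") gives 0x73746F706D65 for the DNS row. The four offset helpers return 16, 23, 42, and 55.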

Caveats and Pitfalls

Captures Can Get Overwritten - Since we are using the same NMCap command line, restarting the service or a reboot will cause capture files to get overwritten.

Service Start Dependencies - What if the service that sends traffic starts before the NMCapBoot service? In some cases you must set the dependencies of other services to wait for NMCap to start running. Another consideration is that NMCap might also depend on some services, like the capture driver. If you are not capturing the information you want, you may have to play around with the dependencies for the services installed on your machine and OS.

Can't Apply Capture Filter - As we mentioned above we don't have access to the full parser set in this configuration. You could solve this problem other ways, like copying the parsers somewhere and pointing directly to them. However, this is still a problem of performance which you will have to gauge yourself. A simple test is to run NMCap with your filters and watch the pending count during high traffic. If the pending count continues to grow, then you might not be able to keep up with the traffic.

Unable to Stop with Ping - There may be situations when you can't provide a ping to stop the trace. For instance if you wanted to trace a shutdown of a machine, it might be difficult to get NMCap to stop properly thus losing the frames you want to see.

SrvAny Saves the Day

Srvany and InstSrv allow for a unique way to run NMCap as a service to capture logon/logoff type traffic OR longer term monitoring across logins. Using the steps above should provide you with enough information to solve this difficult capturing scenario.

↧

Annotated Traces for Windows System Behavior

January 8, 2010, 11:01 am

Microsoft publishes protocol documentation on MSDN that is intended to make it easier for others to develop interoperable implementations. “System Documents” provide overviews of system behavior for key systems such as Active Directory, File Sharing and Windows Security. The MSDN documentation for each of the System Documents is available here. We've recently released sets of annotated network captures on the SysDoc CodePlex Site which cover a subset of scenarios for each of the System Documents.

What Kind of Behavior?

For each system component a few choice scenarios were captured and annotated. For example, File Systems have annotated traces for finding a file and configuring a server. Obviously, it would be quite an undertaking to annotate every scenario, but these annotations attempt to cover typical scenarios or a breadth of components.

What's an Annotated Trace?

Starting with Network Monitor 3.3, we can annotate a trace with comments. For more info about trace commenting please reference our blog called Frame Commenting is Here. Frame annotation provides a convenient way to describe what is happening at specific frames in a trace. Each commented frame has a # symbol next to the frame number. Clicking on a frame with comments populates the Frame Comments window in the UI. There are also ways to go to the next comment, search for a comment, and add a comment title column to the Frame Summary window.

Learning by Example

Besides helping you to understand a specific scenario, these annotated traces can be used to get a feel for how you might dissect a trace with your own scenarios. Getting oriented in a trace for an unfamiliar protocol is one of the first steps. With these annotated traces, you have some well documented examples to get you started. We hope you find them useful.

↧

Measuring Response Times

February 24, 2010, 9:38 am

It's often useful to understand how long it takes for a request to get a response. This helps you gauge how well a client or server is keeping up. This type of measurement can also be done at different layers; however, there are some tricks you'll have to learn.

FrameVariable.TimeDelta

In order to filter on the difference in time, you can use the FrameVariable.TimeDelta property. This value represents the time since the previous physical frame in the trace. One side effect of this is that you can't filter on the time delta between two filtered frames or two frames in a specific conversation. Leading to perhaps more confusion, the Time Delta column you see is updated based on the filtered information.

The following filter will find any frame with a time delta greater than 1 second.

FrameVariable.TimeDelta > 10000000

First, you'll notice that you have to express the value in 0.1 microsecond (100 nanosecond) units. In the example above, 1 second = 1,000,000 microseconds = 10,000,000 units. Second, if you view the Time Delta column, you might see some inconsistencies. The time shown there is based on the last visible frame, but the filter works on the last physical frame. So as soon as any other filter is applied, including clicking in the conversation tree, the values you see in the Time Delta column will not match a Time Delta filter you apply on top of it. Finally, responses don't always immediately follow their requests, so this method doesn't always work.
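Since the unit is easy to get wrong by a factor of ten, here is a tiny sketch of the conversion. MillisecondsToTimeDelta and TimeDeltaFilter are hypothetical helper names. The only fact taken from above is that the property counts 0.1 microsecond units.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FrameVariable.TimeDelta counts in units of 0.1 microseconds (100 ns),
// so one second is 10,000,000 units.
const std::uint64_t kUnitsPerSecond = 10000000ULL;

std::uint64_t MillisecondsToTimeDelta(std::uint64_t milliseconds)
{
    const std::uint64_t unitsPerMillisecond = kUnitsPerSecond / 1000;
    assert(milliseconds <= UINT64_MAX / unitsPerMillisecond);  // no overflow
    return milliseconds * unitsPerMillisecond;
}

// Builds the filter text for "time delta greater than N milliseconds".
std::string TimeDeltaFilter(std::uint64_t milliseconds)
{
    return "FrameVariable.TimeDelta > " +
           std::to_string(MillisecondsToTimeDelta(milliseconds));
}
```

TimeDeltaFilter(1000) produces the one second filter shown above, and TimeDeltaFilter(250) produces FrameVariable.TimeDelta > 2500000 for a quarter of a second.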

Response Times for a Specific Protocol

The fact that we can only filter TimeDelta based on the last physical frame reveals a problem if you want to determine response times for a specific protocol. To get around this problem, save a filtered version of your trace so frames you want to filter on are in your saved file. For instance if you want to see SMB response times, find a specific SMB conversation and save that out to a separate file. Then open that new capture file and use the time delta filter to find your longer response times.
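The difference between the two kinds of delta is easier to see with numbers. This sketch is purely illustrative: Frame, PhysicalDeltas, and SelectedDeltas are hypothetical names, and the selected flag stands in for whatever filter you used when saving the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {
    std::uint64_t timestamp;  // capture time in 100 ns units
    bool selected;            // true if the frame passes your filter
};

// Delta to the previous physical frame, which is what a
// FrameVariable.TimeDelta filter evaluates. The first frame gets 0.
std::vector<std::uint64_t> PhysicalDeltas(const std::vector<Frame>& trace)
{
    std::vector<std::uint64_t> deltas;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        assert(i == 0 || trace[i].timestamp >= trace[i - 1].timestamp);
        deltas.push_back(i == 0 ? 0 : trace[i].timestamp - trace[i - 1].timestamp);
    }
    return deltas;
}

// Delta to the previous selected frame, one entry per selected frame. This
// is the value you get after saving the filtered frames to their own file.
std::vector<std::uint64_t> SelectedDeltas(const std::vector<Frame>& trace)
{
    std::vector<std::uint64_t> deltas;
    bool havePrevious = false;
    std::uint64_t previous = 0;
    for (const Frame& f : trace) {
        if (!f.selected) continue;
        deltas.push_back(havePrevious ? f.timestamp - previous : 0);
        previous = f.timestamp;
        havePrevious = true;
    }
    return deltas;
}
```

Take a request at time 0, an unrelated frame at 0.9 seconds, and the response at 1.2 seconds. The physical delta of the response is only 0.3 seconds, so a one second TimeDelta filter misses it in the original trace. Once only the request and response are saved to a new file, the delta becomes 1.2 seconds and the filter finds it.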

Finding Slow Servers

Using the TimeDelta filter to find slow responding servers and services at any protocol layer is a great way to locate performance issues. Just remember to first save a filtered version of your trace based on the protocol and connection, then type in your FrameVariable.TimeDelta filter. Another great option here is using the Network Monitor API to programmatically analyze a trace for response times. A great example of this is vRTA, which I reference in this blog, though it goes beyond just response times.

↧

Expert to Decrypt TLS/SSL Traffic

March 8, 2010, 1:49 pm

One of the most popular requests we've had is to provide a way to view encrypted traffic. The new Decryption expert aims to solve this problem for TLS/SSL traffic.

Using the Decryption Expert

The purpose of encrypting data in the first place is to hide private information from a third party who has intercepted your network traffic. At first the ability to decrypt this traffic might seem like a violation of this tenet. However, in order to decrypt the traffic you will need to acquire the certificate which contains the private server key. So you can't use this to decrypt just any traffic; you'll need the private key.

After downloading and installing the expert from CodePlex, you will see an option "NmDecrypt" in the Experts menu next time you open a saved trace. Next, narrow down the traffic to the TCP conversation you want to decrypt. You can do this with a filter on the TCP port or by choosing the conversation in the tree. If you have already found an encrypted frame, you can use the Find Conversation feature to locate the conversation for you.

Now, run the expert from the main menu or by right clicking the frame. Once you open the expert you will be presented with a dialog so that you can enter the certificate, password, target output capture file, and optionally a log file. The capture file source will automatically be filled in for you.

Once you are done entering the information, hit Start and the expert will attempt to decrypt the selected conversation. If an error is reported, you can provide a log file name to get more detailed information, which can help you understand why the decryption failed.

Viewing the Resulting Trace

When NmDecrypt completes, the resulting trace is automatically opened. One advantage of creating a new capture file is that you can send it to another user. This means the owner of the private key can decrypt the trace and share the result without having to hand over the key.

The resulting trace will contain all of the original information plus new frames with a protocol header called DecryptedPayloadHeader. Thus you can find all inserted packets by applying this protocol as a filter. Of course you can also create a color filter as well if you want to easily identify them among the encrypted and inserted defragmented frames.

The Decryption expert will also insert fragmented frames, which can for the most part be ignored. These frames are created in the first pass for the expert and provide some level of transparency if you need to troubleshoot this transformation.

Finally, there may be some cases where multiple SSL messages are combined in one frame. In these cases the expert won't split them into multiple frames. While this might be possible to do, we'll leave it as an exercise for the open source community.

NmDecrypt Documentation

The documentation contains more information about using the expert, such as the encryption algorithms that are supported and typical errors you might encounter. You can access the documentation through the expert menu. We also describe how to extract the certificate for Windows machines in the appendix.

NmDecrypt is Open Source

The best part of all of this is that we've released the expert and all the source code on CodePlex. We encourage you to extend and improve this expert. In fact there are known deficiencies, (some might call them bugs :) ), that you could help to resolve. These have been listed on the issues tab in the CodePlex project. Plus there's no reason this same technique could not be extended for other encryption schemes. More info on developing your own experts is available on our CodePlex Expert Site, and feel free to view our new expert integration video on Channel 9. Please download the expert, give it a try, and enjoy!

↧

Network Monitor Parsers and the CodePlex Foundation

March 22, 2010, 7:00 am

≫ Next: Office Parsers Available

≪ Previous: Expert to Decrypt TLS/SSL Traffic

The Network Monitor Parser project is now part of the Systems Infrastructure & Integration Gallery of the CodePlex Foundation. The CodePlex Foundation will now be responsible for further development and is using the new BSD license, which is OSI approved.

From a user perspective, you can still expect frequent updates of the parsers, which are still available on our NMParsers CodePlex site. And the Network Monitor team will still include the latest version of the parsers when we ship the next version. However, our team no longer owns the parser code.

This is a milestone for the project, since now the community can direct the development through an independent Open Source foundation, with Microsoft as a participating community member. We look forward to enlarging the parser usage community.

↧

Office Parsers Available

April 5, 2010, 6:52 am

≫ Next: Network Monitor 3.4 Beta Released on Connect!

≪ Previous: Network Monitor Parsers and the CodePlex Foundation

A new set of parsers for decoding Office protocols is now available on the Download Center. These parsers represent the protocols described by the documents in the MSDN Open Specifications for Office. Simply download and run the parser package for your platform. The next time you run Network Monitor, the Office parser set will automatically be available. Keep in mind that the Office parser package depends on the latest NetworkMonitor_Parser.msi, which is available on the CodePlex site.

When you are troubleshooting Office-related network problems, these parsers can provide valuable information for interoperability and IT scenarios. And remember, if you need to disable the parser set in Network Monitor 3.3, you can change the Office parser set to "Stub" in the Options, Parsers dialog. Enjoy!

↧

Filtering on the External Name

Source                    Destination               Description
192.168.2.2               myhomesrv.homeserver.com  TCP:Flags=......S., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385604, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
myhomesrv.homeserver.com  192.168.2.2               TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434048, Ack=2533385605, Win=5840 ( Negotiated scale factor 0x0 ) = 5840
192.168.2.2               myhomesrv.homeserver.com  TCP:Flags=...A...., SrcPort=60824, DstPort=HTTP(80), PayloadLen=0, Seq=2533385605, Ack=113434049, Win=16425 (scale factor 0x2) = 65700
192.168.2.2               myhomesrv.homeserver.com  HTTP:Request, GET /
myhomesrv.homeserver.com  192.168.2.2               TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=60824, PayloadLen=0, Seq=113434049, Ack=2533386251, Win=7106 (scale factor 0x0) = 7106
myhomesrv.homeserver.com  192.168.2.2               HTTP:Response, HTTP/1.0, Status Code = 200, URL: /
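
If the "= 65700" at the end of the TCP descriptions looks mysterious, it is the effective receive window: the raw Win field shifted left by the scale factor that side advertised during the handshake. Here is a minimal Python sketch of that arithmetic, using the values from the frames above. The function name effective_window is just a name picked for this illustration; it is not part of Network Monitor or its parsers.

```python
def effective_window(win_field: int, scale_shift: int) -> int:
    """Return the receive window in bytes.

    win_field   -- the raw 16-bit Win value from the TCP header
    scale_shift -- the window scale factor the sender advertised in its SYN
    """
    return win_field << scale_shift


# Values copied from the trace. The SYN and SYN-ACK frames are left out
# because the window in a SYN segment is never scaled, which is why those
# frames show Win=8192 = 8192 and Win=5840 = 5840.
frames = [
    ("client ACK", 16425, 2),  # client advertised scale factor 0x2
    ("server ACK", 7106, 0),   # server advertised scale factor 0x0
]

for label, win, shift in frames:
    print(f"{label}: Win={win}, scale={shift} -> {effective_window(win, shift)} bytes")
```

The client ends up advertising 65700 bytes, while the server, with a scale factor of 0, advertises only the raw 7106 bytes.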
