7.5. Slow network

This section looks at why the network can be slow and what the possible causes are.

7.5.1. Is the TCP session optimized?

In the list of Current Connections, the TCP session should be shown as optimized, not as pass-through.

If it is pass-through, find out why. The Current Connections report, available in the GUI under Reports -> Networking -> Current Connections or from the CLI with the command show connections all, shows whether the TCP session is in pass-through for an intentional reason (an in-path rule prevents the TCP session from being optimized) or for an unintentional reason:

  • SYN on WAN side: The naked SYN packet arrived on the WAN side of the device. This is normal on data-center Steelhead appliances for traffic which will terminate on the hosts in the data-center. If this TCP session is expected to be optimized, then check what is happening with the naked SYN and SYN+ packets during the auto-discovery phase.

  • No Peer: The auto-discovery probe in the SYN+ did not get intercepted by another Steelhead appliance. If this TCP session is expected to be optimized, then check what is happening with the SYN+ packet during the auto-discovery phase.

  • Pre-existing connection: The first packet of this TCP session seen by the optimization service was not part of the TCP handshake at the beginning of the stream but a packet from the middle of the stream. This can happen when the optimization service has been restarted. To resolve this, either reset the TCP session from the GUI or CLI, or restart the application on the client.

  • Asymmetric Routing: The source and destination IP addresses are in the asymmetric routing table.

A full list of reason codes can be found in KB article S15377 - "What are the Reason Codes for pass-through connections in the Current Connections report?" available from the Riverbed Support website.
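The reasons above can be turned into a small triage lookup. This is a minimal sketch, not a Riverbed tool: the NEXT_STEP table and the triage() function are hypothetical, and the keys assume the reason text reads exactly as in the list above, which may differ between RiOS versions. Anything not in the table falls back to the KB article.

```python
# Hypothetical triage helper: maps the pass-through reasons described above to
# the next troubleshooting step. The exact reason strings shown in the Current
# Connections report may differ per RiOS version; adjust the keys to match.
NEXT_STEP = {
    "SYN on WAN side": "Normal on a data-center Steelhead for traffic to local hosts; "
                       "otherwise trace the naked SYN and SYN+ during auto-discovery.",
    "No Peer": "Trace the SYN+ packet: no other Steelhead intercepted the probe.",
    "Pre-existing connection": "Reset the TCP session or restart the client application.",
    "Asymmetric Routing": "Check the asymmetric routing table for this IP address pair.",
}


def triage(reason: str) -> str:
    """Return the suggested next step for a pass-through reason."""
    return NEXT_STEP.get(
        reason.strip(),
        "Look up the reason code in KB article S15377.",
    )


print(triage("No Peer"))
```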

7.5.2. Is the Latency Optimization working?

In the list of Current Connections in the GUI, the TCP session is marked with a red triangle if latency optimization has failed and the optimization has fallen back to bandwidth reduction only.

The following issues can occur:

  • A CIFS session negotiates SMBv2 and the Steelhead appliance is not able to perform latency optimization on this traffic (SMBv2 latency optimization is only available since RiOS version 6.5). A workaround is to let the Steelhead appliance negotiate the session back to SMB version 1.

  • A CIFS session uses SMB Signing or a MAPI session uses encrypted MAPI, and the server-side Steelhead appliance isn't configured to support this kind of data.

  • A CIFS session uses SMB Signing or a MAPI session uses encrypted MAPI, and the server-side Steelhead appliance isn't able to impersonate the user against the AD infrastructure. This issue is logged on the server-side Steelhead appliance and can be related to:

    • Inability to determine the Domain Controllers via DNS or to reach a certain Domain Controller.

    • If the Steelhead appliance is configured to use one specific Domain Controller, that Domain Controller may have been decommissioned or may have out-of-sync data.

    • The Steelhead appliance object in Active Directory has disappeared.

    • The delegate user in Active Directory has been locked out or has had its password changed.

    • The delegate user does not exist in Active Directory.

    • The CIFS server is missing from the delegation list of the delegate user in Active Directory.

    • Protocol issues between the Steelhead appliance and a Domain Controller (for example, delegation to Windows 2008R2 servers was not supported until RiOS version 6.1).

    • Time skew between the server-side Steelhead appliance and the Domain Controllers.

  • A signed CIFS session uses a system account to map a drive. Since no user account is involved, the optimization service cannot impersonate a user on this CIFS session and the server gets blacklisted for latency optimization. This can be overcome by using Kerberos authentication. Otherwise, a workaround to limit the side-effect of blacklisting is to use a dedicated CIFS server for shares mapped by computer accounts, so that the blacklisting of this server doesn't affect the optimization of other CIFS shares.

  • A MAPI session uses a MAPI dialect which is not supported for latency optimization. Examples are:

    • Blackberry related traffic.

    • Exchange to Exchange traffic.

  • An HTTP session uses methods other than GET and PUT. An example of this is the DAV file system, which is layered on top of HTTP and is not supported until RiOS 8.5.
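For the time skew item, a quick check is to compare the clock of the server-side Steelhead appliance with that of a Domain Controller. The sketch below assumes the Active Directory default Kerberos clock tolerance of five minutes; the actual value comes from your domain policy, and the timestamps in the example are hypothetical readings, not data from a real system.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Active Directory default Kerberos clock tolerance of 5 minutes.
# Your domain policy may use a different value.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(steelhead_time: datetime, dc_time: datetime) -> timedelta:
    """Absolute difference between the two clocks (both timezone-aware)."""
    return abs(steelhead_time - dc_time)


def skew_ok(steelhead_time: datetime, dc_time: datetime,
            max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the clocks are close enough for Kerberos to accept tickets."""
    return clock_skew(steelhead_time, dc_time) <= max_skew


# Hypothetical readings from the Steelhead appliance and a Domain Controller.
sh = datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
dc = datetime(2024, 1, 15, 10, 6, 30, tzinfo=timezone.utc)
print(clock_skew(sh, dc), skew_ok(sh, dc))  # 0:06:30 False
```

Using timezone-aware timestamps avoids false alarms when the two devices report their time in different time zones.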

7.5.3. Is the slowness on the network?

At this point, the TCP session is optimized and latency optimization is working. The next step is to determine which operations are slow, which are fast, and what the Steelhead appliance complains about.

7.5.3.1. CIFS related issues

The questions to ask are:

  • Is the slowness happening when an application reads the file? This could be a file server issue, such as file locking or authentication / authorization.

  • Is the slowness happening while copying from the file server onto the computer? The expected behaviour here is a fast copy, since no locking takes place.

  • Is the slowness happening during a write operation by an application, and also on a re-save without any changes? The first save could be slow; the second one is expected to be fast.

  • Is the slowness happening during a write operation from the computer to the file server? That should be fast if the file has not changed since it was copied from the file server.

7.5.3.1.1. Response times

For CIFS latency optimization, we expect fast local acknowledgments on the SMB requests like closing files, disconnecting from a share and obtaining a directory listing.

In Wireshark you can see this under Statistics -> Service Response Times -> SMB.

Figure 7.14. Latency optimized SMB requests response times overview

=================================================================
SMB SRT Statistics:
Filter:
Commands                   Calls    Min SRT    Max SRT    Avg SRT
Close                          1   0.000212   0.000212   0.000212
Open AndX                      1   0.134227   0.134227   0.134227
Read AndX                     18   0.354067   0.887230   0.646874
Tree Disconnect                1   0.000169   0.000169   0.000169
Negotiate Protocol             1   0.133985   0.133985   0.133985
Session Setup AndX             2   0.133340   0.136641   0.134991
Tree Connect AndX              2   0.132323   0.134432   0.133378
Query Information Disk         1   0.222203   0.222203   0.222203

Transaction2 Commands      Calls    Min SRT    Max SRT    Avg SRT
FIND_FIRST2                    2   0.287443   0.560205   0.423824
FIND_NEXT2                     4   0.000094   0.009760   0.004776
QUERY_FILE_INFO                1   0.134086   0.134086   0.134086
GET_DFS_REFERRAL               1   0.133184   0.133184   0.133184

NT Transaction Commands    Calls    Min SRT    Max SRT    Avg SRT
=================================================================

If the response times of Close, Tree Disconnect and FIND_NEXT2 are not significantly lower than those of requests which have to cross the WAN, such as Negotiate Protocol, the CIFS latency optimization might be disabled or not working.
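This check can be automated on the text of the SMB SRT table, which tshark can also print with tshark -q -z smb,srt -r capture.pcap (the capture file name is a placeholder). The sketch below is hypothetical: it assumes that Negotiate Protocol is answered by the server across the WAN, so its response time approximates the round-trip time, and that a locally acknowledged request should take less than 10% of that. Both assumptions are tunable parameters.

```python
import re

# One data row of the SMB SRT table: name, Calls, Min SRT, Max SRT, Avg SRT.
ROW = re.compile(r"^(\S+(?: \S+)*?)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$")


def parse_srt(text: str) -> dict[str, float]:
    """Map each SMB command name to its average service response time (s)."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(5))
    return stats


def not_locally_acked(stats: dict[str, float],
                      commands=("Close", "Tree Disconnect", "FIND_NEXT2"),
                      baseline="Negotiate Protocol",
                      fraction=0.1) -> list[str]:
    """Commands whose average SRT is not well below the WAN round trip.

    Assumption: the baseline command crosses the WAN, so its SRT is roughly
    the round-trip time; locally acknowledged requests should stay below
    `fraction` of it.
    """
    rtt = stats[baseline]
    return [c for c in commands if c in stats and stats[c] >= fraction * rtt]


SAMPLE = """\
Close                          1   0.000212   0.000212   0.000212
Tree Disconnect                1   0.000169   0.000169   0.000169
Negotiate Protocol             1   0.133985   0.133985   0.133985
FIND_NEXT2                     4   0.000094   0.009760   0.004776
"""
print(not_locally_acked(parse_srt(SAMPLE)))  # []
```

An empty list means all three request types were answered well within the round-trip time, which is what Figure 7.14 shows.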

If only the response times of the write requests are not significantly lower, the optimization service might have detected that the remote disk is nearly full and turned off its Write Behind feature for that CIFS session.

Figure 7.15. CIFS Write Behind has been disabled by the optimization service.

CSH sport[14140]: [smbcfe.NOTICE] 679459 {10.0.1.1:3346 192.168.1.1:445} Disk usage reached threshold (2048 MB free) on share \FOO\BAR, disabling write/setfile pre-acking
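To find which shares are affected, the log can be searched for this notice. This is a minimal sketch which assumes the message is logged on a single line with the wording shown in Figure 7.15; other RiOS versions may phrase it differently, so treat the pattern as a starting point.

```python
import re

# Pattern based on the single example notice in Figure 7.15 (assumed wording).
WRITE_BEHIND_OFF = re.compile(
    r"\{(?P<client>\S+) (?P<server>\S+)\} Disk usage reached threshold "
    r"\((?P<free_mb>\d+) MB free\) on share (?P<share>[^,]+), "
    r"disabling write/setfile pre-acking"
)


def write_behind_disabled(log_lines):
    """Yield (client, server, share, free MB) for every matching log line."""
    for line in log_lines:
        match = WRITE_BEHIND_OFF.search(line)
        if match:
            yield (match["client"], match["server"], match["share"],
                   int(match["free_mb"]))


log = [r"CSH sport[14140]: [smbcfe.NOTICE] 679459 {10.0.1.1:3346 192.168.1.1:445} "
       r"Disk usage reached threshold (2048 MB free) on share \FOO\BAR, "
       r"disabling write/setfile pre-acking"]
for hit in write_behind_disabled(log):
    print(hit)
```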

7.5.3.1.2. File formats

When an application saves a file, the file format it uses has a large influence on how well the transmission of the file can be optimized.

For example, when the file format is uncompressed, a small change in the content only affects a small part of the file.

Take these two text files, which are nearly identical: a single line has been changed, and diff reports only that line. After compression, the files are not only different in size, but, as the output of bsdiff (a binary patch generator) shows, their contents are also so different that the generated patch is larger than either compressed file.

Figure 7.16. Comparison of two binary files

[~/size] edwin@t43>ls -al gcc.1 gcc.2
-rw-r--r--  1 edwin  edwin  584775 Oct 21 16:58 gcc.1
-rw-r--r--  1 edwin  edwin  584775 Oct 21 16:58 gcc.2
[~/size] edwin@t43>wc gcc.1
   13118   79291  584775 gcc.1
[~/size] edwin@t43>diff gcc.1 gcc.2
134c134
< XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
---
> gcc \- GNU project C and C++ compiler

[~/size] edwin@t43>zip gcc.1.zip gcc.1; zip gcc.2.zip gcc.2
updating: gcc.1 (deflated 72%)
updating: gcc.2 (deflated 72%)
[~/size] edwin@t43>bsdiff gcc.1.zip gcc.2.zip bsdiff
[~/size] edwin@t43>ls -al gcc.1.zip gcc.2.zip bsdiff 
-rw-r--r--  1 edwin  edwin  163096 Oct 24 07:32 bsdiff
-rw-r--r--  1 edwin  edwin  162215 Oct 24 07:22 gcc.1.zip
-rw-r--r--  1 edwin  edwin  162228 Oct 24 07:22 gcc.2.zip

Because of this huge change, saving the compressed file through the Steelhead appliances will perform like an initial cold transfer.
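The effect can be reproduced with a toy data-reduction model. This sketch splits data into fixed 4 KB chunks and counts how many chunks of the new version were never seen in the old version. The fixed-size chunking, the chunk size and the generated text are all assumptions for illustration; the Steelhead appliance's own data reduction works differently. The uncompressed pair differs in exactly one chunk, while the zlib-compressed pair usually shows a large share of new chunks.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # chunk size of this toy model (not the appliance's segmentation)


def make_text(line_134: str, n_lines: int = 4000) -> bytes:
    """Deterministic text of 64-byte lines; only line 134 varies."""
    rng = random.Random(42)  # same seed, so all other lines are identical
    words = ["compiler", "option", "warning", "linker", "object", "source"]
    lines = []
    for i in range(n_lines):
        body = f"{i:06d} " + " ".join(rng.choice(words) for _ in range(8))
        text = line_134 if i == 133 else body
        lines.append(text[:63].ljust(63) + "\n")
    return "".join(lines).encode()


def chunk_hashes(data: bytes) -> list[bytes]:
    """SHA-256 digest of every fixed-size chunk."""
    return [hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)]


def new_chunks(old: bytes, new: bytes) -> tuple[int, int]:
    """(chunks of `new` not present in `old`, total chunks of `new`)."""
    seen = set(chunk_hashes(old))
    hashes = chunk_hashes(new)
    return sum(h not in seen for h in hashes), len(hashes)


old = make_text("X" * 37)
new = make_text("gcc \\- GNU project C and C++ compiler")
print("plain:     ", new_chunks(old, new))  # (1, 63)
print("compressed:", new_chunks(zlib.compress(old), zlib.compress(new)))
```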

7.5.3.2. Database connections

Communication with a database can be done in two ways:

  • The client sends a request and asks for the complete data set. The result is a large blob of data on which data reduction and the speed improvement will be high.

  • The client sends a request and asks for the first record, then the next record, then the next record, and so on. Because of the constant exchange of single-packet questions and answers with small payloads, the optimization gain will be very small. To make things worse, optimization features such as Neural Framing can become a major delay for each packet.

    The best way to investigate whether this stream can be optimized is:

    • Create an auto-discovery in-path rule which targets the traffic towards the IP address and TCP port of the database server. Perform the query and measure how long it takes.

    • Alter the in-path rule to disable Neural Framing. Perform the query and measure how long it takes.

    • Alter the in-path rule to become a pass-through rule. Perform the query and measure how long it takes.

    Use whichever method performs best.
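The three measurements above can be taken with a small timing harness. In this sketch, run_query is a hypothetical callable which executes the slow query through your database client; the in-path rule is changed by hand between runs. Taking the median of several runs reduces the influence of a single outlier. The numbers passed to best_rule() in the usage line are placeholders, not measurements.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], None], repeats: int = 5) -> float:
    """Run the query several times and return the median wall-clock time (s)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def best_rule(results: dict[str, float]) -> str:
    """Name of the in-path rule variant with the lowest median time."""
    return min(results, key=results.get)


# Placeholder values: replace with median_runtime(run_query) per rule variant.
print(best_rule({"auto-discovery": 3.0, "no Neural Framing": 1.5, "pass-through": 2.0}))
```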

The best way forward, however, is to change the client application to fetch the data in one large batch instead of performing a large number of small queries.
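A back-of-the-envelope calculation shows why the batch method wins. In this simplified model, every request/response pair costs one round trip and the payload has to be serialised once over the WAN; data reduction, Neural Framing and TCP windowing are ignored. The record count, record size, RTT and bandwidth are hypothetical values chosen for illustration.

```python
def transfer_time(round_trips: int, rtt_s: float,
                  payload_bytes: int, bandwidth_bps: float) -> float:
    """Lower-bound time: one RTT per round trip plus serialisation time."""
    return round_trips * rtt_s + payload_bytes * 8 / bandwidth_bps


records, record_size = 10_000, 200   # hypothetical query result
rtt, bandwidth = 0.100, 2_000_000    # hypothetical 100 ms, 2 Mbit/s WAN

chatty = transfer_time(records, rtt, records * record_size, bandwidth)
batch = transfer_time(1, rtt, records * record_size, bandwidth)
print(f"record-by-record: {chatty:.1f} s, one batch: {batch:.1f} s")
# record-by-record: 1008.0 s, one batch: 8.1 s
```

In the record-by-record case the round trips dominate, which is why neither data reduction nor a faster link helps much there.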