1.3. Three different optimization methods

WAN optimization attacks the issues mentioned previously in three different ways:

1.3.1. Transport layer optimization

These days, the most common layers involved in transporting data over the network are Ethernet, IP and TCP.

Figure 1.2. A typical Ethernet frame

.---------------------------------------------------------------------.
| Ethernet | IP | TCP | Payload                                       |
'---------------------------------------------------------------------'
                      <------------------------------ 1460 bytes ----->
                <------------------------------------ 1480 bytes ----->
           <----------------------------------------- 1500 bytes ----->

  • Ethernet is the local network layer, carrying the frame to the next hop in the network. The default maximum payload of an Ethernet frame is 1500 bytes.

  • IP is the layer which takes care of the routing of the packet in the network.

  • TCP is the layer which takes care of the data exchanged between the client and the server, making sure that the data is fed to the receiving application in the same order as the sending application delivered it. It takes care of the retransmission of lost packets and of time-outs: if there is a problem, it backs off a little and allows the network to recover.

    Standard TCP has the following features and limitations:

    • The TCP Sliding Window method, which makes it possible for the sender to keep sending data until it reaches the maximum number of bytes in flight specified by the receiver. In the original TCP design this was limited to 64 kilobytes{SOURCE:RFC 793}; the sketch following Figure 1.5 shows why this limit matters on high-latency links.

      Figure 1.3. Bytes in-flight over time.

      B I |        +--+--+--+--+--+
      y n |        |  |  |  |  |  |
      t f |        |  |  |  |  |  |
      e l |     +--+  |  |  |  |  |
      s i |     |  |  |  |  |  |  |
        g |  +--|  |  |  |  |  |  |
        h |  |  |  |  |  |  |  |  |
        t +-------------------------- ---> time
                     ^
                      \__ First acknowledgement received
      

    • The original TCP acknowledgement method specifies that the receiver can only acknowledge packets it has received. It cannot tell the sender that it has missed a packet; the sender has to wait for an acknowledgement timeout and then retransmit all unacknowledged packets.

      Figure 1.4. Retransmission due to a time out in the acknowledgement

          |
          |  +--+--+--+  +--+--+         +--+--+--+
          |  |  |  |  |  |  |  |         |  |  |  |
          |  |N |N |N |  |N |N |         |N |N |N |
          |  |- |- |- |  |+ |+ |         |  |+ |+ |
          |  |3 |2 |1 |  |1 |2 |         |  |1 |2 |
          |  |  |  |  |  |  |  |         |  |  |  |
          |  |  |  |  |  |  |  |         |  |  |  |
          +----------------------------------------- ---> time
                        ^      ^         ^__ ACK timeout
                         \      \___________ TCP Window full
                          \_________________ Lost packet
      

    • The TCP Window Size is governed by a Slow Start mechanism, which starts with a window of 2 * Segment Size{SOURCE:RFC 2581} and increases it by one segment per received acknowledgement packet.

      In case of packet loss it will halve the TCP Window Size and then slowly increase the window again, by one segment per received acknowledgement packet.

      Figure 1.5. TCP Window Size over time with Slow Start and packet loss

          Window Size
          |
          |                       +--+--+                       +--+
          |                    +--+  |  |                    +--+  |
          |                 +--+  |  |  |                 +--+  |  |
          |              +--+  |  |  |  |              +--+  |  |  |
          |           +--+  |  |  |  |  |     +--+--+--+  |  |  |  |
          |  +--+--+--+  |  |  |  |  |  |     |  |  |  |  |  |  |  |
          |  |  |  |  |  |  |  |  |  |  |     |  |  |  |  |  |  |  |
          +----------------------------------------------------------- ---> time
                       ^          ^     ^    ^        ^____________ First ACK
                        \          \     \    \____________________ Retransmission
                         \          \     \________________________ Lost packet
                          \          \_____________________________ TCP Window maximum size
                           \_______________________________________ First ACK
      
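To illustrate why the number of bytes in flight matters: the sender can transmit at most one window of data per round trip, so the window size divided by the round-trip time is a hard ceiling on throughput. A minimal sketch in Python (the RTT values are illustrative, not from the text):

    # Throughput ceiling imposed by the classic 64 kilobyte TCP window:
    # at most one full window can be in flight per round trip.
    WINDOW = 64 * 1024  # bytes, the original TCP maximum window size

    for rtt_ms in (1, 10, 50, 100, 300):
        rtt_s = rtt_ms / 1000.0
        # One window per round trip, converted to megabits per second.
        throughput_mbit = WINDOW * 8 / rtt_s / 1_000_000
        print(f"RTT {rtt_ms:3d} ms -> at most {throughput_mbit:7.2f} Mbit/s")

On a LAN with a 1 millisecond RTT this ceiling is around 524 Mbit/s, but over a WAN link with a 100 millisecond RTT the same 64 kilobyte window allows no more than about 5.2 Mbit/s, no matter how much bandwidth is available.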

Transport layer optimization can add the following features for the traffic between two Steelhead appliances:

  • For Ethernet there is not much that can be improved, unless there is full control over all the WAN devices between the clients and the servers, in which case Jumbo frames can be used.

    The payload of an Ethernet Jumbo frame can be up to 9000 bytes {SOURCE: Wikipedia Jumbo Frame}. This means that instead of having to split a stream into IP payloads of 1480 bytes, it can be split into pieces of 8980 bytes, so the routers in the path have to process only about one sixth of the number of packets. On the TCP level, however, the improvement is not comparable, since the maximum number of outstanding bytes is still limited by the TCP Window size.

  • On the IP level there is no optimization possible.

  • On the TCP level there are several enhancements to improve the traffic flow:

    • In newer TCP stacks a feature called TCP Window Scaling has been introduced, which raises the 64 kilobyte window size limit by multiplying the advertised window by a power of two {SOURCE:RFC 1323}. This means that the number of bytes in-flight can be much larger (see the sketch after this list).

    • TCP Fast Retransmission {SOURCE:RFC 2581} is a method for the receiver to signal, by sending three duplicate ACKs, that a certain packet has not been received although later packets have. This reduces the time before the retransmission starts.

    • TCP Selective ACK {SOURCE:RFC 2018} is a method to inform the sender that later packets have already been received, implicitly telling it which packets have been lost by listing the TCP sequence numbers already received. This reduces both the time before the retransmission starts and the number of packets retransmitted.

    • On the sender side, TCP Timestamps {SOURCE:RFC 1323} can be used to determine the Round Trip Time (RTT) towards the receiver and thus the expected time at which the acknowledgement for a packet should arrive. This gives the sender an indication of a lost packet before the official retransmission timer expires.

    • If it is known that the quality of the network is not optimal, due to packet loss or high latency, the TCP Window might never reach its maximum size and performance will be sub-optimal. The sender can start with a large TCP Window and, on packet loss, decrease the window by only one Segment Size instead of halving it, to improve performance over such networks.

    The use of TCP Window Scaling, TCP Selective ACK and TCP Timestamps is negotiated during the setup of the TCP session.
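    On a Linux host these extensions are runtime-tunable kernel settings, so whether they are active can be checked directly. A minimal sketch, assuming the standard /proc/sys layout (the output format is illustrative):

        # Report whether the TCP extensions described above are enabled.
        from pathlib import Path

        TUNABLES = {
            "tcp_window_scaling": "TCP Window Scaling (RFC 1323)",
            "tcp_sack":           "TCP Selective ACK (RFC 2018)",
            "tcp_timestamps":     "TCP Timestamps (RFC 1323)",
        }

        for name, description in TUNABLES.items():
            value = Path("/proc/sys/net/ipv4", name).read_text().strip()
            state = "disabled" if value == "0" else "enabled"
            print(f"{description}: {state} (value {value})")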

1.3.2. Data reduction

Once the transport layer is optimized, the next step is to reduce the payload sent over it. There are two methods to reduce the size of the data: compression and dictionaries.

  • Compression is the process of encoding data to use less bytes than the original data took.

    As an example, the string AAAAAA could be compressed to 6"A"(NUL), reducing it from six bytes to three. The string ABABAB could be compressed to 3"AB"(NUL), reducing it from six bytes to four. And the string ABCDEF can only be "compressed" to 1"ABCDEF"(NUL), increasing it from six bytes to eight. (A toy encoder in this style is sketched at the end of this section.)

    Certain data, like HTML pages and source code files, compresses very well. Other data cannot be compressed because it contains too little repetition or is compressed already.

  • Dictionaries are lists of patterns that are known to both Steelhead appliances, so that instead of having to send a long pattern, the sender can send a reference to that data. For example, if the sender sees the pattern AAAAABBBBB and both sides have references for AAAAA and BBBBB in their dictionaries, the sender can send just the labels of these references; the receiver looks them up in its dictionary and forwards the referenced data on the wire.

    These dictionaries are learnt over time by looking at the traffic going through the Steelhead appliances. Patterns which are used often end up at the front of the dictionary. When the dictionary is full, patterns which were used only once or haven't been used for a long time are removed, creating space to store newly learned patterns.

    When a data stream contains patterns which have never been seen before, the transfer is considered a cold transfer. When a data stream contains patterns which have been seen before, the transfer is considered a warm transfer.

    Unlike compression, which performs poorly on already compressed data, dictionaries do not mind that the data is compressed. However, this only works if the data is an object retrieved from a remote server; it doesn't work if the data stream is compressed interactive traffic.

    Storing patterns from encrypted data streams in the dictionary is also a bad idea, because by definition encrypted data is unique. For example, the first time the string AAAAA is encrypted it might show up as ABCDE, the second time as AEDBC, and the time after that as BEBAC. Instead of learning a repetitive pattern, the dictionary gets polluted with patterns which will never be seen again.

  • Encryption integration

    As mentioned previously, patterns of encrypted data streams should not be stored in the dictionary. In trusted environments, WAN optimizers might be able to act as a proxy and perform decryption and re-encryption on behalf of the sender. This way they can decrypt the data, optimize it over the WAN and encrypt it again towards the receiver.
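To make the two data-reduction methods concrete, here is a toy sketch in Python. The run-length encoder mirrors the count/pattern/NUL notation of the examples above (handling single-byte runs only), and the Dictionary class replaces chunks both sides have seen before by a short label; the 8-byte SHA-1 labels and the chunk handling are illustrative simplifications, not the appliances' actual algorithm:

    import hashlib

    def rle_compress(data: bytes) -> bytes:
        """Run-length encode single-byte runs: b'AAAAAA' -> b'\x06A\x00'."""
        out = bytearray()
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 255:
                run += 1
            out += bytes([run]) + data[i:i + 1] + b"\x00"
            i += run
        return bytes(out)

    class Dictionary:
        """Replace chunks both sides have seen before by a short label."""

        def __init__(self):
            self.patterns = {}

        def encode(self, chunk: bytes):
            label = hashlib.sha1(chunk).digest()[:8]
            if label in self.patterns:
                return ("ref", label)        # warm transfer: send the label only
            self.patterns[label] = chunk     # cold transfer: learn the pattern
            return ("raw", chunk)

        def decode(self, kind: str, payload: bytes) -> bytes:
            if kind == "ref":
                return self.patterns[payload]
            # Learn the new pattern exactly as the sending side did,
            # so that both dictionaries stay synchronised.
            self.patterns[hashlib.sha1(payload).digest()[:8]] = payload
            return payload

    print(rle_compress(b"AAAAAA"))   # b'\x06A\x00': six bytes become three

A real implementation would segment the stream, age out rarely used patterns when the store is full and persist the dictionary to disk; the sketch only shows the warm/cold distinction.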

1.3.3. Application latency optimization

Application latency optimization improves the performance of the protocol on top of the transport layer. The gain comes when the protocol is predictable or chatty, meaning that many small requests and responses are exchanged. This is fine with the near-zero delay to a server on the local LAN, but very expensive over the WAN. To be able to do latency optimization, a deep understanding of the protocol and the client behaviour is required.

Transaction prediction is part of latency optimization. It is a technique to predict the client's next request based on past and current requests. For example, on a network file system, if a client opens a file and asks for the first block, there is a huge chance it will ask for the second and third blocks too. If, however, the client opens a file, seeks to a random position, reads 16 bytes, seeks to another position, reads 32 bytes, seeks to yet another position and reads 24 bytes, that behaviour is not predictable.
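The distinction can be sketched in a few lines of Python. The ReadPredictor below is hypothetical (not the appliances' actual algorithm): it observes block reads, recognises a sequential pattern and suggests blocks to prefetch, while random seeks yield no prediction:

    class ReadPredictor:
        """Suggest blocks to prefetch when the client reads sequentially."""

        def __init__(self, lookahead: int = 2):
            self.last_block = None
            self.lookahead = lookahead

        def observe(self, block: int):
            sequential = self.last_block is not None and block == self.last_block + 1
            self.last_block = block
            if sequential:
                # Sequential access: fetch the next blocks before they are asked for.
                return [block + i for i in range(1, self.lookahead + 1)]
            return []  # random access: nothing sensible to predict

    predictor = ReadPredictor()
    for b in (0, 1, 2, 57, 58):
        print(f"read block {b}: prefetch {predictor.observe(b)}")

The sequential reads of blocks 1, 2 and 58 trigger prefetching; the random jump to block 57 does not.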

Various latency optimization techniques include:

  • Towards network file systems; the basic operations towards remote file systems include the reading and writing of files and scanning through directories.

    • Read-ahead: When a client opens a file and reads the first block, there is a huge chance that it will ask for the second and third blocks afterwards. The latency optimization sees the request for the first block but asks the server for the first, second and third blocks, and returns the first block to the client. When the client asks for the second block, the latency optimization already has it available and the request doesn't have to go over the WAN anymore.

    • Write-behind: When the client creates a new file and writes the first block to it, the local latency optimization can tell the client that the block was written, after which the client can prepare and start to write the next block. In the meantime the latency optimization sends the data to the server.

    • Directory caching: When a client asks for the first entry in a directory on a server, there is a huge chance it will also ask for the next directory entries. The latency optimization asks the server for all directory entries and sends them to the client-side latency optimization, which can then answer the consecutive requests from the client without having to send them to the remote server.

    • Pre-population: This is a technique in which new files on a network file system are transferred over the WAN in advance, so that the Steelhead appliances already know the patterns and can deliver the files in a warm transfer when the user asks for them.

    Data integrity is very important in network file systems: the client should not get the impression that the data has been written to disk while it is still in transit. Several commands should therefore not be handled by the local latency optimization, for example the open and close commands: it is the server which says Yes or No to the success of such a command.

  • Towards mail servers; the basic operations towards mail servers are to open a mailbox, retrieve the contents of a message, retrieve the attachments of a message, perform general mailbox maintenance, move messages between folders and deliver new messages.

    • Attachment warming: When the mail client polls for new email and retrieves a list of new messages, the latency optimization can check the contents to see if any of these messages carry attachments. If so, the latency optimization can download the attachments so that their patterns become known in the dictionary. When the user later reads the message and wants to see the attachment, the patterns are already in the dictionary and the attachment is retrieved in a warm transfer.

    • Attachment write behind: This works just like the write behind feature of the network file systems.

  • Towards web servers; Web servers receive a request and return an object. This object can contain references to other objects, which in turn need to be requested.

    • Caching of objects: If the returned object is a static object which is valid for a long time, for example an audio file or an image, the latency optimization uses the metadata of the object to check how long it remains valid. If another client asks for the same object while it is still valid, the latency optimization can serve it to the client without having to fetch it from the server again (a sketch follows at the end of this section).

    • Prefetching of static objects: When the requested object can contain references, the latency optimization can parse the object and retrieve the referenced objects before the client asks for them.
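As a final illustration, object caching can be sketched as a small wrapper around whatever function fetches objects from the origin server. The fetch callback and the way max-age is obtained are assumptions for the sketch; real HTTP caching involves many more rules (validation, Vary headers and so on):

    import time

    class ObjectCache:
        """Serve repeat requests for still-valid static objects locally."""

        def __init__(self):
            self.entries = {}  # url -> (expiry timestamp, object body)

        def get(self, url, fetch):
            entry = self.entries.get(url)
            if entry and entry[0] > time.time():
                return entry[1]  # still valid: served without a WAN round trip
            # Expired or unknown: fetch from the origin server. fetch() is
            # assumed to return the body and the validity in seconds, e.g.
            # parsed from the Cache-Control header of the response.
            body, max_age = fetch(url)
            self.entries[url] = (time.time() + max_age, body)
            return body

The second client asking for the same object within its validity period is served from the cache, which is exactly the behaviour described under Caching of objects above.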