11   End-to-end performance

This project investigates theoretical and practical aspects of end-to-end performance to provide high throughput and other qualitative communication characteristics required by applications communicating over wide-area high-speed networks.

Our results are presented on the project web pages. These include all published papers, technical reports, presented talks, experimental data and developed software. This chapter gives an overview of selected project results from 2003 and describes some interesting technical problems.

11.1   Transferring large data volumes over large-scale high-speed networks

The Internet has been a large-scale network spanning long distances almost since its origin. However, two new characteristics have changed the Internet only recently. First, it has become a truly high-speed network with backbone links operating at 10 Gbps or even higher speeds. Second, researchers in fields such as physics or astronomy have started to transfer large volumes of data, from terabytes to petabytes.

Now that all three characteristics (long distances, high speeds and large data volumes) have come together, it has become clear that the communication protocols used so far, particularly the reliable TCP transport protocol carrying over 95 % of Internet traffic, as well as the data processing mechanisms on connected computers, no longer suffice to provide the required communication qualities. They need considerable improvement in order to achieve high throughput and other qualitative characteristics, such as low delay fluctuation, required by current applications. These issues usually fall within the field of end-to-end performance.

In 2003, several papers on the transfer of large data volumes in large-scale networks were presented at both domestic and international conferences. Some interesting technical details are presented here.

11.2   End station configuration

We have found that at speeds approaching some 100-300 Mbps, most performance problems result from suboptimal configuration of end stations. At higher speeds above some 300 Mbps, modifying the characteristics of communication protocols is usually necessary. In this section, we shall mention the most important end station configuration details which influence the throughput achievable.

11.2.1   Socket buffers

Socket buffers on both sides of a connection (sender and receiver) limit the TCP protocol window of outstanding data (which must fit in the smaller of these two buffers) and are therefore a critical factor influencing the achievable throughput. The window size limits the volume of data that can be transferred during one RTT (round trip time) interval. The default size of socket buffers in most operating systems ranges from 16 kB to 64 kB. Such a small window combined with an RTT on the order of tens or hundreds of milliseconds, which is common in long-distance communication, limits the throughput to several Mbps or several tens of Mbps regardless of the bandwidth available in the network.
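The window limit can be made concrete with a short calculation. The following Python sketch assumes the simple model in which at most one window of data is delivered per RTT (slow start, losses and header overhead are ignored); the function names and sample values are illustrative only.

```python
def max_throughput_mbps(window_bytes, rtt_s):
    """Upper bound on throughput when one window is sent per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


def required_window_bytes(rate_mbps, rtt_s):
    """Bandwidth-delay product: window needed to sustain rate_mbps."""
    return rate_mbps * 1e6 * rtt_s / 8


# A 64 kB buffer on a path with 100 ms RTT (assumed values)
print(round(max_throughput_mbps(64 * 1024, 0.100), 2), "Mbps")
# Window needed to fill 1 Gbps at the same RTT, in MB
print(required_window_bytes(1000, 0.100) / 1e6, "MB")
```

With these assumed values the 64 kB window allows only about 5 Mbps, while filling a 1 Gbps path would need a window of roughly 12.5 MB.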

Socket buffer sizes can be adjusted either for all new connections opened in an operating system, or individually for each socket opened within an application. Some operating systems, such as Linux, provide a sort of autoconfiguration and window moderation which adjusts the buffer size according to current requirements and available memory. For example, in the Linux operating system, one can use the following command to set default sender and receiver socket buffer sizes to 2 MB for all new connections:

sysctl -w net.ipv4.tcp_rmem="4096 2097152 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 2097152 16777216"

An example of socket buffer size influencing the throughput achievable between the CESNET2 (CZ) and UNINETT (NO) networks is shown in Figure 11.1. However, there are more details involved. Linux further modifies the requested buffer sizes according to the values of some kernel variables, resulting in a TCP window limit different from the specified socket buffer size. Linux also includes several other TCP implementation specifics influencing performance. We described some of them in [UbC03] and [UbC03a]. A more detailed technical report describing further Linux internals that are useful to understand for end-host performance tuning is being prepared.
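Buffers can also be requested per socket from within an application. The sketch below uses the standard SO_SNDBUF and SO_RCVBUF socket options and then reads back what the operating system actually granted; the 2 MB request mirrors the example above and is otherwise an arbitrary assumption. On Linux the value read back typically differs from the request, because the kernel doubles it for bookkeeping overhead and caps it at a system-wide maximum.

```python
import socket

REQUESTED = 2 * 1024 * 1024  # 2 MB, assumed to match the system-wide example

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Set the buffers before connect() or listen() so that they can take
# effect for the connection (including window scaling).
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)

# Read back the sizes the operating system actually applied
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested:", REQUESTED, "send:", granted_snd, "receive:", granted_rcv)
sock.close()
```

Comparing the requested and granted values is a quick way to notice that a system-wide limit is overriding the application setting.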

[Figure]

Figure 11.1: Relation between the achievable throughput and socket buffer size

[Figure]

Figure 11.2: RTT fluctuations during one TCP connection

Unfortunately, we can also get into trouble by setting the socket buffers too large and allowing the window to grow too much. Big windows can fill up router queues, which together with traffic fluctuations increases the probability of a queue overflow and packet loss. As a result, congestion control will react by reducing the data sending rate. We can try to predict this phenomenon by observing the relation between current throughput and window size, or by monitoring the RTT fluctuations. For example, we can see in Figure 11.2 that the RTT of a monitored connection reached up to several multiples of the basic RTT measured on an unloaded network, which was about 40 ms. At some point, a packet was lost and throughput was reduced. Consequently, the RTT stabilised again. However, it turns out that in highly-multiplexed backbone circuits with complex traffic dynamics, determining conclusively that the RTT growth and fluctuations were really caused by filling up the router queues is very difficult.
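A simple way to look for this effect in measured data is to compare each RTT sample with the base RTT of the unloaded path. The following sketch is only an illustration: the sample values are made up, the threshold factor of 2 is an arbitrary assumption, and, as noted above, an inflated RTT does not prove that router queues are the cause.

```python
def queueing_delays(rtt_samples_ms, base_rtt_ms):
    """Part of each RTT sample that exceeds the base RTT."""
    return [max(0.0, rtt - base_rtt_ms) for rtt in rtt_samples_ms]


def inflated_samples(rtt_samples_ms, base_rtt_ms, factor=2.0):
    """Indices of samples whose RTT exceeds factor * base RTT."""
    return [i for i, rtt in enumerate(rtt_samples_ms)
            if rtt > factor * base_rtt_ms]


samples = [41.0, 43.5, 60.2, 95.0, 130.4, 42.1]  # made-up values in ms
print(queueing_delays(samples, base_rtt_ms=40.0))
print(inflated_samples(samples, base_rtt_ms=40.0))
```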

An important Linux networking component that requires proper configuration is the network adapter transmission queue (txqueue). Each network adapter has its own txqueue. Packets from all connections transmitting through a network adapter enter its txqueue before they are moved to the adapter and sent to the network. We discussed the txqueue behaviour in more detail in [UbC03a]. As a rule of thumb, the ifconfig command can be used to set the txqueue to 1000 packets for a Gigabit Ethernet adapter.

11.2.2   Application tuning

Throughput can also be limited by the application. For example, we noticed low throughput while copying files over a network using the well-known scp utility. Socket buffers, txqueue and other networking components in the operating system were configured properly. Processor load on both end stations was low.

We found that the problem was caused by the way the ssh protocol (used by the scp utility) handles its data: the data is split into 32 kB blocks which are acknowledged at the application level, and the default maximum number of outstanding blocks is four. Thus, the ssh protocol creates its own application window with a default maximum size of 128 kB above the TCP window. The size of this window can be set in the source code of the ssh distribution. The influence of an increased ssh window on the throughput is shown in Figure 11.3. Of course, increasing the data rate to be ciphered and deciphered increases the processor load as well.
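The effect of the application window can be estimated in the same way as for the TCP window: the amount of data in flight is bounded by the smaller of the two. The sketch below uses the 32 kB block size from the text; the 2 MB TCP window, the 40 ms RTT and the larger block counts are assumed values chosen only for illustration.

```python
BLOCK_BYTES = 32 * 1024  # ssh block size given in the text


def effective_window(outstanding_blocks, tcp_window_bytes):
    """Data in flight is bounded by the smaller of the two windows."""
    return min(outstanding_blocks * BLOCK_BYTES, tcp_window_bytes)


def throughput_mbps(window_bytes, rtt_s):
    """Upper bound on throughput for one window per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


tcp_window = 2 * 1024 * 1024  # assumed 2 MB TCP window
rtt = 0.040                   # assumed 40 ms RTT
for blocks in (4, 16, 64):
    window = effective_window(blocks, tcp_window)
    print(blocks, window, round(throughput_mbps(window, rtt), 1))
```

Under these assumptions the default of four outstanding blocks caps the transfer at roughly 26 Mbps no matter how large the TCP window is.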

[Figure]

Figure 11.3: Relation between the throughput and internal ssh protocol window

11.3   PERT

PERT (Performance Enhancement and Response Team) is an emerging international initiative which attempts to create a technical and organisational framework to help users resolve performance problems of their networking applications. To some extent, PERT should enhance performance just like CERT improves security.

CESNET takes an active role in PERT preparation, presently within the TF-NGN Geant activity. In the second half of 2004, PERT should become a part of the proposed GN2 project. Our experience with PERT preparation has become a part of the D8.1 deliverable "Multi-domain monitoring and PERT" of the GN2 project.

We identified two groups of people interested in the PERT activities and willing to become pilot PERT users. The first group are the GRID researchers (particularly people from the Masaryk University); the other group are people taking care of the streaming video data transfer over the Internet.

We started to build the PERT web pages which should include three parts:

We proposed a structure for performance problem descriptions and created a trial database using MySQL and PHP4 scripts. After gaining more experience and considering the requirements, we concluded that a new database version based on the Request Tracker (RT) will be needed. We identified the following requirements and motivations leading to our decision to use the RT:

We propose that during escalation, each case will be investigated first to determine the likely problem area and subsequently forwarded to the person responsible for resolving problems in that area. The following potential problem areas have been identified:

Perhaps the most difficult task will be finding and training the right "front-line" people accepting cases and identifying problem areas, as well as people responsible for individual problem areas.

Another task critically important for the success of PERT is the availability of a good performance monitoring system. Requirements for such a system are currently being specified based on experience from many individual performance measurements. The system should also be developed in the GN2 project framework.

11.4   Data link bandwidth estimation

Available bandwidth along a certain network path, i.e. the part of the installed bandwidth not currently used by existing traffic, is a very important dynamic network characteristic. It suggests what throughput can be expected for additional applications, whether any network segment is overloaded or failing, or whether a network upgrade may be necessary.

Available bandwidth measurement tools, such as iperf, try to completely fill all remaining bandwidth by sending data as fast as possible and measuring the achieved throughput. Obviously, this method affects the existing traffic significantly and may be used only for a short time.

In contrast, the free capacity estimation tools send only several carefully scheduled packets and try to estimate the bandwidth available by analysing the sending and receiving times of testing packets.

11.4.1   Classification of bandwidth estimation tools

As the prospect of estimating the available bandwidth without stressing the existing traffic appears attractive, we have decided to investigate if these tools can also be used in large high-speed networks. Previous studies were mostly limited to lower speeds or simple network topologies. The bandwidth estimation tools can be classified according to the following criteria:

We classified several known tools representing different approaches in Table 11.1.

Tool      Every link vs. bottleneck  Installed vs. free bw  Method       Location
Clink     bottleneck                 installed bw           RTT          sender
Sprobe    bottleneck                 installed bw           dispersion   sender + receiver
Pchar     every link                 installed bw           RTT          sender
Pathchar  every link                 installed bw           RTT          sender
Pathrate  bottleneck                 installed bw           dispersion   sender + receiver
Pathload  bottleneck                 free bw                dispersion*  sender + receiver
ABwE      bottleneck                 free bw                dispersion   sender + receiver
* relative one-way delay

Table 11.1: Classification of bandwidth estimation tools

The pathload tool reports IP-level available bandwidth, whereas the ABwE tool reports free bandwidth normalized for the TCP protocol.
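The dispersion method can be illustrated by the classic packet-pair idea: two packets of size L sent back to back leave the bottleneck link spaced by L/C, so the installed capacity C can be estimated from their spacing at the receiver. The sketch below assumes no cross traffic and uses made-up arrival timestamps; it is not the algorithm of any particular tool in the table.

```python
from statistics import median


def packet_pair_estimates(packet_bytes, arrival_pairs):
    """Capacity estimate in Mbps for each (t_first, t_second) arrival pair."""
    estimates = []
    for t_first, t_second in arrival_pairs:
        dispersion = t_second - t_first
        if dispersion > 0:  # skip pairs with unusable timestamps
            estimates.append(packet_bytes * 8 / dispersion / 1e6)
    return estimates


# Made-up receiver timestamps in seconds for three 1500-byte packet pairs
pairs = [(0.000000, 0.000120), (0.010000, 0.010125), (0.020000, 0.020118)]
estimates = packet_pair_estimates(1500, pairs)
print(round(median(estimates), 1), "Mbps")
```

Taking the median of several pairs is one simple way to reduce the influence of outliers; real tools use considerably more elaborate filtering.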

11.4.2   Observation summary

We summarized our observations on the behaviour of bandwidth estimation tools in the CESNET technical report 25/2003. We shall mention several interesting findings here.

The pathload tool, as distributed, can estimate bandwidth up to some 120 Mbps. After tuning some of its internal constants, we managed to make it work at some 800 Mbps. However, tests on our testbed with traffic bandwidth generated by a packet stream showed that pathload could provide only very coarse estimates in this range. When accuracy of 100 Mbps was requested, all results fitted in; however, when accuracy of 10 Mbps was requested, most results were out of range.

Within another experiment, a set of several bandwidth measurement and estimation tools was deployed for a period of one month on two paths over the Géant network, consisting of more than 10 routers and OC-48 or Gigabit Ethernet links. Every hour, one set of traffic measurements and estimations by each tool took place:

A sample of the results measured over one four-day period is shown in Figure 11.4.

[Figure]

Figure 11.4: Bandwidth measurement and estimation

One can see that the values produced by different tools vary significantly, and concluding which value is close to the real available bandwidth is difficult. We can assume that parallel TCP iperf or UDP iperf are more likely to fill the available bandwidth, but they also stress the existing traffic more and so can report results higher than the bandwidth really available. The pathload command is very unreliable and often systematically underestimates the available bandwidth. A more detailed discussion of our observations can be found in an internal project report [UKr03].

11.5   Computer network simulations for congestion control research

Computer network simulation and emulation allow researchers to conduct experiments on models of computer networks in order to evaluate protocol behaviour and compare alternatives under defined and repeatable conditions, which would not be possible on real networks with unpredictable traffic dynamics. The most widely known network simulator is the ns2. Our experience with using ns2 for congestion control research as well as our additions and enhancements to this simulator have been published in the CESNET technical report 26/2003. This section summarises some of our findings and recommendations for the use of ns2.

The ns2 is a freely available discrete-event object-oriented network simulator which provides a framework for building a network model, input data specification, output data analysis and result presentation. Source code is also available which allows users to add new features to the simulator, such as support for new communication protocols, monitoring tools, etc.

In real networks, four components make up the end-to-end packet delay. The ns2 tool simulates all these delay components except for the processing delay:
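In the usual textbook decomposition, the components other than the processing delay are the transmission, queueing and propagation delays. As a rough illustration of how they add up on a single link, the following sketch computes the one-way delay of one packet; all values are assumed and the function names are hypothetical.

```python
def transmission_delay(packet_bytes, link_bps):
    """Time needed to put the packet on the wire."""
    return packet_bytes * 8 / link_bps


def one_way_delay(packet_bytes, link_bps, propagation_s, queued_bytes=0):
    """Queueing + transmission + propagation delay on one link."""
    queueing = queued_bytes * 8 / link_bps  # time to drain data ahead of us
    return queueing + transmission_delay(packet_bytes, link_bps) + propagation_s


# 1500-byte packet on an assumed 50 Mbps link with 100 ms propagation
# delay and 30 full-size packets already waiting in the queue
print(one_way_delay(1500, 50e6, 0.100, queued_bytes=30 * 1500))
```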

11.5.1   Installation and simulation scripts

The ns2 tool is implemented in C++ and Tcl and should run on any POSIX-like operating system (tested on FreeBSD, Linux, SunOS and Solaris) and on Microsoft Windows. The ns2 uses several other software packages (Tcl/Tk, xgraph, etc.) which can be installed either separately or together with ns2 from the "ns-allinone" package. Some of these packages are mandatory, while others are optional, such as nam-1 for animation of a simulation run.

Once the ns2 is installed, a simulation task is specified by a simulation script written in Tcl. This script describes the network topology (nodes and their interconnection), communications protocols (e.g., TCP) and events (scheduling of data streams to be sent). Lengths of packet queues attached to links and maximum size of TCP window can also be specified. Creating the simulation scripts is a complex task which requires understanding of the ns2 object classes and Tcl programming.

11.5.2   TCP in ns2

There are two flavours of TCP in ns2. The first is a one-way TCP which uses objects of different classes on the sender and receiver sides. For the sender side, several classes are available for TCP: Tahoe, Reno, Newreno, Vegas and Sack or Fack, supporting selective acknowledgements. For the receiver side, three classes are available for TCP receiver: without delayed acknowledgements, with delayed acknowledgements and with selective acknowledgements. Subclasses can be derived from these supplied classes to implement modifications to the standard TCP congestion control. The second flavour is a two-way TCP which uses objects of the same class on both the sender and the receiver sides. One-way TCP is used more frequently than the two-way TCP which implements only the Reno congestion control and is considered under development.

TCP in the ns2 differs from real TCP implementations in several aspects that need to be considered during simulations, such as absence of flow control or sender blocking calls. It also does not include any throughput indication needed for almost any simulation. Our observations of TCP in ns2 have been published in the project report [UbK03].

11.5.3   Example of simulation using ns2

One of the network topologies frequently used in simulations is shown in Figure 11.5. Hosts connected to router R1 send data to hosts connected to router R2. The sum of data rates produced by the source hosts is usually bigger than the throughput of the link between router R1 and router R2, making it a bottleneck link. This link also has a specified non-zero packet loss rate and one-way delay, while the links between hosts and routers are usually lossless and have a fixed one-way delay and throughput.

[Figure]

Figure 11.5: A simple simulation topology

The following steps must be taken:

  1. Create an object for the ns2 simulator.
  2. Create objects for network nodes, links and queues attached to links and specify their parameters, thus creating the network topology.
  3. Create objects for the TCP sender and TCP receiver and specify their maximum window sizes.
  4. Create objects for the sending and receiving applications and attach them to the TCP sender and TCP receiver objects, respectively.
  5. Schedule events, such as start and end times of data streams and when the simulation should stop.
  6. Start the simulation.

An example simulation script implementing the previous steps (referred to by the corresponding numbers in comments) on the given network topology can look as follows:

# 1. Create an object of the ns2 simulator
set ns [new Simulator]

$ns color 0 Red
$ns color 1 Blue

proc finish {} {
        exit 0
}

# 2. Create objects for network nodes, links and queues attached to links
#    and specify their parameters, thus creating the network topology
set pc1 [$ns node]
set pc2 [$ns node]
set r1 [$ns node]
set r2 [$ns node]
set em [new ErrorModel]

# Set link characteristics
$ns duplex-link $pc1 $r1 90Mb 20ms DropTail
$ns duplex-link $r1 $r2 50Mb 100ms DropTail
$ns duplex-link $r2 $pc2 90Mb 20ms DropTail

$ns queue-limit $pc1 $r1 6000000
$ns queue-limit $r1 $r2 300000

$ns duplex-link-op $pc1 $r1 orient right
$ns duplex-link-op $r1 $r2 orient right
$ns duplex-link-op $r2 $pc2 orient right

$em unit pkt
$em ranvar [new RandomVariable/Uniform]
$em set rate_ 0.0001
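set streams 5
set segsize 1500
# Simulation duration in seconds. TIME is used below but is not defined
# in the original script; the value here is only an assumed placeholder.
set TIME 100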

for {set i 0} {$i < $streams} {incr i} {
# 3. Create objects for the TCP sender and receiver and specify maximum
#    window sizes
 set tcpz($i) [new Agent/TCP/Reno]
 set tcpc($i) [new Agent/TCPSink]
 $ns attach-agent $pc1 $tcpz($i)
 $ns attach-agent $pc2 $tcpc($i)
 $tcpz($i) set fid_ 0
 $tcpc($i) set fid_ 1
 $ns connect $tcpz($i) $tcpc($i)
 $tcpc($i) listen
 $tcpz($i) set window_ 500
 $tcpz($i) set segsize_ $segsize 
# 4. Create objects for the sending and receiving application and
#    attach them to objects for the TCP sender and receiver, respectively
 set snd($i) [new Application/FTP]
 set rcv($i) [new Application/TCPCNT]
 $snd($i) attach-agent $tcpz($i)
 $rcv($i) attach-agent $tcpc($i)
}

set null [new Agent/Null]
$em drop-target $null
$ns lossmodel $em $r1 $r2

# 5. Schedule events, such as the start and end times of data streams 
#    and when the simulation is to stop
for {set i 0} {$i < $streams} {incr i} {
 $ns at 0 "$snd($i) start"
}
$ns at 0 "$rcv(0) settimer 0.1"
$ns at 0 "$tcpc(0) settimer 0.1"

for {set i 0} {$i < $streams} {incr i} {
 $ns at $TIME "$snd($i) stop"
}

$ns at $TIME "$rcv(0) stop"
$ns at $TIME "finish"

# 6. Start simulation
$ns run

11.5.4   Memory requirements

The volume of memory required by the ns2 for a simulation depends on the number of packets within the simulated network and on the number of packet headers maintained for each packet. In fast long-distance networks, which are often a subject of current research in congestion control, the number of packets within the network can be some tens or hundreds of thousands and the volume of memory required can grow to several gigabytes. The memory requirements can be lowered by first removing all packet headers and then adding only the required headers. For example, the following commands can be added at the beginning of a simulation script:

remove-all-packet-headers
add-packet-header TCP IP
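The number of packets the simulator has to keep in memory can be estimated from the bandwidth-delay product of the simulated path. The sketch below is only a back-of-the-envelope calculation; the link parameters are assumed, and the per-packet footprints are hypothetical figures, not constants taken from ns2.

```python
def packets_in_flight(bandwidth_bps, rtt_s, packet_bytes):
    """Bandwidth-delay product expressed in whole packets."""
    return int(bandwidth_bps * rtt_s / (packet_bytes * 8))


def memory_estimate_mb(packets, bytes_per_packet):
    """Memory needed if every packet occupies bytes_per_packet."""
    return packets * bytes_per_packet / 1e6


# Assumed path: 10 Gbps, 200 ms RTT, 1500-byte packets
count = packets_in_flight(10e9, 0.200, 1500)
print(count, "packets")
# Hypothetical footprints with all headers and with a reduced header set
print(memory_estimate_mb(count, 2000), "MB vs", memory_estimate_mb(count, 200), "MB")
```

The estimate scales linearly with the per-packet footprint, which is why trimming unused packet headers has such a direct effect.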

11.5.5   Scripts for batch processing

To evaluate the congestion control mechanisms under various network conditions, a set of simulations of a selected network topology must be run where the network characteristics of the bottleneck line are varied. These include the link bandwidth, packet loss rate and one-way delay. We may also wish to experiment with different packet sizes, number of parallel streams, as well as changing the test duration and time granularity for computing the resulting characteristics, such as the achieved throughput.
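Such a parameter sweep amounts to enumerating all combinations of the varied characteristics. The following sketch shows one way to generate them; the parameter values, the dictionary keys and the script name sim.tcl are hypothetical, and the command line is only printed, not executed.

```python
from itertools import product

# Assumed values of the bottleneck link characteristics to be varied
bandwidths_mbps = [100, 1000]
loss_rates = [1e-4, 1e-5]
delays_ms = [10, 100]


def scenarios():
    """Yield one dict per combination of bottleneck parameters."""
    for bw, loss, delay in product(bandwidths_mbps, loss_rates, delays_ms):
        yield {"bw_mbps": bw, "loss": loss, "delay_ms": delay}


for scenario in scenarios():
    # A real batch script would start one simulation run here;
    # sim.tcl and its argument order are placeholders.
    print("ns sim.tcl {bw_mbps} {loss} {delay_ms}".format(**scenario))
```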

We added logging of TCP connection characteristics and created a set of scripts to simplify the use of ns2 for simulation of common experimental scenarios with various link characteristics and protocol parameters. The inter-relations of individual scripts are illustrated in Figure 11.6:

[Figure]

Figure 11.6: Scripts for batch simulation processing

The sequence of script actions can be described as follows:

11.5.6   Throughput measurement

To monitor the throughput at the application level, we created a new class Application/TCPCNT; to monitor throughput at the TCP level, we modified the class Agent/TCPSink. A description of these enhancements can be found in [UbK03].

11.5.7   Reaction to a change of available bandwidth

In order to study responses of a congestion control mechanism to increased or decreased available bandwidth, we created a sender-side application class Application/TCPFTP which generates periodic bursts of packets. To start the application, the following commands in the simulation script can be used:

set snd [new Application/TCPFTP]
$snd set interval_ n
$snd set burstsize_ m
$snd start

where n is the period in seconds and m is the number of MSS-length packets to be sent in each period. The application must be attached to the TCP sender (see the example simulation script above). To stop the application, the following command in the simulation script can be used:

$snd stop
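When choosing n and m, it helps to know the average rate the bursts offer to the network. The sketch below computes it from the burst size and period; the MSS of 1460 bytes and the sample values are assumptions for illustration.

```python
def offered_rate_mbps(burst_packets, mss_bytes, interval_s):
    """Average rate when burst_packets MSS-sized packets are sent every interval_s."""
    return burst_packets * mss_bytes * 8 / interval_s / 1e6


def burst_for_rate(rate_mbps, mss_bytes, interval_s):
    """Number of packets per burst needed to offer rate_mbps on average."""
    return rate_mbps * 1e6 * interval_s / (mss_bytes * 8)


# 100 packets of 1460 bytes every 0.1 s (assumed values)
print(offered_rate_mbps(100, 1460, 0.1), "Mbps")
print(burst_for_rate(11.68, 1460, 0.1), "packets per burst")
```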

11.5.8   Adjusting the AIMD parameters

In the original ns2 TCP, the congestion control parameters within the slow start as well as congestion avoidance phases are fixed. The latter is based on AIMD(1, 0.5). To be able to experiment with recent proposals of Fast TCP, changing the AIMD parameters should be possible. We have modified certain ns2 classes so as to be able to adjust both the slow start and congestion avoidance parameters. A detailed description of these enhancements can be found in [UbK03].
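The meaning of the two AIMD parameters can be shown with a small model of the congestion window, evaluated once per RTT. This is a simplified sketch, not the ns2 implementation: it assumes that in AIMD(a, b) the window grows by a segments per loss-free RTT and is multiplied by b after a loss, and the rounds in which losses occur are supplied by hand.

```python
def aimd_trace(a, b, rounds, loss_rounds, cwnd=1.0):
    """Congestion window after each RTT round under AIMD(a, b).

    In a round listed in loss_rounds the window is multiplied by b
    (but never drops below one segment); otherwise it grows by a.
    """
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cwnd = max(1.0, cwnd * b)
        else:
            cwnd += a
        trace.append(cwnd)
    return trace


# Standard behaviour, with a single loss in round 5
print(aimd_trace(1, 0.5, 8, {5}))
# A more aggressive variant with arbitrarily chosen parameters
print(aimd_trace(2, 0.875, 8, {5}))
```

With AIMD(1, 0.5) the window needs as many RTTs to recover as it lost segments, which is what makes recovery slow on paths with a large bandwidth-delay product.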

11.5.9   Asynchronous monitoring of TCP characteristics

The ns2 can synchronously monitor the TCP characteristics (cwnd, ssthresh, ...), that is, record them whenever any of them changes. In some cases, asynchronous monitoring (recording the values of all characteristics at given time intervals) may bring clearer results. Therefore, we modified the ns2 to allow asynchronous monitoring as well. A detailed description can also be found in [UbK03].
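The difference between the two monitoring modes can be illustrated by converting a change-driven trace into periodic samples. The sketch below is independent of ns2; the event list is made up and the function name is hypothetical.

```python
def sample_periodically(events, interval, end_time):
    """Sample a step function given by sorted (time, value) change events.

    Returns (t, value) for t = 0, interval, 2 * interval, ... up to
    end_time, where value is the most recent one set at or before t
    (None if nothing has been set yet).
    """
    samples = []
    index = 0
    current = None
    steps = int(end_time / interval)
    for k in range(steps + 1):
        t = k * interval
        # Consume all change events that happened up to time t
        while index < len(events) and events[index][0] <= t:
            current = events[index][1]
            index += 1
        samples.append((t, current))
    return samples


# Made-up cwnd change events as (time in seconds, cwnd in segments)
cwnd_events = [(0.0, 1), (0.12, 2), (0.13, 4), (0.45, 2)]
print(sample_periodically(cwnd_events, 0.25, 0.5))
```

Periodic samples of several characteristics share the same time axis, which makes them easier to plot and compare than traces recorded at irregular change times.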

11.5.10   Difficulties we ran into

We encountered several symptoms of unexpected behaviour and ran into some problems when using the ns2:

These phenomena are presented together with explanations for some of them in [UbK03].

At the present time, the ns2 simulator is used for research in congestion control for long-distance high-speed networks. A paper on this topic is being prepared.

11.6   Developed software

In 2003 we created the following software packages:

11.6.1   Evaluation of bandwidth measurement and estimation tools

A set of scripts used for evaluation of bandwidth measurement and estimation tools. The obtained results were presented in the CESNET technical report 25/2003.

11.6.2   Analysis of time and geographical characteristics of network traffic

A set of tools for analysing time and geographical characteristics of network traffic from netflow records. These tools were used to analyse the CESNET international traffic. A technical report on this topic is being prepared.

11.6.3   Linux kernel monitoring

A patch for configuring and monitoring certain events in the Linux kernel that influence the performance of TCP bulk transfers. In particular, it allows setting the AIMD speed as well as enabling, disabling and monitoring the CWV and CWR mechanisms. This patch is being used for congestion control research; a paper on this topic is being prepared.

11.6.4   NIST Net deterministic patch

A patch that provides deterministic packet loss and queue length for NIST Net, a popular emulation package. Our enhancements will be described in a technical report on our experience with network emulation.

11.7   Other project activities

Together with the Optical networks and their development project, we conducted an experiment using the Intel 10 Gigabit Ethernet PC adapters in order to evaluate the feasibility of providing 10 Gigabit Ethernet connectivity up to the end stations. The results were presented in CESNET technical report 10/2003.

Building productive relationships with international partners leading to motivating proposals of further research activities within several planned 6th Framework Programme projects is also regarded as an important project result.

11.8   Planned activities

In 2004 we plan to concentrate on three research areas. The first area is congestion control in long-distance high-speed networks. We managed to gain a lot of experience in this field and we are working on several papers and technical reports on this topic. The second area is performance monitoring. Our intention is to implement the results of the SCAMPI project which is developing a programmable monitoring platform for the high-speed Internet. The third area is the Performance Enhancement and Response Team.
