5   Network and Traffic Monitoring

5.1   Monitoring the Network Infrastructure

The main task in this area was to continue developing the G3 system, a tool designed for continuous monitoring of a large-scale computer network infrastructure.

5.1.1   Measurement Module for the G3 System

The measurement relies primarily on SNMP (Simple Network Management Protocol), which serves as the basic mechanism for retrieving data from network devices. Our aim is to investigate new methods of primary data collection that increase the effectiveness of SNMP, i.e., that extract more information from the captured and processed data without resorting to more aggressive request rates. On the basis of these experiments, we created the core of the G3 measurement module in 2004. The rest of this fundamental module was finished in the second quarter of 2005, and the module then entered experimental operation.

For the rest of 2005, we focused on increasing the system stability and extended the measurements (in several steps) to cover the majority of CESNET2 backbone devices. This allowed us to test the measurement module with a relatively wide set of different devices. More than 80 network devices, representing more than 3500 monitored network interfaces, are currently being measured. Based on the experience from the first phase of tests, we decided to redesign the architecture of the internal data storage mechanism in order to reduce its relatively high demands on I/O operations. This especially affected the numeric counter-type items stored in RRD (Round Robin Database). We added buffers with a controlled batch mode for writing to the database and also started to migrate the data structure from a "single item" to a "group of items" model. We are now able to set (during system initialisation) the number of items that will be stored in a single RRD. Although the improved measurement module is still rather young, it is currently stable enough to satisfy long-term measurement needs.
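The combination of buffering, batch writes and item grouping described above can be illustrated with a minimal sketch. It is not the G3 implementation: the class `BufferedGroupWriter`, the `write_batch` callback (a stand-in for whatever actually updates an RRD file) and the item names are all hypothetical.

```python
from collections import defaultdict


class BufferedGroupWriter:
    """Buffers samples per group and hands them to storage in batches."""

    def __init__(self, item_names, items_per_group, batch_size, write_batch):
        # Each item is assigned to a group once, at initialisation.
        # One group stands for one storage file (one RRD in the text).
        self.group_of = {
            name: index // items_per_group
            for index, name in enumerate(item_names)
        }
        self.batch_size = batch_size
        self.write_batch = write_batch  # hypothetical storage callback
        self.buffers = defaultdict(list)

    def add(self, timestamp, item, value):
        group = self.group_of[item]
        self.buffers[group].append((timestamp, item, value))
        # Write only when enough rows have accumulated for this group.
        if len(self.buffers[group]) >= self.batch_size:
            self.flush(group)

    def flush(self, group):
        rows = self.buffers.pop(group, [])
        if rows:
            self.write_batch(group, rows)

    def flush_all(self):
        for group in list(self.buffers):
            self.flush(group)


written = []
writer = BufferedGroupWriter(
    item_names=["ifInOctets.1", "ifOutOctets.1", "ifInOctets.2", "ifOutOctets.2"],
    items_per_group=2,
    batch_size=4,
    write_batch=lambda group, rows: written.append((group, len(rows))),
)
for ts in (0, 60):
    for item in writer.group_of:
        writer.add(ts, item, 0.0)
writer.flush_all()
print(written)
```

In this sketch, eight samples result in two write operations instead of eight, which is the kind of I/O reduction the redesign aims for.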

5.1.2   Prototype of the Basic User Interface for G3

The design and prototype implementation of the basic user interface for G3 were the main tasks of this activity in 2005. The first version was supposed to include the key features envisioned in the proposed architecture of a new infrastructure monitoring system. The user interface consists of two basic components: navigation and visualisation. A more detailed description of both is contained in the technical report [Koš05].

5.1.3   Navigating in the G3 User Interface

Network administrators will probably never agree on a preferred style of navigation in management tools. Nonetheless, in most cases they at least agree that a tree-like navigation structure is a meaningful starting point. In order to make the system as flexible and future-proof as possible, we built on this basic idea and implemented functions that enable users to choose and interactively configure the structure and behaviour of the navigation tree that suits them best. Here are some examples:

An interactively configurable template for visualising the navigation tree allows users to change the content as well as the hierarchy of the tree.

[Figure]

Figure 5.1: Template - setup 1

[Figure]

Figure 5.2: Navigation tree according to template setup 1

[Figure]

Figure 5.3: Template - setup 2

[Figure]

Figure 5.4: Navigation tree according to template setup 2

Experimental object filtering is a part of the mechanism for creating the navigation tree. The user interface allows users to set up multiple conditions at the same time using AND and OR operators and, optionally, to bind these conditions to a particular type of value via explicitly selected descriptive items. Conditions can be interpreted as substrings or as regular expressions.
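A minimal sketch of such filtering follows. It assumes that conditions are arranged as OR-ed groups of AND-ed conditions and that objects are plain dictionaries of descriptive items. The `Condition` class, the `select` function and all item names and values are hypothetical, not taken from G3.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Condition:
    pattern: str
    item: Optional[str] = None  # descriptive item to test; None means any item
    regex: bool = False         # interpret pattern as a regular expression

    def matches(self, obj: Dict[str, str]) -> bool:
        values = [obj.get(self.item, "")] if self.item else list(obj.values())
        if self.regex:
            return any(re.search(self.pattern, v) for v in values)
        return any(self.pattern in v for v in values)


def select(objects: List[Dict[str, str]], groups: List[List[Condition]]):
    """Keep objects that satisfy all conditions of at least one group."""
    return [
        obj for obj in objects
        if any(all(cond.matches(obj) for cond in group) for group in groups)
    ]


objects = [
    {"node": "r1-prague", "ifDescr": "GigabitEthernet0/1", "ifAlias": "line to Brno #1"},
    {"node": "r1-prague", "ifDescr": "GigabitEthernet0/2", "ifAlias": "line to Brno #2"},
    {"node": "r2-brno", "ifDescr": "POS1/0", "ifAlias": "line to Ostrava"},
    {"node": "r2-brno", "ifDescr": "Loopback0", "ifAlias": ""},
]
groups = [
    # (ifAlias contains "Brno") AND (ifDescr matches ^Gigabit)
    [Condition("Brno", item="ifAlias"),
     Condition(r"^Gigabit", item="ifDescr", regex=True)],
    # OR: any descriptive item contains "POS"
    [Condition("POS")],
]
selected = select(objects, groups)
print([obj["ifDescr"] for obj in selected])
```

Binding a condition to one descriptive item keeps the substring "Brno" from matching the node name of the loopback interface, which is why that object is filtered out.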

Automated object aggregation is an integral part of the mechanism for creating the navigation tree. It depends on the current template setup. A single navigation label can be used as a pointer to multiple physical objects. This may be useful (together with the complementary functions in the visualisation component, see below) in cases when multiple physical objects share some function and should thus be visualised as a single unit, such as traffic over parallel lines. The step-by-step merging of objects towards a single label can be seen in the following sequence of examples. In order to achieve the expected result in this case, one has to set up an appropriate filtering condition.

[Figure]

Figure 5.5: Starting template

[Figure]

Figure 5.6: Corresponding navigation tree

[Figure]

Figure 5.7: Template after some modifications

[Figure]

Figure 5.8: Corresponding navigation tree

[Figure]

Figure 5.9: Final template

[Figure]

Figure 5.10: Final navigation tree
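The dependence of aggregation on the template can be sketched as follows, under the assumption that a template is an ordered list of descriptive item names and that objects producing the same label path are merged under one label. The function `build_tree` and the sample objects are hypothetical.

```python
from collections import defaultdict


def build_tree(objects, template):
    """Map each label path (one label per template level) to object ids."""
    tree = defaultdict(list)
    for obj in objects:
        path = tuple(obj[key] for key in template)
        tree[path].append(obj["id"])
    return dict(tree)


objects = [
    {"id": 1, "node": "r1", "peer": "r2", "ifDescr": "Gi0/1"},
    {"id": 2, "node": "r1", "peer": "r2", "ifDescr": "Gi0/2"},
    {"id": 3, "node": "r1", "peer": "r3", "ifDescr": "Gi0/3"},
]

# A template that ends with the interface name keeps every object separate.
detailed = build_tree(objects, ["node", "ifDescr"])

# A template that ends with the peer merges the two parallel lines to r2.
merged = build_tree(objects, ["node", "peer"])
print(merged)
```

Changing only the template changes which physical objects end up behind a single navigation label, without touching the objects themselves.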

The navigation tree may optionally be visualised as "fully expanded" - usually when the number of objects is low and/or reduced (by filtering, for example) - or as partially and interactively "expanded" or "collapsed".

[Figure]

Figure 5.11: "Expand&collapse" navigation tree example

Special extension functions are intended to increase the effectiveness of network administration. They should make orientation easier and/or extend the navigation tree with information obtained by processing specific items measured within the time frame of interest. Examples include sums or extremes of items that carry information about interface errors, system restarts and similar events. Automated object aggregation is applied in this case too, so the final information may hide a lot of unnecessary detail. On the other hand, these operations consume a lot of resources, so they should be limited to a reasonable number of objects.

[Figure]

Figure 5.12: Added information about rebooted systems within the requested time window

[Figure]

Figure 5.13: Network interfaces with possible problems

[Figure]

Figure 5.14: Faster orientation - technological part of interface descriptions is suppressed and inactive interfaces marked

5.1.4   Visualisation in G3 User Interface

The behaviour of the visualisation component is in accord with our effort to show the dynamics of the network infrastructure and also complements the features described above in the section on navigation.

Typical network infrastructure monitoring systems visualise the temporal evolution of some quantity by using average values over a given time interval. Averaging is often used even in long-term views. As the G3 system is designed to capture certain aspects of network dynamics (within limitations given by the measurement methods), the variance in the measured quantity is also important.

[Figure]

Figure 5.15: Time step between consecutive requests for data sent to a network node

[Figure]

Figure 5.16: One way bit rate (including limits) - data were requested according to the time schedule displayed above
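The difference between an average-only view and a view that keeps value limits can be shown with a short sketch. It assumes samples are `(timestamp, value)` pairs and, as a simplification, weights all samples in an interval equally even when the time step between them varies. The function name `summarise` is hypothetical.

```python
def summarise(samples, interval):
    """Return (start, minimum, average, maximum) for each time interval."""
    buckets = {}
    for timestamp, value in samples:
        start = timestamp - timestamp % interval
        buckets.setdefault(start, []).append(value)
    return [
        (start, min(values), sum(values) / len(values), max(values))
        for start, values in sorted(buckets.items())
    ]


# Irregularly spaced samples of a bit rate (arbitrary units).
samples = [(0, 10.0), (60, 90.0), (120, 20.0), (300, 40.0), (360, 60.0)]
for row in summarise(samples, interval=300):
    print(row)
```

The first interval has an average of 40.0 but values ranging from 10.0 to 90.0; an average-only graph would hide that spread, while the limits preserve it.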

It is often convenient to display multiple items in a single graph. On the other hand, including the envelope curves makes graphs with more than two separate items confusing. We implemented a mechanism that makes it easy to define (outside the user interface) the appropriate visualisation style for each item, e.g., stand-alone vs. in combination with other items.

[Figure]

Figure 5.17: Two separate items in a single graph including value limits

In the area of aggregation, the visualisation functions are analogous to those of the navigation component, so the possibility of creating a single output from multiple data sources is implicitly assumed. The final result thus involves a fair amount of computation: summation, finding extremes or any other operation that makes sense for the items being visualised and the given purpose.

[Figure]

Figure 5.18: Aggregated traffic over two parallel lines (interfaces)
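A minimal sketch of combining several data sources into one output follows. It assumes each source is a mapping from timestamp to value and that only timestamps present in all sources are combined; real measurements would usually need time alignment first. The function `aggregate` and the sample values are hypothetical.

```python
def aggregate(series_list, combine=sum):
    """Combine several {timestamp: value} series into a single series."""
    if not series_list:
        return {}
    common = set(series_list[0])
    for series in series_list[1:]:
        common &= set(series)
    return {
        ts: combine(series[ts] for series in series_list)
        for ts in sorted(common)
    }


# Traffic on two parallel lines (arbitrary units).
line_a = {0: 100, 60: 150, 120: 120}
line_b = {0: 50, 60: 70, 120: 30}

total = aggregate([line_a, line_b])             # summed traffic
peak = aggregate([line_a, line_b], combine=max)  # per-step extreme
print(total)
print(peak)
```

Passing the combining function as a parameter mirrors the idea that the operation (sum, extreme or something else) depends on the item and the purpose.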

Even if aggregation is used, one sometimes needs to pick and observe one or more objects from an aggregated set (hidden behind a single navigation label). For this purpose, we implemented an additional module for sub-navigation. The secondary selection can be made by object identifier or by the values of the object's descriptive items.

[Figure]

Figure 5.19: Sub-navigation - object selection based on object identifier

[Figure]

Figure 5.20: Sub-navigation - object selection based on descriptive item value

The current functionality of the prototype implementation of the G3 user interface is a starting point for subsequent development and for our future research in the area of large-scale network infrastructure measurements. In its current state, the G3 system is expected to serve as a proof of concept that is open to further improvements. However, only long-term practical testing and feedback from network administrators will ultimately show whether this direction is viable.

5.2   Network Traffic Monitoring

In 2005, we focused the development efforts on the experimental user interface for FTAS (Flow-based Traffic Analysis System).

The experience of backbone administrators who use FTAS in their everyday work shows that retrieving primary flow data sets for a relatively large time frame in interactive mode is far from optimal, because users are forced to periodically confirm the continuation of a single query. Such requests (large time frames, data neither pre-processed nor aggregated) arise mostly in connection with the analysis of attacks or attack attempts, so this is an important use of FTAS. Therefore, we implemented a mechanism that allows such selections to run in the background.

A related extension of the FTAS user interface is the background query notification. A short email containing status information about the processed request may be sent to the email address that the user specifies in the FTAS query window. The message also contains a locator pointing directly to the results (if any). Potential typos in email addresses do not threaten security - access to the results is subject to the same authentication and authorisation (AA) mechanism as interactive work.

[Figure]

Figure 5.21: Finished background query - notification example

Some users require output in the form of summarised values per aggregation time interval within the time frame of interest, for example hourly summaries over the last day or daily summaries over the last week. The previous implementation of the user interface was only able to show rates (packets per second, bits per second), and the size of the aggregation interval was determined automatically by the system. As a first step, we enabled users to enter the size of the aggregation time interval in the query specification window. Here we chose the "experimental" variant - users do not pick a value from a pre-defined set but may enter any value, and the system itself corrects the value, if necessary, in order to be able to run the selection. We also implemented the corresponding functions in the viewer. Users can now easily switch between summarised and rate-type visualisations.

[Figure]

Figure 5.22: Output example - rate-type visualisation

[Figure]

Figure 5.23: Output example - aggregation time interval summaries
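The relation between a user-entered aggregation interval, summarised values and rates can be sketched as follows. The text does not say how FTAS corrects an entered value, so the rule used here (rounding up to a multiple of an assumed base storage step) is only an illustration. Both function names and the sample records are hypothetical.

```python
def correct_interval(requested, base_step):
    """Round the requested interval up to a multiple of base_step.

    This rule is an assumption made for the example only.
    """
    if requested <= base_step:
        return base_step
    return -(-requested // base_step) * base_step  # ceiling division


def summarise_flows(records, interval):
    """Return {interval_start: (total_bytes, bits_per_second)}."""
    totals = {}
    for timestamp, nbytes in records:
        start = timestamp - timestamp % interval
        totals[start] = totals.get(start, 0) + nbytes
    return {
        start: (total, total * 8 / interval)
        for start, total in sorted(totals.items())
    }


interval = correct_interval(3500, base_step=300)  # user typed 3500 seconds
records = [(0, 450_000), (1800, 450_000), (3600, 90_000)]
summary = summarise_flows(records, interval)
print(interval)
print(summary)
```

Each entry holds both views of the same data: the first element is the summarised volume for the interval, the second is the corresponding average rate.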
