4 Programmable hardware
In 2005, the activity Programmable Hardware followed essentially the same direction as in the previous year. Our aim was to concentrate the resources on the main ongoing projects but, at the same time, investigate other potential applications of programmable hardware.
The year 2005 marked the end of two major IST-FP5 projects with our participation: SCAMPI and 6NET. During the final review of the SCAMPI project, the reviewers recommended pursuing further commercial utilisation of the project results, namely the SCAMPI monitoring adapter. CESNET put a lot of effort into this process in 2005, but a definitive solution is still being negotiated.
By the end of 2005, the development team had 74 active members. We substantially expanded the VHDL group because it turned out that integrating students effectively into the complex development work requires them to gain some experience on simpler tasks first. After such a trial period, lasting approximately a year, we select the best newcomers to participate in the main projects.
Figure 4.1 shows the annual increase in the number of lines in our CVS repository. While such statistics are often rather dubious, they still give some insight into the amount of development work being done.
The internal organisation of the development team underwent an important change in 2005. Previously, the developers were managed mainly via specialisation groups such as VHDL, system software etc. In mid-2005 we introduced a new hierarchy that is based primarily on projects. Individual project groups include all developers, testers and other support staff who are expected to contribute to the project results. This change was motivated mainly by the need for tighter control of project timelines. Five projects are currently running:
- FlowMon - development of IP flow monitoring probe, led by Martin Žádník (Brno University of Technology).
- Liberouter - development of an IPv6/IPv4 router, led by Jiří Tobola (Brno University of Technology).
- SCAMPI - development of the SCAMPI monitoring adapter, led by Tomáš Martínek (Brno University of Technology).
- IDS - development of a payload scanner for intrusion detection systems, led by Petr Kobierský (Brno University of Technology).
- PAGEN - development of a packet generator, led by Jan Pazdera (Brno University of Technology).
Details about these projects are given in the following sections, but first let us have a look at the new generation of COMBO cards that are used by all projects.
4.1 New Cards
Several new cards were designed for the COMBO family:
- COMBO6X motherboard for 64-bit/66 MHz PCI and PCI-X has already been tested; several minor flaws found during testing led to an updated design of the card.
- COMBO6E motherboard for the PCI Express bus is currently being manufactured.
- COMBO-4SFPRO interface card is available in two variants, one for Gigabit Ethernet and another for SDH STM-16. The card successfully passed initial tests and is ready for deployment.
- COMBO-4XFPRO was designed to support 10-Gigabit Ethernet. However, due to serious problems with electronic components, the current design does not conform to the 10GE standards and the card will have to be redesigned.
The advantages of the new cards over their predecessors are:
- By implementing support for the new bus standards, PCI-X and PCI Express, we open new design possibilities for network devices with COMBO cards: the PCI bus is no longer the bottleneck, so it is easier to utilise the host processor along with the COMBO hardware.
- Specialised chips were replaced by new functions or pre-programmed modules directly in the field-programmable gate arrays (FPGA). Instead of phyters for network interfaces we now use the fast serial Rocket IO circuits inside the Xilinx Virtex-II Pro FPGA. Likewise, the PLX PCI bridge used on the original COMBO6 card was replaced by an FPGA with a commercial implementation of the PCI bridge, a so-called Intellectual Property Core.
- The FPGA series Virtex-II Pro introduced another interesting feature: one or more PowerPC processors integrated inside the FPGA. The powerful but difficult FPGA-based approaches can thus be effectively complemented by standard processor-based operations that are generally easier to program. We are currently using the embedded PowerPC processors for implementing the Bus-master DMA functionality.
10-Gigabit Ethernet was already supported by the COMBO-2XFP interface cards developed for the SCAMPI project. Unfortunately, after we received the first two pieces of this card, Intel discontinued production of a crucial component, the DS12010 phyter. A logical decision for the new generation of interface cards therefore seemed to be to use the serial Rocket IO circuits instead. Vendor specifications claimed support for 10GE in the Xilinx Virtex-II Pro XC2VP20 FPGA (speed grade 7), and the design of the new COMBO-2XFPRO cards was based on it. However, Xilinx later withdrew support for 10GE in these chips without giving any details. This left us no choice but to terminate the development of COMBO-2XFPRO and start working on another design, this time based on the Virtex-4 FPGA, where 10GE should be fully supported.
4.2 FlowMon Probe
Development of the FlowMon probe started in 2004 within the Joint Research Activity 2 (Security) of the GN2 project. The goal is an autonomous monitoring probe capable of collecting information about IP data flows in high-speed networks.
IP flow data are traditionally generated by IP routers. Using an autonomous probe instead has several advantages:
- The probe is an essentially invisible element of the network infrastructure at both link and network layers. This fact significantly decreases the likelihood of remote attacks.
- The main task of routers is by definition packet routing and forwarding. Other operations, especially those that are CPU-intensive, must thus be avoided as much as possible. A separate monitoring device is more flexible in this respect.
- As a special case of the previous item, some routers impose sampling on the input traffic. Even where sampling is not mandatory, it is sometimes the only way to keep the router operational, especially on high-speed interfaces. For some applications, such as security analysis, sampling is highly undesirable.
Further technical details about the probe can be found in the technical report [ZaL05] and article [ZPK05].
4.2.1 Hardware
The hardware accelerator for the FlowMon probe is based on a combination of the COMBO6 mother card and a Gigabit Ethernet daughter interface card. The latter can currently be either COMBO-4MTX (with metallic ports) or COMBO-4SFP (with SFP transceivers).
4.2.2 Firmware
Firmware for the FlowMon hardware accelerator is an entirely new design written in the VHDL language. It is able to process both IPv4 and IPv6 traffic simultaneously. From the viewpoint of the monitored link, the probe acts as a repeater: The ingress traffic received on one port of a COMBO interface card is immediately transmitted to another port while a copy of the traffic is passed to the firmware for flow processing. The device can monitor data on the link in both directions simultaneously. The firmware is able to maintain records about up to 65,536 flows.
The processing pipeline is shown in Figure 4.2 and works as follows: Packets received by the Input Buffer (IBUF) are passed to the Header Field Extractor (HFE). This unit parses L2, L3 and L4 headers, extracts all relevant fields and records them in a fixed data structure named Unified Header (UH) that is stored in a RAM-based queue (Statistical FIFO). In parallel, certain key fields from the UH are used as input to the CRC-64 hash function implemented by the HASH unit. From the 64-bit result of the CRC-64 function, we use 57 bits as the unique identifier of the flow. By default the key fields are: source and destination IP addresses, source and destination port numbers and the L3 protocol number. The firmware can be configured to mask out an arbitrary selection of bits from these key fields so that only the remaining bits are used for computing the hash function; flows are then aggregated more coarsely.
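The key masking and hashing step can be illustrated in software. The following is only a minimal sketch under several assumptions that the report does not confirm: the CRC-64 variant (the ECMA-182 polynomial is used here), the byte layout of the key, and the choice of the low 57 bits of the result. The names `crc64`, `flow_id` and `ipv4_key` are hypothetical.

```python
import ipaddress
import struct

# Assumption: ECMA-182 polynomial; the CRC-64 variant used by the HASH unit is not stated.
CRC64_POLY = 0x42F0E1EBA9EA3693
MASK64 = (1 << 64) - 1
FLOW_ID_BITS = 57  # number of hash bits kept as the flow identifier


def crc64(data: bytes, seed: int = 0) -> int:
    """Bitwise, MSB-first CRC-64 over data, starting from the given seed."""
    crc = seed & MASK64
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def ipv4_key(src: str, dst: str, sport: int, dport: int, proto: int) -> bytes:
    """Pack the default key fields into 13 bytes (hypothetical layout)."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!HHB", sport, dport, proto))


def flow_id(key: bytes, mask: bytes, seed: int = 0) -> int:
    """Clear the masked-out key bits, hash the rest and keep 57 bits."""
    if len(key) != len(mask):
        raise ValueError("key and mask must have the same length")
    masked = bytes(k & m for k, m in zip(key, mask))
    # Assumption: the low 57 bits are the ones kept.
    return crc64(masked, seed) & ((1 << FLOW_ID_BITS) - 1)
```

With an all-ones mask every key field contributes to the identifier. Clearing the two source-port bytes in the mask makes all flows that differ only in the source port share one identifier, which is the aggregation effect described above.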
The use of a hash value as the flow identifier means that packets with differing key fields may occasionally map to the same hash value so that these packets are incorrectly classified as belonging to the same flow. However, given the statistically uniform distribution of CRC-64 values, the probability of an undetected collision of flows is $N \times 2^{-57}$, where $N$ is the actual number of flows in the cache. As the maximum number of flows in the cache is $2^{16}$, the collision probability cannot be higher than $2^{-41}$, or $4.55 \times 10^{-13}$. With the current maximum throughput of half a million packets per second, this means about 7 collisions per year on average. While this probability is sufficiently low for most purposes, we have also taken into account the possibility of an attacker generating a hash collision on purpose, e.g., by first injecting carefully forged traffic and then launching other (hostile) traffic that will be classified as the previous forged flow due to the hash collision. This scenario is not particularly difficult to carry out given that the hash function is known. To counter this threat, the HASH unit is initialised with a random seed so that the values of the hash function are not predictable.
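The collision estimate can be reproduced with a few lines of arithmetic. This sketch simply multiplies the worst-case per-packet probability by the number of packets per year, which is how the figure of about 7 appears to have been obtained; the constant names are ours.

```python
FLOW_ID_BITS = 57                  # bits of the CRC-64 result used as flow identifier
MAX_FLOWS = 2 ** 16                # maximum number of flows in the cache
PACKETS_PER_SECOND = 500_000       # current maximum packet rate
SECONDS_PER_YEAR = 365 * 24 * 3600


def collision_probability(active_flows: int) -> float:
    """Probability that a packet collides with one of the cached flows."""
    return active_flows * 2.0 ** -FLOW_ID_BITS


worst_case = collision_probability(MAX_FLOWS)            # equals 2**-41
collisions_per_year = worst_case * PACKETS_PER_SECOND * SECONDS_PER_YEAR

print(f"{worst_case:.3g}")           # 4.55e-13
print(f"{collisions_per_year:.1f}")  # 7.2
```

The estimate is an upper bound: it assumes a permanently full cache and a link that is saturated with packets all year round.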
The hash value is stored in the Hash FIFO. On the opposite end of this queue, the Hash Search Unit (HSRCH) takes the hash values one-by-one and searches the Hash Memory in order to find out whether the hash value is already present. If it is the case, the SCTRL unit is instructed to update the statistics of the existing flow. Otherwise, a new entry in the Hash Memory is created.
The Manager unit (MAN) looks after all flow entries in the Hash Memory and also manages the list of free memory locations. The flow entries are kept in a bidirectional list sorted by timestamp, so inactive flows are easily recognised when their age exceeds a given threshold (inactive timeout).
Finally, the Storage unit (SCTRL) collects the statistics about active flows and exports flow records according to instructions obtained from the MAN and HSRCH units.
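The interplay of lookup, statistics update and timeout-based expiry described above can be modelled in software roughly as follows. This is an illustrative sketch, not the firmware logic: an `OrderedDict` stands in for the Hash Memory plus the timestamp-sorted list, the active timeout is omitted, and the behaviour on a full cache (the packet is simply not accounted) is an assumption. All class and method names are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    def __init__(self, capacity: int = 65536, inactive_timeout: float = 10.0):
        self.capacity = capacity
        self.inactive_timeout = inactive_timeout
        self.flows = OrderedDict()  # least recently updated flow first
        self.exported = []          # (flow_id, record) pairs handed to the exporter

    def expire(self, now: float) -> None:
        """Export flows whose idle time exceeds the inactive timeout."""
        while self.flows:
            flow_id, record = next(iter(self.flows.items()))
            if now - record.last_seen <= self.inactive_timeout:
                break  # list is sorted, so all remaining flows are younger
            self.flows.popitem(last=False)
            self.exported.append((flow_id, record))

    def update(self, flow_id: int, length: int, now: float) -> bool:
        """Account one packet; returns False if the cache is full."""
        self.expire(now)
        record = self.flows.get(flow_id)
        if record is None:
            if len(self.flows) >= self.capacity:
                return False  # assumption: packet is not accounted
            record = FlowRecord(first_seen=now, last_seen=now)
            self.flows[flow_id] = record
        record.last_seen = now
        record.packets += 1
        record.octets += length
        self.flows.move_to_end(flow_id)  # keep the list sorted by timestamp
        return True
```

Because every update moves the entry to the end of the list, expiry only ever needs to inspect the head, which mirrors the reason given above for keeping the entries sorted by timestamp.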
The current firmware version also includes a special experimental sampling procedure: sample-and-hold. Its operation is similar to the standard statistical sampling with the following significant difference: sampling is avoided for flows that already have entries in the flow cache. This way, one can get very precise information about large flows.
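A minimal software sketch of sample-and-hold follows. It assumes that a packet of a new flow is accepted with probability 1/rate and that sampling only becomes active once the cache holds more entries than a configured threshold; the class name and the injectable random generator are illustrative choices, not part of the firmware.

```python
import random


class SampleAndHold:
    def __init__(self, rate: int, threshold: int = 0, rng=None):
        self.rate = rate            # accept 1 in `rate` packets of unknown flows
        self.threshold = threshold  # sampling starts above this cache size
        self.rng = rng or random.Random()
        self.cache = {}             # flow id -> number of packets counted

    def observe(self, flow_id) -> bool:
        """Return True if the packet is counted, False if it is skipped."""
        if flow_id in self.cache:
            # Hold: flows already in the cache are never sampled.
            self.cache[flow_id] += 1
            return True
        sampling_active = len(self.cache) > self.threshold
        if sampling_active and self.rng.randrange(self.rate) != 0:
            return False            # packet of an unknown flow is skipped
        self.cache[flow_id] = 1
        return True
```

A large flow is likely to have one of its early packets sampled, after which every further packet is counted. Small flows mostly never enter the cache, which keeps its occupancy low.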
Several parameters governing firmware behaviour can be modified on the fly, without disrupting its operation:
- active timeout in the range 0-1200 seconds
- inactive timeout in the range 0-60 seconds
- sampling rate in the range 1-65536
- sampling rate for sample-and-hold in the range 1-65536
- threshold for sample-and-hold - sampling is not started unless the number of entries in the flow cache is higher than this value
4.2.3 Device Driver
The device driver for the FlowMon probe is available for Linux 2.4 and 2.6 kernels. It is quite sophisticated in that it allows multiple applications to access the flow records concurrently. A single shared memory block is used for storing up to 16,384 flow records, logically organised as a circular buffer. When the buffer becomes full, the oldest records are overwritten by new ones. Each application keeps its own pointer into the buffer and can also lock up to 1024 records to prevent them from being overwritten before the application has finished reading them.
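The shared circular buffer with one read pointer per application can be modelled as below. This is a simplified sketch with hypothetical names; record locking is deliberately left out because the report does not say how the driver treats a writer that reaches locked records.

```python
class FlowRing:
    def __init__(self, capacity: int = 16384):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0      # total number of records written so far
        self.readers = {}  # application name -> next sequence number to read

    def attach(self, name: str) -> None:
        """Register an application; it starts reading at the current head."""
        self.readers[name] = self.head

    def write(self, record) -> None:
        """Store a record, overwriting the oldest one when the ring is full."""
        self.slots[self.head % self.capacity] = record
        self.head += 1

    def read(self, name: str) -> list:
        """Return all records the application has not yet seen."""
        oldest = max(0, self.head - self.capacity)
        start = max(self.readers[name], oldest)  # skip overwritten records
        records = [self.slots[i % self.capacity] for i in range(start, self.head)]
        self.readers[name] = self.head
        return records
```

A slow reader silently loses the records that were overwritten in the meantime, which is the situation the locking facility of the real driver is meant to prevent.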
Applications are allowed to access the driver only through a special low-level library, libcsflow. This library implements certain common functions and for debugging purposes it also enables application testing without access to the FlowMon probe itself - the flow records can be supplied from a disk file.
4.2.4 Flow Exporter Program
The first application to use the FlowMon probe is the flow exporter program for NetFlow version 9 (RFC 3954). It is currently able to send export packets to a single collector via IPv4/UDP transport. The collector IP address and destination UDP port can be configured through command line parameters.
The NetFlow v9 format is flexible in that it allows the contents of the flow records to be defined in essentially any sensible way. Such a definition is passed to the collectors by means of so-called templates. Our exporter program currently supports six templates for all combinations of IPv4 or IPv6 on one side and the most common L3 protocols TCP, UDP and ICMP on the other side. Two additional templates (IPv4/OTHER and IPv6/OTHER) are used for all remaining L3 protocols.
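The choice among the eight templates can be sketched as a simple lookup on the IP version and the protocol number. The function name is hypothetical, and mapping ICMPv6 (protocol number 58) to the IPv6/ICMP template is our assumption; the report does not say how the exporter handles it.

```python
# IANA protocol numbers: ICMP = 1, TCP = 6, UDP = 17, ICMPv6 = 58.
KNOWN_PROTOCOLS = {
    4: {6: "TCP", 17: "UDP", 1: "ICMP"},
    6: {6: "TCP", 17: "UDP", 58: "ICMP"},  # assumption: ICMPv6 uses the ICMP template
}


def template_name(ip_version: int, protocol: int) -> str:
    """Return the name of the template used for a flow, e.g. 'IPv4/TCP'."""
    if ip_version not in KNOWN_PROTOCOLS:
        raise ValueError(f"unsupported IP version: {ip_version}")
    known = KNOWN_PROTOCOLS[ip_version]
    return f"IPv{ip_version}/{known.get(protocol, 'OTHER')}"
```

For example, `template_name(4, 47)` yields `IPv4/OTHER`, because GRE (protocol 47) has no dedicated template.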
All templates share the following seven fields:
- FIRST_SWITCHED - timestamp of the first datagram in the flow
- LAST_SWITCHED - timestamp of the last datagram in the flow
- OUT_PKTS - number of datagrams belonging to the flow
- OUT_BYTES - number of bytes belonging to the flow
- IPV4_SRC_ADDR or IPV6_SRC_ADDR - source IP address
- IPV4_DST_ADDR or IPV6_DST_ADDR - destination IP address
- PROTOCOL - Layer 3 protocol number
Additional fields present in protocol-specific templates are shown in Table 4.1.
| Field | TCP | UDP | ICMP | Description |
|---|---|---|---|---|
| L4_SRC_PORT | X | X | | source port |
| L4_DST_PORT | X | X | | destination port |
| DST_TOS | X | X | | type-of-service octet |
| TCP_FLAGS | X | | | TCP flags |
| ICMP_FLAGS | | | X | ICMP flags |
Table 4.1: Data fields in protocol-specific templates
The templates contain vital information for the collectors to be able to interpret the data. It is thus important to re-send the template descriptions every once in a while - the period can be configured through a command line parameter.
4.2.5 Tests of the Prototype
In November and December 2005 we carried out the first successful tests of the probe, both in a laboratory environment and in a production network. Further independent tests are being prepared together with our partners in the GN2 project, to whom CESNET has lent (or will lend) prototypes of the probe. One of them is already up and running in Utrecht (The Netherlands), where it is being tested by our colleagues from SURFnet. So far, the biggest obstacle to effective testing is non-existent or buggy support for NetFlow v9 in the available collectors.
We tested the throughput of the firmware with the help of the Spirent AX/4000 network analyser. The results are shown graphically in Figure 4.3.
From the graph it is apparent that the FlowMon engine has two independent bottlenecks:
- The firmware currently runs at a 50 MHz clock rate, which means that with a 16-bit wide data path the theoretical throughput limit is 800 Mbps. This bottleneck limits the throughput of big packets at the upper-right end of the curve in Figure 4.3.
- The Header Field Extractor cannot process more than approximately 500,000 packets per second. This mainly affects the throughput of small packets at the lower-left end of the curve in Figure 4.3.
Both bottlenecks will be addressed in the future versions of hardware and firmware. With the new COMBO6X motherboard the clock rate will be increased to 100 MHz meaning a theoretical throughput of 1.6 Gbps. Also, a new implementation of HFE is now being tested that was designed to process at least 2 million packets per second. After these improvements are implemented, we will be able to monitor 1 Gbps circuits at line rate for any mix of packet sizes.
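The two bottlenecks can be combined into a simple throughput model: the achievable rate is the smaller of the data-path limit and the packet-rate limit. The sketch below ignores framing overhead in that model and is our reading of the figures quoted above, not a description of the measurement. The second function uses the standard Ethernet per-frame overhead of 20 bytes (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap) to check the packet rate needed at Gigabit line rate.

```python
ETHERNET_OVERHEAD_BYTES = 20  # preamble + SFD (8) and inter-frame gap (12)


def max_throughput_bps(clock_hz: int, bus_bits: int,
                       max_pps: int, packet_bytes: int) -> int:
    """Throughput limit in bit/s for packets of a given size."""
    data_path_limit = clock_hz * bus_bits
    packet_rate_limit = max_pps * packet_bytes * 8
    return min(data_path_limit, packet_rate_limit)


def line_rate_pps(link_bps: int, frame_bytes: int) -> float:
    """Packets per second on a fully loaded Ethernet link."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


# Current firmware: 50 MHz, 16-bit data path, about 500,000 packets/s.
print(max_throughput_bps(50_000_000, 16, 500_000, 1500))  # 800000000
print(max_throughput_bps(50_000_000, 16, 500_000, 64))    # 256000000
# Minimum-size frames at 1 Gbps need about 1.49 million packets/s.
print(int(line_rate_pps(1_000_000_000, 64)))               # 1488095
```

The last figure shows why an HFE that handles 2 million packets per second, together with a 1.6 Gbps data path, is sufficient for line-rate monitoring of Gigabit Ethernet with any packet size.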
We also tested the probe with real data from the CESNET2 backbone. It was connected to a port of the backbone access router in Brno that connects the entire Brno Academic Computer Network. The generated NetFlow v9 data were processed by the FTAS software, see Chapter 5. The following figures show a few sample results. They match the expected outcome quite closely, which indicates that the data sent by the probe are qualitatively correct. For a quantitative assessment, we are preparing another test where data obtained from the probe will be directly compared to those generated by the neighbouring router.
4.2.6 Future Plans
The FlowMon probe, together with the FTAS and NetFlow Monitor programs that are also being developed by CESNET, already provides a relatively comprehensive solution. It could be used for a number of applications, not only in the area of security analysis but also for QoS monitoring, data accounting and network capacity planning.
Our goals for the nearest future (till end of February 2006) include primarily quantitative improvements and overall consolidation of the software. The following new functions will be implemented:
- Export in NetFlow version 5 format
- Support for multiple collectors
- Filtration of records: each collector will get only the data it is entitled to see.
Configuration of the probe will be done via the text user interface of the Netopeer system, see Section 4.3.2.
During the first half of 2006, we plan to finish a new version of the probe that will be based on the COMBO6X motherboard and COMBO-4SFPRO interface card. This version will have a throughput of 1.6 Gbps and support for the SDH STM-16 protocol, along with Gigabit Ethernet. The new cards will also provide a larger SSRAM capacity, allowing the flow cache to grow to up to 512,000 flows.
In the longer term, we will also pursue research and experimental directions such as implementation of the IPFIX protocol (requirements in RFC 3917) or various sampling strategies.
4.3 Liberouter
The aim of the Liberouter project is the development of a dual-stack (IPv6 and IPv4) gigabit router based on the standard PC architecture with a high-performance hardware accelerator of packet forwarding and filtering. This way we can improve the throughput of a PC router platform compared to the software-only solution. The project was supported by the 6NET project, see Chapter 18.
At the beginning of the year 2005, a first functional prototype of the Liberouter design - network interface card with hardware packet filtration - was presented at a review of the 6NET project. In the course of 2005, the development of the Liberouter project was divided into two development branches:
- network interface card with hardware packet filtration (NIFIC project)
- the original router (Liberouter project)
Both projects are documented using our new XML documentation system that is accessible from our web site.
4.3.1 NIFIC Project
Firmware
During the first half of the year 2005, the development efforts concentrated on finishing the NIFIC project. This project extends the features of the original Liberouter prototype by adding packet forwarding and replication in hardware. This can be used for line-rate filtering on high speed network interfaces, as the packets are processed exclusively in hardware. Another potential usage could be splitting (and filtering) the network stream into several streams according to packet headers so that for example potentially dangerous streams can be forwarded into honey-pots. Such a device would essentially be a filtering bridge capable of classifying packets according to headers of the link, network and possibly higher layers.
In comparison to the first Liberouter prototype, the most significant changes are in the output part of the hardware design, see Figure 4.8.
Incoming packets are received by the Input Packet Buffer (IBUF) and passed to the Header Field Extractor (HFE). The HFE pushes the body of the packet (including original headers) into the Packet FIFO (PFIFO). In parallel, it also parses the packet headers and creates a structure called Unified Header that is stored in the UH FIFO (UHF). The Look-Up Processor (LUP) performs a lookup in its filtering and forwarding tables and issues instructions for further packet processing. The Dispatcher (DISP) grabs the packet body and executes the LUP instructions by either:
- Discarding the packet.
- Sending the packet to software (via SW_OBUF). In this case, the packet is received by the COMBO6 card driver and passed to the operating system. Later, after all necessary processing is done, it can be transmitted again via Output buffers.
- Sending the packet directly to an output interface (via REP). In this case, the packet is directly copied to one or more output interfaces and then transmitted via Output buffers (without software processing).
Any combination of these actions is also allowed. For example, the packet can be forwarded to an output interface as well as sent to software.
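The Dispatcher behaviour can be summarised in a few lines of code. In this sketch a LUP instruction is modelled as a flag for software delivery plus a set of output ports, with discarding represented by the absence of both; this encoding, the port count and all names are illustrative assumptions, not the actual instruction format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LupInstruction:
    to_software: bool = False             # deliver a copy via SW_OBUF
    output_ports: frozenset = frozenset() # replicate to these interfaces


class Dispatcher:
    def __init__(self, num_ports: int = 4):  # hypothetical port count
        self.sw_obuf = []                           # packets for the host driver
        self.obufs = [[] for _ in range(num_ports)] # one output buffer per port
        self.discarded = 0

    def dispatch(self, packet: bytes, instr: LupInstruction) -> None:
        """Execute one LUP instruction on a packet."""
        if not instr.to_software and not instr.output_ports:
            self.discarded += 1
            return
        if instr.to_software:
            self.sw_obuf.append(packet)
        for port in sorted(instr.output_ports):
            self.obufs[port].append(packet)  # replication without software
```

An instruction with `to_software=True` and a non-empty port set corresponds to the combined case mentioned above, where a packet is forwarded in hardware and also handed to the operating system.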
The firmware of the NIFIC project is now available for the COMBO6 motherboard and either COMBO-4MTX or COMBO-4SFP. While the NIFIC project is less complex than the Liberouter project, the utilisation of hardware resources (especially of the FPGA on the COMBO6 card) is rather high. Due to this fact, the implementation of a full-fledged router has been postponed to the next generation of COMBO6 cards (COMBO6X and COMBO6E).
Software
Most of the system software is shared by the NIFIC and Liberouter projects. In particular, control tools for common components and packet filtering by the Look-Up Processor are the same. We are still investigating methods for partitioning the LUP rules between the associative (CAM) and static (SSRAM) memory - see technical report [Ant05]. One part that is specific for the NIFIC project is the COMBO6 driver, which allows standard access to the network interfaces of the COMBO6 card including packet reception and transmission. The driver is available for the NetBSD and Linux operating systems. A new version of the driver for the next generation of COMBO cards is now under development. It uses the PowerPC processor that is embedded in the new Virtex-II Pro FPGAs for implementing the bus-master DMA functionality that will allow higher throughput of the system bus.
In order to help our research partners (and also our developers) with installing and configuring the environment for COMBO cards, we developed a new system for preparing firmware and software distribution packages. The packages are generated automatically and contain all files necessary for the given project. The packages can be downloaded from our web site. The system was received very positively and is now also used in other projects such as FlowMon.
4.3.2 Liberouter Project
Firmware
Along with completing the NIFIC project, VHDL developers also worked intensively on the components of the Liberouter design. Towards the end of 2005, all components were integrated. Although some features of those components have not been implemented yet, the complex Liberouter design has already forwarded its first packets in a software simulation. In 2006, the development will concentrate on implementing the missing features and moving on from software simulation to hardware implementation in the new COMBO6X and COMBO6E cards.
The architecture of the Liberouter design is shown in Figure 4.9. While the input part is the same as in the NIFIC project, the output part has several new components:
- REP (Replicator): This component replicates the packet to several output interfaces, which is useful, for example, for multicast transmissions. The Replicator processes LUP instructions, replicates the packet references as necessary and puts them into the proper priority queue, together with instructions for further processing.
- PQ (Priority Queues): A set of priority queues for each output interface, allowing simple QoS policies to be implemented.
- OPE (Output Packet Editor): A nanoprocessor for packet editing. It modifies L2 and L3 header data before the packet is transmitted, for example by changing the MAC addresses or decrementing the TTL value.
- DRAM (SDRAM Scheduler + SDRAM Controller): The SDRAM scheduler arbitrates access to DRAM from all units (HFE, REP, OPE). The SDRAM controller stores data in and retrieves data from DDR SDRAM.
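The kind of editing performed by the OPE can be illustrated for the IPv4 case. The sketch below rewrites the MAC addresses of an untagged Ethernet frame, decrements the TTL and recomputes the IPv4 header checksum. It is a software analogy with hypothetical function names, not the nanoprogram itself; IPv6, where the hop limit is decremented and no header checksum exists, is not covered.

```python
ETH_HDR_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def edit_ipv4_frame(frame: bytes, new_dst_mac: bytes, new_src_mac: bytes) -> bytes:
    """Rewrite L2 addresses, decrement TTL and fix the header checksum."""
    buf = bytearray(frame)
    buf[0:6] = new_dst_mac
    buf[6:12] = new_src_mac
    header_len = (buf[ETH_HDR_LEN] & 0x0F) * 4  # IHL field, in bytes
    ttl_offset = ETH_HDR_LEN + 8
    if buf[ttl_offset] <= 1:
        raise ValueError("TTL expired; packet must not be forwarded")
    buf[ttl_offset] -= 1
    csum_offset = ETH_HDR_LEN + 10
    buf[csum_offset:csum_offset + 2] = b"\x00\x00"
    csum = ipv4_checksum(bytes(buf[ETH_HDR_LEN:ETH_HDR_LEN + header_len]))
    buf[csum_offset:csum_offset + 2] = csum.to_bytes(2, "big")
    return bytes(buf)
```

A header with a correct checksum sums to zero when passed through `ipv4_checksum` again, which gives a convenient self-check for the edited frame.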
At the moment, two of the components - Replicator and SDRAM Scheduler - are fully implemented. Work on the remaining units - OPE, PQ and SDRAM Controller - was divided into two phases. First versions of these components with limited functionality are already available and will be used for testing the Liberouter design in hardware. Later in 2006 we will concentrate on finishing full-featured versions of all units. In parallel, we will also redesign the performance-critical units in the input part (HFE, LUP) in order to achieve better performance and FPGA utilisation, and implement other improvements, for example extended instruction sets.
Software
The main software component for accelerated packet switching is the combod daemon. Its role is to compile certain kernel data - routing tables, neighbour (ARP) caches and packet filter chains - into a single lookup structure for LUP. We developed a prototype method that combines the routing table and neighbour cache into a so-called RA table. The packet filter function is added by inserting interval decision diagrams representing the packet filter rules into appropriate parts of the address spaces in the RA table. A formal model of this step is under development, together with quantitative criteria for assessing the behaviour of the structure. The interim results are summarised in the technical report [Ant05].
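The idea of merging routing and neighbour information into one lookup structure can be illustrated with a deliberately naive sketch. It is not the combod algorithm: the real RA table is a structure compiled for the LUP, whereas here a sorted list and a linear longest-prefix search are used, and packet filter rules are left out. All names and the example addresses are hypothetical.

```python
import ipaddress


def build_ra_table(routes, neighbours):
    """Combine (prefix, next hop, interface) routes with a next hop -> MAC map."""
    table = []
    for prefix, next_hop, iface in routes:
        mac = neighbours.get(next_hop)
        if mac is None:
            continue  # unresolved next hop: left to software in this sketch
        table.append((ipaddress.ip_network(prefix), iface, mac))
    # Longest prefixes first, so the first match is the most specific one.
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup(table, address):
    """Return (interface, next-hop MAC) for an address, or None."""
    ip = ipaddress.ip_address(address)
    for network, iface, mac in table:
        if ip.version == network.version and ip in network:
            return iface, mac
    return None
```

The benefit of the merged table is that a single lookup yields everything needed to forward a packet: the output interface and the destination MAC address.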
Apart from the work on the combod daemon, a set of control tools for hardware components was implemented. These tools allow setting parameters and testing all components of the Liberouter design through a unified user interface. Functions related to packet routing will be moved to the libcombo library, which will be used by the combod daemon for controlling nanoprograms and editing the actual routing and filtering rules stored on the COMBO6 card.
Formal Verification
In the area of formal verification we concentrated on verifying the design of the TX_BUFFER and generic FIFO components, namely synchronous and asynchronous versions of the FIFO and FIFO BRAM components. The TX_BUFFER verification proved the correctness of the corresponding VHDL code by showing that buffer overflow is not possible. Moreover, formal specification of the temporal properties resulted in a more precise description of some signals.
A parametrised version of the FIFO BRAM component was also verified. Parameter values were limited to a 5-bit address space, because for larger values the state explosion problem was unavoidable. Verification revealed a bug in the component design, namely in the control of the LSTBLK signal, which indicates that the last block in the queue is available. The bug was then precisely located and corrected, and the new version of the design was verified as well, reusing the previous temporal specification. Such a repeated use of the specification is a novel result that is also significant for formal verification theory.
The detailed results of the verification process described above can be found in the formal verification section of the project web site.
Netopeer Configuration System
The Liberouter project also includes the Netopeer system for consistent configuration of routers and other network devices. Netopeer uses the XML language for internal representation of the configuration data.
In 2005, we concentrated on finishing the text user interface based on the ncurses library. This interface now provides all required functions and is being intensively tested.
As a vehicle for transporting configurations to the target network device and reporting errors back, we selected the NETCONF protocol that is being developed by the IETF. The software developed as part of the master's thesis [Zlo05] is one of the first implementations of this protocol.
Interesting results were also achieved in the development of the metaconfiguration application [Mat05] that enables configuration of entire networks at a higher level of abstraction and automatically generates configurations of individual routers in the Netopeer language. We also created a module that produces a graphical representation of the network in the SVG format.
By the end of 2005, we adjusted the Netopeer system so that it can now be used also for the FlowMon probe, see Section 4.2. Using the hardware and software specification of the probe, we created a corresponding XML schema and slightly modified the text interface. The remaining task is now to develop the necessary back-end that will process the Netopeer data and configure the FlowMon probe accordingly.
4.4 SCAMPI Adapter
The main objective of the EU project SCAMPI was to develop a high-performance monitoring adapter for link speeds up to 10 Gbps. At multi-gigabit traffic rates, it is no longer possible to monitor incoming data flows using conventional computers with network interfaces, mainly due to the limited throughput between the network interface and the CPU. The functionality of the SCAMPI monitoring adapter was therefore divided between the software part (used mainly for adapter configuration and control) and dedicated hardware - the COMBO6 card, which was designed to process the performance-critical parts of the system using FPGA technology.
In the scope of the SCAMPI project, the primary task of our team was to design and implement the adapter firmware and low-level software for communication with the COMBO6 card. For SCAMPI we also developed the COMBO-PTM card whose task is to generate precise time stamps that are especially important for network performance statistics such as one-way delay.
4.4.1 Firmware
The main function of the adapter firmware is to receive packets from an input interface, assign precise time stamps and perform packet analysis and classification according to the requirements of user level applications. Based on the results of the classification process, the adapter firmware has to collect statistical data, perform packet filtration or sampling and search packet payload for occurrence of specific patterns.
Based on this specification, the firmware architecture was designed to be able to collect statistical data about up to 256 different data flows. The data are generated from information about individual packet length and time stamp. Next, the design supports up to 16 sampling units that can be configured to operate in one of the following sampling modes:
- Deterministic - every n-th packet is accepted
- Stochastic - packets are accepted randomly based on a preconfigured probability
- Byte deterministic - the packet containing every n-th byte is accepted
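The three sampling modes can be expressed compactly in software. The classes below are an illustrative sketch with hypothetical names; in particular, the way the firmware counts bytes and generates random numbers is not described in this report.

```python
import random


class DeterministicSampler:
    """Accepts every n-th packet."""
    def __init__(self, n: int):
        self.n = n
        self.count = 0

    def accept(self, length: int) -> bool:
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class StochasticSampler:
    """Accepts each packet independently with a fixed probability."""
    def __init__(self, probability: float, rng=None):
        self.probability = probability
        self.rng = rng or random.Random()

    def accept(self, length: int) -> bool:
        return self.rng.random() < self.probability


class ByteDeterministicSampler:
    """Accepts the packet that contains every n-th byte of the stream."""
    def __init__(self, n: int):
        self.n = n
        self.bytes_seen = 0

    def accept(self, length: int) -> bool:
        before = self.bytes_seen // self.n
        self.bytes_seen += length
        # True if a multiple of n falls inside this packet's byte range.
        return self.bytes_seen // self.n > before
```

Byte-deterministic sampling favours long packets: with n = 1000 and a stream of 400-byte packets, the third and fifth packets are accepted because they contain bytes number 1000 and 2000 of the stream.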
Using a fast associative memory, the design is able to match up to 512 different patterns at 3.2 Gbps rate.
Development of the SCAMPI adapter firmware was split into two phases. First, the 1 Gbps version of the adapter was developed. Detailed information about this version is available in the 2004 research report.
The second phase addressed the development of the 10 Gbps monitoring adapter. In order to increase the available bandwidth, the input data stream is split into several data paths. The input buffers were completely redesigned and some of the existing components also had to be replicated (HFE, FIFO). The overall design was partitioned between the COMBO6 card and the add-on COMBO-2XFP interface card. Compared to the first version, more powerful modules for CRC computation were utilised, the internal system of communication protocols was generalised and many other improvements were adopted. The architecture of the 10 Gbps monitoring adapter is shown in Figure 4.10.
The adapter functionality can be summarised in the following characteristics:
- Input packets arrive through the standard XGMII interface at 10 Gbps rate. In the Input Buffer component (IBUF), packet checksums are verified and the data stream is split into four 2.5 Gbps streams. In addition, a precise time stamp generated by the Time Stamp Unit (TSU) is attached to each incoming packet.
- In the next step, input packets are parsed by four dedicated HFE processors with a task-specific instruction set. The output of HFE is a data structure called Unified Header (UH) that is passed to the Look-Up Processor (LUP) for classification.
- During the classification process, packets are divided into several groups according to the requirements of user applications. In order to accelerate the classification process, LUP uses fast associative CAM memory.
- Based on the result of packet classification, the Statistical Unit (STU) updates the corresponding statistical data. Optionally, data flows specified by the application can be sampled using the Sampling Unit (SAU). Finally, the Payload Checker Unit (PCK) may search the payload of selected packets for occurrences of specified patterns.
- The packets selected for detailed inspection in software are sent from buffers (PFIFO) through the Dispatcher component (DISP) to the Output Buffer (OBUF). There, the packets wait until they are transferred to the host computer memory via the PCI bus.
During the year 2005, the second phase of the firmware was fully implemented and tested with real backbone traffic on the 10 Gbps link between Prague and Brno. A successful demonstration of the monitoring adapter took place during the final review of the SCAMPI project in Brno.
4.4.2 System Software
System software for the SCAMPI adapter consists of two parts:
- low level drivers for communication with the COMBO6 card
- software tools for transforming the user-specified configuration into the form required by the firmware implementation
Low level drivers provide access to the internal firmware registers, memories and counters. Software tools for preparing LUP configuration developed within the Liberouter project were adapted for the SCAMPI adapter. In particular, support for collecting statistical data (STU), sampling (SAU) and pattern matching (PCK) was added.
User applications communicate with the monitoring system through an application programming interface called MAPI (Monitoring API). MAPI uses the Scampidump library to upload the filter specification to the card. Scampidump accepts rules expressed in a subset of the standard BPF (Berkeley Packet Filter) syntax. The rules are analysed, compiled into the form of a nanoprogram for LUP, and uploaded to the COMBO6 card.
4.5 IDS Probe
The aim of the IDS project is the development of a network intrusion detection system (NIDS) device: an integrated software/hardware tool capable of detecting unauthorised access to computer systems or networks and malicious network traffic such as viruses, trojan horses and worms. A NIDS device combines packet classification and payload scanning. The latter operation is the bottleneck of existing software-only NIDS implementations such as Snort. Their throughput depends on the processor type and on the number of strings and regular expressions to be searched, but in general they cannot cope with traffic rates beyond 600 Mbps, which is not enough for modern multi-gigabit networks.
Hardware acceleration of string matching is a subject of intensive research. Several approaches to string and regular expression matching utilising the FPGA technology are being investigated. The most promising approach is based on nondeterministic finite automata (NFA), see e.g. [ClS04]: string or regular expression patterns are translated into an NFA, which is further transformed into an equivalent hardware representation, see Figure 4.11.
By using an NFA that accepts 8 bits in one clock cycle, we are able to achieve a throughput of 800 Mbps with firmware running at a clock frequency of 100 MHz. The throughput can be further increased by utilising a so-called Extended NFA (ENFA) that accepts multiple characters in one clock cycle. In this case, a throughput between 1 and 10 Gbps can be reached. Going beyond 10 Gbps is currently impossible due to insufficient FPGA capacity, so we must look for optimisations allowing us to deal with the entire Snort rule set at high network speeds.
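The principle of NFA-based matching, and the relation between input width and throughput, can be sketched in software. The example below is an illustrative model only: it handles plain strings (not regular expressions), it processes one byte per step where the hardware would process one byte per clock cycle, and the function names are hypothetical.

```python
def nfa_scan(patterns, data):
    """Simulate an NFA that consumes one byte per step.

    State (i, j) means "the first j bytes of pattern i have been matched".
    The start states (i, 0) are always active, so a match may begin at
    any offset. Patterns are assumed to be non-empty byte strings.
    Returns a list of (start_offset, pattern_index) pairs.
    """
    start_states = {(i, 0) for i in range(len(patterns))}
    active = set()
    matches = []
    for offset, byte in enumerate(data):
        next_active = set()
        for i, j in active | start_states:
            if patterns[i][j] == byte:
                if j + 1 == len(patterns[i]):
                    matches.append((offset + 1 - len(patterns[i]), i))
                else:
                    next_active.add((i, j + 1))
        active = next_active
    return matches


def throughput_bps(bits_per_cycle, clock_hz):
    """Peak throughput of an automaton accepting bits_per_cycle per clock."""
    return bits_per_cycle * clock_hz


print(nfa_scan([b"abc", b"bcd", b"aa"], b"aabcd"))
print(throughput_bps(8, 100_000_000))  # 8 bits per cycle at 100 MHz
```

In hardware, all active states are evaluated in parallel within a single clock cycle, which is why the throughput depends only on the input width and the clock frequency and not on the number of patterns, as long as the automaton fits into the FPGA.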
Before making a decision about the NIDS system architecture, we first collected statistics about the Snort rule set. The results indicated clearly that our future research should concentrate on hardware optimisation of ENFA with the goal of decreasing the demands on chip resources. Progress in this area would help us to represent a larger set of Snort rules in a single FPGA and, at the same time, increase the NIDS throughput. We started to work on a first NIDS prototype that will use the new COMBO6X and COMBO-4SFPRO cards.
4.6 Packet Generator
The aim of the Packet Generator project (PAGEN) is to create a powerful packet generator equipped with a precise time stamp unit. The proposed architecture is able to generate packet headers and data according to user specifications and send them at times that are either given explicitly or picked randomly from a pre-defined probability distribution. Time stamps are generated every 32 ns. The projected throughput of the device is 10 Gbps. PAGEN primarily targets the COMBO6X and COMBO6E cards, which use the high-performance PCI-X and PCI Express buses, respectively. The second part of the design, which generates the precise time stamps, is placed on the COMBO-PTM card that was developed for the SCAMPI project.
PAGEN was designed so that it can also serve for analysing throughput between two given network nodes. In Figure 4.12, several time-synchronised servers equipped with PAGEN are deployed in all network nodes and connected into a ring. A node requesting throughput measurement (the initiator) generates a special packet with an initial time stamp. Every node along the loop appends its unique identifier and two time stamps specifying the receive and transmit times. The initiator, after receiving back the probe packet, is then able to perform throughput analysis.
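The probe packet mechanism can be modelled as follows. This is a sketch under stated assumptions: the record layout, the field names and the `analyse` function are hypothetical, and since the text does not specify how the initiator evaluates the collected data, the example only derives per-segment transit times and per-node residence times from the time stamps.

```python
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    node_id: str       # unique identifier appended by the node
    rx_time_ns: int    # time stamp taken when the probe was received
    tx_time_ns: int    # time stamp taken when the probe was sent on


@dataclass
class ProbePacket:
    initial_time_ns: int                       # set by the initiator
    hops: list = field(default_factory=list)   # one HopRecord per node


def analyse(probe, return_time_ns):
    """Return (segments, residence) computed by the initiator.

    segments:  list of (from_node, to_node, transit_time_ns)
    residence: dict mapping node_id to the time spent inside the node
    """
    segments = []
    prev_node, prev_tx = "initiator", probe.initial_time_ns
    for hop in probe.hops:
        segments.append((prev_node, hop.node_id, hop.rx_time_ns - prev_tx))
        prev_node, prev_tx = hop.node_id, hop.tx_time_ns
    segments.append((prev_node, "initiator", return_time_ns - prev_tx))
    residence = {h.node_id: h.tx_time_ns - h.rx_time_ns for h in probe.hops}
    return segments, residence


probe = ProbePacket(initial_time_ns=0)
probe.hops.append(HopRecord("A", 100, 150))
probe.hops.append(HopRecord("B", 400, 420))
segments, residence = analyse(probe, return_time_ns=1000)
```

The differences are meaningful only because the nodes are time-synchronised; without a common time base the receive and transmit stamps of different nodes could not be subtracted from each other.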
The Packet Generator project is divided into two phases. In the first phase, the firmware will emit precisely timed Ethernet frames generated in software at rates of up to 1 Gbps. The architecture is shown in Figure 4.13.
The purpose of individual components is as follows:
- Packet Memory (PM): stores the frames generated by software; it is filled asynchronously by software applications via the PCI bus.
- Time Stamp Unit (TSU): generates precise 64-bit time stamps every 32 ns. Its main part is placed on the COMBO-PTM card, which is equipped with a precise crystal (2 ppm).
- Packet Sending Scheduler (PSS): schedules frame transmission according to a specification provided by the application.
- Packet Sending Controller (PSC): assembles the frame by adding the preamble, starting delimiter, ending delimiter and checksum, and transmits it to the network.
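The frame assembly step performed by the PSC can be approximated in software. The sketch below assumes standard Ethernet framing (seven preamble octets 0x55, the start frame delimiter 0xD5 and a CRC-32 frame check sequence); the function name is hypothetical, and the ending delimiter is omitted because it is a physical-layer symbol rather than a data octet.

```python
import struct
import zlib

PREAMBLE = bytes([0x55] * 7)   # standard Ethernet preamble
SFD = bytes([0xD5])            # start frame delimiter


def assemble_frame(frame_without_fcs: bytes) -> bytes:
    """Prepend preamble and SFD and append the CRC-32 checksum (FCS).

    zlib.crc32 uses the same polynomial and bit ordering as the Ethernet
    FCS; the checksum is appended least significant byte first.
    """
    fcs = struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)
    return PREAMBLE + SFD + frame_without_fcs + fcs


wire = assemble_frame(bytes(60))  # 60 zero bytes standing in for a frame
```

A receiver can validate such a frame by computing the CRC-32 over the frame including the FCS, which yields the constant residue 0x2144DF1C for an undamaged frame.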
Software for the first phase is designed as a scripting language controlling the generation of frame data, PM storage and frame transmission.
The aim of the second phase is to develop a full-featured packet generator and editor with a throughput of up to 10 Gbps. The firmware will be divided into two parts. The generator part will be able to generate L2/L3/L4 headers and frame data according to user-specified rules (e.g. randomly generated values with a specified distribution, data samples etc.). An efficient high-speed stream processor will be used as the generator and editor engine. The second (output) part will transmit packets with inter-packet intervals picked from a given probability distribution (uniform, Gaussian, exponential etc.). The firmware architecture is shown in Figure 4.14.
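Drawing inter-packet intervals from a probability distribution can be illustrated by the following sketch. It is a software model under assumptions: the function and parameter names are hypothetical, intervals are rounded to the 32 ns time stamp resolution mentioned above, and clamping to at least one tick is a choice made here so that transmit times remain strictly increasing.

```python
import random

TICK_NS = 32  # time stamp resolution stated in the text


def interval_ns(rng, distribution, **params):
    """Draw one inter-packet interval and quantise it to whole ticks."""
    if distribution == "uniform":
        value = rng.uniform(params["low_ns"], params["high_ns"])
    elif distribution == "gaussian":
        value = rng.gauss(params["mean_ns"], params["sigma_ns"])
    elif distribution == "exponential":
        value = rng.expovariate(1.0 / params["mean_ns"])
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    # Assumption: never schedule two packets in the same tick.
    ticks = max(1, round(value / TICK_NS))
    return ticks * TICK_NS


def schedule(count, start_ns, distribution, seed=None, **params):
    """Return a list of transmit times (in ns) for `count` packets."""
    rng = random.Random(seed)
    times = []
    t = start_ns
    for _ in range(count):
        times.append(t)
        t += interval_ns(rng, distribution, **params)
    return times


times = schedule(100, 0, "exponential", seed=1, mean_ns=1000)
```

An exponential interval distribution corresponds to a Poisson arrival process, which is a common choice for synthetic traffic; the Gaussian variant may produce negative samples, which the clamp above maps to a single tick.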
Software for the second phase is also designed as a scripting language. It will describe the rules (so-called templates) for frame generation and transmission. When executing the script, the software will select an appropriate set of nanoprograms for the stream processor mentioned above and generate valid templates. As a front-end to the scripting language, we are considering a user-friendly application that would make it easy to generate the scripts.
The PAGEN project is still at an early stage of development. The specification has been completed, the Time Stamp Unit is fully implemented and the stream processor architecture is under development.
![[Figure]](firmware.gif)
![[Figure]](fw_nific.gif)
![[Figure]](fw_liberouter.gif)
![[Figure]](scampi_10Gbps.gif)