9   MetaCentrum

Grids (large-scale distributed systems of computers, data repositories and other equipment interconnected via computer networks) are becoming an indispensable part of the global research and development infrastructure. The main goal of the MetaCentrum activity is the management and further development of the Grid infrastructure in the Czech Republic. The activity thus provides not only the necessary basis for participation in international grid-related activities and projects, but also an essential prerequisite for the development of science in general. MetaCentrum activities are closely coordinated both with other activities of the CESNET research plan, especially in the areas of security (development and extensive use of PKI and the Certification Authority) and collaborative environments, and with international Grid activities. CESNET participates actively in the EU EGEE project (see Chapter 15). MetaCentrum also collaborates closely with other projects that either use or develop the grid infrastructure, such as MediGRID.

MetaCentrum activities fall into the following four areas:

Within the MetaCentrum activity we pursued all the directions mentioned above. However, the main emphasis was put on the first two, i.e., Grid operations and user support. For most of 2005, the development of the security infrastructure was addressed by an independent project of the CESNET Development Fund, Hardware Tokens (with Masaryk University as the principal investigator). MetaCentrum has been using the results of this project since October 2005. MetaCentrum research activities were closely coordinated with research within the EU network of excellence CoreGRID, in which Masaryk University is also a partner.

9.1   Grid Operations

The Grid operations group is responsible for the management and further development of the Grid infrastructure, which consists of computational clusters, data depots, and a tape archive. The MetaCentrum management group works in close cooperation with the management groups of the individual nodes (at the University of West Bohemia, Charles University, and Masaryk University) and thus guarantees full and transparent interconnection of both locally and globally managed computational and storage resources.

All computational and storage resources of MetaCentrum are located at the following four sites (all clusters use dual-CPU nodes with Intel Pentium/Xeon or AMD Opteron processors):

During 2005, the oldest computational systems were decommissioned. The SGI machines with MIPS processors (with the exception of the Mat computer at Charles University) and the Pasifae computer at the University of West Bohemia are no longer available to ordinary users. Their efficiency and computing power were below end users' expectations, and vendor support had become either prohibitively expensive or entirely unavailable.

The computational and storage capacity was not significantly increased in 2005; instead, we used the equipment budget for the planned upgrade of the backup system. After an initial market survey, we decided to replace the existing backup and archive system, a tape library with an on-line uncompressed capacity of 12 TB that was no longer sufficient, with a new tape library based on LTO-3 technology, in which each tape has an uncompressed capacity of 400 GB. The selected technology offers not only a large storage capacity but also investment protection, as it is a new technology whose market availability is guaranteed for many years to come. The NEO8000 tape library from Overland Storage with 500 tape positions won the public tender. The price offered in the tender allowed us to buy two equivalent libraries with a total on-line uncompressed capacity of 400 TB.
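
The quoted total follows directly from the tender figures. The short check below assumes that every one of the 500 positions holds an LTO-3 cartridge at its native (uncompressed) capacity of 400 GB and that decimal units are used; it is only a back-of-the-envelope calculation, not a statement about how the libraries are actually populated.

```python
# Uncompressed capacity check for the two NEO8000 libraries.
# Assumptions: every slot holds an LTO-3 cartridge at its native 400 GB,
# and decimal units are used (1 TB = 1000 GB).
LTO3_NATIVE_GB = 400     # native (uncompressed) capacity of one LTO-3 cartridge
SLOTS_PER_LIBRARY = 500  # tape positions per library, as stated in the tender
LIBRARIES = 2            # two equivalent libraries were bought
OLD_LIBRARY_TB = 12      # on-line capacity of the library being replaced

per_library_tb = SLOTS_PER_LIBRARY * LTO3_NATIVE_GB / 1000
total_tb = LIBRARIES * per_library_tb
growth = total_tb / OLD_LIBRARY_TB

print(f"per library: {per_library_tb:.0f} TB")        # per library: 200 TB
print(f"total:       {total_tb:.0f} TB")              # total:       400 TB
print(f"growth over the old library: {growth:.1f}x")  # growth over the old library: 33.3x
```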

Each library has four Hewlett-Packard Ultrium 3 tape drives and one robotic mechanism for exchanging tapes. A backup server (dual-CPU AMD Opteron with 4 GB RAM) is attached to each library through three SCSI Ultra320 channels. One library is located at the University of West Bohemia and the other at Masaryk University. Both libraries are managed by a combination of EMC Legato NetWorker and in-house software (a replacement of the Autochanger module, see below). At the University of West Bohemia, the server's 8 TB EasySTOR 1606PSA disk array serves as an auxiliary cache that allows the tape library to run at its maximum transfer speed even when data delivery over the network is uneven. (When slow data delivery forces the tapes to stop and restart frequently, the lifetime of the drives and tapes degrades and the total usable tape capacity is reduced.) A similar disk array will be attached to the second tape library in Brno during 2006.
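
The role of the disk cache can be illustrated with a small model. The sketch below is hypothetical (it is not how NetWorker or the in-house software is implemented): irregular network chunks are accumulated on disk, and a tape write session starts only once a full session's worth of data is buffered, so each session streams without stopping. The session size is an assumed parameter.

```python
from collections import deque
from typing import Deque, List


class StagingCache:
    """Hypothetical disk cache in front of a tape drive.

    Data arrives from the network in irregular chunks; the tape is written
    only when at least `session_bytes` are buffered, so every write session
    streams continuously instead of stopping and restarting.
    """

    def __init__(self, session_bytes: int) -> None:
        self.session_bytes = session_bytes
        self.buffered: Deque[int] = deque()  # sizes of buffered chunks
        self.level = 0                       # total bytes currently buffered
        self.sessions: List[int] = []        # sizes of completed tape sessions

    def receive(self, nbytes: int) -> None:
        """Accept a chunk from the network, however small or late."""
        self.buffered.append(nbytes)
        self.level += nbytes
        while self.level >= self.session_bytes:
            self._stream_to_tape()

    def _stream_to_tape(self) -> None:
        # Drain exactly one session's worth of data in one continuous pass.
        remaining = self.session_bytes
        while remaining:
            chunk = self.buffered.popleft()
            take = min(chunk, remaining)
            if take < chunk:
                self.buffered.appendleft(chunk - take)  # keep the leftover part
            remaining -= take
        self.level -= self.session_bytes
        self.sessions.append(self.session_bytes)

    def flush(self) -> None:
        """At the end of a backup, write whatever is left in one final pass."""
        if self.level:
            self.sessions.append(self.level)
            self.buffered.clear()
            self.level = 0


cache = StagingCache(session_bytes=100)
for n in (30, 5, 80, 10, 90):  # uneven arrivals from the network
    cache.receive(n)
cache.flush()
print(cache.sessions)  # [100, 100, 15]
```

The design choice mirrors the text: the cache trades disk space for a steady write rate, which is what protects drives and tapes from frequent start/stop cycles.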

We developed a software replacement for the Legato NetWorker Autochanger module, which is needed to run the tape library in the MetaCentrum environment. We chose in-house development over purchasing the module both for economic reasons (the module costs around 1.2 million crowns per library) and to obtain a more flexible solution to which specific media-management extensions can be added. Such extensions are very hard or even impossible to integrate with the Autochanger module. In 2006, we plan to extend this software with a function for monitoring the state of the tape drives and media (failures, usage history, etc.).

The resulting distributed backup environment provides a robust and fault-tolerant solution that can survive even "catastrophic" events such as a complete crash of one of the MetaCentrum sites. Although the tender started early in the second quarter of 2005, the equipment was not delivered until December because of legally mandated procurement procedures. Routine deployment is planned for 2006; so far, only experimental checks of the whole system have been performed.

Connecting the Brno MetaCentrum site to the CzechLight high-speed optical network, which is being built as part of the Optical Networks activity, was one of the steps in building the national Grid infrastructure. Specifically, during the first half of 2005 we purchased a Cisco Catalyst 6506 switch equipped with 24 Gigabit Ethernet ports, three 10GE LAN PHY ports (to connect local computers) and one 10GE WAN PHY port (an experimental extension card on loan from Cisco, used for the wide-area connection). The switch was connected to the 10 Gbps line to Prague, through which MetaCentrum is directly connected to international high-speed networks and projects, GLIF in particular. We also purchased one Chelsio T210 card with hardware-accelerated TCP processing on the 10GE interface. All of this equipment was used extensively in preparing and running demonstrations at the iGrid 2005 conference in San Diego and at other similar events, which are discussed in more detail in Chapter 10. In 2006, the same high-speed line will be used for experiments with high-speed data transfer (backup) between the two tape libraries.

Other activities of the Grid operations group were:

The MetaCentrum operations group also maintains and further develops the Perun system, inherited from the previous research plan of the CESNET association. This system is used to manage information about end users and about certain Grid components. It considerably reduces the administrative burden for both system administrators and end users (e.g., by minimising errors in personal data). During 2005, we extended the system with new components that allow easier (more "intelligent") communication with end users, new authentication features (use of end-user certificates) and virtual organisation support. Specifically, new development of the Perun system focused on the following areas:

In 2005 we also continued developing Grid infrastructure monitoring tools, which were gradually deployed in the production MetaCentrum environment. The complexity of these tasks required the involvement of scientists from practically all MetaCentrum groups. Together with an MU Faculty of Informatics student working on his bachelor thesis entitled MetaCentrum Monitoring Service, we designed and implemented an extension of the Ganglia system to suit MetaCentrum requirements. It includes support for adding new sensors as modules, the possibility of defining stored data formats according to data type, and simple notification support for preconfigured conditions (situation, event, etc.). The modified Ganglia system is currently deployed on all MetaCentrum nodes, and its web interface is available at http://lindir.ics.muni.cz/ganglia. The information provided by Ganglia is also accessible through the MetaCentrum portal.
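
The three features of the extension (modular sensors, typed storage formats, and notification on preconfigured conditions) can be sketched as a small plug-in registry. This is a hypothetical model written for illustration only; it does not reproduce Ganglia's own module interface, and the sensor names and rule are assumptions.

```python
from typing import Callable, Dict, List, Tuple


class SensorRegistry:
    """Hypothetical plug-in registry: sensors are added as modules with a
    declared value type, and notification rules are checked on each sample."""

    def __init__(self) -> None:
        self.sensors: Dict[str, Tuple[Callable[[], object], type]] = {}
        self.rules: List[Tuple[str, Callable[[object], bool], str]] = []

    def add_sensor(self, name: str, read: Callable[[], object], value_type: type) -> None:
        self.sensors[name] = (read, value_type)

    def add_rule(self, sensor: str, condition: Callable[[object], bool], message: str) -> None:
        self.rules.append((sensor, condition, message))

    def sample(self) -> Tuple[Dict[str, object], List[str]]:
        # Read every sensor and convert the raw reading to its declared type.
        values = {name: vtype(read()) for name, (read, vtype) in self.sensors.items()}
        # Collect the messages of all rules whose condition currently holds.
        alerts = [msg for s, cond, msg in self.rules if s in values and cond(values[s])]
        return values, alerts


reg = SensorRegistry()
reg.add_sensor("load_one", lambda: "3.7", float)  # raw reading arrives as text
reg.add_sensor("procs", lambda: 212, int)
reg.add_rule("load_one", lambda v: v > 2.0, "load_one above 2.0")
values, alerts = reg.sample()
print(values)  # {'load_one': 3.7, 'procs': 212}
print(alerts)  # ['load_one above 2.0']
```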

The Ganglia system regularly sends all measured values via multicast to the other nodes of the cluster, so the system as a whole keeps working even if a single node fails. Selected nodes (usually the front-ends) store the received values (in RRD databases and text logs) and make this information available through web interfaces, which can therefore also provide long-term operational statistics.

Information about the current state of clusters is also available through the PBSPro system, which runs a monitoring process on each cluster node as well. PBSPro is primarily used as a scheduling system, but since it works with up-to-date information about nodes, disk usage, free disk capacity and other characteristics, this information can also be made available to end users. The main difference between the Ganglia and PBSPro monitoring systems is that Ganglia automatically provides the most recently measured value, whereas PBSPro performs measurements on demand and returns the value as part of its answer. PBSPro can thus provide not only truly current values but also data that are unavailable through Ganglia, such as end-user quotas on a particular machine.
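
The difference between the two query models can be shown with a toy example. The classes below are hypothetical and say nothing about how Ganglia or PBSPro work internally; they only contrast returning the last published value (push) with measuring at query time (pull).

```python
import time
from typing import Callable, Dict, Tuple


class PushMonitor:
    """Ganglia-style model: values are published periodically, and a query
    returns the last published value, which may already be stale."""

    def __init__(self) -> None:
        self.last: Dict[str, Tuple[float, float]] = {}  # metric -> (value, time)

    def publish(self, metric: str, value: float) -> None:
        self.last[metric] = (value, time.monotonic())

    def query(self, metric: str) -> float:
        return self.last[metric][0]


class PullMonitor:
    """PBSPro-style model: the value is measured when the query arrives, so it
    is current and can include data nobody publishes (e.g. a user's quota)."""

    def __init__(self, probes: Dict[str, Callable[[], float]]) -> None:
        self.probes = probes

    def query(self, metric: str) -> float:
        return self.probes[metric]()


# Simulated node state that changes between publication and query.
state = {"disk_free_gb": 50.0, "quota_left_gb": 3.0}

push = PushMonitor()
push.publish("disk_free_gb", state["disk_free_gb"])
state["disk_free_gb"] = 20.0  # a large job fills the disk after publication

pull = PullMonitor({name: (lambda n=name: state[n]) for name in state})

print(push.query("disk_free_gb"))   # 50.0 (last published value)
print(pull.query("disk_free_gb"))   # 20.0 (measured now)
print(pull.query("quota_left_gb"))  # 3.0 (never published by the push side)
```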

As the notification support in Ganglia is not sufficient for fully distributed Grid management (Ganglia's primary purpose is to provide monitoring information about a cluster, not a whole Grid), we are working on integrating Ganglia with the Nagios notification tool. Once this work is finished, we will be able to couple Ganglia with locally run Nagios instances at individual MetaCentrum sites, allowing faster reaction to failures and unexpected states.
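
One common way to couple the two tools is a Nagios check that reads Ganglia's XML output and maps a metric to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below assumes that approach; it is not the MetaCentrum integration. The thresholds and the trimmed sample document are hypothetical, and in practice the XML would be read from a gmond daemon.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_metric(xml_text: str, host: str, metric: str,
                 warn: float, crit: float) -> Tuple[int, str]:
    """Map one Ganglia metric of one host to a Nagios status and message."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return UNKNOWN, "UNKNOWN - cannot parse Ganglia XML"
    for h in root.iter("HOST"):
        if h.get("NAME") != host:
            continue
        for m in h.iter("METRIC"):
            if m.get("NAME") == metric:
                value = float(m.get("VAL"))
                if value >= crit:
                    code = CRITICAL
                elif value >= warn:
                    code = WARNING
                else:
                    code = OK
                return code, f"{LABELS[code]} - {metric} on {host} = {value}"
    return UNKNOWN, f"UNKNOWN - {metric} not reported for {host}"


# Hypothetical, trimmed-down gmond output (real dumps carry more attributes).
SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node1"><METRIC NAME="load_one" VAL="0.42" TYPE="float"/></HOST>
    <HOST NAME="node2"><METRIC NAME="load_one" VAL="5.10" TYPE="float"/></HOST>
  </CLUSTER>
</GANGLIA_XML>"""

code, message = check_metric(SAMPLE, "node2", "load_one", warn=4.0, crit=8.0)
print(message)  # WARNING - load_one on node2 = 5.1
# A real plugin would end with: print(message); sys.exit(code)
```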

9.2   Security

The MetaCentrum Security group is responsible for developing the MetaCentrum security infrastructure. This infrastructure is based on the Kerberos system, in which a token, generated before the first access to MetaCentrum resources or during the first authentication, is used to prove the end user's electronic identity. While Kerberos is the only security protocol used internally in MetaCentrum, end users can also use a login/password pair for their primary authentication, e.g., to the portal or when accessing individual cluster nodes via ssh. OTP (One Time Password) authentication is also available, but it is used primarily by system administrators rather than end users.

Supporting multiple authentication mechanisms violates the SSO (Single Sign-On) principle, under which users prove their identity only once, typically at their first access to a MetaCentrum resource, and the electronic identity generated at that point is reused whenever the same user later accesses other resources within a defined time period. Another deficiency of the MetaCentrum security infrastructure is the inconsistent use of PKI certificates. This issue must be resolved, because user certificates are the most common (and usually the only supported) authentication method in large-scale international Grids. In 2005 we focused on a consistent SSO authentication environment that is still internally based on Kerberos but requires authentication either via user certificates (the preferred solution) or via a previously obtained Kerberos ticket-granting ticket (TGT). We expect that all users will ultimately access MetaCentrum resources with a pre-generated electronic identity supported by all MetaCentrum subsystems.
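
The SSO principle described above can be modelled as a credential cache: the user authenticates once, receives a time-limited credential, and every later access within its lifetime reuses it. The sketch is a hypothetical model loosely inspired by a Kerberos TGT; it is not Kerberos code, and the ten-hour lifetime and the `authenticate` callback are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CredentialCache:
    """Hypothetical SSO model: authenticate once, then reuse a time-limited
    credential for every service until it expires (cf. a Kerberos TGT)."""

    def __init__(self, lifetime_s: float, authenticate: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lifetime_s = lifetime_s
        self.authenticate = authenticate  # e.g. a certificate or password check
        self.clock = clock
        self.creds: Dict[str, Tuple[str, float]] = {}  # user -> (credential, expiry)
        self.authentications = 0

    def credential_for(self, user: str) -> str:
        now = self.clock()
        cached = self.creds.get(user)
        if cached is None or now >= cached[1]:
            # First access or expired credential: the user must authenticate.
            self.authentications += 1
            cached = (self.authenticate(user), now + self.lifetime_s)
            self.creds[user] = cached
        return cached[0]


# Usage with a fake clock: three service accesses, one re-authentication.
fake_now = [0.0]
cache = CredentialCache(lifetime_s=10 * 3600,
                        authenticate=lambda u: f"cred-for-{u}",
                        clock=lambda: fake_now[0])
for t in (0.0, 3600.0, 11 * 3600.0):  # portal, cluster, later portal access
    fake_now[0] = t
    cache.credential_for("alice")
print(cache.authentications)  # 2
```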

In line with this plan, we removed all inconsistencies in the existing implementation and gradually migrated all MetaCentrum services to a unified authentication interface. In the second half of 2005 we focused on integrating smart cards and USB tokens into the MetaCentrum environment, using the results of the project Universal Authentication through hardware tokens. This project was supported by the CESNET Development Fund and led by Masaryk University, with the participation of several other institutions also involved in MetaCentrum activities. The MetaCentrum production environment has been extended to fully support PKI, whereas in 2004 PKI had been used only as a proof of concept for selected administrative tasks. We started to organise training for MetaCentrum users, in which we give them USB tokens and teach them how to generate and correctly register their personal certificates. We collaborate closely with the CESNET Certification Authority, and a new Registration Authority (RA) was established at Masaryk University. Apart from serving MetaCentrum users, this RA is also used during the training events mentioned above.

We also worked on improving interoperability between Grid security architectures based on the Kerberos and PKI protocols. We fully integrated authentication based on user certificates or their proxies with the Kerberos-based MetaCentrum environment. All the implemented solutions are available on several platforms, at least on MS Windows and Linux. In collaboration with the Grid management group, we extended the Perun system to support translation between individual end-user identities independently of the authentication protocol used. We extended the hardware token technology with support for generating proxy certificates. For the MS Windows environment, we modified the Globus libraries and the PuTTY and WinSCP clients to work directly with certificates stored on hardware tokens. All the developed software is available under an open source license through the MetaCentrum portal (we are also preparing a Travelkit variant, as known from the Kerberos distribution).
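
Identity translation between PKI and Kerberos is often expressed as a table from certificate subject names (DNs) to account names, as in the Globus grid-mapfile format (`"subject DN" account[,account...]`). The sketch below assumes such a table; how Perun actually stores and performs the translation is not described here, and the realm name and entries are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse grid-mapfile style lines: "subject DN" account[,account...]."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = shlex.split(line, comments=True)  # handles the quoted DN and # comments
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1].split(",")
    return mapping


def kerberos_principal(dn: str, mapping: Dict[str, List[str]],
                       realm: str = "EXAMPLE.REALM") -> Optional[str]:
    """Translate a certificate subject to a Kerberos principal (first account wins)."""
    accounts = mapping.get(dn)
    return f"{accounts[0]}@{realm}" if accounts else None


GRIDMAP = '''
# hypothetical entries
"/DC=cz/DC=cesnet-ca/O=Example University/CN=Jan Novak" novak
"/DC=cz/DC=cesnet-ca/O=Example University/CN=Eva Dvorak" dvorak,dvorak2
'''

users = parse_gridmap(GRIDMAP)
print(kerberos_principal("/DC=cz/DC=cesnet-ca/O=Example University/CN=Jan Novak", users))
# novak@EXAMPLE.REALM
```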

We also modified the access methods to the MetaCentrum portal to support direct authentication via end-user certificates. The major component of this modification is a new Kerberos support module that we published on SourceForge. It is currently the most widely used Kerberos authentication module for the web environment.

Near the end of 2005 we focused on mobile users, aiming to provide them with unrestricted Internet access even behind the most restrictive firewalls. We installed an OpenVPN server with PKI-based authentication, together with clients that directly support the hardware tokens.

During 2005, the security group initiated close collaboration with other groups working on the security of distributed network environments. As a follow-up to the hardware token activities, we started to collaborate with groups at the Faculty of Informatics at MU and the Faculty of Information Technology at VUT Brno that develop new hardware tokens. Together with the CESNET AAI activity, we are working towards a unified national AAI infrastructure that will be shared between the classical network and Grid environments.

9.3   User Support

The MetaCentrum user support group is responsible for communication between end users and MetaCentrum administrators, and it is the primary contact for users with problems or requests. As end users are dispersed over the whole country (sometimes even abroad), we provide support through electronic tools such as the web portal, the request tracking system and e-mail, and, in exceptional cases, the telephone.

The MetaCentrum portal had already undergone radical changes in 2004; during 2005 we focused on developing it further and adding new features. The portal has a public and a private (authenticated) part, the latter containing web pages with specific support information for MetaCentrum system administrators. We updated the portal home page by adding quick links to key parts of the portal and to related projects. We also improved the English part, where users can, for example, enter their data and generate the English version of the application form in PDF format. News and failure announcements are now generated dynamically according to users' preferences and are also available through an RSS channel. These announcements have an explicit expiration time, so they are automatically removed and archived. All news items are available in both Czech and English.
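
The expiration and preference logic described above can be sketched as a simple filter. The data model below is hypothetical (the portal's real implementation and categories are not described): expired announcements go to the archive, and the remaining ones are shown if they match the user's chosen categories, in the user's language with Czech as the fallback.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set, Tuple


@dataclass
class Announcement:
    titles: Dict[str, str]  # language code -> title, e.g. {"cs": ..., "en": ...}
    category: str           # e.g. "news" or "outage"
    expires: datetime       # explicit expiration time


def select(items: List[Announcement], now: datetime, categories: Set[str],
           lang: str = "en") -> Tuple[List[str], List[Announcement]]:
    """Split announcements into titles to show this user and items to archive."""
    shown: List[str] = []
    archived: List[Announcement] = []
    for item in items:
        if now >= item.expires:
            archived.append(item)  # expired: archive, never show
        elif item.category in categories:
            shown.append(item.titles.get(lang, item.titles["cs"]))  # fall back to Czech
    return shown, archived


items = [
    Announcement({"cs": "Odstávka clusteru", "en": "Cluster outage"}, "outage",
                 datetime(2005, 11, 30)),
    Announcement({"cs": "Nová verze portálu", "en": "New portal version"}, "news",
                 datetime(2006, 1, 31)),
]
shown, archived = select(items, datetime(2005, 12, 15), {"news", "outage"})
print(shown)          # ['New portal version']
print(len(archived))  # 1
```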

In cooperation with the security group, we enhanced the portal to accept authentication by means of X.509 certificates. In order to support wide deployment of hardware tokens and increase overall MetaCentrum security, we decided to migrate gradually to authentication methods that do not require an explicit login and password; we prefer user certificates stored in hardware tokens. Three new actions in the My account section of the MetaCentrum portal (changing the command-line shell, modifying quotas, and sending a priority request to the request tracking system) now require authentication with a user certificate. We will keep adding such services, for example access to the full history of a request in the RT system, while phasing out services that rely on simple login/password authentication. We would also like to streamline access to MetaCentrum for new users: applicants with a valid certificate from the CESNET Certification Authority will not have to send the paper application form but will only need to register their certificate.

On the MetaCentrum portal, we have also been continuously adding or updating documentation describing the individual modules and application software available on the MetaCentrum resources. Specific attention has been paid to providing English versions of the documentation.

A new portal section, USB tokens, now presents all the information necessary for using hardware tokens in the MetaCentrum environment, covering topics such as token initialisation, certificate generation and upload to the token, and the import and use of certificates in the Mozilla, Firefox and MS Internet Explorer browsers. It also includes information about using USB tokens in the Linux and MS Windows operating systems and about interaction with Globus GSI applications. The necessary software is available in the same section. In addition, we organised three user training sessions to accompany the token distribution.

We created a DTD for documenting program modules. It can be used to generate both the module documentation published on the portal and the on-line help for individual modules.
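
The point of a single structured source is that several outputs can be generated from it. The sketch below assumes hypothetical element names (`module`, `version`, `description`, `usage`); the actual MetaCentrum DTD is not reproduced here, and validation against a DTD is omitted because the standard-library parser does not perform it (a library such as lxml would be needed).

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Tuple

# Hypothetical module description; the element names only illustrate what
# a DTD for module documentation might define.
MODULE_XML = """<module name="example-solver">
  <version>1.2</version>
  <description>Example numerical solver.</description>
  <usage>example-solver input.dat</usage>
</module>"""


def render(xml_text: str) -> Tuple[str, str]:
    """Produce a plain-text help screen and an HTML fragment from one source."""
    m = ET.fromstring(xml_text)
    name, version = m.get("name"), m.findtext("version")
    desc, usage = m.findtext("description"), m.findtext("usage")
    help_text = f"{name} {version}\n  {desc}\n  Usage: {usage}"
    page = (f"<h2>{escape(name)} {escape(version)}</h2>"
            f"<p>{escape(desc)}</p><pre>{escape(usage)}</pre>")
    return help_text, page


help_text, page = render(MODULE_XML)
print(help_text)
```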

While the request tracking (RT) system is managed by the MetaCentrum grid management group, the user support group is responsible for sorting and assigning individual requests that cannot be immediately solved. The group also takes care of escalating the tickets (the corresponding statistics are available on the MetaCentrum management pages). We also started the integration of the RT system into the unified SSO environment that uses certificates as the primary authentication protocol.

The user support group also organised a workshop with representatives of the John von Neumann Institute for Computing in Juelich (Germany). Among other topics, they demonstrated how MetaCentrum users could get access to the high-performance computing resources of the Juelich supercomputing centre.

9.4   Other Research Activities

Apart from the research and development activities mentioned in the previous sections, in 2005 the MetaCentrum team was also engaged in the development of Grid infrastructure monitoring systems.

In the first half of 2005 we worked on an architecture for decentralised monitoring of the state of grid resources, with decentralised, robust storage of the monitoring results. Our initial hypothesis was that classical monitoring initiated from a single central site cannot provide an adequate view of the actual state of the grid infrastructure. Experience with matrix tests (e.g., pairwise connectivity and data transfers between machines) had shown that centrally collected information sometimes does not provide enough evidence as to whether individual Grid nodes will work together. Firewall setup is one of the important factors, as firewall configuration is often static and does not reflect the dynamic requirements and changes of the Grid environment, in particular after new machines have been added.

We proposed a new monitoring architecture of probe programs, called worms, that autonomously traverse the grid nodes and perform predefined tests (from the viewpoint of the grid infrastructure they behave as standard grid applications). The worms are managed and controlled by another program layer, consisting of so-called shepherds, which also collects their results. The shepherds are organised in a peer-to-peer structure with redundant data storage mechanisms and are able to take over the worms of a failed peer.
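
The two shepherd properties named above, redundant result storage and takeover of a failed peer's worms, can be sketched as follows. The replication factor, the ring ordering and the round-robin takeover are assumptions chosen for illustration; they are not the published design.

```python
from typing import Dict, List, Set, Tuple


class Shepherd:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.worms: Set[str] = set()
        self.results: Dict[Tuple[str, str], str] = {}  # (worm, test) -> outcome


class ShepherdGroup:
    """Hypothetical peer group: each result is stored on `replicas` living
    peers, and the worms of a failed peer are spread over the survivors."""

    def __init__(self, names: List[str], replicas: int = 2) -> None:
        self.peers = [Shepherd(n) for n in names]
        self.replicas = replicas

    def living(self) -> List[Shepherd]:
        return [p for p in self.peers if p.alive]

    def report(self, owner: Shepherd, worm: str, test: str, outcome: str) -> None:
        # Store on the owner and the next living peers in ring order.
        ring = self.living()
        start = ring.index(owner)
        for k in range(min(self.replicas, len(ring))):
            ring[(start + k) % len(ring)].results[(worm, test)] = outcome

    def fail(self, dead: Shepherd) -> None:
        # The failed peer's worms are taken over round-robin by the survivors.
        dead.alive = False
        orphans, dead.worms = sorted(dead.worms), set()
        survivors = self.living()
        if not survivors:
            return  # nobody left to take over
        for i, worm in enumerate(orphans):
            survivors[i % len(survivors)].worms.add(worm)


group = ShepherdGroup(["a", "b", "c"])
a, b, c = group.peers
a.worms.update({"w1", "w2"})
group.report(a, "w1", "gsissh", "ok")    # stored on a and replicated to b
group.fail(a)
print(sorted(b.worms), sorted(c.worms))  # ['w1'] ['w2']
print(b.results[("w1", "gsissh")])       # ok
```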

The tests performed by worms can be divided into three categories: single-, two-, and three-point tests. Single-point tests check the availability of a particular service or the configuration of the machine the worm is running on; for example, they may verify the configuration of all accepted certification authorities.

Two-point tests check, from the machine where the worm is currently running, the availability of services running on other machines; an example is checking whether a gsissh connection can be established. Finally, three-point tests check whether a pair of remote machines can be used for a certain purpose, such as a gridFTP transfer. The worm receives candidates for two- and three-point tests from its shepherd or selects them from a predefined set.
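
How the three test categories expand into concrete jobs for one worm can be sketched as below. The function and test names are hypothetical; the sketch only shows the combinatorics: single-point tests involve the local machine, two-point tests pair it with each candidate, and three-point tests take every pair of remote candidates, driven from the worm's machine.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def plan_tests(here: str, candidates: Sequence[str],
               single: Sequence[str], two_point: Sequence[str],
               three_point: Sequence[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Expand test names into (test, machines) jobs for a worm running on `here`."""
    plan = [(t, (here,)) for t in single]                            # local checks
    plan += [(t, (here, r)) for t in two_point for r in candidates]  # here -> remote
    plan += [(t, (here, a, b)) for t in three_point                  # remote pair,
             for a, b in combinations(candidates, 2)]                # driven from here
    return plan


jobs = plan_tests("n0", ["n1", "n2", "n3"],
                  single=["ca-config"], two_point=["gsissh"], three_point=["gridftp"])
for job in jobs:
    print(job)
```

With three candidates this yields one single-point job, three two-point jobs and three three-point jobs; the number of three-point jobs grows quadratically with the candidate set, which is one reason for letting the shepherd supply the candidates.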

The worms check the grid infrastructure not only by running explicit tests, but also by utilising the job scheduling infrastructure. Any observed problem is immediately recorded and reported by the corresponding shepherd.

The core worms are simple programs that can run on any machine regardless of its architecture (from common cluster nodes through SMP computers to vector supercomputers). The actual tests are assigned by the shepherds as test modules. A failure in a particular test (including the inability to run it) has no negative impact on the behaviour of the core worm. The worms can run under any identity (administrator, service, a particular end user, etc.), so even problems associated with a particular identity are easily detected.
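
The isolation property described above can be sketched as a core loop that runs each test module under a given identity and records any exception as that test's result instead of letting it stop the worm. The test modules here are hypothetical stand-ins.

```python
from typing import Callable, Dict


def run_worm(tests: Dict[str, Callable[[str], bool]], identity: str) -> Dict[str, str]:
    """Core worm loop: run each assigned test module under `identity`; any
    exception is recorded as that test's result and the worm carries on."""
    report: Dict[str, str] = {}
    for name, test in tests.items():
        try:
            report[name] = "ok" if test(identity) else "failed"
        except Exception as exc:  # the test itself broke or could not run
            report[name] = f"error: {type(exc).__name__}"
    return report


def needs_admin(identity: str) -> bool:
    return identity == "admin"


def unreadable_config(identity: str) -> bool:
    raise PermissionError(f"{identity} cannot read the configuration")


report = run_worm({"ping": lambda ident: True,
                   "admin-only": needs_admin,
                   "config": unreadable_config}, identity="user42")
print(report)
# {'ping': 'ok', 'admin-only': 'failed', 'config': 'error: PermissionError'}
```

Running the same modules under a different identity (for example `"admin"`) would change the `admin-only` result, which is how identity-specific problems become visible.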

Worm-based tests can be combined with passive monitoring results (reports from grid infrastructure components during normal operation), which minimises the monitoring overhead on the infrastructure. For example, a two-point data transfer test is unnecessary if real users are known to have recently transferred data successfully between the two nodes.
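
The example in the paragraph amounts to a freshness check against passively collected evidence. In the sketch below, the one-hour window and the log structure are assumptions, and transfers are treated as directional (evidence for n1 to n2 says nothing about n2 to n1).

```python
from typing import Dict, Tuple


def needs_active_test(src: str, dst: str,
                      passive_log: Dict[Tuple[str, str], float],
                      now: float, max_age_s: float = 3600.0) -> bool:
    """Return True unless a real user transfer src -> dst succeeded recently.

    passive_log maps (src, dst) to the time of the last successful user
    transfer, as reported by the infrastructure during normal operation.
    """
    last_ok = passive_log.get((src, dst))
    return last_ok is None or now - last_ok > max_age_s


log = {("n1", "n2"): 1_000.0}
print(needs_active_test("n1", "n2", log, now=2_000.0))  # False: recent user transfer
print(needs_active_test("n1", "n2", log, now=9_000.0))  # True: evidence too old
print(needs_active_test("n2", "n3", log, now=2_000.0))  # True: no evidence at all
```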

In the second half of 2005 we focused on developing the C-GMA architecture, designed as a general protocol supporting interoperability between different GMA implementations. C-GMA stands for Capability-based Grid Monitoring Architecture and is based on an extension of the producer/consumer model of the standard Grid Monitoring Architecture. The extension uses metadata describing data attributes and component capabilities. Each data stream (or, more precisely, each event) transferred through the monitoring infrastructure has an associated attribute specifying its requirements on the components of the monitoring infrastructure. The data stream (event) is transferred through a particular infrastructure node only if the capabilities of that node match the attributes of the stream (event). For example, a unique event that must not be lost cannot be transferred through unreliable nodes; only nodes guaranteeing persistence are eligible. Every component publishes its capabilities, and the mediator is responsible for finding pairs of nodes/components whose capabilities match the data stream attributes. Using specific attributes, two or more GMA implementations can be interconnected, and intermediary components able to transfer data between otherwise independent GMA implementations can also be defined.
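
The basic C-GMA rule (an event may pass through a component only if the component's published capabilities cover the event's attributes) reduces to a subset test. The sketch below models attributes and capabilities as plain string sets, which is a simplification; the component names and the `impl:A`/`impl:B` labels for two GMA implementations are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Component:
    name: str
    capabilities: FrozenSet[str]  # published by the component itself


@dataclass(frozen=True)
class Event:
    payload: str
    attributes: FrozenSet[str]    # requirements on every component that carries it


def eligible(event: Event, components: List[Component]) -> List[str]:
    """Components whose published capabilities cover all event attributes."""
    return [c.name for c in components if event.attributes <= c.capabilities]


components = [
    Component("fast-relay", frozenset({"low-latency"})),
    Component("archive", frozenset({"persistent", "low-latency"})),
    Component("bridge", frozenset({"persistent", "impl:A", "impl:B"})),
]
unique = Event("node n7 down", frozenset({"persistent"}))
cross = Event("job finished", frozenset({"impl:A", "impl:B"}))

print(eligible(unique, components))  # ['archive', 'bridge']
print(eligible(cross, components))   # ['bridge']
```

The `bridge` component plays the role of the intermediary mentioned in the text: it is the only one that can carry an event between the two implementations.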

We proposed and published a C-GMA model with three levels of abstraction. We also have a prototype mediator implementation that uses a specific metadata description based on the ClassAd model. This implementation demonstrated that simultaneous matching of three ClassAds, which is needed for the complex selection of whole groups of mutually collaborating monitoring components, is feasible.
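
Matching three ads at once means that each ad's requirements are evaluated against the whole group, not just one partner. The sketch below is a Python analogue written for illustration: it does not use the ClassAd language or the prototype mediator, and the producer/channel/consumer roles and attributes are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

Ad = Dict[str, object]  # attributes plus a "requirements" predicate over the group


def gangmatch(producers: List[Ad], channels: List[Ad],
              consumers: List[Ad]) -> List[Tuple[str, str, str]]:
    """Return every (producer, channel, consumer) triple in which all three
    ads' requirements hold when evaluated against the whole group."""
    matches = []
    for p, ch, c in product(producers, channels, consumers):
        group = {"producer": p, "channel": ch, "consumer": c}
        if all(ad["requirements"](group) for ad in group.values()):
            matches.append((p["name"], ch["name"], c["name"]))
    return matches


def channel_accepts_format(g: Dict[str, Ad]) -> bool:
    return g["producer"]["format"] in g["channel"]["formats"]


producers = [
    {"name": "p1", "format": "xml", "requirements": lambda g: g["channel"]["persistent"]},
    {"name": "p2", "format": "csv", "requirements": lambda g: True},
]
channels = [
    {"name": "fast", "persistent": False, "formats": {"xml", "csv"},
     "requirements": channel_accepts_format},
    {"name": "store", "persistent": True, "formats": {"xml"},
     "requirements": channel_accepts_format},
]
consumers = [
    {"name": "c1", "requirements": lambda g: g["producer"]["format"] == "xml"},
]

print(gangmatch(producers, channels, consumers))  # [('p1', 'store', 'c1')]
```

Exhaustive enumeration is used here for clarity; its cost grows with the product of the three candidate sets, so a practical mediator would need to prune candidates first.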

9.5   Summary

The coordinated efforts of individual MetaCentrum groups further advanced the national grid infrastructure of the Czech Republic. The requirement to guarantee production-level operation generated new research, development, and implementation activities, which contributed to better and more efficient use of MetaCentrum resources, considerably improved interaction with end users, and led to the gradual deployment of new technologies.

A number of activities resulted from the decision to use PKI authentication with USB tokens. We modified the MetaCentrum portal to support access via X.509 certificates and created new sections available only to users authenticated with an X.509 certificate. The training events brought us into closer contact with end users, and the experience gained was used in designing the new portal version and structuring the available user documentation.

English versions of web pages were added to the portal, and new navigation elements improved orientation within it. The year 2005 also marked consistent and extensive use of the RT request tracking system for both user requests and internal MetaCentrum problems (failures, software errors, etc.). More than 150 closed tickets clearly indicate that this service has been used intensively. Using the RT system also clearly demonstrated the need for a new internal policy and workflow defining the steps of ticket processing (assignment, escalation procedures, checks, availability of the ticket history, etc.). Several missing features of the RT system itself were also identified, such as fast handling of certain requests, or the ability of an administrator to open a new ticket for a related problem instead of bundling it with the original ticket, which must then remain open even though the original problem has already been solved. Integrating the RT interface into the unified MetaCentrum authentication environment is a new task for 2006.

During 2005 we also successfully completed a tender for a new tape library. The offered price allowed us to buy two independent tape libraries, thus creating a robust distributed backup and archiving environment for MetaCentrum with a very high backup capacity.

In 2005 we also clarified the overlap and boundary between the research plan of CESNET and the research plan Parallel and distributed systems of the Faculty of Informatics and the Institute of Computer Science at Masaryk University. Research on grid resource planning and scheduling will henceforth be addressed by Masaryk University, which will also focus on special applications for the Grid environment (currently complex image processing), including the development of tools for migrating applications to the grid. MetaCentrum research activities will remain focused on security and the further development of grid monitoring architectures.
