9 METACentre
The primary focus of the METACentre activity is the development and production support of a distributed infrastructure: the national Grid, which connects computing and data resources and provides a solid foundation for advanced applications that use the computer network.
The METACentre activity is closely coordinated with work on the EU 6th Framework Program project EGEE (see page XY). This close collaboration guarantees that results and achievements of the international project are immediately deployed in the national infrastructure. At the same time, it provides a forum where METACentre results can be presented and used in the international environment.
For the year 2004, the METACentre activities were split into the following areas:
- Production operation of the METACentre infrastructure
- Establishing user support, including a full reconstruction of the METACentre portal
- Research and development of new grid monitoring infrastructure
- Security in the Grid environment
Research and development in the field of resource scheduling has not been part of METACentre activities, mostly because of budget restrictions. In the long term we expect to use the results of research conducted at Masaryk University in Brno within the research plan of its Faculty of Informatics and Institute of Computer Science.
9.1 Production operation
Clusters of personal computers are the main METACentre computing facility. We operate clusters at three sites: the University of West Bohemia in Pilsen, the CESNET premises in Prague and Masaryk University in Brno. Some 262 CPUs were in operation at these sites by the end of 2004. All these clusters use Intel processors, ranging from 700 MHz Pentium III to 3 GHz Pentium 4 Xeon CPUs in dual-processor nodes, and run the Debian Linux operating system. Nodes within some clusters are also connected by the high-capacity, low-latency Myrinet network, which provides up to 2 Gbps full-duplex transmission speed. The Myrinet-based clusters are used for computations with very high requirements on the speed, latency and throughput of the network connecting individual cluster nodes. In addition, users can access alternative computing environments, most notably 64-bit systems based on IBM Power4+ processors and on AMD Opteron processors. Both 64-bit systems run the SuSE Linux operating system to provide an environment as similar as possible to the one used on the more conventional 32-bit clusters.
In cooperation with the University of West Bohemia, Charles University and Masaryk University, the METACentre also operates high-end SGI and HP systems. The SGI servers provide almost 100 MIPS CPUs in Brno and Prague, and the HP/Compaq AlphaServer in Pilsen is equipped with EV7 processors. The METACentre also operates a high-capacity tape library providing 12 TB of uncompressed space. The library is used to back up all METACentre sites, because the throughput of the CESNET2 network is sufficient to transmit even high-volume backups. The service is also offered to universities and other academic institutions. The continuous backup of the CESNET video archive is an example of this extended service: more than 1.5 TB of digital video material is currently archived. The tape library is served by the Legato NetWorker system, which is now an EMC product because EMC recently acquired Legato. Backups are kept for three months.
More detailed information about METACentre hardware and software is available at the METACentre portal, meta.cesnet.cz.
Despite budgetary restrictions, the METACentre computing capacity was upgraded during 2004. For the purchase we considered both 32- and 64-bit architectures. After evaluating all the proposals and taking into account the current state of compilers, development tools and environments for 64-bit architectures, we decided to stay with the proven IA-32 architecture. We purchased a new cluster with 70 Intel Pentium 4 Xeon processors running at 3 GHz, each with 1 MB of secondary cache. The cluster consists of dual-CPU nodes, each with 2 GB of RAM and one 80 GB ATA disk. It is currently located in the CESNET premises in Prague, where it should gradually replace the ageing SGI cluster with Pentium III processors.
The cluster purchase and installation were, however, only one of the activities related to METACentre infrastructure operations. The group was also involved in the following activities:
- Deployment of process accounting on all cluster nodes. This service provides detailed data about the use of all programs installed in the METACentre. The statistics are available on the METACentre portal. They will also gradually be used in scheduling decisions to achieve fairer resource sharing among METACentre users.
- Software maintenance, specifically the maintenance of scheduling systems. During 2004 the PBSPro scheduling system license was extended to cover all METACentre nodes. This unification simplifies the situation for both system administrators and end users, who no longer have to learn different job management and submission systems. Focusing on a single scheduling and batch queue management system allowed us to gain better knowledge of its implementation and to modify it to better suit our purposes. For example, this led us to identify and correct the cause of unexpected aborts of user jobs. During the year, all nodes were also upgraded to a new version of their operating system.
- Cluster management, including tools for cluster monitoring. End users can now monitor the state of their jobs through a portal that presents all commonly needed information: the state of cluster nodes, of individual queues and of individual jobs (see http://meta.cesnet.cz/cs/state/resources/index.html and Figure.).
- The most commonly used applications were ported to non-IA-32 architectures. The results of this work will be immediately usable outside the METACentre. For example, users of the National Centre for Biomolecular Research will run the AMD Opteron ports on their new cluster of 8 dual-processor 64-bit AMD Opteron nodes, which was purchased near the end of the year.
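The report does not describe how the accounting data will enter scheduling decisions. The code below is a minimal sketch that assumes hypothetical per-process records of (user, program, CPU seconds). It sums usage per program, which is the kind of statistic shown on the portal, and per user. It then derives a simple fair-share factor: 1.0 for users at or below an equal share of the consumed CPU time, and proportionally lower for heavier users. The record layout and the formula are illustrative assumptions, not the PBSPro fair-share mechanism.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AcctRecord(NamedTuple):
    """One hypothetical process-accounting record."""
    user: str
    program: str
    cpu_seconds: float


def usage_by(records: Iterable[AcctRecord], key: str) -> dict[str, float]:
    """Sum CPU seconds per value of `key` ('user' or 'program')."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += rec.cpu_seconds
    return dict(totals)


def fair_share_factors(user_usage: dict[str, float]) -> dict[str, float]:
    """Factor in (0, 1] per user: 1.0 at or below an equal share of the total
    CPU time, equal_share / used for users who consumed more."""
    total = sum(user_usage.values())
    if total == 0:
        return {user: 1.0 for user in user_usage}
    equal_share = total / len(user_usage)
    return {
        user: min(1.0, equal_share / used) if used > 0 else 1.0
        for user, used in user_usage.items()
    }


if __name__ == "__main__":
    records = [
        AcctRecord("alice", "gaussian", 7200.0),
        AcctRecord("alice", "gcc", 60.0),
        AcctRecord("bob", "gromacs", 1800.0),
        AcctRecord("carol", "matlab", 540.0),
    ]
    print(usage_by(records, "program"))
    print(fair_share_factors(usage_by(records, "user")))
```

A scheduler could, for instance, multiply a job's base priority by the submitting user's factor. Practical fair-share schemes usually also let old usage decay over time.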
The METACentre manages and further develops the Perun system, which was designed during the previous research period. Perun manages information about users and some Grid components and greatly simplifies the work of administrators. The system has been enhanced to support PKI authentication, while the basic METACentre authentication system is based on Kerberos. We extended the corresponding data schema, developed appropriate CLI (Command Line Interface) tools and made available a gridmap file service that generates the authorization information. Perun is built on top of an Oracle database, which we upgraded to version 10. In the coming period, Perun will also be used outside the METACentre to manage our resources within the EGEE project. We have prepared the first standalone distribution of Perun, including an installation guide.
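A gridmap file maps certificate subject names (DNs) to local accounts. In the Globus grid-mapfile format, each line contains a quoted DN followed by a local login name. The code below is a minimal sketch of generating such a file from a hypothetical record structure. This report does not describe Perun's actual schema or export mechanism. As a simplifying assumption, the sketch rejects DNs that contain a double quote instead of escaping them.

```python
from typing import Iterable, NamedTuple


class UserRecord(NamedTuple):
    """Hypothetical subset of the user data held in a management database."""
    login: str
    cert_dns: tuple[str, ...]   # certificate subject DNs registered by the user
    active: bool


def gridmap_lines(users: Iterable[UserRecord]) -> list[str]:
    """Build sorted grid-mapfile lines of the form '"<DN>" <login>'."""
    mapping: dict[str, set[str]] = {}
    for user in users:
        if not user.active:
            continue  # only active accounts are authorized
        for dn in user.cert_dns:
            if '"' in dn:
                raise ValueError(f"unsupported quote character in DN: {dn!r}")
            mapping.setdefault(dn, set()).add(user.login)
    # A DN mapped to several logins lists them comma-separated.
    return [f'"{dn}" {",".join(sorted(logins))}'
            for dn, logins in sorted(mapping.items())]


if __name__ == "__main__":
    users = [
        UserRecord("novak", ("/C=CZ/O=CESNET/CN=Jan Novak",), True),
        UserRecord("svoboda", ("/C=CZ/O=Example/CN=Petr Svoboda",), False),
    ]
    print("\n".join(gridmap_lines(users)))
```

Generating the whole file from the database on every change keeps it consistent with the account data without any manual editing on the Grid nodes.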
All user-visible applications were integrated into the METACentre portal.
In 2004 we designed and developed a completely new subsystem for notification handling. Events in the database are watched and processed by a set of customizable scripts that turn them into notifications, currently email messages. Some covered events are part of the standard workflow. Examples are the arrival of an account request, where the responsible person is notified to perform the required action as soon as possible, and the approval of a registration or creation of an account, where the requesting user is informed. Other covered events are abnormal conditions, for example repeated failures in communication with a managed computer. The notification subsystem currently runs on top of the production database.
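The code below is a minimal sketch of the event-to-notification step. It assumes that events arrive as hypothetical (type, payload) pairs already read from the database. Each "customizable script" is modelled as a registered Python function that returns zero or more email messages. The event names, payload fields and the threshold of three failures are invented for illustration. Sending is left out: the resulting messages could be passed to `smtplib.SMTP.send_message`.

```python
from email.message import EmailMessage
from typing import Callable

# Hypothetical event representation: (event_type, payload) read from the database.
Event = tuple[str, dict]
Handler = Callable[[dict], list[EmailMessage]]

HANDLERS: dict[str, Handler] = {}


def handles(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler (a 'customizable script') for one event type."""
    def register(func: Handler) -> Handler:
        HANDLERS[event_type] = func
        return func
    return register


def make_mail(to: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


@handles("account_requested")
def on_account_requested(p: dict) -> list[EmailMessage]:
    # Standard workflow: ask the responsible person to act.
    return [make_mail(p["approver"], "Account request pending",
                      f"User {p['user']} requested an account on {p['host']}.")]


@handles("account_created")
def on_account_created(p: dict) -> list[EmailMessage]:
    # Standard workflow: inform the requesting user.
    return [make_mail(p["user_email"], "Account created",
                      f"Your account on {p['host']} is ready.")]


@handles("host_unreachable")
def on_host_unreachable(p: dict) -> list[EmailMessage]:
    # Abnormal condition: notify only after repeated failures (threshold assumed).
    if p["failures"] < 3:
        return []
    return [make_mail(p["admin"], f"{p['host']} unreachable",
                      f"{p['failures']} consecutive communication failures.")]


def process(events: list[Event]) -> list[EmailMessage]:
    """Turn database events into notifications; unknown event types are ignored."""
    out: list[EmailMessage] = []
    for event_type, payload in events:
        handler = HANDLERS.get(event_type)
        if handler is not None:
            out.extend(handler(payload))
    return out
```

Keeping each rule in its own small handler lets administrators add or change notifications without touching the dispatch loop.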
The Perun system was also presented at an international workshop in Cracow in December (A. Křenek, Z. Sebestianová: Perun - Fault-Tolerant Management of Grid Resources, Cracow Grid Workshop 2004).
In the second half of 2004, Masaryk University in Brno successfully defended the DiDaS project (Distributed Data Depots). The project resulted in a distributed data storage accessible through the Internet Backplane Protocol (IBP). Some 15 TB of disk storage is available at 7 sites throughout the Czech Republic, and all the data depots are directly connected to the CESNET2 network backbone (see didas.ics.muni.cz). All the original project partners are CESNET members, usually directly involved in METACentre activities, so further development of the distributed data storage will become part of the METACentre activities. We will look for new uses of the capacity. It is currently available as temporary storage for data-intensive computations and is used extensively, for example, for digital video transcoding. It also serves as temporary storage for large data volumes of hundreds of GB and more. Examples are unpacked archives, large intermediate results and redundant copies of frequently used read-only data processed at many different clusters.
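IBP depots hand out storage as time-limited allocations. Clients access these allocations through separate read, write and manage capabilities, which suits the temporary-storage role described above. The code below is a toy in-memory model loosely based on those concepts, intended only to illustrate them. It is not the IBP client API, whose function names, signatures and capability format differ. The class and method names are hypothetical.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class Allocation:
    size: int            # maximum number of bytes
    expires_at: float    # time after which the depot may reclaim the space
    data: bytearray = field(default_factory=bytearray)


class ToyDepot:
    """Hypothetical depot with time-limited, capability-protected allocations."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._allocs: dict[str, Allocation] = {}
        self._caps: dict[str, tuple[str, str]] = {}  # capability -> (alloc id, kind)

    def allocate(self, size: int, duration: float) -> dict[str, str]:
        """Reserve space for `duration` seconds; return read/write/manage capabilities."""
        alloc_id = secrets.token_hex(8)
        self._allocs[alloc_id] = Allocation(size, self._clock() + duration)
        caps = {kind: secrets.token_urlsafe(16) for kind in ("read", "write", "manage")}
        for kind, cap in caps.items():
            self._caps[cap] = (alloc_id, kind)
        return caps

    def _lookup(self, cap: str, kind: str) -> Allocation:
        alloc_id, cap_kind = self._caps.get(cap, (None, None))
        if cap_kind != kind:
            raise PermissionError(f"not a valid {kind} capability")
        alloc = self._allocs[alloc_id]
        if self._clock() > alloc.expires_at:
            raise TimeoutError("allocation expired")
        return alloc

    def store(self, write_cap: str, payload: bytes) -> None:
        alloc = self._lookup(write_cap, "write")
        if len(alloc.data) + len(payload) > alloc.size:
            raise ValueError("allocation full")
        alloc.data.extend(payload)   # writes append to the allocation

    def load(self, read_cap: str, offset: int, length: int) -> bytes:
        alloc = self._lookup(read_cap, "read")
        return bytes(alloc.data[offset:offset + length])

    def extend(self, manage_cap: str, extra: float) -> None:
        """Prolong the allocation's lifetime by `extra` seconds."""
        self._lookup(manage_cap, "manage").expires_at += extra
```

Handing a client only the read capability lets it fetch data without being able to modify the allocation or extend its lifetime.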
9.2 User support and portal
A new concept for presenting information about the METACentre and its activities was created during 2004. The METACentre portal has been completely refurbished. The original version was based on an undocumented combination of static HTML pages, PHP and Perl scripts, and it has been transformed into a new, modern framework. Static content is managed with the OpenCMS content management system. Interactive pages are built with J2EE (Java 2 Enterprise Edition) technology.
The portal is available at meta.cesnet.cz. It is gradually being filled with content in both Czech and English.
The portal is a gateway to new services that make dealing with the METACentre more user-friendly. The public interactive part of the portal has been completely redone. It covers registration with the METACentre, access to users' personal data stored in the system, applications for and activation of user accounts on individual METACentre machines, and the submission of yearly reports. In collaboration with other groups within the METACentre activity, we also added status pages to the interactive part of the portal. These pages give end users information about the state of METACentre resources, including the current load on individual nodes. Information from the PBSPro batch queuing system is presented in a concise graphical form.
Near the end of 2004 we also implemented a notification service that announces planned outages and unplanned failures of all METACentre data and computing resources and services.
To handle user requests and to improve communication between end users and METACentre system administrators, we started to deploy a standardized request tracking system. We selected the RT system because it is widely used in CESNET for tracking network-related requests and expert support for it was available locally. We purchased primary and backup RT servers, which are shared with the management and operation of the CESNET2 network. We use RT version 3 and are building a local electronic (virtual) helpdesk that is shared with EGEE support. The helpdesk will be fully integrated into the METACentre portal during 2005.
9.3 Grid Infrastructure Monitoring
Grid Infrastructure Monitoring is the major research activity of the METACentre. During 2004, however, the work still focused mostly on supporting Grid management, and the actual research was performed within the EGEE project.
Cluster node activity monitoring has been switched to the ganglia system, and we focused mainly on modifying and enhancing it. We extended the set of sensors, but with the "old" (classical) method, which is not adequate for production use. To provide better support, a bachelor's thesis was completed that specifies new interfaces for adding sensors and for registering hooks in the core of the ganglia system. These interfaces will allow monitoring data to be stored in formats other than RRD, the natively supported format. End users will be able to specify selection constraints for the stored data. The selected data could be sent to an SQL database (mainly MySQL), to a socket or to a pipe. This will allow ganglia to be connected directly to notification services that need a simple data input, which the socket or pipe provides. The ganglia web interface is available at https://lindir.ics.muni.cz/ganglia, but access requires METACentre user authentication. We plan to integrate the ganglia outputs into the portal during 2005.
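This report does not describe the hook interfaces specified in the thesis. The code below is a hypothetical stand-in that works on the XML state dump gmond publishes, by default on TCP port 8649. It applies a user-supplied selection constraint and writes matching samples as tab-separated lines to any text stream, such as a pipe or a socket wrapped with `socket.makefile("w")`. The constraint signature and the output format are assumptions. The embedded XML is a shortened example in the gmond layout.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, TextIO

# A selection constraint decides, per (host, metric, value), whether to keep a sample.
Constraint = Callable[[str, str, str], bool]


def samples(ganglia_xml: str) -> Iterator[tuple[str, str, str]]:
    """Yield (host, metric, value) triples from gmond-style XML."""
    root = ET.fromstring(ganglia_xml)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            yield host.get("NAME"), metric.get("NAME"), metric.get("VAL")


def export(ganglia_xml: str, keep: Constraint, sink: TextIO) -> int:
    """Write selected samples as tab-separated lines to `sink` (a pipe,
    socket file object, ...); return the number of lines written."""
    written = 0
    for host, name, val in samples(ganglia_xml):
        if keep(host, name, val):
            sink.write(f"{host}\t{name}\t{val}\n")
            written += 1
    return written


SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
 <CLUSTER NAME="example">
  <HOST NAME="node1" IP="10.0.0.1">
   <METRIC NAME="load_one" VAL="3.50" TYPE="float" UNITS=""/>
   <METRIC NAME="mem_free" VAL="102400" TYPE="uint32" UNITS="KB"/>
  </HOST>
  <HOST NAME="node2" IP="10.0.0.2">
   <METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    # Keep only heavily loaded nodes, e.g. as input for a notification service.
    high_load = lambda host, name, val: name == "load_one" and float(val) > 2.0
    export(SAMPLE, high_load, sys.stdout)
```

The filtering logic stays the same whether the sink is a pipe, a socket or a database writer. Only the object passed as `sink` changes.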
The actual research has focused on the design and development of a new model for the Grid monitoring architecture (GMA). Near the end of the year we presented the Capability-based Grid Monitoring Architecture (CGMA) at the Cracow workshop. CGMA is a general framework in which different monitoring infrastructure components can coexist and cooperate while remaining relatively independent. The framework is independent of both origin and requirements. Various people or groups can implement the components without tight mutual synchronization, and the components may serve quite different purposes. New components also need not implement the interfaces introduced and used by earlier components. Different components can serve contradictory requirements, yet all components share a common framework that removes unnecessary duplication within the infrastructure.
The principal idea of the CGMA is the introduction of data attributes and component capabilities. Attributes are meta-descriptions of data handling constraints and rules. Capabilities are meta-descriptions of what individual components provide. These meta-data, together with other information about the data and components, are stored in an enhanced registry. The registry serves as a mediator between data requirements and component capabilities.
The data meta-descriptions, called attributes, are in fact constraints on data handling and use. Examples are:
"cautious handling": use persistent elements only.
"secret": handling allowed only by components trusted at the specified level.
"priority": faster processing, e.g., sending first from any queue.
The complementary meta-descriptions of the components are called capabilities. Examples are:
"reliable": no data are lost.
"secure level X": trusted element at the specified level.
"fast and volatile": data are transmitted with as little overhead as possible but may be lost.
To describe the essential CGMA functionality, we use an analogy with the Quality of Service (QoS) concept in networking. The attributes of an event can be thought of as its QoS requirements. The task of the infrastructure is to find an appropriate path for the event, using the desired QoS tags and knowledge of component capabilities.
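The report gives no algorithm for this mediation. The sketch below illustrates the QoS analogy only and is not the CGMA prototype. A dictionary of hypothetical component capabilities stands in for the registry. Each component is checked against an event's attributes, and breadth-first search finds the shortest chain of suitable components from producer to consumer. For "priority" events, fast components are tried first. The component names, the topology and the matching rules are invented for the example.

```python
from collections import deque
from typing import Optional

# Hypothetical registry contents: capabilities advertised by each component.
CAPABILITIES: dict[str, set[str]] = {
    "sensor":       {"reliable", "secure:2"},
    "udp_relay":    {"fast", "volatile"},
    "spool_relay":  {"reliable", "secure:1"},
    "secure_relay": {"reliable", "secure:2"},
    "archive":      {"reliable", "secure:2"},
}
# Hypothetical topology: which component may pass events to which.
LINKS: dict[str, list[str]] = {
    "sensor": ["udp_relay", "spool_relay", "secure_relay"],
    "udp_relay": ["archive"],
    "spool_relay": ["archive"],
    "secure_relay": ["archive"],
    "archive": [],
}


def trust_level(caps: set[str]) -> int:
    """Highest 'secure:N' level a component offers (0 if none)."""
    return max((int(c.split(":", 1)[1]) for c in caps if c.startswith("secure:")),
               default=0)


def satisfies(caps: set[str], attributes: dict) -> bool:
    """Check one component's capabilities against an event's attributes."""
    if attributes.get("cautious") and "reliable" not in caps:
        return False  # cautious handling: lossless components only
    if trust_level(caps) < attributes.get("secret", 0):
        return False  # secret: component must be trusted at the required level
    return True


def find_path(src: str, dst: str, attributes: dict) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of components that all
    satisfy the event's attributes; 'priority' events try fast components first."""
    if not satisfies(CAPABILITIES[src], attributes):
        return None
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        nexts = LINKS[path[-1]]
        if attributes.get("priority"):
            nexts = sorted(nexts, key=lambda c: "fast" not in CAPABILITIES[c])
        for nxt in nexts:
            if nxt not in seen and satisfies(CAPABILITIES[nxt], attributes):
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(find_path("sensor", "archive", {"cautious": True, "secret": 2}))
```

An event without attributes may take the fast but lossy relay. Adding "cautious" or "secret" steers the same event through the reliable or trusted relays, much as QoS tags select a network path.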
The research and development of the CGMA concept continues in 2005, including a prototype implementation for the EGEE project.
Integrating monitoring and information services is another part of our research activities. The METACentre portal no longer distinguishes between these two kinds of services. It provides unified, uniform access to the integrated information regardless of how the information was collected.
Internally, the infrastructure of directory services is used for integrated access to information stored in different parts of the Grid infrastructure. During 2004 we designed and deployed a new infrastructure of LDAP servers. It is fully integrated with the Perun system and it guarantees an incremental, almost immediate propagation of changes from the Perun databases into the directory system.
We changed the way data are fed from Perun to LDAP. The new technology requires no changes on the side of Perun or any other data source. Perun already had a feature that allows a full table dump to be initiated after each data change, but only if a predefined time period has passed since the last full dump. This ensures that changes are usually propagated immediately while preventing excessively frequent updates. We developed a method of translating a full dump into an incremental update on the LDAP side. Each full dump is converted to the LDIF data format and compared with the current LDIF representation of the LDAP server state. The result is an incremental file in the LDIF update format, which is simply applied with the standard LDAP utility "ldapmodify".
The previous update mechanism rebuilt the directory server content every night from a full dump. Compared with it, the new method has three advantages:
- Changes propagate in nearly real time.
- There is no server downtime. The full rebuild required the server to be stopped.
- Standard replication between LDAP server replicas becomes possible. Our technology updates only the primary LDAP server, and all replicas can be configured to use the standard LDAP replication flow originating at the primary server.
Replication is the basis of the high availability of the LDAP system.
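The code below is a minimal sketch of translating a full dump into an incremental update. It assumes that both the new dump and the current server state have already been parsed from LDIF into dictionaries; parsing is omitted. It emits add, modify and delete change records in the LDIF change format of RFC 2849, which ldapmodify can read. To respect the directory hierarchy, new entries are added parent-first and removed entries are deleted child-first. The sketch illustrates the technique and is not the METACentre tool.

```python
# Directory content as DN -> attribute -> values. Values are assumed to be plain
# ASCII text that needs no base64 ("::") encoding under RFC 2849.
State = dict[str, dict[str, list[str]]]


def depth(dn: str) -> int:
    """Number of RDN separators (naive: ignores escaped commas)."""
    return dn.count(",")


def ldif_diff(current: State, target: State) -> str:
    """Return LDIF change records that turn `current` into `target`."""
    records: list[str] = []
    # New entries, parents before children.
    for dn in sorted(target.keys() - current.keys(), key=lambda d: (depth(d), d)):
        lines = [f"dn: {dn}", "changetype: add"]
        for attr, values in sorted(target[dn].items()):
            lines += [f"{attr}: {v}" for v in values]
        records.append("\n".join(lines))
    # Existing entries: replace differing attributes, delete vanished ones.
    for dn in sorted(current.keys() & target.keys()):
        old, new = current[dn], target[dn]
        mods: list[str] = []
        for attr in sorted(old.keys() | new.keys()):
            if attr not in new:
                mods += [f"delete: {attr}", "-"]
            elif sorted(old.get(attr, [])) != sorted(new[attr]):
                mods += [f"replace: {attr}"] + [f"{attr}: {v}" for v in new[attr]] + ["-"]
        if mods:
            records.append("\n".join([f"dn: {dn}", "changetype: modify"] + mods))
    # Removed entries, children before parents.
    for dn in sorted(current.keys() - target.keys(), key=lambda d: (-depth(d), d)):
        records.append(f"dn: {dn}\nchangetype: delete")
    return "\n\n".join(records) + ("\n" if records else "")
```

The output can be written to a file and applied with, for example, `ldapmodify -f changes.ldif` plus the usual connection and bind options. When nothing has changed, the function returns an empty string and the server is left untouched.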
The second source of data presented in the METACentre directory services is other Grid services. Here the LDAP directory provides unified access to basic information gathered by different METACentre services, and our partial caching approach is designed for this usage pattern. The LDAP server acts as a gateway to a primary source of information. It not only translates the service-specific data representation and access protocol to LDAP but can also cache the data. In our design, users can specify whether they really need fresh data, at the cost of some delay, or whether cached information is sufficient and gives a quick response. This choice is restricted, however. Fresh data can be requested only for simple queries that return a single directory entry. Queries that return many entries are always answered with cached data.
We developed a prototype implementation of the partial caching LDAP gateway as an OpenLDAP backend and are evaluating it on an internal testbed. The proof-of-concept service provides access to user quota and usage information. It acts as a gateway to the AFS filesystem and to the home filesystem services of the METACentre computing clusters (Linux NFS servers). We plan to move it into production after evaluating our experience with the prototype.
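The gateway's decision rule can be sketched as a small Python model. This is not the OpenLDAP backend. The `fetch` callable stands for the translation into the primary service's protocol, such as a quota lookup. The time-to-live that expires cached entries is a hypothetical parameter. Multi-entry queries still fetch entries that are missing or expired, but they never force a refresh.

```python
import time
from typing import Callable


class PartialCachingGateway:
    """Answers come from a cache unless a query for a single entry explicitly
    asks for fresh data from the primary source."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # hypothetical: queries the primary service for one entry
        self._ttl = ttl        # hypothetical: age (s) after which an entry is refetched
        self._clock = clock
        self._cache: dict[str, tuple[float, dict]] = {}

    def _entry(self, key: str, fresh: bool) -> dict:
        hit = self._cache.get(key)
        if fresh or hit is None or self._clock() - hit[0] > self._ttl:
            value = self._fetch(key)                  # slow path: primary source
            self._cache[key] = (self._clock(), value)
            return value
        return hit[1]                                 # fast path: cached copy

    def search(self, keys: list[str], fresh: bool = False) -> list[dict]:
        """Answer a query selecting `keys`; `fresh` is honoured only when
        exactly one entry is requested."""
        single = len(keys) == 1
        return [self._entry(k, fresh and single) for k in keys]


if __name__ == "__main__":
    gw = PartialCachingGateway(lambda user: {"uid": user, "quota_mb": 1000}, ttl=300.0)
    print(gw.search(["alice"], fresh=True))   # goes to the primary source
    print(gw.search(["alice", "bob"]))        # cached where possible
```

Restricting fresh reads to single-entry queries bounds the load a user can place on the primary source. A broad search cannot trigger hundreds of slow back-end lookups at once.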
9.4 Security
For heterogeneous distributed systems like the Grid, security plays an ever-increasing role. This led to the founding of an independent Grid security group within the METACentre, which is responsible for all security-related actions. In the medium term we expect closer collaboration with other security-related groups and teams within CESNET. At present we directly use only the services provided by the AAI group in connection with the Certification Authority. In 2004 the security group focused mainly on the interoperability of authentication services. It also studied authorization services, both existing ones and new ones still under development.
The METACentre uses Kerberos as its basic authentication mechanism and protocol. However, end users are not restricted to Kerberos for their initial authentication to the system, because we support smooth transition between different authentication services. Besides Kerberos, the major supported authentication services are PKI with certificates, one-time passwords and hardware tokens. We implemented the corresponding libraries and are working on a universal cross-authentication service called the credential wallet. The results of this work have been presented internationally, and the EGEE security group has expressed interest in using them. The research on one-time password infrastructure led to a Diploma Thesis at the Faculty of Informatics, Masaryk University, Brno, entitled "Authentication infrastructure for one-time passwords". We developed and tested a one-time password generator for mobile phones and PDAs, which attracted the interest of the EduRoam group.
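The report does not say which algorithm the METACentre generator implements. As an illustration of a counter-based one-time password scheme small enough for a phone or PDA, the sketch below uses HOTP, which was standardized later in RFC 4226. It also includes a simple server-side verifier with a look-ahead window. The window size of 10 is an assumption.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for the given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class Verifier:
    """Server side: accepts a password matching one of the next `window`
    counter values, then moves past it so each password works only once."""

    def __init__(self, secret: bytes, window: int = 10):
        self.secret, self.window, self.counter = secret, window, 0

    def check(self, password: str) -> bool:
        for c in range(self.counter, self.counter + self.window):
            if hmac.compare_digest(hotp(self.secret, c), password):
                self.counter = c + 1
                return True
        return False


if __name__ == "__main__":
    secret = b"12345678901234567890"             # RFC 4226 test key
    print([hotp(secret, c) for c in range(3)])  # ['755224', '287082', '359152']
```

The look-ahead window tolerates passwords that the user generated on the device but never submitted. Advancing the counter after a successful check prevents the same password from being replayed.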
9.5 Summary
The METACentre continued developing the national Grid infrastructure in the Czech Republic. The main work focused on making the whole system more user-friendly and making access to resources easier. This priority led to the following results:
- the development of a new portal with redesigned and upgraded content;
- the unification of batch queue management;
- simplified administration of end-user information and accounts;
- further research into secure access to METACentre resources.
The portal provides all relevant information to end users, including individual resources and their state and the users' own jobs. The portal thus removes the need to track resources and jobs manually across the whole infrastructure, which was usually done by logging into the nodes directly. Our close collaboration with the EGEE project has also been very important, both in development and re-engineering and in the management and operation of the Grid. A significant part of METACentre resources became part of the pan-European EGEE Grid.
The research and development activities were focused not only on the portal design and development but also on the area of Grid monitoring and information services. The major result here is the development of the Capability-based Grid Monitoring Architecture, CGMA.
In 2005 we will continue to support all the activities mentioned above, with a continuing focus on user friendliness. We will also work more closely with other CESNET research and development groups. Most notably, we have extensive plans for cooperation with the security group and with the group working on collaborative services. Both areas will also be developed within the MediGRID project. This project has been accepted for funding from January 1st, 2005 under the Information Society program administered by the Academy of Sciences of the Czech Republic.
A new activity for 2005 will be close collaboration with CzechLight, leading to experimental use of optical networks. At the end of 2004 we purchased a dual-processor AMD Opteron server and two Chelsio 10 GE network interface cards with on-board TCP hardware acceleration. The server with one of the cards will be connected directly to the CzechLight lines. It will support research into truly high-speed network protocols (i.e., transmission speeds above 1 Gbps) and their use for high-end parallel computing. Our partners in this work include Louisiana State University in the USA.