The intensive monitoring and research planned within DANUBIUS-RI over the coming decades will produce a large number of physical, chemical and biological "non-digital" samples. These samples will be a valuable resource and could form an internationally important environmental archive to support future research. However, this will only be possible if the samples from each Supersite are equivalent (i.e. they are sampled, processed, preserved and stored using common or equivalent methods).

The Data Centre of DANUBIUS-RI will be the central point of the e-Infrastructure, providing all services, including distributed services, through a single point of entry. The most important goals for the e-Infrastructure are operational continuity and reliability, a secure environment, and long-term operation without service interruption. Continuous, long-term availability of all services provided by the DANUBIUS-RI e-Infrastructure is the main reason for building a dedicated Data Centre. To achieve this goal, all components of the Data Centre should comply with the highest industry standards.

The Data Centre will also be the central point of the ICT infrastructure and the users' single point of access to all services. Its main services should include collecting all data gathered from the Supersites and nodes involved in or associated with the infrastructure, storing all types of data, aggregating data from local and distributed storage, offering the computing power and storage space needed for modelling and simulation, and providing search functionality to access the data.

The role of the Data Centre

The Data Centre of the DANUBIUS-RI e-Infrastructure (DCD-RI) will provide a bundle of services to the research and academic community, including, but not limited to: collecting and storing all primary data gathered from the other Supersites, sites and nodes involved in or associated with the infrastructure; aggregating the data according to different criteria; providing the computing power and storage space needed for modelling the data; storing and classifying the modelling results together with their associated metadata; and providing search capabilities based on different criteria.
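The search and aggregation services listed above can be illustrated with a minimal in-memory sketch. The record fields (`site`, `parameter`, `year`, `value`), the site labels and the function names are hypothetical and the data are illustrative only; a production metadata registry would sit on a database or search engine and follow the metadata standards adopted by DANUBIUS-RI.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One observation with its descriptive metadata (hypothetical schema)."""
    site: str
    parameter: str
    year: int
    value: float


def search(records, **criteria):
    """Return the records whose metadata match every given criterion exactly."""
    return [r for r in records
            if all(getattr(r, key) == wanted for key, wanted in criteria.items())]


def aggregate(records, key):
    """Group records by one metadata field and average their values per group."""
    groups = defaultdict(list)
    for r in records:
        groups[getattr(r, key)].append(r.value)
    return {group: mean(values) for group, values in groups.items()}


# Illustrative records from two hypothetical sites.
records = [
    Record("site-A", "salinity", 2021, 2.0),
    Record("site-A", "salinity", 2022, 4.0),
    Record("site-B", "salinity", 2022, 6.0),
    Record("site-B", "nitrate", 2022, 1.5),
]

print(search(records, parameter="salinity", year=2022))
print(aggregate(search(records, parameter="salinity"), "site"))
```

Searching for salinity in 2022 returns two records, and aggregating all salinity records by site gives a mean of 3.0 for site-A and 6.0 for site-B. Combining a filter step with a group-by step in this way covers both "search according to different criteria" and "aggregate according to different criteria".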

Besides these roles, one of the main concerns of DCD-RI will be operational continuity: a reliable digital infrastructure for IT operations must be built to minimize the chance of disruption to the services provided to the DANUBIUS consortium and the academic community. Security is another important concern, so the data centre has to provide proactive measures that minimize the probability of security breaches, as well as support reactive actions and incident handling when needed. DCD-RI must therefore meet the highest standards for the integrity and functionality of its computing and storage environment, which means redundancy of power supplies (including emergency backup power generators), cooling systems and networking connections (internal and external), as well as fire protection and security access systems.
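The benefit of redundancy can be made concrete with the standard availability formula for components in parallel: if each of n redundant units is available with probability a and failures are independent, the subsystem is down only when all n units fail at once, so its availability is 1 - (1 - a)^n. Subsystems that must all work (power, cooling, network) combine in series by multiplying their availabilities. The sketch below assumes independent failures, which real data centres only approximate (a common cause such as a fire can take out every unit), and the 99% unit availability is purely illustrative, not a DANUBIUS-RI requirement.

```python
HOURS_PER_YEAR = 365 * 24


def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant components when any one of them suffices.

    Assumes independent failures: the subsystem is down only if every unit is down.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if units < 1:
        raise ValueError("at least one unit is required")
    return 1.0 - (1.0 - unit_availability) ** units


def series_availability(*availabilities: float) -> float:
    """Availability of a chain of subsystems that must all work at the same time."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Illustrative figures only: three subsystems (power, cooling, network),
# each built from units that are individually available 99% of the time.
non_redundant = series_availability(0.99, 0.99, 0.99)
duplicated = parallel_availability(0.99, 2)
redundant = series_availability(duplicated, duplicated, duplicated)

for label, a in [("no redundancy", non_redundant), ("duplicated units", redundant)]:
    print(f"{label}: availability {a:.6f}, about {(1 - a) * HOURS_PER_YEAR:.1f} h downtime/year")
```

With these illustrative numbers, the non-redundant chain loses roughly 260 hours a year, while duplicating each subsystem cuts the expected downtime to under 3 hours.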

The basic functionalities of DCD-RI will be delivered by servers and storage systems that provide services to the consortium transparently to users, using adequate software solutions. Two main architectures have to be established: the IT infrastructure (software, hardware and networking) and the facility/engineering infrastructure. Access to DCD-RI resources has to comply with the DANUBIUS Data Policy and take place through a single portal.

DCD-RI

An important role of the IT infrastructure is to secure access to the digital content and services of DCD-RI. An access portal with multiple functions is therefore needed to separate access to public data from access to confidential data and services. The portal will be the main entry point for users: once a user has been authenticated, and if authorized, it redirects them to the services and data corresponding to their role and access rights. The diagram of the general architecture and algorithms of the portal is shown in Figure.
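The portal logic described above (authenticate, then authorize, then redirect to the sections allowed for the user's role) can be sketched as a simple role-based lookup. The role names, section paths and the `route_user` function are hypothetical illustrations; the actual roles and access rights will be defined by the DANUBIUS Data Policy.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from roles to the portal sections they may enter.
ROLE_PERMISSIONS = {
    "public": {"/public-data"},
    "researcher": {"/public-data", "/confidential-data", "/modelling"},
    "administrator": {"/public-data", "/confidential-data", "/modelling", "/admin"},
}


@dataclass
class User:
    name: str
    authenticated: bool = False
    roles: set = field(default_factory=set)


def allowed_sections(user: User) -> set:
    """Union of the sections granted by all of the user's roles.

    Everyone gets the public sections; roles only count after authentication.
    """
    sections = set(ROLE_PERMISSIONS["public"])
    if user.authenticated:
        for role in user.roles:
            sections |= ROLE_PERMISSIONS.get(role, set())  # unknown roles grant nothing
    return sections


def route_user(user: User, requested: str) -> str:
    """Return the section to redirect to, or a login or access-denied page."""
    if requested in allowed_sections(user):
        return requested
    if not user.authenticated:
        return "/login"          # not yet authenticated: send to login first
    return "/access-denied"      # authenticated but not authorized


guest = User("guest")
alice = User("alice", authenticated=True, roles={"researcher"})
print(route_user(guest, "/confidential-data"))  # /login
print(route_user(alice, "/confidential-data"))  # /confidential-data
print(route_user(alice, "/admin"))              # /access-denied
```

Keeping the permissions in a single table means the access rules can be reviewed against the Data Policy without reading the routing code.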

User authentication is based on a federated system of identity providers. This system provides a unified framework that gives users access to different resources by federating Identity Providers (IdPs) and Service Providers (SPs). If the user's partner institution operates an IdP that belongs to the federation, the user logs in with their institutional account and password, so no additional account is needed to access DCD-RI. The portal's authentication system asks the IdP of the user's home institution to validate the user's identity. The same system can also be used to authenticate users for access to data from other research infrastructures.
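The delegation step, in which the portal (acting as Service Provider) relies on the user's home IdP instead of keeping its own password database, can be sketched in simplified form. Real federations exchange signed SAML or OpenID Connect messages and verify them with public-key certificates published in federation metadata; here an HMAC over a small JSON payload stands in for that signature, and the institution domains, keys, users and function names are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical signing key of the home institution's IdP. Real IdPs sign with a
# private key and publish the matching certificate; a shared HMAC key is used
# here only to keep the sketch self-contained.
UNI_A_KEY = b"hypothetical-signing-key-of-uni-a"

# Hypothetical federation metadata as seen by the portal (SP): which institution
# domains belong to the federation and how to check their signatures.
FEDERATION_METADATA = {"uni-a.example": UNI_A_KEY}

# Credentials stay at the home institution; the portal never sees them.
UNI_A_USERS = {"alice": "correct-horse"}


def sign(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def idp_authenticate(key: bytes, users: dict, username: str, password: str) -> Optional[dict]:
    """Home IdP: check the institutional password and, if valid, issue a signed assertion."""
    if users.get(username) != password:
        return None
    payload = json.dumps({"user": username})
    return {"payload": payload, "signature": sign(key, payload)}


def sp_accept(user_id: str, assertion: Optional[dict]) -> bool:
    """Portal (SP): accept the login only if a federated IdP vouches for this user."""
    username, _, domain = user_id.partition("@")
    key = FEDERATION_METADATA.get(domain)
    if key is None or assertion is None:
        return False  # institution not in the federation, or login failed at the IdP
    if not hmac.compare_digest(sign(key, assertion["payload"]), assertion["signature"]):
        return False  # not signed by the expected IdP
    return json.loads(assertion["payload"])["user"] == username


assertion = idp_authenticate(UNI_A_KEY, UNI_A_USERS, "alice", "correct-horse")
print(sp_accept("alice@uni-a.example", assertion))  # True
print(sp_accept("bob@uni-a.example", assertion))    # False: assertion is for alice
print(sp_accept("alice@other.example", assertion))  # False: institution not federated
```

A real assertion also carries an audience restriction and an expiry time so that it cannot be replayed elsewhere or later; the sketch omits both.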

Figure  Access diagram and the general sections of the data

DATA CENTRE

Access and connectivity

The DANUBIUS-RI project offers researchers across the European community an opportunity to collaborate on the study of river-sea systems. Because these researchers are distributed throughout Europe, two issues need to be addressed: their connectivity, and the methods they use to access resources in a data centre located in another geographic region.

Using the services of GEANT, the pan-European research and education network that interconnects Europe's National Research and Education Networks (NRENs), the project's data collection, processing and storage sites can be interconnected. Depending on the importance and security requirements of the data transmitted between sites, GEANT's Layer 3 Virtual Private Network (L3VPN) service can be used; it adds security by isolating the traffic between the sites from ordinary IP traffic.
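The isolation an L3VPN provides rests on the provider's edge routers keeping a separate routing table, a VRF (virtual routing and forwarding instance), for each VPN, so that a packet can only be forwarded to destinations known within its own VPN. The toy model below illustrates that idea with longest-prefix matching from Python's standard `ipaddress` module; the VRF names, prefixes and next-hop labels are hypothetical, and it does not model GEANT's actual service configuration.

```python
import ipaddress


class Router:
    """Toy provider-edge router holding one routing table per VRF."""

    def __init__(self):
        self.vrfs = {}  # VRF name -> list of (network, next hop)

    def add_route(self, vrf, prefix, next_hop):
        self.vrfs.setdefault(vrf, []).append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, vrf, destination):
        """Longest-prefix match restricted to the routes of a single VRF."""
        address = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.vrfs.get(vrf, []) if address in net]
        if not matches:
            return None  # no route inside this VRF: the packet is dropped
        return max(matches, key=lambda match: match[0].prefixlen)[1]


pe = Router()
# Hypothetical private VPN linking project sites; note it has no default route.
pe.add_route("danubius-vpn", "10.10.0.0/16", "supersite-A")
pe.add_route("danubius-vpn", "10.10.5.0/24", "data-centre")
# Ordinary IP traffic uses a separate table.
pe.add_route("internet", "0.0.0.0/0", "upstream")

print(pe.lookup("danubius-vpn", "10.10.5.7"))  # data-centre (the /24 beats the /16)
print(pe.lookup("danubius-vpn", "192.0.2.1"))  # None: VPN traffic cannot leak out
print(pe.lookup("internet", "10.10.5.7"))      # upstream: VPN routes are invisible here
```

Because each lookup only consults the table of the VRF the traffic belongs to, the project's private prefixes are never reachable from the ordinary IP table, and VPN traffic has nowhere to go outside the VPN.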
For managed access to resources hosted in another data centre, federated authentication can be used, implemented through federated identity solutions to which partner institutions connect via an Identity Provider (IdP). eduGAIN, another GEANT service, can be used to implement this federated authentication. eduGAIN interconnects research and education identity federations in Europe and beyond.
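eduGAIN works by aggregating the metadata that each participating federation publishes for the entities it chooses to expose, so that a Service Provider registered in one federation can discover and trust IdPs registered in another. The sketch below shows that aggregation step with plain Python objects; the federation names, entity IDs and the `export_to_edugain` flag are hypothetical stand-ins for real SAML metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    role: str                # "idp" or "sp"
    federation: str
    export_to_edugain: bool  # the home federation decides what it exposes


def build_interfederation_metadata(federations):
    """Merge the entities each federation exports into one aggregate keyed by entity ID.

    If two federations export the same ID, the first one seen is kept; a real
    interfederation resolves such conflicts by policy.
    """
    aggregate = {}
    for entities in federations.values():
        for entity in entities:
            if entity.export_to_edugain:
                aggregate.setdefault(entity.entity_id, entity)
    return aggregate


def trusted_idps(aggregate, local_entities):
    """IdPs an SP can use: those of its own federation plus the exported foreign ones."""
    ids = {e.entity_id for e in local_entities if e.role == "idp"}
    ids |= {e.entity_id for e in aggregate.values() if e.role == "idp"}
    return ids


federations = {
    "federation-X": [
        Entity("https://idp.uni-a.example/idp", "idp", "federation-X", True),
        Entity("https://portal.danubius.example/sp", "sp", "federation-X", True),
    ],
    "federation-Y": [
        Entity("https://idp.institute-b.example/idp", "idp", "federation-Y", True),
        Entity("https://idp.internal-c.example/idp", "idp", "federation-Y", False),
    ],
}

edugain_metadata = build_interfederation_metadata(federations)
print(sorted(trusted_idps(edugain_metadata, federations["federation-X"])))
```

The portal in federation-X can accept users from institute-b in federation-Y, but not from internal-c, which its own federation has chosen not to export.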

GEANT services are accessed through the NRENs, which provide connectivity services to the research and education community within each country.

Computing capacities

At the European level, computing capacity is provided through two main types of infrastructure:

  • High Performance Computing (HPC) infrastructure. The HPC infrastructure is represented by PRACE (Partnership for Advanced Computing in Europe), which provides pan-European HPC infrastructure and services and gives academic and industrial users free access to its supercomputing facilities. Access to PRACE resources is based on peer review: researchers submit their project to a technical committee, which checks compliance with the eligibility requirements and, if the outcome is positive, schedules access to computing time;
  • Distributed computing e-infrastructure. The distributed computing e-infrastructure is the European Grid Infrastructure (EGI), which provides access to high-throughput computing resources in 300 data centres hosting 650,000 processor cores, 285 PB of online storage and 280 PB of archive storage. Access to EGI resources is based on membership in a community-supported research group, such as the ATLAS, ALICE and LHCb collaborations.
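The PRACE access procedure described above (submission, eligibility check, peer review, then scheduling of computing time) can be modelled as a small state machine that refuses to skip steps. The states and transitions below are a hypothetical simplification for illustration and do not reproduce PRACE's actual call procedures.

```python
from enum import Enum, auto


class Status(Enum):
    SUBMITTED = auto()
    ELIGIBLE = auto()
    ACCEPTED = auto()
    SCHEDULED = auto()
    REJECTED = auto()


# Hypothetical allowed transitions: each step either advances the proposal or rejects it.
TRANSITIONS = {
    Status.SUBMITTED: {Status.ELIGIBLE, Status.REJECTED},  # eligibility check
    Status.ELIGIBLE: {Status.ACCEPTED, Status.REJECTED},   # peer review
    Status.ACCEPTED: {Status.SCHEDULED},                   # allocation of computing time
    Status.SCHEDULED: set(),
    Status.REJECTED: set(),
}


class Proposal:
    def __init__(self, title: str):
        self.title = title
        self.status = Status.SUBMITTED
        self.history = [Status.SUBMITTED]

    def advance(self, new_status: Status) -> None:
        """Move to `new_status`, refusing transitions the procedure does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.name} to {new_status.name}")
        self.status = new_status
        self.history.append(new_status)


p = Proposal("hypothetical delta hydrodynamics run")
for step in (Status.ELIGIBLE, Status.ACCEPTED, Status.SCHEDULED):
    p.advance(step)
print([s.name for s in p.history])  # ['SUBMITTED', 'ELIGIBLE', 'ACCEPTED', 'SCHEDULED']
```

Encoding the allowed transitions as data makes it impossible for a proposal to be scheduled without passing both the eligibility check and the review.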

Data storage

Research projects can use the infrastructure of EUDAT, the European e-infrastructure of integrated data services and resources, to store vital information in a secure environment. EUDAT is a collaborative project made up of multiple nodes that provide services for upload and retrieval, identification and description, movement, replication, and data integrity. Many European research groups share storage infrastructure hosted by data centres called service providers. A service provider can serve a single research domain (thematic service provider) or multiple domains (generic service provider). Any researcher or research group can access EUDAT resources in accordance with the terms and conditions agreed between EUDAT and the relevant service provider.
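Replication and data integrity, two of the service areas listed above, typically rely on recording a checksum for each object and comparing it against every replica. A minimal in-memory sketch follows; the node names, the `Replicator` class and the choice of SHA-256 are assumptions for illustration, the content-derived identifier merely stands in for a persistent identifier, and EUDAT's own services are not reproduced here.

```python
import hashlib


class Replicator:
    """Toy storage service that keeps each object on several nodes and checks integrity."""

    def __init__(self, node_names, copies=2):
        if copies > len(node_names):
            raise ValueError("not enough nodes for the requested number of copies")
        self.nodes = {name: {} for name in node_names}  # node -> {identifier: bytes}
        self.copies = copies
        self.checksums = {}  # identifier -> expected SHA-256 hex digest

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        identifier = "obj-" + digest[:12]  # hypothetical stand-in for a persistent identifier
        self.checksums[identifier] = digest
        for name in list(self.nodes)[: self.copies]:
            self.nodes[name][identifier] = data
        return identifier

    def verify(self, identifier: str) -> dict:
        """Map each node holding the object to whether its copy matches the checksum."""
        expected = self.checksums[identifier]
        return {name: hashlib.sha256(store[identifier]).hexdigest() == expected
                for name, store in self.nodes.items() if identifier in store}

    def repair(self, identifier: str) -> list:
        """Overwrite corrupted replicas with an intact copy; return the repaired nodes."""
        status = self.verify(identifier)
        good = next((name for name, ok in status.items() if ok), None)
        if good is None:
            raise RuntimeError("no intact replica left")
        repaired = [name for name, ok in status.items() if not ok]
        for name in repaired:
            self.nodes[name][identifier] = self.nodes[good][identifier]
        return repaired

    def retrieve(self, identifier: str) -> bytes:
        """Return the first replica whose checksum matches."""
        for name, ok in self.verify(identifier).items():
            if ok:
                return self.nodes[name][identifier]
        raise RuntimeError("no intact replica left")


service = Replicator(["node-1", "node-2", "node-3"], copies=2)
pid = service.upload(b"salinity,2.0\n")
service.nodes["node-2"][pid] = b"corrupted"  # simulate silent corruption on one node
print(service.verify(pid))  # {'node-1': True, 'node-2': False}
print(service.repair(pid))  # ['node-2']
```

Storing the checksum separately from the replicas is what allows silent corruption on one node to be detected and repaired from another.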

The main components are:

– DANUBIUS-RI storage systems, which include several storage components:

  • the storage for the main repository,
  • the linked storage of DANUBIUS-RI members,
  • the storage for the cloud system,
  • the storage for the high-performance cluster,
  • the backup storage for all data and software.

– Cloud computing, providing the computing resources needed to operate all services (the DANUBIUS-RI portal, the metadata registry, and the software for data ingestion, further processing and storage) and to perform all operations on the data flow.

– High performance computing (HPC) services to be used by the modelling and analysis nodes.