Jan Krüger; Bielefeld University, Bielefeld
Infrastructure-as-a-Service (IaaS) is a cloud computing model in which virtualized IT infrastructure is made available to users over the Internet. Together with Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS), it is one of the three general cloud service models. In the IaaS model, users manage the operating system, middleware, applications and data, while the provider is responsible for virtualization, storage, networking and servers. As a result, users do not need a local data center and avoid the administrative overhead of maintaining and updating hardware and software components. Users control the infrastructure via an application programming interface (API) or a graphical user interface (dashboard). IaaS makes it easy to scale, update and add resources as required.
Making use of available IaaS resources is often a challenge for users unfamiliar with cloud environments. While launching tens or even hundreds of virtual machines (VMs) is easy using the API or dashboard, a whole software stack must be deployed on these VMs to fully utilize the resources. Many bioinformatics workflows rely on classical high-performance computing (HPC) environments, in which a scheduling system distributes compute jobs across an HPC cluster. To move existing workflows to the cloud easily, an equivalent environment has to be set up there.
BiBiGrid [1] is an open-source tool for easy cluster setup in a cloud environment. BiBiGrid is independent of the operating system and cloud provider. It currently provides backend implementations for Amazon Web Services (AWS), Google Compute Engine, Microsoft Azure and OpenStack.
Starting a cluster requires a valid configuration file and credentials of the cloud provider.
The configuration file specifies the composition of the requested cluster. During resource instantiation, BiBiGrid configures the network, local and network volumes, and (network) file systems, and deploys the software so that the started cluster can be used immediately. When pre-installed images are used, a fully configured, ready-to-use cluster is available within a few minutes.
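To make the idea of a cluster "composition" more concrete, the following is a minimal Python sketch of the kind of information such a configuration might carry. It is an illustration only: the class names, field names and provider identifiers (`NodeGroup`, `ClusterSpec`, `"aws"`, and so on) are hypothetical and do not reflect BiBiGrid's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical provider identifiers, chosen for this sketch only.
SUPPORTED_PROVIDERS = {"aws", "google", "azure", "openstack"}


@dataclass
class NodeGroup:
    """A group of identical VMs (hypothetical model, not BiBiGrid's schema)."""
    instance_type: str
    count: int
    cores_per_node: int


@dataclass
class ClusterSpec:
    """Illustrative description of a requested cluster."""
    provider: str
    image: str
    master: NodeGroup
    workers: List[NodeGroup] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if self.provider not in SUPPORTED_PROVIDERS:
            problems.append(f"unsupported provider: {self.provider}")
        if self.master.count != 1:
            problems.append("exactly one master node is expected")
        for group in self.workers:
            if group.count < 1:
                problems.append(f"empty worker group: {group.instance_type}")
        return problems

    def total_worker_cores(self) -> int:
        """Sum the cores across all worker groups."""
        return sum(g.count * g.cores_per_node for g in self.workers)


if __name__ == "__main__":
    spec = ClusterSpec(
        provider="openstack",
        image="ubuntu-cloud-image",  # placeholder image name
        master=NodeGroup("medium", 1, 4),
        workers=[NodeGroup("large", 3, 8), NodeGroup("small", 2, 2)],
    )
    print(spec.validate(), spec.total_worker_cores())
```

Checking a specification like this before any resources are requested is one way to catch mistakes early, since a misconfigured cluster would otherwise only fail after VMs have been launched.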
BiBiGrid uses Ansible to configure standard Ubuntu and Debian cloud images. Depending on the configuration, BiBiGrid can set up an HPC cluster for grid computing (Slurm Workload Manager), a shared file system (NFS on local disks and attached volumes), a cloud IDE for writing, running and debugging code (Theia Web IDE), and a monitoring system (Zabbix). Custom Ansible scripts can be used to further customize the cluster after its initial setup.
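Once a Slurm-based cluster is running, existing workflows would typically hand their work to the scheduler as batch scripts. The sketch below shows one way such a script could be generated in Python. The `JobSpec` class, the `render_sbatch` function and the example command are hypothetical helpers introduced here for illustration and are not part of BiBiGrid; the `#SBATCH` directives themselves are standard Slurm options.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical description of one batch job (not part of BiBiGrid)."""
    name: str
    command: str
    nodes: int = 1
    ntasks: int = 1
    time_limit: str = "01:00:00"  # Slurm wall-time limit, HH:MM:SS


def render_sbatch(job: JobSpec) -> str:
    """Render a Slurm batch script using standard #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --ntasks={job.ntasks}",
        f"#SBATCH --time={job.time_limit}",
        f"#SBATCH --output={job.name}-%j.out",  # %j expands to the job ID
        "",
        f"srun {job.command}",  # launch the command under Slurm's control
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command; replace with the workflow step to run.
    job = JobSpec(name="align", command="./run_alignment.sh", nodes=2, ntasks=8)
    print(render_sbatch(job))
```

The resulting text could be written to a file and submitted with `sbatch`, which is the same submission path a workflow would use on a classical HPC cluster.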