ABSTRACT: Cloud computing is becoming an adoptable technology for many organizations because of its dynamic scalability and its delivery of virtualized resources as a service over the Internet. It is growing rapidly, with applications in almost every area, including academia. Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction [1]. High Performance Computing (HPC) enables scientists and researchers to solve complex problems that require substantial computing capability. Cluster computing has become popular in academia and industry, and clusters of servers are used for a variety of distributed applications such as simulation, data analysis, and web services. Hence, one of the areas that needs further research is the handling of large data sets in the cloud, since large data sets are one of the defining characteristics of HPC applications. In this paper, we discuss a private cloud infrastructure implementation for each university using OpenStack with HPC in mind, and then create a collaboration mechanism among the universities' private clouds throughout the country using a federated cloud architecture.
I. INTRODUCTION
With the rapid development of processing and storage technologies and the success of the Internet, computing resources have become cheaper, more powerful, and more ubiquitously available than ever before. This technological trend has enabled the realization of a new computing model called cloud computing, in which resources (e.g., CPU and storage) are provided as general utilities that can be leased and released by users through the Internet in an on-demand fashion. Nowadays, governments, academic institutions, research centers, and other governmental and non-governmental institutions are adopting cloud computing as a solution for ever-increasing IT-related problems and needs. For example, many academic institutions use Google's email application as their enterprise email system, and individuals routinely store files in cloud storage such as Google Drive, Dropbox, and SurDoc. So, in one way or another, we are already using cloud offerings. At present, the cloud provides services beyond the common service models of SaaS, PaaS, and IaaS; it is also used as High Performance Computing infrastructure. Cloud computing presents a unique opportunity for batch processing and analytics jobs that analyze terabytes of data and can take hours to finish.
Cloud technologies such as Google MapReduce, Google File System (GFS), Hadoop and the Hadoop Distributed File System (HDFS), Microsoft Dryad, and CGL-MapReduce adopt a more data-centered approach to parallel runtimes [2][3]. In these frameworks, the data is staged in the data/compute nodes of clusters or large-scale data centers. The main goal of this paper is to create a framework for a cloud-based HPC infrastructure for Ethiopian universities. The motivation for this work lies in the EthERNet project [4], which aims to build and deliver highly interconnected, high performance networks for universities and other educational and research institutions in Ethiopia. More specifically, EthERNet aims to build and deliver high performance networking that connects these institutions with each other and with similar institutions in the world, and by doing so to enable them to share educational resources and collaborate both within Ethiopia and globally.
II. RELATED WORK
The Computational Intelligence Research Group (CIRG) at the University of Pretoria, South Africa [5], conducts research on CI algorithms, but its students face the challenge that the problems they are trying to solve are not trivial. This means that the search space for CI algorithms can become extremely large, resulting in very computationally expensive workloads. To achieve statistically significant results, each student's workload needs to include on the order of thousands of experiments with different parameters, inputs, problem types, and so on. Each student's workload could potentially take days or weeks, and in some extreme cases even months, to compute on a single workstation running 24 hours a day, 7 days a week. Students from CIRG attempted to solve this challenge by running their experiments on more than one workstation simultaneously. This provided some improvement in throughput, scalability, and failover, but it introduced many problems in scheduling and management.
In the end, they turned to cloud computing to automate their workloads. Cloud computing describes both a platform and a type of application. A cloud computing platform dynamically provisions, configures, reconfigures, and de-provisions servers as needed. Servers in the cloud can be physical machines or virtual machines. A cloud is a pool of virtualized computing resources that can:
• Host a variety of different workloads, including batch-style back-end jobs and interactive, user-facing applications
• Allow workloads to be deployed and scaled out quickly through the rapid provisioning of virtual or physical machines
• Support redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures
• Monitor resource use in real time to enable rebalancing of allocations when needed
For these researchers, cloud computing thus simplifies the management, scheduling, and booking of computing resources. Another technology they used to automate their research was building grid applications with Apache Hadoop [6], an open-source framework for running parallel computing applications on large clusters of commodity hardware. Apache Hadoop is based on the MapReduce model. MapReduce [7] is a programming model that allows a large task to be broken down (or mapped) into multiple smaller tasks that can be processed as individual jobs. The reduce function combines the output of all the smaller jobs in a specified manner to produce the output of the original large task.
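As a brief illustration of this model, the following minimal Python sketch (the input data and function names are our own, not part of Hadoop) simulates the map and reduce phases of a word-count job, the canonical MapReduce example:

    from collections import defaultdict

    # Hypothetical input: in a real job each "document" would be a split of a large file
    documents = [
        "cloud computing for hpc",
        "hpc in the cloud",
    ]

    def map_phase(doc):
        # Map: emit (word, 1) pairs; each document is an independent small task
        return [(word, 1) for word in doc.split()]

    def reduce_phase(pairs):
        # Reduce: combine the outputs of all map tasks into the final word counts
        counts = defaultdict(int)
        for word, count in pairs:
            counts[word] += count
        return dict(counts)

    # Hadoop would run the map tasks in parallel across cluster nodes;
    # here they run sequentially for clarity.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    print(reduce_phase(intermediate))
    # -> {'cloud': 2, 'computing': 1, 'for': 1, 'hpc': 2, 'in': 1, 'the': 1}

In a real deployment, the framework distributes the map tasks across the cluster and shuffles the intermediate pairs to the reducers automatically.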
The Apache Hadoop framework takes care of job management aspects such as keeping track of which jobs run on which
nodes, which jobs complete successfully, which jobs need to be restarted due to failures, and other tasks.
As Hadoop jobs run on a distributed cluster, data management is of key importance. Apache Hadoop uses the Hadoop Distributed File System (HDFS) to create multiple replicas of data items across different nodes in the cluster. This replication increases data reliability through redundancy. Data is also kept close to the computing resources that use it, which increases performance.
The combination of MapReduce algorithms and HDFS enables parallel grid applications to be developed and
deployed rapidly and easily, with minimal development time being spent on grid management aspects.
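For example, staging input data into HDFS is done with the standard hdfs dfs command-line tools; the sketch below (directory and file names are hypothetical) uploads a data set and requests three replicas of it, which gives Hadoop both redundancy and data locality:

    import subprocess

    def hdfs(*args):
        # Run an 'hdfs dfs' command and raise an error if it fails
        subprocess.run(["hdfs", "dfs", *args], check=True)

    hdfs("-mkdir", "-p", "/user/cirg/experiments/input")                 # create an HDFS directory
    hdfs("-put", "local_results.csv", "/user/cirg/experiments/input/")   # upload local data
    hdfs("-setrep", "-w", "3", "/user/cirg/experiments/input")           # ensure 3 replicas of each block
    hdfs("-ls", "/user/cirg/experiments/input")                          # verify the files are in place

The cluster-wide default replication factor is normally set through the dfs.replication property in hdfs-site.xml; -setrep overrides it for a specific path.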
Using the IBM cloud and Hadoop, the CIRG students at the University of Pretoria realized a number of benefits [6]. A more economical solution for acquiring the necessary computational resources is cloud computing. A common pattern is to have bulk data that needs to be transformed, where the processing of each data item is essentially independent of the other data items; that is, a single-instruction multiple-data (SIMD) style of algorithm. Hadoop Core provides an open source framework for cloud computing, as well as a distributed file system.
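To make this SIMD-style pattern concrete, the sketch below launches a Hadoop Streaming job; the jar location, HDFS paths, and the mapper/reducer script names are hypothetical and depend on the installation. Each input record is handed to an independent mapper task, and the reducer aggregates the per-record outputs:

    import subprocess

    # Hypothetical path; the streaming jar location varies by Hadoop distribution
    STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"

    subprocess.run([
        "hadoop", "jar", STREAMING_JAR,
        "-input", "/user/cirg/experiments/input",       # each record is processed independently (SIMD-style)
        "-output", "/user/cirg/experiments/output",
        "-mapper", "python run_experiment.py",          # user-supplied script applied to every record
        "-reducer", "python aggregate_results.py",      # combines per-record outputs into final statistics
        "-file", "run_experiment.py",
        "-file", "aggregate_results.py",
    ], check=True)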
In [8], HPC Cloud is described as a continuation of the cloud's main philosophy with one key difference: because virtualization is not suitable for all workloads, an HPC cloud must support both virtualized and direct-access computing resources. This allows workloads that can be virtualized to be scaled with demand without interfering with other physical hosts. Virtualization gives HPC a flexibility it has not had before. As the processing core density of compute nodes increases, a single operating system per node starts to make less sense. With virtualization, a single node can run multiple operating systems at the same time, allowing multiple users to share the same resource. This allows the HPC infrastructure to achieve higher utilization from users within its organization and to justify its continued investment. Virtualization has another key benefit: it decouples a user's job from the physical resource running it. There are two main implications for the level of HPC machine utilization in universities [9]. First, universities need HPC machines mainly to solve computationally demanding problems. Second, it is very hard to accommodate the hardware resources required for HPC. An example of universities using HPC is the Virtual Computing Lab (VCL).
Figure 2: Proposed Framework for Cloud Enabled HPC for Ethiopian Universities
To take advantage of the flexibility, scalability, and autonomy possible within an HPC environment, universities can leverage the native abilities of the open source offerings provided by Apache and OpenStack. To have a fully scalable and flexible HPC environment, it must run on a private cloud environment that provides both storage and compute nodes. To do that, the universities must build the private cloud first and then add HPC. At this point, Swift, Nova, and RabbitMQ are certainly needed, as well as controller nodes for managing and maintaining the environment.
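As a sketch of what this first step looks like in practice (the cloud name, image, flavor, and network names below are hypothetical), a compute instance can be provisioned on the Nova side of the private cloud with the openstacksdk Python library:

    import openstack

    # Connect using credentials defined in clouds.yaml; the cloud name is a placeholder
    conn = openstack.connect(cloud="university-private-cloud")

    image = conn.compute.find_image("ubuntu-20.04")     # hypothetical Glance image
    flavor = conn.compute.find_flavor("hpc.large")      # hypothetical flavor sized for HPC workers
    network = conn.network.find_network("hpc-net")      # hypothetical tenant network

    server = conn.compute.create_server(
        name="hpc-worker-01",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)       # block until the instance is ACTIVE
    print(server.name, server.status)

Nova schedules the instance onto a compute node, RabbitMQ carries the messages between the OpenStack services involved, and Swift provides the object storage for the environment.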
In order to integrate these technologies, namely the OpenStack-based private cloud infrastructure and the Hadoop-based cluster, we use a special API called the Savanna controller; this controller allows us to offer HPC as a service to cloud users. Figure 4 shows the Savanna API integrating the Hadoop infrastructure and the OpenStack infrastructure.
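As a hedged sketch of how such a request might look (the endpoint, tenant id, token, template id, and image id below are placeholders, and the payload fields follow our reading of the Savanna v1.1 REST API, which may differ between releases), a Hadoop cluster can be requested from the Savanna controller over HTTP:

    import requests

    # Placeholder endpoint and credentials; the token is obtained from Keystone beforehand
    SAVANNA_URL = "http://controller:8386/v1.1/<tenant-id>"
    HEADERS = {"X-Auth-Token": "<keystone-token>", "Content-Type": "application/json"}

    cluster_request = {
        "name": "hpc-hadoop-cluster",
        "plugin_name": "vanilla",                  # the plain Apache Hadoop plugin
        "hadoop_version": "1.2.1",
        "cluster_template_id": "<template-id>",    # template defines master/worker node groups
        "default_image_id": "<hadoop-image-id>",   # Glance image with Hadoop pre-installed
    }

    resp = requests.post(f"{SAVANNA_URL}/clusters", json=cluster_request, headers=HEADERS)
    resp.raise_for_status()
    print(resp.json())    # returns the cluster description, including its provisioning status

The controller then uses Nova to boot the requested node groups and configures Hadoop on them, so that cloud users receive a ready-to-use cluster rather than raw virtual machines.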
BIOGRAPHY
Mr. Samuel Fentahuen is a graduate student in Software Engineering at Adama Science and Technology University, Ethiopia. He has more than six years of system administration and system development experience, and his areas of interest include Cloud Computing, Data Centers, System Development, System Administration, Linux, VMware, and Hyper-V.