Unit I & II Cloud Computing
Cloud computing refers to manipulating, configuring, and accessing applications online. It offers online data
storage, infrastructure, and applications, and combines software- and hardware-based computing
resources delivered as a network service. Example: Suppose we want to install MS Word on our organization’s
computers. We have to buy the CD/DVD and install it, or set up a software distribution server to automatically
install the application on each machine. Every time Microsoft issues a new version, we have to perform the same
task. If instead some other company hosts the application, it handles the cost of the servers and manages the software
updates, and customers are charged according to their usage. This reduces the cost of using the
software as well as the cost of installing heavy servers. Additionally, the cloud helps reduce
electricity bills.
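The cost argument above can be made concrete with a small calculation. The sketch below is illustrative only: the function names and every price in it are assumptions, not figures from any vendor.

```python
def owned_cost(license_fee: float, install_cost: float, new_versions: int) -> float:
    """Cost of buying and installing the software ourselves.

    The purchase and installation are repeated for every new version.
    """
    return (license_fee + install_cost) * (1 + new_versions)


def pay_per_use_cost(hours_used: float, rate_per_hour: float) -> float:
    """Cost when a provider hosts the software and bills by usage."""
    return hours_used * rate_per_hour


if __name__ == "__main__":
    # All numbers are made up for illustration.
    print(owned_cost(license_fee=200, install_cost=50, new_versions=2))
    print(pay_per_use_cost(hours_used=100, rate_per_hour=0.5))
```

Which option is cheaper depends entirely on the assumed usage and prices; the point is only that the hosted model turns a repeated up-front cost into a usage-based one.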
In the public cloud (or external cloud), computing resources are dynamically provisioned over the Internet via Web
applications or Web services from an off-site third-party provider. Public clouds are run by third parties, and
applications from different customers are likely to be mixed together on the cloud’s servers, storage systems, and
networks.
Private cloud (or internal cloud) refers to cloud computing on private networks. Private clouds are built for the
exclusive use of one client, providing full control over data, security, and quality of service. Private clouds can be
built and managed by a company’s own IT organization or by a cloud provider.
A hybrid cloud environment combines multiple public and private cloud models. Hybrid clouds introduce the
complexity of determining how to distribute applications across both a public and private cloud.
Clients: Devices that end users interact with to manage their information on the cloud. There can be different types of
clients, such as:
Mobile Clients: Include PDAs or smartphones, like a BlackBerry, Windows Mobile smartphone, or an iPhone.
Thin Clients: Computers that do not have internal hard drives; they let the server do all the work and then
display the information.
Thick Clients: Regular computers that use a web browser like Firefox or Internet Explorer to
connect to the cloud.
A thin client is a computing device that's connected to a network. Unlike a typical PC or “fat client,” which has the
memory, storage and computing power to run applications and perform computing tasks on its own, a thin client
functions as a virtual desktop, using the computing power residing on the networked servers.
Advantages of Using Thin Clients
Thin clients are becoming an increasingly popular solution, because of their price and effect on the environment.
Lower hardware costs: Thin clients are cheaper than thick clients because they do not contain as much hardware.
They also last longer before they need to be upgraded or become obsolete.
Lower IT costs: Thin clients are managed at the server and there are fewer points of failure.
Security: Since the processing takes place on the server and there is no hard drive, there’s less chance of malware
invading the device. Also, since thin clients don’t work without a server, there’s less chance of them being
physically stolen.
Data security: Since data is stored on the server, there’s less chance for data to be lost if the client computer
crashes or is stolen.
Less power consumption: Thin clients consume less power than thick clients. This means you’ll pay less to power
them, and you’ll also pay less to air-condition the office.
Ease of repair or replacement: If a thin client dies, it’s easy to replace. The box is simply swapped out and the
user’s desktop returns exactly as it was before failure.
Less noise: Without a spinning hard drive, less heat is generated and quieter fans can be used on the thin client.
Datacenter: The datacenter is a collection of servers where the application to which you subscribe is housed. It could be a
large room in the basement of your building or a room full of servers on the other side of the world that you access
via the Internet. There is a growing trend in the IT world of virtualizing servers: software can be installed
that allows multiple virtual server instances to run on a single machine. There can be half a dozen virtual servers running on one
physical server.
Distributed Servers: The distributed servers are in geographically disparate locations. They give the service
provider more flexibility in options and security. For instance, Amazon runs its cloud solution on servers all over
the world. If something were to happen at one site, causing a failure, the service could still be accessed through
another site.
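The failover idea described above can be sketched in a few lines. This is a minimal illustration, not how any particular provider routes traffic; the site names and the health map are hypothetical.

```python
from typing import Callable, Iterable, Optional


def first_available(sites: Iterable[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first site that reports healthy, or None if all are down."""
    for site in sites:
        if is_up(site):
            return site
    return None


if __name__ == "__main__":
    # Hypothetical health status: the primary site has failed.
    health = {"site-a": False, "site-b": True, "site-c": True}
    print(first_available(["site-a", "site-b", "site-c"], lambda s: health[s]))
```

A real deployment would typically rely on DNS or load-balancer health checks rather than client-side iteration, but the principle is the same: a request is served from another location when one site fails.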
Eucalyptus:
Eucalyptus is an open-source cloud computing platform designed to enable users to create and manage their own
Infrastructure as a Service (IaaS) environments. It allows organizations to set up private clouds that can operate
similarly to public cloud services like Amazon Web Services (AWS). Eucalyptus was initially developed by
researchers at the University of California, Santa Barbara, and has since evolved into a robust platform used for
cloud computing.
Key Features
1. Open Source:
Eucalyptus is open-source, meaning that organizations can freely access, modify, and distribute the software. This
allows for customization according to specific needs.
2. AWS Compatibility:
One of Eucalyptus’s standout features is its compatibility with Amazon Web Services APIs. This enables users
to leverage existing AWS tools and applications, facilitating easier migration to Eucalyptus for those familiar with
AWS.
3. Flexible Architecture:
Eucalyptus supports a flexible architecture that allows for the integration of various cloud resources, including
storage, computing, and networking.
4. Multi-Cloud Support:
It allows users to create hybrid cloud environments, enabling the combination of private and public cloud
resources.
6. User Management:
It provides features for user authentication and authorization, ensuring secure access to cloud resources.
Components
2. Walrus:
Walrus is the storage component of Eucalyptus, similar to Amazon S3. It provides object storage capabilities,
allowing users to store and retrieve data.
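To illustrate what "object storage" means here, the following is a toy in-memory model of buckets and keys. It is not the Walrus or Amazon S3 API; the class and method names are assumptions chosen for the example.

```python
from typing import Dict, List


class ObjectStore:
    """Toy object store: named buckets, each mapping keys to byte strings."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        # Creating an existing bucket is a no-op in this sketch.
        self._buckets.setdefault(name, {})

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str) -> List[str]:
        return sorted(self._buckets[bucket])


if __name__ == "__main__":
    store = ObjectStore()
    store.create_bucket("reports")
    store.put_object("reports", "2024/q1.txt", b"quarterly data")
    print(store.get_object("reports", "2024/q1.txt"))
    print(store.list_keys("reports"))
```

Unlike a file system, an object store has no real directories: a key such as "2024/q1.txt" is just a flat name within its bucket.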
Nimbus:
1. Definition:
Nimbus is a term that can refer to several different concepts across various fields, including weather phenomena,
cloud computing, and even cultural or artistic references. Below are some of the most prominent meanings:
2. Meteorological Meaning:
In meteorology, "nimbus" refers to a type of cloud that is associated with precipitation. The term derives from
Latin, meaning "rain."
3. Cloud Computing:
In the realm of technology, "Nimbus" can refer to various cloud computing platforms and services, particularly
those that offer Infrastructure as a Service (IaaS) or Software as a Service (SaaS).
Nimbus (Software): Some software solutions and platforms use the name "Nimbus" to suggest flexibility and
scalability, akin to the cloud itself. For example, Nimbus Note is a productivity app that helps users organize notes,
tasks, and information in a cloud-based environment.
4. Cultural References:
The term "nimbus" is also used in art and religion, particularly to describe a halo or glowing aura surrounding a
divine or holy figure. This representation signifies sanctity and otherworldliness.
OpenNebula: OpenNebula is an open-source cloud computing platform designed to manage virtualized data centers and
enable the creation of private and public clouds. Below is a detailed overview of OpenNebula, covering its
architecture, features, use cases, and more.
Overview
What is OpenNebula? OpenNebula is an open-source software solution that enables the deployment and
management of virtualized infrastructures. It provides a complete cloud management stack, allowing organizations
to build their own private clouds or hybrid clouds by integrating with public cloud services.
Architecture
Key Components:
Frontend: The central management component, providing a web interface and command-line tools for
managing resources.
Data Stores: Repositories for storing images, templates, and other data required for virtual machines
(VMs).
Virtualization Hosts: Physical servers or nodes that run virtual machines, which can be managed using
various virtualization technologies (e.g., KVM, VMware, LXC).
Networking: Networking components allow the management of virtual networks, security groups, and IP
addressing.
Key Features of OpenNebula:
Multi-Cloud Management: Ability to integrate with public clouds (e.g., AWS, Azure) and manage them
alongside on-premises resources.
Elasticity: Support for scaling resources up or down based on demand, enabling efficient resource
utilization.
Self-Service Portal: Provides a user-friendly web interface for end-users to manage their VMs, networks,
and storage without needing to interact directly with the administrators.
Templates and Snapshots: Users can create reusable templates for VM deployments and take snapshots
for easy recovery.
Monitoring and Reporting: Built-in tools for monitoring the performance and resource usage of the cloud
infrastructure.
Security and Access Control: Role-based access control (RBAC) to ensure that users have the appropriate
permissions for managing resources.
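The role-based access control mentioned in the last feature can be sketched as a lookup from roles to permitted actions. The roles and action names below are hypothetical and do not reflect OpenNebula's actual permission model.

```python
from typing import Dict, Set

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"vm:create", "vm:delete", "network:manage"},
    "user": {"vm:create"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role grants the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("admin", "vm:delete"))
    print(is_allowed("user", "vm:delete"))
```

Permissions attach to roles rather than to individual users, so changing what a group of users may do only requires editing one role entry.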
CloudSim is a simulation framework designed for modeling and simulating cloud computing
environments. It is widely used in research and development to facilitate the study of cloud computing
resource management, scheduling, and other operational strategies. Below is a detailed overview of
CloudSim, including its architecture, features, use cases, and more.
Overview
What is CloudSim? CloudSim is an open-source toolkit developed primarily at the University of
Melbourne. It allows researchers and developers to create simulations of cloud computing environments,
enabling the analysis of various cloud services, resource management, and workload management
strategies.
Key Features
Main Features of CloudSim:
Resource Management Simulation: CloudSim can simulate various cloud resources, including servers,
storage, and network capabilities.
Dynamic Resource Provisioning: Users can model and evaluate the impact of different provisioning
strategies in a cloud environment.
Customizable: It allows for customization of cloud data centers, virtual machines (VMs), and user
applications.
Workload Modeling: Users can create various workload models to simulate different types of applications
and their resource requirements.
Extensible: CloudSim's architecture is modular, allowing researchers to extend its capabilities with new
algorithms and components.
Cloud Service Modeling: It supports the simulation of Infrastructure as a Service (IaaS) and can be
adapted for Platform as a Service (PaaS) and Software as a Service (SaaS) models.
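The kind of question such a simulator answers can be shown with a toy model. CloudSim itself is a Java toolkit; the Python sketch below does not use its API. It assumes tasks are measured in millions of instructions (MI), VMs are rated in MIPS, and tasks are assigned to VMs in round-robin order.

```python
from typing import List


def simulate_round_robin(task_lengths_mi: List[float], vm_mips: List[float]) -> List[float]:
    """Assign tasks to VMs in turn and return each VM's total busy time in seconds."""
    busy = [0.0] * len(vm_mips)
    for i, length in enumerate(task_lengths_mi):
        vm = i % len(vm_mips)              # round-robin placement
        busy[vm] += length / vm_mips[vm]   # execution time = length / speed
    return busy


if __name__ == "__main__":
    busy_times = simulate_round_robin([1000, 2000, 3000], [1000, 500])
    print(busy_times)       # per-VM busy time
    print(max(busy_times))  # makespan: when the last VM finishes
```

Swapping the placement rule (for example, always choosing the least-loaded VM) and comparing the resulting makespan is the sort of scheduling experiment a simulation framework makes cheap to run.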
Advantages
1. Cost-Effective: Being open-source reduces licensing costs associated with proprietary cloud solutions.
2. Customization: Organizations can tailor the platform to their specific requirements.
3. Control and Security: A private cloud setup gives organizations greater control over their data and security
measures.
4. Familiarity with AWS: Compatibility with AWS APIs makes it easier for users familiar with AWS to transition
to Eucalyptus.
Disadvantages
1. Complexity: Setting up and managing a Eucalyptus cloud can be complex and may require specialized
knowledge.
2. Community Support: As an open-source platform, support may rely more on community forums than on
dedicated customer service.
Multitenancy: Multitenancy allows multiple users to make use of the same shared resources. Modern applications
in areas such as banking, finance, social networking, e-commerce, and B2B are deployed in cloud environments that
support multitenant applications.
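A minimal sketch of multitenancy: several tenants share one storage structure, and every read is filtered by tenant ID so that no tenant sees another's records. The class and tenant names are hypothetical.

```python
from typing import Dict, List, Tuple


class SharedStore:
    """One shared table holding records for all tenants."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Dict[str, str]]] = []

    def insert(self, tenant_id: str, record: Dict[str, str]) -> None:
        # Every row is tagged with the tenant that owns it.
        self._rows.append((tenant_id, record))

    def query(self, tenant_id: str) -> List[Dict[str, str]]:
        # Reads are always scoped to a single tenant.
        return [record for owner, record in self._rows if owner == tenant_id]


if __name__ == "__main__":
    store = SharedStore()
    store.insert("bank-a", {"account": "001"})
    store.insert("shop-b", {"order": "A17"})
    print(store.query("bank-a"))
```

Production systems enforce this isolation at several layers (database, network, virtualization); the sketch shows only the data-level idea.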
Service Oriented Architecture (SOA): SOA is essentially a collection of services which communicate with each
other. SOA provides a loosely integrated suite of services that can be used within multiple business domains (Figure
7). This approach is usually implemented using the Web service model.
Microsoft Azure
Overview: Azure is a cloud computing platform created by Microsoft, offering a range of services that integrate
seamlessly with other Microsoft products.
Key Features:
Hybrid Capabilities: Strong support for hybrid cloud solutions, allowing businesses to integrate on-premises
resources with cloud services.
Wide Service Range: Offers services like Azure Virtual Machines, Azure Functions, and Azure SQL Database.
Use Cases: Ideal for businesses already using Microsoft software, supporting applications, data storage, and
DevOps processes.
IBM Cloud
Overview: IBM Cloud combines IaaS and PaaS offerings with a strong emphasis on enterprise solutions and
hybrid cloud environments.
Key Features:
AI Integration: Incorporates IBM Watson for AI services and analytics.
Enterprise Focus: Tailored for large organizations needing secure and compliant solutions.
Use Cases: Often used in industries like finance, healthcare, and manufacturing, where security and compliance
are critical.
Oracle Cloud
Overview: Oracle Cloud provides cloud infrastructure and applications, especially known for its database
offerings.
Key Features:
Database Services: Oracle Database as a Service (DBaaS) offers powerful database management in the cloud.
Enterprise Applications: Includes Oracle ERP, HCM, and other applications for business management.
Use Cases: Frequently chosen by enterprises for its robust database capabilities and applications.
Alibaba Cloud
Overview: Alibaba Cloud is the largest cloud service provider in Asia and has been expanding globally.
Key Features:
Comprehensive Services: Offers a wide range of services including data storage, cloud security, and big data
solutions.
Focus on Asia: Tailored to meet the needs of businesses in the Asian market, with compliance and support for
local regulations.
Use Cases: Increasingly used by businesses operating in Asia, as well as global companies looking to expand in
the region.
Salesforce
Overview: Salesforce is primarily known for its customer relationship management (CRM) solutions, but also
offers a cloud platform for developers.
Key Features:
CRM Focus: Provides tools for sales, customer service, and marketing.
App Development: Offers the Salesforce Platform for building custom applications.
Use Cases: Widely adopted in sales and customer service sectors, enhancing customer engagement and
relationship management.
DigitalOcean
Overview: DigitalOcean is a cloud provider aimed at developers, offering simplicity and ease of use.
Key Features:
Developer-Friendly: Provides straightforward interfaces and API access for easy management of cloud
resources.
Cost-Effective: Known for its competitive pricing, making it attractive for startups and small businesses.
Use Cases: Popular among individual developers and small teams for deploying applications and websites
quickly.
VMware
Overview: VMware specializes in virtualization and cloud infrastructure solutions, helping organizations manage
their IT environments.
Key Features:
Virtualization Technology: Offers powerful tools for managing virtual machines and data centers.
Hybrid Cloud Solutions: Supports hybrid cloud strategies to integrate on-premises and cloud resources.
Use Cases: Commonly used by enterprises looking to optimize their existing infrastructure while transitioning to
the cloud.
Rackspace
Overview: Rackspace provides managed cloud services, helping businesses manage their cloud environments
effectively.
Key Features:
Managed Support: Offers expert support for various cloud platforms, including AWS, Azure, and Google
Cloud.
Custom Solutions: Tailors cloud solutions to meet specific business needs.
Use Cases: Suitable for companies needing assistance with cloud management, migration, and optimization.
environment for learning, teaching, experimenting, etc. to students, faculty members, and
researchers. Everyone associated with the field can connect to the cloud of their organization and
access data and information from there.
Technology-Enhanced Learning or Education as a Service (EaaS): The cloud offers the following
education applications.
Example:
Google Apps for Education: Google Apps for Education is the most widely used platform for free
web-based email, calendar, documents, and collaborative study.
Chromebooks for Education: Chromebooks for Education is one of Google's most important
projects. It is designed to enhance innovation in education.
Tablets with Google Play for Education: It allows educators to quickly implement the latest
technology solutions in the classroom and make them available to their students.
Testing and development: Setting up the platform for development and then performing
different types of testing to check the readiness of the product before delivery requires different types
of IT resources and infrastructure. Cloud computing provides the easiest approach for
development, testing, and even deployment, because the provider's IT resources can be used at minimal expense.
Organizations find it helpful because they get scalable and flexible cloud services for product
development, testing, and deployment.
E-Governance Applications: Cloud computing can provide its services to multiple activities
conducted by the government. It can help the government move from traditional ways of
management and service provision to more advanced ones by expanding the availability of
the environment and making it more scalable and customized. It can also help the government
reduce unnecessary costs in managing, installing, and upgrading applications by doing all of these
with the help of cloud computing, and use the money saved for public services.
Cloud Computing in Medical Fields: In the medical field, cloud computing is now used for
storing and accessing data, as it allows data to be stored and accessed through the internet
without worrying about any physical setup. It facilitates easier access to and distribution of
information among medical professionals and individual patients. Similarly, with the help
of cloud computing, information from off-site buildings and treatment facilities such as labs, doctors making emergency
house calls, and ambulances can be easily accessed and updated remotely instead of
having to wait until staff can access a hospital computer.
experience. It allows us to communicate with our business partners, friends, and relatives using
cloud-based video conferencing. The benefits of using video conferencing are that it reduces cost,
increases efficiency, and removes interoperability issues.
Summary
Cloud computing offers various cloud management tools which help admins to manage all types of
cloud activities, such as resource deployment, data integration, and disaster recovery.
Cloud computing refers to manipulating, configuring, and accessing the applications online.
Cloud computing virtualizes systems by pooling and sharing resources. Systems and storage can be
provisioned as needed from a centralized infrastructure, costs are assessed on a metered basis,
multitenancy is enabled, and resources are scalable with agility.
Cloud computing eliminates the need for IT infrastructure updates and maintenance since the service
provider ensures timely, guaranteed, and seamless delivery of our services and also takes care of all
the maintenance and management of our IT services according to the service level agreement (SLA).
Cloud computing can be expensive if you don’t know how to manage your computing resources and
take maximum advantage of them.
Cloud computing lets us deploy the service quickly in fewer clicks. This quick deployment lets us get
the resources required for our system within minutes.
IaaS, PaaS and SaaS are the three most popular types of cloud service offerings. They are
sometimes referred to as cloud service models or cloud computing service models.
IaaS, PaaS and SaaS are not mutually exclusive. Many mid-sized businesses use more
than one, and most large enterprises use all three.
'As a service' refers to the way IT assets are consumed in these offerings and to the
essential difference between cloud computing and traditional IT. In traditional IT, an
organization consumes IT assets—hardware, system software, development tools,
applications—by purchasing them, installing them, managing them and maintaining
them in its own on-premises data center.
In cloud computing, the cloud service provider owns, manages and maintains the assets;
the customer consumes them via an Internet connection, and pays for them on a
subscription or pay-as-you-go basis.
So the chief advantage of IaaS, PaaS, SaaS or any 'as a service' solution is economic: A
customer can access and scale the IT capabilities it needs for a predictable cost, without
the expense and overhead of purchasing and maintaining everything in its own data
center. But there are additional advantages specific to each of these solutions.
IaaS
The difference is that the cloud service provider hosts, manages and maintains the
hardware and computing resources in its own data centers. IaaS customers use the
hardware via an internet connection, and pay for that use on a subscription or pay-as-
you-go basis.
Typically IaaS customers can choose between virtual machines (VMs) hosted on shared
physical hardware (the cloud service provider manages virtualization) or bare metal
servers on dedicated (unshared) physical hardware. Customers can provision, configure
and operate the servers and infrastructure resources via a graphical dashboard, or
programmatically through application programming interfaces (APIs).
Notes
Benefits of IaaS
Compared to traditional IT, IaaS gives customers more flexibility to build out computing
resources as needed, and to scale them up or down in response to spikes or slow-downs
in traffic. IaaS lets customers avoid the up-front expense and overhead of purchasing
and maintaining their own on-premises data center. It also eliminates the constant
tradeoff between the waste of purchasing excess on-premises capacity to accommodate
spikes, versus the poor performance or outages that can result from not having enough
capacity for unanticipated traffic bursts or growth.
Higher availability: With IaaS a company can create redundant servers easily, and even
create them in other geographies to ensure availability during local power outages or
physical disasters.
Lower latency, improved performance: Because IaaS providers typically operate data
centers in multiple geographies, IaaS customers can locate apps and services closer to
users to minimize latency and maximize performance.
Comprehensive security: With a high level of security onsite, at data centers, and via
encryption, organizations can often take advantage of more advanced security and
protection than they might provide if they hosted the cloud infrastructure in-house.
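The scaling benefit described above, matching capacity to traffic instead of buying for the peak, comes down to simple arithmetic. In the sketch below, the per-server capacity and the traffic figures are assumed values for illustration.

```python
import math


def servers_needed(requests_per_sec: int, capacity_per_server: int) -> int:
    """Smallest number of servers that can handle the load (at least one)."""
    return max(1, math.ceil(requests_per_sec / capacity_per_server))


if __name__ == "__main__":
    # Assumed: each server handles 200 requests per second.
    for load in (100, 950, 4000):
        print(load, servers_needed(load, 200))
```

With on-premises hardware the organization would have to own enough servers for the highest load at all times; with IaaS it can run the smaller number most of the time and add servers only during spikes.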
Ecommerce: IaaS is an excellent option for online retailers that frequently see spikes in
traffic. The ability to scale up during periods of high demand and high-quality security
are essential in today’s 24-7 retail industry.
Internet of Things (IoT), event processing, artificial intelligence (AI): IaaS makes it
easier to set up and scale up data storage and computing resources for these and other
Software development: With IaaS, the infrastructure for testing and development
environments can be set up much more quickly than on-premises. (However, this use
case is better suited to PaaS, as you'll read in the next section.)
PaaS
Users access the PaaS through a graphical user interface (GUI), where development
or DevOps teams can collaborate on all their work across the entire application lifecycle
including coding, integration, testing, delivery, deployment and feedback.
Examples of PaaS solutions include AWS Elastic Beanstalk, Google App Engine,
Microsoft Windows Azure and Red Hat OpenShift on IBM Cloud.
Benefits of PaaS
The primary benefit of PaaS is that it allows customers to build, test, deploy, run, update
and scale applications more quickly and cost-effectively than they might if they had to
build out and manage their own on-premises platform. Other benefits include:
Low- to no-risk testing and adoption of new technologies: PaaS platforms typically
include access to a wide range of the latest resources up and down the application stack.
This allows companies to test new operating systems, languages and other tools without
having to make substantial investments in them, or in the infrastructure required to run
them.
A more scalable approach: With PaaS, organizations can purchase extra capacity for
building, testing, staging and running applications whenever they need it.
Less to manage: PaaS offloads infrastructure management, patches, updates and other
administrative tasks to the cloud service provider.
API development and management: With its built-in frameworks, PaaS makes
it easier for teams to develop, run, manage and secure APIs for sharing data and
functionality between applications.
Agile development and DevOps: PaaS solutions typically cover all the
requirements of a DevOps toolchain, and provide built-in automation to
support continuous integration and continuous delivery (CI/CD).
The vendor manages all upgrades and patches to the software, usually invisibly to
customers. Typically, the vendor ensures a level of availability, performance and
security as part of a service level agreement (SLA). Customers can add more users
and data storage on demand at additional cost.
Today, anyone who uses a computer or mobile phone almost certainly uses some form of SaaS.
Email, social media and cloud file storage solutions (such as Dropbox or Box)
are examples of SaaS applications people use every day in their personal lives.
Benefits of SaaS
The main benefit of SaaS is that it offloads all infrastructure and application
management to the SaaS vendor. All the user has to do is create an account, pay the fee
and start using the application. The vendor handles everything else, from maintaining
the server hardware and software to managing user access and security, storing and
managing data, implementing upgrades and patches and more.
Minimal risk: Many SaaS products offer a free trial period, or low monthly fees that let
customers try the software to see if it will meet their needs, with little or no financial
risk.
Anytime/anywhere productivity: Users can work with SaaS apps on any device with a
browser and an internet connection.
Easy scalability: Adding users is as simple as registering and paying for new seats—
customers can purchase more data storage for a nominal charge.
SaaS, PaaS, IaaS are not mutually exclusive—most organizations use more than one, and
many larger organizations today use all three, often with traditional IT.
But in some cases, any of the three 'as-a-service' models will offer a viable solution. In
these cases, organizations typically compare the alternatives based on the management
ease they offer versus the control they give up.
For example, suppose that a large organization wants to deliver a customer relationship
management (CRM) application to its sales team. It might:
Choose a SaaS CRM solution, offloading all day-to-day management to the third-party
vendor, but also giving up all control over features and functionality, data storage, user
access and security.
Choose a PaaS solution and build a custom CRM application. In this case, the company
would offload management of infrastructure and application development resources to
the cloud service provider. The customer would retain complete control over
application features, but it would also assume responsibility for managing the
application and associated data.
Build out backend IT infrastructure on the cloud by using IaaS, and use it to build its own
development platform and application. The organization's IT team would have complete
control over operating systems and server configurations, but also bear the burden of
managing and maintaining them, along with the development platform and applications
that run on them.
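The three CRM options above differ mainly in who manages which layer. The lookup below encodes that split as a sketch; the layer names are a simplification chosen for this example, and real offerings divide responsibilities in finer detail.

```python
LAYERS = ("infrastructure", "operating system", "development platform",
          "application", "data")

# Layers the customer manages under each model; everything else is the provider's.
CUSTOMER_MANAGED = {
    "SaaS": frozenset(),
    "PaaS": frozenset({"application", "data"}),
    "IaaS": frozenset({"operating system", "development platform",
                       "application", "data"}),
}


def managed_by(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for a layer under a given service model."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return "customer" if layer in CUSTOMER_MANAGED[model] else "provider"


if __name__ == "__main__":
    for model in ("SaaS", "PaaS", "IaaS"):
        print(model, [managed_by(model, layer) for layer in LAYERS])
```

Reading the output from SaaS to IaaS shows the trade-off described in the text: each step gives the customer more control and more to manage.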
IaaS, SaaS, PaaS and IBM Cloud
IBM has a broad menu of IaaS, PaaS and SaaS offerings to meet your company’s needs up
and down the stack. IBM’s rich and scalable PaaS solutions help organizations develop
cloud native applications from scratch, or modernize existing applications to benefit
from the flexibility and scalability of the cloud. IBM also offers a full IaaS layer of
virtualized compute, network and storage within our full-stack cloud platform, and
more than 150 SaaS business applications to help you innovate.
(Reference=https://www.ibm.com/topics/iaas-paas-saas)
Provisioning: Users can quickly provision new databases through an easy-to-use interface, often with
just a few clicks.
Maintenance: The service provider handles routine maintenance tasks, such as patching, upgrades,
and backups, which are critical for database reliability and security.
Data Security: DBaaS providers typically offer built-in security measures, such as encryption and
access controls, ensuring data is protected from unauthorized access.
Performance Optimization: Providers often include tools for monitoring and optimizing database
performance, helping users achieve better application performance.
functionalities, including voice calls, video conferencing, messaging, and collaboration platforms.
CaaS allows businesses to manage their communication needs through a single, integrated platform
without the complexity of traditional telephony systems.
Unified Interface: Users can access various communication methods (e.g., voice, video, messaging)
from a single platform, streamlining communication workflows.
Scalability: Organizations can easily scale their communication services up or down based on user
needs, making it suitable for businesses of all sizes.
Integration with Other Services: CaaS solutions often include APIs that allow organizations to
integrate communication features into their existing applications or workflows.
Mobility: Users can access communication tools from any device with internet access, facilitating
remote work and collaboration.
MapReduce
MapReduce is a powerful programming model designed for processing large data sets in a distributed
computing environment. It allows developers to write applications that can process vast amounts of
data across a cluster of machines. The core of the MapReduce model consists of two main functions:
the Map function and the Reduce function.
Map Function: This function takes input data, processes it, and outputs intermediate key-value pairs.
For example, in a word count application, the Map function would read a document and output pairs
such as (word, 1) for each word found.
Reduce Function: The Reduce function takes the intermediate key-value pairs produced by the Map
function, aggregates them, and produces the final output. Continuing with the word count example, it
would sum the counts for each word.
The overall process involves several steps: data is split into manageable chunks (splitting), processed
in parallel by the Map function (mapping), intermediate data is grouped by key (shuffling and
sorting), and finally, the Reduce function generates the output. This approach enables efficient
processing of large datasets, making it ideal for data analytics, log analysis, and machine learning
applications.
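The word count example can be written out as a single-process sketch of the map, shuffle, and reduce steps. It runs on one machine and does not use Hadoop or any distributed framework; the function names are chosen for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(document: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    intermediate: List[Tuple[str, int]] = []
    for doc in documents:                    # each document is one input split
        intermediate.extend(map_words(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the cloud", "The data cloud"]))
```

In a real cluster the map calls run in parallel on different machines and the shuffle moves data across the network, but the functions the developer writes have the same shape as `map_words` and `reduce_counts`.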
Scalability: GFS can scale out easily by adding more machines to the cluster, accommodating
increasing data loads.
Fault Tolerance: It automatically replicates data across multiple servers to prevent data loss in case of
hardware failures.
Large File Handling: GFS is optimized for large files, which are divided into chunks for storage and
management.
High Throughput: The system is designed for high data throughput, making it suitable for
data-intensive applications.
GFS has influenced the design of other distributed file systems, including Hadoop's HDFS, making it
a foundational technology in big data processing.
Distributed Storage: Files are broken down into large blocks (128 MB by default in current Hadoop releases, and often
configured to 256 MB) and stored across multiple nodes, enabling parallel processing.
Data Replication: Each block is replicated (typically three times) across different nodes to ensure
durability and availability, which is crucial for fault tolerance.
High Throughput Access: HDFS is optimized for high throughput access to application data, making
it ideal for batch processing jobs.
Fault Tolerance: It automatically recovers from hardware failures, ensuring that data remains
accessible.
HDFS is widely used for storing large datasets, acting as a data lake for various data sources, and
facilitating big data analytics.
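The block and replication figures above translate into simple arithmetic. The sketch below assumes a 128 MB block size and a replication factor of 3, and uses a plain round-robin placement rule; HDFS's real placement policy is rack-aware and more involved, and the node names are hypothetical.

```python
import math
from typing import List


def block_count(file_size_mb: int, block_size_mb: int = 128) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb: int, replication: int = 3) -> int:
    """Total disk space consumed once every block is replicated."""
    return file_size_mb * replication


def place_replicas(block_index: int, nodes: List[str], replication: int = 3) -> List[str]:
    """Pick distinct nodes for one block's replicas (simplified round-robin rule)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [nodes[(block_index + i) % len(nodes)] for i in range(replication)]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    print(block_count(1000))       # blocks for a 1000 MB file
    print(raw_storage_mb(1000))    # disk used with 3 replicas
    print(place_replicas(1, nodes))
```

Because each block lives on several different nodes, the loss of any single node leaves at least two copies of every block it held.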
Hadoop Framework
The Hadoop framework is an open-source software framework designed for the distributed storage and
processing of large datasets using simple programming models. It consists of several key components:
Hadoop also boasts a rich ecosystem of tools and projects that enhance its capabilities, such as:
Apache Hive: A data warehousing tool that allows users to query and manage large datasets using a
SQL-like language.
Apache Pig: A high-level platform for writing data processing programs, offering a more
straightforward interface compared to raw MapReduce.
Apache HBase: A NoSQL database built on top of HDFS, providing real-time access to large datasets.
Apache Spark: A fast data processing engine that can run on top of Hadoop for real-time data
analytics.
Hadoop is used across various industries for tasks such as large-scale data processing, analytics, and
data storage, making it a cornerstone of big data technology.