NIA - Chapter 13
Another group of machines in your organization is the server fleet. This population of
machines is used to provide application services, web services, databases, batch
computation, back-office processing, and so on.
A single machine might provide one service or many. For example, a single server might be
a dedicated file server, or it might be a file server, a DNS server, and a wiki server while
performing many other functions. Alternatively, some services might be too large to fit on a
single machine. Instead, the work of the service may be distributed over many machines.
For example, Google’s Gmail service is distributed over thousands of machines, each
doing a small fraction of the work.
By definition a server has dependents, usually many. A single server may have hundreds of
clients relying on it. A web server may have thousands of users depending on it. Contrast
this to a laptop or desktop, which generally has just a single user.
In a model that has one big server, or just a few, any investment made in a server is
amortized over all the dependents or users. We justify any additional expense to improve
performance or increase reliability by looking at the dependents’ needs. In this model
servers are also expected to last many years, so paying for spare capacity or expandability
is an investment in extending its life span.
This chapter’s advice applies to a typical enterprise organization that mostly purchases
off-the-shelf hardware and software, with a limited number of home-grown applications.
Volume 2 of this book series is more specifically geared toward web-based applications
and large clusters of machines.
There are many strategies for providing server resources. Most organizations use a mix of
these strategies.
• All eggs in one basket: One machine used for many purposes
• Beautiful snowflakes: A separate machine for each service, each configured differently
• Buy in bulk, allocate fractions: Large machines partitioned into many smaller virtual
machines using virtualization or containers
• Grid computing: Many identical machines managed as a single unit
• Blade servers: A hardware architecture that places many machines in one chassis
• Cloud-based compute services: Capacity rented on someone else’s infrastructure
• Software as a service (SaaS): Web-based applications hosted by a provider
• Server appliances: Preconfigured devices dedicated to a single task
What Is a Server?
The term “server” is overloaded and ambiguous. It means different things in different
contexts. The phrase “web server” might refer to a host being used to provide a web site (a
machine) or the software that implements the HTTP protocol (Apache HTTPD).
Unless specified, this book uses the term “server” to mean a machine. It refers to a
“service” as the entire hardware/software combination that provides the service users
receive. For example, an email server (the hardware) runs MS Exchange (the software) to
provide the email service for the department.
Note that Volume 2 of this book series deals with this ambiguity differently. It uses the
term “server” to mean the server of a client–server software arrangement—for example, an
Apache HTTPD server. When talking about the hardware it uses the word “machine.”
The most basic strategy is to purchase a single server and use it for many services. For
example, a single machine might serve as the department’s DNS server, DHCP server,
email server, and web server. This is the “all eggs in one basket” approach. We don’t
recommend this approach. Nevertheless, we see it often enough that we felt we should
explain how to make the best of it.
If you are going to put all your eggs in one basket, make sure you have a really, really, really
strong basket. Any hardware problems the machine has will affect many services. Buy
top-of-the-line hardware and a model that has plenty of expansion slots. You’ll want this
machine to last a long time.
In this setting it becomes critical to ensure data integrity. Hard disks fail. Expecting them
to never fail is irrational. Therefore RAID should be used so that a single disk can fail and
the system will continue to run. Our general rule of thumb is to use a 2-disk mirror (RAID 1)
for the boot disk. Additional storage should be RAID 5, 6, or 10 as needed. RAID 0 offers
zero data integrity protection (the “zero” in the name reflects this). Obviously, different
applications demand different configurations but this fits most situations. Putting plain
disks without RAID 1 or higher on a server like this is asking for trouble. Any disk
malfunction will result in a bad day as you restore from backups and deal with any data
loss. When RAID was new, it was expensive and rare. Now it is inexpensive and should be
the default for all systems that need reliability.
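To make the capacity trade-offs concrete, here is a short sketch that computes approximate usable capacity for the RAID levels mentioned above. The disk counts and sizes are made-up examples, not recommendations.

```python
# Rough usable-capacity arithmetic for common RAID levels.
# Disk counts and sizes below are illustrative examples only.

def usable_tb(level: str, disks: int, size_tb: float) -> float:
    """Return approximate usable capacity, ignoring formatting overhead."""
    if level == "RAID0":
        return disks * size_tb          # striping: full capacity, zero protection
    if level == "RAID1":
        return size_tb                  # mirror: capacity of a single disk
    if level == "RAID5":
        return (disks - 1) * size_tb    # one disk's worth of parity
    if level == "RAID6":
        return (disks - 2) * size_tb    # two disks' worth of parity
    if level == "RAID10":
        return (disks // 2) * size_tb   # striped mirrors: half the raw capacity
    raise ValueError(f"unknown RAID level: {level}")

for level, disks in [("RAID1", 2), ("RAID5", 4), ("RAID6", 6), ("RAID10", 4)]:
    print(f"{level}: {disks} x 4 TB -> {usable_tb(level, disks, 4.0):.0f} TB usable")
```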
Data integrity also requires regular data backups to tape or another system. These
backups should be tested periodically. As explained in Section 43.3.3, RAID is not a
backup strategy. If this system dies in a fire, RAID will not protect the data.
If a server is expected to last a long time, it will likely need to be expanded. Over time most
systems will need to handle more users, more data, or more applications. Additional slots
for memory, interfaces, and hard drive bays make it easy to upgrade the system without
replacing it. Upgradable hardware is more expensive than fixed-configuration equivalents,
but that extra expense is an insurance policy against more costly upgrades later. It is much
less expensive to extend the life of a machine via an incremental upgrade such as adding
RAM than to do a wholesale replacement of one machine with another.
Forklift Upgrades
The term forklift upgrade is industry slang for a wholesale replacement. In such a
situation you are removing one machine with a metaphorical forklift and dropping a
replacement in its place.
Another problem with the “one basket” strategy is that it makes OS upgrades very difficult
and risky. Upgrading the operating system is an all-or-nothing endeavor. The OS cannot be
upgraded or patched until all the services are verified to work on the new version. One
application may hold back all the others.
Upgrading individual applications also becomes more perilous. For example, what if
upgrading one application installs the newest version of a shared library, but another
application works only with the older version? Now you have one application that isn’t
working and the only way to fix it is to downgrade it, which will make the other application
fail. This Catch-22 is known as dependency hell.
Often we do not discover these incompatibilities until after the newer software is installed
or the operating system is upgraded. Some technologies can be applied to help in this
situation. For example, some disk storage systems have the ability to take a snapshot of
the disk. If an upgrade reveals incompatibility or other problems, we can revert to a
previous snapshot. RAID 1 mirrors and file systems such as ZFS and Btrfs have
snapshotting capabilities.
Upgrading a single machine with many applications is complex and is the focus of Chapter
33, “Server Upgrades.”
The more applications a big server is running, the more difficult it becomes to schedule
downtime for hardware upgrades. Very large systems permit hardware upgrades to be
performed while they remain running, but this is rare for components such as RAM and
CPUs. We can delay the need for downtime by buying extra capacity at the start, but this
usually leads to more dependencies on the machine, which in turn exacerbates the
scheduling problem we originally tried to solve.
A better strategy is to use a separate machine for each service. In this model we purchase
servers as they are needed, ordering the exact model and configuration that is right for the
application.
Each machine is sized for the desired application: RAM, disk, number and speeds of NICs,
and enough extra capacity, or expansion slots, for projected growth during the expected
life of the machine. Vendors can compete to provide their best machine that meets these
specifications.
The benefit of this strategy is that the machine is the best possible choice that meets the
requirements. The downside is that the result is a fleet of unique machines. Each is a
beautiful, special little snowflake.
While snowflakes are beautiful, nobody enjoys a blizzard. Each new system adds
administrative overhead proportionally. For example, it would be a considerable burden if
each new server required learning an entirely new RAID storage subsystem. Each one
would require learning how to configure it, replace disks, upgrade the firmware, and so on.
If, instead, the IT organization standardized on a particular RAID product, each new
machine would simply benefit from what was learned earlier.
A variety of inventory applications are available, many of which are free or low cost. Having
even a simple inventory system is better than none. A spreadsheet is better than keeping
this information in your head. A database is better than a spreadsheet.
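As a minimal illustration of the “database is better than a spreadsheet” point, the sketch below keeps a tiny server inventory in SQLite using only Python’s standard library. The columns and sample row are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# Minimal server inventory: a step up from a spreadsheet.
# The columns here are illustrative; record whatever your site needs.
conn = sqlite3.connect("inventory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS servers (
        hostname TEXT PRIMARY KEY,
        owner    TEXT,
        purpose  TEXT,
        model    TEXT,
        os       TEXT,
        location TEXT
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO servers VALUES (?, ?, ?, ?, ?, ?)",
    ("web01", "webteam@example.com", "department web server",
     "Vendor X 1U", "Debian 12", "rack 4, slot 12"),
)
conn.commit()

for row in conn.execute("SELECT hostname, owner, purpose FROM servers"):
    print(row)
```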
Always be on the lookout for opportunities to reduce the number of variations in platforms
or technologies being supported. Discourage gratuitous variations by taking advantage of
the fact that people lean toward defaults. Make right easy: Make sure that the lazy path is
the path you want people to take. Select a default hardware vendor, model, and operating
system and make it super easy to order. Provide automated OS installation, configuration,
and updates. Provide a wiki with information about recommended models and options,
sales contact information, and assistance. We applaud you for being able to keep all that
information in your head, but putting it on a wiki helps the entire organization stick to
standards. That helps you in the long term.
The techniques discussed in Part II, “Workstation Fleet Management,” for managing
variation also apply to servers. In particular, adopt a policy of supporting a limited number
of generations of hardware. When a new generation of hardware is introduced, the oldest
generation is eliminated. This practice also works for limiting the number of OS revisions in
use.
Google’s SysOps (internal IT) organization had a policy of supporting at most two Linux
releases at any given time, and a limited number of hardware generations. For a new OS
release or hardware model to be introduced into the ecosystem, the oldest one had to be
eliminated.
Eliminating the old generation didn’t happen automatically. People were not continuing to
use it because of laziness; there was always some deep technical reason, dependency,
lack of resources, or political issue preventing the change. Finding and eliminating the
stragglers took focused effort.
To accelerate the process, a project manager and a few SAs would volunteer to form a
team that would work to eliminate the stragglers. These teams were called “death
squads,” an insensitive name that could have been invented only by people too young to
remember the politics of Central America in the late twentieth century.
The team would start by publishing a schedule: the date the replacement was available,
the date the old systems would no longer receive updates, and the date the old systems
would be disconnected from the network. There were no exceptions. Management had
little sympathy for whining. Employees sometimes complained the first time they
experienced this forced upgrade process but soon they learned that restricting
technologies to a small number of generations was a major part of the Google operational
way and that services had to be built with the expectation of being easy to upgrade or
rebuild.
The team would identify machines that still used the deprecated technology. This would
be published as a shared spreadsheet, viewable and editable by all. Columns would be
filled in to indicate the status: none, owner contacted, owner responded, work in progress,
work complete. As items were complete, they would be marked green. The system was
very transparent. Everyone in the company could see the list of machines, who had
stepped up to take ownership of each, and their status. It was also very visible if a team
was not taking action.
The death squad would work with all the teams involved and offer support and assistance.
Some teams already had plans in place. Others needed a helping hand. Some needed
cajoling and pressure from management. The death squad would collaborate and assist
until the older technology was eliminated.
This system was used often and was a key tactic in preventing Google’s very large fleet
from devolving into a chaotic mess of costly snowflakes.
While it sounds efficient to customize each machine to the exact needs of the service it
provides, the result tends to be an unmanageable mess.
The one-off hardware might be rationalized by the customer stating that spares aren’t
required, that firmware upgrades can be skipped, or that the machine will be self-
supported by the customer. These answers are nice in theory, but usually lead to even
more work for the IT department. When there is a hardware problem, the IT department
will be blamed for any difficulties. Sadly the IT department must usually say no, which
makes this group look inflexible.
During a software upgrade the system stopped working. The professor was unable to fix
the problem, and other professors who had come to rely on the system were dead in the
water.
After a month of downtime, the other professors complained to the department chair, who
complained to the IT department. While the department chair understood the agreement
in place, he’d hear none of it. This was an emergency and IT’s support was needed. The IT
group knew that once they got involved, any data loss would be blamed on them. Their
lack of experience with the device increased that risk.
The end of this story is somewhat anticlimactic. While the IT group was establishing an
action plan in cooperation with the vendor, the professor went rogue and booted the
storage server with a rescue disk, wiped and reinstalled the system, and got it working
again. Only then did he concede that there was no irreplaceable data on the system.
Unfortunately, this experience taught the professor that he could ignore IT’s hardware
guidance since he’d get support when he needed it, and that he didn’t really need it since
he fixed the problem himself.
The other faculty who had come to depend on this professor’s system realized the value of
the support provided by the IT group. While the previous system was “free,” losing access
for a month was an unacceptable loss of research time. The potential for catastrophic
data loss was not worth the savings.
What the IT department learned was that there was no such thing as self-support. In the
future, they developed ways to be involved earlier on in purchasing processes so that they
could suggest alternatives before decisions had been made in the minds of their
customers.
The next strategy is to buy computing resources in bulk and allocate fractions of it as
needed.
One way to do this is through virtualization. That is, an organization purchases large
physical servers and divides them up for use by customers by creating individual virtual
machines (VMs). A virtualization cluster can grow by adding more physical hardware as
more capacity is needed.
VMs can also be resized. You can add RAM, vCPUs, and disk space to a VM via an API call
instead of a visit to the datacenter. If customers request more memory and you add it
using a management app on your iPhone while sitting on the beach, they will think you are
doing some kind of magic.
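The exact call depends on your virtualization platform. The sketch below assumes a hypothetical REST management API (the URL, token, and field names are placeholders) purely to illustrate that resizing a VM is an API request rather than a trip to the datacenter.

```python
import requests

# Hypothetical management API: endpoint, token, and payload fields are
# placeholders. Consult your platform's actual API documentation.
API = "https://vmm.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <token>"}

def resize_vm(vm_id: str, ram_gb: int, vcpus: int) -> None:
    """Request more RAM and vCPUs for an existing VM."""
    resp = requests.patch(
        f"{API}/vms/{vm_id}",
        headers=HEADERS,
        json={"ram_gb": ram_gb, "vcpus": vcpus},
        timeout=30,
    )
    resp.raise_for_status()

resize_vm("vm-1234", ram_gb=32, vcpus=8)
```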
Virtualization improves computing efficiency. Physical machines today are so powerful
that applications often do not need the full resources of a single machine. The excess
capacity is called stranded capacity because it is unusable in its current form. Sharing a
large physical machine’s power among many smaller virtual machines helps reduce
stranded capacity, without getting into the “all eggs in one basket” trap.
Stranded capacity could also be mitigated by running multiple services on the same
machine. However, virtualization provides better isolation than simple multitasking. The
benefits of isolation include
• Independence: Each VM can run a different operating system. On a single physical host
there could be a mix of VMs running a variety of Microsoft Windows releases, Linux
releases, and so on.
• Resource isolation: The disk and RAM allocated to a VM are committed to that VM and
not shared. Processes running on one VM can’t access the resources of another VM. In
fact, programs running on a VM have little or no awareness that they are running on VMs,
sharing a larger physical machine.
• Granular security: A person with root access on one VM does not automatically have
privileged access on another VM. Suppose you had five services, each run by a different
team. If each service was on its own VM, each team could have administrator or root
access for its VM without affecting the security of the other VMs. If all five services were
running on one machine, anyone needing root or administrator access would have
privileged access for all five services.
• Reduced dependency hell: Each machine has its own operating system and system
libraries, so they can be upgraded independently.
13.3.1 VM Management
As with the other strategies, keeping a good inventory of VMs is important. In fact, it is more
important since you can’t walk into a datacenter and point at the machine. The cluster
management software will keep an inventory of which VMs exist, but you need to maintain
an inventory of who owns each VM and its purpose.
Some clusters are tightly controlled, only permitting the IT team to create VMs with the
care and planning reminiscent of the laborious process previously used for physical
servers. Other clusters are general-purpose compute farms providing the ability for
customers to request new machines on demand. In this case, it is important to provide a
self-service way for customers to create new machines. Since the process can be fully
automated via the API, not providing a self-service portal or command-line tool for
creating VMs simply creates more work for you and delays VM creation until you are
available to process the request. Being able to receive a new machine when the SAs are
unavailable or busy reduces the friction in getting the compute resources customers
need. In addition to creating VMs, users should be able to reboot and delete their own
VMs.
There should be limits in place so that customers can’t overload the system by creating
too many VMs. Typically limits are based on existing resources, daily limits, or per-
department allocations.
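The quota check behind such a portal can be very simple. The following sketch, with hypothetical limits and in-memory data, shows the kind of per-department allocation test a request handler might perform before creating a VM.

```python
# Hypothetical per-department quota check for a self-service VM portal.
# Limits and current usage would normally come from the inventory database.
QUOTA = {"engineering": 200, "finance": 50, "marketing": 25}   # max VMs allowed
IN_USE = {"engineering": 187, "finance": 12, "marketing": 25}  # VMs already created

def may_create(department: str, count: int = 1) -> bool:
    """Return True if the department is still within its VM allocation."""
    limit = QUOTA.get(department, 0)
    used = IN_USE.get(department, 0)
    return used + count <= limit

print(may_create("engineering", 5))   # True: 192 <= 200
print(may_create("marketing"))        # False: allocation exhausted
```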
Users can become confused if you permit them to select any amount of disk space and
RAM. They often do not know what is required or reasonable. One strategy is to simply
offer reasonable defaults for each OS type. Another strategy is to offer a few options:
small, medium, large, and custom.
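One way to present those options is as a short menu of named flavors. The resource numbers below are illustrative defaults, not recommendations; adjust them to your own hardware and workloads.

```python
# Illustrative flavor menu for a self-service portal; numbers are placeholders.
FLAVORS = {
    "small":  {"vcpus": 2, "ram_gb": 4,  "disk_gb": 50},
    "medium": {"vcpus": 4, "ram_gb": 8,  "disk_gb": 100},
    "large":  {"vcpus": 8, "ram_gb": 16, "disk_gb": 200},
}

def describe(flavor: str) -> str:
    spec = FLAVORS[flavor]
    return (f"{flavor}: {spec['vcpus']} vCPU, "
            f"{spec['ram_gb']} GB RAM, {spec['disk_gb']} GB disk")

for name in FLAVORS:
    print(describe(name))
```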
As in the other strategies, it is important to limit the amount of variation. Apply the
techniques described previously in this chapter and in Part II, “Workstation Fleet
Management.” In particular, adopt a policy of supporting a limited number of OS versions
at any given time. For a new version to be introduced, the oldest generation is eliminated.
Most virtual machine cluster management systems permit live migration of VMs, which
means a VM can be moved from one physical host to another while it is running. Aside
from a brief performance reduction during the transition, the users of the VM do not even
know they’re being moved.
Live migration makes management easier. It can be used to rebalance a cluster, moving
VMs off overloaded physical machines to others that are less loaded. It also lets you work
around hardware problems. If a physical machine is having a hardware problem, its VMs
can be evacuated to another physical machine. The owners of the VM can be blissfully
unaware of the problem. They simply benefit from the excellent uptime.
The architecture of a typical virtualization cluster includes many physical machines that
share a SAN for storage of the VM’s disks. By having the storage external to any particular
machine, the VMs can be easily migrated between physical machines.
Shared storage is depicted in Figure 13.1. The VMs run on the VM servers and the disk
volumes they use are stored on the SAN. The VM servers have little disk space of their
own.
Figure 13.1: VM Servers A, B, and C are connected by a network to a Storage Area Network (SAN). The SAN holds the disk volumes for VM1 through VM11; VM Server A runs VM1 through VM7, VM Server B runs VM8 and VM9, and VM Server C runs VM10 and VM11.
In Figure 13.2 we see VM7 is being migrated from VM Server A to VM Server B. Because
VM7’s disk image is stored on the SAN, it is accessible from both VM servers as the
migration process proceeds. The migration process simply needs to copy the RAM and
CPU state between A and B, which is a quick operation. If the entire disk image had to be
copied as well, the migration could take hours.
Figure 13.2: The same cluster as in Figure 13.1, with VM7 being migrated from VM Server A to VM Server B. VM7’s disk image remains on the SAN, reachable from both servers, so only the VM’s RAM and CPU state move during the migration.
13.3.3 VM Packing
While VMs can reduce the amount of stranded compute capacity, they do not eliminate it.
VMs cannot span physical machines. As a consequence, we often get into situations
where the remaining RAM on a physical machine is not enough for a new VM.
The best way to avoid this is to create VMs that are standard sizes that pack nicely. For
example, we might define the small configuration such that exactly eight VMs fit on a
physical machine with no remaining stranded space. We might define the medium
configuration to fit four VMs per physical machine, and the large configuration to fit two
VMs per physical machine. In other words, the sizes are 1/8, 1/4, and 1/2. Since the
denominators are powers of 2, combinations of small, medium, and large configurations
will pack nicely.
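To make the packing argument concrete, here is a small first-fit placement check that assumes the 1/8, 1/4, and 1/2 sizes described above. The host count and VM mix are made-up examples.

```python
from fractions import Fraction

# Flavor sizes as fractions of one physical host (per the sizing above).
SIZE = {"small": Fraction(1, 8), "medium": Fraction(1, 4), "large": Fraction(1, 2)}

def first_fit(vms, hosts):
    """Place each VM on the first host with room; return placements or None."""
    free = [Fraction(1) for _ in range(hosts)]
    placement = []
    for flavor in vms:
        for i, room in enumerate(free):
            if SIZE[flavor] <= room:
                free[i] -= SIZE[flavor]
                placement.append((flavor, i))
                break
        else:
            return None  # no host had enough remaining capacity
    return placement

# Example: 2 large + 2 medium + 4 small pack exactly onto two hosts.
print(first_fit(["large", "large", "medium", "medium"] + ["small"] * 4, hosts=2))
```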
One strategy is to keep one physical machine entirely idle so that, when needed, the VMs
from the machine to be repaired can all be migrated to this machine. This scheme is
depicted in Figure 13.3(a). When physical machine A, B, C, or D needs maintenance, its
VMs are transferred to machine E. Any one machine can be down at a time. It doesn’t
matter which combination of VMs are on the machine, as long as the spare is as large as
any of the other machines.
Figure 13.3: Four arrangements of the same VMs across physical machines A through E: (a) machine E kept entirely idle as a spare, (b) spare capacity distributed around the cluster, (c) spare capacity too fragmented for any single machine to accept the largest VM, and (d) the VMs packed so that any single machine can be evacuated.
Of course, this arrangement means that the spare machine is entirely unused. This seems
like a waste since that capacity could be used to alleviate I/O pressure such as contention
for network bandwidth, disk I/O, or other bottlenecks within the cluster.
Another strategy is to distribute the spare capacity around the cluster so that the
individual VMs can share the extra I/O bandwidth. When a machine needs to be evacuated
for repairs, the VMs are moved into the spare capacity that has been spread around the
cluster.
This scheme is depicted in Figure 13.3(b). If machine A requires maintenance, its VMs can
be moved to B, C, D, and E. Similar steps can be followed for the other machines.
Unfortunately, this causes a new problem: There might not be a single machine with
enough spare capacity to accept a VM that needs to be evacuated from a failing physical
host. This situation is depicted in Figure 13.3(c). If machine A needs maintenance, there is
no single machine with enough capacity to receive the largest of its VMs. VMs cannot
straddle two physical machines. This is not a problem for machines B through E, whose
VMs can be evacuated to the remaining space.
If machine A needed to be evacuated, first we would need to consolidate free space onto a
single physical machine so that the larger VM has a place to move to. This is known as a
Towers of Hanoi situation, since moves must be done recursively to enable other moves.
These additional VM moves make the evacuation process both more complex and longer.
The additional VM migrations affect VMs that would otherwise not have been involved.
They now must suffer the temporary performance reduction that happens during
migration, plus the risk that the migration will fail and require a reboot.
Figure 13.3(d) depicts the same quantity and sizes of VMs packed to permit any single
machine to be evacuated for maintenance.
Some VM cluster management systems include a tool that will calculate the minimum
number of VM moves to evacuate a physical machine. Other systems simply refuse to
create a new VM if it will create a situation where the cluster is no longer N + 1 redundant.
Either way, virtual clusters should be monitored for loss of N + 1 redundancy, so that you
are aware when the situation has happened and it can be corrected proactively.
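A basic N + 1 check can follow the same packing logic: remove each host in turn and see whether its VMs fit in the free space that remains. The sketch below is a simplification that considers RAM only, with a hypothetical three-host cluster; a real check would also consider CPU, disk, and network.

```python
# Simplified N+1 check: can every host's VMs be evacuated to the others?
# Considers RAM only; capacities and VM sizes (in GB) are hypothetical.
cluster = {
    "host-a": {"capacity": 256, "vms": [64, 64, 32]},
    "host-b": {"capacity": 256, "vms": [128, 32]},
    "host-c": {"capacity": 256, "vms": [64, 32, 32]},
}

def can_evacuate(failed: str) -> bool:
    """True if every VM on `failed` fits (largest first) on the other hosts."""
    free = {h: d["capacity"] - sum(d["vms"])
            for h, d in cluster.items() if h != failed}
    for vm in sorted(cluster[failed]["vms"], reverse=True):
        # Pick the host with the most free RAM that can still take this VM.
        target = next((h for h, f in sorted(free.items(), key=lambda x: -x[1])
                       if f >= vm), None)
        if target is None:
            return False
        free[target] -= vm
    return True

for host in cluster:
    print(host, "evacuable" if can_evacuate(host) else "NOT N+1 redundant")
```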
Most sites end up with two entirely different ways to request, allocate, and track VMs and
non-VMs. It can be beneficial to have one system that manages both. Some cluster
management systems will manage a pool of bare-metal machines using the same API as
VMs. Creating a machine simply allocates an unused machine from the pool. Deleting a
machine marks the machine for reuse.
Another way to achieve this is to make everything a VM, even if that means offering an
extra-large size: a VM that fills the entire physical machine (a size of 1/1). While such
machines will have a slight performance reduction due to the VM overhead, unifying all
machine management within one process benefits customers, who now have to learn only
one system, and makes management easier.
13.3.6 Containers
Containers are another virtualization technique. They provide isolation at the process level
instead of the machine level. While a VM is a machine that shares physical hardware with
other VMs, each container is a group of processes that run in isolation on the same
machine. All of the containers run under the same operating system, but each container is
self-contained as far as the files it uses. Therefore there is no dependency hell.
Containers are much lighter weight and permit more services to be packed on fewer
machines. Docker, Mesos, and Kubernetes are popular systems for managing large
numbers of containers. They all use the same container format, which means once a
container is created, it can be used on a desktop, server, or huge farm of servers.
Pros and cons of virtualization and containerization, as well as technical details of how
they work, can be found in Volume 2, Chapter 3, “Selecting a Service Platform,” of this
book series.
One bad situation people get into with containers is a lack of reproducibility. After using
the system for a while, there comes a day when a container needs to be rebuilt from
scratch to upgrade a library or other file. Suddenly the team realizes no one is around who
remembers how the container was made. The way to prevent this is to make the
container’s creation similar to compiling software: A written description of what is to be in
the container is passed through automation that reads the description and outputs the
container. The description is tracked in source code control like any other source code.
Containers should be built using whatever continuous integration (CI) system is used for
building other software in your organization.
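The principle is simply “description in, image out.” As a minimal sketch, the snippet below rebuilds and pushes an image from the build description checked into source control; the image name and directory layout are assumptions about a hypothetical repository, and in practice this step would run inside your CI system rather than by hand.

```python
import subprocess

# Rebuild a container image from the description kept in source control.
# The image name and repository layout below are hypothetical.
IMAGE = "registry.example.com/team/app:latest"
BUILD_CONTEXT = "."   # directory containing the checked-in build description

subprocess.run(["docker", "build", "-t", IMAGE, BUILD_CONTEXT], check=True)
subprocess.run(["docker", "push", IMAGE], check=True)
```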
Grid computing takes many similar machines and manages them as a single unit. For
example, four racks of 40 servers each would form a grid of 160 machines. Each one is
configured exactly alike—same hardware and software.
To use the grid, a customer specifies how many machines are needed and which software
package to run. The grid management system allocates the right number of machines,
installs the software on them, and runs the software. When the computation is done, the
results are uploaded to a repository and the software is de-installed.
A big part of grid management software is the scheduling algorithm. Your job is held until
the requested number of machines is available. When you request many machines, that
day may never come. You would be very disappointed if your job, which was expected to
run for an hour, took a week to start.
To start a big request, one has to stall small requests until enough machines are available,
which wastes resources for everyone. Suppose your job required 100 machines. The
scheduler would stall any new jobs until 100 machines are idle. Suppose at first there are
50 machines idle. An hour later, another job has completed and now there are 75
machines idle. A day later a few more jobs have completed and there are 99 machines
idle. Finally another job completes and there are 100 free machines, enough for your job to
run. During the time leading up to when your job could run, many machines were sitting
idle. In fact, more than 75 compute-days were lost in this example.
The scheduler must be very smart and mix small and big requests to both minimize wait
time and maximize utilization. A simple algorithm is to allocate half the machines for big
jobs. They will stay busy if there is always a big job ready to run. Allocate the other half of
the machines for smaller jobs; many small and medium jobs will pass through those
machines.
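Here is a toy version of that split: a hypothetical grid is partitioned into a big-job pool and a small-job pool, and a job is admitted only if its pool has enough idle machines. The machine counts and the big/small threshold are illustrative assumptions, not a real scheduler.

```python
# Toy admission policy: half the grid for big jobs, half for small ones.
# Machine counts and the big/small threshold are illustrative assumptions.
GRID_SIZE = 160
BIG_POOL = GRID_SIZE // 2          # machines reserved for big jobs
SMALL_POOL = GRID_SIZE - BIG_POOL
BIG_THRESHOLD = 20                 # jobs needing >= this many machines are "big"

idle = {"big": BIG_POOL, "small": SMALL_POOL}

def try_schedule(job_id: str, machines: int) -> bool:
    """Admit the job if its pool has enough idle machines, else queue it."""
    pool = "big" if machines >= BIG_THRESHOLD else "small"
    if idle[pool] >= machines:
        idle[pool] -= machines
        print(f"{job_id}: started on {machines} machines from the {pool} pool")
        return True
    print(f"{job_id}: queued (only {idle[pool]} idle in the {pool} pool)")
    return False

try_schedule("batch-1", 100)   # big job: exceeds the 80-machine big pool, queued
try_schedule("batch-2", 60)    # big job: fits in the big pool
try_schedule("batch-3", 4)     # small job: fits in the small pool
```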
More sophisticated algorithms are invented and tested all the time. At Google the
scheduling algorithms have become so complex that a simulator was invented to make it
possible to experiment with new algorithms without having to put them into production.
Typically grids are very controlled systems. All allocations are done through the grid
management and scheduling system. Each machine runs the same OS configuration.
When entropy is detected, a machine is simply wiped and reloaded.
Because many machines are being purchased, shaving a few dollars off the cost of each
machine pays big dividends. For example, a video card is not needed since these
machines will not be used for interactive computing. For a 1,000-machine cluster, this can
save thousands of dollars. RAID cards, fancy plastic front plates with awesome LED
displays, redundant power supplies, and other add-ons are eliminated to save a few
dollars here and there. This multiplies out to thousands or millions of dollars overall.
Data integrity systems such as RAID cards are generally not installed in grid hardware
because the batch-oriented nature of jobs means that if a machine dies, the batch can be
rerun.
While one wants a grid that is reliable, compute efficiency is more important. Efficiency is
measured in dollars per transaction. Rather than examining the initial purchase price, the
total cost of ownership (TCO) is considered. For example, ARM chips are less expensive
than x86 Intel chips, but they are slower processors. Therefore you need more of them to
do the same job. Is it better to have 1,000 high-powered Intel-based machines or 2,000
ARM-based machines? The math can be fairly complex. The ARM chips use less power but
there are more of them. The amount of power used is related directly to how much cooling
is needed. The additional machines require more space in the datacenter. What if the
additional machines would result in a need to build more datacenters? TCO involves
taking all of these factors into account during the evaluation.
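A back-of-the-envelope TCO comparison might be shaped like the sketch below. Every number in it (prices, wattage, electricity cost, machine counts) is a made-up placeholder used to show the form of the calculation, not real data about any processor or vendor.

```python
# Back-of-the-envelope TCO comparison. All numbers are illustrative
# placeholders, not real prices or power figures.
YEARS = 3
HOURS = YEARS * 365 * 24
POWER_COST_PER_KWH = 0.12          # dollars per kWh (placeholder)
COOLING_OVERHEAD = 1.5             # each watt of compute needs 0.5 W of cooling

def tco(machines: int, unit_price: float, watts_per_machine: float) -> float:
    """Purchase price plus power and cooling over the machine's lifetime."""
    capex = machines * unit_price
    energy_kwh = machines * watts_per_machine / 1000 * HOURS * COOLING_OVERHEAD
    opex = energy_kwh * POWER_COST_PER_KWH
    return capex + opex

option_a = tco(machines=1000, unit_price=4000, watts_per_machine=350)  # fewer, faster
option_b = tco(machines=2000, unit_price=1500, watts_per_machine=150)  # more, slower

print(f"Option A (1,000 machines): ${option_a:,.0f}")
print(f"Option B (2,000 machines): ${option_b:,.0f}")
```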
Grid computing has other constraints. Often there is more bandwidth between machines
on the same rack, and less bandwidth between racks. Therefore some jobs will execute
faster if rack locality (putting all related processes in one rack) can be achieved.
Grid computing is more efficient than virtualization because it eliminates the virtualization
overhead, which is typically a 5 to 10 percent reduction in performance.
Grids are easier to manage because what is done for one machine is done for all
machines. They are fungible units—each one can substitute for the others. If one machine
dies, the scheduler can replace it with another machine in the grid.
Like many web sites, Yahoo! builds mammoth clusters of low-cost 1U PC servers. Racks
are packed with as many servers as possible, with dozens or hundreds configured to
provide each service required. Yahoo! once reported that when a unit died, it was more
economical to power it off and leave it in the rack rather than repair the unit. Removing
dead units might accidentally cause an outage if other cables were loosened in the
process. Eventually the machine would be reaped and many dead machines would be
repaired en masse.
A blade server has many individual slots that take motherboards, called blades, that
contain either a computer or storage. Each blade can be installed quickly and easily
because you simply slide a card into a slot. There are no cables to connect; the blade’s
connector conveys power, networking, and I/O connectivity. Additional capacity is added
by installing more blades, or replacing older blades with newer, higher-capacity models.
Blades can be used to implement many of the strategies listed previously in this chapter.
Computers within the blade chassis can be allocated individually, or used to create a
virtualization cluster, or used to create a grid.
Another benefit of blade systems is that they are software configurable. Assigning a disk to
a particular computer or a computer to a particular network is done through a
management console or API. Does a particular blade computer need an additional hard
drive? A few clicks and it is connected to storage from another blade. No trip to the
datacenter is required.
Blade systems are most cost-effective when the chassis lasts many years, enabling you to
upgrade the blades for the duration of its lifetime. This amortizes the cost of the chassis
over many generations of blades. The worst-case scenario is that shortly after investing in
a chassis, a newer, incompatible chassis model becomes available. When you buy a
blade-based system, you are betting that the chassis will last a long time and that new
blades will be available for its entire lifespan. If you lose the bet, a forklift upgrade can be
very expensive.
Grid Uniformity
A division of a large multinational company was planning on replacing its aging multi-CPU
server with a large grid computing environment, implemented as a farm of blade servers.
The application would be recoded so that instead of using multiple processes on a single
machine, it would use processes spread over the blade farm. Each blade would be one
node of a vast compute farm to which jobs could be submitted, with the results
consolidated on a controlling server.
This had wonderful scalability, since a new blade could be added to the farm and be
usable within minutes. No direct user logins were needed, and no SA work would be
needed beyond replacing faulty hardware and managing which blades were assigned to
which applications. To this end, the SAs engineered a tightly locked-down minimal-access
solution that could be deployed in minutes. Hundreds of blades were purchased and
installed, ready to be purposed as the customer required.
The problem came when application developers found themselves unable to manage their
application. They couldn’t debug issues without direct access. They demanded shell
access. They required additional packages. They stored unique state on each machine, so
automated builds were no longer viable. All of a sudden, the SAs found themselves
managing 500 individual servers rather than one unified blade farm. Other divisions had
also signed up for the service and made the same demands.
A number of things could have prevented this problem. Management should have required
more discipline. Once the developers started requesting access, management should
have set limits that would have prevented the system from devolving into hundreds of
custom machines. Deployment of software into production should have been automated
using a continuous integration/continuous deployment (CI/CD) system. If the only way to
put things in production is by automated deployment, then repeatability can be more
easily achieved. There should have been separate development and production
environments, with fewer access restrictions in the development environment. More
attention to detail at the requirements-gathering stage might have foreseen the need for
developer access, which would have led to the requirement for and funding of a
development environment.
Another strategy is to not own any machines at all, but rather to rent capacity on someone
else’s system. Such cloud-based computing lets you benefit from the economies of scale
that large warehouse-size datacenters can provide, without the expense or expertise to
run them. Examples of cloud-based compute services include Amazon AWS, Microsoft
Azure, and Google Compute Engine.
There are three common definitions for the cloud, each coming from different
communities:
• Consumers: When a typical consumer uses the term the cloud, they mean putting their
data on a web-based platform. The primary benefit is that this data becomes accessible
from anywhere. For example, consumers might have all their music stored in the cloud; as
a result their music can be played on any device that has Internet access.
• Business people: Typically business people think of the cloud as some kind of rented
computing infrastructure that is elastic. That is, they can allocate one or thousands of
machines; use them for a day, week, or year; and give them back when they are done. They
like the fact that this infrastructure is a pay-as-you-go and on-demand system. The on-
demand nature is the most exciting because they won’t have to deal with IT departments
that could take months to deliver a single new machine, or simply reject their request.
Now with a credit card, they have a partner that always says yes.
• IT professionals: When all the hype is removed (and there is a lot of hype), cloud
computing comes down to someone else maintaining hardware and networks so that
customers can focus on higher-level abstractions such as the operating system and
applications. It requires software that is built differently and new operational methods. IT
professionals shift from being the experts in how to install and set up computers to being
the experts who understand the full stack and become valued for their architectural
expertise, especially regarding how the underlying infrastructure affects performance and
reliability of the application, and how to improve both.
This book uses a combination of the last two definitions. The on-demand nature of the
cloud benefits us by providing an elastic computing environment: the ability to rapidly and
dynamically grow and shrink. As IT professionals, our role changes to focus on
performance and reliability, and to be the architecture experts.
Over the years computer hardware has grown less expensive due to Moore’s law and other
factors. Conversely, the operation of computers has become increasingly expensive. The
increased operational cost diminishes and often eliminates the cost reductions.
The one place where operational cost has been going down instead of up is in large grids
or clusters. There the entire infrastructure stack of hardware, software, and operations can
be vertically integrated to achieve exceptional cost savings at scale.
Another way to think about this issue is that of all the previously discussed strategies, the
“buy in bulk, allocate fractions” strategy described earlier is generally the most
economical and proves the most flexible. Cloud-based compute services take that
strategy to a larger scale than most companies can achieve on their own, which enables
these smaller companies to take advantage of these economies.
As these cost trends continue, it will become difficult for companies to justify maintaining
their own computers. Cloud computing will go from the exception that few companies are
able to take advantage of, to the default. Companies that must maintain their own
hardware will be an exception and will be at an economic disadvantage.
Adoption of cloud computing is also driven by another cost: opportunity cost. Opportunity
cost is the revenue lost due to missed opportunities. If a company sees an opportunity but
the competition beats them to it, that could be millions of potential dollars lost.
Some companies miss opportunities because they cannot hire people fast enough to staff
a project. Other companies do not even attempt to go after certain opportunities because
they know they will not be able to ramp up the IT resources fast enough to address the
opportunity. Companies have missed multi-million-dollar opportunities because an IT
department spent an extra month to negotiate a slightly lower price from the vendor.
If cloud computing enables a company to spin up new machines in minutes, without any
operations staff, the ability to address opportunities is greatly enhanced. At many
companies it takes months, and sometimes an entire year, from the initial request to
realizing a working server in production. The actual installation of the server may be a few
hours, but it is surrounded by months of budget approvals, bureaucratic approvals,
security reviews, and IT managers who think it is their job to always say “no.”
We are optimistic about cloud computing because it enables new applications that just
can’t be done any other way. Kevin McEntee, vice-president at Netflix, put it succinctly:
“You can’t put a price on nimble.” In his 2012 talk at re:Invent, McEntee gave examples of
how the elastic ability of cloud computing enabled his company to process video in new
ways. He extolled the value of elastic computing’s ability to let Netflix jump on
opportunities that enabled it to be a part of the iPad launch and the Apple TV launch, and
to quickly launch new, large libraries of videos to the public. Netflix has been able to make
business deals that its competition couldn’t. “If we were still building out datacenters in
the old model we would not have been able to jump on those opportunities” (McEntee
2012).
There are legal and technical challenges to putting your data on other people’s hardware.
That said, do not assume that HIPAA compliance and similar requirements automatically
disqualify you from using cloud-based services. Cloud vendors have a variety of compliant
options and ways of managing risk. A teleconference between your legal compliance
department and the provider’s sales team may lead to surprising results.
A related strategy is software as a service (SaaS): relying on web-based applications hosted by a provider rather than running the services yourself. This strategy was impossible until the early 2010s. Since then, ubiquitous fast Internet
connections, HTML5’s ability to create interactive applications, and better security
features have made this possible.
When a company adopts such a strategy, the role of the IT department becomes that of an
IT coordinator and integrator. Rather than running clients and services, someone is
needed to coordinate vendor relationships, introduce new products into the company,
provide training, and be the first stop for support before the provider is contacted directly.
Technical work becomes focused on high-level roles such as software development for
integrating the tools, plus low-level roles such as device support and repair management.
An appliance is a device designed specifically for a particular task. Toasters make toast.
Blenders blend. One could do these things using general-purpose devices, but there are
benefits to using a device designed to do one task very well.
The computer world also has appliances: file server appliances, web server appliances,
email appliances, DNS/DHCP appliances, and so on. The first appliance was the
dedicated network router. Some scoffed, “Who would spend all that money on a device
that just sits there and pushes packets when we can easily add extra interfaces to our VAX
and do the same thing?” It turned out that quite a lot of people would. It became obvious
that a box dedicated to doing a single task, and doing it well, was in many cases more
valuable than a general-purpose computer that could do many tasks. And, heck, it also
meant that you could reboot the VAX without taking down the network for everyone else.
A server appliance brings years of experience together in one box. Architecting a server is
difficult. The physical hardware for a server has all the requirements listed earlier in this
chapter, as well as the system engineering and performance tuning that only a highly
experienced expert can do. The software required to provide a service often involves
assembling various packages, gluing them together, and providing a single, unified
administration system for it all. It’s a lot of work! Appliances do all this for you right out of
the box.
Although a senior SA can engineer a system dedicated to file service or email out of a
general-purpose server, purchasing an appliance can free the SA to focus on other tasks.
Every appliance purchased results in one less system to engineer from scratch, plus
access to vendor support in case of an outage. Appliances also let organizations without
that particular expertise gain access to well-designed systems.
The other benefit of appliances is that they often have features that can’t be found
elsewhere. Competition drives the vendors to add new features, increase performance,
and improve reliability. For example, NetApp Filers have tunable file system snapshots
that allow end users to “cd back in time,” thus eliminating many requests for file restores.
Use what is most appropriate. Small companies often can’t justify a grid. For large
companies it is best to build a private, in-house cloud. If a company needs to get started
quickly, cloud computing permits it to spin up new machines without the delay of
specifying, purchasing, and installing machines.
Often one platform is selected as the default and exceptions require approval. Many IT
departments provide VMs as the default, and bare-metal requests require proof that the
application is incompatible with the virtualization system. Some companies have a “cloud
first” or “cloud only” strategy.
Depending on the company and its culture, di erent defaults may exist. The default
should be what’s most efficient, not what’s least expensive.
The only strategy we recommend against is an organization trying to deploy all of these
strategies simultaneously. It is not possible to be good at all strategies. Pick one, or a few,
and get very good at them. This will be better than providing mediocre service because you are
spread too thin.
13.9 Summary
Servers are the hardware used to provide services, such as file service, mail service,
applications, and so on. There are three general strategies to manage servers.
All eggs in one basket has one machine that is used for many purposes, such as a
departmental server that provides DNS, email, web, and file services. This puts many
critical services in one place, which is risky.
With beautiful snowflakes, there are many machines, each configured differently. Each
machine is configured exactly as needed for its purpose, which sounds optimal but is a
management burden. It becomes important to manage variations, reducing the number of
types of things that are managed. We can do this many ways, including adopting a policy
of eliminating one generation of products before adopting a new one.
There are several related strategies. Grid computing provides hundreds or thousands of
machines managed as one large computer for large computational tasks. Blade servers
make hardware operations more efficient by providing individual units of computer power
or storage in a special form factor. Cloud-based computing rents time on other people’s
server farms, enabling one to acquire additional compute resources dynamically.
Software as a service (SaaS) eliminates the need for infrastructure by relying on web-
based applications. Server appliances eliminate the need for local engineering knowledge
by providing premade, preconfigured hardware solutions.
Organizations use a mixture of these strategies. Most have one primary strategy and use
the others for specific instances. For example, it is very common to see a company use
virtualization as the default, physical machines as the exception, and SaaS for designated
applications.
Exercises
2. What are the pros and cons of the three primary hardware strategies? Name two
situations where each would be the best fit.
3. What is the hardware strategy used in your organization? Why was it selected?
4. How does your organization benefit from the hardware strategy it uses? Why is it better
or worse than the other two strategies?
5. Why is data integrity so important in the all eggs in one basket strategy?
8. With beautiful snowflakes, each server is exactly what is needed for the application.
This sounds optimal. How can it be a bad thing?
9. In your environment, how many different server vendors are used? List them. Do you
consider this to be a lot of vendors? What would be the benefits and problems of
increasing the number of vendors? Decreasing this number?
10. What are some of the types of variations that can exist in a server fleet? What
management overhead does each of them carry?
11. What are some of the ways an organization can reduce the number and types of
hardware variations in its server fleet?
12. In what way is buy in bulk, allocate fractions more efficient than the other strategies?
14. Which server appliances are in your environment? What kind of engineering would you
have to do if you had instead purchased a general-purpose machine to do the same
function?
15. Which services in your environment would be good candidates for replacement with a
server appliance (whether or not such an appliance is available)? Why are they good
candidates?
16. Live migration sounds like magic. Research how it works and summarize the process.
17. If your organization uses virtualization, how quickly can a new VM be created? Request
one and time the duration from the original request to when you are able to use it.
18. Section 13.3 says that it can take minutes to create a VM. If the answer to Exercise 17
was longer than a few minutes, investigate where all the time was spent.
19. The cloud means different things to different groups of people. How are these
definitions related? Which definition do you use?
20. What can cloud-based computing do that a small or medium-size company cannot do
for itself?