Principles of System Administration
Jan Schaumann
jschauma@netmeister.org
Contents

Preface
2 Unix
  2.1 Unix History
    2.1.1 The Operating System
    2.1.2 Networking
    2.1.3 Open Source
  2.2 Basic Unix Concepts and Features
3 Documentation Techniques
  3.1 System Documentation Writing 101
    3.1.1 Know Your Audience
  3.2 Online Documentation
  3.3 Different Document Types
    3.3.1 Processes and Procedures
    3.3.2 Policies
    3.3.3 Online Help and Reference
    3.3.4 Infrastructure Architecture and Design
    3.3.5 Program Specification and Software Documentation
  3.4 Collaboration
  3.5 Formats
8 Automation
  8.1 Introduction
  8.2 Of Laziness And Other Virtues
  8.3 Benefits of Automation
    8.3.1 Repeatability
    8.3.2 Reliability
    8.3.3 Flexibility
  8.4 Who benefits from automation?
    8.4.1 Ourselves
    8.4.2 Our Peers
    8.4.3 All Users
  8.5 Levels of Automation
  8.6 Automation Pitfalls
10 Networking
11 Security
Preface to the preface
Jan Schaumann
New York, October 11, 2022
Preface
The book you are holding is intended to cover the broad topic of “System
Administration” from a conceptual point of view, with a particular focus on
the foundations of System Administration in a large-scale context. While the
book will pay attention to Operating System specific details – in particular
the Unix family of operating systems – it does not provide step-by-step in-
structions, such as how to set up a mail server or how to run a monitoring
system. Instead, the basic underlying principles and the factors that help
decide how to best set up a system for a given purpose will be the topic of
discussion.
Wherever possible, case studies and real world examples based on my own
experiences in both small and large environments will be used to illustrate
the lessons learned.
Conventions
A lot of ink and many more innocent bits have been wasted on the differences
between the terms “UNIX®”, UNIX, “Unix”, and “unix-like”. For the purpose of
this book it hardly warrants further distinction anymore: all such systems
discussed in this text and used for examples are “a kind of Unix” (though
few will actually be certified by The Open Group and thus be allowed to call
themselves UNIX®). These include the various Open Source implementa-
tions and derivatives such as the BSD family or Linux (itself trademarked,
by the way). We will use the term “Unix” to refer to these different systems.
If we have to distinguish between the “Linux” operating system and other
“Unices” (the plural we shall avoid), we will follow common convention and
simply refer to the operating system technically consisting of the Linux kernel
and all additional software – much of it provided by or derived from the GNU
Project – as “Linux” rather than “GNU/Linux”.
In examples, the shell prompt indicates the privilege level: a dollar sign ($)
for a regular, unprivileged user and a hash mark (#) for the superuser:

$ whoami
jschauma
$ su
Password :
# whoami
root
# exit
$
For simplicity, we will use the terms “vendor” or “provider” in either case
when referring to the entity distributing the software.
websites and the Internet at large – and to allow them to use such materials
so long as they properly cite their resources.
Systems
Practical exercises usually target the Unix family of operating systems, and
focus on exposing students to multiple variations of any given system. I
usually assign exercises to be done on the following operating systems, where
possible:
• Linux – different distributions allow for valuable insights into different
solutions to the same problem and frequently illustrate the point that
one system running “Linux” may behave rather differently from another.
To this end, I usually pick a Red Hat related distribution (Fedora,
CentOS) and at least a Debian based distribution (Debian or Ubuntu).
• one of the BSDs – as a developer of the NetBSD operating system,
I’m of course partial to this particular variant, but any one from this
lineage will suffice to illustrate the genetic Unix heritage.
• Solaris – now freely available as derivatives of the unfortunately short-
lived OpenSolaris project or available as the commercial original ver-
sion, this system is “different enough” from the more common Linux
distributions to provide a good comparison.
Even though the general concepts of System Administration apply across
all operating systems and even across administrative domains such as support
for a large number of desktop systems versus large deployments of servers
in datacenters, we will focus primarily on the large scale installations and
infrastructure components. As a result, and due to the significantly different
philosophies and practical means by which these systems are maintained, we
explicitly exclude Mac OS X and the Windows family of operating systems as
target platforms for assignments. Doing so allows us not to get distracted by
implementation details and fundamental platform differences and to instead
focus on internalizing the principles and lessons explained.
Programming Assignments
Programming assignments are normally checked on the university’s systems.
I seldom specify the language in which students have to write their programs.
This reflects the real-world scenario, where languages are not usually dictated
and the objective is first and foremost to “get the job done”. A few select
exercises may specify the programming language to use; in those cases there
is usually an explicit or implicit requirement, or the implementation in one
particular language lends itself better to illustrating the lessons learned.
Programming assignments are usually graded not only by functionality,
but also by code quality, user interface and other factors. A demonstrated
understanding of the three core pillars – Scalability, Security, Simplicity – is
essential. Frequently the assignments will target these criteria, if at times as
hidden requirements.
The practice of giving students maximum liberty in their research also
extends to programming assignments. Students are allowed – encouraged,
in fact – to search for solutions and reuse existing code, to collaborate and
discuss possible approaches amongst each other. It should be noted, though,
that as in their workplace, students need to make sure to have permission
to (re)use the code in question and to properly credit the origin. I have
found that this distinction between lifting code found on a website without
acknowledgment and deriving your own solution from an attributed reference
has become harder for students to make, which is all the more reason to
encourage responsible research.
I believe that this freedom to work as one would under “normal” cir-
cumstances teaches awareness of code licensing and improves the student’s
research skills. Frequently, the most important part of a successful homework
submission is not the functionality of the program itself, but the required ac-
companying documentation, which illustrates the student’s problem solving
progress.
Acknowledgements
My biggest thanks go out to the late Prof. Lawrence Bernstein, formerly of
Bell Laboratories, my former Computer Science and Software Engineering
professor, an IEEE and ACM fellow, and an industry expert on Trustworthy
Computing. Throughout my career, I have found myself going back to the
lessons he tried to teach me, the examples he gave, the direction he provided.
Larry is the reason all of these words have eventually come into existence and
been put into a semi-coherent form:
How about writing a book? If you follow your class notes it is
not a burden.
This message came out of the blue back in February of 2011. After a
few emails going back and forth, life intervening, my second daughter being
born, and me generally trying to figure out how to approach such a project,
I eventually signed a contract with Wiley & Sons to actually write this book.
The fact that I never actually completed the work and eventually with-
drew my agreement with Wiley & Sons notwithstanding, I remain humbled
by the confidence and encouragement I received from Prof. Bernstein. I
deeply regret not having been able to complete this work before he died on
November 2nd, 2012. Thanks for being a great teacher, Larry!
Part I
Chapter 1
An Introduction to System
Administration
certainly involves a lot of cables. But then, don’t we have Network Adminis-
trators, or is that a more specialized subcategory of System Administrators?
System Administrators seem to spend as much time typing cryptic com-
mands into dark terminal windows as they do running cables and labelling
hardware, and while they may no longer shuffle punch cards, they frequently
do write programs to help them complete their tasks. System Administra-
tors are known to get woken up in the middle of the night when things go
“bump”, and as a result they are also known to have a fondness for caffeinated
drinks. They are able to kickstart a generator, assess the required cooling
power for their server room, use duct tape in creative and unexpected ways,
assemble servers out of mysterious looking parts, and may end up handling
a circular saw or other heavy machinery when their Leatherman multi-tool
cannot complete the task.
System Administrators plan, budget and design networks and backup
or storage systems, add and delete users (well, user accounts, anyway¹),
install and update software packages, draft policy documents, fight spam
with one hand while rebuilding a corrupted revision control system with
the other. They have access to all systems in the organization, may pass
retina and fingerprint scanners to access “Mission Impossible”-style
protected datacenters and spend countless hours in front of a multitude of
computer screens, typing away on oddly shaped keyboards consuming not
entirely healthy amounts of coffee and energy drinks.
Well... in some places, a System Administrator might do all of this. In
others, there might be different, more specialized people for the various tasks:
there might be datacenter technicians and “SiteOps”, Network Administra-
tors, System Programmers, System Architects, Operators, Service Engineers,
Storage Engineers, Site Reliability Engineers, Virtual Operations, Infrastruc-
ture Architects... the number of different job titles for things that might
otherwise fall into the more general “SysAdmin” category seems endless.
The various areas of expertise included in the day to day routine such as
system design, infrastructure architecture, system fault diagnosis, hardware
benchmarking, and others eventually require experience in a number of re-
lated fields; a lot of them involve more than just a little bit of programming
experience and in some cases complex infrastructure tools based on solid
¹ There is a strong love/hate relationship between System Administrators and their
users. Much as the SA may joke that they wish they could make users disappear, without
users the systems they are in charge of might well hum along uninterrupted, but would
ultimately be entirely useless.
A typical bug
At one place of employment we had a tool that would add users to
a given host by gathering the login information for all the users,
then using ssh(1) to connect to each host, and updating /etc/passwd using the host’s
native tools (such as useradd(8), etc.). Many a system admin-
istrator has written a similar tool, and for the most part, this
program worked reasonably well. (We will discuss how to better manage
user accounts across large numbers of machines later in this book.)
But all software fails eventually, and for one reason or another I had to
debug the tool. Groveling through a few hundred lines of perl that had
clearly been written by different people in different stages of their career,
I came across a code block that included the following call:
chmod(0666, "/dev/null");
Asking around, it became clear that somehow some of the systems ended
up in a state where /dev/null was unreadable or unwritable by normal
users, which led to a number of complications. But nobody had been
able to track down just how exactly this happened until one day we were
lucky enough to witness the change in permissions; the only activity
by a user with sufficient privileges to make such a change was a sudo(8)
invocation of less(1).
Like many interactive programs, less(1) keeps a history file (~/.lesshst);
for the root user, we had explicitly symlinked all such common history files
to /dev/null to avoid accidental leaking of secrets to disk.
And therein lies the bug: when less(1) was invoked by the superuser, it would
check the permissions on the file /root/.lesshst, follow the symlink
to /dev/null, find that they were not 0600, and call chmod(2), yielding
an unreadable/unwritable /dev/null. We were able to confirm this be-
haviour and identified a code change in later versions of less(1) that
fixed this problem.
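As a small illustration of the failure mode described above (a sketch, not
the actual tool from this anecdote), a shell check along the following lines
can detect when /dev/null has lost its expected permissions:

#!/bin/sh
# Sketch: warn if /dev/null is not readable and writable by the current,
# unprivileged user, the symptom observed in the incident above.
if [ ! -r /dev/null ] || [ ! -w /dev/null ]; then
    echo "WARNING: /dev/null has unexpected permissions:" >&2
    ls -lL /dev/null >&2
    # Restoring the expected mode requires superuser privileges:
    #     chmod 0666 /dev/null
    exit 1
fi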
very general terms. Consider the following description given by the Bureau
of Labor Statistics[2]:
While this captures in general the common job duties of a System Ad-
ministrator, it does not answer the question we asked initially: Just what
exactly does a System Administrator do?
In order to better grasp the generic description, I’ve found it useful to
break down the title into its components, System and Administrator. What,
then, is a “System”? Any dictionary will give you a reasonable definition
along the lines of:
skills and a rather useful distinction between “small uniform”, “complex”, and
“large and complex” sites in the LISA booklet “Job Descriptions for System
Administrators”[5], but in searching for a suitably clear definition of the pro-
fession, we realize that formal education for System Administrators is
almost completely lacking.
Job descriptions and postings normally list a degree in Computer Science
or Computer Engineering as “desirable”; more frequently you will find a re-
quirement expressed as “BS, MS or PhD in Computer Science or equivalent
work experience”. Computer Scientists will wonder just how, exactly, “work
experience” can be mapped onto, or treated as equivalent to, their formal education
– all too often, “Computer Science” in this context is equated with “knows
how to program, understands TCP/IP”.
As invaluable as actual hands-on experience is, it is no substitute for a
formal degree in the area of Computer Science. (Nor is having such a degree
a substitute for practical experience.) Rare is the candidate who can fully
understand the average and worst-case runtime of his or her programs and
scripts without having learned Big-O Notation; rare the system programmer
who completely grasps the elegance of modularization provided by, say, func-
tion pointers without having learned the λ-calculus. Becoming a proficient
System Administrator without this background is certainly possible, but it
makes it significantly harder to advance beyond a certain level.
Be that as it may, the focus on many years of real-world experience as the
primary qualification is both cause and effect. In the past, there really existed
no formal training whatever: technically skilled people found themselves in
the position of being the go-to guy to fix the computers, to make things work.
Perhaps they ended up working with a more senior administrator in a sort
of informal apprenticeship, but for the most part their skills were learned
by experience, through trial by fire. The lack of any formal program in this
area necessitates and then further encourages self-learning, and many things
simply can only really be understood by experience. It’s a cliché, a platitude
that you learn the most from your worst mistakes – it is also entirely true.
As a result, many of the senior people in hiring positions do not have
an academic background and are hesitant to require what was not available
then to fill current openings. System Administrators are also still frequently
treated as – and at times view themselves as fulfilling – a “janitorial” role. If
no industry standard exists by which to measure a candidate’s accomplish-
ments in the field, how can we require formal education for a job that eludes
definition in the first place?
As a result, System Administration has long been a profession that is learned pri-
marily by experience, where people grow into a position in order to fulfill the
requirements of an organization rather than follow a career path well-defined
by courses, degrees, and meaningful certifications.
The industry has responded by producing a large number of practical
certification exams that hope to attest to the student’s proficiency in the
subject matter. Practically, most of these certifications appear to primarily
test a person’s ability to memorize specific commands, further reinforcing the
notion that System Administration can be reduced to empirical knowledge,
making practical experience indeed equivalent, if not superior, to one of those
scholastic degrees.
But certification that one remembers a specific vendor’s unique configu-
ration file syntax does not imply actual learning has taken place; holding a
one-week “boot camp” – the name itself is revealing of its educational inten-
tion – to drill enough information into the participants’ heads such that they
pass an exam at the end of the week does not guarantee long-term retention
of that knowledge. Furthermore, there is no oversight of the topics taught in
such classes, no review of suitability or correctness.
System Administrators who excel at their jobs do so not because they
have accumulated arcane tribal knowledge about obscure pieces of hardware
and odd software systems (useful as that is), but because they understand
fundamental underlying principles and combine that knowledge with their
concentrated experience.
System Administration – its principles, fundamentals and practice – should
be taught. Since you are reading this book, chances are that you are cur-
rently taking a class in System Administration as a student or perhaps you
are teaching such a class. But we cannot teach an entire profession in a single
class or even using a few select courses. It is necessary to develop academic
degree granting programs that comprehensively prepare students for the var-
ied and wide requirements imposed on them when they enter the industry.
Such programs should be combined with extensive real-world and hands-on
experience wherever possible, perhaps by including internships, cooperative
education and apprenticeships provided by industry leaders. As students
you are entitled to practical and useful exercises; as educators we have an
obligation to create these and make available the required resources.
We need classes that combine the academic mission of fostering indepen-
dent research with the factual requirements posed by the positions offered in
the marketplace. We need instructors who have years of experience in var-
ied environments, who understand the practical restrictions that may make
the ideal theoretical solution utopian but who are consciously aware of these
boundaries and able to teach their students the awareness of them. At the
same time, we need scholars willing to further this profession through re-
search and build a theoretical body of knowledge to become the foundation
of future programs.
Neither of these can be added to a system after it has been built: trying to apply
“security” after the system interfaces have been defined yields restrictions,
limitations; trying to make a system with inherent limitations perform under
circumstances it was not designed for yields hacks and workarounds – the
end result frequently resembles a fragile house of cards more than a solid
reliable structure.
The third fundamental feature of expertly designed systems, Simplicity,
is simultaneously obvious and counter-intuitive. Simplicity underlies both
scalability and security, since reduced complexity implies better defined in-
terfaces, minimized ambiguity in communications or data handling and in-
creased flexibility. As with the other two core aspects, simplicity cannot be
added after the fact; it must be inherent in the architecture. Simplicity is
the enabler of both scalability and security.
An exceptional system exhibits inherent structural integrity (another
wonderful analogy to the world of bricks and mortar), and this integrity is
provided by these three pillars. Our focus on these components may initially
seem arbitrary: meeting requirements across different teams with different
priorities has long been the bane of many a program manager due to each
team focusing on different qualities, of which we could have picked any num-
ber. However, upon more detailed analysis and with some years of experience
we have time and again found Simplicity, Security and Scalability to be the
core qualities enabling a harmonious development- and deployment process.
Throughout this book, we will analyze the systems we discuss with special
focus on these crucial criteria. You will find that we will frequently talk in
broad abstracts or in surprising detail about the implications of our design
decisions.
1.4.1 Scalability
In recent years, the word “scalability” has become one of the defining require-
ments for virtually all technical solutions. Providers of infrastructure services
and products throw it around in an attempt to impress their customers by
how much load their systems can handle. It typically seems near synonymous
with “high performance”, the ability to handle large amounts of data, traffic,
connections, and the like. This may be misleading, as people might be in-
clined to believe that such capabilities come at no additional (or incremental
or proportional) cost. In fact, scalability does not relate to the costs of the
running system, but to its architecture. A scalable system may well require
While many systems are flexible, few are able to handle input or require-
ments an order of magnitude different from those initially conceived of when
the system was designed. In order to accommodate such changes, systems
are said to either scale vertically – that is, one system is able to handle the
added load, possibly by addition of certain resources (network bandwidth,
CPU power, memory, disk space, ...) – or to scale horizontally – that is,
the system is replicated and demand is spread evenly across the copies.
Either approach has certain implications for the overall architecture. For
horizontal scaling to be beneficial, for example, the problem space needs to
be such that a distribution is both possible and algorithmically feasible. How-
ever, distribution adds communication overhead, as interfaces and data flows
become more complex.
Whether by vertical or horizontal means, a scalable system is one that
readily adapts to changing demand. The designer may not know what will
be required of it in the future, but will make choices that permit the system
to grow and shrink without bumping into arbitrary limits.
We will take a closer look at the implications of this idea on the software
development practices and overall Unix Philosophy in Chapter 2; for the time
being, let us consider the Unix tradition of simple tools operating on streams
of text as a wonderful example of a clear interface definition. Anybody de-
veloping a tool that accepts input only from a file restricts the flexibility of
the tool; this frequently goes hand in hand with an implicit limitation on
the amount of data that can be handled (think maximum file size, buffers
frequently allocated to read in the entire file, the ability to lseek(2) on the
file handle, ...). By choosing text rather than a binary format, any future
use of the output is not limited by the original author’s imagination of what
future users might wish to accomplish. Given sufficiently large or, rather,
diverse datasets, building more complex systems that perform equally well
under heavy load on top of the complicated, yet limited, interfaces so frequently de-
veloped by major software companies easily becomes a frustrating exercise
in determining these boundaries.
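To make this concrete, consider a simple filter reading from standard input:
the very same invocation works whether the data comes from a file, from a
compressed archive, or from a live stream. (The log file names below are
purely illustrative.)

$ grep -c ERROR /var/log/app.log                  # count matches in a file
$ gzip -dc /var/log/app.log.1.gz | grep -c ERROR  # same filter, rotated archive
$ tail -f /var/log/app.log | grep ERROR           # same filter, a live stream

A tool that insists on opening a named, seekable file by itself could not be
used in the last two pipelines without first writing the data back out to disk.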
But scalability is not only of concern when tools are developed; as we de-
sign infrastructure components, we need to be acutely aware of the interfaces
between them and what kind of data can flow back and forth. Frequently we
have no control over the input (both type and amount), so our systems need
to be fault tolerant of many unenvisioned circumstances. Even though some-
what counter-intuitive at first, I argue throughout this book that a robust
system will remain resilient to overall failure by being composed of infras-
tructure components that themselves may fail quickly (and explicitly) and
that such well-defined behaviour underlies true scalability.
Hindsight being 20/20, scalability related issues often can be traced back
to a lack of imagination or confidence in the system. What if our initial use
case increases not by a factor of two or three, but hundredfold? Suppose our
system is still in use in five years – will average input be likely to remain the
same? In our discussions, exercises and problems we will encourage students
to consider how a system performs if circumstances and inputs change by
an order of magnitude. The ability to anticipate extraordinary change in
requirements requires some practice and experience by the System Adminis-
trators in charge of the infrastructure; reacting in well-defined ways to such
change is one of the core features of a reliable system, and the principles that
make a system scalable – fault tolerance, data abstraction, clearly defined
interfaces – must be applied at all stages. We will reinforce these axioms
throughout all chapters in this book.
1.4.2 Security
All too frequently, the software or information technology industry treats
system security as an afterthought, as something that can be added to the
final product once it has met all the other functional requirements, after the
user interface has been determined and after all code has been written. It is
then not surprising that the old adage that security and usability are
inversely related seems to hold true. Nevertheless, I would argue that
the very problem statement – “the more secure you make something, the less
usable it becomes”⁶ – reflects a fundamental misunderstanding, as it implies
that usability is present first and security “added” afterwards.
This approach suggests that the only way to reduce risks is to take away
functionality; but any time you do that or otherwise restrict the users, they
will either stop using the product/system altogether or come up with a cre-
ative solution that works around your newly imposed restrictions. To be
effective, security needs to be built into the system from the design phase on.
That is, rather than starting out with a solution that provides the desired
functionality and then attempting to figure out how to get it to a secure
state, we should instead begin with a secure albeit restricted state and then
slowly add functionality – without compromising safety – until the desired
capabilities are available. That is, we need to view security as an enabling
factor present at the design’s conception.
Much like software, a system infrastructure is usually developed with
a much looser idea of what one wishes to accomplish than people are willing
to admit. In order to achieve maximum functionality and the widest possible
use, people tend to design interfaces with few restrictions. Often it is the role
of the security-minded engineers to ask questions that require the developers
or designers to revisit their requirements; proactive security imposes restric-
tions on what the system is able to do. That is a good thing! General-purpose
systems are much harder to design than special-purpose systems (“Constraints
are friends”[8]); whitelists provide significantly better and more reliable secu-
rity than blacklists.
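As a small, hypothetical sketch of this principle, consider validating a
user-supplied hostname in a shell script: rather than trying to enumerate
all dangerous characters (a blacklist), we accept only the characters a
hostname may legitimately contain and reject everything else:

#!/bin/sh
# Whitelist validation sketch; the parameter name and messages are
# illustrative, not taken from any particular tool.
host="$1"
case "$host" in
    ""|*[!A-Za-z0-9.-]*)
        echo "rejected: not a plausible hostname: $host" >&2
        exit 1
        ;;
esac
echo "accepted: $host"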
Any parent can tell you that it is nearly impossible to take away a toy
from a child, even if the toy has not been used in months or years. The very
idea of something being taken away seems to trigger in people (of all ages) a
“but I might need it some day” response. Giving users access to a resource
(network access, disk space, physical access, CPU power, ...) is trivial – re-
stricting the use once access has been granted is near impossible; the genie
cannot be put back into the bottle. It is therefore imperative to understand
⁶ We will elaborate on the somewhat surprisingly accurate corollary that “The more
secure you make something, the less secure it becomes.”[10] in detail in a later chapter.
precisely what level of access to what resources you actually need and not
build a system for the widest possible use. Well defined – restricted –
small components can still provide the flexibility to build scalable systems,
but each component needs to be designed from the beginning with security in
mind. Just as a chain is only as strong as its weakest link, an infrastructure
is only as secure as its most open or exposed component.
• Cryptography, and the three main features that allow it to help miti-
gate risks:
– Secrecy or Confidentiality
– Accuracy or Integrity
– Authenticity
• Physical Security
• Service Availability
• Service Design
• Social Engineering and other aspects of human nature
• Trust
1.4.3 Simplicity
As our discussion of scalability and security suggested, the practical applica-
tion of these principles yields a reduction of interfaces, end-points, use cases
and overall variance. In other words, scalable and secure systems are less
complex and – it is worth drawing this distinction explicitly – much less com-
plicated. A complex system may be well-organized and exhibit a clear, logical
structure yet require subtle or intricate connections or components. Compli-
cated systems, on the other hand, are irregular, unpredictable or difficult to
follow.
“Complexity is the enemy.” This quote has been repeated endlessly in so
many contexts that it can almost be assumed common knowledge amongst
software engineers, cryptographers and other security experts[11]. But while
many people in the information technology industry may agree on this in
principle, I have found that, like so many aphorisms, the translation into the
real world – the actual application of its consequences – trails far behind the
good intentions.
In the world of System Administration reducing complexity is particularly
difficult. As we discussed in Sections 1.1 and 1.2, managing large computer
networks has inherent complexity: multiple components must interface with
each other in many ways to help people – simultaneously the origin of true
entropy as far as computers are concerned and dangerously predictable –
accomplish their individual and collective goals. But it is important to dif-
ferentiate between required complexity and accidental complexity[12]. We
strive to reduce overall complexity by building ultimately intricate systems
out of smaller, simpler components.
To repeat an almost ubiquitous prime example of how simplicity enables
astounding flexibility and may allow you to build complex systems, we will
time and again draw the analogy to the concept of toy building blocks: them-
selves simple, nearly unbreakable and available in a reasonably limited num-
ber of shapes and sizes, you can build just about anything with them.
Like these little blocks, the Unix operating system builds on the philos-
ophy of simple tools that “do one thing and do it well”, that work together,
commonly by being connected via the ubiquitous pipe, operating on text
streams[13]. We aim for solutions that exhibit comparable elegance. “Sim-
ple” does not, it should be noted, mean “trivial” or “easy”: it is so much easier
to add features, to increase the output, to justify additional input than it is
to reduce a component to its bare necessities. As Antoine de Saint Exupéry,
observed[14]:
System Administration and revisits these topics as well as some of the latest
trends in how System Administration intertwines with other disciplines – the
terms “DevOps” and “Agile Infrastructure” (or “Agile Operations”) deserve
particular attention in that context.
A few years ago, the Personal Computer signified a paradigm change away
from the central Mainframe computer accessed remotely by users towards a
distributed storage and compute model. System Administrators found them-
selves supporting more and more individually powerful connected devices,
trying to centralize certain resources and services such as shared filespace or
regular backups, for example. In the recent past, the move towards software
and data storage services “in the cloud” has brought the evolution full circle:
once again, we are challenged to provide reliable central services that are
accessed by small, though increasingly mobile, devices.
The separation of systems from networks, if it ever really existed, is disap-
pearing. Standalone systems separated from the Internet are rare; systems
operating completely on their own without any network connectivity have
become largely impractical. Does this change the job of the System Admin-
istrator? Does it make things more or less interesting, difficult, challenging,
exciting, different? Will we see another shift back towards a model where
data storage and processing happens again on the edge nodes of the network?
We don’t know. But we do know this: whatever happens, System Ad-
ministrators will be needed. Perhaps they will carry a different title and per-
form different practical tasks, but the concept of the profession will remain.
Organizations small and large will need somebody with the knowledge and
experience to analyze, troubleshoot and design new infrastructures; some-
body who builds and maintains scalable and secure systems; somebody with
an appreciation of the simple things in life.
Figure 1.2: An IBM 704 mainframe. With cloud computing coming full
circle, we already live in the future!
Problems and Exercises
Problems
1. Create a course notebook (electronic or otherwise). In it, write down
your notes about each chapter, add any links to additional informa-
tion, noteworthy insights, etc. After each chapter, write down lessons
learned. Differentiate between those directly applicable to you and
those you consider worthwhile reviewing or investigating in the future.
2. Create a folder for your course work as a central place for all your doc-
umentation. Whenever you go through practical exercises, make sure
to write down how you solved a specific problem. Your documentation
will later on help you review the lessons you learned and can be used
as a practical how-to guide, so make sure to verify all steps before you
write them down.
(a) Review their mission and content and consider joining their mail-
ing lists. They provide a great way to keep up to date with real-
world experiences and problems.
(b) How do these organizations compare to the ACM, IEEE or ISOC?
8. Consider the systems you have access to: what are their primary func-
tions, for what goals were they designed? Suppose they grew by an
order of magnitude – what problems would you foresee?
9. Consider the systems you have access to: what kind and how is access
granted? What kind of security problems can you imagine?
10. Research the terms “DevOps” and “SRE”. To what extent do the prac-
tices they represent change the typical job description of a System Ad-
ministrator?
Exercises
1. Practice basic system tasks in your virtualized environment by creating
different OS instances and running them. Once running, log in on the
virtual host and run the required commands to:
2. Repeat Exercise 1 for a different Unix flavor than what you are used
to. For example, if you used a Linux instance for Exercise 1, repeat it
using OpenSolaris or FreeBSD.
3. Determine the Unix commands you execute most frequently (for ex-
ample via analysis of your shell’s history file). Analyze the top three
commands for their complexity and interfaces.
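For shells that offer a history builtin, a rough first tally might look like
the following one-liner; the exact format of the history output (and hence
the awk field to select) differs between shells:

$ history | awk '{print $2}' | sort | uniq -c | sort -rn | head -3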
(a) Change your login shell for at least a week to one that you are not
accustomed to. Make note of the changes in behaviour.
(b) Disable tab completion and any aliases you have defined in your
login shell for at least a week. How does this change your work-
flow?
(c) Review the detailed documentation of your login shell’s builtins.
Practice the use of those that you are not familiar with.
Bibliography
[1] “less bug, more /dev/null”, Jan Schaumann, on the Internet at
http://is.gd/NpihsH (visited November 29, 2015)
[9] Billy Hollis Still Builds Apps, transcript of an interview with Billy Hol-
lis, “.NET Rocks”, on the Internet at http://is.gd/Suk2hr (visited
December 23, 2011)
[10] Donald A. Norman, “THE WAY I SEE IT: When security gets in the
way”, in Interactions, November 2009, Volume 16, Issue 6, ACM, New
York, NY
Chapter 2

Unix
about every other Unix-related book already covers this topic in great detail.
In this chapter, we summarize these developments with a focus on the major
milestones along the road from the birth of Unix as a test platform for Ken
Thompson’s “Space Travel” game running on a PDP-7 to the most widely
used server operating system that nowadays also happens to power consumer
desktops and laptops (in the form of Linux and Apple’s OS X), mobile de-
vices (Apple’s iOS is OS X based and thus Unix derived; Google’s Android
is a Linux flavor), TVs, commodity home routers, industry scale networking
equipment, embedded devices on the Internet of Things (IoT), and virtually
all supercomputers¹. We will pay attention to those aspects that directly re-
late to or influenced technologies covered in subsequent chapters. For much
more thorough and authoritative discussions of the complete history of the
Unix operating system, please see [2], [3] and [5] (to name but a few).²
Let us briefly go back to the days before the Unix epoch. Unix keeps time
as the number of seconds that have elapsed³ since midnight UTC of January
1, 1970, also known as “POSIX time”⁴. The date was chosen retroactively,
since “Unics” – the Uniplexed Information and Computing Service, as the op-
erating system was initially called⁵ – was created by Ken Thompson, Dennis
Ritchie, Brian Kernighan, Douglas McIlroy and Joe Ossana in 1969. That
is, Unix predates the Unix epoch!
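The current value is easy to inspect from the shell; note that the flag for
printing a given epoch value differs between the BSD and GNU versions of
date(1):

$ date +%s             # seconds since the epoch, right now
$ date -u -r 0         # BSD date: print epoch 0 in UTC
Thu Jan  1 00:00:00 UTC 1970
$ date -u -d @0        # GNU coreutils equivalent
Thu Jan  1 00:00:00 UTC 1970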
It is interesting and a testament to the clean design to see that the basic
¹ TOP500[1], a project ranking the 500 most powerful computers in the world, listed
over 89% as running a version of Linux or Unix.
² The “Unix Heritage Society” mailing list[6] is another particularly noteworthy resource
in this context. It continues to be an incredible source of historical, arcane, and yet fre-
quently and perhaps surprisingly relevant information and discussions around the history
of the Unix family of operating systems. It is notable for the regular participation of many
of the original developers and researchers from the early days of Unix.
³ It is worth adding that this does not include leap seconds, thus making Unix time a
flawed representation of what humans like to refer to as linear time. Leap seconds are
inserted rather unpredictably from time to time, and Unix time has to be adjusted when
that happens. Worse, negative leap seconds are possible, though have never been required.
Just more evidence that Douglas Adams was right: “Time is an illusion, lunch time doubly
so.”[7]
⁴ This is also the reason why, for example, Spam with a “Sent” date set to 00:00:00
may, depending on your timezone offset from UTC, show up in your inbox with a date of
December 31, 1969.
⁵ The name was a pun on the “Multics” system, to which it was initially developed as
an alternative.
The different direction taken by the CSRG and the commercial entities
which licensed and then sold the Unix operating system and the evolution of
the code as it was merged between these branches ultimately lead to two main
directions: the BSD derived family of systems and the ones tracing back to
(AT&T’s) Unix UNIX V, or SysV. The latter had four major releases, with
System V Release 4, or SVR4, being the most successful and the basis of
many other Unix versions. Multiple vendors entered the operating system
marketplace and tried to distinguish themselves from their competitors via
custom (and proprietary) features, which led to significant incompatibilities
between the systems (and much frustration amongst System Administrators
in charge of heterogeneous environments).
It only contributes to the overall confusion that “Version 7 Unix”, the
last version of the original “Research Unix” made available by Bell Labs’
Computing Science Research Center, was released prior to and became the
basis of “System III”, from whence “System V” would ultimately derive.⁸
(Linux, not being a genetic Unix – that is, it does not inherit nor share any
code directly with the original version from Bell Labs – can be seen as a
third main flavor, as it borrows semantics and features from either or both
heritages. This can at times be both a source of great choice and flexibility
as well as of frustration and confusion.)
Similarly, 4.1BSD would have been called 5BSD, but AT&T feared that
would lead to confusion with its own “UNIX System V”. As a result, the
BSD line started using point releases, ending with 4.4BSD.
⁸ You can download or browse the source code and manual pages of many historical
Unix versions on the website of the Unix Heritage Society[29].
Linux, one of the most widely used Unix versions today – technically
a “Unix-like” operating system, as it inherits from neither the SysV nor the
BSD lineages – has its own unique history, invariably tied to that of the GNU
Project. Developed on and inspired by MINIX, it was created in 1991 by
Linus Torvalds as a “(free) operating system [...] for 386(486) AT clones”[12].
Since a kernel all by itself does not an operating system make, Linux was
soon bundled with the freely available software provided by the GNU Project
and, like that software, licensed under the GNU General Public License.
The GNU Project in turn was started by Richard Stallman in 1983⁹ to
provide a Unix-like operating system, and by 1991 it provided a large num-
ber of essential programs and tools (starting with the ubiquitous emacs(1)
editor) and of course including the GNU Compiler Collection gcc(1), the GNU
C Library (glibc), as well as the GNU Core Utilities; however, it was still in
need of a kernel. When Linux was released, it filled this void and GNU/Linux
was born. It is interesting to note that despite the unique license this oper-
ating system was released under – in a nutshell: you get the source and are
free to use and modify it, but any modifications need to be released under
this same license – it has found widespread adoption by commercial entities
and countless products are based on it.
Different organizations, both commercial and volunteer-based, have sprung
up to provide different versions of the GNU/Linux OS. Inherently similar on
a fundamental level, they tend to differ in their package manager (see Chap-
ter 5.5 for a detailed discussion of these components), administrative tools,
development process, and user interface choices. Some companies trade rapid
adoption of new features available in the open source kernel for a reputation
of stability and offer commercial support for their particular Linux flavor.
Even though nowadays hundreds of these Linux distributions exist, the
two dominant variations in the server market tend to be those based on “Red
Hat Enterprise Linux” as well as derivatives of Debian GNU/Linux. The for-
mer, a commercial product licensed to users by Red Hat, Inc., gave birth to
the “Fedora” and CentOS community projects, while in 2012 Canonical Ltd.’s
“Ubuntu” OS became the most widely used Debian derivative. Changes to
the core components continue to be merged across all distributions, but the
specific bundling of custom tools leads to different Linux flavors drifting fur-
ther apart.
With all this back and forth between the various versions, trying to keep
⁹ Note that this makes the GNU Project 8 years older than Linux!
that Open Source projects, such as the BSDs, certainly could not afford.
Eventually, SUSv3 and POSIX:2001 (formally known as IEEE 1003.1-
2001) became more or less interchangeable; we will commonly refer to sys-
tems or interfaces as being “POSIX-compliant” (or not, as the case may be).
At the time of this writing, the latest version is POSIX:2008[15], which is
divided into a Base Definition, the System Interfaces and Headers, and the
Commands and Utilities. It should be mentioned, though, that not only is
“the nice thing about standards that you have so many to choose from”[16],
as an old phrase coined by Andrew S. Tanenbaum goes, but also that a rec-
ommendation or requirement does not necessarily have to make sense or be
realistic to be included in a standard. We will occasionally notice discrepan-
cies between what POSIX demands and what different OS vendors chose to
implement. As two entertaining examples, please refer to the section of the
fcntl(2) manual page on e.g. a NetBSD system[17] that elaborates on the
locking semantics or the fact that POSIX could be interpreted to require a
cd(1) executable¹⁰.
2.1.2 Networking
No review of the history and basic features of the Unix operating system
would be complete without a mention of the parallel evolution of the In-
ternet. As we noted in Section 2.1.1, the development of the Unix system
and that of the predecessors of what ultimately became the Internet were
not only related, but became inseparably merged. The ARPANET imple-
mented the concept of packet switching, allowing payload to be broken into
small datagrams and routed along different paths; its adoption of TCP/IP[20]
as its protocol suite effectively marked the beginning of the modern Inter-
net. Even though some companies developed their own TCP/IP stack, the
code included in the Berkeley Software Distribution quickly became the most
widely used implementation and ultimately replaced other network protocols¹¹.
In the early days of the Internet, the various different networks – ARPANET,
CSNET, MILNET, NSFNET, NSI, etc. – were connected via specific gateway
¹⁰ If the problem of a cd(1) executable isn’t immediately obvious to you... well, see
Problem 4!
¹¹ Microsoft, for example, did not include TCP/IP in their operating systems until Win-
dows 95, allowing other companies to sell their implementations as add-on software. The
move from their native NetBIOS protocol to the BSD derived TCP/IP stack helped make
the latter the de-facto Internet standard protocol suite.
But even before the advent of the Internet, Unix included networking
capabilities. Through its layers of abstraction it was possible to implement
support for different networking technologies and allow applications to be
network protocol agnostic. In fact, some applications, such as email, were
available and in use prior to any traditional networking capabilities. The na-
ture of Unix as a multiuser system led to the development of tools, amongst
them the mail(1) program, to allow these users to communicate efficiently
with one another and across systems. We will frequently review how the
nature of a scalable tool allows it to function equally well regardless of where
input data comes from or what transport mechanism is used; a simple, well
defined program can deliver mail on a single system while relying on a sep-
arate transport service (e.g., UUCP or SMTP) to handle connections with
other systems.
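A sketch of this interface in action (the recipient address is, of course,
just an example); the same one-line invocation works whether the message is
delivered to a local mailbox or handed off to a remote system via SMTP:

$ echo "backup completed on $(hostname)" | mail -s "nightly backup report" admin@example.com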
Furthermore, the software implementing such services was developed on
and then included in the Unix operating system. As a result, the Internet
and its infrastructure were growing in parallel to the capabilities of Unix, one
enabling the other to become more powerful and ubiquitous. And so today,
the overwhelming majority of the systems powering the core infrastructure
components of the Internet, such as, for example, the DNS root servers or
most web- and mail servers, are running on a Unix variant¹³: the by far most
popular implementation of the DNS specification is, not surprisingly, the
Berkeley Internet Name Domain (BIND) server[21]; sendmail, exim, and
postfix push the majority of the world’s email[22]; the apache web server
still handles more than 45% of all HTTP traffic on active sites, more than
any other web server[23].
¹² Every now and then you may encounter a scruffy oldtimer who insists on pointing out
that their email address is something along the lines of “...!orgserver!deptserv!mybox!user”.
You can trivially impress them by calling it their “bang path” and agreeing that @-based
email addresses are newfangled humbug.
¹³ As noted in the introduction, we continue to count Linux as a “Unix variant” to avoid
constant repetition of the phrase “Unix or Linux”.
component of the total cost of ownership is the actual purchase price, and
access to the source code (which in some cases may well come under specific
terms of the license with commercial and/or closed source software) is some-
what independent thereof. What’s more important – within the context of
this book, anyway – is that the very concept of Open Source is embedded
in the Unix philosophy and culture, and as a result System Administrators
frequently expect to be able to analyze the source code to the applications
and operating systems they run.
But not only are we able to inspect how a piece of software works, we
need to. All too frequently do we encounter problems or try to analyze a
system’s behaviour where the question of what on earth might be going on
is answered with this advice: “Use the source, Luke!” – Unix has let us do
precisely that since the beginning.¹⁴
Various other shells have been created since then, mostly following either
the general Bourne shell syntax or that of Bill Joy’s C shell, csh(1). The
most notable shells today include: the Almquist shell ash(1), a BSD-licensed
replacement for the Bourne shell, frequently installed as /bin/sh on these
systems; the GNU Project’s Bourne-again shell bash(1), which is the default
shell on most Linux systems and known for a large number of added features;
the Korn shell ksh(1), named after David Korn and which became the basis
for the POSIX shell standard; the TENEX C shell tcsh(1), a C shell variant
developed at Carnegie Mellon University; and perhaps the Z shell zsh(1),
another very feature-rich Bourne shell variant.
As a scripting language and due to its availability on virtually every Unix
flavor, /bin/sh is assumed to be the lowest common denominator: a Bourne-
or Bourne-compatible shell. On Linux, bash(1) is typically installed as both
/bin/bash and /bin/sh, and it behaves (somewhat) accordingly based on
how it was invoked. Unfortunately, though, its ubiquity on Linux systems
has led to shell scripts masquerading as /bin/sh compatible scripts that
are, in fact, making use of bash(1) extensions or relying on bash(1) com-
patibility and syntax. This becomes frustrating to debug when trying to run
such scripts on a platform with a POSIX compliant /bin/sh.
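A common example of such a “bashism” (the variable is illustrative): the
[[ ]] test construct is a bash/ksh extension, while the portable spelling
uses case or the plain [ utility:

# Works in bash, but fails on a strict POSIX /bin/sh such as dash:
#     if [[ "$answer" == y* ]]; then echo yes; fi
# Portable Bourne/POSIX equivalent:
case "$answer" in
    y*) echo yes ;;
esac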
All Unix shells include the ability to perform I/O redirection. Each pro-
gram has a set of input and output channels that allow it to communicate
with other programs. Like the concept of the pipe, these streams have been
part of Unix’s design from early on and contribute significantly to the consis-
tent user interface provided by all standard tools: a program accepts input
from standard input (or stdin) and generates output on standard output (or
stdout); error messages are printed to a separate stream, standard error (or
stderr).
The shell allows the user to change what these streams are connected to;
the most trivial redirections are the collection of output in a file, the suppres-
sion of output, acceptance of input from a file, and of course the connection
of one program’s output stream to another program’s input stream via a pipe
(see Listing 2.1 for Bourne-shell compatible examples).
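Listing 2.1 is not reproduced in this excerpt; a minimal set of Bourne-shell
compatible examples along those lines might look like this, where somecmd
stands in for any program:

$ somecmd >out.txt               # collect output in a file
$ somecmd 2>/dev/null            # suppress (error) output
$ somecmd >out.txt 2>&1          # send stdout and stderr to the same file
$ somecmd <in.txt                # accept input from a file
$ somecmd | sort | uniq -c       # connect programs via pipes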
The concept of these simple data streams being provided by the operat-
ing system was inherent in the Unix philosophy: it provided abstraction of
interfaces, reduced overall complexity of all tools using these interfaces, and
dictated a simple text stream as the preferred means of communication. We
will have more to say on the Unix philosophy in Section 2.2.4.
Finally, the Unix shell provides for job control, a necessity for a multitask-
ing operating system. When a user logs into the system, their login shell is
started, serving as the primary interface between the user and the OS. After
entering a command or a pipeline, the shell will create (“fork”) a new process
and then execute the given programs. The standard streams are connected
as illustrated in Figure 2.2. While the program or pipeline is running, the
user cannot do anything else – she has to wait until the command completes
and control is returned to the shell. In the mean time, all she can do is
twiddle her thumbs; so much for multitasking!
To avoid this scenario, the C shell implemented a feature that was quickly
incorporated in the Bourne shell, which allows users to start and control
multiple concurrent processes by placing them into the background (by adding
the & symbol at the end of the command or via the shell builtins), bringing
them to the foreground (via builtins), suspending and continuing them (by
sending possibly keyboard generated signals to the relevant process group),
etc. Listing 2.2 illustrates the basic job control functionality.
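Listing 2.2 is likewise not included here; a roughly equivalent interactive
session might look as follows (job numbers, process IDs, and the exact
output format will differ):

$ sleep 300 &                    # start a command in the background
[1] 4711
$ jobs                           # list the shell's current jobs
[1]+  Running                 sleep 300 &
$ fg %1                          # bring job 1 back to the foreground
sleep 300
^Z                               # suspend it from the keyboard (SIGTSTP)
[1]+  Stopped                 sleep 300
$ bg %1                          # let it continue in the background
$ kill %1                        # and finally terminate it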
the Bell Labs patent office, which secured funding for further development,
was largely thanks to the system’s abilities to typeset beautiful documents
using the roff text formatting program¹⁹. The same tools are still used to
format the manual pages.
Unix provided manual pages and documentation not just for the executa-
bles and configuration files provided by the system, but also so-called
“supplementary” documents. These comprise a number of papers that, as
in the case of the Interprocess Communication (IPC) tutorials, for example,
served as the de-facto reference documentation and continue to be used in
countless Computer Science classes today to teach students the fundamentals
of Unix IPC. Other highlights include an introduction to the GNU debugger
gdb, the make tool, a vi(1) reference manual, an overview of the file system,
and various dæmons. Since these documents are licensed under the permis-
sive BSD License, they can be – and thankfully are! – included in modern
Unix versions (such as e.g. NetBSD) and made available on the Internet[19].
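Consulting this documentation is itself done with the standard tools; for
example:

$ man ls                         # manual page for the ls(1) command
$ man 5 passwd                   # the passwd(5) file format, not the command
$ apropos password               # keyword search across the manual pages
$ man -k password                # the same search, traditional spelling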
¹⁹ Consider that W. Richard Stevens used to typeset his famous books “Advanced Pro-
gramming in the UNIX Environment” and the “TCP/IP Illustrated” series by himself using
groff(1).
A system that allows multiple users to simultaneously utilize the given re-
sources is in need of a security model that allows for a distinction of access
levels or privileges. The system needs to be able to distinguish file access and resource utilization amongst users, thus requiring the concepts of access permissions, process and file ownership, process priorities, and the like.
Controlling access to shared resources by individual users also required, ef-
fectively, a single omnipotent user to control and administer these privileges,
thus necessitating the superuser or root account (with plenty of security
implications and concerns of its own).
In order to meet these requirements, the Unix system uses a set of file
permissions to restrict three different types of access – read, write and exe-
cute (or rwx, respectively) – to the file owner or user, members of a specific
user group, or everybody else (i.e., others) on the system (ugo, respectively).
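As a quick illustration (user and group names are made up):

$ ls -l report.txt
-rw-r--r--  1 jschauma  staff  2948 Oct 11 09:15 report.txt
$ chmod g+w report.txt             # allow members of the group 'staff' to write
$ chmod o-r report.txt             # revoke read access for all others
$ chmod 750 script.sh              # rwx for the user, r-x for the group, nothing for others
$ chown jschauma:staff script.sh   # change owner and group (requires appropriate privileges)
$ ls -l script.sh
-rwxr-x---  1 jschauma  staff  512 Oct 11 09:20 script.sh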
Ritchie to Ken Thompson – can be quoted to underline this point; they must
have been on to something.
The most well-known expression of what makes Unix Unix is probably Douglas McIlroy's summary[24], partially cited in the previous chapter: write programs that do one thing and do it well; write programs to work together; write programs to handle text streams, because that is a universal interface. In the same spirit of trusting the user:

UNIX was not designed to stop its users from doing stupid things,
as that would also stop them from doing clever things. – Doug
Gwyn
The awareness that your software might be used in ways you cannot
imagine, that the user of the software might actually know better what they
may wish to accomplish than the designer or implementer is what makes
Unix so fascinating. Interestingly, this philosophy, this trust in the user and his or her capabilities and knowledge, stands in stark contrast to that
of the late Steve Jobs, who famously quipped that “people don’t know what
they want until you show it to them”[27]. Apple’s products are known for
their elegance and ease of use, but advanced users know that should you
attempt to do something with them that the designers did not anticipate,
it’s either impossible or painfully cumbersome.
The primary user interface on Unix systems remains the command line. This is not for a lack of other options, but a manifestation of the Unix philosophy. While it may appear more “user-friendly” to a novice to use a pointing device to select pre-determined options from a menu using a Graphical User Interface (GUI), it is anathema to efficient System Administration.
System Administrators need to be able to perform tasks remotely, quickly,
and reliably unattended; execution of programs needs to be automated and
scheduled, configuration be done outside of the application, and data be
transformed with the myriad of available filters. As you can tell, these re-
quirements go back to the Unix way of writing simple tools that work well
together by communicating via text streams. Thanks to the consistency with
which these principles are implemented across the platform, the learning
curve for advanced users, while perhaps steeper than on some other systems,
only needs to be climbed once. At the same time, it gets you to a higher
level of efficiency quickly.
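To make this concrete, consider a routine task such as summarizing failed login attempts on a host: a pipeline of simple filters can be run remotely, unattended, and on a schedule. (The host name, log file, and field position below are made-up examples; the log format on your systems will differ.)

ssh loghost 'grep "Failed password" /var/log/authlog' |
    awk '{ print $(NF-3) }' |      # extract the source IP address field
    sort | uniq -c | sort -rn |    # tally and rank the offenders
    head -10 > /tmp/failed-logins.txt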
Listing 2.3: The simplified, or 2-clause, BSD license. Nice and terse, huh?
In contrast, the GNU’s Not Unix (GNU) GPL clocks in at 11 full text pages.
Problems and Exercises
Problems
1. Research the history of the Unix operating system in more detail.
Branch out into the “USL vs. BSDi” lawsuit. Follow the BSD ge-
nealogy into the Mac OS X system. Analyze the future direction of the
commercial Unix versions.
3. Review the intro(1) manual pages on your system (they may exist for
different sections and, depending on the Unix flavor, in varying detail).
From there, move on to the following manual pages, considering the
multiuser implications: chmod(1)/chown(1), login(1), passwd(5),
su(1), sudo(8)
5. Play around in the Unix environment of your choice. Look at the exe-
cutables found in the system’s path (/bin, /usr/bin, /sbin, /usr/sbin)
– do you know what all these tools do?
Exercises
1. Using the programming language of your choice, write a simple inter-
active shell capable of executing programs on the user’s behalf. Try to
use it as your shell. Were you aware of the limitations before you did
this? Were you aware of the complexity of even basic features?
Bibliography
[1] TOP500; on the Internet at https://www.top500.org/statistics/
list/ (visited January 16, 2017)
[2] The Creation of the UNIX Operating System, on the Internet at http:
//www.bell-labs.com/history/unix/ (visited January 7, 2012)
[3] Dennis M. Ritchie, ’The Evolution of the Unix Time-sharing Sys-
tem’, published in AT&T Bell Laboratories Technical Journal, “Com-
puting Science and Systems: The UNIX System,” 63 No. 6
Part 2, October 1984; also available via the Internet Archive
at e.g. https://web.archive.org/web/20150408054606/http://cm.
bell-labs.com/cm/cs/who/dmr/hist.html (visited January 17, 2017)
[4] Brian W. Kernighan, Rob Pike, The UNIX Programming Environment,
Prentice Hall, 1984
[5] Eric Steven Raymond, The Art of Unix Programming, Addison-Wesley
Professional, September 2003; also available on the Internet at http:
//catb.org/~esr/writings/taoup/ (visited January 7, 2012)
[6] The Unix Heritage Society Mailing List, http://minnie.tuhs.org/
mailman/listinfo/tuhs
[7] Douglas Adams, The Hitchhiker’s Guide to the Galaxy, Pan Books, 1979
[8] Brian W. Kernighan, Dennis M. Ritchie, The C Programming Language,
Prentice Hall, 1988
[9] Dennis M. Ritchie, ’Advice from Doug McIlroy’; now only found on the
Internet Archive at https://web.archive.org/web/20150205024833/
http://cm.bell-labs.com/cm/cs/who/dmr/mdmpipe.html
[10] Douglas McIlroy, A Research UNIX Reader: Annotated Excerpts from
the Programmer’s Manual, 1971-1986; on the Internet at http://www.
cs.dartmouth.edu/~doug/reader.pdf
[11] ’USL vs. BSDI documents’, only available on the Internet Archive
via e.g. https://web.archive.org/web/20150205025251/http://cm.
bell-labs.com/cm/cs/who/dmr/bsdi/bsdisuit.html (visited Jan-
uary 18, 2017)
[12] ’What would you like to see most in minix?’, Linus Torvalds,
posting to the comp.os.minix newsgroup on Usenet, on the In-
ternet at https://groups.google.com/group/comp.os.minix/msg/
b813d52cbc5a044b (visited January 16, 2012)
[13] BSD Licenses on Wikipedia at https://en.wikipedia.org/wiki/BSD_
licenses (visited January 18, 2017)
[14] Éric Lévénez, Unix History, on the Internet at http://www.levenez.
com/unix/ (visited January 5, 2012)
[15] The IEEE and The Open Group, “The Open Group Base Specifi-
cations Issue 7, IEEE Std 1003.1, 2016 Edition” on the Internet at
http://pubs.opengroup.org/onlinepubs/9699919799/ (visited Jan-
uary 17, 2017)
[16] Andrew S. Tanenbaum, in ’Computer Networks’, Prentice Hall, 2004
[17] fcntl(2), NetBSD System Calls Manual, on the Internet at http://
netbsd.gw.com/cgi-bin/man-cgi?fcntl++NetBSD-current (visited
January 18, 2017)
[18] NetBSD system call name/number “master” file, on the In-
ternet at http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/
syscalls.master?rev=HEAD (visited January 18, 2017)
[19] 4.4 Berkeley Software Distribution Documentation, on the Internet at
http://www.netbsd.org/docs/bsd/lite2/ (visited January 28, 2012)
[20] Vinton G. Cerf, Robert E. Kahn, “A Protocol for Packet Network
Intercommunication”, IEEE Transactions on Communications 22 (5);
also available on the Internet at http://ece.ut.ac.ir/Classpages/
F84/PrincipleofNetworkDesign/Papers/CK74.pdf (visited January
28, 2012)
[21] DNS server survey, 2004; on the Internet at http://mydns.bboy.net/
survey/ (visited February 4, 2012)
[22] Mail (MX) Server Survey, August 1st, 2007, showed over 60% of SMTP
traffic to originate from a Sendmail, Exim, or Postfix installation;
on the Internet at http://www.securityspace.com/s_survey/data/
man.200707/mxsurvey.html (visited February 4, 2012)
[29] The Unix Heritage Society, The Unix Archive, on the Internet at http:
//www.tuhs.org/Archive/README (visited April 11, 2012)
Chapter 3
Documentation Techniques
We noted in Chapter 2.2.2 that one of the many ways in which the Unix
operating system distinguished itself from other systems was that it included
extensive documentation of high quality. Each tool provided by the OS came
with a manual page describing its use, each library included a description of
its interfaces, and configuration files were accompanied by documentation elaborating on their syntax and format.
Even simple systems cannot be operated, maintained, updated – or in a
word, administered – without an abstract description of the various moving
parts.
But while every System Administrator values good documentation, and
will happily agree that, yes, all systems should in fact be properly docu-
mented, runbooks created, procedures described, and additional information
referenced, it does take practice and conscious dedication to develop a habit
of writing high quality system documentation.
For this reason, and before we dive into the fundamental technologies
and concepts in Part II, let us briefly focus on a few principles that help us
provide better documentation due to a better understanding of what purpose
it serves.
We write documentation for other technical people. Our services are used
by a number of people in our organization. Some of these are very technical: expert users, software developers, or other engineers, perhaps. In order to
allow these people to get information about our systems quickly, we provide
end-user documentation which allows more efficient use of the resources we
make available.
We write documentation for our users. The systems we maintain provide a certain service and thus have users, some of whom may be internal to our organization while others may be outside customers. It is not rare that the people in charge of maintaining the systems provide documentation to these end-users in one way or another to allow them to utilize the service or to help them find assistance when things break down.
We write documentation for other people everywhere. System administra-
tors rely on the Internet at large to find answers to the rather odd questions
they come up with in the course of their normal work day. Frequently we
encounter problems and, more importantly, find solutions to such problems
that are of interest to other people, and so we strive to share our findings,
our analyses, and our answers.
Some of the most interesting technical blogs are written by system ad-
ministrators or operational teams and describe how they scale their infras-
tructure to meet their demands. Likewise, system administrators present
their findings and solutions at industry conferences, write technical papers
or participate in Internet standards development, all of which require expert
documentation and writing skills.
legal identity and the public at large – an easy and obvious one to draw and
enforce – but between distinct entities within a single organization. Docu-
mentation containing certain details regarding your infrastructure’s network
architecture, access control lists, location of credentials and the like may
simply not be suitable to be made available outside your department. Un-
fortunately the boundaries can be less than obvious as people move in and
out of departments as an organization evolves.
useful. Getting straight to the heart of the matter is usually advisable in our
context.
Another benefit of providing documentation online is that it becomes
immediately searchable and can be linked to other sources of information.
Helping your readers to find the content they are looking for becomes trivial
if your documentation framework includes automated index generation and
full text search. To make it easier for your readers, be sure to use the right keywords in the text or to add “tags” – and remember to include or spell out common acronyms and abbreviations, depending on what you and your colleagues or users most commonly use!
tl;dr
Sometime in 2011, an acronym established itself in the “blogo-
sphere”: tl;dr – too long; didn’t read. This acronym (in general
use since at least 2003) was frequently accompanied by a short
summary of the content in question, thus allowing online readers
lacking the patience to actually read the document to nevertheless
form opinions about it. While it is certainly useful to provide a
succinct summary of the content in e.g. an email (where an appropri-
ate ’Subject’ line or an introductory sentence may suffice), if you feel
inclined to provide a tl;dr summary to your documentation, chances are
that you’re addressing the wrong audience.
and flow of information. In this section, we will take a look at some of the
most common distinct categories of system documentation.
3.3.1 Processes and Procedures
This is probably the most common type of document you will have in
your information repository. Without any further classification, pretty much
anything you document will fall into this category; it is a broad definition
of what most people think of when they think about system documentation.
It is worth taking a closer look, as different types of process documentation
exist and hence call for different formats or presentation.
Processes usually provide a description of what is done in a given situa-
tion, what the correct or suitable steps are to resolve a given problem or how
a routine task is accomplished. These documents are generally focused on
the practical side of the business or organization.
Procedures are focused slightly more on the operational side of things.
They do not so much describe what is to be done as list the actual commands to issue to yield a specific result. It is not uncommon to have well-documented
procedures be turned into a “runbook”, a list of ordered steps to follow and
commands to issue under specific circumstances. Carefully and correctly
written, a runbook can frequently be used to let even inexperienced sup-
port staff respond to an emergency situation. The procedures listed usually
include simple if-this-then-that directions with exit points to escalate the
problem under certain circumstances. Furthermore, a good runbook usually
makes for an excellent candidate for automation – more on that in Chapter
8.
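As a hedged sketch, a fragment of such a runbook, once automated, might look like the following shell script; the service name, health check URL, and escalation address are invented for illustration:

#!/bin/sh
# Runbook: "web service not responding" -- automated if-this-then-that steps.

if curl -sf -o /dev/null http://localhost:8080/healthcheck; then
    echo "Service healthy; nothing to do."
    exit 0
fi

# Step 1: restart the service and re-check.
service httpd restart
sleep 10

if curl -sf -o /dev/null http://localhost:8080/healthcheck; then
    echo "Service recovered after restart."
    exit 0
fi

# Exit point: escalate to a human.
echo "httpd restart did not help on $(hostname)" | \
    mail -s "ESCALATE: web service down" oncall@example.com
exit 1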
Another type of document can be considered a subcategory of a process
document: the ubiquitous HOWTO. Instructional descriptions of how to
3.3.2 Policies
In this category you will find all the various documents provided by Sys-
tem Administrators, IT Support, Help Desk, and similar groups that are
intended to allow users to find solutions to their problems without having to
engage personal assistance. A common solution includes increasingly com-
plex documents entitled “FAQ” (for “Frequently Asked Questions”).
When creating an FAQ compilation, it is important to focus on the ques-
tions your users actually ask, not the ones that you think they might ask.
All too often these documents include answers to questions that are not, in
fact, all that frequent, but that the author happened to have an answer to
or would like the users to ask.
It should be noted that it may well be necessary to separate some of these
documents for one audience from those provided to another. The references
you provide to your peers may include privileged information, and the style
in which you guide inexperienced users would be ill-suited to help your expert
users find solutions to their problems. Once again, knowing your audience is
key.
In addition, it is important to distinguish between reference material and
README.
If the software in question becomes more complex, you may wish to ref-
erence additional documents, including pointers to the software architecture, the rationale for certain design decisions, a software-specific FAQ, etc.
3.4 Collaboration
Unlike other kinds of writing, creating and maintaining system documenta-
tion is not a solitary task. Instead, you collaborate with your colleagues to
produce the most accurate information possible, and allowing end users to
update at least some of the documents you provide has proven a good way to
keep them engaged and improve the quality of your information repository.
Depending on the type of document, you may wish to choose a different
method of enabling collaboration. Some documents should be treated as less
flexible or mutable than others: for example, you probably do not want your
customers to be able to modify the SLAs you agreed to, but you do want to
encourage your users to contribute use cases or corrections to your software
documentation.
As Bertrand Meyer, creator of the Eiffel programming language, observed:
For this reason, it is critical that you lower the barrier for collaboration
and make it easy for others to contribute corrections or updates. Carefully marking obsolete documents as such is similarly important, but also beware of overeagerly removing “stale” content: being able to read up on historical design and architecture decisions can be invaluable even if the documents
do not reflect the current state.
Documentation should be kept under some kind of revision control, much
like software code is. Many different solutions exist, ranging from storing
simple text or markup files in your own repository, to using a database backed
“Wiki”, to hosting documents online using a cloud provider, to using e.g.
Google Docs. Realistically, you will likely encounter a combination of all of
the above. When choosing a solution, focus on these properties:
• Usability; how easy is it for you and your colleagues to edit the documents, to keep them up to date?
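As a minimal sketch of the plain-files-under-revision-control approach described above (repository and file names are arbitrary, and any revision control system will do):

$ git init docs && cd docs
$ $EDITOR onboarding.md                # write the document as simple markup
$ git add onboarding.md
$ git commit -m "Add onboarding procedure for new team members"
$ git log --oneline onboarding.md      # the document's full change history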
3.5 Formats
Finally, a word on the format of your documentation. Be aware of how your
users might read the documents you provide. Frequently, it is necessary to
be able to search documents offline, to forward them via email, to link to
them, or to copy and paste portions.
With that in mind, be mindful of the format in which you generate and
present your information. System Administrators, heavily influenced by the
Unix philosophy, are often partial to plain ASCII text or simple markup
formatting.
To increase the readability of your documents, use a short line length and make copious use of paragraphs. Viewing a single large block of run-away text with no line breaks immediately puts stress on the reader, as absorbing the information provided therein requires a high degree of concentration and eye movement.
Breaking up your text into smaller paragraphs helps the reader relax and facilitates reading comprehension, speed, and ease. Write the text as you would read it aloud, with paragraphs allowing the reader to catch their breath for a second.
If you are writing a longer technical document, you can further structure
it using headlines, numbered sections, subsections etc. using different ways to
underline or emphasize the section titles. Similarly, you can use itemization,
bulleted or numbered lists, or indentation to make your text easy to read.
Use short sentences, even (and especially) if you’re German. Just like
one single block of text is hard to read, so are never ending sentences with
multiple conditionals and subclauses. You are not writing Molly Bloom’s
Soliloquy.
Use proper punctuation. A period will almost always suffice; semicolons
may be used as needed, but exclamation points are rarely called for!
Resist the temptation to use images instead of text. If you are unable to distill the concepts or thoughts into language, then you have likely not fully understood the problem. Use illustrations to supplement the information, not to replace it. Text ought to be your primary content: it can easily be skimmed, indexed and searched, glanced through in a matter of seconds, and parts of a paragraph can be re-read with ease; a screencast or video, to cite an extreme example, must be watched one excruciating minute at a time (not to mention the challenges non-textual media pose to visually impaired users).
That’s it - with the above advice in mind, you should find that you soon
spend a lot more time on defining your thoughts and putting the important
information forward rather than fiddling with font faces, sizes and colors.
And remember, if plain text is good enough for RFCs, the standards used
to define the internet and just about everything running on it, then it’s quite
likely going to be good enough for you.
Problems and Exercises
Problems
1. Identify a tool or utility you use on a regular basis which does not have
an adequate manual page. Write the fine manual! Submit it to the
tool’s author/maintainer.
3. Many Open Source projects are also transparent about the ways in which their infrastructure is maintained and operated – identify a major project and review their documentation. Compare this to how different companies
present their best practices (for example on a company blog, in presen-
tations at conferences, in articles about their infrastructure, ...). Does
documentation play a significant role?
We will conclude our whirlwind tour across all the diverse areas of System Administration in Part IV, acknowledging that in many ways we have really only scratched the surface. We will outline major industry trends and developments that we did not have the time or space to include here in Chapter 15, before circling back to the definition of our profession in Chapters 16 and 17, where we elaborate on the legal and ethical obligations and considerations before we take a brief look at what might lie ahead.
As you can tell, each topic is far reaching, and we cannot possibly cover them all in every possible detail. For this reason, we focus not on specific examples but on the basic principles. Instructors should try to choose real-world examples from personal experience to illustrate some of these concepts in class, just as we will present case studies where appropriate. Students, on the other hand, are encouraged to relate the topics to their own experiences and to deepen their research based on their own interests.
Chapter 4
Of File Systems and Storage Models
Disks are always full. It is futile to try to get more disk space.
Data expands to fill any void. – Parkinson’s Law as applied to
disks
4.1 Introduction
This chapter deals primarily with how we store data. Virtually all computer systems require some way to store data permanently; even so-called “diskless” systems require access to certain files in order to boot, run, and be useful. Even if those bits are stored remotely (or in memory), they reside on some sort of storage system.
Most frequently, data is stored on local hard disks, but over the last few
years more and more of our files have moved “into the cloud”, where different
providers offer easy access to large amounts of storage over the network.
That is, we have more and more computers depending on access to remote
systems, shifting our traditional view of what constitutes a storage device.
disks can be combined to create a single logical storage unit through the use
of a Logical Volume Manager (LVM) or Redundant Array of Independent
Disks (RAID). This allows for improved performance, increased amount of
storage and/or redundancy. We will discuss these concepts in more detail in
Section 4.4.
Direct attached storage need not be physically located in the same case
(or even rack) as the server using it. That is, we differentiate between in-
ternal storage (media attached inside the server with no immediate external
exposure) and external storage (media attached to a server’s interface ports,
such as Fibre Channel, USB etc.) with cables the lengths of which depend
on the technology used. External media allows us to have large amounts of
storage housed in a separate enclosure with its own power supply, possibly
located several feet away from the server. If a server using these disks suf-
fers a hardware failure, it becomes significantly easier to move the data to
another host: all you need to do is connect the cable to the new server.
Simple as this architecture is, it is also ubiquitous. The advantages of
DAS should be obvious: since there is no network or other additional layer
in between the operating system and the hardware, the possibility of failure
on that level is eliminated. Likewise, a performance penalty due to network
latency, for example, is impossible. As system administrators, we frequently
need to carefully eliminate possible causes of failures, so the fewer layers of
indirection we have between the operating system issuing I/O operations and
storage. In order for the clients to be able to use the server’s file system
remotely, they require support for (and have to be in agreement with) the
protocols used3 . However, the clients do not require access to the storage
media on the block level; in fact, they cannot gain such access.
From the clients’ perspective, the job of managing storage has become
simpler: I/O operations are performed on the file system much as they would
be on a local file system, with the complexity of how to shuffle the data
over the network being handled in the protocol in question. This model
is illustrated in Figure 4.2, albeit in a somewhat simplified manner: even
though the file system is created on the file server, the clients still require
support for the network file system that brokers the transaction performed
locally with the file server.
Footnote 3: The most common protocols in use with network attached storage solutions are NFS on the Unix side and SMB/CIFS on the Windows side. The Apple Filing Protocol (AFP) is still in use in some predominantly Mac OS environments, but Apple's adoption of Unix for their Mac OS X operating system made NFS more widespread there as well.
In contrast with DAS, a dedicated file server generally contains signifi-
cantly more and larger disks; RAID or LVM may likewise be considered a
requirement in this solution, so as to ensure both performance and failover.
Given the additional overhead of transferring data over the network, it comes
as no surprise that a certain performance penalty (mainly due to network
speed or congestion) is incurred. Careful tuning of the operating system and
in particular the network stack, the TCP window size, and the buffer cache
can help minimize this cost.
The benefits of using a central file server for data storage are immediate
and obvious: data is no longer restricted to a single physical or virtual host
and can be accessed (simultaneously) by multiple clients. By pooling larger
resources in a dedicated NAS device, more storage becomes available.
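For example, mounting an NFS export from a central file server might look like this on a Linux client (host and path names are made up, output is abridged, and the exact commands vary between Unix flavors):

$ showmount -e fileserver              # list the exports offered by the server
Export list for fileserver:
/export/projects (everyone)
# mount -t nfs fileserver:/export/projects /mnt/projects
$ df -h /mnt/projects                  # the remote file system, used like a local one
fileserver:/export/projects   2.0T  1.2T  800G  61% /mnt/projects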
Other than the performance impact we mentioned above, the distinct
disadvantage lies in the fact that the data becomes unavailable if the net-
work connection suffers a disruption. In many environments, the network
connection can be considered sufficiently reliable and persistent to alleviate
this concern. However, such solutions are less suitable for mobile clients,
such as laptops or mobile devices, which frequently may disconnect from and
reconnect to different networks. Recent developments in the area of Cloud
Storage have provided a number of solutions (see Section 4.2.4), but it should
be noted that mitigation can also be found in certain older network file sys-
tems and protocols: the Andrew File System (AFS), for example, uses a local
caching mechanism that lets it cope with the temporary loss of connectivity
without blocking.
While network attached storage is most frequently used for large, shared
partitions or data resources, it is possible to boot and run a server entirely
without any direct attached storage. In this case, the entire file system,
operating system kernel and user data may reside on the network. We touch
on this special setup in future chapters.
Figure 4.3: A SAN providing access to three devices; one host accesses parts
of the available storage as if it was DAS, while a file server manages other
parts as NAS for two clients.
Figure 4.3 illustrates how the storage volumes managed within a SAN
can be accessed by one host as if it was direct attached storage while other
parts are made available via a file server as NAS to different clients. In order
for the different consumers in a SAN to be able to independently address
the distinct storage units, each is identified by a unique Logical Unit Number
(LUN). That is, the system administrator combines the individual disks via
RAID, for example, into separate volumes; assigning to each storage unit an
independent LUN allows for correct identification by the clients and prevents
access of data by unauthorized servers, an important security mechanism.
Fibre Channel switches used in SANs allow further partitioning of the fabric by LUNs and subdivision into SAN Zones, granting specific sets of clients access to “their” storage devices only.
In this storage model the clients – the computers, file servers or other
devices directly attached to the SAN – are managing the volumes on a block
level (much like a physical disk, as discussed in Section 4.3.1). That is, they
need to create a logical structure on top of the block devices (as which the
SAN units appear), and they control all aspects of the I/O operations down
to the protocol. With this low-level access, clients can treat the storage like
any other device. In particular, they can boot off SAN attached devices, they
can partition the volumes, create different file systems for different purposes
on them and export them via other protocols.
Storage area networks are frequently labeled an “enterprise solution” due
to their significant performance advantages and distributed nature. Espe-
cially when used in a switched fabric, additional resources can easily be
made available to all or a subset of clients. These networks utilize the Small
Computer System Interface (SCSI) protocol for communications between the
different devices; in order to build a network on top of this, an additional
protocol layer – the Fibre Channel Protocol (FCP) being the most common
one – is required. (We will review the various protocols and interfaces in
Section 4.3.)
SANs overcome their restriction to a local area network by further encap-
sulation of the protocol: Fibre Channel over Ethernet (FCoE) or iSCSI, for
example, allow connecting switched SAN components across a Wide Area
Network (or WAN). But the concept of network attached storage devices fa-
cilitating access to a larger storage area network becomes less accurate when
end users require access to their data from anywhere on the internet. Cloud
storage solutions have been developed to address these needs. However, as
we take a closer look at these technologies, it is important to remember that
Figure 4.4: A possible cloud storage model: an internal SAN is made available
over the internet to multiple clients. In this example, the storage provider
effectively functions as a NAS server, though it should generally be treated
as a black box.
offer clients access on the block level, allowing them to create file systems
and partitions as they see fit (examples include Amazon’s Elastic Block Store
(EBS) and OpenStack Cinder service).
All of these categories have one thing in common, however. In order
to provide the ability of accessing storage units in a programmatic way –
a fundamental requirement to enable the flexibility needed in demanding
environments – they rely on a clearly specified API. Multiple distributed
resources are combined to present a large storage pool, from which units are
allocated, de-allocated, re-allocated, relocated, and duplicated, all by way of
higher-level programs using well-defined interfaces to the lower-level storage
systems.
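As one hedged example of such an interface, an EBS volume can be allocated and attached to a running instance via Amazon's command-line tools; the identifiers below are placeholders, and the exact flags may differ between tool versions:

$ aws ec2 create-volume --size 100 --availability-zone us-east-1a
$ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
      --instance-id i-0123456789abcdef0 --device /dev/sdf

Once attached, the volume appears to that instance as an ordinary block device that can be partitioned and given a file system like any local disk.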
Customers of cloud storage solution providers reap a wealth of benefits,
including: their infrastructure is simplified through the elimination of storage
components; storage units are almost immediately made available as needed
and can grow or shrink according to immediate or predicted usage patterns;
applications and entire OS images can easily be deployed, imported or ex-
ported as virtual appliances.
Of course these benefits carry a cost. As usual, any time we add layers
of abstraction we also run the risk of increasing, possibly exponentially, the
number of ways in which a system can fail. Cloud storage is no exception:
by relying on abstracted storage containers from a third-party provider, we
remove the ability to troubleshoot a system end-to-end; by outsourcing data
storage, we invite a number of security concerns regarding data safety and
privacy; by accessing files over the internet, we may increase latency and
decrease throughput; the cloud service provider may become a single point
of failure for our systems, one that is entirely outside our control.
ubiquitous hard drive5 , storing data on rotating, magnetic platters (see Fig-
ure 4.5a). Understanding the physical structure of these traditional storage
devices is important for a system administrator, as the principles of espe-
cially the addressing modes and partition schemas used here come into play
when we look into how file systems manage data efficiently.
Figure 4.5: An open PATA (or IDE) hard drive (left) and a Solid State Drive
(right). The HDD shows the rotating disk platters, the read-write head with
its motor, the disk controller and the recognizable connector socket.
Hands-on Hardware
I have found it useful to have actual hardware in class whenever
possible. In the past, I have brought with me different hard
drives by different manufacturers and with different interfaces
Footnote 7: Hot-swapping was a standard feature in many SCSI implementations. Many system administrators in charge of the more powerful servers, using larger and more performant SCSI drives when compared to the individual workstations at the time, were rather fond of this: when a disk failed, it could be replaced without scheduling downtime for all services.
quiring the read-write heads to move. This is why disk partitions comprise
multiple cylinder groups rather than, as we may sometimes imagine them to
be, pie wedges.
Figure 4.6: On the left: Illustration of tracks and sectors on a hard disk.
Note that for simplicity, sectors on the inside and outside of the platters are
of identical size. On disks with Zone Bit Recording (shown on the right),
this is no longer the case.
The tracks on each platter are in turn divided into a number of sectors.
If you were to divide a disk with concentric tracks by drawing straight lines
from the center to the outer edge, you would quickly realize that even though
you end up with the same number of such sectors on each track, the fields
created in this manner would be of varying sizes: the ones on the outer edge
would be significantly larger than the ones in the middle of the disc (see
Figures 4.6a and 4.6b).
As these fields represent the smallest addressable unit on a hard drive, we would be wasting a lot of disk space. Instead, each sector is kept at a fixed size8, and drives use a technique known as Zone Bit Recording to store more sectors on the outer tracks of each disc than on the inner area.
Footnote 8: 512 bytes used to be the industry standard for hard drives for many years; since around 2010, a number of vendors have started creating hard drives with a physical block size of 4096 bytes. Interestingly, file systems having standardized on 512 byte blocks tend to divide these blocks and continue to present to the OS 512 byte “physical” blocks.
The total number of 512 byte sectors across all platters of the hard drive
thus define its total capacity. In order to access each storage unit, the read-
write head needs to be moved radially to the correct cylinder – a process we
call seeking – and the platter then spun until the correct sector is positioned
under it. In the worst case scenario, we just passed the sector we wish to
access and we have to perform a full rotation. Therefore, a drive’s perfor-
mance is largely defined by this rotational latency and the time to position
the read-write head (also known as seek time).
Since the motor of a drive rotates the discs at a constant angular velocity (rather than a constant linear velocity), the surface near the spindle moves more slowly under the read-write head than the outer part of the discs. This means that more sectors can be read in a given time frame from the outside of the discs than from the inside. In fact, it used to be not uncommon for system administrators to partition their disks such that large, frequently accessed files would reside on cylinders near the beginning (i.e. the outside) of the platters. Nowadays such fine tuning may no longer be common; instead, people have started to simply create a single partition occupying only about 25% of the disk at the beginning and to ignore the rest. This technique, known as “short stroking”, may seem wasteful, but given the low prices of today's HDDs, the performance gain may actually make it worthwhile.
It is worth noting that the physical disk structure described here applies
only to traditional mechanical hard drives, not to Solid State Drives or other
storage media. Nevertheless, it is useful to understand the structure, as a
number of file system or partitioning conventions derive directly from these
physical restrictions.
4.4.1 Partitions
Now that we understand the physical layout of the hard disks, we can take a look at how partitions are created and used. As we noted in the previous section, a disk partition is a grouping of adjacent cylinders through all platters of a hard drive9. Despite this unifying principle, we encounter a variety of partition types and an abundance of related terminology: there are partition tables and disklabels, primary and extended partitions; there are whole-disk partitions, disks with multiple partitions, and some of the partitions on a disk may even overlap.
Different file systems and anticipated uses of the data on a disk require
different kinds of partitions. First, in order for a disk to be bootable, we
require it to have a boot sector, a small region that contains the code that
the computer’s firmware (such as the BIOS) can load into memory. In fact,
Footnote 9: Some operating systems such as Solaris, for example, have traditionally referred to partitions as “slices”; some of the BSD systems also refer to disk “slices” within the context of BIOS partitions. Unfortunately, this easily brings to mind the misleading image of a slice of pie, a wedge, which misrepresents how partitions are actually laid out on the disk.
# fdisk -l

Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes
255 heads, 63 sectors/track, 8920 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
this is precisely what a BIOS does: it runs whatever code it finds in the
first sector of the device so long as it matches a very simple boot signature.
That is, regardless of the total capacity of the disk in question, the code that
chooses how or what to boot needs to fit into a single sector, 512 bytes. On
most commodity servers10 , this code is known as the Master Boot Record or
MBR. In a classical MBR the bootstrap code area itself takes up 446 bytes
and each partition entry requires 16 bytes. As a result, we can have at most
four such BIOS partitions that the MBR can transfer control to.
Sometimes you may want to divide the available disk space into more
than four partitions. In order to accomplish this, instead of four primary
partitions, the MBR allows you to specify three primary and one so-called
extended partition, which can be subdivided further as needed. When the
system boots, the Basic Input/Output System (BIOS) will load the MBR
code, which searches its partition table for an “active” partition, from which
it will then load and execute the boot block. This allows the user to run
multiple operating systems from the same physical hard drive, for example.
BIOS partitions are usually created or maintained using the fdisk(8) utility.
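You can inspect this structure directly. A hedged example on a Linux system follows; the device name is an example, reading raw devices requires superuser privileges, and the output is abridged:

# dd if=/dev/sda of=mbr.bin bs=512 count=1   # copy the first sector of the disk
# file mbr.bin
mbr.bin: DOS/MBR boot sector ...
# xxd mbr.bin | tail -1                      # the last two bytes are the 0x55aa boot signature
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.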
Footnote 10: Even though many other hardware architectures used to be dominant in the server market, nowadays the x86 instruction set (also known as “IBM PC-compatible” computers) has replaced most other systems. For simplicity's sake, we will assume this architecture throughout this chapter.
Just as the first sector of the disk contains the MBR, so does the first
sector of a BIOS partition contain a volume boot record, also known as a par-
tition boot sector. In this sector, the system administrator may have placed a
second-stage bootloader, a small program that allows the user some control
over the boot process by providing, for example, a selection of different ker-
nels or boot options to choose from. Only one partition is necessary to boot
the OS, but this partition needs to contain all the libraries and executables
to bootstrap the system. Any additional partitions are made available to the
OS at different points during the boot process.
In the BSD family of operating systems, the volume boot record contains
a disklabel, detailed information about the geometry of the disk and the par-
titions it is divided into. Listing 4.2 shows the output of the disklabel(8)
command on a NetBSD system. You can see the breakdown of the disk’s
geometry by cylinders, sectors and tracks and the partitioning of the disk
space by sector boundaries. This example shows a 40 GB11 disk containing
three partitions, a 10 GB root partition, a 512 MB swap partition and a data
partition comprising the remainder of the disk. Since the disk in question
is actually a virtual disk, the information reported relating to the hardware,
such as the rpm rate, for example, is obviously wrong and should be ignored.
This serves as a reminder that a system administrator always may need to
have additional background knowledge (“this host is a virtual machine with
a virtual disk”) to fully understand the output of her tools in order to make
sense of it. Note also that some partitions overlap. This is not a mistake: the
BSD disklabel includes information about both the entire disk (partition ’d’)
as well as the BIOS partition assigned to NetBSD (partition ’c’, starting at
offset 63). Other partitions, i.e. partitions actually used by the OS, should
not overlap.
Being able to read and understand the detailed output of these commands
is important, and students are encouraged to practice making sense of differ-
ent partition schemas across different operating systems (see exercise 5).
Dividing a single large disk into multiple smaller partitions is done for a
number of good reasons: first, as we discussed, if you wish to install mul-
tiple operating systems, you need to have dedicated disk space as well as
a bootable primary partition for each OS. You may also use partitions to
ensure that data written to one location (log files, for example, commonly
stored under e.g. /var/log) cannot cause you to run out of disk space in
another (such as user data under /home). Other reasons to create different
partitions frequently involve the choice of file system or mount options, which
necessarily can be applied only on a per-partition basis. We will discuss a
number of examples in Section 4.5.
Footnote 11: (78140160 sectors * 512 bytes/sector) / (1024^3 bytes/GB) = 37.26 GB. Note the difference between the actual and the reported disk size: (40 * 2^30 - 40 * 10^9) / 1024^3 = 40 - 37.26 = 2.74 GB.
8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:  20972385        63     4.2BSD   4096 32768  1180   # (Cyl.     0*-  20805)
 b:   1048320  20972448       swap                      # (Cyl. 20806 -  21845)
 c:  78140097        63     unused      0     0         # (Cyl.     0*-  77519)
 d:  78140160         0     unused      0     0         # (Cyl.     0 -  77519)
 e:  56119392  22020768     4.2BSD   4096 32768 58528   # (Cyl. 21846 -  77519)
#
Figure 4.7: Logical Volume Management lets you combine multiple physical
disks or partitions into a single volume group, from which logical volumes can
be allocated.
The LVM divides the physical volumes into data blocks, so-called physical
extents, and allows the system administrator to group one or more of these
physical volumes into a logical volume group. In effect, available storage
space is combined into a pool, where resources can dynamically be added or
removed. Out of such a volume group, individual logical volumes can then
be created, which in turn are divided into the equivalent of a hard disk’s
sectors, so-called logical extents. This step of dividing a logical volume group
into logical volumes is conceptually equivalent to the division of a single hard
drive into multiple partitions; in a way, you can think of a logical volume
as a virtual disk. To the operating system, the resulting device looks and behaves just like any other disk device: it can be partitioned, and new file systems can be created on it just like on a regular hard drive.
By creating logical extents as storage units on the logical volumes, the LVM is able to grow or shrink them with ease (the data corresponding to the logical extents can easily be copied to different physical extents and remapped), as well as implement data mirroring (where a single logical extent maps to multiple physical extents).
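A hedged sketch of these steps using the Linux LVM2 tools (device names and sizes are arbitrary):

# pvcreate /dev/sdb1 /dev/sdc1        # mark the partitions as physical volumes
# vgcreate vg0 /dev/sdb1 /dev/sdc1    # pool them into a volume group
# lvcreate -L 200G -n data vg0        # carve out a logical volume
# mkfs -t ext4 /dev/vg0/data          # treat it like any other disk device
# lvextend -L +100G /dev/vg0/data     # grow the logical volume later, as needed...
# resize2fs /dev/vg0/data             # ...and grow the file system to match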
Since logical volume management is a software solution providing an ad-
ditional layer of abstraction between the storage device and the file system,
it can provide additional features both on the lower level, such as data mir-
4.4.3 RAID
Logical Volume Managers provide a good way to consolidate multiple disks
into a single large storage resource from which individual volumes can be
created. An LVM may also provide a performance boost by striping data, or
redundancy by mirroring data across multiple drives.
Another popular storage technology used for these purposes is RAID,
which stands for Redundant Array of Independent Disks13 . Multiple disks
can be combined in a number of ways to accomplish one or more of these
goals: (1) increased total disk space, (2) increased performance, (3) increased
data redundancy.
Much like an LVM, RAID hides the complexity of managing these devices from the OS and simply presents a virtual disk comprised of multiple physical devices. However, unlike with logical volume management, a RAID configuration cannot be expanded or shrunk without data loss. Furthermore, an LVM is a software solution; RAID can be implemented on either the software or the hardware layer. In a hardware RAID solution, a dedicated disk array controller (such as a PCI card) is installed in the server and interacted with via controlling firmware or host-level client tools. As a software implementation, an LVM may provide RAID capabilities, as may certain file systems. Sun Microsystems' ZFS, for example, includes subsystems that provide logical volume management and offer RAID capabilities,
Footnote 13: The acronym “RAID” is sometimes expanded as “Redundant Array of Inexpensive Disks”; since disks have become less and less expensive over time, it has become more customary to stress the “independent” part. A cynic might suggest that this change in terminology was driven by manufacturers, who have an interest in not explicitly promising a low price.
RAID 0
By writing data blocks in parallel across all available disks (see Figure 4.8a),
RAID 0 accomplishes a significant performance increase. At the same time,
available disk space is linearly increased (i.e. two 500 GB drives yield 1 TB
of disk space, minus overhead). However, RAID 0 does not provide any fault
tolerance: any disk failure in the array causes data loss. What’s more, as
you increase the number of drives, you also increase the probability of disk
failure.
RAID 1
This configuration provides increased fault tolerance and data redundancy
by writing all blocks to all disks in the array, as shown in Figure 4.8b. When
a disk drive fails, the array goes into degraded mode, with all I/O operations
continuing on the healthy disk. The failed drive can then be replaced, and
the RAID controller rebuilds the original array, copying all data from the
healthy drive to the new drive, after which full mirroring will again happen
for all writes. This fault tolerance comes at the price of available disk space:
for an array with two drives with a 500 GB capacity, the total available space
remains 500 GB.
RAID 5
This level provides a bit of both RAID 0 and RAID 1: data is written
across all available disks, and for each such stripe the data parity is recorded.
Unlike in levels 2 through 4, this parity is not stored on a single, dedicated
parity drive, but instead distributed across all disks. See Figure 4.8c for an
illustration of the block distribution.
Since parity information is written in addition to the raw data, a RAID 5
cannot increase disk capacity as linearly as a RAID 0. However, any one of
the drives in this array can fail without impacting data availability. Again,
as in the case of a RAID 1 configuration, the array will go into degraded
mode and get rebuilt when the failed disk has been replaced. However, the
performance of the array is decreased while a failed drive remains in the
array, as missing data has to be calculated from the parity; the performance
is similarly reduced while the array is being rebuilt. Depending on the size of the disks in question, this task can take hours; all the while, the array remains in degraded mode and another failed drive would lead to data loss.
(a) Block-level striping (b) Mirroring (c) Block-level striping with distributed parity
Figure 4.8: Three of the most common RAID levels illustrated. RAID 0
increases performance, as blocks are written in parallel across all available
disks. RAID 1 provides redundancy, as blocks are written identically to all
available disks. RAID 5 aims to provide increased disk space as well as
redundancy, as data is striped and parity information distributed across all
disks.
Some RAID arrays also include one or more hot spares: drives that are installed in the server and known to the RAID controller, but that are inactive until a disk failure is detected. At that time, the array is immediately rebuilt using this stand-by drive; when the faulty disk has been replaced, the array is already in non-degraded mode and the new disk becomes the hot spare.
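On Linux, for instance, a software RAID 5 array with a single hot spare might be assembled along these lines (the device names are placeholders):

# mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# cat /proc/mdstat            # watch the array build and, after a failure, rebuild
# mdadm --detail /dev/md0     # reports state, spares, and degraded/clean status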
Figure 4.9: An Apple Xserve RAID, a now discontinued storage device with
14 Ultra-ATA slots offering Fibre Channel connectivity and implementing a
number of RAID levels in hardware as well as software across two independent
controllers.
The “left” RAID was dedicated to storing large amounts of video and
audio data made available to clients running Mac OS X. We connected
the RAID controller via Fibre Channel to a SAN switch, and from there
to an Apple Xserve network server, which managed the HFS+ file system
on this storage component.
The second 2.2 TB of storage space, the “right” side of the array, was
meant to become the central data space for all workstations in the Com-
puter Science and Mathematics departments as well as their laboratories.
Up until then, this file space had been provided via NFS from a two-mod-
ule SGI Origin 200 server running IRIX, managing a few internal SCSI
disks as well as some Fibre Channel direct attached storage. We intended
to migrate the data onto the XServe RAID, and to have it served via a
Solaris 10 server, allowing us to take advantage of several advanced fea-
tures in the fairly new ZFS and to retire the aging IRIX box.
Neatly racked, I connected the second RAID controller and the new So-
laris server to the SAN switch, and then proceeded to create a new ZFS
file system. I connected the Fibre Channel storage from the IRIX server
and started to copy the data onto the new ZFS file system. As I was
sitting in the server room, I was able to see the XServe RAID; I noticed
the lights on the left side of the array indicate significant disk activity,
but I initially dismissed this as not out of the ordinary. But a few seconds
later, when the right side still did not show any I/O, it dawned on me:
the Solaris host was writing data over the live file system instead of onto
the new disks!
Now it was interesting to note that at the same time as I was overwriting
the live file system, data was still being written to and read from the
HFS+ file system on the Apple server. I was only able to observe inter-
mittent I/O errors. Thinking I could still save the data, I made my next
big mistake: I shut down the Apple server, hoping a clean boot and file
system check could correct what I still thought was a minor problem.
Unfortunately, however, when the server came back up, it was unable to
find a file system on the attached RAID array! It simply could not iden-
tify the device. In retrospect, this is no surprise: the Solaris server had
constructed a new (and different) file system on the device and destroyed
all the HFS+ specific file system meta data stored at the beginning of
the disks. That is, even though the blocks containing the data were likely
not over written, there was no way to identify them. After many hours
of trying to recreate the HFS+ meta data, I had to face the fact this
was simply impossible. What was worse, I had neglected to verify that
backups for the server were done before putting it into production use –
fatal mistake number three! The data was irrevocably lost; the only plus
side was that I had learned a lot about data recovery, SAN zoning, ZFS,
HFS+ and file systems in general.
not usually think of as such15. This leads us to special purpose file systems that really only provide a file I/O API to their respective resources: two examples are the procfs file system, representing information about the system's processes, and the devfs file system, a virtual file system acting as an interface to device drivers.
file systems, including for example ext2, for a long time the default file system
for Linux systems.
These traditional file systems suffer a notable drawback in the case of
unexpected power failure or a system crash: some of the data may not have
been committed to the disk yet, causing a file system inconsistency. When
the OS boots up again, it will need to perform time consuming checks (see
fsck(8)) and any problems found may not actually be recoverable, leading to
possible data loss. In order to address these problems, a so-called journaled
file system might choose to first write the changes it is about to make to
a specific location (the journal) before applying them. In the event of a
crash, the system can then simply replay the journal, yielding a consistent
file system in a fraction of the time it would take a traditional file system to
traverse the entire hierarchy to ensure consistency.
There are many different journaled file systems, including a number of
commercial implementations (such as Apple’s HFS+, IBM’s JFS, or SGI’s
XFS) and open source variants (including ext3, ext4 and reiserfs for Linux
and updates to UFS/FFS providing support for journaling or logging).
Despite the differences in how data is ultimately written to the disk, the fundamental concepts of how data and metadata are managed, the use of the inode structures, and the Virtual File System layer that allows the kernel to support multiple different file systems remain largely the same across the different Unix operating systems and file systems.
Figure 4.9: The Unix file system is a tree-like structure, rooted at /; different
file systems can be attached at different directories or mount points. In this
illustration, /home and /usr reside on separate disks from /.
$ pwd
/home/jschauma                  # pwd(1) writes the absolute pathname
$ echo 'hello' > file           # create 'file' in the current directory
$ cd /usr/share/doc             # cd(1) using an absolute pathname
$ pwd
/usr/share/doc                  # no surprise here
$ cat /home/jschauma/file       # now using an absolute pathname
hello
$ cd ../../../home/jschauma     # cd(1) using a relative pathname
$ pwd
/home/jschauma
$ cat ./file                    # './file' also works
hello
$
Listing 4.3: Absolute pathnames begin with a / and are resolved from the
root of the file system; relative pathnames are resolved from the current
working directory.
the physical disk, thus minimizing seek time. The Unix File System therefore
consisted of the following major components, as illustrated in Figure 4.10:
Let us dive a little bit deeper and think about how the Unix File System
manages disk space: A file system’s primary task is to store data on behalf
of the users. In order to read or write this data, it needs to know in which
logical blocks it is located. That is, the file system needs a map of the blocks,
a way to identify and address each location. This is accomplished by way
of the inode and data block maps: the total number of inodes represents the
number of files that can be referenced on this file system, while the data
blocks represent the space in which the file data is stored.
As we noted, a data block is, necessarily, of a fixed size. That means
that if we wish to store a file that is larger than a single block, we have to
allocate multiple blocks and make a note of which blocks belong to the given
file. This information is stored as pointers to the disk blocks within the inode
data structure.
Unfortunately, however, not all files will be multiples of the logical block
size. Likewise, it is possible that files will be smaller than a single block.
In other words, we will always end up with blocks that are only partially
allocated, a waste of disk space. In order to allow more efficient management
of small files, UFS allowed a logical block to be divided further into so-called
fragments, providing for a way to let the file system address smaller units of
storage. The smallest possible fragment size then is the physical block size
of the disk, and logical blocks are only fragmented when needed.
A pointer to the data blocks and fragments allocated to a given file is
stored in the inode data structure, which comprises all additional information
about the file. This allows for an elegant separation of a file’s metadata,
which takes up only a fixed and small amount of disk space, and its contents.
Accessing the metadata is therefore independent of the file size (itself a piece
of metadata), allowing for efficient and fast retrieval of the file’s properties
without requiring access to the disk blocks. Other pieces of information
stored in the inode data structure include the file’s permissions, the numeric
user-id of the owner, the numeric group-id of the owner, the file’s last access,
modification and file status change times19 , the number of blocks allocated
for the file, the block size of the file system and the device the file resides on,
and of course the inode number identifying the data structure.
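As a quick illustration of this separation, stat(1) can be asked for individual pieces of this metadata; the format strings below are those of the GNU coreutils version of stat(1) (BSD stat(1) uses different flags), and the file chosen is arbitrary:

$ stat -c 'inode:  %i'      /etc/passwd    # the inode number
$ stat -c 'mode:   %A'      /etc/passwd    # file type and permissions
$ stat -c 'links:  %h'      /etc/passwd    # number of hard links
$ stat -c 'owner:  %U / %G' /etc/passwd    # owning user and group
$ stat -c 'size:   %s bytes in %b blocks' /etc/passwd
$ stat -c 'ctime:  %z'      /etc/passwd    # last status change, not creation time

Note that none of these invocations require reading the file's data blocks.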
Now humans tend to be rather bad at remembering large numbers and
prefer the use of strings to represent a file, but the one piece of information
that is not stored in the inode is the file name. Instead, the Unix File System
allows for a mapping between a file name and its unique identifier – its inode
19
A file’s ctime, its time of last file status change, is frequently misinterpreted to be the
file’s creation time; most file systems, including UFS, do not store this kind of information.
Instead, the ctime reflects the last time the meta information of a file was changed.
$ ls -ai /
      2 .          5740416 cdrom     5816448 libexec   2775168 stand
      2 ..         1558656 dev       1862784 mnt       2166912 tmp
3003280 .cshrc      988439 emul            4 netbsd    3573504 usr
3003284 .profile    342144 etc       1824768 proc      3497472 var
3421440 altroot    1026432 home       798336 rescue
5702400 bin        3763584 lib       3003264 root
      3 boot.cfg   2204928 libdata   5588352 sbin
$
Listing 4.4: Use of the ls(1) command on a NetBSD system to illustrate how
file names are mapped to inode numbers in a directory. (Note that in the
root directory both ’.’ and ’..’ have the same inode number as in this special
case they actually are the same directory.)
Listing 4.5: Sample output of the stat(1) command on a Linux system show-
ing the various pieces of information stored in the inode data structure for
the file “/etc/passwd”.
In addition to regular files (i.e. hard links), directories and symbolic links,
the following file types are supported by the traditional Unix file system:
• Block special devices – an interface for disk-like devices, providing buffered
and non-sequential I/O.
• Character special devices – an interface for communication devices such
as keyboards, mice, modems, or terminals, providing unbuffered I/O. A
number of virtual, so-called pseudo-devices such as /dev/null, /dev/zero,
or /dev/urandom, for example, are also accessed as character special
devices.
• Named pipes or FIFOs – another inter-process communications end-
point in the file system. This type of file represents a manifestation
of the traditional Unix pipe in the file system name space; I/O is per-
formed using the same methods as for any regular file. As data is
merely passed between processes, a FIFO always has a file size of zero
bytes.
• Unix domain sockets – an inter-process communications endpoint in the
file system allowing multiple processes with no shared ancestor process
to exchange information. Communication happens via the same API
as is used for network sockets.
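To see a few of these file types in practice, it suffices to look at the first character of the ls -l output, which encodes the type: 'b' for block special, 'c' for character special, 'p' for a FIFO, 's' for a socket, 'd' for a directory, 'l' for a symbolic link, and '-' for a regular file. For example:

$ ls -l /dev/null /dev/zero     # 'c': character special pseudo-devices
$ mkfifo mypipe                 # create a named pipe
$ ls -l mypipe                  # 'p': a FIFO, always of size zero
$ ls -ld /tmp                   # 'd': a directory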
The file type as well as its permissions and all the other properties of
a file can be inspected using the stat(1) command (see Listing 4.5 for an
example), or, perhaps more commonly, using the ls(1) command (as illus-
trated in Figure 4.11). This command is so frequently used and its output
so ubiquitous that any system administrator can recite the meaning of the
fields in their sleep. The semantics and order in which permissions are ap-
plied, however, include a few non-obvious caveats, which is why we will look
at the Unix permissions model in more detail in Chapter 6.
Figure 4.11: The default output of the ls -l command includes most of the
metadata of a given file.
Over the last 30 years, UFS has served as the canonical file system for
almost all Unix versions. With time, a number of changes have become
necessary: larger storage devices required not only updates to the block
addressing schemas, but also to the different data types representing various
file system aspects; today’s huge amounts of available data storage have made
log-based file systems or journaling capabilities a necessity, and massively
distributed data stores pose entirely different requirements on the underlying
implementation of how the space is managed.
Yet through all this, as enhancements have been made both by commer-
cial vendors as well as by various open source projects, the principles, the very
fundamentals of the Unix File System have remained the same: the general
concept of the inode data structure and the separation of file metadata from
the data blocks have proven reliable and elegant in their simplicity. At the
same time, this simplicity has proven to yield scalability and adaptability:
the persistent idea that “everything is a file”, and that files simply store bytes
4.8 Conclusions
Throughout this chapter, we have built our understanding of file systems and
storage models from the ground up. We have seen how simple concepts are
combined to construct increasingly complex systems – a pattern that weaves
like a red thread through all areas we cover. We noted the circular nature of
technology development: the simple DAS model repeats itself, albeit more
complex and with additional layers, in common SANs, much as network
attached storage utilizes both DAS and SAN solutions, yet is taken to another
extreme in cloud storage solutions.
Being aware of the physical disk structure helps us understand a file sys-
tem’s structure: we realize, for example, that concentric cylinder groups make
up partitions, and that the location of data blocks within these cylinders may
have an impact on I/O performance. What’s more, the concepts of file sys-
tem blocks become clearer the further we deepen our knowledge of both the
logical and physical components, and the distinction of metadata from the
actual contents of a file allows us to explain how the various file system related
Unix tools operate on a fairly low level as well as how to tune these values
at file system creation time. As system administrators, this understanding
is crucial.
But we have been making a point of noting the three pillars of strong
system architecture and design: Scalability, Security, Simplicity. What role
do these play in the context of file systems and storage models?
Problems
1. Identify the storage area model(s) predominantly used in your envi-
ronment(s). What kind of problems with each do you frequently en-
counter? Would changing the storage model be a feasible solution to
these problems? Why or why not?
4. Ask your system administrators if they have any old or broken hard
drives, if possible from different manufacturers or with different capac-
ities. Open up the drives and identify the various components. How
many read-write heads are there? How many platters? How do the
different models differ?
6. Compare the various composite RAID levels and analyze their respec-
tive fault tolerance and mean time to recovery in the case of one or
(a) Create a new file, then create a second hard link for this file. Verify
that both files are completely identical by using ls(1), stat(1)
and by appending data to one of the files and reading from the
other.
(b) Rename the original file and repeat – what changed? Why?
(c) Create a new file, then create a symbolic link for this file. Verify
that both the original file and the symbolic link are unique by
inspecting their inode numbers. Then append data to the file using
the regular name and confirm that reading from the symbolic link
yields the same data.
(d) Rename the original file and repeat – what changed? Why?
15. Create a very large file. Measure how long it takes to rename the file
within one directory using the mv(1) command. Next, use mv(1) to
move the file into a directory on a different file system or partition.
What do you observe? Explain the difference.
Chapter 5

Software Installation and Package Management
5.1 Introduction
Having covered the details of file systems and storage devices in the previous
chapter, we can now focus on installing an operating system and adding
software according to the intended purpose of the server. Most people never
have to actually perform an OS installation, but System Administrators are
not most people. We routinely set up new systems, create new services, and
need to be able to fine-tune the software from the very lowest layers, defining
exactly what software is installed, what drivers the kernel needs to support,
what modules it should load, and what services should be started at boot
time.
In addition, even within a single host some subsystems that control com-
ponents on a lower layer than even the OS kernel, such as a RAID controller or
a Fibre Channel HBA, are driven by their own firmware, accessed by the OS
via its kernel drivers and exposed to the administrative users via additional
software tools. All of these types of software have specific requirements, yet
all of them also have in common the general mechanisms to install, update,
maintain and remove the software.
In this chapter we will discuss in some detail the general principles under-
lying all software installations, beginning with a distinction of the different
types of software we commonly encounter, most notably the two main cate-
gories of OS components and add-on or third-party software. We will look
at how an operating system is installed on a server and what installation
mechanisms are commonly used. Since different operating systems install
different components into different locations, we will also take a close look at
the file system layout and explain some of the reasons behind the common
hierarchy found on most Unix systems.
Later in the chapter, we will cover concepts of software package manage-
ment that allow a system administrator to not only add or remove software
with ease, but to help maintain software runtime dependencies and provide
a consistent way to upgrade software when needed. Unfortunately, software
package management is, as of late 2012, still not a solved problem: manag-
ing software updates and security patches brings with it a surprising number
of complex problems, in part due to competing solutions in this problem
domain.
Every system administrator has a preferred operating system, and this
preference is often the result of years of experience and familiarity not only
with the kernel and libraries that make up the OS, but frequently also due
to the package management system in use on the host. Just how, exactly,
a package manager maintains software dependencies, what kinds of assump-
tions it makes about the system it is running on, its purpose, and its many
uses all reflect deeply on the OS development philosophy. The way software
updates are handled speaks volumes about both the system and its ad-
ministrators – the better we understand software installation concepts, the
better we will be at managing our systems. Understanding all this just a
little bit better is, in a nutshell, what this chapter is all about.
• access of the Master Boot Record (MBR) – a special boot sector found
on the primary boot device; the code found in this location allows the
system to access the file system(s) and transfer control to the second-
stage boot loader
differ across time in features added or removed, but a given version of the
OS will behave the same way anywhere you may encounter it.
Contrast this with the world of Linux, where the definition of a coherent
operating system has been replaced with variety in software bundling: each
“distribution” combines the Linux kernel (possibly with its own
modifications) with the core libraries from the GNU project and a surpris-
ing variety of different add-on software. Some of these “distributions” replace
core system utilities, some remove components but add others, and so on, until they
behave sufficiently differently from one another that they may well be regarded
as unique Unix flavors.
upgrade process, the placement of software within the file system hierarchy,
how system configuration files and software runtime changes are handled
and hopefully gain a better understanding of how to tune and maintain our
systems.
NAME
     hier -- layout of filesystems

DESCRIPTION
     An outline of the filesystem hierarchy.

SEE ALSO
it is not surprising to see a similar subtree reflecting the root file system
under /usr. That is, we find subdirectories named bin, lib, libexec, sbin
and share, with each having a similar purpose as under the root. On many
systems, we also find the /usr/local directory as a specific location for files
that are not part of the base OS. This directory has become the default
location for any “add-on” software, and it is thus not very surprising that it
also mirrors the general root hierarchy to some extent.
Note that /usr contains a subdirectory named share for so-called “ar-
chitecture independent data-files”. As the name suggests, the files in this
location are intended to be shared across multiple systems. An-
other directory with a particularly descriptive name is /var, for “variable” or
transient data, known to change at runtime.
These two directories already hint at two major properties of files found
on any Unix system, each with distinct implications on where or how they
might best be organized[2]:
Shareable versus unshareable content: Data files that remain the same
across multiple hosts, and, as the manual page suggests, possibly even across
different hardware architectures are considered shareable. Flat text files or
simple system databases that are not used for runtime configuration, such as
manual pages, timezone or terminal information, are good examples.2 This
data frequently remains unchanged throughout day-to-day operations – that
is, it only changes if it is intentionally and specifically updated. Note that
these files may at times overlap with some “local” data (i.e. files not included
in the base OS).
Data files that need to be individual to each host are considered un-
shareable. Sometimes this is due to the fact that the data simply changes
at runtime (such as a system’s log files), other times it is because these files
define a unique feature of a given host (most files under the /etc directory
fall into this category). Note that once again these files may at times overlap
with certain “local” data.
Static versus variable data: Files that are not expected to change during
the runtime of the system are considered static. This includes the kernel, all
2
It should be noted that across hosts of a uniform hardware architecture most binaries
are also common candidates for shareable directories, and it was, in fact, not uncommon
for multiple hosts to share /usr, for example. Nowadays, such a setup is a rare exception.
system libraries and binaries and many data files. Note that these files may,
of course, be changed when the system is upgraded, for example, but are not
modified under normal circumstances.
Files that are updated by the OS, by any running applications or by
end-users at runtime, are termed variable. That is, we anticipate that these
files will be created, modified, updated, or removed. Some may see fre-
quent or even continuous updates (such as the various log files), while others
only see comparatively rare modifications (such as when a system dæmon is
restarted).
Combining these four classes of data yields a matrix of files with common
locations in the file system hierarchy as illustrated in Table 5.1. (Note that
not all static data is necessarily shareable, just as not all variable data is
necessarily unshareable.)
Even though nowadays it is less and less common for multiple Unix systems
to share specific directories or mount points (perhaps with the exception of
users’ home directories, which are still frequently accessed over a network
file system), the distinction of variable versus static data is of particular
importance. Carefully separating these kinds of data will allow you to build
systems with separate mount options for these types, which can improve
performance and security significantly. As examples, consider a read-only
partition for system binaries to prevent accidental or malicious writes to the
base OS, or mounting a directory containing system logs using the noatime
or noexec mount options.3
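To make this a little more concrete, a (Linux-style, entirely made up) /etc/fstab that separates static from variable data along these lines might look like this:

# device     mount point  type  options                  dump pass
/dev/sda1    /            ext4  defaults                 1 1
/dev/sda2    /usr         ext4  ro                       1 2
/dev/sda3    /var         ext4  rw,nodev,noexec,nosuid   1 2
/dev/sda5    /home        ext4  rw,nodev,nosuid,noatime  1 2

A read-only /usr protects the static system binaries, while the options on /var and /home limit what the variable, unshareable data stored there can do.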
Understanding these criteria also helps make sense of why software providers
choose a certain location for their files. Beware, however, that not all software
3
Refer to the mount(8) manual page for details; many other options are available.
5.4 OS Installation
System administrators often maintain large numbers of hosts, and so it comes
as no surprise that new machines – both physical and virtual – have to be
brought up and integrated into the infrastructure on a regular basis. The
more systems are maintained and created, the more this process needs to be
automated, but at the end of the day, each one follows the same set of com-
mon steps. In an ideal world, these steps could be summarized as requiring
the installation of a base operating system with subsequent adjustments for
the specific task at hand.
As we will see in this section, each step depends on a large number of vari-
ables and site-specific customizations, and so most large-scale organizations
have written their own automation framework around a few common tools
to accomplish the goal of quick and scalable system and service deployment.
In fact, the topic of automated deployment tools is too large to adequately
cover here (though we will revisit it in later chapters) and we shall instead
focus on the essential concepts needed to understand the requirements for
such systems.
Installing a new system is a process unique to each OS; installation meth-
ods range from unattended deployment systems using information deter-
mined at runtime to create the correct configuration to interactive graph-
ical installers (such as seen in Figure 5.2) allowing users to select amongst
common options to create a general purpose Unix server. Understanding
the individual steps included is valuable, and we strongly recommend that
students perform this task as a learning exercise (see Problems 6 and 7).
Sooner or later, any system administrator will find him or herself trying to
debug a broken installation, and understanding all the steps performed and
what might have gone wrong in the process is invaluable.
will be installed? What add-on software? What is the final purpose of the
machine?
Many of these questions are interdependent and answering one may re-
strict possible answers to other questions. For example, if the purpose of
the server is to run a specific piece of software that is only available for a
given OS, then this might also influence a very specific partitioning schema
or other file system considerations. In fact, the final purpose of the machine
and the software it needs to run will likely dictate your hardware choices –
processor architecture, amount of memory and disk space, number or types
of network interfaces – which in turn restrict your OS choices.
For example, if you plan on building a highly performant database server
able to handle a large number of transactions per second, you can quickly
identify the number of CPUs and CPU speed needed, the size of RAM to
provide a large in-memory buffer cache, as well as hard drive disk speed and
network throughput requirements, but not all operating systems are able to
handle with equal efficiency the same number of CPUs, amount of RAM, or perhaps
link aggregation, for example. As a result, your database server may have to
run Solaris, even though the rest of your hosts are all running Linux (or vice
versa).
On the other hand, it is possible that the cost of running a different OS
just for one purpose is not worth the benefit of running the theoretically ideal
software. Instead, you might prefer to change your infrastructure architec-
ture and perhaps scale the database horizontally across a number of smaller
commodity hardware systems to meet the requirements. Whichever approach
you choose, these design decisions need to obviously be made prior to the OS
installation itself – changing your OS later on often results in much higher
costs, as services have to be migrated, ported, and verified while the server is
already in production use, a process that has been likened to changing one’s
tires on a car travelling on a busy highway at 100 miles per hour.
Obviously, each of these steps consists of many smaller steps, and how
exactly each of these main objectives is accomplished depends heavily on the
size of the organization, the resources available, and the level of automation
required. In addition, there is a thin line between system deployment and
system configuration. As mentioned, the OS installation always includes at
the very least a few minimal configuration steps, but some of the steps noted
above may be considered part of a configuration management system’s first
run. That is, much like the distinction between add-on software and core
system components, there exists a discrepancy between what aspects of a
server’s configuration are to be defined at installation time and which ones
are dynamically determined at runtime. We will cover this in more detail in
Chapter 7; for the time being, let us merely state that one way or another
the system, once completely installed and set up, is configured for its final
purpose.
Magic Deployment
Around early 2007, during my time at Yahoo! Inc., I worked to-
gether with a number of people on a deployment system suitable
to quickly build new Unix systems and bring them up, ready to
serve production traffic. Like so many other shops, we, too, chose
to develop our own solution instead of using an existing com-
mercial or open-source system for this purpose. Our setup was “special”,
unique, different from whatever the existing solutions provided (just like
everybody else’s). As a friend of mine once remarked, you’re not truly
a system administrator until you have written your own deployment sys-
tem. Or configuration management system. Or inventory database. Or
parallel secure shell to perform tasks across all of your hosts. Or...
Anyway, the system we wrote was elegant, and it taught us a lot about
the details of the OS installation process. Our goal was to allow hard-
ware to be delivered to the datacenter, physically racked, and its
asset tags scanned, and then to require no further physical contact for it
to be deployed, while at the same time allowing existing systems to be
rebuilt or decommissioned as needed.
Our final solution used IPMI to power on the host, DHCP to assign a
system profile to the hardware (derived from a central database, where
the system was identified by the asset tag), and a pxeboot process to run
the system through a number of stages. In the initial stage, the system
would power up, identify and apply any pending firmware upgrades, and
then check if it was ready to be deployed. If not, it would go into a special
“inventory” state and power down.
In this phase, the server was configured for its final purpose: the correct
OS and network configuration was installed, software packages were de-
ployed according to the systems’ profiles, user accounts added and finally
the system registered in the central database as being ready for produc-
tion traffic.
We programmed this system from afar, but the first time we went to a
datacenter to see it in action, we were quite impressed how well it worked:
without any physical interaction, we could see racks of servers suddenly
power up, install, reboot and start taking traffic based on a simple hard-
ware provisioning request.
5. Make the system bootable. At some point after the disks have been
partitioned and before the host is rebooted for the first time, it needs to
be made bootable.4 The details depend again on the OS in question,
but generally involve the installation of the disk bootstrap software
in the MBR and the configuration of the first and second stage boot
loader, for example via installboot(8) or grub(1); a brief sketch follows after this list of steps.
6. Install the OS. Finally we have reached the point where the actual OS
is installed. This generally requires the retrieval of the system’s base
packages or archive files from a CD or DVD, via NFS from another
4
The order of this step is not important, but leaving it out will likely yield a server
unable to start up without manual intervention.
server or perhaps via FTP from a remote host (an approach which
brings with it a number of security implications – how do you verify
source authenticity and integrity? – and in which case initial network
configuration of the miniroot system is a pre-requisite).
After the data files have been retrieved, they are extracted or the nec-
essary packages installed into the target root device. Many interactive
installers allow the user to select different sets of software to install
and/or will automatically identify additional packages as required.
7. Install add-on software. After the base OS has been installed, any
optional software may be added, based on either the system’s configura-
tion or interactive user input. Depending on the installer, this step may
be combined with the previous step. We explicitly note this as a sep-
arate step, since “add-on” software here may not only include optional
packages from our OS vendor, but also your own software, such as your
configuration management system, third-party applications licensed to
your organization, or your software product or serving stack.
9. Reboot. Finally, after the system has been installed and basic configu-
ration performed, it needs to be rebooted. At this time, the host boots
the way it would under normal circumstances and enters a “first boot”.
This stage in a host’s deployment process tends to be significantly dif-
ferent from a normal boot: a number of services are run for the first
time, and further initial system configuration and system registration
(as noted above) is likely to take place.
Since this initial boot sequence may install software upgrades and
change the runtime configuration, it is advisable to reboot the host
another time after this “first boot”. This ensures that the system does
in fact come up in precisely the same state as it would in the future.
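To make step 5 above a bit more concrete, here is roughly what installing the boot loader might look like on two different systems; the device and file names are placeholders, and the exact invocations differ between OS releases:

# on many Linux distributions, installing GRUB into the MBR of the first disk:
$ grub-install /dev/sda
# on NetBSD, writing the primary bootstrap for an FFSv1 root file system:
$ installboot /dev/rwd0a /usr/mdec/bootxx_ffsv1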
Listing 5.2 shows the most basic steps to perform a minimal OS instal-
lation. It is interesting to analyze each of them (and contrast the sequence
for different operating systems), as doing so often illustrates a number of
assumptions made about the use of the software. The ease with which an OS
installation can be automated (at scale!) shows how much the OS provider
has prepared their system for a wide variety of use cases. System defaults,
such as what services are enabled and started by default, speak volumes about
general design philosophies. Similarly, the selection of software bundled with
the OS reflects on the target audience for a given OS.
Deploying a new host requires more than just installing an operating sys-
tem, however. As we noted earlier, the system needs to be incorporated into
our larger infrastructure, which may entail updates to a number of invento-
ries or databases that keep track of an organization’s assets or the various
hosts’ purposes. This integration is necessarily very different from site to
site and unfortunately not (yet) suitably abstracted into an API that an
installer might integrate with. As a result, most system administrators do
end up writing their own deployment system and/or helper tools, calling
into the various operating systems’ specific installers and adding their own
customizations.
One of the many customizations included in any software deployment
system includes the extent to which add-on software is controlled. Getting
the right software onto the machine is part of the installation process, but
there is more to managing applications and configuration files than copying
files from one place to another. In the next section, we will take a closer look
at this topic.
Now all of these are good reasons to compile the software from source, and
system administrators routinely do build applications “by hand”, tuning the
configuration and specifying preferred installation prefixes, but this step is
only the prerequisite to finally packaging the software such that your systems
can be managed using a small set of tools.
It is imperative not to regard the process of downloading and compiling
software without the use of a package manager as a reasonable software
management or deployment solution. We cannot maintain a coherent and
reliable system in this manner: when the time comes to update or remove
the software, there will be no telling what other system components might
be relying on its presence in this particular version.
• Easy package installation. The most obvious and frequent use of the
package manager is to install software. These tools usually allow you to
only specify a name, and they will fetch all required files (including pre-
requisites) from the internet, your intranet, or a local storage medium
and install all the files with the right permissions in the right places.
Complete!

$ rpm -ql screen
/etc/pam.d/screen
/etc/screenrc
/usr/bin/screen
[...]
$ rpm -qf /usr/share/man/man1/screen.1.gz
screen-4.0.3-16.el6.x86_64
$
Requires:
    gtk2+>=2.24.12nb3
    glib2>=2.32.4nb1
    pcre>=8.30nb1
    lua>=5.1.4nb1
    libsmi>=0.4.5nb1
    libgcrypt>=1.4.6nb2
    gnutls>=3.0.17
Listing 5.4: Example invocations of the pkgsrc package manager and related
tools to visualize package dependencies. Figure 5.3 was created using these
commands.
has been announced. Even if a fix has not yet been released, this
information is invaluable, as often you are able to take other measures
to mitigate the risk.
integrate binary packages into our service inventory and deployment system
(“hosts functioning as web servers will get the apache-2.4.3 package in-
stalled”) and software upgrades or patches can consistently be applied using
the same set of tools; we significantly simplify virtually all operations that
require a lookup of what files are present on which hosts, what version of
what software is running at a given time and how to perform upgrades with
both simple or complex dependencies; by tracking all software versions and
the files the packages contain, we can easily check for known vulnerabilities
and quickly patch the software.
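What these lookups look like naturally depends on the package manager in use; as a rough sketch, consider pkgsrc's pkg_admin(8) and rpm(8):

$ pkg_admin fetch-pkg-vulnerabilities   # download the current list of known vulnerabilities
$ pkg_admin audit                       # report any installed packages affected by them
$ rpm -qa --last | head -3              # on an RPM system: the most recently installed packages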
None of this is possible without the use of a package management system,
but these benefits come with a price: the packaging system must be used
consistently for all software. The act of adding a piece of software outside this
mechanism opens up the possibility for a broken dependency graph and often
triggers future “forced” installations of packages (i.e., overriding warnings
about unmet dependencies, since the packaging system has no way of knowing
what we installed outside of its catalog) and finally the loss of a coherent
inventory.
Hence it is crucial to be consistent and perform the at times tedious and
frustrating work of (re-)packaging software in the format that best integrates
with your deployment and management solutions. In addition – and this
is something that I have seen neglected all too often in the past – we need
to make sure to also build packages for all of the software that we do not
retrieve from other sources. That is, when we build software, we have to
package it up and integrate it into our software inventory. As we will see in
Chapter 9, system administrators write a lot of software: we write small and
simple tools just as we write large and complex infrastructure components;
all of these need to be installed, updated, maintained. Our own software is
no different from software provided by others, and so we owe it to ourselves
to package it properly as well.
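What such packaging involves differs by format; as a purely illustrative sketch, the beginning of an RPM spec file for a hypothetical in-house tool (all names and paths invented) might look like this:

Name:      sa-tools
Version:   1.0
Release:   1
Summary:   Local system administration helper scripts
License:   Proprietary
BuildArch: noarch

%description
Small helper scripts written and maintained by the local SA team.

%files
/usr/local/bin/sa-cleanup

A complete spec file would of course also describe how to build and stage the files; the point is merely that even a ten-line script deserves a versioned, queryable package.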
Figure 5.3: Dependency graph for the “Wireshark” package, a popular net-
work protocol analyzer. Package management systems help track such intri-
cate dependencies.
effectively let it fetch and execute unknown code on our systems. That is,
not only do we trust the package manager locally, as well as the software
author(s), we also (and implicitly) trust the people who run and maintain
the package repositories. These repositories are managed by system adminis-
trators like ourselves – mere mortals, as highly as we may think of ourselves
– and every once in a while mistakes are made or accounts compromised,
putting at risk users of the software hosted there worldwide.7
Then what is the solution to this problem? Well, unfortunately there
simply isn’t a solution. We can build more and more complex systems that
provide better authenticity and better assurance of integrity, but at the end
of the day, we still have to place trust into systems built and maintained
by others. Ken Thompson’s Turing award acceptance speech “Reflections on
Trusting Trust”[4] cannot remain unmentioned in this context. If you are
not familiar with this short paper, look it up on the internet – its Zen-like
moral will change how you regard system security.
Despite our having to trust so many components outside our control, and
despite the fact that we cannot eliminate all risks involved, it is important
that we are aware of them. A good package manager will provide you with
reasonable options to automatically determine updates from trusted sources
and optionally even apply them. This requires a significant reliance on the
trustworthiness of the package manager, and only when we understand the
risks can we make informed decisions. With this understanding, perhaps we
will be even a little bit more careful and observant about how and where our
package management system retrieves its data from or what other resources
we allow it to use.
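By way of example, two such checks that can be performed before installing anything (file names invented) are:

$ rpm -K sa-tools-1.0-1.noarch.rpm                  # verify the package's digests and GPG signature
$ gpg --verify foo-1.2.tar.gz.sig foo-1.2.tar.gz    # verify a detached signature on a source tarball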
new elements into your environment and the probability of unexpected side-
effects increases the larger the change set is.
“If it ain’t broke, don’t fix it.” has become many a system administra-
tor’s mantra, meaning that unless you have a very good reason to change
something on a system that is currently running and not experiencing any
problems, you shouldn’t do so. This simple lesson needs to be learned over
and over again by every system administrator, and it appears it cannot be
learned without a modicum of self-inflicted pain.8
The benefit of a package manager’s dependency resolution has the often
undesired result that a number of pre-requisites or dependencies further down
the graph need to also be updated when a simple software change to a single
package was desired. Therefore, all software updates need to be treated with
great care.
The conservative stance hence remains to leave your system running as is
instead of chasing every new software release, every minor OS update or even
to not bother applying patches if a software vulnerability appears to not di-
rectly impact you. Unfortunately, this makes it all the more likely that when
an upgrade finally is required, it will pull with it a significant number of
other changes. There is no one-size-fits-all solution to this problem; every
system administrator has to make these decisions for themselves according to
their specific environment. Until experience allows you to confidently make
this call, it is probably wise to consider any known security issue to at least
warrant very careful analysis of the possible impact.
applications using this library – in order for the change to take effect,
these applications would need to be re-linked.
As you can tell, the above considerations suggest that determining when,
or even if, you can upgrade software is far from easy and requires an intimate
understanding of all the details of the running system. Even though you can
10
This brings with it the problem of system-wide atomicity of software upgrades: it
is generally not advised to run a given service using different versions of their software
across multiple systems. Upgrading all hosts at the same time, however, carries the risk
of breaking functionality (not to mention overall service down time). Deploying such
upgrades in a controlled manner across large numbers of systems is beyond the scope of
this chapter; in fact, it easily warrants an entire book by itself.
5.7 Conclusions
In this chapter we have taken a close look at how software is installed on a
system. As we reviewed the boot process, we distinguished between differ-
ent types of software, and identified a few distinct categories: firmware and
device drivers, the kernel managing all hardware and interfacing with the
drivers, a number of core system components such as essential system tools
and libraries, basic applications and utilities required to make the system
usable, and finally entirely optional add-on applications. The distinction be-
tween these categories has proven to be difficult to make, which is why we
have further divided types of software by their place in the file system hier-
archy into the four categories of being “shareable”, “unshareable”, “static” or
“variable”.
We have looked at the steps required to install an operating system and
discussed the need for a package management solution to keep track of all files
on the system, as well as the implications of any required software upgrades.
A package management system, a requirement for addressing a number of
problems when maintaining large systems with many pieces of software in-
stalled, does not come without its price, however. As we have stressed, it is
important to make consistent use of the chosen package manager, but this
has become increasingly difficult: on the one hand, configuration manage-
ment tools are able to manipulate files “owned” by the package management
system; on the other, not all software you need to install is available in the
given format.
Building your own packages from sources is laborious. What’s more,
it can be frustrating, as you are duplicating work already done by others
to some extent. This holds especially true with respect to many program-
ming languages and the many smaller components or modules available for
them. Almost every modern programming language nowadays includes its
own preferred way of installing language modules (including the search for
and retrieval of the “right” version from an online repository). There is an
old joke in which people lamenting the existence of too many competing
standards get together to create the one true solution only to end up with
just one more competing standard with limited adoption. We can see this
holding painfully true when we look at how different language developers
have all solved the same set of problems (slightly) differently:
Perl has used its CPAN infrastructure for decades; Python uses its pip in-
staller; software written in the Ruby language is often provided as so-called
“gems”; Node.js uses its own package manager, called npm; PHP applications
or extensions are distributed using the PEAR framework and so on. Each of
these tools allows a package to easily express its requirements on other lan-
guage components, but none of these integrates really flawlessly with other,
non language-specific package managers. People using the RedHat Package
Manager (RPM), for example, end up having to re-package these modules
or risk breaking their package consistency model.
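The invocations are as varied as the tools themselves; a few examples (module names invented, flags as of this writing):

$ cpan Some::Module           # Perl
$ pip install somepackage     # Python
$ gem install somepackage     # Ruby
$ npm install somepackage     # Node.js
$ pear install SomePackage    # PHP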
Others choose to let yet another tool control the deployment of their
software: More than once did we have to note that defining what software
packages or versions need to be installed on a system is not a one-time act per-
formed at OS installation time. Instead, controlling what software is added
or removed, updated, activated or in other ways manipulated is part of an
ongoing process and easily becomes part of the larger topic of Configuration
Management. In fact, many configuration management solutions – systems
that apply software configuration changes at runtime – include functionality
to either require software packages to be present or are even able to install
them as needed. Let us take a closer look at this aspect in one of the next
chapters.
Problems and Exercises
Problems
1. Observe the boot process of various systems in your environment. What
kind of firmware, boot loaders and system software is involved? Do you
have different boot options leading to a running system with a different
kernel or support for different features?
3. Review the hier(7) manual page on your preferred Unix version. Iden-
tify which components are specifically designated for single- and which
for multi-user operations of the system. Does the manual page hint
at a separation between shareable and unshareable, static and variable
data? Compare to two other Unix systems.
(a) List all packages installed on your system with version numbers.
Identify packages that have other packages depending on them as well
as “leaf” packages (those without any other package depending on
them). Which package has the most dependencies, which one is
the most depended-upon?
(b) What are the commands to install a piece of software? To delete
a package? To upgrade a package? To downgrade a package?
(c) What are the commands to list the files installed by a given pack-
age? Given a file name, how do you identify what package it
belongs to?
10. Identify a piece of software you commonly use, but that is not packaged
for your primary platform. Create a package for it, then contribute your
changes back to the original author(s) of the software.
Bibliography
[1] Abraham Silberschatz, Greg Gagne, Peter B. Galvin, Operating System
Concepts, John Wiley & Sons, 2011

[3] Evi Nemeth, Garth Snyder, Trent R. Hein, Ben Whaley, UNIX and Linux
System Administration Handbook, 4th Edition, Prentice Hall, 2010
Chapter 6

Of Users and Groups

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
– Output of sudo(8) upon first invocation.
6.1 Introduction
As was noted in Section 2.2.3, Unix was designed as a multi-tasking, multi-
user system, allowing simultaneous access by different people. This concept,
also found in all of today’s mainstream operating systems, demands
important safeguards to prevent both accidental as well as intentional (i.e.
malicious) access by unauthorized parties. Since this property affects virtu-
ally all aspects of system administration – ranging from account management
to the way different software components interact with one another – it is
important to fully understand the implications of the nature of a multi-user
system.
In this chapter, we will discuss the different types of users and groups as
well as talk about user authentication on our systems. Note that even though
there are many parallels to authentication by remote clients against a service
we may offer (such as users logging into our website), in this chapter we
restrict ourselves to the “local” systems and how we access them internally.
Even if you are comfortable with the Unix multi-user model, you may still
find that explicitly identifying its inherent requirements and trust models
will help in formalizing your access control requirements so as to express and
apply them using your configuration management system, as discussed in
Chapter 7.
more people to a given account (Figure 6.1), as well as a single person with
access to multiple accounts. From the system’s perspective the differences
are irrelevant (all process privileges are granted based on the UID), but as
system administrators, we have to treat each class of users differently. As
a general rule of thumb, we want to ensure, for example, that only those
accounts used by people have an interactive login session. All other accounts
should be restricted such that they can only execute specific commands or
access specific files3. This quickly becomes difficult, especially when using role
accounts, as often a certain set of interactive commands needs to be allowed.
Care must be taken to correctly identify and enforce the restriction to just
these commands.
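One common way to grant a role account or group exactly such a limited set of commands is sudo(8); a sketch of what a sudoers(5) entry might look like (group and command list invented for the example):

# members of group "www" may control the web server, but nothing else
%www    ALL = (root) /usr/sbin/apachectl start, /usr/sbin/apachectl stop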
If you look closely at the example password file shown in Listing 6.1, you
may notice that in addition to the root account there is a second account
with UID 0, called toor. This account, often found on the BSD derived Unix
variants, does in fact offer a second superuser account: it is usually given a
different login shell to provide for a way to repair the system in case root
is unable to log in (for example due to a broken login shell as a result of
a software upgrade). This illustrates that it is possible for a Unix system
to have multiple usernames mapping to the same UID, though it should be
noted that generally speaking that is not a good idea and likely to be an
error.
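The corresponding entries in the password file might look like the following (the GECOS fields and shells shown here are typical BSD defaults and will vary):

root:*:0:0:Charlie Root:/root:/bin/csh
toor:*:0:0:Bourne-again Superuser:/root:/bin/sh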
in some capacity, meaning they will wish to share read and write access to
certain resources, or require similar execute privileges, for example to start,
stop, or otherwise control a specific service.
In a small commercial environment, such as a start-up, the good will
and intent to collaborate for all users of your systems is implied. In such
environments all users do frequently require the same or at least similar
access to all systems, as responsibilities are shared and job duties overlap.
As a result, you will find few distinct groups here, and getting access to one
system tends to simultaneously get you access to all systems.
As the environment grows in size, however, you will find that you need
to group accounts more carefully. For example, all developers on a given
software project require access to systems on which to build the software,
write access to the central source revision control repository and the like,
and so are grouped together; they do not require – nor should they have
– access to the systems processing credit card payments or which contain
payroll information.
In even larger companies, you may find that groups sharing access to the
same set of servers may have competing resource requirements, even if we
assume that every user has been given an account for the ultimate benefit
Figure 6.1: The set of users mapping to the set of account names is neither
bijective (or “one-to-one”), nor surjective (or “onto”): some accounts are not
used by an actual user, while others may be used by more than one user.
Users are associated with local groups on a given host, and group membership
may imply specific privileges across a set of hosts.
In contrast to these use cases, now consider the different groups of users
present in an academic environment, such as a large university. Access to
computing resources is made available to students, faculty, staff, and possibly
visiting researchers. Even disregarding any outside compromise, you have at
times almost directly conflicting privacy requirements: students should not
be able to access faculty resources or files (and yet may have an explicit
interest in doing so!), but students working as teaching or faculty assistants
sometimes should; student homework assignments done on shared servers
should not prevent research jobs from running; faculty collaborate with each
other on some projects, with outside researchers on others.
Finally, if you are maintaining public systems for an Internet Service
Provider (ISP), you may find yourself trying to manage hundreds or thou-
sands of completely unrelated user accounts with little to no collaboration
amongst them.
Group definitions exist in every organization, and are often drawn outside
the system administrator’s purview; it is not uncommon for computer sys-
tems to inherit a group structure by mapping the organization’s functional
hierarchy into different user groups. However, sooner or later you will run
into limitations and require more fine-grained control. A careful analysis of
the different types of users, the different groups based on requirements and
needs, and a directory service that allows you to easily create (possibly in-
tersecting) sets of users are needed to allow your systems to grow with and
adapt to the needs of whichever type of environment you are in. We will
get back to this concept of grouping users and mapping them to similarly
grouped sets of hosts in our next chapter.
And then there’s the start-up founder, who not only routinely tinkers
with the OS kernel and system configurations, but oversees the hardware
allocation requests in what has become an internet giant with tens of
thousands of hosts in data centers across the globe:
Every system administrator I know has a similar story to tell, only usu-
ally they end with “Well, he doesn’t need access, but he’s the boss, so...”.
user system, let us spend a little bit of time to make sure we understand what
exactly happens. As we will discuss in more detail in Chapter 11, when we
are authenticating a user, we are ensuring that user is who they claim to
be. More precisely, we are verifying that whoever is identifying themselves
as user “alice” can in fact provide the credentials which we assume only Alice
to have access to.4
More often than not, the credentials used to authenticate at login time
are simply a password of often pitiful complexity. Much like you may not
have let your sister (or her friends) climb into your tree house unless they
knew the secret passphrase, the Unix system will not let you log in without
you entering the correct string of characters.
It is important to understand at this point that the Unix system does
not actually compare the string you entered to the string it has stored in a
database, but that instead it operates on password hashes. That is, the string
you entered (together with a few bits of additional data known as the salt)
is transformed via a one-way function that produces a fixed-length string of
(different) characters, which are then stored and compared against at the
time of authentication. Since this transformation is a one-way function, it
is impossible for anyone to reproduce the original password from the hash –
an important cryptographic property of the function used. If this password
hash matches the one on record, access is granted, and you are logged in.
From that moment on, regular Unix semantics and permissions will apply to
decide whether or not access to a given resource (e.g. the ability to write to
a file) is granted.
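The effect of this one-way transformation is easy to observe, for example using the (by now dated) MD5-based crypt(3) scheme via openssl(1); salt and password here are invented, and the hashes are truncated:

$ openssl passwd -1 -salt 6VnOFkeC secret
$1$6VnOFkeC$...
$ openssl passwd -1 -salt 6VnOFkeC secret    # same password and salt: same hash
$1$6VnOFkeC$...
$ openssl passwd -1 -salt XyZzY123 secret    # same password, different salt: different hash
$1$XyZzY123$...

At login time, the system repeats this computation using the salt stored alongside the hash and compares the result with what is on record.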
eyes, which is why our Unix systems no longer store them in the world-
readable /etc/passwd database, but instead in a separate, protected file
(such as /etc/master.passwd on BSD derived systems or /etc/shadow on
many System V derived Unix versions).
But, and this is one of the problems when using passwords for local au-
thentication, this data has to exist on all the hosts a user wishes to log in
on. That, in turn, means that if a single host in the environment is com-
promised, the attacker gets their hands on all users’ password hashes. Once
retrieved, they can then perform so-called offline dictionary attacks on the
password hashes or look them up in widely available rainbow tables, large
pre-computed mappings of common strings to their hashes.
The solutions, or rather, the efforts to at least partially address these
issues include the use of a password salt, the previously mentioned small
amount of data added to the user’s password prior to the application of the
hash function, thereby yielding a different hash on unrelated systems despite
the password being the same and thus defeating simple rainbow table look
ups.
Another approach used frequently is to not make the password hash lo-
cally available on each host and instead rely on an authentication system
where the actual password hashes and additional user information is stored
in a central place and is accessed over the network. The Lightweight Direc-
tory Access Protocol (LDAP) is an example of this approach. However, it
should be noted that the benefits of a central location of this information
carry a certain price as well: if the host(s) providing this service becomes
unavailable, thousands of hosts can become unusable or see debilitating er-
rors. This problem can of course be defined in a more general statement:
as an environment increases in size, relying on a central service poses an
increasing risk of becoming a Single Point of Failure (often abbreviated as
“SPOF”), while distributing data across thousands of servers poses its own
data replication and synchronization challenges.
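On many Unix systems, the decision to consult such a directory service is expressed in nsswitch.conf(5); a minimal sketch might be:

$ grep -E '^(passwd|group)' /etc/nsswitch.conf
passwd:  files ldap
group:   files ldap

That is, lookups consult the local files first and then the LDAP directory.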
What’s more, passwords are an inherently insecure means of authentica-
tion, largely due to human nature: people, in general, are bad at remembering
complex passwords (which are harder for computers to crack) and hence tend
to use and reuse across different sites a small set of simple passwords. This
means that accounts on your systems may get compromised by password
leaks in another, completely different and independent environment!
Many solutions or improvements to this dilemma exist, ranging from multi-factor authentication protocols to randomly generated passwords stored in a password manager.
6.5 Summary
Even though taken for granted nowadays, the nature of a multi-user system
has had from the beginning a number of important implications for overall
system security and operational procedures. The impact of these implications
grows nearly exponentially as your environment scales up, which is why we
make a point of identifying them explicitly in this chapter.
Users fall into a number of well-defined categories, or types of users. In
particular, we distinguish for good reasons between user accounts used by ac-
tual humans and so-called system accounts. Different users (of either kind)
have at times conflicting requirements and impose very specific trust models
on the environment. Access privileges need to be defined and enforced, and
authentication methods need to be considered. We briefly mentioned pass-
words as the most common form of authentication and noted a few of the
problems associated with them; we also covered some of the implications of
sharing root access with your peers.
More generally speaking, though, we looked at what it means for a system
to support multiple users. System administrators with experience managing
deployments in diverse environments are able to recognize and apply these
general rules:
6
Most Unix editors, for example, allow a user to invoke a shell; dynamically linked
executables can be tricked into loading custom libraries; the possibilities to exploit access
to a small number of tools running with superuser privileges are too numerous to account
for.
• Some users are more equal than others. While all users’ needs should be
addressed, there are, in any system, some users who have more specific
requirements; who need certain elevated privileges; whose computing
demands exceed those of others.
• All users are to be given precisely the access rights they need, but no
more. The principle of least privilege needs to be rigorously applied, as
any one account may become compromised, and the possible damage
deriving from this scenario needs to be limited as much as possible.
Due to this shift in scale, a distinct trend has formed over the last few years away from managing user access on an individual host basis and towards a Service Orchestration model. In this world, interactive
logins to any single host are unnecessary and imply a systemic failure, as
unavailable hosts should automatically be taken out of a production-serving
rotation and overall load be distributed to the remaining, working hosts.
Services are run as so-called system accounts unassociated with actual
users. Nevertheless, regular multi-user semantics apply. The principles of how access to any given resource is granted, how services (or users) are authenticated, and how a clear separation of privileges provides the foundation for overall system security are no different when applied to user accounts that end up being mapped to humans versus those that are not.
Problems
1. Review the passwd(5) manual page and make sure you understand
what each field is used for. Is this password database used on the
systems you have access to, or is authentication done by way of a central
system, for example via LDAP? If so, what additional information can
you find in this system?
2. Review the accounts present on your systems. How many of these are
system accounts, and how many are user accounts? What different
types of users can you identify? Are there any role accounts?
4. Identify whether or not sudo(1) is used on the systems you have access
to. Can you find out which users have which privileges? Which, if any,
commands can you think of that might be dangerous to allow untrusted
users to invoke? Try to think of non-obvious ways to circumvent the
given restrictions.
6. Search the internet for a list of the most popular passwords in use (such
as, not surprisingly, “password”).
(a) Generate hashes for each password using the following digest al-
gorithms: DES (as used by the Unix crypt(3) family), MD5 and
SHA1. Can you find the resulting strings in any rainbow tables
on the internet?
(b) Repeat the previous exercise, but add a salt to the password.
What do you notice about the results?
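As a possible starting point for this exercise (a sketch only; the salt value is arbitrary, and DES crypt(3) hashes require the crypt interface itself rather than a plain digest), openssl(1) can generate MD5 and SHA1 digests like so:

$ printf '%s' 'password' | openssl dgst -md5
$ printf '%s' 'password' | openssl dgst -sha1
$ printf '%s%s' 's0m3s4lt' 'password' | openssl dgst -sha1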
Bibliography
[1] Æleen Frisch, Essential System Administration, O’Reilly Media, 2002
[3] Evi Nemeth, Garth Snyder, Trent R. Hein, Ben Whaley, UNIX and Linux
System Administration Handbook, 4th Edition, Prentice Hall, 2010
Chapter 7
Configuration Management
7.1 Introduction
As much as we would like them to be, computer systems are not static: files
are created, modified, or removed; users log in and run commands; services
are started or terminated. In addition, the requirements of the systems,
dictated at least in part by evolving business needs or emerging technologies,
are changing all too frequently as well. This leads to new software being
added, patched or upgraded; user accounts are added or removed; jobs are
scheduled or their frequency changed; interactions with other systems are
enabled or prevented. In other words, our systems do continuously undergo
change.
On a single host, such changes are made by local modification of system
configuration files, invocation of specific commands, and the installation or
tuning of different applications. As we configure our machines, we may create
detailed documentation about how to set up a given service, and the more
systems we have, the more often we have to repeat the same steps to configure
them, to upgrade software, or to rebuild them when they inevitably fail.
Updating documentation to reflect the changes we may have made after the
latest software update is tedious and error prone – it would be much easier
to initially identify the changes, document them, and then have them be
applied so that our hosts' configuration reflects the documentation, not the other way around.
Configuration management (CM) systems exist for precisely this purpose: they allow us to automate configuration changes across both small and large sets of hosts. But
don’t fret – the learning opportunity provided by writing your own CM sys-
tem is not lost: even the most advanced solutions often require a fair amount
of customization to be integrated into your environment as you scale up.
Despite being easily dismissed as not flexible enough for practical application, many of the recommendations and best practices outlined in ITIL do underlie or correlate with the operation of modern CMs. For example,
large scale configuration management has evolved to effectively imple-
ment or require a repository of all information about the various systems
in an organization, the Configuration Management Database (or CMDB).
Should you choose to read up on this topic, you will hopefully be able to
relate those documents to a number of concepts discussed here.
Rebuilding such hand-crafted, unique systems – “snowflakes”, a term often used to convey both their uniqueness and fragility – is painful and time consuming. On the other hand, if we have a well-defined procedure to build
an authentication server, we can easily replace it at any time with minimal
effort.
Besides such service-specific requirements, we also identify requirements that will be applicable to a very large set of hosts, quite possibly to every single host in our environment.
Let us consider the use case of account management in an environment us-
ing a central LDAP directory. We need to ensure that every host is configured
to authenticate local access against the central service. Our high-level config-
uration steps include the installation of the LDAP client package, changes to
the authentication module’s configuration file to accept (and require) LDAP
authentication, and a configuration file pointing to the central LDAP server.
Listing 7.2 illustrates the translation of these steps into a Chef “recipe”.
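For contrast, the same high-level steps, expressed as a one-shot imperative shell script rather than a declarative recipe, might look roughly like the following; the package name, file paths, and server URI are assumptions that differ between OS flavors:

#!/bin/sh
# A rough imperative sketch of the steps above (not the Chef recipe from
# Listing 7.2); package name, paths, and the LDAP URI are placeholders.
pkg_add openldap-client

cat > /etc/openldap/ldap.conf <<EOF
BASE    dc=example,dc=com
URI     ldap://ldap.example.com
EOF

# enable LDAP lookups via the name service switch
grep -q ldap /etc/nsswitch.conf || \
        sed -i.bak 's/^passwd:.*/passwd: files ldap/' /etc/nsswitch.conf

Unlike a CM recipe, such a script describes how to get there rather than what the end state should be, and it makes no guarantees about being safely re-runnable.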
As before, all of these steps apply across different operating systems and OS flavors. What is different about this example is that it may well apply to other hosts, including those from the previous syslog example. It is useful to identify
each of the specific cases and define them independently instead of trying
to create monolithic definitions for every single host. In particular, we often
define a “base” configuration for all of our systems, including default packages,
security settings, authentication methods and the like; individual service
definitions depend on this base configuration and bring in new requirements
applied to only those hosts providing the given service.
7.3.3 CM Requirements
Different CM systems allow you to specify your service requirements in differ-
ent ways. Due to their particular evolution, choice of programming language,
and internal architecture, these differ significantly in detail, but exhibit con-
ceptual similarities. Effectively, each CM system uses its own DSL, and you
will have to get used to the proper syntax to express your requirements.
Looking at the previous two examples, we can identify a number of re-
quired concepts in configuration management systems. Generally speaking,
the following (OS agnostic) capabilities are required:
Software Installation
The CM needs to be able to install software on the hosts it manages. More
specifically, it needs to be able to assure that software of a given version
is installed (or perhaps not installed). It does not need to duplicate the
capabilities of the package management system – rather, it relies on the
package manager as a tool to accomplish the desired result.
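A minimal sketch of this idea in plain shell, assuming a BSD-style pkg_info(1)/pkg_add(1) toolchain and a configured package repository (substitute your native package manager), might look like this:

#!/bin/sh
# Assert that a package is installed; rely on the package manager to do
# the actual work. Package name and tools are assumptions for illustration.
PKG="logrotate"
if ! pkg_info "${PKG}" >/dev/null 2>&1; then
        pkg_add "${PKG}"
fi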
Service Management
The CM needs to be able to define which software services are supposed
to be running on a given host. A machine functioning as a web server,
for example, had better be running an HTTP dæmon. In order for some
configuration changes to take effect, this dæmon may need to be restarted,
and in order to make certain other changes, one may need to (temporarily)
shut down a service.
As starting, stopping, restarting and generally supervising running pro-
cesses is both software- and OS dependent, the CM may fall back on the
package manager or system provided service management mechanisms, such
as /etc/rc.d or /etc/init.d scripts, as well as more modern frameworks
such as Solaris’s “Service Management Facility” or Mac OS X’s launchd(8).
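Expressed as a plain-shell sketch on an rc.d-style system (the script name and its support for the “status” argument are assumptions that vary between OS flavors), asserting that a service is running might look like this:

#!/bin/sh
# Assert that the syslog-ng service is running, falling back on the
# system-provided rc.d script to start it if it is not.
SVC="/etc/rc.d/syslogng"
if ! "${SVC}" status >/dev/null 2>&1; then
        "${SVC}" start
fi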
Some configuration files need to be generated within the host's own context in order to take into account that particular system's full state and apply the correct settings. However, when managing more than just a few
hundred hosts it actually becomes reasonable (and less error prone!) to sim-
ply have configuration management dictate the state on the host completely.
While this requires the CM to have full knowledge of a lot of the target sys-
tems’ state, this reduces the probability of any errors or unpredicted events
interfering with or changing the state of the host, yielding undesired results.
Nevertheless, it is a common requirement to be able to generate a host-
specific configuration file on the target system, which is why all CM systems
do provide this functionality. See Listing 7.3 for an example of a change
description that dynamically expands a template on the host in question,
conditionally restarting the given service if (and only if!) the resulting file
was modified.
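The underlying idea can be sketched in a few lines of plain shell (a rough approximation only; the nameserver address and the restart command are placeholders):

#!/bin/sh
# Generate a host-specific file and only install it -- and poke the
# dependent service -- if it differs from what is already in place.
TMP="$(mktemp)"
printf 'nameserver %s\n' "${DNS_SERVER:-192.168.0.1}" > "${TMP}"
if ! cmp -s "${TMP}" /etc/resolv.conf; then
        mv "${TMP}" /etc/resolv.conf
        # e.g.: /etc/rc.d/unbound restart
else
        rm -f "${TMP}"
fi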
Command Execution
The CM needs to be able to run a given command. This is about as generic
a requirement as we can define, but at the same time it is both one of the
most important as well as one of the most dangerous. In order to perform
system configuration, the software needs to run with superuser privileges,
so any command it may run could have disastrous consequences (especially
when run on every single host in your organization).
The majority of the commands a CM needs to run are well-defined and ab-
stracted within its DSL. That is, we can express the desired outcome without
detailed knowledge of the implementation, the commands that are actually
executed on the target system. For example, we ensure packages are installed
not by running the actual package manager commands, but simply by using
the DSL's expression, such as require => Package['logrotate'].
Still, there will always be cases where we need to run an arbitrary com-
mand. By allowing the system administrator to centrally define commands
to execute on groups of hosts, the CM allows for powerful control of large
groups of hosts across your entire infrastructure.
– David Wheeler
7.4.1 States
In the previous section we talked about making changes to a running system,
but upon further consideration, we are not so much interested in making
changes, as we are in the results of having made such changes. Some of
the changes we have defined may not be necessary on a given host. What
we really care about is the current state of the system. That is, we want
to answer the question of whether the host is configured correctly, and then
apply only those changes necessary to get it to that state. The CM’s primary
goal is therefore asserting state; making changes to a system just happens to
be the method by which this is accomplished.
We wish to control the effects of (software) entropy on our systems such that each one is consistently brought back into a well-defined and desired state.
Throughout its lifetime, a host may go through a number of distinct states
as illustrated in Figure 7.1:
Unconfigured
A host in this state does not have a CM system installed or the installed
CM has never run. The most common example of hosts in this state is new
hardware that does not yet have an OS installed. Large environments often
have pools of new hardware delivered, racked, and set up in their data center,
awaiting allocation. Such systems are frequently in this unconfigured state.
Configured
The CM has run successfully and applied all required changes for the given
system. All required packages are installed, configured properly, and all
services on the host are running. The system is ready for production.
In Service
The host has been put into production. That is, it accepts traffic, provides
the service for which it was configured, and is relied upon; it is in active
use. Technically speaking, this is a subcategory of the “configured” state;
a host may be put into service by systems other than the CM, but only a
“configured” host may enter this state.
Out of Service
The system has explicitly been taken out of service; it is no longer serving
production traffic. Unlike the following states, this is a well-defined and
known state. The CM is running and may even have placed the system into
this state as a reaction to a system or network failure, possibly outside of this
host. That is, this state, too, is a subcategory of the “configured” state.
The host may be taken out of service by a system other than the CM.
Deviant
The host is no longer in the desired configuration state. Entropy has taken its toll: changes made to the system, either manually or as a side-effect of the traffic it takes, from individual users, or possibly even through a mistake in the CM state model itself, have caused a faulty configuration. In this case, the CM needs to detect the deviation and bring the host back into its desired state.
Figure 7.1: Different states a host may be in. The CM tries to counter the
effects of Entropy, while Service Orchestration controls whether a host takes
traffic. Monitoring of the systems may allow us to discover an “unknown”
state or trigger a removal from service, while rebuilding a host allows us to
“start from scratch”.
Unknown
The CM may have stopped running on the host or may erroneously be ap-
plying the wrong configuration. The host may have been shut down, the
network disconnected, an intruder may have taken over control, or rats may
have gnawed through the power cable. We simply don’t know. What’s worse,
we may not even know that this host is in an unknown state! Not all failures
are immediately obvious.
7.4.2 Sets
Running any CM requires system administrators to create and maintain a complex model of how services are defined and what changes are required to enable or disable a given component, and to keep track of all of their systems in different environments with different requirements. The key to solving
this puzzle is to simplify: we take a step back and attempt to identify the
essential logical components of the system.
In the previous section we have classified the different states that our
systems may be in, and we have said that the role of a CM is the assertion
of a given state. But before we can express the changes required to bring a
host into a given state, we need to have defined the role of each system.
Different CMs use different terms in their application of several concepts
that we borrow from Set Theory: each solution allows for the grouping of cer-
tain resources that can be combined using unions, intersections and the like.
For example, sets of changes or service definitions may be called “manifests”,
“promises”, or “recipes”, while sets of hosts may be referred to as “roles”, “node
groups”, or they may be defined through the use of “attributes”.
I’ve found it useful to remember the visual description of defining these
sets as “drawing circles around things”: this is quite literally what developers
and system administrators often end up doing on their whiteboards. Looking
at these whiteboards or state diagrams immediately conjures up the image of a Venn diagram, an easy way for us to visualize relationships between different groups.
As we define services as a set of changes, or hosts as being grouped to-
gether by certain attributes, we can build an inheritance model using further
and further abstraction of common properties into subsets. By performing
set operations such as unions, intersections and set differences, we gain sig-
nificant flexibility in determining common and service specific definitions.
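Even without a full-blown CM, this way of thinking can be illustrated with nothing more than sorted text files of hostnames and the standard set-manipulation tools; note that comm(1) requires its inputs to be sorted, and the file names here are, of course, made up:

$ sort -u webservers.txt ldap-clients.txt       # union
$ comm -12 webservers.txt ldap-clients.txt      # intersection
$ comm -23 webservers.txt ldap-clients.txt      # only in webservers.txt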
Let us take a look at the different kinds of resources we might draw circles
around:
Sets of changes
Moving beyond maintenance of a few individual hosts, we realize that changes
made on our systems fall, broadly speaking, into two categories: those that
need to be applied to all systems (as might be the case illustrated by our
LDAP Client Configuration example from Section 7.3.2), and those that need
to be applied to only a subset of systems, possibly a subset of one. This
realization is fundamental: we begin to view system configuration no longer
as a task performed on an individual host, but as defining the changes that
may need to be applied.
In addition, there are changes that are made to all systems in exactly the same way (e.g. the /etc/sshd_config file is updated on all hosts to disable PasswordAuthentication) as well as changes that are made taking specific
properties of the target environment into account (e.g. the use of a different
default route or DNS server depending on the network the system is in). The
astute reader will recognize parallels to the four classes of data we identified
earlier: “static”, “variable”, “shareable” and “unshareable” (compare Table 5.1).
It is important to be consistent in the application of this model: even a
single service running on a single host should be well-defined and explicitly
described as such a set. This allows us to easily recreate the service in case of
emergency, when more resources are added, or when an additional instance
of the service is created, for example in a new data center.
Figure 7.2 shows how different change sets can be combined via an inher-
itance model to help define more complex services.
Figure 7.2: An illustration of how abstract change sets help define different
services. Common modules are included in more complex definitions before
being applied to groups of hosts. Some “base” modules (ssh and ldap in this
example) are included on all hosts.
Sets of hosts
Any host may, at any point in time, perform multiple roles, offer multiple
services, or meet multiple requirements. Grouping hosts according to their
properties, attributes, or functionality makes it possible to apply the previ-
ously identified sets of changes. But groups of hosts are not only defined by
the services they provide; you can also categorize hosts by different criteria
even within a single service definition. It is common (and good practice) to
have for each service a small number of hosts – some of which do not take
any production traffic – on which to test any changes that we plan on rolling
out. Many organizations use the terms “dev” (development), “qa” (quality
assurance), “testing”, “staging” or “canary”, and “prod” (production) for the
different stages of software development. Likewise, it may be useful to group
hosts by geographical location to allow for a carefully staged deployment on
a global scale.
Software deployment – an inherently risky business, as we are willfully
introducing entropy into a running and stable system – can thus be carefully
orchestrated through the use of a CM and clearly defined roles.
In fact, CMs themselves usually allow you to branch your changes in the
same way that a software development project may track different develop-
ment efforts before merging changes back into the main line. In order to take
advantage of this approach, divide your hosts into sets that are on a different
branch: a relatively small number of hosts would receive all changes imme-
diately, allowing the system administrators to rapidly test all their changes;
a larger host sample (ideally including a cross section of all possible host
group definitions) should follow the “staging” branch to which changes which
need to be tested would be deployed. All remaining hosts track the “stable”
or “production” branch, receiving changes only after they have gone through the previous stages and have been found to be free of errors.
See Section 7.5.1 for a more detailed description of this approach.
Sets of users
Finally, account management is perhaps the most obvious application of the
principles of set theory within configuration management.
As noted in Section 6.3, user accounts may be grouped together by their
required access privileges. Regardless of whether user accounts are managed
centrally via a directory service or whether individual accounts are explicitly
Development
These hosts serve as the playground where all prototypes are initially set
up as proofs of concept before being properly packaged and turned into defined services. All new changes are initially developed and tested in this
environment. Hosts in this role are often in a state of flux, and it is not
uncommon for a system administrator to break a service in the process of
testing a new configuration management module. For this reason, these
systems do not serve production traffic.
Test
A small number of hosts on which to perform end-to-end tests after initial
development make up the test environment. Once we have defined a new
service, created a new package, or made any other changes to our CM, we let
them be applied in this environment. The most important difference to the
“development” environment is that we do not perform any manual changes
here: everything goes through the full configuration management cycle. This
allows us to make sure that the module we put together does in fact include all
required changes, can be applied by the configuration management software,
and does not cause any obvious problems.
In a heterogeneous environment it is important to ensure that all major operating system versions are represented in this group.
Pre-Production
Once testing has determined that the changes we plan to roll out do indeed
yield the desired state, we can push them into the pre-production environ-
ment, sometimes referred to as “staging”. This group consists of a represen-
tative sample of all production serving hosts, including different hardware
configurations and different operating systems. For each major service, a
sample of hosts are placed into this role to ensure compatibility of any new
changes within the given software stack.
Systems in this role do not usually take full production traffic, but may
be used internally to test your products. In some environments, a small
percentage of actual production traffic is shifted to these systems to ensure
that all crucial code paths that might be encountered once the change is
deployed are executed.
Oftentimes we define automated checks and tests that run against these
hosts to ensure no unexpected side effects were introduced by the changes
we made.
Production
All hosts serving production traffic or providing a crucial (possibly internal
only) service. Any changes made here should have gone through the previous
stages, and in some cases may require explicit “change management” prac-
tices, including advance notification of all stakeholders, service owners, and
customers.
Since this group of systems can range in size from only a handful of
hosts (for a particular service) to literally hundreds of thousands of machines
(providing multiple services), deployment of any changes is inherently risky
and needs to be coordinated carefully. Especially in very large environments
these changes are often deployed in a staged manner: starting with a small
number of machines the percentage of all hosts that will receive the change
is slowly ramped up.
Canary
Sometimes it is difficult to account for all eventualities, and experience has
shown that some errors cannot (easily) be reproduced or triggered in a con-
trolled staging environment. This is largely due to the fact that actual pro-
duction traffic is so difficult to simulate. Hence, it may be a good idea to
create a so-called “canary” role as a special method of detecting possible
errors in your configuration: individual hosts that are part of the actual
production environment and that do take production traffic will get code de-
ployments earlier than the bulk of the production hosts. This way, errors can
be detected before they might be deployed everywhere else. Like the canary
in the coal mine, these hosts serve as an early warning system for potentially
dangerous changes.
Security Roles
Please note that the definitions given here are also helpful in decid-
ing access control as well as traffic flow from a security perspective.
For example, hosts in a development role may be accessed by all
developers, while production systems may not. Similarly, traffic
may be allowed from hosts in the “test” role to other hosts in the
same role only, but access of production data may be restricted to
production systems only.
A function f is called idempotent if

f(f(x)) ≡ f(x)
That is, a function applied twice to any value will produce the same result
as if applied only once. A common example here is taking the absolute value
of x:
| |−1| | ≡ |−1|
How does this property translate into practical applications within the
scope of configuration management? As we are moving the system from one
state to another, we are applying certain changes. Executing these steps must
not yield different results depending on the previous system state. Reviewing
the required capabilities we identified as CM requirements in Section 7.3.3 –
software installation, service management, file permissions and ownerships,
installation of static files, generation of host-specific data, and command
execution – we find that each may or may not be implemented in a manner
that satisfies this requirement. As one example, asserting an existing file's ownership and permissions is an inherently idempotent operation: no matter
how often you execute these commands, and no matter what the permissions
and ownership on the file were before, the end result will be the same.
Other typical tasks within configuration management are not quite as
obviously idempotent, and some – updating an existing file on a host with
parameters determined at runtime, for example – may in fact not be. For
any command executed within a configuration management system, you can
ask yourself the question of whether or not the outcome will be the same if
you either run the command under different circumstances, or if you repeat
the command over and over. See Listing 7.4 for a few examples. Note that
sometimes a command may behave idempotently only if a special flag is given,
and that under some circumstances the only thing that makes a command
not idempotent may be the exit status. (We will revisit the importance of
idempotence within the context of building scalable tools again in Chapter
9.)
In many cases the CM may aid in defining actions in such a manner as to
preserve idempotence – CFEngine, for example, explicitly discourages the use
of free shell commands and leads the user to abstract their requirements into its higher-level, declarative constructs instead.
7.6 Summary
We have covered a lot of ground in this chapter, much of it theoretical. The
examples shown were taken from the most popular configuration management
solutions currently in use: CFEngine, Chef, and Puppet. All three implement
their own Domain Specific Language, with striking similarities amongst them.
We identified as one crucial concept in configuration management the idea
that services are abstracted from the hosts that provide them. This abstrac-
tion allows us to define not individual changes, but rather self-contained
change sets describing what is required to provide a given functionality.
These change sets, in turn, can be applied to groups of hosts; with those
sufficiently abstracted and clearly defined, we are able to combine attributes
of host groups and describe their members' final state using mathematical set
operations. Much like on a white board or the proverbial paper napkin in
a restaurant, we draw circles around things. Once we give those circles descriptive names, configuration management becomes a question of creating unions or intersections of service descriptions with host collections.
Using this model, we determined that configuration management is really
not about applying changes, but about asserting state, ensuring that a host
meets a certain set of criteria. We have looked at the functional requirements
any CM needs to provide in order to yield a desired state as well as the
essential concepts of idempotence and convergence: all changes we make must
have well-defined and predictable outcomes that do not change if applied
repeatedly; at the same time, we only want to perform those changes that
are necessary.
When building our infrastructure we face a choice between a custom,
home-grown system that integrates perfectly with other system components,
and deploying an “Off The Shelf” solution such as CFEngine, Chef, Puppet or
similar products. Many commercial solutions exist that promise to manage
all of your systems with ease. Some of them offer free versions; some systems
are available as open source software, while others are closed.
Knowing our infrastructure needs inside and out, we often fall into the
trap of writing our own solutions. In small environments, the benefits of a
complex product do not seem worth the cost of adapting our infrastructure
to abide by the product’s configuration model. It is for this reason that
every growing company inevitably undergoes a phase during which large
parts of the initial infrastructure are ripped out and replaced by more scalable
solutions – including a mature configuration management system.
The area of configuration management has become one of the most im-
portant fields in large scale system administration and has opened up room
for a lot of interesting research. In ever larger deployments with literally hundreds of thousands of hosts, the theory of how systems fail, the question of how far we can take automatic recovery, and the concept of autonomous state configuration have yielded the new term of Service Orchestration.
In this chapter, we have focused primarily on configuration management
of host-type systems. But our infrastructure consists of more than just
“hosts”: we have large amounts of network equipment as well as a num-
ber of appliances that sit somewhere in between the traditional networking
gear and a “normal” host: we have routers and switches, load balancers and
firewalls, storage devices and intrusion detection systems... the list goes on.
All of these devices also require configuration management, and it is unfor-
tunate that integrating them into your existing CM is often difficult, if not
impossible.
Our approach to divide resources into grouped sets and describe changes
as service definitions does in fact translate to non-host systems as well. All
we need is for a programmatic method to access them – ideally, though not
necessarily via a programmable interface such as a well-documented API –
and we can add support for them to our CM. A positive development in recent years has led some of the configuration management tools we mentioned to allow you to manage at least some non-host type systems, even though we are still a long way from the same level of control.
The following quote perhaps best summarizes the fundamental value configuration management provides:
Granted, one would hope that people entering your datacenter and throw-
ing machines out of the window is not a regular event. But Murphy’s famous
law – commonly cited as “Anything that can go wrong, will go wrong.” – acts
as viciously, and we must view software failure as the functional equivalent.
All software has bugs, and it is only a question of time until your complex
service encounters a fatal error. Similarly, hardware fails as well – and not
too infrequently. This becomes painfully obvious in very large environments
with hundreds of thousands of machines, as the probability of encountering
a hardware failure increases proportionally to the number of hosts in service.
With properly defined services, correctly grouped hosts, and a well-running CM, individual hosts become expendable. Any system-, software-, or
hardware failure has significantly reduced impact on our overall service avail-
ability. In other words: configuration management lies at the heart of any
well-managed infrastructure and makes large scale deployments ultimately
maintainable.
class syslog {
        include cron
        include logrotate

        package {
                'syslog-ng':
                        ensure => latest;
        }

        service {
                'syslog-ng':
                        ensure  => running,
                        enable  => true,
                        require => Package['syslog-ng'];
        }

        file {
                '/etc/syslog-ng/syslog-ng.conf':
                        ensure  => file,
                        source  => 'puppet:///syslog/syslog-ng.conf',
                        mode    => '0644',
                        owner   => 'root',
                        group   => 'root',
                        require => Package['syslog-ng'],
                        notify  => Service['syslog-ng'];
        }
}
commands:
    restart_sshd::
        "/etc/rc.d/sshd restart"
}
# cd /etc                                          # not idempotent
# rm resolv.conf                                   # not idempotent
# echo "nameserver 192.168.0.1" > resolv.conf      # idempotent
# echo "nameserver 192.168.0.2" >> resolv.conf     # not idempotent
# chown root:wheel resolv.conf                     # idempotent
# chmod 0644 resolv.conf                           # idempotent
Problems
1. Review your own environment, your infrastructure, and your essential
services. Do you think they could be rebuilt from scratch? With how
much effort? Which components seem completely and easily replace-
able, and which would you fear most to lose?
3. Review the documentation for one of the popular CMs. Most of them
can quickly and easily be tried out using just a few virtual machines.
Set up a CM and try to define a simple service. How does this system
implement the features and concepts we discussed?
Bibliography
[1] Mark Burgess, A Site Configuration Engine, USENIX Computing Systems,
Vol. 8, No. 3, 1995; on the Internet at
http://cfengine.com/markburgess/papers/paper1.pdf (visited February
22, 2013)
[4] Cabinet Office, ITIL Service Operation 2011 Edition (Best Management
Practices), The Stationery Office, 2011
[6] Steve Traugott, Lance Brown, Why Order Matters: Turing Equivalence
in Automated Systems Administration, Proceedings of the USENIX
Large Installation System Administration conference, Philadelphia,
PA Nov 3-8, 2002, USENIX Association, Berkeley, CA, 2002; on the
Internet at
http://www.infrastructures.org/papers/turing/turing.html (visited
January 29, 2013)
[14] Steve Traugott, Joel Huddleston, Joyce Cao Traugott, Best Practices
in Automated System Administration and Infrastructure Architecture:
Disaster Recovery; on the Internet at
http://www.infrastructures.org/bootstrap/recovery.shtml (visited
March 09, 2013)
Chapter 8
Automation
8.1 Introduction
In the previous chapter we have discussed how configuration management
systems allow us to create abstract service definitions and apply the required
changes on the desired sets of hosts. CM systems thus remove the need for
manual configuration on individual hosts – we have automated the steps, and
in the process improved the reliability and scalability of our services within
the infrastructure.
Similarly, we have hinted in Section 5.4 at systems capable of perform-
ing OS installations across large numbers of hosts without human interven-
tion. Knowing all the individual steps needed to install an operating system,
adding the required software, and performing initial host configuration (be-
fore we finally let our configuration management engine take control) allows
us to use software to repeat a previously tedious manual procedure hundreds
of times a day with great ease.
Both of these admittedly complex systems are prime examples of what
system administrators do best: being inherently “lazy” – a virtue, as we will
discuss a bit more in a minute – we automate any conceivable task such that
it can be performed at the push of a button, or perhaps more likely, the press
of the return key.
In this chapter, we want to take a small step back from these large infras-
tructure components and review the motivation of the more general concept
of “automation”. Given the incredible variety of job descriptions and environ-
ments a system administrator might find themselves in, it is not surprising
that the idea of letting computers do our bidding has found a number of
different practical approaches. We see evidence of automation as much in
a system administrator’s shell aliases or custom scripts as in the regularly
scheduled cron(8) jobs or manually invoked tools.
It has been an old joke that the goal of any great system administrator
is to automate themselves out of a job, just as we like to suggest to our
colleagues or peers that they should stop bothering us, lest we replace them
with a small shell script. The use of this language reflects how we approach
automation: our goal is to not have to perform certain tasks ourselves, and
anything that can be automated, will be.
Unlike humans, computers do not seem to mind performing boring, repet-
itive tasks. This chapter looks at how we can take advantage of this fact,
how automation can help you stop wasting your time and put to greater use
the most valuable resources in any organization (your engineers’ minds), but
we also identify a number of possible pitfalls and risks.
When we believe we can build a solution that will save us tedious work in the long run, we are willing to invest
significant resources up front. We write scripts that will take into account
a variety of circumstances in order to allow us to run just one command
instead of a dozen, we schedule checks and implement monitoring solutions
to be notified of events before they become error conditions, and we write
tools that react to these alerts and which will automatically avert impending
doom.
Æleen Frisch identified[1] – only slightly tongue-in-cheek – a number of
“administrative virtues”: Flexibility, Ingenuity, Attention to Detail, Adher-
ence to Routine, Persistence, Patience, and Laziness. While these traits come
into play in almost any of a system administrator’s varied duties, there are
particularly strong parallels to the features that allow us to create reliable
automation tools:
We need flexibility and ingenuity to identify new and perhaps not entirely
obvious solutions to the problems we encounter.
We need to pay attention to the nuances in the ways our systems may dif-
fer, when we analyze and identify all the possible edge cases, or how invoking
a command under certain circumstances can lead to unexpected results.
We need a strict adherence to routine to produce reliable results, to collect
usable data, and to keep our systems maintainable. We need to follow our
own processes even when we are under pressure and inclined to take shortcuts,
because we trust the routine we identified earlier more than we trust our
stressed out brains.
We need persistence and patience when we write software, when we debug
our tools, when we collect enough data to be able to identify our outliers
and averages, and we need persistence and patience when we deploy our new
systems, our infrastructure components, or when we slowly, carefully, replace
a much needed service with different software.
But what impacts most of our practical work turns out to be laziness. As
soon as we run a lengthy command a second or third time, or as we repeat
a series of complex steps to complete a specific task, we start to wonder how
we can script this. Sure, we may end up spending a lot of time getting our
automated jobs just right, but we gain productivity in the long run. And so
our laziness pays off.
8.3.1 Repeatability
Typing the same set of commands over and over is tedious. A few years ago, while maintaining a heterogeneous environment of NetBSD/i386 and IRIX/mips systems, I had to keep the latest version of the GNU Compiler Collection (GCC) in sync across hosts, standardizing it to fit into our environment, enabling only the required languages while at the same time ensuring the build of a 64-bit binary1.
Normally, you would have your package manager perform this installation.
However, as discussed in Section 5.5.1, there are situations where installing
software “by hand” is your only option. This was one of them: the native
package manager did not provide a 64-bit version of this compiler, and what
might otherwise seem like a routine task – building a new compiler – required
the setting of a number of environment variables and command-line options
to the ./configure script. Ultimately, the commands needed to build the
compiler grew to become what is shown in Listing 8.1.
A few months later, when a new version of gcc(1) became available, I
had to repeat the same process. Instead of wasting time trying to remember
all the options I needed, what environment variables I had to set, etc., I was
able to build it using this trivial script.
We often use automation as a way to remember the correct steps. Even a
task that is not performed frequently – in our example above, perhaps twice a
1
Certain 64-bit capable IRIX systems supported both a 32-bit and a 64-bit Application
Binary Interface (ABI); in order to use the more performant 64-bit interface, the applica-
tion would have to be compiled with support for and be linked against the 64-bit libraries.
Even though less common today, you can still find a number of tools that cannot (easily)
be built as 64-bit binaries.
#!/bin/sh
export CC=cc CXX=CC CFLAGS="-64 -mips4 -r10000" LDFLAGS="-64"

mkdir gcc-build
cd gcc-build

../gcc-3.3.3/configure --prefix=/usr/pkg/gcc-3.3.3 \
        --enable-languages=c,c++,java --enable-libgcj \
        --disable-shared --enable-threads=posix \
        --enable-version-specific-runtime-libs \
        --enable-haifa --disable-c-mbchar \
        --disable-checking --disable-nls
gmake bootstrap
year – benefits from us creating even a simple script to repeat it easily without
having to think twice about every detail. Detailed documentation explaining
why certain parameters need to be passed may be ideal, but providing a
simple script to repeat a cumbersome process is still a win.
Guaranteed repeatability allows us to ignore how a task is performed.
Even simple scripts can thus hide unnecessary complexity from us, making
the task at hand easier in the process.
8.3.2 Reliability
Being able to repeat the same process easily without having to recall the
details of every step is a great advantage, not only because it saves us a
lot of typing or because we don’t waste time recreating previously known
information. One of the key benefits of an easily repeatable process lies in its
reliability. When we run commands interactively, we may change the order
or detail of invocation as we run through the end-to-end process. We easily
make mistakes, mistype a command, accidentally redirect output to truncate
a file instead of appending to it, accidentally slip in an additional character,
execute a command from the wrong directory, or we simply skip a step (by
accident or because we deem it unnecessary).
Every time we invoke a script, on the other hand, it will run the same
commands in the same order. If we are careful when we automate a task
and create a tool executing idempotent commands only (hopefully combined
with copious error checking), we need not worry about how it is invoked.
We can treat it like any other system utility and rely on it running either to
completion or produce meaningful error messages that allow us to diagnose
the problem.
Automation provides reliability not only through consistency, but also in
terms of quality: as soon as we begin to automate a task, we begin to move
beyond just stashing individual commands in a script. Instead, we build
our tool with reliability in mind, fine tuning the steps and thinking more
about how they are executed. Just like documentation, we are more likely
to consider in more detail the implications of each command when we create
an automated job. As a result, the final utility is more robust and executes
more reliably than any manual process ever would.
Some tasks need to be performed repeatedly with a certain regularity, a
prime example being system backups. For starters, the backup process needs
to be run on a daily basis. Defining the command required to initiate this
task and then scheduling it via the cron(8) dæmon is about as mundane
yet important a task as you may encounter in any system administrator’s
life. But being able to rely on the tool is a pre-requisite, and automating
the steps to allow them to be scheduled in the first place does provide this
required reliability.
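A hypothetical crontab(5) entry for such a nightly backup job might look like the following; the script path, schedule, and log file are assumptions for illustration only:

# minute hour mday month wday  command
30       2    *    *     *     /usr/local/sbin/run-backup >> /var/log/backup.log 2>&1

Of course, simply scheduling the job is not enough: we also need to be notified when it fails.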
As our systems grew and disk capacity increased, so did the require-
ments for backups. Eventually, upgrading our backup system became
unavoidable. At that point, we invested in a significantly larger (and
substantially more expensive) tape library able to hold dozens of backup
tapes and to rotate them automatically by means of an internal robotic
tape switcher.
Once set up correctly, the entire backup process became completely auto-
mated, not requiring any human intervention whatsoever. Every so often,
“throwing money at the problem” may turn out to be the best solution.
The satisfaction of seeing a robot do one’s manual work should also not
be underestimated. But all automation solutions, including robots, can
experience unexpected failures. At Yahoo!, we once received an incident
notification that read: “Two robots collided, one arm pulled off.” Fortu-
nately, the appendage in question belonged to one of the robots, not a
datacenter technician.
8.3.3 Flexibility
As we develop our tool, it often becomes obvious that we are not, as initially
assumed, solving one specific problem, but that we are facing a particular
instance of an often more generic set of problems. In our previous example,
we set out to build a script that would let us easily create a 64-bit version
of version 3.3.3 of the gcc(1) tools. We hard-coded both the ABI as well as
the software version into our simplistic script, meaning that the next time
a new version is released, we need to edit the source to build the software.
What’s more, some of our machines do not support the 64-bit ABI, and we
need to build 32-bit versions as well. In other words: we need a more flexible
tool that allows us to specify both the ABI as well as the software version.
Once identified as a candidate for automation, we quickly begin to view
the problem at hand in a different context. By spending a little bit of time
up front, we can anticipate future requirements and make our tool useful in
a variety of circumstances. For example, we may allow the user to specify
different options on the command-line or let the tool react to certain envi-
ronment variables.
But flexibility is not only exhibited by allowing different invocations. A
good tool is also flexible in the way in which it handles certain error con-
ditions. This may be achieved by verifying any of the assumptions made,
or by failing early and explicitly. This behaviour reflects the concepts of
Idempotence and Convergence, which we discussed in Chapter 7; we will go
back to these and other desirable features of scalable tools in Chapter 9. For
now, suffice it to say that, somewhat paradoxically, a tool that is more strict about how it runs may actually provide greater flexibility: it allows us
to run it under different circumstances with predictable outcomes. What’s
more important, it may allow us to build other tools around it.
Take a look at Listing 8.2, where we have added some error checking
as well as options to build either 32-bit or 64-bit binaries of the specified
version to our simple example script. While there is still plenty of room
for improvement (part of Problem 4), we increased both the reliability and
flexibility of what started out as a collection of fixed commands stashed away
in a file.
8.4.1 Ourselves
Often we begin automating a task by customizing our own environment. We
all have our own shell aliases, functions, or custom little scripts stored in our
home directory or a private location in our PATH.
Automating such small, everyday tasks is perhaps the simplest form of automation, yet it often evolves into a
more complex solution, expanding with the problem scope. After completing
a task, we often realize that it was just one step of a longer process, and that
we may well automate the remaining steps as well.
Automation is rarely completely autonomous. By this, we mean that we define specific subsets of tasks that are performed by a computer on our behalf, but the overall questions about the what, how, when, and, ultimately, why of a given solution are still answered by the system administrators in charge.
Tools written to save typing efforts, even if they include logic that provides
flexibility to yield different outcomes depending on certain circumstances,
by and large provide answers to the question of how to solve a problem by
having a human describe detailed instructions. That is, we describe the steps
necessary to be executed in their specific order. In our previous example, we
identified the goal of the tool (“Build a new version of the gcc(1) compiler.”)
and provided a script describing the method of accomplishing it (“Set these
environment variables. Run the configure script with these options. Run
the make command.”).
The level of automation reached here is fairly low, but it has the advantage
that the solution is simple. As we move beyond this first stage of automation,
we begin to define the problem in more general terms, allowing us to describe
only what we wish to accomplish without having to care about how this
is done. In our example, a generic package manager might serve as the
automated solution: we can specify that we need a new version of a package,
without having to know (or care!) about how the system actually builds it.
Despite increased complexity, the benefits are significant. In addition, we
still control when actions are taken interactively.
Configuration management takes automation to the next level: here,
system administrators only describe the what: which packages should be
installed, which files should be added, updated, or removed, etc., without
always specifying exactly how this is done, nor when these steps are to be ex-
ecuted.2 That is, the configuration management system is more autonomous,
as it applies its own rules of which steps to perform at what time to yield
eventual convergence to the defined state.
2
To be fair, writing the rules of a configuration management system may frequently
include a specification of the how, but with sufficient abstraction, such as by way of
carefully crafted templates, these details become less important for routine configuration
tasks.
or did not measure. Trusting our automated tools can lead us to overlook
or dismiss factors not accounted for in these solutions. We will take a closer
look at this effect and related risks when we discuss monitoring in more detail
in Chapter 14.
If you look closely, you will notice that there is an erroneous space be-
tween the /usr and /lib/mumble components of the pathname. The
package had intended to remove only the subdirectory containing files it
previously installed, but proceeded to recursively remove all files under
/usr, a critical and sizeable part of the operating system!
Recall from Section 5.5.3 the inherent risk of trusting your package man-
ager (or any third-party software whose install scripts you run with supe-
ruser privileges without careful inspection) – this example helps illustrate
that point as well as the possible impact a single character in the wrong
place may have.
Even though this particular error was quickly detected and fixed, imagine
your automated software deployment system pushed an upgrade of this
specific package to all of your hosts, and you will quickly understand how
automation can magnify any failure exponentially.
Such an account usually has sufficient access permissions to cause significant damage. What's more, it's unlikely that
actions by this user can (easily) be tracked back to a human, an important
auditability requirement.
In this case, the ability to orchestrate complex changes across large sets
of hosts may lead to a loss of the audit trail. Granted, it is possible to retain
the ability to track commands and actions, but with every added level of
automation this becomes more and more cumbersome and is, quite frankly,
easily forgotten.
8.6.4 Safeguards
The complexity we inherit as a by-product of all the benefits of automation
should not be underestimated. With every additional automated step, we
broaden the possible impact of our tool. Not only can more things go wrong,
but failure may propagate or escalate at scale. As a result, the need for
safeguards together with the discovery of and alerting on error conditions
increases at the same rate.
Simple tools rarely require human interaction, but may at times allow for
confirmation from the user before taking a particular action; as an example,
consider the -i flag to the cp(1), mv(1), and rm(1) utilities. As we combine
such tools (or write our own) to provide more automation, such safeguards
are often seen as a hindrance, and we avoid them where possible. After all,
the whole point of automating a task is to avoid human interactions. But
as we do so, we also increase the risk of more widespread damage when (not
if) things do go awry. Smart tools make use of well-defined thresholds when
applying large changes: a tool may allow you to update or delete records
in your inventory database without interaction (assuming proper authentica-
tion), but may ask for additional confirmation or even higher privileges when
performing the same action on all records.
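A hypothetical sketch of such a threshold in a plain shell tool might look like this; the threshold value and the “hostlist” input file are assumptions for illustration:

#!/bin/sh
# Proceed without questions for a small number of hosts, but require
# explicit confirmation before touching a larger batch.
THRESHOLD=10
NUM="$(awk 'END { print NR }' hostlist)"
if [ "${NUM}" -gt "${THRESHOLD}" ]; then
        printf 'About to apply changes to %s hosts. Continue? [y/N] ' "${NUM}"
        read -r answer
        if [ "${answer}" != "y" ]; then
                echo "Aborted." >&2
                exit 1
        fi
fi
# ...apply the changes to each host in 'hostlist' here...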
Since it is always easy to identify in hindsight the parts where our tools should have had safeguards, it is not uncommon to encounter such safety measures only after an undesirable incident has occurred. This is not surprising:
our tools evolve with time, and what is obviously necessary in a large scale
solution is easy to dismiss as overhead in a simple script. But at the same
time, adding safeguards, error-checking, privilege separation, logging, event
correlation, and monitoring is so much easier to do when developing a small
tool; refactoring a complex system already in production use to add these
essential features is significantly harder. This is why it is important to be
aware of the possible pitfalls of automation right from the start. We need to
build our tools in a scalable manner and with the foresight and understand-
ing that they may well evolve into a much larger service or component of a
bigger solution.
Yet we need to be careful not to hamper productivity with pointless re-
quirements for human interactions: requiring sign-off by a peer or supervisor
before issuing routine commands is not only demotivating and tedious, it
becomes unsafe when users find ways around the restrictions. Any safe-
guard that users do not understand, or that ultimately gets in the way of
them getting their job done, will eventually be circumvented. We will revisit
this dilemma when we discuss general security principles in Chapter 11, but
the question of how, when, and where to add the necessary safeguards with-
out impacting effectiveness remains one of the fundamental conflicts when
developing an automated solution.
The root cause of this outage was, as so often, "human error": developers
accidentally initiated data deletion commands against the production in-
frastructure instead of their maintenance infrastructure. The effects of
this action trickled down to each customer as they were making changes to
their load balancers, leading to confusing API errors in the service. It
took Amazon's engineers several hours to fully understand why the errors
occurred before a restoration of the service could even be attempted.
Given the complexity and size of the infrastructure at hand, this is not
very surprising.
8.7 Summary
Automation is an essential part of every system administrator’s life. We
encounter it on a daily basis in small tools as well as in large and complex
infrastructure components. We noted, perhaps half jokingly, that laziness
is an inherent trait, a virtue, of every good system administrator, which
paradoxically may lead us to go to great lengths and significant efforts to have
computers perform the tasks we would otherwise need to repeat ourselves.
We have looked at the explicit benefits automation of even trivial tasks
provides: we gain the ability to repeat complex command invocations without
having to remember all required options or environment settings; we begin
to rely on our tools as they grow and allow us to ignore the implementation
details, knowing that a carefully written script or program can be trusted
to perform the right sequence of steps; we gain flexibility in the execution
of many tasks as we apply some abstraction and build tools that solve not
just one specific problem, but allow us to address more general classes of
problems.
Parallel to these benefits, we noted that different users benefit in differ-
ent ways from the tools we create. Just as we increase flexibility through
abstraction, we improve the usefulness of our tool as its user base grows.
But automating administrative tasks for all users of our systems or our peers
requires a different understanding of the problem space than if we were merely
jotting down a quick script to help ourselves save a few keystrokes. As with
#!/bin/sh
set -e

VERSION="${1:?"Usage: ${0} <version> [abi]"}"
ABI="${2:-64}"
DIR="${TMPDIR:-/tmp}/gcc-build"

export CC=cc
export CXX=CC
export CFLAGS="-${ABI} -mips4 -r10000"
export LDFLAGS="-${ABI}"

gmake bootstrap
Listing 8.2: A second iteration of our gcc(1) building utility, showing some
added error checking and increased flexibility through the use of command-
line options. Our tools often evolve in this manner from a trivial script into
a more functional program.
Problems and Exercises
Problems
1. Take a look at your shell's history file, e.g. ~/.bash_history. Which
commands do you run most frequently? Can you create aliases or shell
functions that save you some typing?
2. Identify a routine task you perform on a regular basis. How can you
automate this task? Can it be broken into smaller independent subtasks
that can be automated?
3. Ask your peers, your local system administrators, or search the inter-
net for examples of simple scripts and custom tools they use. How
flexible are these tools? What assumptions do they make about their
environment? Can you improve one of them?
4. Consider the example script from Listing 8.2. What assumptions does
it make? How would you change the script to improve its reliability
and flexibility?
6. Identify the methods by which your systems are maintained and up-
dated, including the configuration management, software deployment
and service monitoring systems. Which steps are performed manually?
What level of automation can you identify?
7. Search the internet for a recent, significant service outage. What did
you learn about the root cause of the failure? What role did automation
play, either in causing the outage or in recovering from it?
Bibliography

[1] Æleen Frisch, Essential System Administration, O'Reilly Media, 2002

[4] John Allspaw, "A Mature Role for Automation: Part I", on the Internet at
http://www.kitchensoap.com/2012/09/21/a-mature-role-for-automation-part-i/
(visited April 1st, 2013)

[6] "Summary of the December 24, 2012 Amazon ELB Service Event in the
US-East Region", on the Internet at
https://aws.amazon.com/message/680587/ (visited March 27th, 2013)
Chapter 9

Building Scalable Tools
9.1 Introduction
In the previous chapter, we talked about our desire to automate any conceiv-
able task, and we gave examples ranging from a very simple script building a
piece of software to complex systems orchestrating changes across thousands
of machines. All of these have one thing in common: they are software writ-
ten by system administrators. Even though many sysadmins may not think
of themselves as software developers, we all end up writing our fair share of
it.
The programs we create are in many ways different from a typical software
project like a word processor or similar standalone applications. System
administrators frequently refer to their tools as “duct tape”; they tend to
consist of little helper tools, of glue scripts, small programs that are used
to interface with each other, with more complex software systems or simply
intended to munge data into a new format. They often present a simple
command-line interface and operate on plain text input.
These system tools rarely consist of more than a few hundred, at most a
few thousand, lines of code. In other words, they're rather small, and
we think of them as “simple”, even as we take pride in the solutions we’ve
come up with and the automation they provide us with.
One of the key insights you can gain from years of working in different en-
vironments, on different operating systems, and using software tools written
by many different people with many different programming styles or philoso-
phies is the value of simplicity and flexibility. The best and most reliable
tools are those that we do not have to think a whole lot about, the ones we
use day in and day out and that we can combine with other tools in ways
not predicted by the original author of the software.
As system administrators, we strive to create equally reliable tools; as
software developers (albeit not in title), we understand and appreciate the
Unix Philosophy. We are aware of the differences in our anticipated user
base, as well as of the fact that we cannot predict all possible uses of our
software. This chapter covers all of this as well as a few general design
principles. Even if you do not currently think of yourself as somebody who
writes a lot of software, internalizing these lessons may help you understand
the decisions behind software you already use on a regular basis and hopefully
will help you build better tools yourself in the future.
usually produces two very different results, even if both may follow the same
requirements or behaviour.
Writing software is hard, in part because we are effectively unbound by
constraints. Existing software can be changed; what does not yet exist, can
be created. Job descriptions for positions in system administration usually
include a required familiarity with some “scripting languages”, suggesting
that we do not write software to produce full-featured products. What we
create is often regarded as "just a script", a little program, a tool we put
together to automate a workflow.
But simple does not mean unimportant. If our little tool is actually useful,
it will quickly sprout new features, adapt to being used in other environments
and by different people, become integrated into routine tasks, and eventually
be relied upon. Software is alive; it grows and ultimately escapes your
control. Scripts become programs, which in turn become infrastructure
components or standalone software products.
Even though the boundaries between them are fluid, we can identify an
evolution of three approaches to creating software: scripting, programming,
and formal software development. It is important to understand the dif-
ference in scope, usability, and implications resulting from each approach.
Being aware of which rung of this ladder you find yourself on allows you to
more clearly understand the requirements and appropriate solutions. Let us look
at these three stages in more detail.
“Scripting Languages”
It is worth noting that we use the terms “script” and “program”
(both as nouns or as verbs) in a rather language agnostic manner.
“Shell scripts” are often a collection of commands with limited
control flow, written using Bourne(-like) shell syntax. They tend
to evolve out of actual command pipelines executed interactively,
combining the common Unix tools such as awk(1), grep(1) or
sed(1).
to use. Perl, Python and Ruby are often cited as examples of “scripting
languages”.
9.2.1 Scripts
“Scripts” are primarily defined by the language we use to describe them: we
“throw together a quick script”, or we “whip up a few lines of shell code”.
The results reflect this attitude. Our scripts are – initially anyway – not
more than a collection of commands, stored in a file to save us the hassle of
having to remember and type them repeatedly. Our code examples in the
previous chapter, Listings 8.1 and 8.2, are good illustrations of this approach.
As a simple solution to a problem, one we do not anticipate will be used by
other people, we make a number of assumptions about the user's environment,
the invocation, user input, and so on. All of these assumptions tend to be
implicit or hidden, and we only become fully aware of them when they no
longer hold and the program misbehaves.
Scripts tend to be used for very simple tasks, and they often rely heavily
on the environment. Spanning just a few dozen lines or so, they expect
certain variables to be set, a directory hierarchy to follow a specific layout,
or they may attempt to write to files in the current working directory. Due
to a lack of error checking, online help or documentation, they often really
are only suitable for use by the person who wrote them.
These little helper scripts may start out as shell aliases or functions, or be
stored in a private directory in your PATH and used primarily to customize
your own environment and automate a few of your most common tasks. But
if what we whipped up there is actually useful, it will invariably evolve into
a larger program. With every additional user, it will grow new features,
assumptions about the environment it executes in will be removed or turned
into assertions. We may add command-line option parsing, explicit error
checking and reporting, fix bugs and increase overall robustness. Before you
know it, what started out as a few dozen commands stashed away in a file
becomes a reliable program.
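As a small, hedged illustration of that last point, turning an implicit
assumption into an explicit assertion might look like the following in a
shell script; the directory name is made up for this example:

    #!/bin/sh
    # Before: the script silently assumed that ${HOME}/builds exists and
    # failed somewhere down the line when it did not.
    # After: the assumption is asserted up front, with a clear error.
    BUILDDIR="${HOME}/builds"

    if [ ! -d "${BUILDDIR}" ]; then
            echo "$0: expected directory ${BUILDDIR} does not exist" >&2
            exit 1
    fi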
9.2.2 Programs
All but the most simple scripts see some ongoing development as their user
base increases. Frequently we start out whipping up a script – there it
goes again, this phrase – only to come back to it (not much) later, adding
features and extending its functionality. In fact, we often throw out our
initial prototype and rewrite the tool, possibly in another language. Even
though it is difficult to willingly discard an existing solution into which time
and effort have gone, this is, after all, the main purpose of a prototype:
to allow you to learn the details of the problem and to provide a proof of
concept implementation on which to model your later program.
In the process of writing a prototype, you will identify new desirable
features, figure out what works and what doesn’t, as well as discover hidden
requirements and dependencies. Growing in complexity as well as maturity,
a program developed from a prototype becomes a more reliable tool.
In contrast to simple scripts, programs make use of common toolkits,
software libraries or modules to combine existing frameworks and implement
new functionality. They provide a more consistent interface, account for
differences in the environment and may be able to handle larger input data,
for example. Programs range from a few hundred to a few thousand lines of
code; they may consume or provide an API to a service and interface with
various other components without human interaction.
Programs and scripts are, for the most part, what many of us System
Administrators are creating when we write software. We put tools together
that range from trivial to moderately complex, targeting as our intended
users people like us, our peers, possibly other people within our organization
but rarely outsiders with entirely different environments or requirements.
System administrators are known to half-jokingly refer to themselves as
duct tape slingers, experts in stringing together systems and programs in
ways previously unimagined and perhaps unintended; our tools function as
glue, to bind together independent system components. But here, too, the
language we choose reflects the attitude we have towards our tools, and it
helps develop greater confidence and a sense of pride in our creation.
What began as a set of simple scripts may have developed into a number of
complex programs used across numerous systems. Without realizing it, we may
have developed a core infrastructure component upon which our organization
relies. At this point, the software is likely to have advanced into the
stage of being a full-featured, self-contained application, requiring ongoing
maintenance, development and support. We have crossed the boundary from
“program” to product, often without being aware of it.
The larger our infrastructure grows, the more complex become the pro-
grams we use to hold it together. At some point, we need something stronger
than duct tape. That is, we need to change our attitude to treat the software
we create on this particular layer of the infrastructure ecosystem as requiring
– deserving – a more professional approach.
“[At Netflix] We’ve observed that the peer pressure from ‘Social Coding’
has driven engineers to make sure code is clean and well structured, doc-
umentation is useful and up to date. What we’ve learned is that a compo-
nent may be ‘Good enough for running in production, but not good enough
for [Open Source].’ ” [1]
With this in mind, it behooves us to develop all of our software with the
intention of making it public. The prospect of releasing our code as Open
Source to the world will serve as a significant quality assurance factor.
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Listing 9.1: The "Zen of Python".
With this approach, you will observe a shift in focus away from solving
just your own, very specific problem in your own, unique environment to-
wards creating general solutions. This allows you to continue to use or adapt
your tools as your infrastructure changes.
In this section, we will discuss a number of principles that will guide you
in this change of viewpoints and development practices. You may already
be familiar with some of them, as we may have mentioned them previously
in this book, while others are common recommendations for professional
software development.
For example, the Python programming language includes as a so-called
“easter egg” its own programming philosophy, known as the “Zen of Python”,
which we show in Listing 9.1. Throughout this chapter, we will reference
parts of the Zen of Python, and you will find that it applies equally well to
other programming languages and software development principles.
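If you have a Python interpreter at hand, you can display the text shown in
Listing 9.1 yourself:

    $ python -c 'import this'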
As you read on, try to think about how you would apply the principles
discussed here to your own software. That is, do not regard them as rules
or guidelines for large scale software development only, but as general advice
applicable also and especially to small(er) system tools. You will be able to
identify many of them as underlying the programs and utilities you already
use on a daily basis, and hopefully begin to view your own software as no
different.
Simplicity
Write programs that do one thing and do it well. Design your tools to be
as simple as possible. Writing software is hard! The more functionality you
attempt to implement, the more code you have to write, and more code in-
evitably translates to more bugs. Given this widely accepted understanding,
shouldn’t it be easy to write simple tools, to eschew the added complexity of
additional non-essential features? Unfortunately, identifying only the neces-
sary features and restraining oneself from adding functionality can be quite
difficult.
In order to write a program, we have to understand the problem we’re
trying to solve well enough to be able to explain it to a computer. The
more we think about a problem, the more corner cases we discover, the more
Tools as Filters
As the authors of a program, we consider ourselves the ultimate authority
on how it might be used. We define the interfaces of our tools, and thereby
prescribe their possible usage. But it is important to realize that we cannot
possibly foresee all the ways in which our product may be used. Any program
we write may end up being utilized in ways we did not anticipate.
The advice to write programs to work together embodies this awareness.
Because we cannot know how users will take advantage of the functionality
our program provides, we need to allow them to combine it with great flex-
ibility. Since you cannot anticipate all use cases, begin by writing your tools
such that they accept input from stdin and generate output to stdout. As
shown above, this allows you to simplify your program by eliminating all the
complexities of file I/O for these cases. But what's more important: your tool
can now be used as a filter. Your users gain significant flexibility, as they can
now combine other tools to prepare input for or post-process output from
your program with great ease.
This approach leads to a few practical considerations. For example, it
is often a good idea to process input in chunks (most commonly one line
at a time) rather than attempt to store all input in a data structure before
handling it. This allows you to handle arbitrarily large input, as you are not
bound by the amount of available memory. Your program will also become
more responsive, as the user won’t have to wait for all input to be read before
your program can begin processing it. (Of course there are exceptions:
certain programs require all input to be present before they can produce a
result; the sort(1) utility is one such example. But as a general rule of
thumb, line-based processing of input data makes for a simpler approach and
a more useful filter.)
Treating your program as a filter also forces you to make a very explicit
distinction between the desired output it produces and any error- or diag-
nostic messages it may generate. Since the expected output may need to
be further processed by other commands, Unix tools have a long tradition
of printing such notifications to stderr, allowing you to process the valid
output generated while at the same time displaying errors to the user.
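To illustrate these points, here is a minimal sketch of a filter written as
a shell script. The transformation it performs (upper-casing its input) and
its hypothetical name, upcase, are arbitrary choices for this example; what
matters is the separation of input, output, and diagnostics:

    #!/bin/sh
    # upcase: read stdin line by line, write results to stdout, and send
    # diagnostic messages to stderr so they do not pollute the output.
    count=0
    while IFS= read -r line; do
            count=$((count + 1))
            if [ -z "${line}" ]; then
                    echo "skipping blank line ${count}" >&2
                    continue
            fi
            printf '%s\n' "${line}" | tr '[:lower:]' '[:upper:]'
    done
    echo "processed ${count} lines" >&2

Because it reads from stdin and writes to stdout, such a tool drops neatly
into a pipeline (for example, grep error logfile | upcase | sort), and its
stderr chatter can be redirected or silenced independently of the output.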
Likewise, do not assume that your program is invoked interactively. That
is, you cannot require input from the user. For example, it is a common
mistake to prompt the user for confirmation of certain actions (“Continue?
(y/n)”). To make matters worse, inexperienced programmers often attempt
That is, your program should tolerate malformed input, but produce
well-defined output. Note, however, that it is a common misinterpretation
of Postel’s Law to suggest that a program should always try to process all
input as if it was valid, even if it is not. This can be dangerous, as acting
on malformed input can lead to a number of security problems, such as
accidental code execution as a result of interpreting or evaluating input in
an executable context. Proper input validation is still required; it may be
sufficient for your tool to warn the user on invalid input before moving on to
the next chunk of data rather than aborting altogether.
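A sketch of what this might look like in practice, assuming a simple
line-based "key=value" input format chosen purely for illustration:

    #!/bin/sh
    # Tolerant input handling: warn about malformed records on stderr and
    # keep going, rather than aborting on the first bad line.
    awk -F= '
    NF < 2  {
            printf("line %d: malformed record, skipping\n", NR) > "/dev/stderr"
            next
    }
    {
            # well-formed record: emit the key on stdout
            print $1
    }' "$@"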
Text Streams
Unix tools have a long tradition of operating on plain text. Consider
the various commands you use on a daily basis: awk(1), grep(1), head(1),
sed(1), sort(1), tail(1), uniq(1), wc(1), ... all of them process data or
generate output by reading and writing text streams, described by Douglas
McIlroy as "a universal interface", and their ubiquity is tied directly to
the use of these tools as filters.

Listing 9.2: The BSD rc(8) system uses simple key=value pairs as a means
of configuration, making it trivial to process using common Unix tools.
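As a small, hedged illustration of the kind of configuration Listing 9.2
describes (the file contents below are invented, not taken from any
particular BSD system), note that ordinary filters are all that is needed
to query it:

    $ cat /etc/rc.conf
    # which services to start at boot time (illustrative values only)
    sshd=YES
    ntpd=YES
    ntpd_flags="-g"
    sendmail=NO

    $ grep '=YES$' /etc/rc.conf | cut -d= -f1
    sshd
    ntpd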
But data structures frequently are complex representations of the author’s
own mental model, and programs regularly need to translate representations
of data for other tools to process or to retain state information in between
invocations.
Many programming languages allow for serialization of objects into a bi-
nary representation that can be read from or written to a file without the
(at times significant) overhead of parsing text, matching patterns, and re-
constructing a complex object. This approach, while often the most efficient,
would, however, limit your program's ability to be combined with other tools.
It could no longer function as a filter, as any command generating input
would need to produce the specific format required, and any output could
only be processed by tools capable of understanding this format.
The eXtensible Markup Language (XML) and the JavaScript Object No-
tation (JSON) are two examples of data representations that attempt to
strike a compromise between binary formats and completely unstructured
text streams. For many system tools, however, text streams remain prefer-
able, as they provide a consistent user interface across the environment and
very clearly put the focus on who the primary consumer of the data is: the
user! The person running the commands needs to be able to make sense
of their input and output. It is preferable to be wasteful with computing
resources and have your program require a few clock cycles more to process
the data than to waste your users’ time and energy, as they try to make sense
of the format.
In the Unix world, we have a thriving Open Source ecosystem of tools and
libraries; utilities written by one person or organization are often used and
extended by others, possibly in ways not imagined by the original author.
A stricter data model imposes restrictions on how the tool may be used,
extended, or built upon; encoding the structure in the input or output format
aids other programs, while text streams are helpful for human consumers.
The humility to put the user before the program, to understand that it is
people our tools primarily interact with and whose job they should make
easier, leads us to favor text streams.
file, read its contents into memory, remove the file, then write the contents
from memory to the new location. But what if something goes wrong in the
process of writing the data and you are forced to abort the program? The
original file was already removed, which would come as a rather unpleasant
surprise to the user. The Principle of Least Astonishment would demand that
the file only be removed from the original location if the copy was successfully
written.
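A minimal sketch of the less astonishing approach, assuming we are
implementing such a cross-file-system "move" ourselves (the argument
handling and the use of cmp(1) for verification are assumptions for this
example):

    #!/bin/sh
    # Only remove the original once the copy has verifiably succeeded.
    set -e

    src="${1:?Usage: $0 <src> <dst>}"
    dst="${2:?Usage: $0 <src> <dst>}"

    cp "${src}" "${dst}"
    cmp -s "${src}" "${dst}"        # with set -e, a mismatch aborts here
    rm "${src}"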
Unfortunately, it is not always this easy to anticipate what may or may
not surprise your users. Destructive actions, such as removing or overwriting
files, or discarding input data can lead to significant frustrations when they
occur unexpectedly. As you write your program, carefully consider the way
you yourself interact with existing, similar tools, and what your expectations
are of them, then translate this behaviour to your own programs.
Create boring tools. Your users will thank you by being able to rely on
them.
>>> nums = [1, 2, 3, 4, 5]
>>> print list(map(lambda x: x ** 2, nums))
[1, 4, 9, 16, 25]
>>>
>>> squared = []
>>> for x in nums:
...     squared.append(x ** 2)
...
>>> print squared
[1, 4, 9, 16, 25]
>>>
Listing 9.3: The two Python code snippets shown here are functionally (al-
most) equivalent, but for less experienced users, the second is easier to
read and more intuitive to understand.
But raising awareness amongst your peers of how the systems you create or
maintain work has other beneficial side effects. All too often, organizations
only find out how much they rely on a single person when a long-running
or widely relied-upon tool suddenly (and spectacularly) breaks down, and
nobody can be found who can make heads or tails of the code in question.
Other system administrators then need to reverse engineer previous design
decisions and attempt to understand how the given program works – or, more
often, how it doesn’t work – when somebody with a better understanding
could quickly have solved the problem.
Making sure that other people in your organization understand your code
helps avoid creating an inherent dependency on you as an individual. Creat-
ing thorough documentation and run books is one important aspect of this,
but often this is not sufficient to help somebody else really understand the
code and debug it in case of an emergency. For that, you need to actually
explain your code to your peers, an approach sometimes referred to as "decreas-
ing your Bus Factor”: you should ensure that even if you were to suddenly
disappear (because, for example, you were run over by a bus) there would
be others who could debug, troubleshoot, update and maintain your tools.
In return, you can take your well-deserved vacation and enjoy the sunset on
the beach in Hawai‘i, sipping a Mai Tai, without the risk of getting paged
because you’re the only person who understands how a given program works.
The more people understand your codebase, the better. In addition,
knowing that your code will be scrutinized by others immediately makes
you focus more consciously on the quality, clarity and expressiveness of your
program. (See our previous note about “Open Source as Quality Assurance”.)
But just like you need to seek other people’s feedback and ensure their
understanding of your tools, it is equally important for you to keep up with
your peers’ programs and tools, to understand the problems they’re solving,
and to follow their implementations.
Unfortunately, code reading is not something that is commonly taught
in computer science programs, and it takes quite a bit of practice and a
fair amount of discipline. Often it may seem much easier to write your own
code than to fully immerse yourself in somebody else's and understand it. We
easily dismiss a program because its style differs from our preferred way of
writing code or because it is structured in a way that seems counterintuitive
to us.
Many organizations have a common coding standard for this reason – if all
code is (at least visually) structured in the same manner, it becomes easier
for everybody to read and understand each other’s work. Such standards
cover not only things like indentation of code blocks, line width, or function
and variable names, but often also prescribe certain common behaviour, such
Code Reviews
Understanding that peer review is an important and efficient
method to ensure quality, many organizations require code review
for certain changes. That is, before code can be committed to a
repository, it requires somebody else to sign off. So-called "com-
mit hooks" in a code repository can enforce this by requiring the
commit messages to include the words "reviewed by: username".
In order to allow reasonably efficient and agile development, this system
requires occasional exceptions, and so we do often find bogus usernames
or rubber-stamp reviews in our code repositories. Nevertheless, the idea
to require others to sign off on code changes is appealing, and a number
of software systems have been developed to automate and facilitate this
process.
This helps the author of the code in question ensure that they really
understand their own code, ensures that colleagues are at least conceptu-
ally aware of how the program works, and helps enforce coding guidelines.
At the same time, it can be a thoroughly enjoyable experience and may
reinforce positive team bonds.
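As an example of the kind of automated enforcement mentioned in the sidebar,
a Git "commit-msg" hook might look roughly like the following sketch. The
exact wording of the required line is an assumption; Git passes the path to
the commit message file as the first argument to this hook:

    #!/bin/sh
    # .git/hooks/commit-msg: reject commits whose message lacks a
    # "Reviewed by:" line.  Intentionally simplistic; a real setup would
    # likely also verify the reviewer against a list of valid usernames.
    msgfile="$1"

    if ! grep -qi '^Reviewed by: ' "${msgfile}"; then
            echo "commit message must contain a 'Reviewed by: <username>' line" >&2
            exit 1
    fi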
Write the fine manual. Any tool, no matter how simple it may seem,
deserves a manual page. echo(1), true(1), and yes(1) have manual pages
– whatever your program does is likely to be more complex than any of
these utilities. Users should not be expected to read through the code itself
to understand how a tool works. Writing a manual page helps you clearly
define the user interface, how your command is invoked and what the user’s
expectations will be.
9.5 Summary
Much can be written about how to write clear, readable code. Much has been
written in this chapter alone, yet there are hundreds of books, online columns,
and blog entries on the topic. We tried to capture some of the most important
principles underlying the development of scalable, robust system tools. In the
process, we drew a distinction between different approaches to the software
development processes typically encountered by system administrators.
The tools we write are often different in scope from the large scale software
development projects typically covered by literature. It takes a particular
understanding of your environment to write a system tool, a utility that fits
natively into your operating system and integrates well into the environment.
In order to approach this perfect fit, we put a strong emphasis on the Unix
philosophy and strive to adhere to the three main aspects – simplicity, ability
to function as a filter, use of text streams for I/O – wherever appropriate.
We mentioned the “Zen of Python”, and noted how the advice given here
translates to other programming languages; we covered the Principle of Least
Astonishment, and noted the importance of dependable, robust behaviour,
which must include predictable failure with meaningful error codes and diag-
nostics. We warned against the pitfalls of developing and deploying so-called
“temporary solutions”, knowing all too well that there is no such thing.
We covered (at length) the importance of readability of your own code as
well as that of others. Sometimes we need to step in and fix a few broken
windows to ensure a modicum of code quality, and we must not be too
proud to let others review and help improve our own code, which in turn
ensures that our colleagues will be able to help debug our programs, thereby
The software you write reflects on you, your organization, your company,
your peers. It will be relied on and used by other people and systems; it
will also break and require debugging by yourself as well as others, by people
who understand your tool and by those who have never looked at the code
in question up until it broke. It is your responsibility to make it easier for
your users to work with your tool.
Problems and Exercises
Problems
1. Earlier in this book we mentioned the “yum” package management tool
as well as the “CFEngine”, “Chef”, and “Puppet” Configuration Man-
agement systems. “Cacti” and “Nagios” are two solutions related to
system monitoring, which we will discuss in a future chapter. Other
frequently used tools include curl(1) and rsync(1), for example.
What programming language(s) are these tools written in? Why do you
think the given language was chosen? What programming language(s)
can be used to interface with or integrate custom modules or extensions
into these tools?
3. Pick some of the tools you most frequently use yourself (see Problems 1
and 2).
(a) Do they follow the Unix philosophy? In what ways? In what ways
do they not follow it, and why?
(b) Do they abide by the Principle of Least Astonishment? Do they
fail explicitly and predictably?
(c) Download and carefully read through their code. Is the code easy
or hard to read? If it is complex, is it also complicated? Does the
4. Analyze the rsync(1) utility and its behaviour based on how the
source and destination arguments are specified. Play with the differ-
ent command-line options and methods of specifying a directory as an
argument (path vs. path/ vs. path/.). Does the behaviour always
follow your expectations?
Bibliography

[1] Ruslan Meshenberg, Open Source at Netflix, The Netflix Tech Blog,
July 2012; on the Internet at
http://techblog.netflix.com/2012/07/open-source-at-netflix-by-ruslan.html
(visited May 1, 2013)

[8] Jon Postel et al., RFC 761 - Transmission Control Protocol, January
1980; on the Internet at
https://tools.ietf.org/rfc/rfc761.txt (visited May 17, 2013)
Chapter 10

Networking
Chapter 11
Security
Part III

Chapter 13

Chapter 14

Part IV
Meta Matter

Chapter 16

Chapter 17
Glossary
AFP Apple Filing Protocol A network file system / protocol used predomi-
nantly by Apple’s Mac OS versions (both “Classic” and OS X). Some-
times also referred to as “Apple Share”. 79
BIND Berkeley Internet Name Domain The most widely used DNS soft-
ware, included in various Unix flavors since 4.3BSD. 36
BIOS Basic Input/Output System The basic firmware found on IBM com-
patible computers, loaded from read-only memory at system startup
and in charge of initializing some of the hardware components before
handing control over to the boot loader. 92–95, 129, 130, 143
EBS Elastic Block Store Amazon’s block-level cloud storage service. See
also: S3 85
EC2 Elastic Compute Cloud Part of Amazon’s Web Services, EC2 allows a
user to deploy virtual machines or “compute instances” on demand. vii,
12
GNU GNU’s Not Unix The GNU project was founded to provide free soft-
ware and aimed to provide a full operating system. After having
adopted the Linux kernel, GNU/Linux become commonly referred to
just as “Linux”, much to the chagrin of many GNU proponents. The
contributions of the GNU project, however, should not be underesti-
mated. See also: GPL v, 49, 132, 154
GPL GNU General Public License A widely used free software license orig-
inally written by Richard Stallman for the GNU Project. The license
aims to guarantee availability of the source code for the licensed soft-
ware as well as any derived works. 37, 49
LUN Logical Unit Number A numerical identifier for a distinct storage unit
or volume in a Storage Area Network. 82
MILNET Military Network The part of the ARPANET designated for un-
classified communications of the US Department of Defense. 35
NAS Network Attached Storage A storage model in which disk devices are
made available over the network by a file server to remote clients. The
file server is running an operating system and maintains the file system
on the storage media; clients access the data over the network using
specific network file system protocols. See also: DAS, SAN 78, 80
SCM Source Control Management The task of tracking changes during soft-
ware development. Often referred to as revision control, performed by
a Version Control System (VCS). Examples include CVS, Subversion,
Perforce and Git. To avoid confusion with Software Configuration Man-
agement, we will use the acronym VCS when referring to source control
management. 181, 182
SOAP Simple Object Access Protocol A protocol used by many web services
to exchange structured information over HTTP using XML. See also:
REST 84
UFS Unix File System A widely adopted file system across different Unix
versions, implementing boot blocks, superblocks, cylinder groups, in-
odes and data blocks. Also called the Berkeley Fast File System (FFS).
32, 71, 106, 111
Image Attributions
Index
UEFI, 129
UFS, 111
umask, 187
Unics, 29
unikernel, 38
Name Index
Ossana, Joe, 29