Sampler
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles (http://my.safaribooksonline.com). For more
information, contact our corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered
trademarks of O’Reilly Media, Inc. R in a Nutshell, the image of a harpy eagle, and related
trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and O’Reilly Media,
Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and
author assume no responsibility for errors or omissions, or for damages resulting from the use
of the information contained herein.
ISBN: 978-1-449-31208-4
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Part I. R Basics
3. A Short R Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Basic Operations in R 19
Functions 21
Variables 22
Introduction to Data Structures 24
Objects and Classes 27
Models and Formulas 28
Charts and Graphics 30
Getting Help 35
4. R Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
An Overview of Packages 37
Listing Packages in Local Libraries 38
Loading Packages 40
Loading Packages on Windows and Linux 40
Loading Packages on Mac OS X 40
Exploring Package Repositories 41
Exploring R Package Repositories on the Web 42
Finding and Installing Packages Inside R 42
Installing Packages From Other Repositories 45
Custom Packages 45
Creating a Package Directory 45
Building the Package 47
6. R Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Constants 63
Numeric Vectors 63
Character Vectors 64
Symbols 65
Operators 66
Order of Operations 67
Assignments 69
Expressions 69
Separating Expressions 69
Parentheses 70
Curly Braces 70
Control Structures 71
Conditional Statements 71
Loops 72
Accessing Data Structures 75
Data Structure Operators 75
Indexing by Integer Vector 76
Indexing by Logical Vector 78
Indexing by Name 79
R Code Style Standards 80
7. R Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Primitive Object Types 83
Vectors 86
Lists 87
Other Objects 88
Matrices 88
Arrays 89
Factors 89
Data Frames 91
Formulas 92
Time Series 94
Shingles 95
Dates and Times 95
Connections 96
Attributes 96
Class 99
9. Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
The Function Keyword 111
Arguments 111
Return Values 113
Functions as Arguments 113
Anonymous Functions 114
Properties of Functions 115
Argument Order and Named Arguments 117
Side Effects 118
Changes to Other Environments 118
Input/Output 119
Graphics 119
Database Connection Packages 156
RODBC 157
DBI 167
TSDBI 172
Getting Data from Hadoop 172
Kernel Smoothing 436
Machine Learning Algorithms for Regression 437
Regression Tree Models 439
MARS 450
Neural Networks 455
Project Pursuit Regression 459
Generalized Additive Models 462
Support Vector Machines 464
Cleaning Up Memory 516
Functions for Big Data Sets 517
Other Ways to Speed Up R 518
The R Byte Code Compiler 518
High-Performance R Binaries 520
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
1. Getting and Installing R
This chapter explains how to get R and how to install it on your computer.
R Versions
Today, R is maintained by a team of developers around the world. Usually, there is
an official release of R twice a year, in April and in October. I’ve checked the code
in this book against R 2.15.1, but if you have an earlier or later version of R installed,
don’t worry.
R hasn’t changed that much in the past few years: usually there are some bug fixes,
some optimizations, and a few new functions in each release. There have been some
changes to the language, but most of these are related to somewhat obscure features
that won’t affect most users. (For example, the type of NA values in incompletely
initialized arrays was changed in R 2.5.) Don’t worry about using the exact version
of R that I used in this book; any results you get should be very similar to the results
shown in this book. If there are any changes to R that affect the examples in this
book, I’ll try to add them to the official errata online.
Additionally, I’ve given some example filenames below for the current release. The
filenames usually have the release number in them. So don’t worry if you’re reading
this book and don’t see a link for R-2.15.1-win32.exe but see a link for
R-2.73.5-win32.exe instead; just use the latest version and you should be fine.
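If you ever need to tell installer files apart by release, the number can be read straight from the filename. The following is only a rough sketch: both helper functions are hypothetical (they are not part of R or CRAN), and they assume the filename follows the R-<release>-<platform> pattern of the example above, with a three-part release number.

```shell
# Hypothetical helper (not part of R or CRAN): print the release number
# embedded in an installer filename such as R-2.15.1-win32.exe.
# Assumes the pattern R-<release>-<platform>; prints nothing otherwise.
r_release_from_filename() {
  printf '%s\n' "$1" | sed -n 's/^R-\([0-9][0-9.]*\)-.*$/\1/p'
}

# Hypothetical helper: print the newer of two x.y.z release numbers by
# sorting numerically on each dot-separated field and taking the last.
r_newer_release() {
  printf '%s\n%s\n' "$1" "$2" | sort -t . -k 1,1n -k 2,2n -k 3,3n | tail -n 1
}

r_release_from_filename "R-2.15.1-win32.exe"
r_newer_release "2.15.1" "2.9.0"
```

The comparison is numeric per field, so 2.15.1 counts as newer than 2.9.0; comparing the release numbers as plain text would get that case wrong.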
a port management system like Yum to simplify the installation and updating
process; see “Linux and Unix Systems” on page 5.) Here’s how to find the binaries.
1. Visit the official R website. On the site, you should see a link to “Download.”
2. The download link actually takes you to a list of mirror sites. The list is
organized by country. You’ll probably want to pick a site that is geographically close,
because it’s likely to also be close on the Internet, and thus fast. I usually use
the link for the University of California, Los Angeles, because I live in California.
3. Find the right binary for your platform and run the installer.
There are a few things to keep in mind, depending on what system you’re using.
Windows
Installing R on Windows is just like installing any other piece of software on
Windows, which means that it’s easy if you have the right permissions, difficult if you
don’t. If you’re installing R on your personal computer, this shouldn’t be a problem.
However, if you’re working in a corporate environment, you might run into some
trouble.
If you’re an “Administrator” or “Power User” on Windows XP, installation is
straightforward: double-click the installer and follow the on-screen instructions.
There are some known issues with installing R on Microsoft Windows Vista. In
particular, some users have problems with file permissions. Here are two approaches
for avoiding these issues:
• Install R as a standard user in your own file space. This is the simplest approach.
• Install R as the default Administrator account (if it is enabled and you have
access to it). Note that you will also need to install packages as the Administrator
user.
For a full explanation, see
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Does-R-run-under-Windows-Vista_003f.
Currently, CRAN releases only 32-bit builds of R for Microsoft Windows. These are
tested on 64-bit versions of Windows and should run correctly.
Installing R
ning Mac OS X 10.5 (Leopard) and higher. If you’re using an older operating system,
or an older computer, you can find older versions on the website that may work
better with your system.
You’ll find three different R installers for Mac OS X: a three-way universal binary
for Mac OS X 10.5 (Leopard) and higher, a legacy universal binary for Mac OS X
10.4 and higher with supplemental tools, and a legacy universal binary for Mac
OS X 10.4 and higher without supplemental tools. See the CRAN download site for
more details on the differences among these versions.
As with most applications, you’ll need to have the appropriate permissions on your
computer to install R. If you’re using your personal computer, you’re probably OK:
you just need to remember your password. If you’re using a computer managed by
someone else, you may need that person’s help to install R.
The universal binary of R is made available as an installer package; simply download
the file and double-click the package to install the application. The legacy R installers
are packaged on a disk image file (like most Mac OS X applications). After you
download the disk image, double-click it to open it in the Finder (if it does not
open automatically). Open the volume and double-click the R.mpkg icon to launch
the installer. Follow the directions in the installer, and you should have a working
copy of R on your computer.
You’ll be prompted for your password, and if you have sudo privileges, R should be
installed on your system. Later, you can update R by typing:
$ sudo yum update R.x86_64
And, if there is a new version available, your R installation will be upgraded to the
latest version.
For more information on using RPM, or other package management systems, see
your user documentation.
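After installing or updating, you may want to confirm which version of R you ended up with. Here is a minimal sketch; show_r_version is a hypothetical helper name, and it assumes only that an installed R can be started as R from your shell's search path.

```shell
# Hypothetical helper: print the first line of R's version banner,
# or a short note if no R executable is found on the PATH.
show_r_version() {
  if command -v R >/dev/null 2>&1; then
    R --version | head -n 1
  else
    echo "R not found on PATH"
  fi
}

show_r_version
```

If the function reports that R was not found, the installation may have put R somewhere outside your search path; check your system's documentation for where packages are installed.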