Quantitative Data Analysis: February 2010
Timothy Slade
The University of Sydney
All content following this page was uploaded by Timothy Slade on 11 January 2019.
http://www.soc.surrey.ac.uk/sru/ 1
social research UPDATE
Obtaining R and other Programs

In order to obtain R, go to the Comprehensive R Archive Network (CRAN
http://www.cran.r-project.org) and download the binary which is specific to
your machine. If you have a Windows machine, download R-a.b.c-win32.exe from
the base folder from one of the web sites near you (a.b.c in the name of the
R executable corresponds to the version number). In order to install, double
click on the executable and follow the instructions. Pre-compiled binaries
are also available for GNU/Linux machines running Debian, SUSE, Ubuntu, and
Redhat although installation instructions may vary depending on the Linux
distribution. A universal binary can be obtained for users of Mac OS X.

R Commander

Although the R learning curve can be steep because R is driven from the
console, it is possible to work with R by using the R Commander GUI. An
installation guide for Windows, GNU/Linux and Mac OS X is available from
http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html. The R
Commander interface should be more intuitive for users without experience of
command line driven applications. You will see that the interface is divided
into two main windows: the script window for R commands and scripts, and the
output window for the results of your commands or scripts. R commands are
executed by pressing the Submit button and you can choose sections of your
code to submit by highlighting code with your mouse. If your R commands are
incorrect, there is a third Messages window outputting helpful information.
R Commander should be very intuitive for SPSS users but if you need more
information go to the R Commander menu bar and select Help/Introduction to
the R Commander.

SPSS R Integration Package

If you work with SPSS 16 or above, download the SPSS R Integration Package
from http://www.spss.com/devcentral, extract the plugin from the folder with
Winzip (http://www.winzip.com) or the equivalent for your platform and follow
the installation instructions. The package is exceptionally powerful and
includes the ability to generate pivot table output and to write results to
new SPSS data sets. Go to the SPSS menu bar and select Help/Programmability
for additional information.

XEmacs

An extensible editor like XEmacs may also be beneficial if you intend to
write lengthy R code. You can obtain XEmacs itself, the Emacs Speaks
Statistics Package (ESS) in order to extend XEmacs’s capabilities and a
configuration file. These applications, together with information on setting
up XEmacs for Windows, may be obtained by referring to
http://socserv.mcmaster.ca/jfox/Books/Companion/ESS/. XEmacs binaries for
Linux and for Mac OS X can be obtained from http://www.xemacs.org and
http://www.macports.org respectively and you should follow the installation
instructions for your operating system.

XEmacs may be thought of as a complete working environment inside which you
can write and edit documents and issue and execute commands without having to
open different software to perform separate tasks. XEmacs uses modes which
means that some of its features may be toggled on or off depending on the
work done. For our purposes, the most relevant mode when working with R is
the ESS[S] mode which will be loaded automatically in the upper of two
windows if opening an R file with file name extension r. The lower window is
in iESS mode and within the window you should see the R prompt. The ESS Mode
is useful because it toggles on special features like syntax colouring
thereby making your code easier to read and to write. You can now open an R
file in the upper window or type commands directly into it before passing the
commands from the upper to the lower window by selecting from a range of
buttons on the XEmacs menu bar.

Cameron et al (2004) give a comprehensive introduction to Emacs, much of
which is relevant to the XEmacs editor.

With the exception of SPSS, all the software referred to in the article is
free.

Using R: An Example

To use R, double click on the R icon on your desktop or type R at the command
line prompt. If you are more accustomed to manipulating and analyzing your
data by using a series of menus, the R interface seems sparse. This is
because experienced users drive R by entering instructions in the area after
the “greater than” sign or prompt which you will see on the R Console.

It is possible to enter data directly into R’s Data Editor, but it is more
likely that you will be working with data held in another format. This Update
deals with reading and analyzing data which is stored in SPSS ‘sav’ format,
but you can import data in text or comma separated value (CSV) formats and
you should type ?read.table or ?read.csv at the R prompt for further
information. R can read data in SPSS format, but it is necessary to extend
R’s base capability by downloading and installing the foreign package. In
order to download the package from CRAN, select Packages/Install Package(s)
from the R menu bar, your nearest CRAN mirror and the “foreign” package. Then
load the package by selecting Packages/Load Package from the R menu bar. If
your copy of R does not come bundled with a GUI, type
install.packages("foreign") and
library(foreign) at the R prompt in order to load the package.

If you check which packages are available on your system by typing search()
at the prompt, you should see package:foreign in the list.

We are now in a position to read an SPSS ‘sav’ file into R and conduct an
initial exploration of some data. In order to illustrate the process, we are
going to use data obtained from Praxis Care’s Staff Survey which is a
triennial omnibus survey of health professionals who primarily work with
people with mental health problems and/or learning difficulties. Commands
which you have to type are shown on indented lines and output, where
generated, is shown directly beneath them.

Our file may be read into R by using the read.spss function.

    staffdata<-read.spss('\\path_to_file\\staffdata.sav',
        to.data.frame=TRUE, reencode=NA, use.missings=TRUE)

The read.spss command loads a file with the name and pathname which we have
specified into a dataset called “staffdata”. (The way in which you describe
the path to the file will differ depending on your operating system). In
addition to the path to the file in the parentheses, there are three further
options. We first set the to.data.frame option to TRUE in order to create a
data set. Missing values in the SPSS file have been coded to 99 but R
represents missing values as NA. The use.missings option is a logical flag:
setting it to TRUE tells read.spss to use the missing value definitions
stored in the SPSS file, so that any instance of 99 declared as missing there
is remapped to NA in the new R dataset. Setting reencode to NA keeps the
default handling of character encodings.

We will now create a second dataset with the subset command by selecting
three variables from staffdata. The second dataset will be called “sample”.

    sample<-subset(staffdata, select=c(Q1g,Q10a,Q13))

We use the attach command to attach the sample dataset to R’s path so that we
can refer to variables within the dataset without a full pathname.
names(sample) outputs a list of variables within the sample dataset.

    attach(sample)
    names(sample)
    "Q1g" "Q10a" "Q13"

Q1g asks respondents to rate their response to the statement “I feel I am
happy helping people in my job” on the scale Strongly Agree, Agree, Disagree,
Strongly Disagree.

Q10a asks respondents to rate their commitment to their job on a ten point
scale where 10 represents “as committed as I could be” and 1 represents “not
at all committed.”

Q13 refers to the respondents’ main working environment, divided into two
categories: Central Office or administrative staff and Scheme or clinical
staff who work with clients.

If we want to obtain a frequency count of the number of workers by working
environment, we use the table command.

    table(Q13)
    Central office as base         Scheme as base
                        43                    398

We can produce a barchart with the barplot command and a number of options
which create a title, a labelled y axis and hatched bars enclosed within a
blue border. The barplot will be saved to the specified directory as a PNG
file.

    png(filename='\\path_to_file\\barQ13.png')
    barplot(table(Q13), main='Where do you work?', ylab='Frequency',
        density=c(10,20), border='blue', ylim=c(0,400))
    dev.off()

The table command may also be used to cross tabulate two variables by
enclosing both variable names within parentheses:

    table(Q1g, Q13)
                       Q13
    Q1g                 Central office as base Scheme as base
      strongly disagree                      0              2
      disagree                               1             11
      agree                                 30            188
      strongly agree                        11            196

To create a table which is easier to read, we can calculate column
percentages with the prop.table command where the second argument to
prop.table may be ‘1’ for ‘proportions of row totals’ or ‘2’ for ‘proportions
of column totals.’

    colpercents<-table(Q1g,Q13)
    prop.table(colpercents,2)*100
                       Q13
    Q1g                 Central office as base Scheme as base
      strongly disagree              0.0000000      0.5037783
      disagree                       2.3809524      2.7707809
      agree                         71.4285714     47.3551637
      strongly agree                26.1904762     49.3702771
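
The cross-tabulation steps above can be tried without the survey file. The following is a minimal sketch on simulated data: the data frame demo, its values, the sample size and the seed are all invented for illustration and are not the Praxis Care data. It also uses with() in place of attach(), a substitution made here so that variable lookups stay explicit; table() and prop.table() are used in the same way as in the text.

```r
# Simulated stand-in for the survey variables. All values are invented
# for illustration; they are NOT the Praxis Care Staff Survey data.
set.seed(1)
agreement <- c("strongly disagree", "disagree", "agree", "strongly agree")
places <- c("Central office as base", "Scheme as base")
n <- 200

demo <- data.frame(
  Q1g = factor(sample(agreement, n, replace = TRUE,
                      prob = c(0.05, 0.05, 0.50, 0.40)),
               levels = agreement),
  Q10a = sample(1:10, n, replace = TRUE),
  Q13 = factor(sample(places, n, replace = TRUE, prob = c(0.1, 0.9)),
               levels = places)
)

# Keep only the two variables needed for the cross-tabulation,
# in the same way that subset() is used in the text.
demo_small <- subset(demo, select = c(Q1g, Q13))

# Cross-tabulate without attach(): with() evaluates the call
# using the columns of the data frame.
counts <- with(demo_small, table(Q1g, Q13))

# margin = 2 requests proportions of column totals;
# multiplying by 100 turns them into percentages.
col_pct <- prop.table(counts, margin = 2) * 100

print(counts)
print(round(col_pct, 1))

# Sanity check: each column of percentages should add up to 100.
print(colSums(col_pct))
```

Running the script prints the counts, the rounded column percentages and the column sums. Changing margin to 1 would give row percentages instead, in which case rowSums() is the matching check.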
We may also generate frequency counts for the continuous variable Q10a and
list the minimum, first quartile, median, mean, third quartile and maximum
values by using the summary command before charting the distribution of Q10a
with the hist and boxplot commands.

    table(Q10a)
    Q10a
      1   2   3   4   5   6   7   8   9  10
      4   1   6   6  10  13  33  86 101 187
    summary(Q10a)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   8.000   9.000   8.667  10.000  10.000

    png(filename='\\path_to_file\\histQ10a.png')
    hist(Q10a, main='Histogram of Job Commitment Scores',
        xlab='Commitment', ylim=c(0,200), col=c('pink'))
    dev.off()

The hist command produces a histogram for Q10a with a title, pink columns and
a labelled x axis.

    png(filename='\\path_to_file\\boxQ10a.png')
    boxplot((Q10a), main='Boxplot of Job Commitment Scores',
        xlab='Commitment', col=c('lightblue'), horizontal=TRUE)
    dev.off()

In order to end the session, we now remove our created variables, detach
datasets so that the dataset variables are no longer accessible to R and exit
the application by using the q() command.

    detach(sample)
    rm(list=ls())
    q()

Conclusion

R may seem to be a difficult application to get to grips with but any effort
expended will be amply rewarded by R’s power and its ability to work
seamlessly with a number of equally powerful applications.

References

Cameron, D., Rosenblatt, B., Raymond, E. (2004) Learning GNU Emacs. O’Reilly
and Associates, Sebastopol, CA.
Hornik, K. (2008) The R FAQ. http://CRAN.R-project.org/doc/FAQ/R-FAQ.html.
Murrell, P. (2005) R Graphics. Chapman and Hall, Boca Raton, FL.
R Development Core Team (2007) R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria.
social research UPDATE (ISSN: 1360-7898) is published by the Department of Sociology, University of Surrey
Guildford GU2 7XH, United Kingdom. tel: +44 (0)1483689450
Edited by Nigel Gilbert (n.gilbert@surrey.ac.uk)
Spring 2009 © University of Surrey