
Introduction to Basic Commands in SPSS

Basics

There are 2 ways to open SPSS:

• From the Start Menu at the bottom left corner of the screen
• From the Shortcut to SPSS icon on the Desktop

There are 3 types of files in SPSS:

• Data files. These files are spreadsheets that contain the data on which
you will perform analyses. Data files have a ".sav" extension.
• Output files. These files have several functions: they show you a record
of the actions you have performed in SPSS, the results of your analyses,
and any errors the program may encounter. Output files have a ".spo"
extension.
• Syntax files. These files contain commands that may be run on the data
files. Since commands may be performed by pointing and clicking on the
data file, it is not necessary to use a syntax file. Syntax files have a ".sps"
extension.

Entering Data

There are 5 things that need to be identified for each variable:

• Name. What the variable is called. It can be any combination of up to 8
characters (letters or numbers). However, you may not use any punctuation
marks, or any word that is a command in SPSS (e.g. the word "by").
• Type. What kind of data the variable is (numeric, string, etc.).
• Labels. There are 2 kinds of labels: variable labels and value labels.
Variable labels are used to provide a more complete description of the
variable. Value labels define what the numbers in your data stand for
(e.g. our variable named "sex" has the labels 1 = female, 2 = male).
• Missing Values. What if someone leaves something blank? For our
dataset, all missing values are represented by the number 9.
• Measure. How the variable is measured (scale, ordinal, or nominal).
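These settings can also be made from a syntax file instead of the Define
Variable window. Here is a minimal sketch, assuming the "sex" variable from
the example above; the exact label wording is hypothetical.

* Sketch: defining variable metadata in syntax; the label text is assumed.
VARIABLE LABELS sex 'Sex of respondent'.
VALUE LABELS sex 1 'female' 2 'male'.
MISSING VALUES sex (9).
VARIABLE LEVEL sex (NOMINAL).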



Getting Around in SPSS

• To change any part of a variable, double-click on the horizontal gray bar
at the top of the spreadsheet. This brings you to the Define Variable
window.
• Use the arrow keys on the keyboard to move forward and backward.
• To go all the way to the end of the data, press <control> and the right
arrow key.
• To go all the way to the beginning of the data, press <control> and the
left arrow key.

Q-Q plot

What it does: The normal Q-Q plot graphically compares the distribution of a
given variable to the normal distribution (represented by a straight line).

Where to find it: Under the Graphs menu, choose Q-Q. Move the variable(s)
you wish to plot into the Variables list. Following is an example of a normal
Q-Q plot for the variable that represents our ethnocentrism scale. First,
you'll see the Normal Q-Q plot.
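The same plot can be produced from a syntax file. A minimal sketch, assuming
the ethnocentrism scale is stored in a hypothetical variable named ethno:

* Sketch of a normal Q-Q plot; "ethno" is a hypothetical variable name.
PPLOT
  /VARIABLES=ethno
  /TYPE=Q-Q
  /DIST=NORMAL.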



The straight line represents what our data would look like if it were perfectly
normally distributed. Our actual data is represented by the squares plotted along
this line. The closer the squares are to the line, the more normally distributed
our data looks. Here, most of our points fall almost perfectly along the line.
This is a good indicator that our data is normally distributed.

Boxplot

What it does: Boxplots graphically display measures of dispersion for a given
variable: the range, median, and quartiles.

Where to find it: Under the Graphs menu, choose Boxplot. Click on Simple,
and click on Summaries for separate variables. Click Define. Move the variable
you would like to plot into the "boxes represent" area, then click OK.
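A syntax sketch, assuming Age is stored in a hypothetical variable named age:

* Sketch of a simple boxplot for one variable; "age" is a hypothetical name.
EXAMINE VARIABLES=age
  /PLOT=BOXPLOT
  /STATISTICS=NONE.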

SPSS Output

Following is an example of a boxplot for the variable Age.

The box represents the interquartile range. The line across the box indicates the
median.



The "whiskers" are lines that extend from the box to the highest and lowest
values, excluding outliers. The circles above our upper whisker represent the
outliers. The location of the box between the whiskers tells us how the data are
distributed. If the box is in the middle of the whiskers, the data are probably
more evenly distributed. If the box is closer to the lower whisker, the data are
probably skewed towards the lower end of the scale. If the box is closer to the
upper whisker, the data are probably skewed towards the higher end of the
scale. In our example, the range of ages excluding outliers is from about 18
years to 38 years. The median age is about 20. It looks like we have mostly
younger people in our sample. We can also see that we have several outliers.

Scatterplot

What it is: Scatterplots graphically compare the association between two
variables. Each point in the scatterplot represents one case in the data set.

Where to find it: Under the Graphs menu, choose Scatter, then Simple, and
click OK. Plot one scale or ratio variable on the X axis, and the other on the Y
axis. It does not matter which variable you put on either axis. Following is a
scatterplot of the variables Self-Esteem and Anxiety.
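A syntax sketch, assuming these scales are stored in hypothetical variables
named selfest and anxiety:

* Sketch of a simple scatterplot; "selfest" and "anxiety" are hypothetical names.
GRAPH
  /SCATTERPLOT(BIVAR)=selfest WITH anxiety.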

In the graph, we see a trend in the data: as scores on Self-Esteem increase,
scores on Anxiety decrease. This is a negative association; as one variable
increases, the other decreases. If both variables move in the same direction (that
is, if one variable increases as the other increases, or if one variable decreases
as the other decreases), the relationship is positive.

The more tightly clustered the data points around a negatively or positively
sloped line, the stronger the association.

If the data points appear to be a cloud, there is no association between the two
variables.

Error Bar

What it does: The Error Bar plot graphically displays the 95% confidence
interval of the mean for groups of cases.

Where to find it: Under Graphs, choose Error Bar. Click on Simple, and click
on Summaries for Groups of Cases. Click Define. Move your dependent
variable into the box marked "Variable." Move the independent variable (or
grouping variable) into the box marked "Category Axis." Click OK.
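A syntax sketch, assuming hypothetical variable names prejud (level of
prejudice) and year (year in college):

* Sketch of an error bar chart showing the 95% CI of the mean by group;
* "prejud" and "year" are hypothetical variable names.
GRAPH
  /ERRORBAR(CI 95)=prejud BY year.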

Following is an example of an Error Bar plot for the confidence interval of the
mean level of prejudice for each year in college.



The boxes in the middle of the error bars represent the mean scores. The
"whiskers" represent the 95% confidence interval. Since there is no overlap
between the confidence intervals of first-years and seniors, these two means are
probably significantly different. Try running an ANOVA to find out!

Chi Square Goodness of Fit

What it is: The Chi Square Goodness of Fit test determines if the observed
frequencies are different from what we would expect to find (we expect to see
equal numbers in each group within a variable).

Where to find it: Under the Analyze menu, choose Nonparametric Tests,
then Chi Square. Move the variable you wish to look at into the "Test
Variables" box, then click OK.

Assumptions
-None of the expected values may be less than 1
-No more than 20% of the expected values may be less than 5

Hypotheses
Null: There are approximately equal numbers of cases in each group.
Alternate: There are not equal numbers of cases in each group.

Following is sample output of a Chi Square Goodness of Fit test. We wanted to
see if different socioeconomic groups are equally represented among Wellesley
students. First, we see a frequency table for each group.



The "Observed N" tells us how many people are in each group. For example,
we see that there are 32 people who identified themselves as "middle-class."

The "Expected N" tells us how many people we expected to find in each group,
if there is no difference between groups. Here we see that we expected to find
18.7 people in each group.

The "Residual" tells us how different each group is compared to what we


expected to find. We see that there are 13.3 more people than we expected to
find in the "middle-class" group.

Next, we see the results of the Chi Square test.

We have a Chi Square value of 92.107 (which is very large!).

Our significance level is reported as .000 (that is, p < .001).

There is a significant difference (our significance level is less than .05).


Therefore, we can say that there are not equal numbers of Wellesley students
from each socioeconomic group. It appears that we have more upper-middle
class students than expected, and fewer poor students than expected.

Chi Square Test of Independence

What it does: The Chi Square Test of Independence tests the association
between 2 categorical variables.



Where to find it: Under the Analyze menu, choose Descriptive Statistics, then
choose Crosstabs. Move one variable into the box marked "rows" and the other
into the box marked "columns." It does not matter which variable you put in
either rows or columns. Under the "Statistics" button, be sure to check off Chi
Square. Under the "Cells" button, be sure to check off Expected (in order to
obtain the expected values for each cell).
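A syntax sketch, assuming hypothetical variable names children (has children
or not) and status (full- or part-time enrollment):

* Sketch of a chi-square test of independence with expected cell counts;
* "children" and "status" are hypothetical variable names.
CROSSTABS
  /TABLES=children BY status
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.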

Assumptions:
-None of the expected values may be less than 1
-No more than 20% of the expected values may be less than 5

Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.

SPSS Output

Following is sample output of a Chi Square Test of Independence. We wanted
to see whether having children is related to whether students attend school full
or part time. That is, are students with children more likely to attend class
part-time?

First we see the "Case Processing Summary."

There are a total of 230 people who participated in our study, but there is one
missing case. Thus, our valid number of cases (Valid N) is 229.

Next we see the contingency table.



We can see in each cell how many people are in each group. For example, we
see that there are 31 students who are full-time with children. There are 14
students who are part-time with no children.

We can look to the marginals, or the ends of each row or column, to find the
total number for that category. For example, there are 46 students who have
children. There are 200 students who are full-time.

The contingency table also gives the expected values for each cell. For
example, we expected to find 40.2 students who are full-time with children.

Finally, we see the results of our Chi Square Test of Independence.

We see that our Pearson Chi Square value is 20.704. We have 1 degree of
freedom. Our significance is reported as .000 (that is, p < .001).



There is a significant association (our significance level is less than .05).
Therefore, we can say that the two variables are associated. Whether you are a
full- or part-time student is related to whether or not you have children.

One-Way ANOVA

What it does: The One-Way ANOVA compares the means of two or more
groups based on one independent variable (or factor).

Where to find it: Under the Analyze menu, choose Compare Means, then
choose One-Way ANOVA. Move all dependent variables into the box labeled
"Dependent List," and move the independent variable into the box labeled
"Factor." Click on the button labeled "Options," and check off the boxes for
Descriptives and Homogeneity of Variance. Click on the box marked "Post
Hoc" and choose the appropriate post hoc comparison. Generally, for Psych
205 students, you can follow this rule: If there are equal numbers of cases in
each group, choose Tukey. If there are not equal numbers of cases in each
group, choose Bonferroni.
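A syntax sketch, assuming hypothetical variable names prejud and year:

* Sketch of a one-way ANOVA with descriptives, Levene's test, and a Tukey
* post hoc; "prejud" and "year" are hypothetical variable names.
ONEWAY prejud BY year
  /STATISTICS=DESCRIPTIVES HOMOGENEITY
  /POSTHOC=TUKEY ALPHA(0.05).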

Assumptions:
-The dependent variable(s) is normally distributed. You can check for normal
distribution with a Q-Q plot.
-The groups have approximately equal variance on the dependent variable.
You can check this by looking at the Levene's Test. See below.

Hypotheses:
Null: There are no significant differences between the groups' mean scores.
Alternate: There is a significant difference between the groups' mean scores.

SPSS Output

Following is a sample output of a One-Way ANOVA. We compared the mean
level of prejudice of first-years, sophomores, juniors, and seniors. Mean level
of prejudice is our dependent variable, and year in college is our independent
variable.

First, we see the descriptive statistics for each of the 4 years in college.



It looks like first-years have the highest mean level of prejudice, and seniors
have the lowest mean level of prejudice.

Next we see the results of the Levene's Test of Homogeneity of Variance.

This tells us if we have met our second assumption (the groups have
approximately equal variance on the dependent variable). If the Levene's Test is
significant (the value under "Sig." is less than .05), the group variances are
significantly different. If it is not significant (Sig. is greater than .05), the group
variances are not significantly different; that is, they are approximately equal,
and we have met our second assumption. Here, we see that the significance is
.435, which is greater than .05. We can assume that the variances are
approximately equal, so we have met our second assumption.

Finally, we see the results of our One-Way ANOVA:



Our F value is 3.110.

Our significance value is .027.

There is a significant difference among the groups (the significance is less
than .05).

Therefore, we can say that there is a significant difference between first-years,
sophomores, juniors, and seniors on their level of prejudice.

We can look at the results of the Post-Hoc Comparisons to see exactly which
pairs of groups are significantly different.



SPSS notes a significant difference with an asterisk (*). We can see that first-
years and sophomores are significantly different from seniors.

One-Sample T Test

What it does: The One-Sample T Test compares the mean score of a sample
to a known value. Usually, the known value is a population mean.

Where to find it: Under the Analyze menu, choose Compare Means, then One-
Sample T Test. Move the dependent variable into the "Test Variables" box.
Type in the value you wish to compare your sample to in the box called "Test
Value."
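A syntax sketch, assuming self-esteem is stored in a hypothetical variable
named selfest:

* Sketch of a one-sample t test against a test value of 3.9;
* "selfest" is a hypothetical variable name.
T-TEST
  /TESTVAL=3.9
  /VARIABLES=selfest.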

Assumption:
-The dependent variable is normally distributed. You can check for normal
distribution with a Q-Q plot.

Hypotheses:
Null: There is no significant difference between the sample mean and the
population mean.
Alternate: There is a significant difference between the sample mean and the
population mean.

SPSS Output

Following is a sample output of a one-sample T test. We compared the mean
level of self-esteem for our sample of Wellesley college students to a known
population value of 3.9.

First, we see the descriptive statistics.

The mean of our sample is 4.04, which is slightly higher than our population
mean of 3.9.



Next, we see the results of our one-sample T test:

Our T value is 2.288.

We have 112 degrees of freedom.

Our significance value is .024.

There is a significant difference between the sample mean and the population
mean (the significance is less than .05).

Therefore, we can say that our sample mean of 4.04 is significantly greater than
the population mean of 3.9.

Paired Samples T Test

What it does: The Paired Samples T Test compares the means of two
variables. It computes the difference between the two variables for each case,
and tests to see if the average difference is significantly different from zero.

Where to find it: Under the Analyze menu, choose Compare Means, then
choose Paired Samples T Test. Click on both variables you wish to compare,
then move the pair of selected variables into the Paired Variables box.
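A syntax sketch, assuming hypothetical variable names pretest and posttest:

* Sketch of a paired samples t test; "pretest" and "posttest" are
* hypothetical variable names.
T-TEST PAIRS=pretest WITH posttest (PAIRED).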

Assumption:
-Both variables should be normally distributed. You can check for normal
distribution with a Q-Q plot.

Hypotheses:
Null: There is no significant difference between the means of the two variables.
Alternate: There is a significant difference between the means of the two
variables.

SPSS Output

Following is sample output of a paired samples T test. We compared the mean
test scores before (pre-test) and after (post-test) the subjects completed a test
preparation course. We want to see if our test preparation course improved
people's scores on the test.

First, we see the descriptive statistics for both variables.

The post-test mean scores are higher.

Next, we see the correlation between the two variables.

There is a strong positive correlation. People who did well on the pre-test also
did well on the post-test.

Finally, we see the results of the Paired Samples T Test. Remember, this test is
based on the difference between the two variables. Under "Paired Differences"
we see the descriptive statistics for the difference between the two variables.



To the right of the Paired Differences, we see the T, degrees of freedom, and
significance.

The T value = -2.171

We have 11 degrees of freedom

Our significance is .053

If the significance value is less than .05, there is a significant difference. If the
significance value is greater than .05, there is no significant difference.

Here, we see that the significance value (.053) approaches, but does not reach,
the .05 threshold. There is no significant difference between pre- and post-test
scores. Our test preparation course did not help!



Independent Samples T Test

What it does: The Independent Samples T Test compares the mean scores of
two groups on a given variable.

Where to find it: Under the Analyze menu, choose Compare Means, then
Independent Samples T Test. Move your dependent variable into the box
marked "Test Variable." Move your independent variable into the box marked
"Grouping Variable." Click on the box marked "Define Groups" and specify
the value labels of the two groups you wish to compare.
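A syntax sketch, assuming hypothetical variable names bp (blood pressure)
and group (coded 1 = New Drug, 2 = Placebo):

* Sketch of an independent samples t test; "bp" and "group" are hypothetical
* variable names, and the group codes 1 and 2 are assumed.
T-TEST GROUPS=group(1 2)
  /VARIABLES=bp.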

Assumptions:
-The dependent variable is normally distributed. You can check for normal
distribution with a Q-Q plot.
-The two groups have approximately equal variance on the dependent variable.
You can check this by looking at the Levene's Test. See below.
-The two groups are independent of one another.

Hypotheses:
Null: The means of the two groups are not significantly different.
Alternate: The means of the two groups are significantly different.

SPSS Output

Following is a sample output of an independent samples T test. We compared
the mean blood pressure of patients who received a new drug treatment vs.
those who received a placebo (a sugar pill).

First, we see the descriptive statistics for the two groups. We see that the mean
for the "New Drug" group is higher than that of the "Placebo" group. That is,
people who received the new drug have, on average, higher blood pressure than
those who took the placebo.



Next, we see the Levene's Test for Equality of Variances. This tells us if we
have met our second assumption (the two groups have approximately equal
variance on the dependent variable). If the Levene's Test is significant (the
value under "Sig." is less than .05), the two variances are significantly different.
If it is not significant (Sig. is greater than .05), the two variances are not
significantly different; that is, the two variances are approximately equal. If the
Levene's test is not significant, we have met our second assumption. Here, we
see that the significance is .448, which is greater than .05. We can assume that
the variances are approximately equal.

Finally, we see the results of the Independent Samples T Test. Read the TOP
line if the variances are approximately equal. Read the BOTTOM line if the
variances are not equal. Based on the results of our Levene's test, we know that
we have approximately equal variance, so we will read the top line.

Our T value is 3.796.



We have 10 degrees of freedom.

There is a significant difference between the two groups (the significance is
less than .05).

Therefore, we can say that there is a significant difference between the New
Drug and Placebo groups. People who took the new drug had significantly
higher blood pressure than those who took the placebo.



Pearson R Correlation

What it does: The Pearson R correlation tells you the magnitude and direction
of the association between two variables that are on an interval or ratio scale.

Where to find it: Under the Analyze menu, choose Correlations. Move the
variables you wish to correlate into the "Variables" box. Under the "Correlation
Coefficients," be sure that the "Pearson" box is checked off.

Assumption:
-Both variables are normally distributed. You can check for normal distribution
with a Q-Q plot.

Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.

SPSS Output

Following is a sample output of a Pearson R correlation between the Rosenberg
Self-Esteem Scale and the Assessing Anxiety Scale.

SPSS creates a correlation matrix of the two variables. All the information we
need is in the cell that represents the intersection of the two variables.



SPSS gives us three pieces of information:
-the correlation coefficient
-the significance
-the number of cases (N)

The correlation coefficient is a number between +1 and -1. This number tells us
about the magnitude and direction of the association between two variables.

The MAGNITUDE is the strength of the correlation. The closer the correlation
is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very
close to zero, there is no association between the two variables. Here, we have
a moderate correlation (r = -.378).

The DIRECTION of the correlation tells us how the two variables are related.
If the correlation is positive, the two variables have a positive relationship (as
one increases, the other also increases). If the correlation is negative, the two
variables have a negative relationship (as one increases, the other decreases).
Here, we have a negative correlation (r = -.378). As self-esteem increases,
anxiety decreases.

Spearman Rho Correlation

What it does: The Spearman Rho correlation tells you the magnitude and
direction of the association between two ranked (ordinal) variables. It is the
nonparametric counterpart to the Pearson R correlation.

Where to find it: Under the Analyze menu, choose Correlations. Move the
variables you wish to correlate into the "Variables" box. Under the "Correlation
Coefficients," be sure that the "Spearman" box is checked off.

Assumption:
-Both variables are NOT normally distributed. You can check for normal
distribution with a Q-Q plot. If the variables are normally distributed, use a
Pearson R correlation.

Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.

SPSS Output



Following is a sample output of a Spearman Rho correlation between the
Rosenberg Self-Esteem Scale and the Assessing Anxiety Scale.

SPSS creates a correlation matrix of the two variables. All the information we
need is in the cell that represents the intersection of the two variables.

SPSS gives us three pieces of information:
-the correlation coefficient
-the significance
-the number of cases (N)

The correlation coefficient is a number between +1 and -1. This number tells us
about the magnitude and direction of the association between two variables.

The MAGNITUDE is the strength of the correlation. The closer the correlation
is to either +1 or -1, the stronger the correlation. If the correlation is 0 or very
close to 0, there is no association between the two variables. Here, we have a
moderate correlation (r = -.392).

The DIRECTION of the correlation tells us how the two variables are related.
If the correlation is positive, the two variables have a positive relationship (as
one increases, the other also increases). If the correlation is negative, the two
variables have a negative relationship (as one increases, the other decreases).
Here, we have a negative correlation (r = -.392). As self-esteem increases,
anxiety decreases.



Simple Linear Regression

What it does: Simple Linear Regression tells you the amount of variance
accounted for by one variable in predicting another variable.

Where to find it: Under the Analyze menu, choose Regression, then choose
Linear. Enter the dependent and independent variables in the appropriate
places, then click OK.
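A syntax sketch, assuming hypothetical variable names comfort (dependent)
and prejud (independent):

* Sketch of a simple linear regression; "comfort" and "prejud" are
* hypothetical variable names.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT comfort
  /METHOD=ENTER prejud.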

Assumptions:
-The data are linear (if you look at a Scatterplot of the data, and the data seem
to be moving in a straight line, that's a good indication that the data are linear).
-The dependent variable is normally distributed. You can check for normal
distribution with a Q-Q plot.

Hypotheses:
The hypotheses for regression focus on the slope of the regression line.
Null: The slope equals zero (there is no linear relationship).
Alternate: The slope is not equal to zero.

SPSS Output

There are 3 key pieces of information we need to look for in our regression
output:
-The R Square value
-The significance of the regression
-The values of the constant and slope

Following is sample output where Assessing Prejudice is the independent
variable and Comfort with Inter-Ethnic Situations is the dependent variable.

The R Square tells us how much of the variance of the dependent variable can
be explained by the independent variable. In this case, 15% of the variance in
Comfort can be explained by differences in levels of prejudice.



The significance in the ANOVA table tells us if this is a significant linear
regression. In this example, we do have a significant linear equation (p<.01).

Finally, the Coefficients table gives us all the information we need to plug into
our Y' = b0 + b1X equation (where b0 = the constant and b1 = the slope). In
this example, b0 = 5.474 and b1 = -.53.

Our equation is: Y' = 5.474 - .53X
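As a hypothetical worked example, a respondent with a prejudice score of 4
would have a predicted comfort score of Y' = 5.474 - .53(4) = 3.35.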
