
Data Interpretation With MS Excel

Excel is convenient for basic data analysis tasks such as descriptive statistics and data manipulation, but it has limitations for statistical analysis. It is better suited to small datasets, while statistical software packages can handle larger datasets and a wider range of analyses more efficiently. Excel is a versatile tool for other business and personal finance tasks, but a statistical package is generally recommended for serious statistical analysis.

Uploaded by

akshay akshay
Copyright
© All Rights Reserved

DATA INTERPRETATION WITH MS EXCEL

• We used Excel to perform some basic data analysis tasks to see whether it is a reasonable alternative to a statistical package for the same tasks. We concluded that Excel is a poor choice for statistical analysis beyond textbook examples, the simplest descriptive statistics, or more than a very few columns.
• Excel is convenient for data entry and for quickly manipulating rows and columns prior to statistical analysis. However, when you are ready to do the statistical analysis, we recommend using a statistical package such as SAS, SPSS, Stata, Systat or Minitab.
• Excel is probably the most commonly used spreadsheet for PCs, and newly purchased computers often arrive with it already installed. It is easily used for a variety of calculations and includes a collection of statistical functions and a Data Analysis ToolPak. As a result, if you suddenly find you need to do some statistical analysis, you may turn to it as the obvious choice. We decided to do some testing to see how well Excel would serve as a data analysis application.
• Microsoft Excel allows you to manipulate, manage and analyze data, which helps with decision-making and creates efficiencies that can directly affect your bottom line. Whether you use it for business or to manage personal databases and expenses, Microsoft Excel gives you the tools to meet a wide range of needs.

The advantages of Excel are:

• Easy and effective comparisons: With the analytical tools included in Microsoft Excel, you can analyze large amounts of data to discover trends and patterns that will influence decisions. Excel's graphing capabilities allow you to summarize your data, making it easier to organize and structure.
• Powerful analysis of large amounts of data: Recent upgrades to Excel enhance your ability to analyze large amounts of data. With powerful filtering, sorting and search tools, you can quickly and easily narrow down the criteria that will assist in your decisions. Combining these tools with Tables, Pivot Tables and graphs, you can find the information you want quickly and easily, even if you have hundreds of thousands of data items. While you will need recent hardware to get the best out of Microsoft Excel, it is scalable and can be used on a low-powered PC at home or on a high-powered laptop at work.
• Working together: With the Excel Web App, you can work on spreadsheets simultaneously with other users. Working together helps streamline processes and allows for brainstorming sessions with large sets of data, and the collaboration tools let you get the most out of Excel's sharing capabilities. An added bonus is that, because the workbook is web based, you can collaborate from anywhere: you are no longer tied to your desk and can work on spreadsheets on the go, which is ideal for anyone who travels for business.
• Microsoft Excel Mobile and iPad apps: With tablets and smartphones, you can take your worksheets to a client or a meeting without bringing along a laptop. These devices let you manipulate data, update your spreadsheets and view the results immediately on your phone or tablet.

Uses of Microsoft Excel:

• Sort: You can sort your Excel data on one column or multiple columns, in ascending or descending order.
• Filter: Filter your Excel data if you only want to display records that meet certain criteria.
• Conditional Formatting: Conditional formatting enables you to highlight cells with a certain color, depending on each cell's value.
• Charts: A simple Excel chart can say more than a sheet full of numbers, and charts are easy to create.
• Pivot Tables: Pivot Tables are one of Excel's most powerful features. A Pivot Table allows you to extract the significance from a large, detailed data set.
• Tables: Tables allow you to analyze your data in Excel quickly and easily.
• What-If Analysis: What-If Analysis in Excel allows you to try out different values (scenarios) for formulas.
• Solver: Excel includes a tool called Solver that uses techniques from operations research to find optimal solutions for all kinds of decision problems.
• Analysis ToolPak: The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial, statistical and engineering data analysis.

Further, Microsoft Excel is a very versatile tool that can be used to create many other kinds of documents, such as:
• Agendas
• Budgets
• Calendars
• Cards
• Charts and Diagrams
• Financial Tools
• Flyers
• Forms
• Inventories
• Invoices
• Lists and to-do checklists
• Planners
• Plans and proposals
• Reports
• Schedules
• Timesheets

Quantitative Data Analysis with Excel

Descriptive Statistics

The quickest way to get means and standard deviations for an entire group is to use Descriptive Statistics in the Data Analysis tools. You can choose several adjacent columns for the Input Range (in this case the X and Y columns), and each column is analyzed separately. The labels in the first row are used to label the output, and empty cells are ignored. If you have more non-adjacent columns to analyze, you will have to repeat the process for each group of contiguous columns. The procedure is straightforward, can manage many columns reasonably efficiently, and treats empty cells properly.

Getting the means and standard deviations of X and Y for each treatment group requires Pivot Tables (unless you want to rearrange the data sheet to separate the two groups). After selecting the (contiguous) data range, in the Pivot Table Wizard's Layout option, drag Treatment to the Row area and X to the Data area. Double-click "Count of X" in the Data area and change it to Average. Drag X into the Data area again, and this time change Count to StdDev. Finally, drag X in one more time, leaving it as Count of X. This gives the average, standard deviation and number of observations for X in each treatment group. Do the same for Y to get its average, standard deviation and number of observations. This puts a total of six items in the Data area (three for X and three for Y). As you can see, getting a variety of descriptive statistics for several variables this way quickly becomes tedious.

A statistical package lets you choose as many variables as you wish for descriptive statistics, whether or not they are
contiguous. You can get the descriptive statistics for all the subjects together, or broken down by a categorical
variable such as treatment. You can select the statistics you want to see once, and it will apply to all variables
chosen.
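For comparison, here is a minimal Python sketch of the kind of descriptive summary a statistical package produces: any set of columns, contiguous or not, for all subjects together or broken down by a grouping variable, with empty cells skipped. The row layout (a list of dictionaries, with None standing in for an empty cell) and the names `describe` and `descriptives` are assumptions made for illustration, not part of Excel or of any particular package.

```python
import math
from statistics import mean, stdev

def describe(values):
    """Return (n, mean, sample SD) for the non-missing values; None marks an empty cell."""
    present = [v for v in values if v is not None]
    n = len(present)
    m = mean(present) if n > 0 else math.nan
    sd = stdev(present) if n > 1 else math.nan  # sample SD uses the n - 1 divisor
    return n, m, sd

def descriptives(rows, columns, by=None):
    """Describe any list of columns (adjacent or not), optionally per level of `by`."""
    groups = {}
    for row in rows:
        key = row[by] if by is not None else "All"
        groups.setdefault(key, []).append(row)
    return {
        (group, col): describe([row[col] for row in members])
        for group, members in groups.items()
        for col in columns
    }

# Hypothetical data: one dict per subject.
rows = [
    {"treatment": "A", "x": 4.0, "y": 2.0},
    {"treatment": "B", "x": 6.0, "y": None},
    {"treatment": "A", "x": 8.0, "y": 6.0},
    {"treatment": "B", "x": None, "y": 3.0},
    {"treatment": "A", "x": 6.0, "y": 4.0},
]
for (group, col), (n, m, sd) in descriptives(rows, ["x", "y"], by="treatment").items():
    print(group, col, n, m, sd)
```

The statistics are chosen once (inside `describe`) and applied to every requested column, which is the convenience the paragraph above attributes to statistical packages.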

Correlations

Using the Data Analysis tools, the dialog for correlations is much like the one for descriptive statistics: you can choose several contiguous columns and get an output matrix of all pairs of correlations. Empty cells are ignored appropriately. The output does NOT include the number of pairs of data points used to compute each correlation (which can vary, depending on where you have missing data), and it does not indicate whether any of the correlations are statistically significant. If you want correlations on non-contiguous columns, you have to either include the intervening columns or copy the desired columns to a contiguous location.

A statistical package would permit you to choose non-contiguous columns for your correlations. The output would
tell you how many pairs of data points were used to compute each correlation, and which correlations are
statistically significant.
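As an illustration of the output described above, the following Python sketch computes Pearson correlations for any list of columns using pairwise deletion, and reports for each pair the number of complete pairs used and the usual t statistic for testing zero correlation, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The column-dictionary layout and function names are hypothetical; turning t into a p-value would need a t-distribution function from a statistics library, which this stdlib-only sketch leaves out.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with no missing values.
    A column with no variation would raise ZeroDivisionError (not handled here)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(data, columns):
    """For every pair of columns: r, the number of complete pairs used,
    and the t statistic for testing rho = 0 (df = n - 2)."""
    table = {}
    for a, b in combinations(columns, 2):
        pairs = [(x, y) for x, y in zip(data[a], data[b]) if x is not None and y is not None]
        n = len(pairs)
        if n < 3:
            table[(a, b)] = {"r": math.nan, "n": n, "t": math.nan, "df": max(n - 2, 0)}
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
        table[(a, b)] = {"r": r, "n": n, "t": t, "df": n - 2}
    return table

# Hypothetical columns; None marks an empty cell, so n differs between pairs.
data = {"x": [1, 2, 3, 4, None], "y": [2, 4, 5, 4, 5], "z": [None, 1, 2, 3, 4]}
for pair, result in correlation_table(data, ["x", "y", "z"]).items():
    print(pair, result)
```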

Two-Sample T-test

This test can be used to check whether the two treatment groups differ on the values of either X or Y. To run the test, you need to enter a cell range for each group. Since the data were not entered by treatment group, we first need to sort the rows by treatment. Be sure to take all the other columns along with treatment, so that the data for each subject remain intact. After the data are sorted, you can enter the range of cells containing the X measurements for each treatment. Do not include the row with the labels, because the second group does not have a label row; as a result, your output will not be labeled to indicate that it is for X. If you want the output labeled, you have to copy the cells for the second group to a separate column and add a label row for that group. If you also want to run the t-test on the Y measurements, you'll need to repeat the process. Empty cells are ignored, and apart from the problems with labeling the output, the results are correct.

A statistical package would do this task without any need to sort the data or copy it to another column, and the
output would always be properly labeled to the extent that you provide labels for your variables and treatment
groups. It would also allow you to choose more than one variable at a time for the t-test (e.g. X and Y).
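To make that contrast concrete, here is a minimal Python sketch that splits unsorted rows by treatment in place and runs the test for several variables at once, labeling each result with its variable and groups. It uses the equal-variance (pooled) form of the two-sample t statistic; Excel's Data Analysis tools also offer an unequal-variances version, which this sketch does not implement. The data layout and function names are hypothetical, and the p-value (which needs a t-distribution) is left to a statistics library.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df).
    Each group needs at least 2 values for variance() to work."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df  # pooled variance
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

def t_tests_by_group(rows, group_col, variables):
    """Split rows by group without sorting or copying, skip empty cells per
    variable, and return one labeled result per variable."""
    levels = sorted({row[group_col] for row in rows})
    if len(levels) != 2:
        raise ValueError(f"expected exactly 2 groups, found {levels}")
    g1, g2 = levels
    results = {}
    for var in variables:
        a = [r[var] for r in rows if r[group_col] == g1 and r[var] is not None]
        b = [r[var] for r in rows if r[group_col] == g2 and r[var] is not None]
        t, df = pooled_t(a, b)
        results[var] = {"groups": (g1, g2), "n": (len(a), len(b)), "t": t, "df": df}
    return results

# Hypothetical, unsorted data with one empty Y cell.
rows = [
    {"treatment": "B", "x": 4.0, "y": 1.0},
    {"treatment": "A", "x": 1.0, "y": 2.0},
    {"treatment": "B", "x": 5.0, "y": None},
    {"treatment": "A", "x": 2.0, "y": 3.0},
    {"treatment": "B", "x": 6.0, "y": 2.0},
    {"treatment": "A", "x": 3.0, "y": 4.0},
]
for var, res in t_tests_by_group(rows, "treatment", ["x", "y"]).items():
    print(var, res)
```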

Paired t-test

The paired t-test is a method for testing whether the difference between two measurements on the same subject is
significantly different from 0. In this example, we wish to test the difference between X and Y measured on the
same subject. The important feature of this test is that it compares the measurements within each subject. If you scan
the X and Y columns separately, they do not look obviously different. But if you look at each X-Y pair, you will
notice that in every case, X is greater than Y. The paired t-test should be sensitive to this difference. In the two cases
where either X or Y is missing, it is not possible to compare the two measures on a subject. Hence, only 8 rows are
usable for the paired t-test.

When you run the paired t-test on this data, you get a t-statistic of 0.09, with a two-tailed probability of 0.93, so the test finds no significant difference between X and Y. Looking at the output more carefully, however, we notice that it reports 9 observations. As noted above, there should be only 8. It appears that Excel has failed to exclude the observations that did not have both X and Y measurements. To get the correct result, copy X and Y to two new columns and clear the value in each row where the other measure is missing. Now rerun the paired t-test. This time the t-statistic is 6.14817, with a two-tailed probability of 0.000468. The conclusion is completely different!

Of course, this is an extreme example. But the point is that Excel does not calculate the paired t-test correctly when
some observations have one of the measurements but not the other. Although it is possible to get the correct result,
you would have no reason to suspect the results you get unless you are sufficiently alert to notice that the number of
observations is wrong. There is nothing in online help that would warn you about this issue.
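A correct paired t-test first drops every subject that lacks either measurement, then works with the within-subject differences. The Python sketch below shows that order of operations and reports the number of complete pairs, so the count can be checked just as the count in the Excel output should have been. The data and the function name are hypothetical; the p-value would come from a t-distribution with n − 1 degrees of freedom, which this stdlib-only sketch does not compute.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t-test that keeps only subjects with BOTH measurements present.
    Returns n (complete pairs), mean difference, t statistic and df = n - 1."""
    diffs = [x - y for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # zero (division error) if all differences are identical
    return {"n": n, "mean_diff": d_bar, "t": d_bar / se, "df": n - 1}

# Hypothetical columns: the third and fourth subjects are incomplete and are dropped.
x = [5, 7, None, 9, 6]
y = [3, 4, 2, None, 5]
print(paired_t(x, y))
```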

Interestingly, there is also a TTEST function, which gives the correct result for this example. Apparently the functions and the Data Analysis tools are not consistent in how they deal with missing cells. Nevertheless, we cannot recommend using the functions in preference to the Data Analysis tools, because the result of a function is a single number: in this case, the two-tailed probability of the t-statistic. The function does not give you the t-statistic itself, the degrees of freedom, or the many other items you would want to see when doing a statistical test.

A statistical package will correctly exclude the cases with one of the measurements missing, and will provide all the supporting statistics you need to interpret the output.

Cross tabulation and Chi-Squared Test of Independence

Our final task is to count the two outcomes in each treatment group, and use a chi-square test of independence to test
for a relationship between treatment and outcome. In order to count the outcomes by treatment group, you need to
use Pivot Tables. In the Pivot Table Wizard's Layout option, drag Treatment to Row, Outcome to Column and also
to Data. The Data area should say "Count of Outcome" – if not, double-click on it and select "Count". If you want
percents, double-click "Count of Outcome", and click Options; in the “Show Data As” box which appears, select "%
of row". If you want both counts and percents, you can drag the same variable into the Data area twice, and use it
once for counts and once for percents.
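The same cross tabulation (counts plus percents of each row) can be sketched in a few lines of Python, which may help clarify what the Pivot Table is computing. The row layout, variable names and function name are hypothetical; rows missing either variable are skipped.

```python
from collections import Counter

def crosstab(rows, row_var, col_var):
    """Counts and row percents for two categorical variables; rows with a
    missing value (None) in either variable are skipped."""
    counts = Counter(
        (r[row_var], r[col_var]) for r in rows
        if r[row_var] is not None and r[col_var] is not None
    )
    row_levels = sorted({key[0] for key in counts})
    col_levels = sorted({key[1] for key in counts})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(i, j)] for j in col_levels] for i in row_levels]
    row_pct = [[100 * c / sum(row) for c in row] for row in table]
    return row_levels, col_levels, table, row_pct

# Hypothetical data: treatment by outcome.
rows = [
    {"treatment": "A", "outcome": "yes"},
    {"treatment": "A", "outcome": "no"},
    {"treatment": "A", "outcome": "yes"},
    {"treatment": "B", "outcome": "no"},
    {"treatment": "B", "outcome": "no"},
]
print(crosstab(rows, "treatment", "outcome"))
```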

Getting the chi-square test is not so simple, however. It is only available as a function, and the input needed for the
function is the observed counts in each combination of treatment and outcome (which you have in your pivot table),
and the expected counts in each combination. Expected counts? What are they? How do you get them? If you have
sufficient statistical background to know how to calculate the expected counts, and can do Excel calculations using
relative and absolute cell addresses, you should be able to navigate through this. If not, you're out of luck.
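For reference, the expected counts come from the standard formulas for a chi-square test of independence. In the notation below (introduced here for illustration, not taken from Excel), O_ij is the observed count in row i and column j of the pivot table, R_i and C_j are the row and column totals, N is the grand total, and the table has r rows and c columns:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

In the worksheet, each expected cell multiplies its row total by its column total and divides by the grand total, which is why a mix of relative and absolute cell addresses is needed when the formula is copied across the table.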

Assuming that you have surmounted the problem of expected counts, you can use the CHITEST function to get the
probability of observing a chi-square value larger than the one for this table. Again, since we are using functions,
you do not get many other necessary pieces of the calculation, notably the value of the chi-square statistic or its
degrees of freedom.

No statistical package would require you to provide the expected values before computing a chi-square test of
independence. Further, the results would always include the chi-square statistic and its degrees of freedom, as well
as its probability. Often you will get some additional statistics as well.
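A minimal Python sketch of what such a package reports: it derives the expected counts itself, then returns the chi-square statistic, its degrees of freedom and, for the 2 × 2 case, the upper-tail probability. For one degree of freedom the chi-square variable is the square of a standard normal, so the probability has the closed form erfc(sqrt(χ²/2)); for larger tables this stdlib-only sketch leaves the probability as None. No continuity correction is applied, and some packages apply one to 2 × 2 tables by default, so their figures can differ. The function name and table are hypothetical.

```python
import math

def chi_square_independence(observed):
    """Pearson chi-square test of independence for an r x c table of counts
    (no continuity correction). Assumes every row and column total is > 0."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # Closed-form upper-tail probability only for df = 1; otherwise use a stats library.
    p = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None
    return {"expected": expected, "chi2": chi2, "df": df, "p": p}

# Hypothetical 2 x 2 table: rows = treatments, columns = outcomes.
print(chi_square_independence([[10, 20], [20, 10]]))
```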

The remaining analyses were not done on this data set, but some comments about them are included for
completeness.

Simple Frequencies

You can use Pivot Tables to get simple frequencies. Using Pivot Tables, each column is considered a separate
variable, and labels in row 1 will appear on the output.

Another possibility is to use the FREQUENCY function. The main advantage of this method is that once you have defined the FREQUENCY formula for one column, you can use Copy/Paste to get it for other columns. First, you will need to enter a column with the values you want counted (bins). If you intend to do the frequencies for many columns, be sure to enter values for the column with the most categories; e.g., if 3 columns have values of 1 or 2, and the fourth has values of 1, 2, 3, 4, you will need to enter the bin values as 1, 2, 3, 4. Now select enough empty cells in one column to store the results: 4 in this example, even if the current column only has 2 values. Next choose Insert/Function/Statistical/FREQUENCY on the menu. Fill in the input range for the first column you want to count using relative addresses (e.g. A1:A100). Fill in the Bin Range using the absolute addresses of the locations where you entered the values to be counted (e.g. $M$1:$M$4). Click Finish. Note the formula bar above the column headings of the sheet, where the formula is displayed; it starts with "=FREQUENCY(". Place the cursor to the left of the = sign in the formula, and press Ctrl-Shift-Enter. The frequency counts now appear in the cells you selected.
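The bin rule FREQUENCY applies is easy to misread: each value is counted in the first bin whose value is greater than or equal to it, and values above the last bin go into one extra overflow count, so the full result has one more element than the bin range (selecting only 4 cells in the example simply leaves that overflow cell out). Below is a minimal Python sketch of that rule, assuming bins sorted in ascending order and None for empty cells; the function name `frequency` is ours, not an Excel API.

```python
from bisect import bisect_left

def frequency(values, bins):
    """Count values <= bins[0], then bins[i-1] < v <= bins[i], plus a final
    overflow count for v > bins[-1]. Assumes bins are sorted ascending;
    None (an empty cell) is skipped."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        if v is None:
            continue
        # bisect_left returns the index of the first bin >= v.
        counts[bisect_left(bins, v)] += 1
    return counts

# The same fixed bins are reused for every column, like the absolute $M$1:$M$4 range.
bins = [1, 2, 3, 4]
columns = {"A": [1, 2, 2, 1, None], "D": [1, 2, 3, 4, 4]}  # hypothetical columns
for name, values in columns.items():
    print(name, frequency(values, bins))
```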

To get the frequency counts of other columns, select the cells with the frequencies in them, and choose Edit/Copy on the menu. If the next column you want to count is one column to the right of the previous one, select the cell to the right of the first frequency cell, and choose Edit/Paste (Ctrl-V). Continue moving to the right and pasting for each column you want to count. Because the input range uses relative addresses and the bin range uses absolute addresses, each time you paste one column further to the right, the column being counted also shifts one column to the right, while the bins stay fixed.

If you want percents as well, you’ll have to use the Sum function to compute the sum of the frequencies, and define
the formula to get the percent for one cell. Select the cell to store the first percent, and type the formula into the
formula box at the top of the sheet - e.g. = N1*100/N$5 - where N1 is the cell with the frequency for the first
category, and N5 is the cell with the sum of the frequencies. Use Copy/Paste to get the formula for the remaining
cells of the first column. Once you have the percents for one column, you can Copy/Paste them to the other
columns. You’ll need to be careful about the use of relative and absolute addresses! In the example above, we used
N$5 for the denominator, so when we copy the formula down to the next frequency on the same column, it will still
look for the sum in row 5; but when we copy the formula right to another column, it will shift to the frequencies in
the next column.
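The percent calculation amounts to dividing each count by its own column's total, where that total stays fixed while you move down a column (the role N$5 plays) and changes when you move to another column. A minimal Python sketch, assuming hypothetical column names and a fixed list of categories; as with summing the frequency cells in the worksheet, the total counts only the listed categories.

```python
from collections import Counter

def frequency_table(columns, categories):
    """For each column: (category, count, percent of that column's total).
    The total is the sum of the listed categories' counts, like the SUM cell
    (N5) in the worksheet example; empty cells (None) are not counted."""
    table = {}
    for name, values in columns.items():
        counts = Counter(v for v in values if v is not None)
        freqs = [counts[c] for c in categories]
        total = sum(freqs)
        pcts = [f * 100 / total if total else 0.0 for f in freqs]
        table[name] = list(zip(categories, freqs, pcts))
    return table

columns = {"q1": [1, 2, 2, None], "q2": [1, 1, 3, 4]}  # hypothetical data
for name, entries in frequency_table(columns, [1, 2, 3, 4]).items():
    for category, count, pct in entries:
        print(name, category, count, round(pct, 1))
```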

Finally, you can use Histogram on the Data Analysis menu. You can only do one variable at a time. As with the
FREQUENCY function, you must enter a column with "bin" boundaries. To count the number of occurrences of 1 and
2, you need to enter 0,1,2 in three adjacent cells, and give the range of these three cells as the Bins on the dialog
box. The output is not labeled with any labels you may have in row 1, nor even with the column letter. If you do
frequencies on lots of variables, you will have difficulty knowing which frequency belongs to which column of data.

Linear Regression

Since regression is one of the more frequently used statistical analyses, we tried it out even though we did not do a
regression analysis for this example. The Regression procedure in the Data Analysis tools lets you choose one
column as the dependent variable, and a set of contiguous columns for the independents. However, it does not
tolerate any empty cells anywhere in the input ranges, and you are limited to 16 independent variables. Therefore, if
you have any empty cells, you will need to copy all the columns involved in the regression to new columns, and
delete any rows that contain any empty cells. Large models, with more than 16 predictors, cannot be done at all.
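The workaround described above (drop every row with an empty cell in any regression column, then fit) is listwise deletion. The Python sketch below applies it and then fits an ordinary least-squares model with an intercept by solving the normal equations with Gaussian elimination. This is an assumption-laden illustration rather than how statistical packages work internally: many use more numerically stable decompositions (QR, for example), and the row layout and function names here are hypothetical. There is no fixed cap on predictors in this sketch; it only needs at least as many complete rows as coefficients.

```python
def complete_rows(rows, columns):
    """Listwise deletion: keep only rows with no empty cell (None) in any listed column."""
    return [r for r in rows if all(r[c] is not None for c in columns)]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y_col, x_cols):
    """Least-squares fit of y on an intercept plus x_cols after listwise deletion."""
    data = complete_rows(rows, [y_col] + list(x_cols))
    p = len(x_cols) + 1  # intercept plus one slope per predictor
    if len(data) < p:
        raise ValueError("fewer complete rows than coefficients")
    X = [[1.0] + [float(r[c]) for c in x_cols] for r in data]
    y = [float(r[y_col]) for r in data]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    coef = solve(xtx, xty)
    return {"n_used": len(data), "intercept": coef[0], "slopes": dict(zip(x_cols, coef[1:]))}

# Hypothetical data with two incomplete rows (they are dropped before fitting).
rows = [
    {"y": 1, "x1": 0, "x2": 0},
    {"y": 3, "x1": 1, "x2": 0},
    {"y": 4, "x1": 0, "x2": 1},
    {"y": 6, "x1": 1, "x2": 1},
    {"y": 8, "x1": 2, "x2": 1},
    {"y": None, "x1": 3, "x2": 3},
    {"y": 5, "x1": None, "x2": 1},
]
print(ols(rows, "y", ["x1", "x2"]))
```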

Analysis of Variance

In general, Excel's ANOVA features are limited to a few special cases rarely found outside textbooks, and they require a lot of data rearrangement.

One-way ANOVA

Data must be arranged in separate, adjacent columns (or rows) for each group. Clearly, this is not conducive to running one-way ANOVAs for more than one grouping variable. If you have labels in row 1, the output will use the labels.
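By contrast, when the data stay in "long" form (one row per subject, with a column for each grouping variable), any column can serve as the grouping without rearranging anything. A minimal Python sketch of a one-way ANOVA on that layout follows; the column names and the function name are hypothetical, and the p-value (which needs an F distribution) is left to a statistics library.

```python
from statistics import mean

def one_way_anova(rows, group_col, value_col):
    """One-way ANOVA from long data: one row per subject with a group label.
    Raises ZeroDivisionError if there is no within-group variation."""
    groups = {}
    for r in rows:
        if r[group_col] is not None and r[value_col] is not None:
            groups.setdefault(r[group_col], []).append(r[value_col])
    k = len(groups)
    n = sum(len(v) for v in groups.values())
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand = mean([x for v in groups.values() for x in v])
    means = {g: mean(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"F": f, "df": (df_between, df_within), "SS": (ss_between, ss_within)}

# Hypothetical data with two different grouping columns.
rows = [
    {"treatment": t, "site": s, "score": v}
    for t, s, v in [("A", "n", 1), ("A", "s", 2), ("A", "n", 3),
                    ("B", "s", 4), ("B", "n", 5), ("B", "s", 6),
                    ("C", "n", 7), ("C", "s", 8), ("C", "n", 9)]
]
for grouping in ("treatment", "site"):
    print(grouping, one_way_anova(rows, grouping, "score"))
```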

Two-Factor ANOVA Without Replication

This only does the case with one observation per cell (i.e. no Within Cell error term). The input range is a
rectangular arrangement of cells, with rows representing levels of one factor, columns the levels of the other factor,
and the cell contents the one value in that cell.
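With one observation per cell, the interaction and the error cannot be separated, so the residual left after removing the row and column effects serves as the error term. A minimal Python sketch of that calculation, assuming the data are given as a list of rows laid out like Excel's input range (rows are levels of one factor, columns are levels of the other); the function name is hypothetical and p-values are not computed.

```python
def two_way_no_replication(table):
    """table[i][j] is the single observation for level i of the row factor and
    level j of the column factor. The residual (row x column) mean square is
    used as the error term, since there is no within-cell variation."""
    a, b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    row_means = [sum(row) / b for row in table]
    col_means = [sum(col) / a for col in zip(*table)]
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols = a - 1, b - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }

print(two_way_no_replication([[1, 2], [3, 6]]))  # hypothetical 2 x 2 layout
```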

Two-Factor ANOVA with Replicates

This does a two-way ANOVA with equal cell sizes. Input must be a rectangular region with columns representing
the levels of one factor, and rows representing replicates within levels of the other factor. The input range MUST
also include an additional row at the top, and column on the left, with labels indicating the factors. However, these
labels are not used to label the resulting ANOVA table. Click Help on the ANOVA dialog for a picture of what the
input range must look like.
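For comparison, a balanced two-way ANOVA with replication can be computed directly from per-cell lists of replicates. This Python sketch uses a nested list `cells[i][j]` (level i of factor A, level j of factor B, each cell holding the same number of replicates) rather than Excel's stacked layout, and all names are hypothetical. It returns F ratios and degrees of freedom for both main effects and the interaction; p-values are left to a statistics library.

```python
def two_way_with_replication(cells):
    """cells[i][j] is the list of replicates for level i of factor A and level j
    of factor B; every cell must have the same number of replicates."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if n < 2 or any(len(cell) != n for row in cells for cell in row):
        raise ValueError("need equal cell sizes with at least 2 replicates")
    cell_means = [[sum(cell) / n for cell in row] for row in cells]
    grand = sum(sum(row) for row in cell_means) / (a * b)  # valid because cells are balanced
    a_means = [sum(row) / b for row in cell_means]
    b_means = [sum(col) / a for col in zip(*cell_means)]
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df_a, df_b = a - 1, b - 1
    df_ab, df_error = df_a * df_b, a * b * (n - 1)
    ms_error = ss_error / df_error
    return {
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
        "F_AB": (ss_ab / df_ab) / ms_error,
        "df": (df_a, df_b, df_ab, df_error),
    }

# Hypothetical 2 x 2 design with 2 replicates per cell.
cells = [[[1, 3], [5, 7]], [[2, 4], [10, 12]]]
print(two_way_with_replication(cells))
```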
