01 WEEK1 EA Statistics Part1

The document outlines a statistics course led by Tibor Takács and Esteban Muñoz, detailing course structure, assessment methods, and grading criteria. It includes a comprehensive class schedule covering various statistical concepts and techniques, along with project work requirements and deadlines. Recommended literature and data sources for project work are also provided.

Uploaded by

Alma Cseh

Statistics

Week 1
Part 1
Some contact information

Course Leader:
• Tibor Takács (takacs.tibor@uni-corvinus.hu)

Seminar Teacher:
• Esteban Muñoz (esteban.munoz@uni-corvinus.hu)

Room:
• E.1.107

Office:
• C707/A (Laboratory for Networks, Technology & Innovation –
NETI Lab)
• Thursday: 09:00 – 12:00
Information, requirements
Students' achievement in the course is assessed based on two
compulsory exams and project work as follows:

I. Midterm 1 (35 points, covers the first quarter, weeks 1-7)

Week 8: 07-11 April

II. Midterm 2 (35 points, covers the second quarter, weeks 8-11)

Week 12: 19-23 May

III. Project work (20-minute presentation per group and a written document, 30 points)

Week 12: 19-23 May

Participation in the seminar and project work is compulsory to get
any grade!

Grading scale:

000 – 054: Fail
055 – 064: Pass
065 – 074: Satisfactory
075 – 089: Good
090 – 100: Excellent

You can get bonus points on weekly seminar quizzes.
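
The point totals and the grading scale above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, points are assumed to be whole numbers, and the way bonus points enter the total (simple addition, with totals above 100 still counted as Excellent) is an assumption that the slides do not spell out.

```python
# Lower bound of each passing band, taken from the grading scale above.
GRADE_BANDS = [
    (90, "Excellent"),
    (75, "Good"),
    (65, "Satisfactory"),
    (55, "Pass"),
]


def total_points(midterm1: int, midterm2: int, project: int, bonus: int = 0) -> int:
    """Sum the course components (35 + 35 + 30 points at most).

    Adding quiz bonus points directly to the total is an assumption.
    """
    return midterm1 + midterm2 + project + bonus


def grade(points: int) -> str:
    """Map a whole-number point total to the grade label."""
    if points < 0:
        raise ValueError("points must be non-negative")
    for lower_bound, label in GRADE_BANDS:
        if points >= lower_bound:
            return label
    return "Fail"  # 0-54 points


if __name__ == "__main__":
    # Hypothetical student: 28 + 25 + 24 points, no bonus.
    print(grade(total_points(28, 25, 24)))
```

Storing only the lower bound of each band keeps the lookup short and avoids gaps between adjacent bands.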

Recommended literature
• Essentials of Business Analytics, Second Edition, 2017, Cengage
Learning. Jeffrey D. Camm, James J. Cochran, Michael J. Fry, Jeffrey W.
Ohlmann, David R. Anderson, Dennis J. Sweeney, Thomas A. Williams.
• Statistics for Business and Economics, 2011, South Western College
Publishing. David R. Anderson, Dennis J. Sweeney, Thomas A. Williams.
Detailed class schedule
Week 1: 17-21 February

Qualitative and quantitative data. Frequency distribution table: frequencies, relative frequencies,
cumulative frequencies, cumulative relative frequencies. Bar chart. Pie chart. Dot Plot. Histogram.
Ogive. (Unit I, 1-5)

Week 2: 24-28 February

Ratios, proportions, and rates. Types of ratios. Ratios are used for temporal, geographic, and across-
group comparisons. Chain rule for temporal ratios. Proportions. Rates. Comparison of rates in absolute
and relative terms: differences, ratios, percentage changes.

Measures of central tendency. Mean, mode, median, percentiles. Exploratory data analysis. Box-plot
diagram. How to use the box-plot diagram: depicting the distribution, assessing its range of dispersion,
and detecting outliers. (Unit I, 6-9)

Week 3: 3-7 March

Measures of variability: range, interquartile range, variance, standard deviation, mean absolute
deviation, coefficient of variation. Distribution shape and properties. Normal distribution as a point of
reference. Measures of distribution shape: skewness, kurtosis. Standardization. Use of z-scores for
detecting outliers.

Cross-tabulation for qualitative data. Row and column percentages. Joint percentages. Analysis of
heterogeneous populations with graphical tools: clustered and stacked bar chart. Association between
qualitative data: Cramer’s V. (Unit I, 10-13)

Week 4: 10-14 March

Relationship between a qualitative and a quantitative variable: between-to-total variance ratio (Eta-
squared), correlation ratio. The linear relationship between quantitative data: covariance and correlation.
Rank correlation.

Scatter-plot diagram. Grouped scatter-plot diagram. Fitting trend lines to scatter-plot diagrams. Simple
linear regression analysis, coefficients and interpretation, coefficient of determination, sample correlation
coefficient. (Unit I, 14-16)

Week 5: 17-21 March

Sampling, sample surveys. Survey, errors, sampling methods. Representativeness. Introduction to
statistical inference. Point estimation. Sampling error, sampling distributions. Standard normal
distribution table, sampling distribution of the sample mean, sampling distribution of the sample
proportion. Effect of the sample size. (Unit II, 1-4)

Week 6: 24-28 March

Introduction to interval estimation. The margin of error. Interval estimation of a population mean. Interval
estimation of a population proportion. t distribution. Determining the necessary sample size. (Unit II, 5-7)

Week 7: 31 March - 4 April

Interval estimation of the difference between two population means. Independent and matched samples.
Interval estimation of the difference between two population proportions. Interval estimation of
population variance. Chi-square distribution. (U II, 8-10)

Midterm exam 1

Week 8: 7-11 April

Introduction to hypothesis testing. Developing null and alternative hypotheses. Type I and Type II errors.
Lower-tailed, upper-tailed, and two-tailed tests. Approaches to hypothesis testing. Decision rule. P-
value. z test about a population mean.

t test about a population mean. z test about a population proportion. Introduction to non-parametric test
procedures. Binomial test about a population proportion. Sign test about a population mean. (U II, 23-
25)

14-18 April: Project week

21-25 April: Spring holiday

Week 9: 28 April - 2 May


1 May is a national holiday – no classes are held.
All seminars of the course are planned for Thursdays, i.e., there are no seminars this week.

Week 10: 5-9 May

z and t tests about the difference between two population means. Welch's d test. t tests with
independent and matched samples. z test about the difference between two population proportions.
Hypothesis testing and decision making. Power of the test. Calculating the probability of a Type II error.
Determining the necessary sample size. (U II, 11-13) t and F tests of regression. (U II 23-25)

Non-parametric tests for independent and matched samples. Mann-Whitney U test about the difference
between two population means. Matched samples binomial test for stochastic monotonicity. Tests about
population variances. F distribution. (U II, 14-16)

Week 11: 12-16 May

Introduction to Chi-Square Tests. The goodness of fit test for multinomial population proportions, the
goodness of fit test for normal distribution, and the test of independence. Fisher's exact test. Introduction
to Analysis of Variance (ANOVA). Testing for the equality of k population means. ANOVA Table. (U II, 17-
20)

Index numbers. Price relatives. Weighted aggregate price indexes: Laspeyres, Paasche, Fisher.
Calculation of aggregate price indexes as weighted averages. Practical use of price indexes in
economics and business. How to deflate nominal monetary values using a price index. Quantity
indexes. Quantity relatives. Weighted aggregate quantity indexes: Laspeyres, Paasche, Fisher.
Decomposing the change in nominal monetary values as the product of aggregate price and quantity
indexes. (U II, 21-22)

Week 12: 19-23 May

Midterm exam 2
Project presentations

Project papers

• The project papers should be submitted in written form, and the results should be
presented in Week 12.
• The project teams include 4 or 5 students.
• The instructor must approve the chosen databases by 31 March.
• Each team must develop an oral presentation and a written form paper (3 pages per
student, incl. tables and figures).
• The written paper must have a standard structure (introduction, problem statement,
data and methodology, discussion of results, and references).
• The submission deadline for the papers is 20 May 2025.
Information on Project work
30 points can be attained by developing and presenting a research paper. Small work
groups should be created, within which the students must collaborate, submit a research
paper and present the main results of their research.

Steps of the project work:

1. Formulating a relevant research question.


2. Each group should select a cross-sectional dataset to which the group will apply the
statistical tools and methods they have become acquainted with during the course.

Some possible data sources:

• World Bank, IMF, OECD, Eurostat, National Statistical Services, tradingeconomics.com
• The instructor must approve the chosen databases by 31 March. Failing to choose a
dataset by this deadline results in losing the opportunity to obtain these
30 points.

Analysis:

• Descriptive analysis of each variable in your dataset (measures of location, measures of
variability, shape of the distribution, outliers, etc.).
• Association analysis (between 2 qualitative variables, between 1 quantitative and 1 qualitative
variable, and between 2 quantitative variables).
• Sampling (if it is necessary)
• Model building
• Evaluation of the model

The submission deadline for the papers is 20 May 2025.

Presentations will be given in class on May 22.


Unit 1: Introduction to Data and Statistics


Unit 2: Data Acquisition and Analysis
Unit 1 Introduction to Statistics
and Quantitative Analysis
Three developments spurred recent explosive growth in the
use of analytical methods in business applications:

■ Technological advances produce a lot of data for business
■ Numerous methodological developments
■ Explosion in computing power and storage capability
A Categorization of Analytical Methods
and Models

Descriptive Analytics: Encompasses the set of techniques that
describe what has happened in the past.

Predictive Analytics: Consists of techniques that use models
constructed from past data to predict the future or ascertain
the impact of one variable on another.

Prescriptive Analytics: Indicates a best course of action to take
(Optimization, Simulation, Decision analysis)
The Spectrum of Business Analysis
Types of Data
What types of data do you know?

• Quantitative – numeric

• Qualitative – numerical or nonnumerical

Data can be classified as being either of
qualitative or of quantitative nature
Types of Data and Scales of Measurement

Nature of variable, type of data representation, and scale of measurement:

• Qualitative, numerical representation: Nominal or Ordinal scale
• Qualitative, nonnumerical representation: Nominal or Ordinal scale
• Quantitative, numerical representation: Interval or Ratio scale
Qualitative and Quantitative Data

Data can be classified as being either of qualitative
or of quantitative nature.

The statistical analysis that is appropriate depends
on whether the data for the variable are qualitative
or quantitative.

In general, there are more alternatives for statistical
analysis when the data are quantitative.
Qualitative Data

Labels or names used to identify an attribute of each
element

Often referred to as categorical data

Use either the nominal or ordinal scale of
measurement

Can be either numeric or nonnumeric

Appropriate statistical analyses are rather limited


Quantitative Data

Quantitative data indicate how many or how much:

• discrete, if measuring how many
• continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for
quantitative data.
Scales of Measurement

Scales of measurement include:

• Nominal
• Ordinal
• Interval
• Ratio

The scale determines the amount of information
contained in the data.

The scale indicates the data summarization and
statistical analyses that are most appropriate.
Scales of Measurement

■ Nominal

Data are labels or names used to identify an
attribute of the element.

A nonnumeric label or numeric code may be used.

Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Scales of Measurement

■ Ordinal

The data have the properties of nominal data and
the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.

Example:
Students of a university are classified by their
course performance using a nonnumeric label
such as Distinction, Merit, Pass or Fail.
Alternatively, a numeric code could be used for
the course performance variable (e.g. 1 denotes
Distinction, 2 denotes Merit and so on).
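
A short Python sketch may help to show what numeric codes do and do not allow on the nominal and ordinal scales. The code tables follow the two examples above; the codes 3 = Pass and 4 = Fail extend the "and so on" in the slide and are an assumption, and the observed values are made up for illustration. For nominal codes only counting (for example, the mode) is meaningful; for ordinal codes the order also carries information, so a median can be reported, while an arithmetic mean of the codes has no clear interpretation on either scale.

```python
from statistics import median_low, mode

# Nominal scale: the numbers are only labels for the school.
SCHOOL_CODES = {1: "Business", 2: "Humanities", 3: "Education"}

# Ordinal scale: the numbers also express rank (1 is the best result).
# Codes 3 and 4 are assumed; the slide only lists 1 and 2.
PERFORMANCE_CODES = {1: "Distinction", 2: "Merit", 3: "Pass", 4: "Fail"}

# Hypothetical observations, stored as numeric codes.
schools = [1, 2, 1, 3, 1, 2]
performance = [2, 1, 3, 2, 4, 2, 3]

# Nominal data: the most frequent category is a meaningful summary.
most_common_school = SCHOOL_CODES[mode(schools)]

# Ordinal data: the middle-ranked category is meaningful as well.
# median_low always returns an observed code, never an average of two codes.
median_performance = PERFORMANCE_CODES[median_low(performance)]

if __name__ == "__main__":
    print(most_common_school)
    print(median_performance)
```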
Scales of Measurement

■ Interval

The data have the properties of ordinal data, and
the interval (or distance) between observations is
expressed in terms of a fixed unit of measure.

Interval data are always numeric.

It is meaningful to calculate sums and differences
of data values, but the scale doesn’t have a natural
zero point.

Example:
The maximum temperature on Tuesday was 18°C,
while on Wednesday it was only 12°C. The peak
on Tuesday was 6°C higher than on Wednesday.
Scales of Measurement

■ Ratio

The data have all the properties of interval data
and the ratio of two values is meaningful.

Variables such as distance, height, weight use
the ratio scale.

This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.
Summary: Scales of Measurement

Nominal:
Data are labels or names used to identify an attribute of the element (colour,
religion, family type, etc.)

Ordinal:
The rank of the data is meaningful (grade, positions in a competition, etc.)

Interval:
The distance between observations is expressed in terms of a fixed unit; the
scale does not have a natural zero point (calendar time, temperature in °C)

Ratio:
The ratio of two values is meaningful (height, weight, speed, etc.); this scale
must contain a zero value.
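
The difference between the interval and the ratio scale can be checked numerically. The sketch below reuses the 18°C and 12°C readings from the interval example; the two heights are made-up values, and the helper name is hypothetical. Changing the unit leaves temperature differences comparable, but the ratio of two Celsius readings changes once the zero point moves, whereas the ratio of two heights stays the same in any unit.

```python
import math


def celsius_to_kelvin(celsius: float) -> float:
    """Shift the zero point from the freezing point of water to absolute zero."""
    return celsius + 273.15


tuesday_c, wednesday_c = 18.0, 12.0

# Differences are meaningful on an interval scale: 6 degrees on both scales.
diff_celsius = tuesday_c - wednesday_c
diff_kelvin = celsius_to_kelvin(tuesday_c) - celsius_to_kelvin(wednesday_c)

# Ratios are not: the result depends on where the arbitrary zero sits.
ratio_celsius = tuesday_c / wednesday_c
ratio_kelvin = celsius_to_kelvin(tuesday_c) / celsius_to_kelvin(wednesday_c)

# Height is on a ratio scale: "twice as tall" holds in cm and in inches.
height_a_cm, height_b_cm = 180.0, 90.0  # hypothetical values
ratio_cm = height_a_cm / height_b_cm
ratio_inch = (height_a_cm / 2.54) / (height_b_cm / 2.54)

if __name__ == "__main__":
    print(diff_celsius, round(diff_kelvin, 2))
    print(ratio_celsius, round(ratio_kelvin, 3))
    print(ratio_cm, round(ratio_inch, 3))
```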
Unit 2 Data Acquisition and Analysis

■ Types of Data Sets
■ Data Sources and Acquisition
■ Descriptive Statistics
■ Statistical Inference
■ Computers and Statistical Analysis
Cross-Sectional Data

With cross-sectional data, observations are made on
a number of elements at a single date / for a single
time period.

Example: data detailing the number of building
permits issued in June 2006 in each of the regions
of Italy
Time Series Data

With time series data, observations are made on a
single entity at several dates / over several time
periods.

Example: data detailing the number of building
permits issued in Tuscany, Italy in each of
the last 36 months
Panel Data

With panel data, observations are made on a number
of elements at several dates / over several time
periods.

Example: data detailing the number of building
permits issued in each of the regions of Italy
in each of the last 36 months
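
The three types of data set can be viewed as different slices of the same table. The sketch below is a minimal illustration, assuming a small panel of building permits keyed by (region, month); all figures are invented, and the helper names `cross_section` and `time_series` are hypothetical. Fixing the period gives cross-sectional data, and fixing the element gives a time series.

```python
# Panel data: several regions observed over several months.
# All permit counts are made-up numbers used only for illustration.
panel = {
    ("Tuscany", "2006-05"): 110,
    ("Tuscany", "2006-06"): 120,
    ("Lombardy", "2006-05"): 240,
    ("Lombardy", "2006-06"): 255,
    ("Lazio", "2006-05"): 180,
    ("Lazio", "2006-06"): 175,
}


def cross_section(data: dict, period: str) -> dict:
    """All elements at a single time period."""
    return {region: value for (region, p), value in data.items() if p == period}


def time_series(data: dict, region: str) -> dict:
    """A single element over all available time periods."""
    return {p: value for (r, p), value in data.items() if r == region}


if __name__ == "__main__":
    print(cross_section(panel, "2006-06"))
    print(time_series(panel, "Tuscany"))
```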
Data Sources

■ Existing Sources

• Within a firm – almost any department
• Business database services – Economist Intelligence Unit,
Reuters, Bloomberg
• Government agencies – European Commission,
European Central Bank, Federal Reserve Bank of St. Louis
• Bureaus of statistics – Eurostat
• Industry associations – European Tourist Office
• Special-interest organizations – OECD, IMF, UNO
• Internet – more and more firms; Wikipedia
Data Acquisition Considerations

Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it
is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happen to be available or
that were acquired with little care can lead to poor
and misleading information.
Data Sources

■ Statistical Studies

In experimental studies the variables of interest
are first identified. Then one or more factors are
controlled so that data can be obtained about how
the factors influence the variables.

In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest. A survey is a good example.
Make your own questionnaire!!!
GOOGLE SURVEY EDITOR
Can you go to this homepage?

• Google Account
• Applications
• Google Drive
• New
• Forms

https://docs.google.com/forms
PLANNED SURVEY QUESTIONS
What gender are you?
What color is your hair?
How tall are you? (in cm)
How much do you weigh? (in kg)
When were you born?
Age (in months)
How many siblings do you have?
What kind of locality did you live in as a child? (at the age of 8)
What is the population (in thousands) of the locality in which you lived as a child? (at the age of 8)
Which program are you studying at?
Have you ever studied statistics before? (in high school or college)
How would you rate your proficiency in using Excel?
What kind of animal would you most like to be reborn as in your next life?
If costs didn't matter, in which country would you most like to live for the next 3 months?
What would be your 1st preferred drink at a party?
What would be your 2nd preferred drink at a party?
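
The list above contains both a date of birth and an age in months. Assuming the age is meant to be derived from the birth date rather than asked separately (the slides do not say so), the conversion could be sketched as follows; the function name and the example dates are hypothetical.

```python
from datetime import date


def age_in_months(born: date, today: date) -> int:
    """Number of completed months between the birth date and the reference date."""
    months = (today.year - born.year) * 12 + (today.month - born.month)
    # The current month counts only once the day of birth has been reached.
    if today.day < born.day:
        months -= 1
    return months


if __name__ == "__main__":
    # Hypothetical respondent born on 15 March 2005, surveyed on 20 February 2025.
    print(age_in_months(date(2005, 3, 15), date(2025, 2, 20)))
```

Age in completed months is a discrete quantitative variable on the ratio scale, so it can be used directly in the descriptive measures covered later in the course.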
QUESTIONS - answers

What gender are you?

male / female
QUESTIONS - answers

What color is your hair?

black
dark brown
light brown
blond
red
gray
I've no hair
other:
QUESTIONS - answers

How tall are you? (in cm)

How much do you weigh? (in kg)

When were you born?

How many siblings do you have?


QUESTIONS - answers

What kind of locality did you live in as a child? (at the age of 8)

Metropolis (more than 1,000,000 inhabitants)

City (between 100,000 and 1,000,000 inhabitants)

Medium town (between 10,000 and 100,000 inhabitants)

Small town (less than 10,000 inhabitants)

Township or village
QUESTIONS - answers

What is the population (in thousands) of the locality in which you lived as a
child? (at the age of 8)

Which program are you studying at?

Have you ever studied statistics before? (in high school or college)
yes / no

How would you rate your proficiency in using Excel?

1: I don't have any knowledge of Excel.
2: I have limited knowledge of Excel and I can only use its most basic tools.
3: I know how to use its main tools and usually I manage to achieve what I
need.
4: I am at ease with most of its tools and I also have some experience with
Excel …functions.
5: I am familiar with almost every tool and function.
QUESTIONS - answers
Page break → "Individual preferences"

What kind of animal would you most like to be reborn as in your
next life?

If costs didn't matter, in which country would you most like to live
for the next 3 months?

What kind of drinks do you prefer to have at a party?

3 separate items:

• 1st preferred drink
• 2nd preferred drink
• 3rd preferred drink
QUESTIONS - answers
Listed below are statements that a person might use to describe himself or
herself. Please read each statement and decide how well it describes you.
"Grid" setup

Items:
I prefer to openly discuss my feelings and experiences with my friends rather
than keep them to myself.
Even my friends are unaware of my innermost feelings because I rarely express
how I think or feel.
I prefer to remain distant and detached with people.
I feel uncomfortable disclosing myself to other people, even to my friends.
I can cope better with nervousness in my friends' company than alone.
I prefer to keep my problems to myself.

Response categories:
1: very untrue of me
2: somewhat untrue of me
3: neutral
4: somewhat true of me
Can you open this link to our survey?

https://forms.gle/mPGkGsQW9q7ptxGf7
