AGE 301 - NOTE - A-1
1. Introduction
Engineering research generates enormous amounts of data which, when analysed, form the basis for inferences, decisions and conclusions. For meaningful research output, the engineer must analyse the data appropriately and present the results in acceptable standard formats. This is a major role of statistics in engineering. There are two broad aspects of statistics, namely Descriptive Statistics and Inferential Statistics. The former is used to explore and summarize the information contained in the data, while the latter involves drawing inferences about the population from the data. Procedures under descriptive statistics include the use of tables, graphs, and numerical measures (computation of simple statistics such as the mean, median, mode, variance, standard deviation, etc. to describe the data). On the other hand, statistical inference involves the formulation of statistical hypotheses, testing of the hypotheses, and making inferences or drawing conclusions based on the results obtained.
Statistics is the mathematical science involving the collection, analysis and interpretation of data. A number of specialties have evolved to apply statistical theory and methods to various disciplines, e.g. engineering statistics (which combines engineering and statistics), environmental science, geosciences, operations research, quality and process control, etc.
Descriptive statistics provides simple summaries about the sample/observations that have been
made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-
understand graphs. These summaries may either form the basis of the initial description of the data
as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a
particular investigation.
Inferential statistics (or inductive statistics) uses the data to learn about the population that the sample of data represents. These methods are developed on the basis of probability theory.
Deterministic data are data generated in accordance with known and precise laws, e.g. the fall of a body subject to the Earth's gravity. The defining attribute of deterministic data is that, within the precision of the measurements, the same data will be obtained under repeated experiments in well-defined conditions.
Random data are data that seem to occur in a purely haphazard way, e.g. thermal noise generated in electrical resistances, Brownian motion of tiny particles in a fluid, weather variables, financial variables such as stock exchange share prices, and gambling game outcomes (dice, cards, roulette, etc.). In none of these examples can a precise mathematical law describe the data. Also, there is no possibility of obtaining the same data in repeated experiments performed under similar conditions.
Missing data – the failure to obtain the values of one or more variables for certain objects/cases – will always undermine the degree of certainty of the statistical conclusions. Many software products provide means to cope with missing data. One approach is simply to code missing data with symbolic numbers or tags, such as "na" ("not available"), which are then ignored when performing statistical analysis operations. Another possibility is to substitute missing data with the average values of the respective variables. Yet another solution is to simply remove objects with missing data. Whatever method is used, the quality of the project is always impaired.
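As an illustration of these three options, the following is a minimal Python sketch (using the pandas library, with a hypothetical rainfall series) that codes missing entries as "na", substitutes them with the variable's mean, and removes incomplete cases:

    import pandas as pd

    # Hypothetical rainfall records with two missing values coded as "na".
    raw = pd.Series(["1200", "na", "1350", "1420", "na", "1280"])

    # Option 1: code missing data as NaN so it is ignored in statistics.
    rainfall = pd.to_numeric(raw, errors="coerce")
    print(rainfall.mean())  # NaN entries are skipped automatically

    # Option 2: substitute missing values with the variable's mean.
    filled = rainfall.fillna(rainfall.mean())

    # Option 3: simply remove cases with missing data.
    complete = rainfall.dropna()

Note that mean substitution artificially reduces the variance of the variable, which is one way in which the quality of the analysis is impaired.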
It must be stressed that many free statistical packages (freeware) are also available nowadays (e.g. MAKESENS, for trend detection in time series data). Some are even customised for specific tasks. With the knowledge of one, it becomes relatively easy to use others once the manual is available. During this course, we will focus on Microsoft Excel and other readily available software. However, it should be noted that Microsoft Excel is only a spreadsheet and not a statistical package; it therefore has serious limitations when it comes to rigorous statistical analysis. Notwithstanding, it is a very powerful tool for exploratory data analysis (EDA). Thus, a basic knowledge of MS Excel is assumed for this course (WMA 501).
2. Data Analyses
Histogram
Figure 1: Nigeria annual rainfall series shown as a histogram (the outlier is enclosed in the oval shape). The histogram is overlaid with the theoretical normal distribution curve.
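The construction of such a plot can be sketched in Python (matplotlib and scipy assumed, with a synthetic rainfall series standing in for the Nigerian data, which are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Synthetic stand-in for an annual rainfall series (mm/yr).
    rng = np.random.default_rng(1)
    rainfall = rng.normal(loc=1400, scale=200, size=100)

    # Histogram scaled to a density so the normal curve is comparable.
    plt.hist(rainfall, bins=12, density=True, edgecolor="black")

    # Theoretical normal curve from the sample mean and standard deviation.
    x = np.linspace(rainfall.min(), rainfall.max(), 200)
    plt.plot(x, stats.norm.pdf(x, rainfall.mean(), rainfall.std()))
    plt.xlabel("Annual rainfall (mm/yr)")
    plt.ylabel("Density")
    plt.show()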
Box Plot
In descriptive statistics, a box plot (also known as a box-and-whisker diagram) is a convenient way
of graphically depicting groups of numerical data through their five-number summaries: the
smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3),
and largest observation (sample maximum). A box plot may also indicate which observations, if
any, might be considered outliers. Figure 2 shows the annual rainfall series of Nigeria as a box plot.
Figure 2: Annual rainfall series (1901-2000) of Nigeria using a box plot (the outlier is encircled and corresponds to the 83rd data point, i.e. the 1983 rainfall value).
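The five-number summary behind a box plot, together with the common 1.5 × IQR whisker rule for flagging outliers, can be computed directly. A minimal Python sketch with a hypothetical rainfall sample:

    import numpy as np

    # Hypothetical annual rainfall sample (mm/yr).
    rainfall = np.array([980, 1100, 1180, 1250, 1310,
                         1400, 1460, 1520, 1650, 2100])

    q1, q2, q3 = np.percentile(rainfall, [25, 50, 75])
    print("five-number summary:",
          rainfall.min(), q1, q2, q3, rainfall.max())

    # The usual whisker rule flags points beyond 1.5 * IQR as outliers.
    iqr = q3 - q1
    outliers = rainfall[(rainfall < q1 - 1.5 * iqr) |
                        (rainfall > q3 + 1.5 * iqr)]
    print("potential outliers:", outliers)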
Quantitative measures of dependence include correlation (such as Pearson's r when both variables are continuous, or Spearman's rho if one or both are not) and covariance (which reflects the scale upon which the variables are measured). The slope in regression analysis also reflects the relationship between the variables.
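These measures are available in standard packages; for instance, a brief Python sketch (scipy assumed) with hypothetical paired observations:

    import numpy as np
    from scipy import stats

    # Hypothetical paired observations, e.g. evaporation vs sap flow.
    x = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 6.0])
    y = np.array([2.3, 4.1, 6.2, 8.5, 9.9, 12.4])

    r, p_r = stats.pearsonr(x, y)       # linear association, continuous data
    rho, p_rho = stats.spearmanr(x, y)  # rank-based, no linearity assumption
    cov = np.cov(x, y)[0, 1]            # scale-dependent co-variation

    print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, "
          f"covariance = {cov:.3f}")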
Graphics
Statistical graphics, also known as graphical techniques, are information graphics in the field of
statistics used to visualize quantitative data. Graphical techniques allow results to be displayed in
some sort of pictorial form. They include plots such as scatter plots (e.g. Figure 3), histograms,
etc.
If one is not using statistical graphics, then one is forfeiting insight into one or more aspects of the
underlying structure of the data.
[Figure 3 image: scatter plot of Fd (g cm-2 h-1) against Eo (W m-2), with dry, wet and transition periods distinguished.]
Figure 3: A scatter plot showing relation between sap flow (Fd) and potential evaporation (Eo) in a
cashew orchard in Ghana.
The model could be linear or curvilinear depending on the data structure and the inherent relationship between the variables.
There are different ways of fitting a curve other than a straight line to data (sketches of approaches (b) and (c) follow the list):
(a) deriving the regression formula analytically, which may be cumbersome and requires some knowledge of calculus;
(b) linearization: some nonlinear regression problems can be moved to a linear domain by a suitable transformation;
(c) using optimization algorithms to minimise the error, e.g. the Levenberg–Marquardt algorithm;
(d) using nonlinear modelling techniques, such as neural networks.
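A minimal Python sketch of the linearization and optimization approaches, using a hypothetical exponential model and synthetic data (scipy's curve_fit applies the Levenberg–Marquardt algorithm by default for unconstrained problems):

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical exponential relationship y = a * exp(b * x) with noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 4, 30)
    y = 2.0 * np.exp(0.5 * x) + rng.normal(0, 0.2, x.size)

    # Linearization: taking logs turns the model into ln(y) = ln(a) + b*x,
    # which an ordinary straight-line least-squares fit can handle.
    b_lin, ln_a = np.polyfit(x, np.log(y), 1)

    # Direct nonlinear fit via the Levenberg-Marquardt algorithm.
    def model(x, a, b):
        return a * np.exp(b * x)

    (a_nl, b_nl), _ = curve_fit(model, x, y, p0=(1.0, 0.1))
    print(a_nl, b_nl)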
Goodness-of-fit
Commonly used checks of goodness of fit include the coefficient of determination (R²), analysis of residuals, and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
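For a straight-line fit, these checks can be obtained directly; a brief sketch with hypothetical data (with a single predictor, the t-test on the slope is equivalent to the overall F-test):

    import numpy as np
    from scipy import stats

    # Hypothetical data for a straight-line fit.
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.3])

    fit = stats.linregress(x, y)
    r_squared = fit.rvalue ** 2  # coefficient of determination
    residuals = y - (fit.intercept + fit.slope * x)

    # fit.pvalue is the two-sided p-value of the t-test that the slope
    # is zero (equivalent to the overall F-test for one predictor).
    print(f"R^2 = {r_squared:.3f}, slope p-value = {fit.pvalue:.4f}")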
[Figure 4 image: time series of annual rainfall (mm/yr) against year, 1970-2010.]
Figure 4: Time series plot of the mean annual rainfall total of Nigeria.
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series (trend) analysis techniques may be divided into parametric and non-parametric methods.
Parametric methods (such as linear regression) rest on distributional assumptions, among them that the variance of the dependent variable in the source population is constant regardless of the value of the independent variable(s), and that the residuals are independent of each other.
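When these assumptions are doubtful, a non-parametric test is preferable. The Mann-Kendall test (the test underlying MAKESENS, mentioned in Section 1) is a common choice; a minimal Python sketch, ignoring the correction for tied values:

    import numpy as np
    from scipy import stats

    def mann_kendall(x):
        """Minimal Mann-Kendall trend test (no correction for ties)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        # S counts concordant minus discordant pairs over all i < j.
        s = sum(np.sign(x[j] - x[i])
                for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0
        # Continuity-corrected standard normal test statistic.
        if s > 0:
            z = (s - 1) / np.sqrt(var_s)
        elif s < 0:
            z = (s + 1) / np.sqrt(var_s)
        else:
            z = 0.0
        p = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided p-value
        return s, z, p

    # Hypothetical annual series with a mild upward trend.
    series = [3.1, 3.4, 3.2, 3.8, 4.0, 3.9, 4.3, 4.5, 4.4, 4.8]
    print(mann_kendall(series))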
Autocorrelation (Auto-correlogram)
Autocorrelation is used to identify a periodic signal (stochastic component) in time series datasets. Autocorrelation analysis correlates a time series dataset with itself at different time lags. It is useful in checking randomness, finding repeating patterns, or identifying the presence of a periodic signal in a time series dataset.
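A minimal sketch of an auto-correlogram in Python, using a synthetic series with a known period of 4 time steps:

    import numpy as np

    def autocorr(x, lag):
        """Sample correlation of a series with itself shifted by `lag`."""
        x = np.asarray(x, dtype=float)
        return np.corrcoef(x[:-lag], x[lag:])[0, 1]

    # Synthetic series with a period of 4 time steps plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(80)
    x = np.sin(2 * np.pi * t / 4) + rng.normal(0, 0.3, t.size)

    # The correlogram: autocorrelation at successive lags; peaks at
    # multiples of 4 reveal the periodic signal.
    for lag in range(1, 9):
        print(lag, round(autocorr(x, lag), 2))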
Fourier Analysis
Also known as spectral analysis, this uses the exploration of cyclical patterns in data to decompose time series datasets into a spectrum of cycles of different lengths. This helps to uncover recurring cycles of different lengths in a time series which at first may look like random noise. It is possible to use the unfiltered datasets for the spectral analysis to retain the contribution of high-frequency signals. It is more efficient than autocorrelation as it uses variance (not correlation).
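A minimal periodogram sketch using NumPy's FFT, with a synthetic series in which a 10-step cycle is buried in noise:

    import numpy as np

    # Hypothetical unfiltered series: a 10-step cycle plus noise.
    rng = np.random.default_rng(0)
    n = 200
    t = np.arange(n)
    x = np.sin(2 * np.pi * t / 10) + rng.normal(0, 1.0, n)

    # Periodogram: the squared magnitude of the Fourier transform
    # distributes the series' variance across frequencies.
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per time step
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2

    # The dominant frequency should be near 0.1 (a 10-step cycle).
    peak = freqs[np.argmax(power)]
    print("dominant cycle length:", 1 / peak)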
Cluster Analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in
the same group (called a cluster) are more similar (in some sense or another) to each other than to
those in other groups (clusters). In other words, cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise.
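As an illustration, a minimal k-means sketch (one of many clustering algorithms; scipy assumed) that sorts hypothetical stations, described by rainfall and temperature, into two groups:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    # Two hypothetical groups of stations: (rainfall mm/yr, temperature C).
    rng = np.random.default_rng(0)
    wet = rng.normal([1800, 26], [100, 0.5], size=(20, 2))
    dry = rng.normal([600, 32], [80, 0.8], size=(20, 2))
    data = np.vstack([wet, dry])

    # k-means sorts the objects into k groups by minimising the
    # within-cluster distance to each group's centroid.
    centroids, labels = kmeans2(data, 2, minit="++")
    print(centroids)
    print(labels)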
Wavelet Analysis
Wavelet analysis is becoming a common tool for analyzing localized variations of power within a time series. By decomposing a time series into time-frequency space, one is able to determine both the dominant modes of variability and how those modes vary in time. The wavelet transform has been used in numerous studies in geophysics, including tropical convection, the El Niño–Southern Oscillation (ENSO), atmospheric cold fronts, rainfall and temperature, etc.
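A minimal sketch of a continuous wavelet transform, assuming the third-party PyWavelets (pywt) package and a synthetic series whose dominant cycle changes halfway through the record:

    import numpy as np
    import pywt  # PyWavelets, a third-party package

    # Synthetic series whose dominant cycle changes over time.
    t = np.arange(400)
    x = np.where(t < 200,
                 np.sin(2 * np.pi * t / 16),  # 16-step cycle early on
                 np.sin(2 * np.pi * t / 64))  # 64-step cycle later

    # Continuous wavelet transform with the Morlet wavelet: rows of
    # `coeffs` correspond to scales, columns to time, so |coeffs|^2
    # maps how the power at each scale varies through the record.
    scales = np.arange(1, 128)
    coeffs, freqs = pywt.cwt(x, scales, "morl")
    print(coeffs.shape)  # (len(scales), len(x))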
Concluding Remarks
Please note that all the major topics presented above will be illustrated with examples during the practical sessions, using relevant software, e.g. MS Excel, SPSS and MAKESENS (freeware), as contained in the tasks and tutorials for this course.