2 Data Analysis

The document discusses using big data analytics and the internet of things in the upstream oil and gas industry. It covers characteristics of big data, the data analysis cycle, and various techniques for exploratory data analysis including frequency tables, histograms, measures of location, spread and shape, bivariate analysis using scatterplots, and correlation.

Uploaded by

Het Patel

1/26/2023

Big Data Analytics & Internet of Things in Upstream Oil and Gas Industry

• Characteristics – 6 V's
• Volume, Variety, Velocity, Veracity, Value, and Variability

Data Analysis Cycle

Linear regression

Exploratory Data Analysis

Univariate Analysis
• Data speak most clearly when they are organized. Much of statistics, therefore, deals with the organization, presentation, and summary of data.
• We will use a small 10 × 10 m² patch of the exhaustive data set in all of our examples.
• In these examples, all of the U and V values have been rounded off to the nearest integer.

Frequency Tables and Histograms

• A frequency table records how often observed values fall within certain intervals or classes.
• The information presented in a frequency table can also be presented graphically in a histogram.
• It is common to use a constant class width for the histogram so that the height of each bar is proportional to the number of values within that class.
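
As a rough illustration of the frequency table described above, the sketch below counts values in constant-width classes with NumPy and prints a text histogram. The sample values, the class width of 10, and the helper name `frequency_table` are assumptions made for this example; they are not the U or V data from the slides. The last column accumulates counts from the highest class downward, which is one way to build a cumulative table.

```python
import numpy as np

def frequency_table(values, class_width, start=0.0):
    """Count the values in each constant-width class starting at `start`.

    Assumes every value is >= start.
    """
    values = np.asarray(values, dtype=float)
    # Enough equal-width classes to cover the largest value.
    n_classes = int(np.floor((values.max() - start) / class_width)) + 1
    edges = start + class_width * np.arange(n_classes + 1)
    counts, _ = np.histogram(values, bins=edges)
    return edges, counts

# Hypothetical integer-valued sample (not the data set used in the slides).
sample = [3, 7, 12, 15, 15, 18, 21, 24, 27, 29]
edges, counts = frequency_table(sample, class_width=10)

# Cumulative counts accumulated from the highest class downward.
cumulative = counts[::-1].cumsum()[::-1]

for lo, hi, n, c in zip(edges[:-1], edges[1:], counts, cumulative):
    # With a constant class width, bar length is proportional to the count.
    print(f"{lo:4.0f}-{hi:<4.0f} {int(n):3d} {int(c):3d} {'#' * int(n)}")
```

Because the class width is constant, the length of each printed bar can be read directly as a count.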

• Cumulative frequency tables and histograms may be prepared after ranking the data in descending order.

Summary Statistics

• The important features of most histograms can be captured by a few summary statistics.
• The summary statistics we use here fall into three categories:
– measures of location,
– measures of spread and
– measures of shape.

• Measures of location
– Mean
– Median
– Mode
• Measures of spread
– Variance
– Standard deviation
– Interquartile range
• Measures of shape
– Coefficient of skewness
– Coefficient of variation
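
The statistics listed above can be computed in a few lines. The following is a minimal sketch rather than the slides' own formulas, which did not survive in this copy: it assumes the population form of the variance (dividing by n), the moment-based coefficient of skewness, and NumPy's default linear interpolation for quartiles. The sample values and the function name `summary_statistics` are hypothetical.

```python
import numpy as np
from collections import Counter

def summary_statistics(values):
    """Location, spread and shape statistics for a 1-D sample."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    variance = x.var()              # assumption: divide by n, not n - 1
    std = np.sqrt(variance)
    q1, q3 = np.percentile(x, [25, 75])
    return {
        # Measures of location
        "mean": mean,
        "median": np.median(x),
        "mode": Counter(x.tolist()).most_common(1)[0][0],
        # Measures of spread
        "variance": variance,
        "std": std,
        "iqr": q3 - q1,
        # Measures of shape
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "cv": std / mean,
    }

# Hypothetical sample chosen so the results are easy to check by hand.
stats = summary_statistics([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name:>9}: {value:.4f}")
```

For this sample the mean is 5 and the standard deviation is 2, so the coefficient of variation is 0.4; the positive skewness reflects the longer tail toward high values.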

Both the mean and the median are measures of the location of the center of the distribution. The mean is quite sensitive to erratic high values.

• Lower and Upper Quartile
• In the same way that the median splits the data into halves, the quartiles split the data into quarters.
• If the data values are arranged in increasing order, then a quarter of the data falls below the lower or first quartile, Q1, and a quarter of the data falls above the upper or third quartile, Q3.
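
To make the sensitivity of the mean concrete, the sketch below compares the mean, median and quartiles of a small sample before and after one erratic high value is appended. The data and the helper `location_summary` are hypothetical, and the quartiles use NumPy's default linear interpolation, which is only one of several conventions.

```python
import numpy as np

def location_summary(values):
    """Mean plus the three quartiles (Q1, median, Q3) of a sample."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {"mean": x.mean(), "q1": q1, "median": median, "q3": q3}

# Hypothetical sample, then the same sample with one erratic high value.
base = location_summary([2, 4, 4, 4, 5, 5, 7, 9])
shifted = location_summary([2, 4, 4, 4, 5, 5, 7, 9, 90])

for key in ("mean", "q1", "median", "q3"):
    print(f"{key:>6}: {base[key]:7.3f} -> {shifted[key]:7.3f}")
```

The single value of 90 moves the mean from 5 to about 14.4, while the median only moves from 4.5 to 5.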

Measures of Spread

Measures of Shape

Coefficient of Variation

• The coefficient of variation, CV, is a statistic that is often used as an alternative to skewness to describe the shape of the distribution.
• It is defined as the ratio of the standard deviation to the mean.

2993 Core Porosity Measurements

Graphing Univariate Data

Example box and bean plots.

Bivariate Analysis

Comparing Two Distributions

• There are some rather major differences between the distributions of the two variables.
• The U distribution is positively skewed; the V distribution, on the other hand, is negatively skewed.
• The V values are generally higher than the U values, with a mean value more than five times that of U.
• The V median and standard deviation are also greater than their U counterparts.
• More details are observed using the quartile comparison of the distributions.

Scatterplots

• A scatterplot is also useful for drawing our attention to aberrant data.
• In the early stages of the study of a spatially continuous data set it is necessary to check and clean the data; the success of any estimation method depends on reliable data.
• Investigations of such unusual pairs will reveal errors that were most likely made when the data were collected or recorded.

Correlation

• There are three patterns one can observe on a scatterplot: the variables are either positively correlated, negatively correlated, or uncorrelated.
• Two variables are positively correlated if the larger values of one variable tend to be associated with larger values of the other variable, and similarly with the smaller values of each variable. In porous rocks, porosity and permeability are typically positively correlated.
• Two variables are negatively correlated if the larger values of one variable tend to be associated with the smaller values of the other. In geological data sets, the concentrations of two major elements are often negatively correlated.
• The final possibility is that the two variables are not related. An increase in one variable has no apparent effect on the other.

• Covariance:
• Correlation Coefficient:
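
The formulas for the covariance and the correlation coefficient did not survive in this copy of the slides. As a hedged sketch, the code below uses the common definitions: the covariance as the average product of deviations from the two means (dividing by n, which is an assumption), and the correlation coefficient as the covariance divided by the product of the two standard deviations. The paired values are hypothetical.

```python
import numpy as np

def covariance(x, y):
    """Average product of deviations from the means (divides by n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation_coefficient(x, y):
    """Covariance scaled by both standard deviations; lies in [-1, 1]."""
    return covariance(x, y) / (np.std(x) * np.std(y))

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = covariance(x, y)
r_pos = correlation_coefficient(x, y)                 # larger x with larger y
r_neg = correlation_coefficient(x, [-v for v in y])   # relationship reversed

print(f"covariance = {cov_xy:.3f}")
print(f"r (positive case) = {r_pos:.3f}")
print(f"r (negative case) = {r_neg:.3f}")
```

Flipping the sign of one variable flips the sign of the coefficient but not its magnitude, which matches the positive and negative patterns described above.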

Rank Correlation Coefficient (RCC)

• If the variables of interest are related in a nonlinear manner, then the rank correlation coefficient (RCC) can be used as a more robust measure of (nonlinear) association.
• Rank transformation implies assigning rank 1 to the smallest value, rank 2 to the next highest value, and so on.
• This is the simplest nonparametric linearizing technique that does not require assuming any functional form for the relationship.
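
The rank transformation described above can be sketched as follows: rank each variable, then compute the ordinary correlation coefficient on the ranks. The helper names and the synthetic exponential data are assumptions for illustration, and this simple ranking assumes there are no tied values (ties would need averaged ranks).

```python
import numpy as np

def rank_transform(values):
    """Assign rank 1 to the smallest value, rank 2 to the next, and so on.

    Assumes no tied values.
    """
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def rank_correlation(x, y):
    """Correlation coefficient computed on the rank-transformed data."""
    return np.corrcoef(rank_transform(x), rank_transform(y))[0, 1]

# Synthetic, strictly increasing but strongly nonlinear relationship.
x = np.arange(1.0, 11.0)
y = np.exp(0.8 * x)

cc = np.corrcoef(x, y)[0, 1]    # linear correlation on the raw values
rcc = rank_correlation(x, y)    # correlation on the ranks

print(f"CC  = {cc:.3f}")
print(f"RCC = {rcc:.3f}")
```

Because the relationship is monotonic, the ranks line up perfectly and the RCC is 1, while the CC on the raw values is lower because the trend is not a straight line.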

• CC is also referred to as the Pearson correlation coefficient.
• RCC is referred to as the Spearman correlation coefficient.
• The Pearson correlation coefficient will be much more sensitive to data clusters and outliers compared with the Spearman correlation coefficient.
• So, it is often desirable to compute both measures to examine the robustness of the correlation.

Graphing Bivariate Data

Figure panels: strong positive linear trend; strong negative linear trend; weak negative correlation; modest positive correlation.

Fig. 2.5A shows the porosity-permeability scatterplot for this dataset, indicating an apparent exponential relationship.
On the other hand, the same data after rank transformation show a much stronger linear trend.
The Pearson CC value for these data is 0.789, which reflects the strength of the linear trend.
The Spearman CC value is 0.916, reflecting the strength of the rank-transformed linear trend.

MULTIVARIATE DATA

• The analysis of correlation in multivariate data is a simple extension of the concepts discussed previously for bivariate data.
• This involves calculating the Pearson (or Spearman) CC for all variable pairs and presenting it in the form of a correlation matrix.

Similarly, the concept of scatterplots for data visualization can be generalized to a scatterplot matrix or a pairs plot, which is generated by combining scatterplots of all variable pairs to show their interrelationship.
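
A correlation matrix of the kind described above can be sketched with NumPy's `corrcoef`, which returns the Pearson CC for every pair of columns. The three variables below are synthetic and their names are hypothetical; a Spearman version could be obtained by rank-transforming each column first.

```python
import numpy as np

# Synthetic data: three hypothetical variables measured on 50 samples.
rng = np.random.default_rng(0)
n = 50
porosity = rng.uniform(0.05, 0.30, n)
log_perm = 20.0 * porosity + rng.normal(0.0, 0.5, n)
density = 2.65 - 1.65 * porosity + rng.normal(0.0, 0.02, n)

names = ["porosity", "log_perm", "density"]
data = np.column_stack([porosity, log_perm, density])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(data, rowvar=False)

print(" " * 10 + "".join(f"{name:>10}" for name in names))
for name, row in zip(names, corr):
    print(f"{name:>10}" + "".join(f"{value:10.3f}" for value in row))
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be inspected.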

Linear Regression

MULTI-ATTRIBUTE ANALYSIS

• Seismic attributes are any parameters derived from the seismic data that help us to enhance or quantify the features of interpretation interest.
• Seismic attributes are generally mathematical transforms of the seismic trace data.
• Benefits of using seismic attributes are:
– The attributes are nonlinear, thus increasing the predictive power of the technique.
– Benefit in breaking down the input data into component parts.
• Seismic attributes are divided into two categories:
– Horizon-based attributes, the average properties of the seismic trace between two boundaries, generally defined by picked horizons
– Sample-based attributes, the transforms of the input trace in such a way as to produce another output trace with the same number of samples as the input.

CROSSPLOT ANALYSIS

• Attribute analysis consists of target logs and attributes, which do not have a direct mathematical relation.
• Assuming a linear relationship between the target log and the attribute, a straight line may be fit by regression:
• The coefficients a and b in this equation may be derived by minimizing the mean-squared prediction error:
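
The regression equations themselves are missing from this copy of the slides. As a hedged sketch of the crossplot fit, the code below assumes the straight line has the form target = a + b × attribute and derives a and b from the closed-form least-squares solution, which minimizes the mean-squared prediction error. The function name and the data are hypothetical.

```python
import numpy as np

def fit_line(attribute, target):
    """Least-squares coefficients a, b for target ≈ a + b * attribute."""
    x = np.asarray(attribute, dtype=float)
    y = np.asarray(target, dtype=float)
    # Slope: covariance of (x, y) divided by the variance of x.
    b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
    # Intercept: the fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b

# Hypothetical attribute values and target-log values.
attribute = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
target = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

a, b = fit_line(attribute, target)
predicted = a + b * attribute
mse = np.mean((target - predicted) ** 2)   # mean-squared prediction error

print(f"a = {a:.3f}, b = {b:.3f}, mse = {mse:.3f}")
```

Any other choice of a and b gives a larger mean-squared error on these points, which is what "minimizing the mean-squared prediction error" means in practice.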

CROSSPLOT ANALYSIS FOR MULTIPLE ATTRIBUTES

• The crossplot analysis for multiple attributes can be extended from the single attribute analysis. At each time sample, the target log is modeled by the linear equation
• The weights in this equation may be derived by minimizing the mean-squared prediction error

PROBABILISTIC NEURAL NETWORK

• A probabilistic neural network is one of the techniques used to derive a nonlinear relation between the target log and the attributes
• The operations are organized into a network with three layers:
– Input layer
– Hidden layer
– Output layer
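
For the multi-attribute linear model described before the neural network, the equation is missing from this copy, so the sketch below assumes the form target = w0 + w1·A1 + ... + wM·AM with a constant term w0. The weights are found with `numpy.linalg.lstsq`, which minimizes the squared prediction error. The function name and the synthetic attributes are hypothetical.

```python
import numpy as np

def fit_multi_attribute(attributes, target):
    """Least-squares weights [w0, w1, ..., wM] for a linear attribute model."""
    A = np.asarray(attributes, dtype=float)
    L = np.asarray(target, dtype=float)
    # A leading column of ones carries the constant term w0 (an assumption).
    design = np.column_stack([np.ones(len(A)), A])
    weights, *_ = np.linalg.lstsq(design, L, rcond=None)
    return weights

# Synthetic example: 40 time samples, 2 attributes, noise-free target log.
rng = np.random.default_rng(1)
attributes = rng.normal(size=(40, 2))
target = 1.5 + 2.0 * attributes[:, 0] - 0.5 * attributes[:, 1]

weights = fit_multi_attribute(attributes, target)
predicted = np.column_stack([np.ones(len(attributes)), attributes]) @ weights
mse = np.mean((target - predicted) ** 2)

print("weights:", np.round(weights, 3))
print(f"mean-squared prediction error: {mse:.2e}")
```

With noise-free synthetic data the fit recovers the weights used to build the target; with real logs the error would not vanish, and a nonlinear method such as the probabilistic neural network may fit better.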
