
2 Data Analysis

The document discusses using big data analytics and the internet of things in the upstream oil and gas industry. It covers characteristics of big data, the data analysis cycle, and various techniques for exploratory data analysis including frequency tables, histograms, measures of location, spread and shape, bivariate analysis using scatterplots, and correlation.

Uploaded by

Het Patel

1/26/2023

Big Data Analytics
&
Internet of Things
in
Upstream Oil and Gas Industry

• Characteristics – 6 V’s
• Volume, Variety, Velocity, Veracity, Value, and Variability


Data Analysis Cycle

Linear regression


Exploratory Data Analysis


Univariate Analysis
• Data speak most clearly when they are organized.
Much of statistics, therefore, deals with the
organization, presentation, and summary of data.
• We will use a small 10 × 10 m² patch of the
exhaustive data set in all of our examples.
• In these examples, all of the U and V values have
been rounded off to the nearest integer.


Frequency Tables and Histograms


• A frequency table records how often observed
values fall within certain intervals or classes.
• The information presented in a frequency table can
also be presented graphically in a histogram.
• It is common to use a constant class width for
the histogram so that the height of each bar is
proportional to the number of values within
that class.
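
A minimal sketch of a frequency table with a constant class width, using NumPy. The sample values and the class width of 10 are hypothetical and are not taken from the data set used in the slides.

```python
import numpy as np

# Hypothetical integer-valued sample (not the slides' data set).
values = np.array([3, 7, 12, 15, 15, 18, 21, 24, 24, 27, 33, 38])

# Constant class width of 10: classes [0,10), [10,20), [20,30), [30,40].
edges = np.arange(0, 50, 10)
counts, edges = np.histogram(values, bins=edges)

# Frequency table: class limits, number of values, percentage of total,
# plus a text bar whose length is proportional to the count.
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    pct = 100.0 * n / values.size
    print(f"{int(lo):>3}-{int(hi):<3} {int(n):>3} {pct:5.1f}%  " + "#" * int(n))
```

Because every class has the same width, the bar lengths can be compared directly as counts.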


• Cumulative frequency tables and histograms may
be prepared after ranking the data in descending
order.

Summary Statistics

• The important features of most histograms
can be captured by a few summary statistics.
• The summary statistics we use here fall into
three categories:
– measures of location,
– measures of spread and
– measures of shape.


• Measures of location
– Mean
– Median
– Mode
• Measures of spread
– Variance
– Standard deviation
– Interquartile range
• Measures of shape
– Coefficient of skewness
– Coefficient of variation

Both the mean and the median are measures of the location of the
center of the distribution. The mean is quite sensitive to erratic
high values.

• Lower and Upper Quartile
• In the same way that the median splits the data into halves, the
quartiles split the data into quarters.
• If the data values are arranged in increasing order, then a quarter
of the data falls below the lower or first quartile, Q1, and a
quarter of the data falls above the upper or third quartile, Q3.
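
A short sketch of the location measures on a hypothetical ten-value sample. The quartiles here use NumPy's default linear interpolation, which is only one of several conventions, so other tools may give slightly different Q1 and Q3 values.

```python
import statistics
import numpy as np

# Hypothetical sample, already in increasing order.
data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 13]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # midpoint of the ordered data
mode = statistics.mode(data)      # most frequent value

# Lower and upper quartiles (NumPy default: linear interpolation).
q1, q3 = np.percentile(data, [25, 75])

# Replace the largest value with an erratic high value: the mean shifts
# noticeably, while the median does not move.
data_outlier = data[:-1] + [130]
mean_outlier = statistics.mean(data_outlier)
median_outlier = statistics.median(data_outlier)

print("mean", mean, "median", median, "mode", mode)
print("Q1", q1, "Q3", q3)
print("with outlier: mean", mean_outlier, "median", median_outlier)
```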

Measures of Spread
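
A minimal sketch of the three spread measures listed earlier, on a hypothetical sample. It assumes the population form of the variance (division by n); passing `ddof=1` to NumPy would give the sample form instead.

```python
import numpy as np

# Hypothetical sample (not the slides' data set).
data = np.array([2, 3, 3, 4, 5, 6, 7, 8, 9, 13], dtype=float)

# Variance: average squared difference from the mean (divides by n).
variance = data.var()

# Standard deviation: square root of the variance, in the data's units.
std = data.std()

# Interquartile range: distance between upper and lower quartiles.
# Unlike the variance, it does not use the mean as the center.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(f"variance={variance:.3f} std={std:.3f} IQR={iqr:.3f}")
```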


Measures of Shape
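
A hedged sketch of the two shape measures listed earlier. It assumes the common moment-based coefficient of skewness (the average cubed deviation from the mean divided by the cube of the standard deviation) and the coefficient of variation as standard deviation over mean. The sample is hypothetical.

```python
import numpy as np

# Hypothetical positive-valued sample with a tail toward high values.
data = np.array([2, 3, 3, 4, 5, 6, 7, 8, 9, 13], dtype=float)

mean = data.mean()
std = data.std()  # population form (divides by n), an assumption here

# Coefficient of skewness: positive for a tail toward high values,
# negative for a tail toward low values.
skewness = np.mean((data - mean) ** 3) / std ** 3

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(f"skewness={skewness:.3f} CV={cv:.3f}")
```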


Coefficient of Variation

• The coefficient of variation, CV, is a statistic that is
often used as an alternative to skewness to describe
the shape of the distribution.
• It is defined as the ratio of the standard deviation to
the mean.

2993 Core Porosity Measurements


Graphing Univariate Data

Example box and bean plots.

Bivariate Analysis


Comparing Two Distributions

• There are some rather major differences between the distributions
of the two variables.
• The U distribution is positively skewed; the V distribution, on the
other hand, is negatively skewed.
• The V values are generally higher than the U values, with a mean value
more than five times that of U.
• The V median and standard deviation are also greater than their U
counterparts.
• More details are observed using the quartile comparison of the
distributions.

Scatterplots

• A scatterplot is also useful for drawing our attention to aberrant data.
• In the early stages of the study of a spatially continuous data set it is
necessary to check and clean the data; the success of any estimation
method depends on reliable data.
• Investigations of such unusual pairs will reveal errors that were most
likely made when the data were collected or recorded.

Correlation

• There are three patterns one can observe on a scatterplot: the
variables are either positively correlated, negatively
correlated, or uncorrelated.
• Two variables are positively correlated if the larger values of
one variable tend to be associated with larger values of the
other variable, and similarly with the smaller values of each
variable. In porous rocks, porosity and permeability are
typically positively correlated.
• Two variables are negatively correlated if the larger values of
one variable tend to be associated with the smaller values of
the other. In geological data sets, the concentrations of two
major elements are often negatively correlated.
• The final possibility is that the two variables are not related. An
increase in one variable has no apparent effect on the other.

• Covariance:
• Correlation Coefficient:
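
A hedged sketch of the two quantities, assuming their usual definitions: the covariance as the average product of deviations from the two means, and the correlation coefficient as the covariance divided by the product of the two standard deviations. The paired values are hypothetical.

```python
import numpy as np

# Hypothetical paired observations of two variables.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Covariance: mean product of deviations from the means (divides by n).
cov_uv = np.mean((u - u.mean()) * (v - v.mean()))

# Correlation coefficient: covariance scaled by both standard
# deviations, so it is unitless and lies between -1 and +1.
rho = cov_uv / (u.std() * v.std())

print(f"covariance={cov_uv:.3f} correlation={rho:.3f}")
```

The same coefficient is returned by `np.corrcoef(u, v)[0, 1]`, which can serve as a cross-check.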

Rank Correlation Coefficient (RCC)


• If the variables of interest are related in a nonlinear
manner, then the rank correlation coefficient (RCC) can be
used as a more robust measure of (nonlinear) association.
• Rank transformation implies assigning rank 1 to the
smallest value, rank 2 to the next-smallest value, and so
on.
• This is the simplest nonparametric linearizing technique
that does not require assuming any functional form for
the relationship.
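
The rank transformation described above can be sketched as follows. The helper names `rank_transform` and `rank_correlation` are hypothetical, the data are made up, and the sketch assumes there are no tied values (ties would need averaged ranks).

```python
import numpy as np

def rank_transform(x):
    """Assign rank 1 to the smallest value, rank 2 to the next, and so on.

    Assumes no ties among the values.
    """
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def rank_correlation(x, y):
    """Correlation coefficient computed on the ranks of x and y."""
    return np.corrcoef(rank_transform(x), rank_transform(y))[0, 1]

# Hypothetical monotone but strongly nonlinear relationship.
x = np.array([0.05, 0.10, 0.15, 0.20, 0.25])
y = np.exp(20.0 * x)

print("ranks of y:", rank_transform(y))
print("linear CC :", np.corrcoef(x, y)[0, 1])
print("rank CC   :", rank_correlation(x, y))
```

No functional form is assumed anywhere: the ranks depend only on the ordering of the values.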


• CC is also referred to as the Pearson correlation coefficient.
• RCC is referred to as the Spearman correlation coefficient.
• The Pearson correlation coefficient will be much more
sensitive to data clusters and outliers compared with the
Spearman correlation coefficient.
• So, it is often desirable to compute both measures to
examine the robustness of the correlation.

Graphing Bivariate Data

Example scatterplot panels: strong positive linear trend, strong negative
linear trend, weak negative correlation, modest positive correlation.


Fig. 2.5A shows the porosity-permeability scatterplot for this dataset, indicating an
apparent exponential relationship.
The same data after rank transformation, on the other hand, show a much stronger linear
trend.
The Pearson CC for these data is 0.789, which reflects the strength of the linear trend.
The Spearman CC is 0.916, reflecting the strength of the rank-transformed linear trend.
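
A sketch of the same comparison on synthetic data. The porosity range, the exponential trend, and the noise level are all assumptions chosen for illustration; this is not the dataset behind the figure, so the coefficients will differ from 0.789 and 0.916.

```python
import numpy as np

def ranks(x):
    # Rank 1 for the smallest value; continuous data, so no ties expected.
    order = np.argsort(x)
    r = np.empty(x.size)
    r[order] = np.arange(1, x.size + 1)
    return r

rng = np.random.default_rng(42)

# Synthetic porosity (fraction) and an exponential permeability trend
# with scatter in log space. All constants are hypothetical.
porosity = rng.uniform(0.05, 0.30, size=200)
log10_perm = 15.0 * porosity - 1.0 + rng.normal(0.0, 0.3, size=200)
permeability = 10.0 ** log10_perm

# Pearson CC on the raw values, Spearman CC on the ranks.
pearson = np.corrcoef(porosity, permeability)[0, 1]
spearman = np.corrcoef(ranks(porosity), ranks(permeability))[0, 1]

print(f"Pearson CC  = {pearson:.3f}")
print(f"Spearman CC = {spearman:.3f}")
```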


MULTIVARIATE DATA

• The analysis of correlation in multivariate data is a simple extension of
the concepts discussed previously for bivariate data.
• This involves calculating the Pearson (or Spearman) CC for all variable
pairs and presenting it in the form of a correlation matrix.

Similarly, the concept of scatterplots for data visualization can be
generalized to a scatterplot matrix or a pairs plot, which is generated by
combining scatterplots of all variable pairs to show their
interrelationship.
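
A minimal sketch of a correlation matrix for three variables, using `np.corrcoef` with one column per variable. The variables and their relationships are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Three hypothetical variables: the second depends on the first,
# the third is generated independently of both.
porosity = rng.uniform(0.05, 0.30, n)
log_perm = 15.0 * porosity + rng.normal(0.0, 0.3, n)
depth = rng.uniform(1000.0, 2000.0, n)

names = ["porosity", "log_perm", "depth"]
data = np.column_stack([porosity, log_perm, depth])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(data, rowvar=False)

# Print the Pearson CC for every variable pair as a matrix.
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for name, row in zip(names, corr):
    print(f"{name:>10}" + "".join(f"{value:10.3f}" for value in row))
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle carries information.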

Linear Regression


MULTI-ATTRIBUTE ANALYSIS

• Seismic attributes are any parameters derived from the seismic
data that help us to enhance or quantify the features of
interpretation interest.
• Seismic attributes are generally mathematical transforms of the
seismic trace data.
• Benefits of using seismic attributes are
– The attributes are nonlinear, thus increasing the predictive power of
the technique.
– Benefit in breaking down the input data into component parts.
• Seismic attributes can be divided into two categories:
– Horizon-based attributes, the average properties of the seismic
trace between two boundaries, generally defined by picked horizons
– Sample-based attributes, the transforms of the input trace in such a
way as to produce another output trace with the same number of
samples as the input.

CROSSPLOT ANALYSIS

• Attribute analysis consists of target logs and attributes,
which do not have a direct mathematical relation.
• Assuming a linear relationship between the target log and the
attribute, a straight line may be fit by regression:
• The coefficients a and b in this equation may be derived by
minimizing the mean-squared prediction error:
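
A hedged sketch of the straight-line fit, assuming the line has the form y = a + b·x, with a the intercept, b the slope, x the attribute, and y the target log. That form and the sample values are assumptions for illustration. Minimizing the mean-squared prediction error then has a closed-form solution.

```python
import numpy as np

# Hypothetical attribute values (x) and target-log values (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])

# Minimizing mean((y - a - b*x)**2) over a and b gives:
#   b = cov(x, y) / var(x),   a = mean(y) - b * mean(x)
b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
a = y.mean() - b * x.mean()

# Mean-squared prediction error of the fitted line.
mse = np.mean((y - (a + b * x)) ** 2)

print(f"a={a:.3f} b={b:.3f} mean-squared error={mse:.4f}")
```

`np.polyfit(x, y, 1)` returns the same slope and intercept and can be used as a cross-check.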

CROSSPLOT ANALYSIS FOR MULTIPLE ATTRIBUTES

• The crossplot analysis for multiple attributes can be
extended from the single attribute analysis. At each time
sample, the target log is modeled by the linear equation
• The weights in this equation may be derived by minimizing
the mean-squared prediction error

PROBABILISTIC NEURAL NETWORK

• The probabilistic neural network is one of the techniques
to derive a nonlinear relation between the target log
and the attributes
• The operations are organized into a network with three
layers:
– Input layer
– Hidden layer
– Output layer
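
For the linear multi-attribute case described above, a hedged sketch: it assumes the model L = w0 + w1·A1 + w2·A2 + w3·A3, with one constant weight plus one weight per attribute, and solves for the weights by ordinary least squares. The attributes, the number of attributes, and the "true" weights are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # number of time samples

# Three hypothetical attributes, one column each.
attributes = rng.normal(size=(n, 3))

# Design matrix: a column of ones for the constant weight w0,
# followed by the attribute columns.
X = np.column_stack([np.ones(n), attributes])

# Synthetic, noise-free target log built from known weights so that
# the recovered weights can be checked against them.
true_w = np.array([2.0, 0.5, -1.0, 0.25])
target_log = X @ true_w

# Least squares minimizes the mean-squared prediction error.
w, *_ = np.linalg.lstsq(X, target_log, rcond=None)

print("recovered weights:", np.round(w, 3))
```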
