LO3 - TASK 2&3: Statistics and Financial Decisions
1) Linear Regression: It is used when we want to predict the value of one variable based on the
value of another variable. The variable we want to predict is called the dependent variable
(or sometimes the outcome variable); the variable used to predict it is the independent variable.
Advantages:
1. Linear regression is simple to implement, and its output coefficients are easy to interpret.
2. When the relationship between the independent and dependent variables is known to be
linear, this algorithm is a good choice because it is less complex than other algorithms.
3. In addition, it works in most cases. Even when it doesn't fit the data exactly, we can use it
to find the nature of the relationship between the two variables.
Disadvantages:
1. Outliers can have a huge effect on the regression, and the technique can only model
linear boundaries.
2. Linear regression looks only at the relationship between the mean of the dependent
variable and the independent variables. Just as the mean is not a complete description of a
single variable, linear regression is not a complete description of relationships among
variables.
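2) Moving Average: The forecast for the next period is the average of the actual values of the
most recent periods (for example, the last three).
Advantages:
1. It is less prone to whipsawing up and down in response to slight, temporary price swings.
2. Moving averages can be used for measuring the trend of any series. This method is
applicable to linear as well as non-linear trends.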
Disadvantages:
1. The trend obtained by moving averages is generally neither a straight line nor a standard
curve; for this reason, the trend cannot be extended to forecast future values. Trend
values are not available for some periods at the start and at the end of the time
series, so this method is not applicable to short time series.
2. Some of the data used to compute the moving average might be old or stale.
3) Naïve: A forecasting technique in which the last period's actuals are used as this period's
forecast, without adjusting them or attempting to establish causal factors. It is used only for
comparison with the forecasts generated by more sophisticated techniques.
Advantages:
1. It gives valuable insight into the data.
2. Its efficiency and accuracy have led to its widespread use.
3. It can decrease costs.
Disadvantages:
1. It does not take emergency or unusual conditions into account.
2. Forecasts are never 100% accurate.
3. It can be time-consuming and resource-intensive.
4) Correlation: It is used to describe the linear relationship between two continuous variables
(e.g., height and weight). In general, correlation tends to be used when there is no identified
response variable. It measures the strength (quantitatively) and direction of the linear
relationship between two or more variables.
Advantages:
1. It can show the strength of the relationship between two variables.
2. It allows the study of behavior that cannot be studied experimentally.
3. It yields quantitative data that can be easily analyzed.
Disadvantages:
1. It cannot show cause and effect (which variable controls which).
2. There is no control over a third variable that might affect the correlation.
Scenario 1:
Naïve: 10500
Moving average: (10500+11000+12000)/3 = 11166.67
Linear regression: Y = -785.71*2018 + 1,595,285.71 = 9722.93 (using the rounded coefficients shown)
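The three Scenario 1 forecasts can be reproduced with a minimal Python sketch. Only the three most recent actuals are recoverable from the text, and their order (oldest first, with 10500 as the latest) is an assumption. The function names are illustrative, and the slope and intercept are the rounded values quoted above, so the trend result may differ slightly from a spreadsheet that keeps more decimal places.

```python
# Scenario 1 sketch. Assumption: the values are ordered oldest first,
# so 10500 is the most recent actual.
recent_actuals = [12000, 11000, 10500]


def naive_forecast(actuals):
    """Naive method: the forecast equals the last period's actual."""
    return actuals[-1]


def moving_average_forecast(actuals, window=3):
    """Simple moving average of the last `window` actuals."""
    last = actuals[-window:]
    return sum(last) / len(last)


def trend_forecast(year, slope=-785.71, intercept=1595285.71):
    """Evaluate the fitted trend line Y = slope * year + intercept."""
    return slope * year + intercept


print("Naive:", naive_forecast(recent_actuals))
print("Moving average:", round(moving_average_forecast(recent_actuals), 2))
print("Linear regression:", round(trend_forecast(2018), 2))
```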
Scenario 2:
Year        Production volume (*1000)   Total quantity of inventory (*1000)
1           100                         20
2           120                         27
3           150                         36
4           200                         50
Forecast    250                         65.2267
Correlation: =CORREL
Linear Regression (Production = 250): Y = 0.2974*250 - 9.1233 = 65.2267
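The Scenario 2 regression line and the correlation coefficient can be checked from the four data rows with the ordinary least-squares formulas. The sketch below is one way to do it in Python. The function name `least_squares` is illustrative, and the correlation is Pearson's r, which is assumed to be what the spreadsheet CORREL call returns.

```python
from math import sqrt

production = [100, 120, 150, 200]  # production volume (*1000)
inventory = [20, 27, 36, 50]       # total quantity of inventory (*1000)


def least_squares(x, y):
    """Return slope, intercept and Pearson r for the line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(production, inventory)
forecast = slope * 250 + intercept  # predicted inventory for production = 250

print("slope:", round(slope, 4))
print("intercept:", round(intercept, 4))
print("correlation:", round(r, 4))
print("forecast:", round(forecast, 4))
```

With unrounded coefficients the forecast comes out at about 65.22, slightly below the 65.2267 obtained when the slope is first rounded to 0.2974.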
Scenario 3:
n=20000
M=5
σ =0.1
LO4
Identify different types of charts / tables available to communicate different categories of variables.
1. Summary table: A summary table is a visualization that summarizes statistical information about
data in table form. Like other visualizations, it can be set up to display only the data limited by
one or more markings in other visualizations (details visualizations). A summary table can also be
restricted by one or more filters.
2. Frequency Distribution table: A frequency distribution is a representation, in either graphical or
tabular format, that shows the number of observations within a given interval. The size of the
interval depends on the data being evaluated and the analyst's objectives. The intervals must be
mutually exclusive and exhaustive. Frequency distributions are usually used in a statistical context
and are often associated with the charting of a normal distribution.
3. Contingency table: A table in which data are tabulated by row according to one variable
and by column according to another variable. It is used in particular in the
analysis of the association between variables.
4. Ordered array: The elements of an ordered array are arranged in ascending or descending order.
An ordered array may contain duplicate values.
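As an illustration of an ordered array and a frequency distribution table, the sketch below uses the performance scores from Survey 1 in this document. The class intervals (width 10, starting at 20) are an assumption chosen for the example, and the function name is illustrative.

```python
# Performance scores from Survey 1.
scores = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]


def frequency_distribution(values, start, width, bins):
    """Count values in mutually exclusive intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = (v - start) // width
        if 0 <= index < bins:
            counts[index] += 1
    # Labels such as 20-29 assume integer scores.
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(bins)]


ordered = sorted(scores)  # ordered array, ascending
table = frequency_distribution(scores, start=20, width=10, bins=8)

print("Ordered array:", ordered)
for low, high, count in table:
    print(f"{low}-{high}: {count}")
```

Because the intervals do not overlap and together cover every score, the counts add up to the number of observations (15).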
After organizing the data, you must visualize it. Here are some ways of visualizing data:
1) Pie chart: A pie chart (or circle chart) is a circular statistical graph that is divided into slices to
show numerical proportions. The arc length of each slice (and thus its central angle and area) in a pie
chart is proportional to the quantity it represents.
2) Stem and leaf: A stem-and-leaf display is a table used for viewing data. On the left is the 'stem', which
indicates the first digit or digits. On the right is the 'leaf', which indicates the last digit.
3) Bar chart: A bar chart or bar graph presents categorical data with rectangular bars whose
heights or lengths are proportional to the values they represent. The bars can be plotted
vertically or horizontally. A bar graph shows comparisons between groups.
4) Scatter plot: A scatter plot is a set of points plotted on a horizontal and a vertical axis. Scatter
plots are important in statistics because they can show the degree of correlation, if any, between the
values of observed quantities or phenomena.
5) Histogram: A frequency distribution shows how often each different value in a data set occurs.
A histogram is the most widely used graph for showing frequency distributions. It looks very much
like a bar chart, but there are important differences between them.
Use the appropriate tables/charts in order to present and communicate the following variables:
Survey 1:
-One variable: (major field of study)
For one categorical variable, a summary table is the simplest and easiest way to organize the data:
For two categorical variables, the best way to organize the data is a contingency table:
H 23
G 35
E 53
N 60
O 63
M 65
A 70
L 70
F 78
B 80
J 80
K 85
I 90
C 95
D 98
Performance
Stem Leaf
2 3
3 5
4
5 3
6 0 3 5
7 0 0 8
8 0 0 5
9 0 5 8
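The stem-and-leaf display above can be rebuilt from the performance scores with a short Python sketch. It assumes two-digit scores, so the stem is the tens digit and the leaf is the units digit; the function name is illustrative.

```python
from collections import defaultdict

# Performance scores from the ordered array above.
scores = [23, 35, 53, 60, 63, 65, 70, 70, 78, 80, 80, 85, 90, 95, 98]


def stem_and_leaf(values):
    """Group values by tens digit (stem) and list the units digits (leaves)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include empty stems (such as 4) so gaps in the data stay visible.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}


display = stem_and_leaf(scores)
for stem, leaf_list in display.items():
    print(stem, " ".join(str(leaf) for leaf in leaf_list))
```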
With two numerical variables, we use an ordinary table to organize the data because it is the easiest
way to read the data once it has been organized.
A 23 350
B 35 500
C 53 600
D 60 500
E 63 650
F 65 1200
G 70 1000
H 70 1200
I 78 1000
J 80 1200
K 80 1400
L 85 1350
M 90 1500
N 95 1400
O 98 1200
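The trendline and R² for salary against performance can be recomputed from the table above. The following is a minimal least-squares sketch in Python; the variable and function names are illustrative, and the columns are assumed to be performance score followed by salary in £.

```python
# (performance, salary in £) pairs from the table above, rows A to O.
data = [
    (23, 350), (35, 500), (53, 600), (60, 500), (63, 650),
    (65, 1200), (70, 1000), (70, 1200), (78, 1000), (80, 1200),
    (80, 1400), (85, 1350), (90, 1500), (95, 1400), (98, 1200),
]


def fit_line(points):
    """Ordinary least squares for y = slope*x + intercept, plus R squared."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # valid for simple linear regression
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(data)
print(f"f(x) = {slope:.2f} x + ({intercept:.2f})")
print(f"R squared = {r_squared:.2f}")
```

An R² of about 0.76 means that roughly three quarters of the variation in salary is explained by the linear relationship with performance.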
Figure: Salary Vs Performance (scatter plot of Salary in £ on the vertical axis against Performance on the horizontal axis, with linear trendline f(x) = 15.85 x − 100.91, R² = 0.76)