SFM A1.1
I. Part A
The included Excel file (called “House Price Data Project”) contains data on
property prices and various attributes of the properties. Data is available for 9151
property sales in Bristol for the years 2019 and 2020.
Inferential statistics: Statistical inference is the process through which a sample of data is
used to create estimates and test hypotheses about a population's characteristics.
2. Data
2.1. Data sources:
Internal source: data from the company's own database, such as personnel records,
production records, and sales records, makes up the bulk of this kind of source.
External source: data obtained from outside the company. Such data must be
purchased or obtained through agreements with other parties, and it can be
accessed only with the consent of both parties.
Statistical data is collected to meet the needs of businesses and organizations that
require more detailed and targeted data. Statistical data falls into two categories:
observational data and experimental data.
Observational data: a company observes a particular situation or period, collects
data on the factors of interest, and then performs a statistical analysis of the
results. Surveys and public opinion polls may be used to collect observational data.
Experimental data: unlike observational data, experimental data is collected in a
controlled environment. As a result, experimental results tend to give more
accurate information about the factors of interest.
II. Part B
2.1 Qualitative variables
age
Category        Frequency   Percent   Valid Percent   Cumulative Percent
1900-1929            1856      20.3            20.3                 20.3
1930-1949            1773      19.4            19.4                 39.7
1950-1966            1338      14.6            14.6                 54.3
1967-1975             714       7.8             7.8                 62.1
1976-1982             223       2.4             2.4                 64.5
1983-1990             332       3.6             3.6                 68.1
1991-1995             142       1.6             1.6                 69.7
1996-2002             271       3.0             3.0                 72.7
2003-2006             517       5.6             5.6                 78.3
2007 onwards          275       3.0             3.0                 81.3
before 1900          1710      18.7            18.7                100.0
Total                9151     100.0           100.0
postcode
Category        Frequency   Percent   Valid Percent   Cumulative Percent
BS1                   186       2.0             2.0                  2.0
BS10                  504       5.5             5.5                  7.5
BS11                  354       3.9             3.9                 11.4
BS13                  546       6.0             6.0                 17.4
BS14                  645       7.0             7.0                 24.4
BS15                  263       2.9             2.9                 27.3
BS16                  602       6.6             6.6                 33.9
BS2                   220       2.4             2.4                 36.3
BS3                  1058      11.6            11.6                 47.8
BS4                  1058      11.6            11.6                 59.4
BS5                  1178      12.9            12.9                 72.3
BS6                   588       6.4             6.4                 78.7
BS7                   766       8.4             8.4                 87.1
BS8                   391       4.3             4.3                 91.3
BS9                   792       8.7             8.7                100.0
Total                9151     100.0           100.0
year
Category        Frequency   Percent   Valid Percent   Cumulative Percent
2019                 5137      56.1            56.1                 56.1
2020                 4014      43.9            43.9                100.0
Total                9151     100.0           100.0
flat
Category        Frequency   Percent   Valid Percent   Cumulative Percent
0                    7189      78.6            78.6                 78.6
1                    1962      21.4            21.4                100.0
Total                9151     100.0           100.0
Flats account for 21.4% of the sample, while all other property types together
account for 78.6%.
detached
Category        Frequency   Percent   Valid Percent   Cumulative Percent
0                    8668      94.7            94.7                 94.7
1                     483       5.3             5.3                100.0
Total                9151     100.0           100.0
The table and pie chart show that detached houses made up only 5.3 percent of the
properties sold in Bristol in 2019 and 2020.
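The percent and cumulative percent columns in the frequency tables above can be reproduced from the raw counts. The following is a minimal Python sketch, not the SPSS procedure itself; the function name frequency_table is a hypothetical helper, and the counts are copied from the "flat" table.

```python
def frequency_table(counts):
    """Return rows of (label, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        # Cumulative percent is computed from the running count,
        # not by adding up the already rounded percentages.
        cumulative = round(100 * running / total, 1)
        rows.append((label, freq, percent, cumulative))
    return rows


# Counts copied from the "flat" table (0 = not a flat, 1 = flat).
flat_counts = {"0": 7189, "1": 1962}
flat_rows = frequency_table(flat_counts)
for row in flat_rows:
    print(row)
```

The same helper can be applied to the age, postcode, year and detached counts to check the remaining tables.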
2.2 Quantitative variables:
Descriptive Statistics
Variable                     N     Minimum  Maximum  Mean       Std. Deviation  Skewness  Std. Error (Skewness)
price                        9151  60000    3000000  333526.49  193388.346      3.032     .026
size                         9151  18.000   558.000  93.08058   39.234310       2.648     .026
numberrooms                  9151  1        14       4.51       1.506           .746      .026
current_energy_efficiency    9151  1        95       61.86      11.345          -1.093    .026
potential_energy_efficiency  9151  17       106      79.94      7.532           -1.802    .026
Valid N (listwise)           9151
The mean house price is approximately £333526.5, while the standard deviation of
sale prices is £193388.346. The standard deviation is large relative to the mean, so
house prices vary widely. According to the histogram, prices from £160000 to £390000
form the most frequent interval. In addition, very expensive houses were rare in
Bristol during 2019 and 2020, which is consistent with the strong positive skewness
of 3.032.
For the total floor area of houses, the mean size is 93.081 m² and the standard
deviation is roughly 39.234 m², which is fairly large. In other words, dwelling sizes
vary considerably around a typical floor space of 93 m². The most common dwelling
sizes are from 63 to 100 m².
Most observations are for residences with five rooms, which is close to the mean
number of rooms (4.51). The standard deviation of 1.506 suggests that the number of
rooms mostly stays within about 1.5 rooms of the mean, implying that the majority of
houses contain between three and six rooms.
For current energy efficiency, the statistics table shows a mean of 61.86 and a standard
deviation of only 11.345. The variation is therefore modest, with the majority of ratings close
to the average value. In detail, the most common ratings range from 58 to 72.
The average potential energy efficiency is 79.94, which is fairly high. With a standard
deviation of just 7.532, there is little variation in potential energy efficiency. According to
the chart, 86 is the most frequent value among all observations.
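One way to make the judgments "large" and "small" dispersion comparable across variables with different units is the coefficient of variation (standard deviation divided by the mean). This measure is not part of the original analysis; the sketch below is an illustrative addition that uses only the means and standard deviations reported in the Descriptive Statistics table.

```python
# (mean, standard deviation) pairs copied from the Descriptive Statistics table.
stats = {
    "price": (333526.49, 193388.346),
    "size": (93.08058, 39.234310),
    "numberrooms": (4.51, 1.506),
    "current_energy_efficiency": (61.86, 11.345),
    "potential_energy_efficiency": (79.94, 7.532),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in stats.items()}

# Print variables from most to least dispersed relative to their mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1%}")
```

On this measure price is the most dispersed variable and potential energy efficiency the least, which matches the descriptions above.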
The scatter plot of price against size shows that most properties for sale are concentrated at
lower prices, and the points suggest a strong positive relationship. Moreover, the sig. value is
0.000, which is smaller than 0.05, so the correlation coefficient between price and size is
statistically significant. The correlation between price and size is 0.835, a strong positive
correlation. In other words, as the size of a home grows, so does the price.
The correlation between price and number of rooms
Correlations
                                    price     numberrooms
price         Pearson Correlation   1         .656**
              Sig. (2-tailed)                 .000
              N                     9151      9151
numberrooms   Pearson Correlation   .656**    1
              Sig. (2-tailed)       .000
              N                     9151      9151
**. Correlation is significant at the 0.01 level (2-tailed).
The concentration of points in the scatter plot suggests a moderate relationship between price
and number of rooms. The significance level of 0.000 is less than 0.05, indicating a statistically
significant relationship between the two variables. The correlation coefficient of 0.656 indicates
a moderate positive relationship between price and number of rooms.
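The significance values reported for the two correlations can be checked by hand with the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The following Python sketch assumes this standard test; the function name is a hypothetical helper, and r and n are taken from the text and tables above.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


n = 9151  # number of property sales in the dataset

t_size = correlation_t_statistic(0.835, n)   # price vs. size
t_rooms = correlation_t_statistic(0.656, n)  # price vs. number of rooms

print(f"price vs size:  t = {t_size:.1f}")
print(f"price vs rooms: t = {t_rooms:.1f}")
```

Both statistics are far above the two-sided 5% critical value of about 1.96 for a sample this large, which is why the reported sig. values round to .000.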
price
Group     Value  Mean    Mode     Range    Standard Deviation  Minimum
flat      0      357639  270000   2920000  202226              80000
          1      245175  200000   1205000  121081              60000
detached  0      317996  250000   1690000  163948              60000
          1      612231  435000   2850000  380085              150000
year      2019   320208  240000a  1740000  178711              60000
          2020   350572  250000   2940000  209476              60000
a. Multiple modes exist. The smallest value is shown
The mean price of flats is £112464 lower than the mean price of other property types (£245175
compared with £357639), and the price range for flats is also narrower. In addition, the pivot
table shows that the standard deviation of flat prices is 121081, so flat prices lie closer to their
mean of £245175, whereas the standard deviation of other property prices is larger (202226).
According to the table above, the average price of a detached home is approximately twice the
average price of other kinds of homes. Detached house prices also have a wider range and a
larger standard deviation than other property types. In other words, there is an enormous gap
between the most and least expensive detached residences, and detached home prices are
more dispersed.
The mean house price in 2020 was somewhat higher than in 2019, and the price range was
considerably wider in 2020. In 2019 the most expensive property cost £1740000 more than the
cheapest; in 2020 the difference was £2940000. The standard deviation of prices was also
larger in 2020 (209476) than in 2019 (178711), although not by much.
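The group comparisons above rest on simple differences and ratios of the group means. A minimal Python sketch of that arithmetic follows, using only the mean prices from the price table; the function name compare_means is a hypothetical helper.

```python
# Mean prices (£) copied from the price table, keyed by group value.
group_means = {
    "flat": {"0": 357639, "1": 245175},
    "detached": {"0": 317996, "1": 612231},
    "year": {"2019": 320208, "2020": 350572},
}


def compare_means(means):
    """Return (difference, ratio) of the second group's mean relative to the first."""
    first, second = list(means.values())
    return second - first, second / first


comparisons = {name: compare_means(m) for name, m in group_means.items()}
for name, (diff, ratio) in comparisons.items():
    print(f"{name}: difference = {diff}, ratio = {ratio:.2f}")
```

A negative difference means the second group (for example flats, coded 1) has the lower mean price.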
Part B explores both qualitative and quantitative variables. For qualitative variables
such as property type, year, and age, a frequency table with percentages and a pie chart
are appropriate, since the values are expressed as shares of the total number of
observations in the dataset. A pie chart is a useful visualization tool for comparing the
relative frequencies of the categories. A statistics table and a histogram are used for
quantitative variables such as price, size, and current and potential energy efficiency.
For quantitative variables, a histogram visualizes the frequency distribution, from which
we can identify the most common values, how often they occur, and the shape of the
distribution.
III. Part C
3.1 Using T-test
Group Statistics
        year   N      Mean        Std. Deviation   Std. Error Mean
price   2019   5137   320207.50   178711.408       2493.432
        2020   4014   350571.75   209476.133       3306.327
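The group statistics are enough to compute a two-sample t statistic by hand. The sketch below uses Welch's version of the test, which does not assume equal variances; this is an assumption made for illustration and may differ from the variant used in the rest of the analysis. The function name welch_t is a hypothetical helper.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom (group 2 minus group 1)."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the first group mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the second group mean
    t = (mean2 - mean1) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df


# Values copied from the Group Statistics table (2019 first, 2020 second).
t_stat, df = welch_t(320207.50, 178711.408, 5137,
                     350571.75, 209476.133, 4014)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

The sign of t depends only on the order of the groups; subtracting 2020 from 2019 instead gives the same magnitude with a negative sign.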