
SFM A1.1

The document discusses a dataset containing property sales data from Bristol from 2019-2020. It includes 9,151 property sales records with information on attributes like age, postcode, year of sale, and property type (flat, detached). The document then discusses descriptive vs inferential statistics and different types of existing data and statistical data that can be collected through various qualitative and quantitative methods.

I. Part A
The included Excel file (called “House Price Data Project”) contains data on
property prices and various attributes of the properties. Data are available for 9,151
property sales in Bristol in 2019 and 2020.

1. Method of analyzing the data set


Descriptive statistics: Most of the statistical information seen in newspapers,
periodicals, corporate reports, and other media is summarized and presented in a form
that the reader can easily understand. Descriptive statistics are such summaries of data,
which may be presented as tables, graphs, or numerical measures.

Inferential statistics: Statistical inference is the process of using data from a sample
to make estimates and test hypotheses about the characteristics of a population.

Descriptive statistics:

- The goal is to provide a comprehensive picture of the data under investigation.
- Data are collected, organized, analyzed, and presented in a relevant manner.
- The final result is presented in diagrammatic or tabular form.
- Descriptive statistics depict a condition.
- To summarize a sample, descriptive statistics describe data that are already known.

Inferential statistics:

- Analysis and observation of a sample are used to make inferences about the population.
- Inferential statistics compare data, test hypotheses, and predict future results.
- The final result is expressed as a probability.
- Inferential statistics explain the probability of an event occurring.
- Inferential statistics aim to draw conclusions about the population; this goes beyond
  the data supplied.
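
As a rough illustration of the distinction, the sketch below treats the price summary reported in Part B (mean, standard deviation, N) as descriptive statistics and then builds an approximate 95% confidence interval for the mean price as an inferential step. It assumes that the 9,151 sales can be treated as a random sample from a wider population and uses the normal critical value 1.96; both are assumptions made for illustration and are not part of the original analysis.

```python
import math

# Descriptive summary of price, as reported in the Part B statistics table.
n = 9151
mean_price = 333526.49
sd_price = 193388.346

# Inferential step (assumption: the sales are a random sample of a population).
# Standard error of the mean and an approximate 95% confidence interval.
se_mean = sd_price / math.sqrt(n)
z_95 = 1.96  # normal critical value, a common choice for large samples
ci_lower = mean_price - z_95 * se_mean
ci_upper = mean_price + z_95 * se_mean

print(f"Descriptive: mean = {mean_price:.2f}, SD = {sd_price:.2f}")
print(f"Inferential: approx. 95% CI for the mean = ({ci_lower:.0f}, {ci_upper:.0f})")
```

The first line only restates what the data show; the second makes a claim about a population mean, which is what separates inference from description.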

2. Data
2.1. Data sources:

Existing data come from two types of source, internal and external:

- Internal sources: data from the company's own database, such as personnel records,
  production records, and sales records, make up the bulk of this kind of source.
- External sources: data obtained from outside the company, usually by purchasing them
  or by negotiating agreements with other parties. The information can be accessed only
  with the consent of both parties.

Statistical data are collected to meet the needs of businesses and organizations that
require more detailed and specific data. They fall into two categories, observational
data and experimental data:

- Observational data: a company observes a particular situation or period, collects
  data on the variables of interest, and then performs a statistical analysis of the
  results. Surveys and public opinion polls may be used to collect observational data.
- Experimental data: unlike observational data, these are collected in a controlled
  environment. As a result, experimental results give more comprehensive and accurate
  information.

As a general rule, observational data may offer an overall picture of an individual or a
group of individuals, whereas experimental data will cost more and deliver more detail.

Existing data: These data are simple to locate, and the number of sources is almost
limitless. However, there is no way to guarantee the accuracy of the information
provided. Some papers and resources date from the previous century and have not been
updated, and publication errors may also be an issue. As a consequence, this data type
is not ideal, but if there is no other means of getting data, it remains a fair option.

Statistical data: To ensure that the data are reliable, this method collects the most
up-to-date quantities and characteristics of a subject. The downside is that it takes a
lot of time and money. If the company can devote more resources to this approach, the
eventual product will be very different from what is already known.

2.2. Data collection methods

Interviews, surveys and questionnaires, observation, records and documents, and focus
group research are all examples of data collection methods. Once the data have been
gathered, they can be divided into two categories: qualitative and quantitative
(numbers and counts).

Qualitative data:

- Interview: one person asks questions and the other responds in order to gather
  information.
- Observation: researchers pay attention to the participants' activities and behavior
  in order to obtain the necessary data.
- Focus group research: this includes interviews, questionnaires, and observation,
  since the researchers must gather data from a group or team with comparable
  characteristics. The goal is to gain an overall view of the group.

Quantitative data:

- Surveys and questionnaires: participants are given a series of questions to answer in
  various ways. Researchers can then compile a database of the information provided by
  the participants.
- Records and documents: researchers look for materials and papers associated with the
  study participants to make sure they have all the information they need.

II. Part B
2.1 Qualitative variables

age
               Frequency  Percent  Valid Percent  Cumulative Percent
1900-1929           1856     20.3           20.3                20.3
1930-1949           1773     19.4           19.4                39.7
1950-1966           1338     14.6           14.6                54.3
1967-1975            714      7.8            7.8                62.1
1976-1982            223      2.4            2.4                64.5
1983-1990            332      3.6            3.6                68.1
1991-1995            142      1.6            1.6                69.7
1996-2002            271      3.0            3.0                72.7
2003-2006            517      5.6            5.6                78.3
2007 onwards         275      3.0            3.0                81.3
before 1900         1710     18.7           18.7               100.0
Total               9151    100.0          100.0

postcode
               Frequency  Percent  Valid Percent  Cumulative Percent
BS1                  186      2.0            2.0                 2.0
BS10                 504      5.5            5.5                 7.5
BS11                 354      3.9            3.9                11.4
BS13                 546      6.0            6.0                17.4
BS14                 645      7.0            7.0                24.4
BS15                 263      2.9            2.9                27.3
BS16                 602      6.6            6.6                33.9
BS2                  220      2.4            2.4                36.3
BS3                 1058     11.6           11.6                47.8
BS4                 1058     11.6           11.6                59.4
BS5                 1178     12.9           12.9                72.3
BS6                  588      6.4            6.4                78.7
BS7                  766      8.4            8.4                87.1
BS8                  391      4.3            4.3                91.3
BS9                  792      8.7            8.7               100.0
Total               9151    100.0          100.0

year
               Frequency  Percent  Valid Percent  Cumulative Percent
2019                5137     56.1           56.1                56.1
2020                4014     43.9           43.9               100.0
Total               9151    100.0          100.0

flat
               Frequency  Percent  Valid Percent  Cumulative Percent
0                   7189     78.6           78.6                78.6
1                   1962     21.4           21.4               100.0
Total               9151    100.0          100.0

Flats account for 21.4% of the sample, while other property types make up the
remaining 78.6%.
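
The Percent and Cumulative Percent columns in the frequency tables follow directly from the counts. A minimal sketch, using the counts from the flat table (the variable names are illustrative only):

```python
# Counts taken from the "flat" frequency table (0 = not a flat, 1 = flat).
counts = {0: 7189, 1: 1962}
total = sum(counts.values())

cumulative = 0.0
rows = []
for value, freq in counts.items():
    percent = 100 * freq / total      # share of all observations
    cumulative += percent             # running total of the shares
    rows.append((value, freq, round(percent, 1), round(cumulative, 1)))

for value, freq, percent, cum_percent in rows:
    print(f"{value}  {freq:>5}  {percent:>5}  {cum_percent:>5}")
```

The same loop applies to the age, postcode, year, and detached tables once their counts are substituted.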

detached
               Frequency  Percent  Valid Percent  Cumulative Percent
0                   8668     94.7           94.7                94.7
1                    483      5.3            5.3               100.0
Total               9151    100.0          100.0

The table and pie chart show that detached houses made up only 5.3 percent of the
properties sold in Bristol in 2019 and 2020.
2.2 Quantitative variables:

Descriptive Statistics
                                 N  Minimum  Maximum       Mean  Std. Deviation  Skewness  Std. Error (Skewness)
price                         9151    60000  3000000  333526.49      193388.346     3.032                   .026
size                          9151   18.000  558.000   93.08058       39.234310     2.648                   .026
numberrooms                   9151        1       14       4.51           1.506      .746                   .026
current_energy_efficiency     9151        1       95      61.86          11.345    -1.093                   .026
potential_energy_efficiency   9151       17      106      79.94           7.532    -1.802                   .026
Valid N (listwise)            9151
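
The skewness Std. Error is identical for every variable because it depends only on N. A minimal sketch, assuming the commonly used formula for the standard error of sample skewness (the source does not state which formula the software applied):

```python
import math

def skewness_standard_error(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

se_skew = skewness_standard_error(9151)
print(f"SE of skewness for N = 9151: {se_skew:.3f}")

# Skewness of price from the table, expressed in standard-error units.
# A ratio far from zero suggests the distribution is clearly asymmetric.
price_skewness = 3.032
print(f"price skewness / SE: {price_skewness / se_skew:.1f}")
```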
The mean house price is approximately £333,526.5, while the standard deviation of sale
prices is £193,388.35. This is large relative to the mean, so house prices vary
widely. According to the histogram, the interval from £160,000 to £390,000 contains the
most sales. In addition, expensive houses were uncommon among the properties sold in
Bristol during 2019 and 2020.
For total floor area, the mean size is 93.081 m², while the standard deviation is
roughly 39.234 m², which is fairly large. In other words, floor area varies around a
typical value of about 93 m². The most common dwelling sizes are from 63 to 100 m².
Most observations are for residences with five rooms, which is close to the mean
number of rooms (4.51). In addition, the standard deviation of 1.506 suggests that the
number of rooms varies mostly around the average, implying that the majority of houses
have between three and six rooms.
For current energy efficiency, the statistics table shows that the mean rating is 61.86, while
the standard deviation is only 11.345. The variation is therefore modest, since the majority of
ratings remain close to the average value. In detail, the most common current energy efficiency
ratings range from 58 to 72.

The average potential energy efficiency rating is 79.94, which is fairly high. With a standard
deviation of just 7.532, there is little variation in potential energy efficiency. According to
the chart, 86 is the most frequent potential rating among all observations.
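
One way to compare how "big" each standard deviation is, given that the variables are on very different scales, is the coefficient of variation (standard deviation divided by mean). The sketch below computes it from the figures in the Descriptive Statistics table; using this particular measure is a suggestion here, not something the original analysis reports.

```python
# (mean, standard deviation) copied from the Descriptive Statistics table.
summary = {
    "price": (333526.49, 193388.346),
    "size": (93.08058, 39.234310),
    "numberrooms": (4.51, 1.506),
    "current_energy_efficiency": (61.86, 11.345),
    "potential_energy_efficiency": (79.94, 7.532),
}

# Coefficient of variation: spread relative to the mean (unit-free).
cv = {name: sd / mean for name, (mean, sd) in summary.items()}

# Print from most to least dispersed relative to the mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:28s} CV = {value:.3f}")
```

On this measure price is the most dispersed variable and potential energy efficiency the least, which matches the description of the standard deviations above.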

The correlation between price and size


Correlations
                                  price      size
price   Pearson Correlation           1    .835**
        Sig. (2-tailed)                      .000
        N                          9151      9151
size    Pearson Correlation      .835**         1
        Sig. (2-tailed)            .000
        N                          9151      9151
**. Correlation is significant at the 0.01 level (2-tailed).

In the scatter plot, most of the properties sold cluster at the lower end of the price range, and
the points suggest a strong positive relationship. Moreover, the Sig. value is reported as .000,
which is smaller than 0.05, so the correlation coefficient between price and size is statistically
significant. Price and size therefore have a strong positive correlation of 0.835. In other
words, as the size of a home grows, so does its price.
The correlation between price and number of rooms
Correlations
                                        price  numberrooms
price         Pearson Correlation           1       .656**
              Sig. (2-tailed)                         .000
              N                          9151         9151
numberrooms   Pearson Correlation      .656**            1
              Sig. (2-tailed)            .000
              N                          9151         9151
**. Correlation is significant at the 0.01 level (2-tailed).

The concentration of points in the scatter plot suggests a moderate relationship between price
and number of rooms. The significance level of .000 is less than 0.05, indicating a statistically
significant relationship between the two variables. According to the correlation coefficient of
0.656, price and number of rooms have a moderate positive relationship.
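
To see why both correlations are reported as significant, the sketch below converts each coefficient into the usual t statistic for testing a zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), and also reports r squared (the share of price variation associated with each variable in a simple linear relationship). The function name is illustrative; the coefficients and N come from the two correlation tables.

```python
import math

def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 9151
correlations_with_price = {"size": 0.835, "numberrooms": 0.656}

results = {}
for name, r in correlations_with_price.items():
    results[name] = {
        "r_squared": r * r,
        "t": correlation_t_statistic(r, n),
    }
    print(f"{name:12s} r = {r:.3f}  r^2 = {r * r:.3f}  t = {results[name]['t']:.1f}")
```

With more than nine thousand observations, t statistics of this size are far beyond any conventional critical value, which is consistent with the Sig. values displayed as .000.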
price
                    Mean     Mode    Range  Std. Deviation  Minimum
flat      0       357639   270000  2920000          202226    80000
          1       245175   200000  1205000          121081    60000
detached  0       317996   250000  1690000          163948    60000
          1       612231   435000  2850000          380085   150000
year      2019    320208  240000a  1740000          178711    60000
          2020    350572   250000  2940000          209476    60000
a. Multiple modes exist. The smallest value is shown.

The mean price of flats is £112,464 lower than the mean price of other property types (£245,175
against £357,639), and the price range for flats is also narrower. In addition, the pivot table
shows that the standard deviation of flat prices is £121,081, smaller than the £202,226 for other
properties, so flat prices lie closer to their mean of £245,175.
According to the table above, the average price of a detached home is approximately twice the
average price of other kinds of homes. Detached house prices also have a wider range and a larger
standard deviation than other property types. In other words, there is an enormous gap between
the most and least expensive detached homes, and detached home prices are more dispersed.
The mean house price in 2020 was somewhat higher than in 2019, and the price range was
considerably wider in 2020. In 2019 the most expensive property sold for £1,740,000 more than the
cheapest; in 2020 the difference was £2,940,000. The standard deviation of prices in 2020
(£209,476) was also larger than in 2019 (£178,711), though not by much.
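
The group comparisons above reduce to a few pieces of arithmetic on the means in the pivot table. A short sketch of that arithmetic (the dictionary layout and names are illustrative only):

```python
# Mean price by group, copied from the pivot table above.
mean_price = {
    ("flat", 0): 357639,
    ("flat", 1): 245175,
    ("detached", 0): 317996,
    ("detached", 1): 612231,
    ("year", 2019): 320208,
    ("year", 2020): 350572,
}

# Flats versus other property types: absolute gap in mean price.
flat_gap = mean_price[("flat", 0)] - mean_price[("flat", 1)]

# Detached versus other property types: ratio of mean prices.
detached_ratio = mean_price[("detached", 1)] / mean_price[("detached", 0)]

# 2020 versus 2019: percentage change in mean price.
year_change_pct = 100 * (
    mean_price[("year", 2020)] - mean_price[("year", 2019)]
) / mean_price[("year", 2019)]

print(f"Flat gap: {flat_gap}")
print(f"Detached / non-detached ratio: {detached_ratio:.2f}")
print(f"Change in mean price, 2019 to 2020: {year_change_pct:.1f}%")
```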

Part B explores both qualitative and quantitative variables. For qualitative variables
such as property type (flat, detached), year, and age, a frequency table with percentages
and a pie chart are appropriate, since the values are expressed as shares of the total
number of observations in the dataset. A pie chart is a convenient way to compare the
relative frequency of the categories of a variable. For quantitative variables such as
price, size, and current and potential energy efficiency, the descriptive statistics
table and a histogram are used. A histogram visualizes the frequency distribution, from
which we can identify the most common values, how often they occur, and the shape of
the distribution.

III. Part C
3.1 Using T-test

Group Statistics
        year      N       Mean  Std. Deviation  Std. Error Mean
price   2019   5137  320207.50      178711.408         2493.432
        2020   4014  350571.75      209476.133         3306.327
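
The Std. Error Mean column is each group's standard deviation divided by the square root of its N. The sketch below reproduces that column and then forms a t statistic for the difference in mean price between the two years. It uses Welch's unequal-variance form, which is an assumption made here; the source does not show which variant of the test was reported.

```python
import math

# Group statistics for price by year, copied from the table above.
groups = {
    2019: {"n": 5137, "mean": 320207.50, "sd": 178711.408},
    2020: {"n": 4014, "mean": 350571.75, "sd": 209476.133},
}

# Standard error of each group mean: SD / sqrt(N).
se = {year: g["sd"] / math.sqrt(g["n"]) for year, g in groups.items()}

# Welch's t statistic: difference in means over the combined standard error.
# (Assumption: unequal variances; not confirmed by the source.)
mean_diff = groups[2020]["mean"] - groups[2019]["mean"]
welch_t = mean_diff / math.sqrt(se[2019] ** 2 + se[2020] ** 2)

for year in groups:
    print(f"{year}: SE of mean = {se[year]:.3f}")
print(f"Mean difference (2020 - 2019) = {mean_diff:.2f}")
print(f"Welch t statistic = {welch_t:.2f}")
```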
