Sociology: Intermediate Quantitative Research Method
Aida Parnia
A.parnia@utoronto.ca
U of T Sociology
Week 2: Measurements
Today’s schedule
https://open.toronto.ca/dataset/bike-share-toronto-ridership-data/
[1] 2407928 11
These are some of the questions we can ask of the data to construct the
variables of interest and to get to know the data. They are not necessarily
good research questions.
Fig 1. Most used station for start of the trip
Fig 2. Most used station for end of the trip
Route                                                                                   Trips
Tommy Thompson Park (Leslie Street Spit) TO Tommy Thompson Park (Leslie Street Spit)    3089
Bay St / Queens Quay W (Ferry Terminal) TO Bay St / Queens Quay W (Ferry Terminal)      1265
Waterfront Trail (Rouge Hill) TO Waterfront Trail (Rouge Hill)                          1086
Humber Bay Shores Park / Marine Parade Dr TO Humber Bay Shores Park / Marine Parade Dr  1083
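The route labels above look like a start station and an end station joined by " TO ". A minimal sketch of how such a variable could be built and counted is shown here. The column names `start_station_name` and `end_station_name` come from the cleaning code in these slides; the data frame `trips`, its values, and the exact way `route` was constructed are assumptions made for illustration.

```r
# Toy data standing in for the real ridership file (values are made up)
trips <- data.frame(
  start_station_name = c("A St", "A St", "B Ave", "A St"),
  end_station_name   = c("A St", "B Ave", "A St", "A St"),
  stringsAsFactors = FALSE
)

# Assumed construction: join start and end names with " TO "
trips$route <- paste(trips$start_station_name, "TO", trips$end_station_name)

# Count trips per route, most frequent first
route_counts <- sort(table(trips$route), decreasing = TRUE)
print(route_counts)
```

In the toy data the round trip "A St TO A St" comes out on top, which mirrors the pattern in the table above.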
Caution
The most frequent routes start and end at the same station, so these trips do not go from one place to
another. We need to clean the data before proceeding.
Cleaning data
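library(dplyr)

# Collapse every trip that starts and ends at the same station into one category
bikedata <- bikedata %>%
  mutate(route = if_else(
    end_station_name == start_station_name, "Same station", route
  ))

# Count trips per route, most frequent first
bikedata %>% count(route, sort = TRUE)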
# A tibble: 263,273 × 2
route n
<chr> <int>
1 Same station 76660
2 Front St W / Blue Jays Way TO Union Station 965
3 King St W / Portland St TO King St W / Bay St (West Side) 735
4 York St / Queens Quay W TO Bathurst St/Queens Quay(Billy Bishop Airpor… 609
5 College St / Huron St TO Bay St / College St (East Side) 573
6 Fort York Blvd / Capreol Ct TO Union Station 540
7 Bathurst St/Queens Quay(Billy Bishop Airport) TO York St / Queens Quay… 532
8 Grand Avenue Park TO Windsor St / Newcastle St 444
9 Bay St / College St (East Side) TO College St / Huron St 431
10 The Well TO Union Station 419
# ℹ 263,263 more rows
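To put the "Same station" count in perspective, it can be expressed as a share of all trips. The sketch below assumes that 2407928, the first number in the earlier R output, is the number of rows (trips) in `bikedata`; the slides do not state this explicitly.

```r
# Figures copied from the output shown in these slides
same_station_trips <- 76660    # "Same station" row in the count above
total_trips        <- 2407928  # assumed to be the row count of bikedata

# Share of all trips that start and end at the same station
share_same_station <- same_station_trips / total_trips
print(round(100 * share_same_station, 1))  # percentage of all trips
```

Under that assumption, same-station trips are roughly 3% of all trips, even though "Same station" is by far the largest single category.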
Fig 3. Duration of trips for the top 5 most travelled routes (under 30 mins)
Describing a distribution
Measures of central tendency and variation
route                                                                     mean  median  standard_deviation  q1  q3  min  max
College St / Huron St TO Bay St / College St (East Side)                  7     6       18                  5   7   0    444
Fort York Blvd / Capreol Ct TO Union Station                              9     8       16                  7   9   0    378
Front St W / Blue Jays Way TO Union Station                               6     4       28                  4   5   2    799
King St W / Portland St TO King St W / Bay St (West Side)                 8     7       2                   7   8   4    28
York St / Queens Quay W TO Bathurst St/Queens Quay(Billy Bishop Airport)  11    8       9                   7   10  5    107
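A table like the one above can be produced by computing each statistic within each route. The following is a minimal base R sketch on made-up durations. The data frame `trips`, the column `duration`, the route labels, and the helper `describe()` are all hypothetical, and the slides do not show the code that was actually used.

```r
# Toy trip durations in minutes (made-up values, two hypothetical routes)
trips <- data.frame(
  route    = c(rep("A TO B", 5), rep("C TO D", 5)),
  duration = c(4, 5, 6, 7, 28,   7, 7, 8, 8, 10)
)

# Hypothetical helper: one set of summary statistics for a numeric vector
describe <- function(x) {
  c(mean = mean(x), median = median(x), standard_deviation = sd(x),
    q1 = quantile(x, 0.25, names = FALSE),
    q3 = quantile(x, 0.75, names = FALSE),
    min = min(x), max = max(x))
}

# Apply the helper within each route; result has one row per route
summary_by_route <- t(sapply(split(trips$duration, trips$route), describe))
print(round(summary_by_route, 1))
```

In the toy route "A TO B", the single 28-minute trip pulls the mean (10) well above the median (6). The table above shows the same pattern for routes with very large maximum values.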
Fig 5. Total number of trips during the day by months of travel and user type
Fig 6. Proportion of trips and time of the day by months of travel and user type
Fig 7. Proportion of trips and time of the day (AM vs PM) by months of travel and user type
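An AM vs PM comparison requires turning a timestamp into a two-category variable. A rough sketch of one way to do this in base R follows. The vector `start_time` and its values are made up, and the slides do not show how the variable behind the figure was actually built.

```r
# Made-up start times; `start_time` is a hypothetical variable name
start_time <- as.POSIXct(
  c("2024-07-01 08:15:00", "2024-07-01 17:40:00",
    "2024-07-01 23:05:00", "2024-07-02 09:30:00"),
  tz = "UTC"
)

# Hour of day (0-23), then a two-category AM/PM variable
hour_of_day <- as.integer(format(start_time, "%H"))
am_pm <- ifelse(hour_of_day < 12, "AM", "PM")

# Proportion of trips in each category
print(prop.table(table(am_pm)))
```

The same proportions could then be computed separately by month and user type.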
Mean
The sum of all values divided by the number of values:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
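As a small worked example, the mean formula can be applied by hand and compared with R's built-in `mean()`. The values in `x` are made up.

```r
# Made-up trip durations in minutes
x <- c(4, 5, 6, 7, 28)
n <- length(x)

# Mean from the formula: sum the values, then divide by n
x_bar <- sum(x) / n
print(x_bar)

# The built-in function gives the same result
print(all.equal(x_bar, mean(x)))
```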
Median
The middle value in a set of numbers when they are arranged in order.
If there is an even number of observations, the median is the average
of the two middle numbers.
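The two cases in the definition can be illustrated with made-up numbers, one set with an odd count and one with an even count.

```r
odd_set  <- c(9, 3, 7, 5, 4)      # sorted: 3 4 5 7 9
even_set <- c(9, 3, 7, 5, 4, 28)  # sorted: 3 4 5 7 9 28

median_odd  <- median(odd_set)   # middle value of the five numbers
median_even <- median(even_set)  # average of the two middle numbers, 5 and 7

print(c(odd = median_odd, even = median_even))
```

Adding the large value 28 moves the median only from 5 to 6, which is why the median is often reported for skewed variables such as trip duration.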
Mode
The value that occurs most often in a set of numbers.
Variance
A measure of how much the values in a set differ from the mean. It is
calculated by taking the average of the squared differences from the
mean: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
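The definition above, the average of the squared differences from the mean, can be checked by hand on made-up values. One caveat: R's built-in `var()` and `sd()` divide by n - 1 (the sample versions) rather than by n, so they will not match the by-hand result exactly.

```r
# Made-up values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
x_bar <- mean(x)

# Variance as defined above: average squared deviation (divide by n)
variance_n <- sum((x - x_bar)^2) / n
sd_n <- sqrt(variance_n)

# R's built-ins use the sample versions (divide by n - 1)
variance_sample <- var(x)
sd_sample <- sd(x)

print(c(variance_n = variance_n, variance_sample = variance_sample))
print(c(sd_n = sd_n, sd_sample = sd_sample))
```

With large data sets such as the ridership file, the difference between dividing by n and by n - 1 is negligible.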
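Standard Deviation
The square root of the variance. It is expressed in the same units as
the data.
Quantiles
Points that divide the ordered data into groups containing equal shares
of the observations. They help in understanding the distribution of the
data. Common quantiles include quartiles, percentiles, and deciles.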
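Quartiles and deciles can be computed with R's `quantile()` function. The sketch below uses made-up values. Note that `quantile()` supports several definitions (its `type` argument); the default is used here, and other definitions can give slightly different results on small samples.

```r
# Made-up values: the integers 1 to 9
x <- 1:9

# Quartiles: cut points at 25%, 50%, and 75%
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Deciles: cut points at 10%, 20%, ..., 90%
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))

# Interquartile range: distance between the third and first quartile
iqr <- IQR(x)

print(quartiles)
print(iqr)
```

The 50% quantile is the median, so the q1, median, and q3 columns in a summary table are all quantiles.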
Validity
A measure is valid to the degree that it represents what you are trying
to measure.