Sociology: Intermediate Quantitative Research Method

Data and Measurement

Moving from questions to variables

Aida Parnia
A.parnia@utoronto.ca

U of T Sociology

September 10, 2024

Week 2: Measurements
Today’s schedule

1. Turning data into variables

i. Toronto Open Data
ii. Exploring the data
iii. Identifying questions
iv. Constructing variables
2. Some useful definitions
3. Validity and Reliability
4. Summary of key points

Toronto Open Data Portal

https://open.toronto.ca/dataset/bike-share-toronto-ridership-data/

Bike ridership data from first 6 months of 2024

Rows: 2,407,928
Columns: 11
$ trip_id <dbl> 26916635, 26916636, 26916637, 26916638, 26916639, 2…
$ trip_duration <dbl> 897, 267, 158, 357, 195, 1814, 1736, 1348, 262, 564…
$ start_station_id <dbl> 7458, 7285, 7531, 7027, 7469, 7033, 7033, 7164, 741…
$ start_time <chr> "02/01/2024 00:00", "02/01/2024 00:02", "02/01/2024…
$ start_station_name <chr> "Church St / Lombard St", "Spadina Ave / Harbord St…
$ end_station_id <dbl> 7256, 7023, 7058, 7206, 7417, 7115, 7115, 7118, 722…
$ end_time <chr> "02/01/2024 00:15", "02/01/2024 00:06", "02/01/2024…
$ end_station_name <chr> "Vanauley St / Queen St W - SMART", "College St / B…
$ bike_id <dbl> 2325, 623, 7284, 6595, 357, 5750, 5363, 4652, 4031,…
$ user_type <chr> "Casual Member", "Annual Member", "Annual Member", …
$ model <chr> "ICONIC", "ICONIC", "ICONIC", "ICONIC", "ICONIC", "…

Exploring the data

# First we check the dimensions of our data
dim(bikedata)

[1] 2407928 11

and take a quick look at what the data looks like

trip_id   trip_duration  start_station_id  start_time        start_station_name                end_station_i…
26916635  897            7458              02/01/2024 00:00  Church St / Lombard St            725…
26916636  267            7285              02/01/2024 00:02  Spadina Ave / Harbord St - SMART  702…
26916637  158            7531              02/01/2024 00:02  541 Huron St - SMART              705…
26916638  357            7027              02/01/2024 00:02  Beverley St / Dundas St W         720…

The first step is getting to know the data

Once we have a sense of how the variables are defined, we ask the following
questions to understand the data better:

What are the most used stations?

→ For the start of the trip and the end?
What are the most frequent trips?
How do the trips differ by user type?
→ What is the difference in start time of trips between user types?
→ Do these differences vary by month of travel?

These are some of the questions we can ask of the data to construct the
variables of interest and get to know the data. But they are not necessarily
good research questions.

Top five most used stations?

Fig 1. Most used station for start of the trip
Fig 2. Most used station for end of the trip
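
The figures themselves are not reproduced here. As a minimal sketch of how such counts could be obtained, the example below tallies start and end stations on a small toy data set. The object names toy_trips, top_start and top_end are hypothetical, the station names are borrowed from the slides, and the counts are made up.

```r
library(dplyr)

# Toy stand-in for bikedata; station names appear in the slides, rows are made up
toy_trips <- tibble(
  start_station_name = c("Union Station", "Union Station", "The Well",
                         "Union Station", "The Well", "Grand Avenue Park"),
  end_station_name   = c("The Well", "Union Station", "Union Station",
                         "Grand Avenue Park", "Union Station", "Union Station")
)

# Count trips per start station, most frequent first, and keep the top five
top_start <- toy_trips %>%
  count(start_station_name, sort = TRUE) %>%
  slice_head(n = 5)

# Same idea for the station where the trip ends
top_end <- toy_trips %>%
  count(end_station_name, sort = TRUE) %>%
  slice_head(n = 5)

top_start
top_end
```

On the real data the same two pipelines would run on bikedata in place of toy_trips.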

What are the most frequent trips?

route                                                                                      n
Tommy Thompson Park (Leslie Street Spit) TO Tommy Thompson Park (Leslie Street Spit)       3089
Bay St / Queens Quay W (Ferry Terminal) TO Bay St / Queens Quay W (Ferry Terminal)         1265
Waterfront Trail (Rouge Hill) TO Waterfront Trail (Rouge Hill)                             1086
Humber Bay Shores Park / Marine Parade Dr TO Humber Bay Shores Park / Marine Parade Dr     1083

Caution

The most frequent routes start and end at the same station instead of going to another place. We need to
clean the data before proceeding.
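
The slides do not show how the route variable was built. One plausible construction, sketched here as an assumption, joins the two station names with " TO ", which matches the labels in the table above. The names toy_trips, toy_routes, route_counts and n_round_trips are hypothetical.

```r
library(dplyr)

# Toy trips; station names appear in the slides, the rows are made up
toy_trips <- tibble(
  start_station_name = c("The Well", "Union Station", "The Well"),
  end_station_name   = c("Union Station", "Union Station", "Union Station")
)

# Assumption: route is "<start> TO <end>", as the route labels suggest
toy_routes <- toy_trips %>%
  mutate(route = paste(start_station_name, "TO", end_station_name))

# Frequency of each route, most frequent first
route_counts <- toy_routes %>% count(route, sort = TRUE)

# Number of trips that end where they started, worth checking before cleaning
n_round_trips <- sum(toy_trips$start_station_name == toy_trips$end_station_name)

route_counts
n_round_trips
```

Counting the round trips first gives a sense of how much of the data the cleaning step will affect.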

Cleaning data
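library(dplyr)

# Collapse trips that start and end at the same station into one category
bikedata <- bikedata %>%
  mutate(route = if_else(
    end_station_name == start_station_name, "Same station", route
  ))

# Recount the routes, most frequent first
bikedata %>% count(route, sort = TRUE)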
# A tibble: 263,273 × 2
route n
<chr> <int>
1 Same station 76660
2 Front St W / Blue Jays Way TO Union Station 965
3 King St W / Portland St TO King St W / Bay St (West Side) 735
4 York St / Queens Quay W TO Bathurst St/Queens Quay(Billy Bishop Airpor… 609
5 College St / Huron St TO Bay St / College St (East Side) 573
6 Fort York Blvd / Capreol Ct TO Union Station 540
7 Bathurst St/Queens Quay(Billy Bishop Airport) TO York St / Queens Quay… 532
8 Grand Avenue Park TO Windsor St / Newcastle St 444
9 Bay St / College St (East Side) TO College St / Huron St 431
10 The Well TO Union Station 419
# ℹ 263,263 more rows

How long do most frequent trips take?

Fig 3. Duration of trips for the top 5 most travelled routes (under 30 mins)

How long do most frequent trips take?

Fig 3. Duration of trips for the top 5 most travelled routes

Describing a distribution
Measures of central tendency and variation
route                                                                     mean  median  standard_deviation  q1  q3  min  max
College St / Huron St TO Bay St / College St (East Side)                     7       6                  18   5   7    0  444
Fort York Blvd / Capreol Ct TO Union Station                                 9       8                  16   7   9    0  378
Front St W / Blue Jays Way TO Union Station                                  6       4                  28   4   5    2  799
King St W / Portland St TO King St W / Bay St (West Side)                    8       7                   2   7   8    4   28
York St / Queens Quay W TO Bathurst St/Queens Quay(Billy Bishop Airport)    11       8                   9   7  10    5  107
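
A table of this shape can be produced with a grouped summary. The sketch below is not the code behind the slide. It uses made-up durations and hypothetical names (toy_durations, duration_min, route_summary) to show the pattern and to illustrate why a single long trip pulls the mean above the median.

```r
library(dplyr)

# Toy durations in minutes; routes and values are made up for illustration
toy_durations <- tibble(
  route = c("A TO B", "A TO B", "A TO B", "A TO B",
            "C TO D", "C TO D", "C TO D"),
  duration_min = c(4, 6, 8, 30, 7, 7, 10)
)

# One row per route with measures of central tendency and variation
route_summary <- toy_durations %>%
  group_by(route) %>%
  summarise(
    mean = mean(duration_min),
    median = median(duration_min),
    standard_deviation = sd(duration_min),
    q1 = quantile(duration_min, 0.25),
    q3 = quantile(duration_min, 0.75),
    min = min(duration_min),
    max = max(duration_min)
  )

route_summary
```

For "A TO B" the 30-minute trip raises the mean to 12 while the median stays at 7, the same pattern as the routes above whose maximum is far larger than their third quartile.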

Differences by types of membership

How do the trips differ by user type (annual membership or casual
membership)?
→ What is the difference in start time of trips between user types?
→ Do these differences vary by month of travel?

Breaking down the task for analysis:

1. Creating start times for the trips
2. Finding months of travel -> creating a variable for month
3. Stratifying data by groups of user types and month of travel
4. Exploring distribution
5. Calculating differences

When do the trips start during the day?

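library(dplyr)
library(lubridate)
library(ggplot2)
library(ggpubr)  # provides theme_pubr()

# Parse the start time and derive month, day, and hour variables
bikedata <- bikedata %>%
  mutate(start_time = mdy_hm(start_time),
         # the slide cuts off a further argument after label = TRUE
         start_month = month(start_time, label = TRUE),
         start_day = day(start_time),
         start_hour = hour(start_time))

# Count trips per hour within each month and plot them.
# The fill variable and the x label are cut off on the slide;
# start_month and "Hour of the day" are assumed here.
bikedata %>% group_by(start_month) %>%
  count(start_hour) %>%
  ggplot(aes(x = start_hour, y = n, fill = start_month)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  theme_pubr(base_size = 20) +
  labs(fill = "Month", x = "Hour of the day",
       y = "Total number of trips") +
  facet_grid(. ~ start_month)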

Fig 4. Total number of trips during the day by months of travel
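
A date string such as "02/01/2024" can be read as month/day/year or as day/month/year. The slides parse it with mdy_hm(); which order the portal actually uses is not stated here, so the sketch below only shows how the two readings differ. The names stamp, as_mdy and as_dmy are hypothetical.

```r
library(lubridate)

stamp <- "02/01/2024 00:15"  # a value in the format of the time columns

as_mdy <- mdy_hm(stamp)  # month/day/year reading: 1 February 2024
as_dmy <- dmy_hm(stamp)  # day/month/year reading: 2 January 2024

month(as_mdy)  # 2
month(as_dmy)  # 1
hour(as_mdy)   # 0, the hour is the same under both readings
```

A quick check on real data is to tabulate the parsed months and confirm they fall within the period the data is supposed to cover.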

Differences by types of user - totals

Fig 5. Total number of trips during the day by months of travel and user type

Differences by types of user - percentages

Fig 6. Proportion of trips and time of the day by months of travel and user type

Differences by types of user - percentages

Fig 7. Proportion of trips and time of the day (AM vs PM) by months of travel and user type
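
Totals are hard to compare when one user type makes many more trips than the other, which is why the figures switch to proportions. The sketch below shows one way to compute within-group proportions for AM versus PM. It is an illustration on made-up values with hypothetical names (toy_trips, period, am_pm_share), and it leaves out month to stay short.

```r
library(dplyr)

# Toy data: start hour (0 to 23) and user type; values are made up
toy_trips <- tibble(
  user_type  = c("Annual Member", "Annual Member", "Annual Member", "Annual Member",
                 "Casual Member", "Casual Member", "Casual Member", "Casual Member"),
  start_hour = c(8, 9, 17, 18, 11, 14, 15, 20)
)

am_pm_share <- toy_trips %>%
  # Hours before 12 count as AM, the rest as PM
  mutate(period = if_else(start_hour < 12, "AM", "PM")) %>%
  count(user_type, period) %>%
  # Divide by the total for each user type so the shares sum to 1 per group
  group_by(user_type) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

am_pm_share
```

To stratify by month as well, the month variable would be added to both count() and group_by().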

Differences by types of user - other possibilities

If the month or hour of travel doesn’t seem to matter, then maybe
duration does?

Exploratory data analysis (EDA)

So when do we stop?

Some questions to ask yourself:

Do you have a research question in mind?
Are the variables making sense and clean?
Do you have a sense of the distribution of the variables?
Have you considered the important relationships between the variables?

From R for Data Science, Wickham et al. 2016

Tip
EDA never stops!

Some definitions - Types of Variables

Nominal Variables (qualitative, discrete)

These are categorical variables without any order or ranking.

Examples include name of stations, the type of bike users, gender,
race.
Ordinal Variables (qualitative or quantitative, discrete)

These are categorical variables with a clear ordering or ranking.

However, the intervals between the ranks are not necessarily equal.
Examples include education level (high school, bachelor’s, master’s,
etc.) or months of the year.
Interval-ratio Variables (quantitative, continuous)

These are numerical variables with equal intervals between values.

Examples include seconds, hours, temperature in Celsius, age.
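
As an illustration of how these three types can be represented in R, the sketch below uses an unordered factor, an ordered factor and a numeric vector. The object names are hypothetical, and the values are taken from the examples and data shown earlier.

```r
# Nominal: unordered categories
user_type <- factor(c("Casual Member", "Annual Member", "Annual Member"))

# Ordinal: ordered categories, with the level order set explicitly
education <- factor(
  c("bachelor's", "high school", "master's"),
  levels = c("high school", "bachelor's", "master's"),
  ordered = TRUE
)

# Interval-ratio: numeric values with equal intervals (seconds here)
trip_duration <- c(897, 267, 158)

is.ordered(user_type)        # FALSE
is.ordered(education)        # TRUE
education[2] < education[1]  # TRUE: "high school" ranks below "bachelor's"
median(trip_duration)        # 267
```

The type matters for what can be computed: comparisons such as < are defined for the ordered factor but not for the unordered one, and arithmetic summaries only make sense for the numeric vector.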

Some definitions - Measures of Central Tendency

Mean

The average of a set of numbers, calculated by adding all the numbers
together and dividing by the count of numbers.

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median

The middle value in a set of numbers when they are arranged in order.
If there is an even number of observations, the median is the average
of the two middle numbers.
Mode

The value that appears most frequently in a set of numbers.
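
A small worked example of the three measures, on made-up numbers. Base R has mean() and median(), but its mode() function returns the storage type of an object, not the most frequent value, so stat_mode below is a hypothetical helper written for this sketch.

```r
x <- c(4, 4, 5, 7, 30)

mean_x   <- mean(x)    # (4 + 4 + 5 + 7 + 30) / 5 = 10
median_x <- median(x)  # middle value of the sorted numbers = 5

# Hypothetical helper: the most frequent value (the smallest one if tied)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
mode_x <- stat_mode(x)  # 4
```

The single large value 30 moves the mean to 10 while the median stays at 5, which is why the median is often reported for skewed variables such as trip duration.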

Some definitions - Measures of Variation

Range

The difference between the highest and lowest values in a set of numbers.
Variance

A measure of how much the values in a set differ from the mean. It is
calculated by taking the average of the squared differences from the
mean.

$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard Deviation

The square root of the variance, providing a measure of the typical
distance of each value from the mean.

$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
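
The formulas above divide by n. R's built-in var() and sd() divide by n - 1 instead, so they give slightly larger values on the same data. The sketch below computes both versions on made-up numbers; all object names are hypothetical.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Range as a single number; note that range(x) returns the min and max instead
range_x <- max(x) - min(x)           # 9 - 2 = 7

# Formulas as written on the slide: divide by n
var_pop <- sum((x - mean(x))^2) / n  # 32 / 8 = 4
sd_pop  <- sqrt(var_pop)             # 2

# R's built-ins: divide by n - 1
var_sample <- var(x)                 # 32 / 7
sd_sample  <- sd(x)
```

With over two million trips the difference between the two denominators is negligible, but it is visible in small samples like this one.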

Some definitions - Quantiles

Quantiles

Cut points that divide the ordered data into groups containing equal shares
of the observations. They help in understanding the distribution of the data.
Common quantiles include quartiles, percentiles, and deciles.

Quartiles: Divide data into four parts.

Percentiles: Divide data into 100 parts.
Deciles: Divide data into 10 parts.
Interquartile Range (IQR)

The range of the middle 50% of the values, calculated as the third
quartile (75th percentile) minus the first quartile (25th percentile).
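
A short sketch of these definitions using base R's quantile() and IQR() on the numbers 1 to 9. The object names are hypothetical. quantile() supports several interpolation rules; the values in the comments are for R's default.

```r
x <- 1:9

# Quartiles: the 25th, 50th and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))  # 3, 5, 7

# Deciles: nine cut points that split the data into ten parts
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))

# Interquartile range: third quartile minus first quartile
iqr_x <- IQR(x)  # 7 - 3 = 4
```

The middle quartile is the median, so quantile(x, 0.5) and median(x) agree.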

Validity & Reliability

Validity

A measure is valid to the degree that it represents what you are trying
to measure.

Internal validity: How well the representation stands for the concept.

→ e.g. is annual income a valid measure of one’s material resources?
External validity: How well the representation works in different settings.
→ e.g. is annual income a valid measure of material resources for
those who are under 18 years old?

Key points of this week

Asking questions about the data and operationalizing concepts
Using measurements to calculate summary statistics and create
visualizations
Not all visualization methods or summary statistics fit every
measurement
Goal of EDA: finding the best ways to communicate ideas
Assessing threats to internal and external validity after defining
concepts and research questions

Next week: Probability

Install R and RStudio on your personal computer for the tutorial
You will receive an email about using remote PC to access the computers in
the lab.
Syllabus is updated with chapters from the textbook (Regression and
Other Stories)
