Chapter2 PDF
ANALYZING SOCIAL MEDIA DATA IN R
Vivek Vijayaraghavan
Data Science Coach
Lesson Overview
Filtering based on tweet components
Extract original tweets
x               freq
<fctr>          <int>
blairaasmith    2
javiergosende   1
juanburgos      1
WhutTheHale     2
NA              94

x       freq
<lgl>   <int>
FALSE   98
TRUE    2

x       freq
<lgl>   <int>
FALSE   61
TRUE    39

x       freq
<lgl>   <int>
NA      100

x       freq
FALSE   100
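
The frequency tables above suggest checking a tweet set for replies, quotes, and retweets before and after filtering. A minimal base R sketch of that filter follows. The data frame is a hypothetical stand-in, and the column names reply_to_screen_name, is_quote, and is_retweet are assumptions, since the slides do not name the columns being counted.

```r
# Hypothetical stand-in for a tweets data frame (real data would come from rtweet).
# Column names are assumed, not taken from the slides.
tweets <- data.frame(
  text = c("original post", "a reply", "a quote", "a retweet"),
  reply_to_screen_name = c(NA, "WhutTheHale", NA, NA),
  is_quote = c(FALSE, FALSE, TRUE, FALSE),
  is_retweet = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep rows that are not replies, not quotes, and not retweets.
is_original <- is.na(tweets$reply_to_screen_name) &
  !tweets$is_quote &
  !tweets$is_retweet
tweets_org <- tweets[is_original, ]

# Frequency checks on the filtered set, analogous to the tables above.
table(tweets_org$reply_to_screen_name, useNA = "ifany")
table(tweets_org$is_quote)
table(tweets_org$is_retweet)
```

After filtering, every remaining row should show NA for the reply column and FALSE for the other two.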
head(counts)

    retweet_count   favorite_count
    <int>           <int>
1   162             833
2   141             894
3   164             1128
4   395             1346
5   475             2271
6   270             1654

    text
    <chr>
1   As we continue to build the Bakkt Bitcoin Futures contract, we reached a
2   BREAKING: The United States is considering entering into a "currency pact"
3   REMINDER: The Bitcoin ETF will eventually get approved.\n\nNot a question
4   [New Post] Bitcoin is becoming much more important in Hong Kong and India.
5   Reports are surfacing that some Hong Kong ATMs have run out of cash as
6   Bitcoin is the most transparent currency ever created.
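
One possible use of the retweet_count and favorite_count columns shown by head(counts) is ranking tweets by popularity. The sketch below rebuilds the six numeric rows from the output above and sorts them in base R. Ranking by retweets is an illustrative assumption, not necessarily the step the slides go on to take.

```r
# Numeric columns copied from the head(counts) output above.
counts <- data.frame(
  retweet_count = c(162L, 141L, 164L, 395L, 475L, 270L),
  favorite_count = c(833L, 894L, 1128L, 1346L, 2271L, 1654L)
)

# Sort rows from most to least retweeted; row names keep the original positions.
by_retweets <- counts[order(counts$retweet_count, decreasing = TRUE), ]

# Show the three most retweeted rows.
head(by_retweets, 3)
```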
Lesson Overview
friends_count and followers_count of a user
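
As a rough illustration of working with these two fields, the sketch below compares them through a simple ratio. The users and their counts are hypothetical, and using a followers-to-friends ratio is an assumption about how the counts might be compared, not something stated on the slide.

```r
# Hypothetical user records; the screen names and counts are invented for illustration.
users <- data.frame(
  screen_name = c("user_a", "user_b", "user_c"),
  followers_count = c(1200L, 90L, 45000L),
  friends_count = c(300L, 450L, 150L),
  stringsAsFactors = FALSE
)

# Assumed comparison: followers per friend (values above 1 mean more followers than friends).
users$follower_friend_ratio <- users$followers_count / users$friends_count

# List users from highest to lowest ratio.
users[order(users$follower_friend_ratio, decreasing = TRUE), ]
```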
Lesson Overview
Understand Twitter trends
trend                   tweet_vol
<chr>                   <dbl>
#90DayFiance            14375
#acefamilyisoverparty   12760
#ascendwithme           NA
#bbcon2019              NA
#bookbirthday           NA
#DemDebate              18928

trend                   tweet_vol
<chr>                   <dbl>
LeBron                  298302
Lions                   267945
Columbus Day            135014
John Bolton             118933
#DETvsGB                67197
#TuesdayThoughts        63259
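
Trend tables like the first one above can contain NA volumes, so ranking trends usually means dropping those rows and sorting the rest. A minimal base R sketch, using the six rows from the first table (the trend and tweet_vol names come from that output; the approach itself is an assumption):

```r
# Rows copied from the first trend table above.
trends <- data.frame(
  trend = c("#90DayFiance", "#acefamilyisoverparty", "#ascendwithme",
            "#bbcon2019", "#bookbirthday", "#DemDebate"),
  tweet_vol = c(14375, 12760, NA, NA, NA, 18928),
  stringsAsFactors = FALSE
)

# Drop trends with no reported volume.
with_volume <- trends[!is.na(trends$tweet_vol), ]

# Sort the remaining trends from highest to lowest volume.
top_trends <- with_volume[order(with_volume$tweet_vol, decreasing = TRUE), ]
top_trends
```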
Lesson overview
Time series data
library(rtweet)
A time series object contains the aggregated frequency of tweets over a time interval.
time                  n
<S3: POSIXct>         <int>
2019-08-13 14:00:00   12
2019-08-13 15:00:00   34
2019-08-13 16:00:00   1
2019-08-13 17:00:00   2
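
To make the aggregation concrete, the sketch below counts tweets per hour in base R, so it runs without API access. The created_at timestamps are hypothetical, and the UTC time zone is an assumption.

```r
# Hypothetical tweet timestamps; the time zone is assumed to be UTC.
created_at <- as.POSIXct(
  c("2019-08-13 14:05:00", "2019-08-13 14:40:00",
    "2019-08-13 15:10:00", "2019-08-13 15:20:00", "2019-08-13 15:55:00",
    "2019-08-13 17:30:00"),
  tz = "UTC"
)

# Truncate each timestamp to the start of its hour.
hour_start <- format(created_at, "%Y-%m-%d %H:00:00", tz = "UTC")

# Count tweets per hour and store the result as a time/n data frame.
freq <- table(hour_start)
tweet_ts <- data.frame(
  time = as.POSIXct(names(freq), tz = "UTC"),
  n = as.integer(freq)
)
tweet_ts
```

Hours with no tweets (16:00 in this sample) do not appear in the result; they would need to be filled in separately if a regular interval is required.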
head(camry_ts)
time                  camry_n
<S3: POSIXct>         <int>
2019-08-13 14:00:00   12
2019-08-13 15:00:00   34
2019-08-13 16:00:00   1
2019-08-13 17:00:00   2

time                  tesla_n
<S3: POSIXct>         <int>
2019-08-13 13:00:00   17
2019-08-13 14:00:00   58
2019-08-13 15:00:00   38
2019-08-13 16:00:00   32
2019-08-13 17:00:00   38
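
Two hourly series like these can be compared once they share a time column. The sketch below rebuilds both tables from the values shown and joins them with base R merge(). The UTC time zone and the choice of a full outer join are assumptions.

```r
# Hourly timestamps covering both series; the time zone is assumed to be UTC.
times <- as.POSIXct(
  c("2019-08-13 13:00:00", "2019-08-13 14:00:00", "2019-08-13 15:00:00",
    "2019-08-13 16:00:00", "2019-08-13 17:00:00"),
  tz = "UTC"
)

# Values copied from the two tables above.
camry_ts <- data.frame(time = times[2:5], camry_n = c(12L, 34L, 1L, 2L))
tesla_ts <- data.frame(time = times, tesla_n = c(17L, 58L, 38L, 32L, 38L))

# Full outer join on time: hours missing from one series get NA in its column.
merged_ts <- merge(camry_ts, tesla_ts, by = "time", all = TRUE)
merged_ts
```

The 13:00 row has NA for camry_n because only the Tesla series has an observation for that hour.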