Abstract
Estimating crowd sizes or the occupancy of buildings and skyscrapers is often essential. However, traditional estimation methods, such as manual counting, image processing or, in the case of skyscrapers, total water usage, are awkward, inefficient and often inaccurate. Social media has developed rapidly over the last decade. In this work, we provide novel solutions for estimating the populations of suburbs and skyscrapers (so-called micro-populations) through the use of social media. We develop a big data solution that leverages large-scale harvesting and analysis of Twitter data. By harvesting real-time tweets and clustering them within suburbs and skyscrapers, we show how micro-populations can be calculated. To validate this approach, we construct linear and spatial models for the suburbs of four Australian cities using census data and geospatial data models (shapefiles). Our predictions of micro-populations show that Twitter can indeed be used for population prediction with a high degree of accuracy.











Acknowledgements
The authors wish to thank the NeCTAR project for the use of the Cloud systems underpinning this paper, and the AURIN project for the Census and suburb Shapefiles. The corresponding author is Prof. Richard O. Sinnott.
Cite this article
Sinnott, R.O., Wang, W. Estimating micro-populations through social media analytics. Soc. Netw. Anal. Min. 7, 13 (2017). https://doi.org/10.1007/s13278-017-0433-6
DOI: https://doi.org/10.1007/s13278-017-0433-6