Abstract
Recent years have witnessed an increasing shortage of data experts capable of analyzing the omnipresent data and producing meaningful insights. Furthermore, some data scientists report that data preprocessing takes up to 80% of total project time. This paper proposes a method for collaborative data analysis that involves a crowd without data analysis expertise. Orchestrated by an expert, the team of novices conducts the analysis through iterative refinement of results until it is successfully completed. To evaluate the proposed method, we implemented a tool that supports collaborative data analysis for teams with mixed levels of expertise. Our evaluation demonstrates that, with proper guidance, data analysis tasks, especially preprocessing, can be distributed and successfully accomplished by non-experts. Following the design science approach, iterative development also revealed several important features for the collaboration tool, such as support for dynamic development, code deliberation, and a project journal. As such, we pave the way for building tools that can leverage the crowd to address the shortage of data analysts.
Acknowledgments
This work was supported by the Swiss National Science Foundation under contract number 14341.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Feldman, M., Anastasiu, C., Bernstein, A. (2018). Towards Collaborative Data Analysis with Diverse Crowds – A Design Science Approach. In: Chatterjee, S., Dutta, K., Sundarraj, R. (eds) Designing for a Digital and Globalized World. DESRIST 2018. Lecture Notes in Computer Science, vol. 10844. Springer, Cham. https://doi.org/10.1007/978-3-319-91800-6_15
DOI: https://doi.org/10.1007/978-3-319-91800-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91799-3
Online ISBN: 978-3-319-91800-6
eBook Packages: Computer Science, Computer Science (R0)