
June 2019 - 4. HCI KDD Paper v2


Budding Data Scientists Hackathon

Hui Xiang Chua, National University of Singapore, datadoubleconfirm@gmail.com
Ee-Ling Chua, Hwa Chong Institution, chuael@hci.edu.sg
Kenneth Soo, Stanford University, kenneth@algobeans.com

ABSTRACT
The "Budding Data Scientists Hackathon" was a pilot program to bring data science into a high school's curriculum in Singapore. Unlike typical hackathons, it ran for several months. A total of seven teams comprising 22 students underwent one week of intensive training workshops and five months of mentoring to work on projects tackling social challenges using data science. The hackathon was made possible with the support of the KDD Impact Program [1].

1. INTRODUCTION
The "Budding Data Scientists Hackathon" set out to achieve the following objectives:
● Enhance data science community engagement;
● Expand outreach of data science;
● Increase diversity and participation in data science;
● Increase societal impact of data science;
● Influence public policy through data science.

The hackathon aimed to motivate upper secondary school students (i.e. grades 9/10 of the U.S. high school system) to develop an interest in data science and to use data science to help a social cause. They worked in teams to tackle social challenges of their interest using data science, with a possibility of improving the data maturity within Voluntary Welfare Organisations (i.e. non-profit organisations that provide welfare services and/or services that benefit the community at large). All teams presented their projects to a judging panel at the final showdown, and prize money was awarded to the top three teams. A special award was also given for best visualization.

As the current secondary school curriculum did not encompass data science, all participants underwent five training sessions (approx. 20 hours). The training covered different aspects of data science such as statistics, programming, visualization, the data maturity framework, and data pipelines.

This was the first time students in Singapore were able to gain real-world experience working with data science problems at such an early stage of their education. The inaugural "Budding Data Scientists Hackathon" brought together five teams of students from Hwa Chong Institution and two teams from the affiliated Nanyang Girls’ High School, with 3–4 students per team.

The final showdown was open to teachers and non-participating students to raise awareness of data science and its applications.

The data science projects done during the hackathon can serve as use cases, and the "Budding Data Scientists Hackathon" can be replicated across different high schools. Table 1 outlines the timeline of the hackathon.

Timeline          Deliverables
Jan - Mar 2018    Curriculum design
Mar 2018          Training week
Mar - Aug 2018    Mentoring
Aug 2018          Final showdown/ Project presentations
Table 1. Timeline of "Budding Data Scientists Hackathon".

2. OUR APPROACH
A. Training and Curriculum Design
The curriculum for the training sessions was developed by data science practitioners, namely Hui Xiang Chua and Kenneth Soo, and comprised data science concepts and hands-on exercises using tools such as R, Python and Tableau (see Table 2). The course materials were meticulously designed and included the use of visuals, multimedia, and real-world examples. Homework was assigned to participants at the end of each session for them to apply what they had learnt, and was also used to evaluate the participants’ learning progress.

The curriculum was customized to fit the school's timetabling of five 4-hour sessions, as part of Sabbaticals Week, during which students learnt subjects of their interest outside of their core subjects. The curriculum design focused on mathematical and computational analysis because the sabbaticals were meant to further students’ interests and develop their expertise in specific core subjects taught during lower secondary, namely mathematics and computing in this case.

B. Project Mentoring
During the project scoping stage, each team had to determine a topic of analysis that centered around public policy or a particular social cause. Participants could choose to make use of open data, reach out to voluntary welfare (non-profit) organisations of interest for data, and/or collect their own data.

We partnered with Animal Concerns Research and Education Society (ACRES), a non-governmental organisation and a registered animal welfare charity with the Ministry of Culture, Community and Youth in Singapore, which focuses on tackling wildlife crime and humane education. Students could opt to work on projects that helped ACRES in their work.

To ensure that students received help and guidance whenever they needed it, mentoring was done over a team collaboration platform, Slack. In addition, three face-to-face consultation sessions were held for progress updates and clarifications on project scoping, data collection, and data analysis.
Day  Curriculum
1    Lab 1: Software installation
     Theory 1: Introduction to various Data Science tasks
     Theory 2: Basic statistical concepts
     Lab 2: Basic statistical tests in R
     Homework #1: Share 3 things I learnt today and 1 question I still have.
2    Lab 3: Introduction to R (Basic R functions, Indexing, Sort)
     Lab 4: Data preparation in R (Merging, Recoding, Web Scraping)
     Lab 5: Plotting and Advanced functions in R (IF and FOR)
     Homework #2: Using what you have learnt today, find interesting table(s) on Wikipedia, then use R to extract and plot something. Be sure to include plot title and axis labels.
3    Theory 3: Probability
     Theory 4: k-Nearest Neighbors
     Theory 5: Regression
     Lab 6: k-Nearest Neighbors
     Lab 7: Regression (Simple, Multiple)
     Lab 8: Decision trees
     Homework #3: Share 3 things I learnt today and 1 question I still have.
4    Lab 9: Data visualization and dashboarding in Tableau
     Homework #4: Build a dashboard containing three charts (at least two different chart types) and post to Tableau Public. Save your dashboard as image and create a Medium post inserting the image and your Tableau Public dashboard URL.
5    Lab 10: Webscraping with Python
     Homework #5: Go to IMDb.com > Movies, TV & Showtimes > Most Popular Movies. Scrape Title, Rank, Rating, Advisory category, Run time, Genres, Date. Do a write-up on what kind of analysis can be done including a screenshot of your code.
Table 2. Curriculum of "Budding Data Scientists Hackathon".

C. Final Showdown and Blog
For the final showdown, each team gave an eight-minute presentation, with a further three minutes set aside for questions and answers. Teams were judged on five criteria, namely understandability, accuracy, creativeness, usefulness of findings, and teamwork.

A blog was set up to share the training materials/curriculum and to document growth stories, learning outcomes, and findings from the hackathon [2]. The blog content could be useful for individuals who are interested in learning about data science.

The projects done by the students included:
● An optimization of taxi services in Singapore
● Predicting traffic volume in the Central Business District of Singapore
● Understanding risk factors for diabetes
● Identifying illegal wildlife trading on an e-commerce platform (done by two teams independently)
● An analysis on wildlife trade
● Planning for Mass Rapid Transit delays in Singapore

3. OUTCOMES AND TAKEAWAYS
Students exhibited interest and enthusiasm throughout the hackathon, and were able to understand and apply most of the content that was taught.

A. Highlights and Effectiveness of Training Session
While most students were exposed to data science for the first time, it was heartening to see that students were interested and enthusiastic during class. Throughout the course, many participants asked questions that reflected their curiosity and desire to learn.

Homework #1 and #3 required students to summarise what they had learnt for the day, while Homework #2, #4, and #5 required students to find data from the Internet and apply what they had learnt. The homework submissions demonstrated a high level of understanding of the course content, as well as creativity in applying the learnt skills to new data.

As expected, one challenging aspect of the course was teaching the coding component. Coding required students to think in a manner that was different from traditional subjects. The students had various coding backgrounds; some had learnt coding (in other languages) in school, whereas others were completely new to coding. This resulted in different learning speeds. To address this, peer-to-peer learning was encouraged. In addition, the faster students were given advanced materials for self-study while the slower students received additional help from the instructors. The use of annotated sample code, interesting datasets, and examples was also critical to the students’ learning.

During the training sessions, the topic of statistical tests proved challenging for students to grasp. This could be due to a hastened introduction to such concepts without first dwelling on probability distributions, and it might be more appropriate to cover this topic after the chapter on probability. In addition, more time should be spent on the general syntax of the Python programming language before the webscraping exercise.

B. Survey on Training Sessions
A survey was conducted to assess the effectiveness of the training sessions (see Figure 1). Most students agreed that the sessions were engaging, interesting, clear, and easy to follow. In addition, all students agreed that the course material aided their learning, and that the training sessions improved their knowledge of data science.

Figure 1. Survey results on training sessions of "Budding Data Scientists Hackathon" (n=16).

We also collected qualitative feedback, and the following are some of the responses we obtained:
● The course is enriching and enjoyable and I have a better grasp of basic coding in R and Python after the 5 day sabbatical course. Hope this course could continue for juniors to attend.
● This course has not only helped us in learning the basics in data science, it has also given us a headstart in our project. Learning all of these useful skills will definitely be helpful in the future.
● I felt the course was meaningful and my knowledge of data science definitely improved.
● The course is really an enjoyable and meaningful learning experience for me. Although some content is very difficult and take me quite much time to figure out everything, I think these contents are very useful to both the project and future life.

C. Highlights of Mentoring and Final Showdown
The five-month time frame was extremely useful for students to explore and apply the concepts, skills and techniques they had learnt. Students were motivated to put effort into these projects because the projects fulfilled the requirement of project work grading within the school curriculum, in addition to being presented at the final showdown. They had to balance their commitment to the project against other academic demands. The school recognized that the hackathon presented an opportunity for students to enhance their real-world problem-solving skills and to interact with working professionals to understand their business challenges. Students learnt to recognize constraints in the real world and that there is no one perfect solution to a problem. The hackathon was also a platform for the school and students to contribute to society.

As with any data science project, data is critical, and many non-profit organisations are not data-ready: data is either not collected in a proper manner or not collected at all. Hence, most of the projects in this hackathon relied on public data and/or APIs made available by the Singapore government. The pilot was successful and the school will be holding a second run in 2019.

4. ACKNOWLEDGMENTS
We would like to thank SIGKDD for establishing the KDD Impact Program and for funding the awards.

5. REFERENCES
[1] ACM SIGKDD News: Announcing the SIGKDD Impact Program Recipients for 2018. https://www.kdd.org/News/view/announcing-the-kdd-impact-program-recipients-for-2018
[2] Budding Data Scientists Hackathon homepage. https://medium.com/budding-data-scientists

About the authors:

Hui Xiang Chua graduated with a B.Sc.(Hons) in Statistics and an M.Sc. in Business Analytics from National University of Singapore in 2012 and 2016 respectively. She is a Data Science for Social Good fellow and has over six years of experience solving problems using data in the public service as a research analyst. Her data science blog was recognised as one of the 2018 Top 100 Data Science Resources on MastersInDataScience.com. (projectosyo.wixsite.com/datadoubleconfirm)

Ee Ling Chua completed her teacher training at the National Institute of Education, Singapore in 2004 and subsequently a Master’s Degree in Education from the University of Western Australia in 2016. She is the Principal Consultant at Hwa Chong Institution for the Mathematics department and has served the school for eight years.

Kenneth Soo completed his MS degree in Statistics at Stanford University in 2017. He is the co-author of the best-selling book, Numsense! Data Science for the Layman: No Math Added, which was written as a gentle introduction to data science and its algorithms. He was the top student for all three years of his undergraduate class in Mathematics, Operational Research, Statistics and Economics (MORSE) at the University of Warwick.
