SQL Task
● Problem: Create a text/csv file containing 1000 rows with the following
fields/columns:
○ StudentID: Unique identifier from 1 to 1000
○ Score: Random number between 40 and 100
○ Date: Any random date within the last 20 days, e.g. 18/11/2022
○ Description: Get a random word from a list of 10 words of your choice
○ Ethnicity: Randomly assign an ethnicity (google the term if you don't
know its meaning)
○ Subject: Randomly assign one from a list of 10 subjects, e.g. Calculus,
Statistics, Databases
○ Hobby: Randomly assign one from a list of 10 hobbies
○ Interest: Randomly assign one from a list of 10 interests, e.g. music,
nonfiction, debate, swimming
Solve the above using any scripting language of your choice (such as Bash or
Python). Share the script, not the dataset, along with a screenshot of a few
rows of the dataset.
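One possible shape for such a script, as a minimal Python sketch using only the standard library. The word lists, the function names make_rows and to_csv_text, integer scores, and the reading of "within the last 20 days" as today through 19 days ago are all assumptions made for illustration; the task leaves these choices open.

```python
import csv
import io
import random
from datetime import date, timedelta

# Illustrative value lists (assumptions; any lists of your choice would do).
DESCRIPTIONS = ["diligent", "curious", "quiet", "creative", "focused",
                "friendly", "punctual", "energetic", "thoughtful", "calm"]
ETHNICITIES = ["Asian", "Black", "Hispanic", "White", "Pacific Islander"]
SUBJECTS = ["Calculus", "Statistics", "Databases", "Algebra", "Physics",
            "Chemistry", "Biology", "Economics", "History", "Literature"]
HOBBIES = ["chess", "hiking", "painting", "cooking", "gardening",
           "cycling", "photography", "reading", "gaming", "knitting"]
INTERESTS = ["music", "nonfiction", "debate", "swimming", "robotics",
             "theatre", "astronomy", "poetry", "travel", "film"]

FIELDS = ["StudentID", "Score", "Date", "Description",
          "Ethnicity", "Subject", "Hobby", "Interest"]


def make_rows(n=1000, today=None, seed=None):
    """Build n student rows; seed makes the output reproducible."""
    rng = random.Random(seed)
    today = today or date.today()
    rows = []
    for student_id in range(1, n + 1):
        # Assumption: "last 20 days" means today back to 19 days ago.
        day = today - timedelta(days=rng.randint(0, 19))
        rows.append({
            "StudentID": student_id,
            "Score": rng.randint(40, 100),  # assumption: integer, inclusive
            "Date": day.strftime("%d/%m/%Y"),
            "Description": rng.choice(DESCRIPTIONS),
            "Ethnicity": rng.choice(ETHNICITIES),
            "Subject": rng.choice(SUBJECTS),
            "Hobby": rng.choice(HOBBIES),
            "Interest": rng.choice(INTERESTS),
        })
    return rows


def to_csv_text(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To produce the file, the text returned by to_csv_text(make_rows()) can be written to a path such as students.csv (a hypothetical file name).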
● Problem: Based on the above data, create an output dataset for each of the
following:
○ For a specific ethnicity (your choice), find the average Score by Subject
○ Find the average Score by Hobby and Interest
○ Which ethnicity has the 2nd highest average Score for a specific
Subject (e.g. Statistics)?
Solve each using two methods: SQL and Python (you can use pandas or any
other library/package). For each, share the SQL solution and the Python
script, not the output dataset, along with a screenshot of a few rows of the
output.
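The three questions could be approached roughly as follows. This is a sketch under stated assumptions: the table name students, the seven sample rows, and the helper second_highest_ethnicity are hypothetical, and the SQL is run through Python's built-in sqlite3 module so the example stays self-contained. The LIMIT 1 OFFSET 1 query does not handle ties in the average; a ranking window function would be one way to treat them.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (Ethnicity, Subject, Hobby, Interest, Score).
# Real input would be the 1000-row file from the first problem.
ROWS = [
    ("Asian", "Statistics", "chess", "music", 90),
    ("Asian", "Statistics", "chess", "debate", 80),
    ("Asian", "Calculus", "hiking", "music", 70),
    ("Hispanic", "Statistics", "chess", "music", 60),
    ("Hispanic", "Calculus", "hiking", "debate", 50),
    ("White", "Statistics", "hiking", "music", 75),
    ("White", "Statistics", "chess", "music", 65),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "Ethnicity TEXT, Subject TEXT, Hobby TEXT, Interest TEXT, Score INTEGER)"
)
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", ROWS)

# 1. Average Score by Subject for one chosen ethnicity.
avg_by_subject = conn.execute(
    "SELECT Subject, AVG(Score) FROM students "
    "WHERE Ethnicity = ? GROUP BY Subject ORDER BY Subject",
    ("Asian",),
).fetchall()

# 2. Average Score by Hobby and Interest.
avg_by_hobby_interest = conn.execute(
    "SELECT Hobby, Interest, AVG(Score) FROM students "
    "GROUP BY Hobby, Interest ORDER BY Hobby, Interest"
).fetchall()

# 3. Ethnicity with the 2nd highest average Score for one Subject
#    (skips the top row; ties are not handled).
second = conn.execute(
    "SELECT Ethnicity, AVG(Score) AS avg_score FROM students "
    "WHERE Subject = ? GROUP BY Ethnicity "
    "ORDER BY avg_score DESC LIMIT 1 OFFSET 1",
    ("Statistics",),
).fetchone()


def second_highest_ethnicity(rows, subject):
    """Plain-Python counterpart of query 3, without any SQL."""
    scores = defaultdict(list)
    for ethnicity, subj, _hobby, _interest, score in rows:
        if subj == subject:
            scores[ethnicity].append(score)
    ranked = sorted(
        ((sum(v) / len(v), k) for k, v in scores.items()), reverse=True
    )
    return ranked[1][1] if len(ranked) > 1 else None


print(avg_by_subject)
print(second, second_highest_ethnicity(ROWS, "Statistics"))
```

With pandas, the analogous grouping step would typically be a groupby on the relevant columns followed by a mean of Score; the plain-Python helper above shows the same idea without an extra dependency.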
● Problem: Given a student with a specific ethnicity, one specific Hobby and one
specific Interest, recommend 3 subjects that are best aligned with the
student's ethnicity, hobby and interest, so that he/she can secure the
highest score in the future.
The solution needs to be in plain English (NO CODING). This problem is meant
to judge your communication, analytical thinking and technical understanding.
The approach to solving this problem should be NON-Machine Learning, NON-Data
Science and NON-Statistical; the simpler, the better. Please organize the
solution in points.