Numpy Pandas Matplotlib Assignment Sem 7
Data
Based on NumPy, Pandas and Matplotlib
Subject: Machine Learning Lab
Tutor: Tawqeer ul Islam, GCET Jammu
Assignment Details
Dataset: The attached CSV file named student data.csv
1. Load and explore the data:
• Use Pandas to load the CSV file into a DataFrame.
• Get basic information about the DataFrame (shape, data types, missing
values).
• Explore score distributions per subject using histograms and box plots
(Matplotlib).
2. Calculate summary statistics:
• Use NumPy to calculate mean, median, standard deviation (and other
statistics) for each subject.
• Analyze correlations between subjects using NumPy’s correlation function
(np.corrcoef).
3. Identify high-performing students:
• Determine the top 10 students based on average scores across all subjects.
• Visualize top student performance using bar charts (Matplotlib).
4. Analyze the impact of attendance:
• Explore the relationship between attendance rate and academic performance
(scatter plots, correlation analysis).
• Identify significant correlations between attendance and specific subjects.
5. Compare performance by gender:
• Calculate average scores per subject by gender and compare the results.
• Visualize gender differences using bar charts.
6. Identify areas for improvement:
• Analyze score distributions to identify areas where students may need
additional support.
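The numerical parts of tasks 1 to 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: the structure of the attached CSV is not described here, so the column names (name, gender, attendance, math, physics, english) and the six rows of data are hypothetical stand-ins, and the top-10 ranking is shortened to a top 3 to fit the toy data. Plotting is left out of the sketch.

```python
import numpy as np
import pandas as pd

# With the real file this would be something like:
#   df = pd.read_csv("student data.csv")
# The frame below is made-up data with ASSUMED column names.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "attendance": [95, 60, 80, 70, 90, 50],
    "math": [88, 55, 72, 64, 91, 40],
    "physics": [82, 60, 75, 58, 89, 45],
    "english": [79, 65, 70, 72, 85, 55],
})
subjects = ["math", "physics", "english"]  # assumed subject columns

# Task 1: basic information (shape, data types, missing values per column)
shape = df.shape
dtypes = df.dtypes
missing = df.isna().sum()

# Task 2: per-subject statistics with NumPy (axis=0 works column by column)
scores = df[subjects].to_numpy(dtype=float)
stats = pd.DataFrame(
    {
        "mean": np.mean(scores, axis=0),
        "median": np.median(scores, axis=0),
        "std": np.std(scores, axis=0, ddof=1),  # sample standard deviation
    },
    index=subjects,
)
# rowvar=False treats each column (subject) as one variable
subject_corr = np.corrcoef(scores, rowvar=False)

# Task 3: rank students by their average across all subjects
df["average"] = df[subjects].mean(axis=1)
top_students = df.nlargest(3, "average")[["name", "average"]]

# Task 4: Pearson correlation between attendance and each subject
attendance = df["attendance"].to_numpy(dtype=float)
attendance_corr = {
    s: np.corrcoef(attendance, df[s].to_numpy(dtype=float))[0, 1]
    for s in subjects
}

# Task 5: average score per subject, grouped by gender
gender_means = df.groupby("gender")[subjects].mean()

print(stats)
print(top_students)
print(gender_means)
```

With the real dataset, `subjects` would need to be replaced by the actual subject column names, and `nlargest(3, ...)` by `nlargest(10, ...)`.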
Submission Date:
On or before 17th of September 2024
Submission Mode:
The assignment should be sent via email to tawqeercse@gmail.com