Discussion #2 Shahrzad Karbasi
Assignment Title:
Date of Submission: 6/11/2024
Assignment Due Date: 6/11/2024
Section Number:
Semester: 3
Course Instructor:
Certification of Authorship: I certify that I am the author of this paper and that any assistance I
received in its preparation is fully acknowledged and disclosed in the paper. I also have cited any
sources from which I used data, ideas, or words, either quoted directly or paraphrased. I certify
that this paper was prepared by me specifically for the purpose of this assignment, as directed.
[Digital signature]
Shahrzad Karbasi
Business Analytics
Discussion #2
Next, handling missing data was a priority, as the dataset contained several reserved
codes for responses that were either unavailable or inapplicable. These codes included
"don't know", "no answer", "not applicable", and "rejected on the web". To focus on
meaningful and relevant responses, records carrying these codes were excluded from the
analysis. In addition, I merged categories with a low number of responses. For
example, fields with fewer than five respondents, such as "optometry" and "gerontology",
were grouped under broader labels such as "other professions" or a similar label. This
consolidation reduced the complexity of the dataset and allowed for more focused insights
into the most common areas of study.
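As an illustration only, the two cleaning steps described above (excluding reserved codes, then merging small categories) could be sketched in Python as follows. The exact spellings of the codes in the raw file, the function name clean_majors, the min_count parameter, and the sample data are all assumptions made for this sketch; they are not taken from the survey file or from the software used in the analysis.

```python
from collections import Counter

# Reserved codes named in the text; their exact spelling in the raw file is assumed.
RESERVED_CODES = {"don't know", "no answer", "not applicable", "rejected on the web"}


def clean_majors(responses, min_count=5, other_label="other professions"):
    """Drop reserved codes, then fold majors with fewer than
    min_count responses into a single other_label category."""
    valid = [r for r in responses if r not in RESERVED_CODES]
    counts = Counter(valid)
    return [r if counts[r] >= min_count else other_label for r in valid]


# Hypothetical toy data, not the survey itself.
sample = (
    ["business administration"] * 6
    + ["nursing"] * 5
    + ["optometry"] * 2
    + ["gerontology"]
    + ["not applicable"] * 4
    + ["no answer"]
)
cleaned = clean_majors(sample)
print(Counter(cleaned))
```

On this toy data, the five reserved-code responses are dropped and the three responses for optometry and gerontology are relabeled as "other professions", while business administration and nursing are left unchanged.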
Overall, the data show a clear pattern across academic fields. Majors such as business
administration, education, and nursing are dominant, and most other majors have only
limited representation. This distribution suggests that respondents may be predominantly
from vocational and professional backgrounds with a strong emphasis on practical and
applicable skills. In addition, missing data accounted for a substantial share of the
dataset (56.8%). These entries were removed from the analysis,
allowing the study to focus on meaningful responses.
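A minimal sketch of how the share of missing data and the frequency distribution of the remaining majors might be computed is shown below. The function name summarize, the code spellings, and the sample data are hypothetical, and the toy numbers are not the survey's figures.

```python
from collections import Counter

# Reserved codes treated as missing; exact spellings in the raw file are assumed.
RESERVED_CODES = {"don't know", "no answer", "not applicable", "rejected on the web"}


def summarize(responses):
    """Return (percent of responses that are missing,
    frequency table of valid responses as (major, count, percent))."""
    total = len(responses)
    valid = [r for r in responses if r not in RESERVED_CODES]
    missing_pct = 100 * (total - len(valid)) / total if total else 0.0
    counts = Counter(valid)
    # most_common() orders majors from most to least frequent.
    table = [(major, n, 100 * n / len(valid)) for major, n in counts.most_common()]
    return missing_pct, table


# Hypothetical toy data, not the survey itself.
sample = (
    ["business administration"] * 3
    + ["education"] * 2
    + ["nursing"]
    + ["not applicable"] * 4
)
missing_pct, table = summarize(sample)
print(f"missing: {missing_pct:.1f}%")
for major, n, pct in table:
    print(f"{major}: {n} ({pct:.1f}% of valid responses)")
```

Percentages in the table are computed over valid responses only, which matches the decision to exclude missing entries before describing the distribution of majors.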
This analysis demonstrates how effective data management and cleaning practices
enhance the clarity and accuracy of survey data. By standardizing labels, consolidating
categories, and managing missing data, the dataset was refined to enable valuable
insights into trends in college majors among survey respondents. The frequency
distributions and descriptive statistics suggest a predominance of career-oriented majors,
with a skewed distribution favoring fields such as business administration and education.
These findings provide a foundational understanding of the educational diversity among
respondents and underscore the value of data cleaning in preparing datasets for accurate
and insightful analysis.