ENGINEER
Refer to the data set (14 - Word Counts.xlsx), which includes counts of words spoken by males and
females. That data set includes 12 columns of data, but first stack all of the male word counts in one column
and stack all of the female word counts in another column. Then proceed to generate histograms, any other
suitable graphs, and find appropriate statistics that allow you to compare the two sets of data.
Are there any outliers? Do both data sets have properties that are basically the same? Are there any
significant differences? What would be a consequence of having significant differences? Write a brief report
including your conclusions and supporting graphs.
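The MATLAB code in this report begins at the statistics step and assumes the vectors `male` and `female` already exist, so the stacking step the assignment asks for is not shown. Below is a minimal sketch of that step. It assumes the 12 columns alternate male/female (1M, 1F, 2M, 2F, ...), and that shorter columns are padded with blank cells, which `readmatrix` returns as `NaN`. Check both assumptions against the spreadsheet headers. The small matrix `data` is hypothetical and stands in for the spreadsheet, so the snippet runs on its own.

```matlab
% Sketch: stack the word-count columns into one male and one female column.
% ASSUMPTION: odd-numbered columns hold male counts and even-numbered columns
% hold female counts (1M, 1F, 2M, 2F, ...).
% With the real file this would be: data = readmatrix('14 - Word Counts.xlsx');
% Hypothetical 3-row, 4-column stand-in; NaN marks an empty (padded) cell.
data = [ 100  200  300  400;
         150  250  350  NaN;
         NaN  260  NaN  NaN ];

male   = data(:, 1:2:end);   % columns 1, 3, 5, ...
female = data(:, 2:2:end);   % columns 2, 4, 6, ...

male   = male(:);            % stack all selected columns into one column vector
female = female(:);

male   = male(~isnan(male));       % drop padding cells from shorter columns
female = female(~isnan(female));

disp(male');    % shows 100 150 300 350
disp(female');  % shows 200 250 260 400
```

Dropping the `NaN` padding matters. Functions such as `mean` and `std` return `NaN` if any padding cells are left in the vectors.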
% Compute statistics
mean_male = mean(male);
median_male = median(male);
std_male = std(male);
min_male = min(male);
max_male = max(male);
mean_female = mean(female);
median_female = median(female);
std_female = std(female);
min_female = min(female);
max_female = max(female);
% Display statistics
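fprintf('Male - Mean: %.2f, Median: %.2f, Std Dev: %.2f, Min: %.2f, Max: %.2f\n', mean_male, median_male, std_male, min_male, max_male);
fprintf('Female - Mean: %.2f, Median: %.2f, Std Dev: %.2f, Min: %.2f, Max: %.2f\n', mean_female, median_female, std_female, min_female, max_female);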
% Output:
% Male - Mean: 15668.53, Median: 14290.00, Std Dev: 8632.53, Min: 695.00, Max: 47016.00
% Female - Mean: 16121.07, Median: 15917.00, Std Dev: 7187.78, Min: 1674.00, Max: 40055.00
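% Detect outliers with the 1.5*IQR rule (values beyond the quartiles by more than 1.5*IQR)
Q1_m = quantile(male, 0.25);
Q3_m = quantile(male, 0.75);
IQR_m = Q3_m - Q1_m;
outliers_male = male(male < Q1_m - 1.5*IQR_m | male > Q3_m + 1.5*IQR_m);
Q1_f = quantile(female, 0.25);
Q3_f = quantile(female, 0.75);
IQR_f = Q3_f - Q1_f;
outliers_female = female(female < Q1_f - 1.5*IQR_f | female > Q3_f + 1.5*IQR_f);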
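% Plot overlaid histograms on common bins, normalized so that groups of
% different size can be compared directly
edges = 0:2500:50000;   % covers the observed range (min 695, max 47016)
figure;
histogram(male, edges, 'FaceColor', 'b', 'Normalization', 'probability');
hold on;
histogram(female, edges, 'FaceColor', 'r', 'Normalization', 'probability');
hold off;
legend('Male', 'Female');
xlabel('Word Count');
ylabel('Relative Frequency');
title('Word Count Distribution');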
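% Plot side-by-side boxplots; a grouping vector works even when the
% two groups have different numbers of observations
figure;
boxplot([male; female], [ones(size(male)); 2*ones(size(female))], 'Labels', {'Male', 'Female'});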
ylabel('Word Count');
title('Word Count Boxplot');
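% Perform a two-sample t-test; 'Vartype','unequal' (Welch's test) is used
% because the two standard deviations differ
[h, p] = ttest2(male, female, 'Vartype', 'unequal');
if h
    fprintf('Significant difference found (p = %.4f)\n', p);
else
    fprintf('No significant difference found (p = %.4f)\n', p);
end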
Yes. Both the male and female word count data sets contain outliers. These were identified with the interquartile range (IQR) method, which flags any value more than 1.5 × IQR below the first quartile or above the third quartile.
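The IQR rule can be illustrated with a minimal sketch on a small hypothetical sample (not the word-count data). It uses the same `quantile` call as the analysis code, and so assumes `quantile` is available (MATLAB's Statistics and Machine Learning Toolbox, or GNU Octave).

```matlab
% Sketch: Tukey's 1.5*IQR fences on a small hypothetical sample.
x = [2; 4; 4; 5; 6; 7; 8; 40];

Q1 = quantile(x, 0.25);       % first quartile
Q3 = quantile(x, 0.75);       % third quartile
IQR_x = Q3 - Q1;              % interquartile range

lower_fence = Q1 - 1.5*IQR_x;
upper_fence = Q3 + 1.5*IQR_x;

% A value is flagged if it falls outside [lower_fence, upper_fence];
% in this sample only 40 does.
outliers = x(x < lower_fence | x > upper_fence);

fprintf('Q1 = %.2f, Q3 = %.2f, fences = [%.2f, %.2f]\n', Q1, Q3, lower_fence, upper_fence);
fprintf('Outliers: %s\n', mat2str(outliers'));
```

Software packages use slightly different quartile conventions, so fence values can differ a little between tools. Clear outliers such as 40 here are flagged under any of them.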
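The means are close: 15,668.53 words for males and 16,121.07 for females, a difference of about 453 words (roughly 3%). This indicates that average word usage is similar. The medians differ somewhat more (14,290 versus 15,917). The standard deviations (8,632.53 versus 7,187.78) show that male word counts are about 20% more spread out, although both groups vary widely and the spreads are of comparable magnitude. The histograms and boxplots suggest that the two distributions have a broadly similar shape.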
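These comparisons can be quantified directly from the summary statistics. The sketch below copies the values from the statistics output in this report, so it runs without the data file.

```matlab
% Sketch: quantify the gaps between the reported summary statistics.
% Values are copied from the statistics output in this report.
mean_m = 15668.53;  mean_f = 16121.07;
med_m  = 14290.00;  med_f  = 15917.00;
sd_m   = 8632.53;   sd_f   = 7187.78;

mean_diff = mean_f - mean_m;          % female minus male, in words
mean_pct  = 100 * mean_diff / mean_m; % relative to the male mean
med_diff  = med_f - med_m;            % female minus male, in words
sd_ratio  = sd_m / sd_f;              % > 1 means male counts are more spread out

fprintf('Mean difference: %.2f words (%.1f%% of the male mean)\n', mean_diff, mean_pct);
fprintf('Median difference: %.2f words\n', med_diff);
fprintf('SD ratio (male/female): %.2f\n', sd_ratio);
```

For males, the mean lies well above the median. This is consistent with a right-skewed male distribution, which would explain why the medians differ more than the means.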
A two-sample t-test was conducted to check whether the mean word counts differ. At the 0.05 significance level, a p-value below 0.05 leads to rejecting the null hypothesis of equal means, so the difference is considered significant. A p-value of 0.05 or above means the data provide no significant evidence of a difference between the two groups.
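The following minimal sketch shows what a Welch (unequal-variance) two-sample t-test computes, written with base functions only. The vectors `a` and `b` are hypothetical toy data. The two-sided p-value uses the standard identity between the t distribution's tail probability and the regularized incomplete beta function (`betainc`), so no toolbox is needed.

```matlab
% Sketch: Welch's two-sample t-test from its formulas (base MATLAB/Octave only).
% a and b are hypothetical toy samples, not the word-count data.
a = [1; 2; 3; 4; 5];
b = [2; 4; 6; 8; 10];

na = numel(a);  nb = numel(b);
va = var(a) / na;               % squared standard error of mean(a)
vb = var(b) / nb;               % squared standard error of mean(b)

t  = (mean(a) - mean(b)) / sqrt(va + vb);        % Welch t statistic
df = (va + vb)^2 / (va^2/(na-1) + vb^2/(nb-1));  % Welch-Satterthwaite degrees of freedom

% Two-sided p-value: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)
p = betainc(df / (df + t^2), df/2, 0.5);

fprintf('t = %.4f, df = %.2f, p = %.4f\n', t, df, p);
```

With the Statistics and Machine Learning Toolbox, `[h, p] = ttest2(a, b, 'Vartype', 'unequal')` is expected to return the same p-value, so this sketch can serve as a cross-check.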
If a significant difference is found, it could suggest that males and females have distinct communication
patterns. This insight might influence research in psychology, education, and communication studies. In
workplace settings, understanding these differences could help enhance communication strategies and
foster better interactions.
5. Supporting Graphs
• Histogram: Shows the distribution of word counts for both groups.
• Boxplot: Highlights differences in median values and outliers.