
ENGINEER

The analysis of word counts between males and females reveals that both datasets contain outliers, identified using the interquartile range method. The mean and median values are similar, and a t-test indicates no significant difference between the two groups (p = 0.5830). Supporting graphs, including histograms and boxplots, illustrate the distribution and variation in word counts for both genders.

TASK 2

Refer to Data Set here (14 - Word Counts.xlsx), which includes counts of words spoken by males and
females. That data set includes 12 columns of data, but first stack all of the male word counts in one column
and stack all of the female word counts in another column. Then proceed to generate histograms, any other
suitable graphs, and find appropriate statistics that allow you to compare the two sets of data.

Are there any outliers? Do both data sets have properties that are basically the same? Are there any
significant differences? What would be a consequence of having significant differences? Write a brief report
including your conclusions and supporting graphs.
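The stacking step the task describes can be sketched as follows. This is only an illustration: the small table and its column names (M1, F1, M2, F2) are made up, and the real spreadsheet's headers may differ.

```matlab
% Hypothetical wide table; names and values are assumptions for illustration.
wide = table([10; 20; NaN], [5; 15; 25], [30; 40; 50], [35; NaN; 45], ...
    'VariableNames', {'M1', 'F1', 'M2', 'F2'});

% Pick the male and female columns by their (assumed) name prefix.
names  = wide.Properties.VariableNames;
isMale = startsWith(names, 'M');

% Extract each group as a matrix, then stack its columns with (:).
maleMat   = wide{:, isMale};
femaleMat = wide{:, ~isMale};

% Drop missing entries per group, so a gap in one column
% does not discard valid values from another column.
male   = rmmissing(maleMat(:));
female = rmmissing(femaleMat(:));
```

Removing missing values after stacking, one group at a time, keeps every valid count even when the original columns have different lengths.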

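% Load the dataset (the CSV is expected to contain the stacked MALE and FEMALE columns)
filename = '14 - Word Counts(14 - Word Counts ).csv';
data = readtable(filename);

% Remove missing values (rmmissing drops every row with a missing entry in any column)
data = rmmissing(data);

% Extract data columns
male = data.MALE;
female = data.FEMALE;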

% Compute statistics
mean_male = mean(male);
median_male = median(male);
std_male = std(male);
min_male = min(male);
max_male = max(male);

mean_female = mean(female);
median_female = median(female);
std_female = std(female);
min_female = min(female);
max_female = max(female);

% Display statistics
fprintf('Male - Mean: %.2f, Median: %.2f, Std Dev: %.2f, Min: %.2f, Max: %.2f\n', ...
    mean_male, median_male, std_male, min_male, max_male);

Male - Mean: 15668.53, Median: 14290.00, Std Dev: 8632.53, Min: 695.00, Max: 47016.00

fprintf('Female - Mean: %.2f, Median: %.2f, Std Dev: %.2f, Min: %.2f, Max: %.2f\n', ...
    mean_female, median_female, std_female, min_female, max_female);

Female - Mean: 16121.07, Median: 15917.00, Std Dev: 7187.78, Min: 1674.00, Max: 40055.00

% Identify outliers using IQR
Q1_m = quantile(male, 0.25);
Q3_m = quantile(male, 0.75);
IQR_m = Q3_m - Q1_m;
outliers_male = male(male < Q1_m - 1.5*IQR_m | male > Q3_m + 1.5*IQR_m);

Q1_f = quantile(female, 0.25);
Q3_f = quantile(female, 0.75);
IQR_f = Q3_f - Q1_f;
outliers_female = female(female < Q1_f - 1.5*IQR_f | female > Q3_f + 1.5*IQR_f);
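The script computes the outlier vectors but never prints them. A minimal sketch of how the fences and the flagged values could be reported, using a made-up sample x rather than the word-count data:

```matlab
% Made-up sample with one obviously extreme value (not the real data).
x = [12; 15; 14; 13; 16; 15; 14; 100];

% Quartiles and interquartile range.
q1   = quantile(x, 0.25);
q3   = quantile(x, 0.75);
iqrX = q3 - q1;

% Fences of the 1.5 * IQR rule.
lowerFence = q1 - 1.5*iqrX;
upperFence = q3 + 1.5*iqrX;

% Logical mask of flagged points, and the flagged values themselves.
isOut    = x < lowerFence | x > upperFence;
outliers = x(isOut);

fprintf('Fences: [%.2f, %.2f], outliers found: %d\n', ...
    lowerFence, upperFence, nnz(isOut));
```

The same lines applied to male and female would give the outlier counts that the report could quote.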

% Plot histograms
figure;
histogram(male, 'FaceColor', 'b');
hold on;
histogram(female, 'FaceColor', 'r');
hold off;
legend('Male', 'Female');
xlabel('Word Count');
ylabel('Frequency');
title('Word Count Distribution');

% Plot boxplots
figure;
boxplot([male, female], 'Labels', {'Male', 'Female'});
ylabel('Word Count');
title('Word Count Boxplot');
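The call boxplot([male, female]) only works when both vectors have the same length, because they are concatenated side by side. If the stacked columns differ in length, a grouping variable is one alternative. A sketch with made-up vectors a and b:

```matlab
% Made-up groups of different lengths (not the real data).
a = [10; 12; 11; 13; 12];
b = [14; 15; 13; 16];

% Stack the values and build one label per value.
values = [a; b];
groups = [repmat({'Male'}, numel(a), 1); repmat({'Female'}, numel(b), 1)];

% One box per distinct label.
figure;
boxplot(values, groups);
ylabel('Word Count');
title('Word Count Boxplot');
```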

% Perform t-test
[h, p] = ttest2(male, female);
if h
    fprintf('Significant difference found (p = %.4f)\n', p);
else
    fprintf('No significant difference found (p = %.4f)\n', p);
end

No significant difference found (p = 0.5830)
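By default ttest2 assumes equal variances in the two groups. Since the standard deviations reported above differ somewhat, a Welch variant could serve as a robustness check. The sketch below uses made-up vectors x and y, not the word-count data, and also requests the confidence interval for the difference in means:

```matlab
% Made-up samples with nearly equal means (not the real data).
x = [10; 12; 11; 13; 12; 11];
y = [11; 12; 10; 13; 12; 12];

% Welch two-sample t-test: 'Vartype','unequal' drops the
% equal-variance assumption; 'Alpha' sets the significance level.
[h, p, ci, stats] = ttest2(x, y, 'Vartype', 'unequal', 'Alpha', 0.05);

% ci is the confidence interval for mean(x) - mean(y);
% stats holds the t statistic and the degrees of freedom.
fprintf('h = %d, p = %.4f, CI = [%.3f, %.3f], t = %.3f, df = %.2f\n', ...
    h, p, ci(1), ci(2), stats.tstat, stats.df);
```

A confidence interval that contains zero tells the same story as a p-value above the chosen alpha.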

1. Are there any outliers?

Yes, both the male and female word count datasets contain outliers. They were identified with the interquartile range (IQR) method: a value is flagged when it lies more than 1.5 × IQR below the first quartile or above the third quartile.

2. Do both data sets have similar properties?

The mean and median word counts for the two groups are close: the male mean is 15,668.53 and the female mean is 16,121.07, with medians of 14,290 and 15,917. The standard deviations (8,632.53 for males, 7,187.78 for females) show somewhat more spread in the male counts, and the male range (695 to 47,016) is wider than the female range (1,674 to 40,055). The histograms and boxplots suggest that the overall shape of the distributions is similar.
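The comparison can be made concrete with a little arithmetic on the statistics printed earlier. The values below are copied from that output; the variable names for the differences are introduced here for illustration only.

```matlab
% Statistics copied from the printed output of the script.
mean_male   = 15668.53;  median_male   = 14290.00;  std_male   = 8632.53;
mean_female = 16121.07;  median_female = 15917.00;  std_female = 7187.78;

% Differences in central tendency (female minus male).
diff_mean   = mean_female - mean_male;
diff_median = median_female - median_male;

% Ratio of the spreads (male over female).
std_ratio = std_male / std_female;

% Gap between mean and median; a large positive gap hints at right skew.
gap_male   = mean_male - median_male;
gap_female = mean_female - median_female;

fprintf('Mean diff: %.2f, Median diff: %.2f, Std ratio: %.3f\n', ...
    diff_mean, diff_median, std_ratio);
fprintf('Mean - median: male %.2f, female %.2f\n', gap_male, gap_female);
```

The mean difference is small relative to either standard deviation, which is consistent with the t-test result.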

3. Are there significant differences?

A two-sample t-test (ttest2) was conducted to assess whether the mean word counts differ. At the 0.05 significance level, a p-value below 0.05 indicates a significant difference. The test returned p = 0.5830, which is well above 0.05, so there is no significant difference between the mean word counts of males and females.

4. What could be the consequences of significant differences?

If a significant difference is found, it could suggest that males and females have distinct communication
patterns. This insight might influence research in psychology, education, and communication studies. In
workplace settings, understanding these differences could help enhance communication strategies and
foster better interactions.

5. Supporting Graphs

• Histogram: Shows the distribution of word counts for both groups.
• Boxplot: Highlights differences in median values and outliers.
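When two histograms are overlaid, giving both the same bin edges makes the bars directly comparable. A minimal sketch with made-up vectors a and b; the choice of four bins is arbitrary:

```matlab
% Made-up samples (not the real data).
a = [1; 2; 2; 3; 3; 3; 9];
b = [2; 4; 4; 5; 6; 6; 7];

% Common bin edges spanning both samples: 4 equal-width bins.
edges = linspace(min([a; b]), max([a; b]), 5);

% Counts per bin, useful for a numeric comparison alongside the plot.
countsA = histcounts(a, edges);
countsB = histcounts(b, edges);

% Overlay the two histograms on the shared edges.
figure;
histogram(a, edges, 'FaceColor', 'b', 'FaceAlpha', 0.5);
hold on;
histogram(b, edges, 'FaceColor', 'r', 'FaceAlpha', 0.5);
hold off;
legend('Sample A', 'Sample B');
xlabel('Value');
ylabel('Frequency');
```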
