50 Inference

The document outlines a comprehensive lab manual consisting of 50 exercises aimed at reinforcing essential mathematical concepts for data science, including error metrics, loss functions, distance metrics, hypothesis testing, and data scaling. Each exercise includes hints for implementation and encourages the use of Python with libraries like NumPy and SciPy for coding examples. Additionally, it provides detailed explanations and code implementations for specific concepts such as Mean Absolute Error (MAE), Log Loss, and various distance metrics.

Providing a complete 50-exercise lab manual with full explanations and code implementations is not feasible within this response.

Generating that much content would be extremely lengthy. However, I can provide you with a structured
outline of 50 exercises covering the essential math concepts you listed, along with explanations and hints
for implementation. You can then use this outline to create your lab manual, filling in the code and
detailed explanations yourself.

Lab: Essential Math for Data Science

This lab will reinforce your understanding of fundamental mathematical concepts crucial for data
science, machine learning, and AI. You will implement these concepts using a programming language of
your choice (Python with libraries like NumPy and SciPy is recommended).

Section 1: Error Metrics

1. MAE (Mean Absolute Error): Calculate MAE for a given set of predicted and actual values. (Hint:
Use absolute difference and then average.)

2. MSE (Mean Squared Error): Calculate MSE for the same data. (Hint: Square the differences
before averaging.)

3. RMSE (Root Mean Squared Error): Calculate RMSE (square root of MSE).

4. MAE vs. MSE: Compare MAE and MSE for different datasets. Discuss when one might be
preferred over the other.

5. Implement MAE and MSE functions: Create reusable functions for calculating MAE and MSE.

6-10. Repeat exercises 1-5 with different datasets (e.g., linear, non-linear relationships).

Section 2: Loss Functions

11. Log Loss: Calculate log loss for binary classification. (Hint: Use the logarithmic function and
consider probabilities.)

12. Binary Cross-Entropy: Show that log loss is equivalent to binary cross-entropy.

13. Entropy Loss: Calculate entropy loss for multi-class classification. (Hint: Use probabilities and the
logarithmic function.)

14. Gini Index: Calculate the Gini index for a dataset. (Hint: Focus on probability distributions.)

15. Gini vs. Entropy: Compare Gini index and entropy loss. Discuss their properties and applications.

16-20. Repeat exercises 11-15 with different datasets and class distributions.

Section 3: Margins and Support Vector Machines

21. Hinge Loss: Implement a function to calculate hinge loss for SVM. (Hint: Consider the margin and
classification correctness.)

22. Effect of Margin: Analyze how the margin affects hinge loss.

23. Regularization: Explore the effect of regularization on hinge loss and model complexity.

24. SVM with Hinge Loss: (Advanced) Implement a simplified version of an SVM using gradient descent and hinge loss.

25. Comparison with other loss functions: Compare hinge loss with other loss functions (e.g.,
squared loss) in a simple classification task.

Section 4: Distance Metrics

26. Euclidean Distance: Calculate Euclidean distance between two points in 2D and 3D space.

27. Manhattan Distance: Calculate Manhattan distance between the same points.

28. Cosine Similarity: Calculate cosine similarity between two vectors.

29. Hamming Distance: Calculate Hamming distance between two binary strings.

30. Distance Function Comparison: Compare the different distance metrics and discuss their
applications. (e.g., when is cosine similarity more appropriate than Euclidean distance?)

Section 5: Hypothesis Testing

31. Chi-Squared Test: Perform a chi-squared test for independence on a contingency table. (Hint:
Use a statistical library.)

32. Interpreting p-values: Interpret the p-value obtained from the chi-squared test.

33. Degrees of Freedom: Understand the concept of degrees of freedom in the chi-squared test.

34. Goodness of Fit: Perform a chi-squared goodness-of-fit test.

35. Chi-squared test limitations: Discuss the limitations and assumptions of the chi-squared test.

Section 6: Confidence Intervals

36. Confidence Interval for Mean: Calculate a confidence interval for the mean of a dataset. (Hint:
Use the t-distribution or z-distribution depending on sample size.)

37. Confidence Interval for Proportion: Calculate a confidence interval for a population proportion.

38. Margin of Error: Calculate the margin of error for a confidence interval.

39. Sample Size and Confidence: Explore the relationship between sample size and the width of the
confidence interval.

40. Interpreting Confidence Intervals: Explain the meaning of a 95% confidence interval.

Section 7: Data Scaling

41. Standardization (Z-score normalization): Standardize a dataset using z-score normalization.

42. Min-Max Normalization: Normalize a dataset using min-max scaling.

43. Robust Scaling: Normalize using median and interquartile range (more robust to outliers).

44. Effect on Algorithms: Discuss how standardization and normalization affect the performance of different machine learning algorithms.

45. Choosing the right scaling method: Explain when to use each scaling method.

Section 8: Advanced Exercises (Optional)

46. Principal Component Analysis (PCA): (Advanced) Implement a simplified version of PCA.

47. Linear Regression with Gradient Descent: Implement linear regression using gradient descent.

48. Logistic Regression with Gradient Descent: Implement logistic regression using gradient
descent.

49. K-Means Clustering: Implement the K-means clustering algorithm.

50. A/B Testing: Design a simple A/B test and analyze the results.

This outline provides a comprehensive structure. Remember to include clear instructions, expected
outputs, and space for students to write their code and interpretations for each exercise. Remember to
consult statistical and machine learning textbooks or online resources for detailed explanations of the
concepts and their implementations.


Explain with Python code and examples.

Let's illustrate a few examples from the previous outline using Python. I'll focus on error metrics, loss
functions, and distance metrics, providing code and explanations. Remember to install necessary
libraries like NumPy: pip install numpy

Section 1: Error Metrics

Python

import numpy as np

def mae(y_true, y_pred):
    """Calculates Mean Absolute Error."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Calculates Mean Squared Error."""
    return np.mean(np.square(y_true - y_pred))

# Example usage:
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.2, 1.8, 3.1, 4.2, 5.5])

mae_error = mae(y_true, y_pred)
mse_error = mse(y_true, y_pred)

print(f"MAE: {mae_error}")
print(f"MSE: {mse_error}")

Explanation:

1. We use NumPy for efficient array operations.

2. mae() calculates the absolute difference between y_true and y_pred element-wise
using np.abs(), then averages the results using np.mean().

3. mse() squares the differences using np.square() before averaging.
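Exercise 3 of the outline asks for RMSE, which is simply the square root of MSE; a minimal sketch in the same style, reusing the illustrative arrays above:

Python

import numpy as np

def rmse(y_true, y_pred):
    """Calculates Root Mean Squared Error (the square root of MSE)."""
    return np.sqrt(np.mean(np.square(y_true - y_pred)))

# Example usage with the same illustrative data:
y_true = np.array([1, 2, 3, 4, 5])
y_pred = np.array([1.2, 1.8, 3.1, 4.2, 5.5])
print(f"RMSE: {rmse(y_true, y_pred)}")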

Section 2: Loss Functions

Python

import numpy as np

def log_loss(y_true, y_prob):
    """Calculates Log Loss for binary classification."""
    epsilon = 1e-15  # Avoid log(0) error
    y_prob = np.clip(y_prob, epsilon, 1 - epsilon)  # Clip values to avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Example usage:
y_true = np.array([1, 0, 1, 1, 0])            # True labels (1 for positive, 0 for negative)
y_prob = np.array([0.8, 0.2, 0.9, 0.7, 0.3])  # Predicted probabilities

loss = log_loss(y_true, y_prob)
print(f"Log Loss: {loss}")

Explanation:

1. log_loss() implements the formula for binary log loss. np.clip() prevents errors by keeping the probabilities strictly between 0 and 1.

2. The formula computes the negative log-likelihood for each data point and averages it.

Section 4: Distance Metrics

Python

import numpy as np

def euclidean_distance(x, y):
    """Calculates Euclidean distance between two points."""
    return np.linalg.norm(x - y)

def manhattan_distance(x, y):
    """Calculates Manhattan distance between two points."""
    return np.sum(np.abs(x - y))

# Example usage:
point1 = np.array([1, 2])
point2 = np.array([4, 6])

euclidean = euclidean_distance(point1, point2)
manhattan = manhattan_distance(point1, point2)

print(f"Euclidean Distance: {euclidean}")
print(f"Manhattan Distance: {manhattan}")

Explanation:
1. euclidean_distance() uses np.linalg.norm() which efficiently computes the Euclidean norm
(distance).

2. manhattan_distance() calculates the sum of absolute differences along each dimension.

These are just a few examples. You can expand on these by adding more sophisticated error metrics
(e.g., weighted MAE), different loss functions (e.g., categorical cross-entropy), other distance metrics
(e.g., cosine similarity), and incorporating them into more complex machine learning models. Remember
to handle potential errors (like division by zero) gracefully in your code. For more advanced topics like
PCA and hypothesis testing, you'll need to utilize libraries like SciPy.


more

Let's expand on the previous examples with more Python code illustrating additional concepts from the
data science math lab outline.

Section 2: Loss Functions (Continued)

Python

import numpy as np

def entropy_loss(y_true, y_prob):
    """Calculates entropy (cross-entropy) loss for multi-class classification."""
    epsilon = 1e-15  # Avoid log(0) error
    y_prob = np.clip(y_prob, epsilon, 1 - epsilon)
    return -np.sum(y_true * np.log(y_prob), axis=1).mean()

# Example usage:
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])  # One-hot encoded true labels
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]])            # Predicted probabilities

loss = entropy_loss(y_true, y_prob)
print(f"Entropy Loss: {loss}")

def gini_index(y):
    """Calculates the Gini index for a dataset."""
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / np.sum(counts)
    return 1 - np.sum(np.square(probabilities))

# Example usage:
y = np.array([0, 1, 0, 0, 1, 1, 0])
gini = gini_index(y)
print(f"Gini Index: {gini}")

Explanation:

1. entropy_loss() calculates the cross-entropy loss, suitable for multi-class problems. axis=1 sums
across each row (each data point). np.clip handles potential log(0) errors.

2. gini_index() calculates the Gini impurity, a measure of node purity in decision trees.

Section 3: Margins and Support Vector Machines

Python

import numpy as np

def hinge_loss(y_true, y_pred, margin=1):
    """Calculates Hinge Loss for SVM."""
    return np.mean(np.maximum(0, margin - y_true * y_pred))

# Example usage:
y_true = np.array([-1, 1, -1, 1])     # -1 and 1 represent classes
y_pred = np.array([-2, 1.5, 0.5, 2])  # Predicted scores

loss = hinge_loss(y_true, y_pred)
print(f"Hinge Loss: {loss}")

Explanation:
1. hinge_loss() calculates the hinge loss, commonly used in Support Vector Machines (SVMs). It
penalizes predictions that are not on the correct side of the margin.

Section 4: Distance Metrics (Continued)

Python

import numpy as np
from scipy.spatial.distance import cosine

def cosine_similarity(x, y):
    """Calculates cosine similarity between two vectors."""
    return 1 - cosine(x, y)  # Cosine distance is 1 - cosine similarity

# Example usage:
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

similarity = cosine_similarity(vector1, vector2)
print(f"Cosine Similarity: {similarity}")

def hamming_distance(x, y):
    """Calculates Hamming distance between two binary strings (or arrays)."""
    return np.sum(x != y)

# Example usage:
binary_string1 = np.array([0, 1, 1, 0])
binary_string2 = np.array([1, 1, 0, 0])

distance = hamming_distance(binary_string1, binary_string2)
print(f"Hamming Distance: {distance}")

Explanation:
1. cosine_similarity() uses scipy.spatial.distance.cosine to compute the cosine distance, then
subtracts from 1 to get similarity.

2. hamming_distance() counts the number of positions where two binary vectors differ.

Section 5: Hypothesis Testing (Example with Chi-Squared)

Python

import numpy as np
from scipy.stats import chi2_contingency

# Contingency table
observed = np.array([[20, 10], [10, 30]])

chi2, p, dof, expected = chi2_contingency(observed)

print(f"Chi-squared statistic: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of freedom: {dof}")
print(f"Expected frequencies:\n{expected}")

alpha = 0.05  # Significance level
if p < alpha:
    print("Reject the null hypothesis: There is a significant association.")
else:
    print("Fail to reject the null hypothesis: There is no significant association.")

Explanation:

This code performs a chi-squared test of independence. The chi2_contingency function from scipy.stats calculates the chi-squared statistic, p-value, degrees of freedom, and expected frequencies. The p-value is then compared to a significance level (alpha) to determine whether to reject the null hypothesis (that the variables are independent).
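Section 7: Data Scaling (Example)

The scaling methods from Section 7 of the outline (standardization, min-max, and robust scaling) can be illustrated in the same style; a minimal sketch, assuming a small one-dimensional array of made-up values where the last entry acts as an outlier:

Python

import numpy as np

data = np.array([10.0, 12.0, 11.0, 13.0, 100.0])  # Illustrative values; 100 acts as an outlier

# Standardization (z-score): subtract the mean, divide by the standard deviation
z_scaled = (data - data.mean()) / data.std()

# Min-max normalization: rescale values to the [0, 1] range
min_max_scaled = (data - data.min()) / (data.max() - data.min())

# Robust scaling: subtract the median, divide by the interquartile range (IQR)
q1, q3 = np.percentile(data, [25, 75])
robust_scaled = (data - np.median(data)) / (q3 - q1)

print(f"Standardized: {z_scaled}")
print(f"Min-max scaled: {min_max_scaled}")
print(f"Robust scaled: {robust_scaled}")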

Remember to install SciPy: pip install scipy. These examples provide a more comprehensive illustration of the mathematical concepts. You can adapt and expand upon them to create a complete lab manual. Always carefully consider the context and assumptions of each statistical test.

Short notes on MAE (Mean Absolute Error): calculate MAE for a given set of predicted and actual values (hint: use absolute difference and then average), with code examples and detailed explanation.

Mean Absolute Error (MAE)

MAE is a simple and intuitive metric to measure the average absolute difference between predicted and
actual values. It's easy to understand and interpret because it represents the average magnitude of the
errors. However, it doesn't penalize larger errors as heavily as some other metrics (like MSE).

Formula:

MAE = (1/n) * Σ|yᵢ - ŷᵢ|

where:

1. n is the number of data points

2. yᵢ is the actual value of the i-th data point

3. ŷᵢ is the predicted value of the i-th data point

4. |...| denotes the absolute value

Advantages:

1. Easy to understand and interpret.

2. Less sensitive to outliers than MSE (because it uses absolute differences instead of squared
differences).

Disadvantages:

1. Doesn't penalize large errors as strongly as MSE.

2. Not differentiable at zero, which can be a problem for some optimization algorithms.

Python Code Example:

Python

import numpy as np

def mae(y_true, y_pred):
    """Calculates Mean Absolute Error."""
    return np.mean(np.abs(y_true - y_pred))

# Example usage:
y_true = np.array([10, 20, 30, 40, 50])  # Actual values
y_pred = np.array([12, 18, 33, 38, 55])  # Predicted values

mae_error = mae(y_true, y_pred)
print(f"Mean Absolute Error: {mae_error}")

# Example with a list:
y_true = [10, 20, 30, 40, 50]
y_pred = [12, 18, 33, 38, 55]

mae_error = mae(np.array(y_true), np.array(y_pred))  # Convert to NumPy arrays for easier calculation
print(f"Mean Absolute Error (list input): {mae_error}")

Explanation:

1. Import NumPy: We import the NumPy library for efficient array operations.

2. mae() function: This function takes two NumPy arrays (y_true and y_pred) as input.

3. Absolute Differences: np.abs(y_true - y_pred) calculates the element-wise absolute differences between the actual and predicted values.

4. Mean: np.mean() computes the average of these absolute differences, giving us the MAE.

5. Example Usage: The code demonstrates how to use the mae() function with sample data. The
output shows the calculated MAE. The second example shows how to use the function with lists
by converting them to NumPy arrays first.

MAE is a valuable metric for evaluating regression models, particularly when outliers might significantly
skew the results. Its simplicity and interpretability make it a popular choice in many applications.


Why are they used?

The choice between different error metrics, like Mean Absolute Error (MAE) and Mean Squared Error
(MSE), depends heavily on the specific context and goals of your analysis or machine learning task.
Here's a breakdown of why you might choose one over the other:

When to use MAE (Mean Absolute Error):


1. Robustness to Outliers: MAE is less sensitive to outliers than MSE. Outliers have a
disproportionately large effect on MSE because the error is squared. If your data contains
outliers or you suspect the presence of significant errors, MAE provides a more robust measure
of average error.

2. Interpretability: MAE is easier to interpret. It directly represents the average absolute difference
between predictions and actual values. This makes it easier to explain to non-technical
audiences.

3. Linear Programming: MAE is useful in situations where linear programming techniques are
employed for optimization. The absolute value function, while not differentiable everywhere,
can be handled effectively within linear programming frameworks.

When to use MSE (Mean Squared Error):

1. Penalizing Large Errors: MSE penalizes larger errors more heavily than MAE. This is because the
error is squared. If large errors are particularly undesirable in your application (e.g., in financial
forecasting where large losses are very costly), MSE might be a better choice.

2. Differentiability: MSE is differentiable everywhere, making it suitable for use with gradient-
based optimization algorithms (like gradient descent) commonly used in machine learning. This
is a significant advantage in many machine learning contexts.

3. Statistical Properties: MSE has desirable statistical properties, such as being directly related to
the variance of the errors. This can be useful for statistical inference and model comparison.

In Summary:

Feature             | MAE                             | MSE
--------------------|---------------------------------|-------------------------------------
Outlier Sensitivity | Less sensitive                  | More sensitive
Interpretability    | Easier to understand            | Less intuitive
Differentiability   | Not differentiable at zero      | Differentiable everywhere
Large Error Penalty | Less severe                     | More severe
Optimization        | Suitable for linear programming | Suitable for gradient-based methods

The "best" metric depends on your priorities. If robustness to outliers and ease of interpretation are
paramount, MAE is a good choice. If penalizing large errors and using gradient-based optimization are
crucial, MSE is often preferred. Sometimes, it's beneficial to examine both MAE and MSE to get a more
complete picture of your model's performance.
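A quick way to see the outlier-sensitivity difference is to compute both metrics on the same data with and without a single large error; a minimal sketch using the mae() and mse() functions defined earlier (the arrays are illustrative):

Python

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean(np.square(y_true - y_pred))

y_true = np.array([10, 20, 30, 40, 50])
y_pred_clean = np.array([12, 18, 33, 38, 52])    # Small errors everywhere
y_pred_outlier = np.array([12, 18, 33, 38, 90])  # One large error on the last point

for label, y_pred in [("clean", y_pred_clean), ("with outlier", y_pred_outlier)]:
    print(f"{label}: MAE = {mae(y_true, y_pred):.2f}, MSE = {mse(y_true, y_pred):.2f}")

# The single large error increases MAE moderately but inflates MSE dramatically,
# which is why MSE is said to penalize large errors more heavily.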
A real-world example of statistics can be found in businesses when making decisions based on data
analysis. For instance, consider a retail company analyzing its sales data to determine the effectiveness
of a marketing campaign. The company might collect data on sales figures before and after the campaign
and then apply statistical methods to analyze this data.

Here's how statistics plays a role in this scenario:

1. Data Collection: The company collects sales data over a specific period.

2. Hypothesis Testing: They may set up a hypothesis to test whether the marketing campaign led to
a significant increase in sales.

3. Statistical Significance: By performing statistical tests (like t-tests), the company can determine if
any observed increase in sales is statistically significant, which means it is unlikely to have
occurred by chance.

4. Confidence Intervals: The company can also calculate confidence intervals to understand the
range within which they can expect their sales figures to fall, giving them a statistical assurance
of their sales forecasts.

5. Decision Making: Based on the analysis, if the results are statistically significant, the company
may decide to continue or expand their marketing efforts. If not, they may rethink their strategy.

In this way, statistics helps businesses derive actionable insights from their data, aiding in better
decision-making processes.
Here’s an explanation of the confidence interval and p-value concepts, along with how they are applied
in the examples:

Confidence Interval
A confidence interval provides a range of values within which a population parameter (e.g., mean,
proportion) is likely to fall, based on sample data. It is calculated using the sample statistic, standard
error, and a confidence level (e.g., 95%).

Key Points:

1. Confidence Level:

1. A 95% confidence level means that if we repeated the sampling process many times,
95% of the intervals would contain the true population parameter.

2. Formula:

1. For a mean: CI = x̄ ± z·SE

2. For a proportion: CI = p̂ ± z·√(p̂(1−p̂)/n)

3. Margin of Error:

1. The width of the confidence interval depends on the standard error and the critical value (e.g., the z-score for a normal distribution).

Example:

In Example 1, we calculate the confidence interval for a mean:

1. Sample mean = 50

2. Standard deviation = 10

3. Sample size = 30

4. Confidence level = 95%

The margin of error is calculated using the z-score for 95% confidence (z = 1.96) and the standard error:

Margin of Error = z·σ/√n

The confidence interval is:

[50 − Margin of Error, 50 + Margin of Error]
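A minimal Python sketch of this calculation, using the sample statistics stated above (mean 50, standard deviation 10, n = 30) and scipy.stats for the critical value:

Python

import numpy as np
from scipy.stats import norm

mean, std, n = 50, 10, 30
confidence = 0.95

z = norm.ppf(1 - (1 - confidence) / 2)  # ≈ 1.96 for 95% confidence
se = std / np.sqrt(n)                   # Standard error of the mean
margin = z * se

print(f"Margin of error: {margin:.2f}")
print(f"95% CI: ({mean - margin:.2f}, {mean + margin:.2f})")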

P-Value

A p-value measures the probability of observing a sample statistic at least as extreme as the one obtained, assuming the null hypothesis (H₀) is true. It is used in hypothesis testing to determine whether to reject H₀.

Key Points:

1. Null Hypothesis (H₀):

1. Represents the default assumption (e.g., no difference, no effect).

2. Alternative Hypothesis (H₁):

1. Represents the claim being tested (e.g., there is a difference, there is an effect).

3. Decision Rule:

1. If the p-value < significance level (α, e.g., 0.05), reject H₀.

2. Otherwise, fail to reject H₀.

4. Two-Tailed Test:

1. Tests whether the sample statistic is significantly different from the null hypothesis value
in either direction.

Example:

In Example 2, we calculate the p-value for a one-sample z-test:

1. Sample mean = 52

2. Population mean = 50

3. Standard deviation = 10

4. Sample size = 30

The z-statistic is calculated as:

z = (x̄ − μ) / (σ/√n)

The p-value is:

p = 2·(1 − CDF(z))

where CDF(z) is the cumulative probability up to z in the standard normal distribution.
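A minimal sketch of this one-sample z-test in Python, using the figures above (sample mean 52, population mean 50, σ = 10, n = 30) and the standard normal CDF from scipy.stats:

Python

import numpy as np
from scipy.stats import norm

sample_mean, pop_mean, std, n = 52, 50, 10, 30

z = (sample_mean - pop_mean) / (std / np.sqrt(n))  # z-statistic
p_value = 2 * (1 - norm.cdf(abs(z)))               # Two-tailed p-value

print(f"z-statistic: {z:.3f}")
print(f"p-value: {p_value:.4f}")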

Applications in Examples

1. Confidence Interval for a Mean (Example 1):

1. Used to estimate the range within which the population mean is likely to fall.

2. P-Value for a One-Sample z-Test (Example 2):

1. Tests whether the sample mean is significantly different from the population mean.

3. Confidence Interval for a Proportion (Example 3):

1. Estimates the range for the proportion of successes in a population.

4. P-Value for a Two-Sample t-Test (Example 4):


1. Tests whether the means of two independent samples are significantly different.

5. Confidence Interval for Difference in Means (Example 5):

1. Estimates the range for the difference between two population means.

6. P-Value for a Chi-Square Test (Example 6):

1. Tests whether observed frequencies differ significantly from expected frequencies.

7. Confidence Interval for a Regression Coefficient (Example 7):

1. Estimates the range for a regression coefficient in a linear model.

8. P-Value for a Paired t-Test (Example 8):

1. Tests whether the mean difference between paired observations is significantly different
from zero.

9. Confidence Interval for Variance (Example 9):

1. Estimates the range for the population variance.

10. P-Value for a Proportion Test (Example 10):

1. Tests whether the sample proportion is significantly different from a hypothesized population proportion.
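The t-test applications listed above can be run directly with scipy.stats; a minimal sketch, assuming small made-up samples purely for illustration:

Python

import numpy as np
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical data for illustration only
group_a = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1])
group_b = np.array([21.0, 22.5, 20.9, 23.2, 21.8, 22.4])

# Two-sample (independent) t-test: are the two group means different?
t_ind, p_ind = ttest_ind(group_a, group_b)
print(f"Independent t-test: t = {t_ind:.3f}, p = {p_ind:.4f}")

# Paired t-test: is the mean difference between paired observations zero?
before = np.array([12.0, 14.5, 11.8, 13.2, 12.9])
after = np.array([12.8, 15.1, 12.0, 14.0, 13.5])
t_rel, p_rel = ttest_rel(before, after)
print(f"Paired t-test: t = {t_rel:.3f}, p = {p_rel:.4f}")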

Summary

1. Confidence Intervals: Provide a range of plausible values for a population parameter.

2. P-Values: Help determine whether to reject the null hypothesis in hypothesis testing.

Both are essential tools in inferential statistics for making data-driven decisions. Let me know if you need
further clarification or additional examples!

Cost

in bangla

Confidence Interval and P-Value Explained (in Bangla)

Confidence Interval

A confidence interval provides a range within which a population parameter (e.g., mean, proportion) is likely to lie. It is calculated from the sample data, the standard error, and a chosen confidence level (e.g., 95%).

Key points:

1. Confidence Level:

1. A 95% confidence level means that if we repeated the sampling many times, 95% of the intervals would contain the true population parameter.

2. Formula:

1. For a mean: CI = x̄ ± z·SE

2. For a proportion: CI = p̂ ± z·√(p̂(1−p̂)/n)

3. Margin of Error:

1. The width of the confidence interval depends on the standard error and the critical value (e.g., the z-score).

Example:

In Example 1, we calculate the confidence interval for a mean:

1. Sample mean = 50

2. Standard deviation = 10

3. Sample size = 30

4. Confidence level = 95%

The margin of error is calculated using the z-score (z = 1.96 for 95% confidence) and the standard error:

Margin of Error = z·σ/√n

The confidence interval is:

[50 − Margin of Error, 50 + Margin of Error]

P-Value

The p-value is a statistical measure showing how likely it is to obtain data as extreme as (or more extreme than) the observed sample, assuming the null hypothesis (H₀) is true. It is used in hypothesis testing to decide whether to accept or reject H₀.

Key points:

1. Null Hypothesis (H₀):

1. The default assumption (e.g., no change, no effect).

2. Alternative Hypothesis (H₁):

1. The claim being tested (e.g., there is a change, there is an effect).

3. Decision Rule:

1. If the p-value < α (e.g., 0.05), reject H₀.

2. Otherwise, fail to reject H₀.

4. Two-Tailed Test:

1. Tests whether the sample statistic differs significantly from the value under H₀ in either direction.

Example:

In Example 2, we calculate the p-value for a one-sample z-test:

1. Sample mean = 52

2. Population mean = 50

3. Standard deviation = 10

4. Sample size = 30

The z-statistic is calculated as:

z = (x̄ − μ) / (σ/√n)

The p-value is:

p = 2·(1 − CDF(z))

where CDF(z) is the cumulative probability up to z.

Applications of the Examples

1. Confidence Interval for a Mean (Example 1):

1. Used to estimate the likely range of the population mean.

2. P-Value for a One-Sample z-Test (Example 2):

1. Tests whether the sample mean differs significantly from the population mean.

3. Confidence Interval for a Proportion (Example 3):

1. Used to estimate the range of the proportion of successes in the population.

4. P-Value for a Two-Sample t-Test (Example 4):

1. Tests whether the means of two independent samples differ significantly.

5. Confidence Interval for a Difference in Means (Example 5):

1. Used to estimate the likely range of the difference between two population means.

6. P-Value for a Chi-Square Test (Example 6):

1. Tests whether observed frequencies differ significantly from expected frequencies.

7. Confidence Interval for a Regression Coefficient (Example 7):

1. Used to estimate the range of a coefficient in a regression model.

8. P-Value for a Paired t-Test (Example 8):

1. Tests whether the mean difference between paired observations differs significantly from zero.

9. Confidence Interval for Variance (Example 9):

1. Used to estimate the range of the population variance.

10. P-Value for a Proportion Test (Example 10):

1. Tests whether the sample proportion differs significantly from the hypothesized population proportion.

Summary

1. Confidence Interval: Provides a likely range for a population parameter.

2. P-Value: Helps decide whether to reject H₀ in hypothesis testing.

These are important tools in inferential statistics, used for making data-driven decisions. Let me know if further explanation or additional examples are needed!
