
Statistical Analysis

in Physics
Practical File

Dhruv Verma
22/26080
Physics (Hons.)
Sec– B
INDEX

Sr. No. Title Date Signature


Experiment – 1
Aim:

Generate sequences of N random numbers, M (at least 10000) times, from different distributions (e.g. Binomial, Poisson, Normal). Compute the arithmetic mean of each random vector (of size N) and plot the distribution of the arithmetic means.
● Verify the Central Limit Theorem (CLT) for each distribution.
● Show that the CLT is violated for the Cauchy-Lorentz distribution.

Algorithm:
1. Import Libraries:
●​ Import numpy for numerical operations.
●​ Import matplotlib.pyplot for plotting histograms.
●​ Import scipy.stats for fitting normal distributions.

2. Define Function generate_means(distribution, params, N, M):


●​ Initialize an empty list means.
●​ Loop M times to generate sample means:
o​ Draw a sample of size N using the specified distribution with given
params.
o​ Calculate the mean of the sample.
o​ Append the mean to the means list.
●​ Convert the means list to a numpy array and return it.

3. Define Function plot_distribution(means, distribution_name):


●​ Plot a histogram of means with 50 bins, normalized to form a probability
density.
●​ Fit a normal distribution to the means:
o​ Calculate the mean (mu) and standard deviation (sigma) of the sampled
means.
o​ Generate x values over a range of ±4 standard deviations from the mean.
o​ Plot the normal probability density function (PDF) using the calculated mu
and sigma.
●​ Add labels, a title, and a legend to the plot.
●​ Display the plot.

4. Set Parameters:
●​ N = 50: Sample size per iteration.
●​ M = 10000: Number of iterations.

5. Generate and Plot Sample Means for Different Distributions:


●​ Binomial Distribution:
o​ Call generate_means() with np.random.binomial and parameters (10, 0.8).
o​ Plot the distribution of means using plot_distribution() with the label
"Binomial".

●​ Poisson Distribution:
o​ Call generate_means() with np.random.poisson and parameter (5,).
o​ Plot the distribution of means using plot_distribution() with the label
"Poisson".

●​ Normal Distribution:
o​ Call generate_means() with np.random.normal and parameters (0, 1).
o​ Plot the distribution of means using plot_distribution() with the label
"Normal".

6. Handle Cauchy-Lorentz Distribution (Violates Central Limit Theorem):


●​ Generate a 2D array of samples using np.random.standard_cauchy(size=(M,
N)).
● Compute the mean of each sample along axis 1, producing M sample means.
●​ Plot the histogram of these means:
o​ Set the number of bins to 100 and normalize the histogram to form a
density.
o​ Label the plot as "CL Sample Means".
o​ Set x-axis limits to [-10, 10] to better visualize the heavy tails.
o​ Add a title, labels, a legend, and a grid.
o Display the plot.
Code:
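A minimal sketch following the algorithm above. The function names generate_means and plot_distribution, the parameters N = 50 and M = 10000, and the distribution parameters come from the algorithm; plot styling is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def generate_means(distribution, params, N, M):
    """Draw M samples of size N from `distribution` and return the sample means."""
    means = [np.mean(distribution(*params, N)) for _ in range(M)]
    return np.array(means)

def plot_distribution(means, distribution_name):
    """Histogram of sample means with a fitted normal PDF overlaid."""
    plt.hist(means, bins=50, density=True, alpha=0.6, label="Sample means")
    mu, sigma = np.mean(means), np.std(means)
    x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
    plt.plot(x, norm.pdf(x, mu, sigma), "r-", label="Normal fit")
    plt.title(f"Distribution of sample means: {distribution_name}")
    plt.xlabel("Sample mean")
    plt.ylabel("Density")
    plt.legend()
    plt.show()

N, M = 50, 10000
plot_distribution(generate_means(np.random.binomial, (10, 0.8), N, M), "Binomial")
plot_distribution(generate_means(np.random.poisson, (5,), N, M), "Poisson")
plot_distribution(generate_means(np.random.normal, (0, 1), N, M), "Normal")

# Cauchy-Lorentz: the sample means do not converge to a normal law (CLT violated)
cauchy_means = np.random.standard_cauchy(size=(M, N)).mean(axis=1)
plt.hist(cauchy_means, bins=100, density=True, label="CL Sample Means")
plt.xlim(-10, 10)  # heavy tails: restrict the view to [-10, 10]
plt.title("Sample means of the Cauchy-Lorentz distribution")
plt.xlabel("Sample mean")
plt.ylabel("Density")
plt.legend()
plt.grid(True)
plt.show()
```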
Output:
Experiment – 2
Aim:
Given data for two independent variables (xi, yi), write a code to compute the joint probability in a given sample space.
● Verify the same for data generated by a random number generator based on a given probability distribution of a pair of independent variables. (Continuous)

Algorithm:
1. Import Required Libraries:
●​ Import numpy for numerical operations.
●​ Import matplotlib.pyplot for visualization.
●​ Import norm from scipy.stats for generating normal distributions (not directly
used but relevant for continuous distributions).

2. Define Function joint_probability_continuous(xi, yi, bins=20):


●​ Input:
o​ xi and yi: Arrays representing continuous samples from two random
variables.
o bins: Number of bins for the 2D histogram (default is 20).

●​ Output:
o​ x_edges: Bin edges for xi.
o​ y_edges: Bin edges for yi.
o​ histogram: 2D array representing joint probability density.

●​ Calculate Joint Probability Density:


o​ Use np.histogram2d() to calculate the 2D histogram:
▪​ xi and yi are binned into a 2D grid.

▪​ density=True normalizes the histogram to form a probability density.

o​ The function returns:


▪​ histogram: 2D array of probability density values.
▪​ x_edges: Bin edges for xi.

▪​ y_edges: Bin edges for yi.

●​ Return the x_edges, y_edges, and histogram.

3. Generate Continuous Samples:


●​ Set num_samples = 10000.
●​ Generate xi_continuous using np.random.normal(0, 1, num_samples), which
produces samples from a standard normal distribution (mean = 0, standard
deviation = 1).
●​ Similarly, generate yi_continuous using the same normal distribution.

4. Calculate Joint Probability Distribution:


●​ Call joint_probability_continuous(xi_continuous, yi_continuous).
●​ Store the result in x_edges, y_edges, and joint_prob_continuous.

5. Visualize Joint Probability Distribution:


●​ Create Heatmap:
o​ Use plt.imshow() to display joint_prob_continuous.T as a heatmap:
▪​ .T transposes the matrix for correct alignment in the plot.

▪​ Set origin='lower' to place (0,0) at the bottom-left corner.

▪​ Use "viridis" colormap for visualization.

▪​ Set aspect='auto' to adjust the aspect ratio automatically.

▪​ Use extent to align the plot axes with the continuous variable ranges
from x_edges and y_edges.
●​ Customize Plot:
o​ Add a color bar to indicate probability density values.
o​ Set the title as "Joint Probability Distribution (Continuous)".
o​ Label the axes as "X" and "Y".
o​ Disable grid lines for a cleaner heatmap.
●​ Show Plot:
o​ Display the heatmap using plt.show().
6. Print Joint Probability Density Matrix:
●​ Print the joint probability density matrix (joint_prob_continuous) to observe the
numerical values of the distribution.

Code:
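A minimal sketch following the algorithm above; the function and variable names come from the algorithm, and plot styling is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

def joint_probability_continuous(xi, yi, bins=20):
    """Bin (xi, yi) into a 2D grid and return the joint probability density."""
    histogram, x_edges, y_edges = np.histogram2d(xi, yi, bins=bins, density=True)
    return x_edges, y_edges, histogram

num_samples = 10000
xi_continuous = np.random.normal(0, 1, num_samples)  # standard normal samples
yi_continuous = np.random.normal(0, 1, num_samples)

x_edges, y_edges, joint_prob_continuous = joint_probability_continuous(
    xi_continuous, yi_continuous)

# Heatmap of the joint density; .T aligns matrix rows with the y-axis
plt.imshow(joint_prob_continuous.T, origin="lower", cmap="viridis",
           aspect="auto",
           extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]])
plt.colorbar(label="Probability density")
plt.title("Joint Probability Distribution (Continuous)")
plt.xlabel("X")
plt.ylabel("Y")
plt.grid(False)
plt.show()

# Numerical values of the joint probability density matrix
print(joint_prob_continuous)
```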

Output:
Experiment – 3
Aim:
Given data for two independent variables (xi, yi), write a code to compute the joint probability in a given sample space.
● Verify the same for data generated by a random number generator based on a given probability distribution of a pair of independent variables. (Discrete)
Algorithm:
1. Import Required Libraries:
●​ Import numpy for numerical operations.
●​ Import matplotlib.pyplot for plotting.
●​ Import randint from scipy.stats to generate discrete random variables.

2. Define Function joint_probability_discrete(xi, yi):


●​ Input: Two arrays, xi and yi, representing samples from two discrete
random variables.
●​ Output: A dictionary containing the joint probability distribution.
●​ Initialize an empty dictionary joint_counts to store counts of each (X, Y)
pair.
●​ Count Occurrences:
o​ Loop through each pair of values in xi and yi.
o​ If the pair already exists in joint_counts, increment its count.
o​ Otherwise, initialize the count for that pair.
●​ Calculate Joint Probability:
o​ Get the total number of samples (total_sample).
o​ Compute joint probability by dividing each count by total_sample.
o​ Store the joint probabilities in a dictionary joint_prob.
●​ Return the joint_prob dictionary.

3. Generate Discrete Samples:


●​ Set num_samples = 10000.
●​ Generate x_discrete using randint.rvs(1, 5, size=num_samples), which
produces discrete random integers between 1 and 4.
●​ Similarly, generate y_discrete using the same distribution.

4. Calculate Joint Probability Distribution:


●​ Call joint_probability_discrete(x_discrete, y_discrete).
●​ Store the result in joint_probability_discrete.

5. Display Joint Probability Distribution:


●​ Loop through the sorted items of joint_probability_discrete.
●​ Print each joint probability.

6. Prepare Data for Visualization:


●​ Extract the unique values of x_discrete and y_discrete using np.unique().
●​ Initialize a 2D array joint_matrix with zeros, having dimensions
corresponding to the number of unique X and Y values.
●​ Fill Joint Probability Matrix:
o​ Loop through the items in joint_probability_discrete.
o​ Assign the corresponding probability to the correct index in
joint_matrix.

7. Visualize Joint Probability Distribution:


●​ Create Heatmap:
o​ Use plt.imshow() to display joint_matrix as a heatmap.
o​ Set the color map as "viridis" and origin='lower'.
o​ Adjust the axes using extent to align with the unique X and Y values.
●​ Customize Plot:
o​ Add a color bar to indicate probability values.
o​ Set title as "Joint Probability Distribution (Discrete)".
o​ Label the axes as "X" and "Y".
o​ Set the ticks on the axes using the unique X and Y values.
o​ Disable grid lines.
●​ Show Plot:
o​ Display the heatmap using plt.show().

Code:
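A minimal sketch following the algorithm above. The result is stored in joint_prob rather than joint_probability_discrete so the variable does not shadow the function; everything else follows the algorithm.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

def joint_probability_discrete(xi, yi):
    """Estimate P(X = x, Y = y) from paired samples as relative frequencies."""
    joint_counts = {}
    for x, y in zip(xi, yi):
        joint_counts[(x, y)] = joint_counts.get((x, y), 0) + 1
    total_sample = len(xi)
    return {pair: count / total_sample for pair, count in joint_counts.items()}

num_samples = 10000
x_discrete = randint.rvs(1, 5, size=num_samples)  # integers 1..4
y_discrete = randint.rvs(1, 5, size=num_samples)

joint_prob = joint_probability_discrete(x_discrete, y_discrete)
for pair, p in sorted(joint_prob.items()):
    print(f"P(X={pair[0]}, Y={pair[1]}) = {p:.4f}")

# Fill a matrix indexed by the unique X and Y values for the heatmap
x_vals, y_vals = np.unique(x_discrete), np.unique(y_discrete)
x_index = {v: i for i, v in enumerate(x_vals)}
y_index = {v: i for i, v in enumerate(y_vals)}
joint_matrix = np.zeros((len(x_vals), len(y_vals)))
for (x, y), p in joint_prob.items():
    joint_matrix[x_index[x], y_index[y]] = p

plt.imshow(joint_matrix.T, cmap="viridis", origin="lower",
           extent=[x_vals[0] - 0.5, x_vals[-1] + 0.5,
                   y_vals[0] - 0.5, y_vals[-1] + 0.5])
plt.colorbar(label="Probability")
plt.title("Joint Probability Distribution (Discrete)")
plt.xlabel("X")
plt.ylabel("Y")
plt.xticks(x_vals)
plt.yticks(y_vals)
plt.grid(False)
plt.show()
```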
Output:
Experiment – 4
Aim:
Hypothesis Testing
Make a random number generator to simulate the tossing of a coin n times with the probability of heads being q.
Write a code for a binomial test of the null hypothesis H0 (q = 0.5) against the alternative hypothesis H1 (q ≠ 0.5).

Algorithm:
Step 1: Define Functions

1. Function toss_coin(n, q):

1.1.​ Input: n (number of tosses), q (probability of heads).


1.2.​ Simulate n coin tosses using a binomial distribution (binomial(1, q, n)).
1.3.​ Return an array of n toss results (1 for heads, 0 for tails).

2.​ Function binomial_test(n, observed_heads, q_expected=0.5, alpha=0.05):

2.1.​ Input: n (total tosses), observed_heads (number of heads observed),


q_expected (expected probability of heads, default 0.5 for a fair coin),
and alpha (significance level).
2.2. Compute the two-tailed p-value using binom_test from scipy.stats (in recent SciPy versions this function is named binomtest).
2.3.​ If p_value < alpha, reject the null hypothesis (q = 0.5), indicating a
biased coin.
2.4.​ Otherwise, fail to reject the null hypothesis.
2.5.​ Return p_value and whether the null hypothesis is rejected.

Step 2: Simulate Coin Tosses

1.​ Set n = 100 (number of tosses) and q = 0.55 (actual probability of heads).

2.​ Call toss_coin(n, q) to generate a sequence of n toss results.

3.​ Compute observed_heads by summing the array (number of 1s).

Step 3: Perform Binomial Test

1.​ Call binomial_test(n, observed_heads, q_expected=0.5).

2.​ Store the returned p_value and decision (reject_null).


Step 4: Print Results

1.​ Print n, observed_heads, and p_value.


2.​ If reject_null is True, print that the null hypothesis is rejected (indicating a
biased coin).
3.​ Otherwise, print that there is no significant evidence of bias.

Step 5: Visualization

1.​ Compute the binomial probability mass function (PMF) for n tosses
assuming a fair coin (q = 0.5).
2.​ Plot the expected distribution using plt.plot(x, pmf).
3.​ Mark observed_heads on the graph using plt.axvline().
4. Display the graph.

Code:
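A minimal sketch following the algorithm above. The values n = 100, q = 0.55, and q_expected = 0.5 come from the algorithm; the sketch uses scipy.stats.binomtest, the current name of binom_test.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, binomtest

def toss_coin(n, q):
    """Simulate n tosses of a coin with P(heads) = q (1 = heads, 0 = tails)."""
    return np.random.binomial(1, q, n)

def binomial_test(n, observed_heads, q_expected=0.5, alpha=0.05):
    """Two-tailed binomial test of H0: q = q_expected."""
    p_value = binomtest(observed_heads, n, q_expected,
                        alternative="two-sided").pvalue
    return p_value, p_value < alpha

n, q = 100, 0.55  # actual coin is slightly biased
tosses = toss_coin(n, q)
observed_heads = int(tosses.sum())

p_value, reject_null = binomial_test(n, observed_heads)
print(f"Tosses: {n}, heads observed: {observed_heads}, p-value: {p_value:.4f}")
if reject_null:
    print("Reject H0: evidence that the coin is biased (q != 0.5).")
else:
    print("Fail to reject H0: no significant evidence of bias.")

# Expected distribution of the number of heads under H0 (fair coin)
x = np.arange(0, n + 1)
plt.plot(x, binom.pmf(x, n, 0.5), label="Binomial PMF under H0 (q = 0.5)")
plt.axvline(observed_heads, color="r", linestyle="--",
            label=f"Observed heads = {observed_heads}")
plt.xlabel("Number of heads")
plt.ylabel("Probability")
plt.title("Binomial test for coin bias")
plt.legend()
plt.show()
```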
Output:
Experiment – 5
Aim:
Bayesian Inference
a) In an experiment of flipping a coin N times, M heads showed up (fraction of heads f = M/N). Write a code to determine the posterior probability of f, given the following priors:
i. Beta Distribution B(a, b) with given values of a and b.
ii. Gaussian Distribution with a given mean and variance.

b)​ Using the Likelihood of Binomial distribution, determine the value of f


(fraction of heads) that maximizes the probability of the data.

c)​ Plot the Likelihood (normalised), Prior and Posterior Distributions.

Algorithm:
Input:

●​ N: Total number of coin flips


●​ M: Number of heads observed
●​ a, b: Parameters for the Beta prior
●​ mu, sigma: Mean and standard deviation for the Gaussian prior

1. Create a range of possible values for the probability of heads f:

●​ Generate 1000 values from 0 to 1 and store in f_vals.

2. Compute the Likelihood (Binomial):

●​ Use the binomial PMF to compute the likelihood of getting M heads in N


trials for each f in f_vals.
●​ Normalize the likelihood by dividing by its maximum value.

3. Compute the Beta Prior:

●​ Evaluate the Beta probability density function using parameters a and b


over f_vals.

●​ Normalize the prior.


4. Compute the Gaussian Prior:

●​ Evaluate the normal distribution PDF using mean mu and standard


deviation sigma over f_vals.

●​ Normalize the prior.

5. Compute the Beta Posterior:

●​ Use conjugacy of the Beta prior with the binomial likelihood: posterior
parameters become (a + M) and (b + (N - M)).

●​ Evaluate and normalize the Beta posterior over f_vals.

6. Compute the Gaussian Posterior (Approximate):

● Approximate the binomial likelihood by a Gaussian with mean f_hat = M/N and variance s² = f_hat(1 − f_hat)/N, then combine it with the Gaussian prior by precision weighting:

● posterior_sigma = sqrt(1 / (1/sigma² + 1/s²))

● posterior_mu = posterior_sigma² × (mu/sigma² + f_hat/s²)

● Evaluate and normalize the Gaussian posterior over f_vals.

7. Compute Maximum Likelihood Estimate (MLE):

●​ MLE for probability of heads is M / N.

8. Plot the Results:

●​ Plot:
o​ Likelihood (dashed line)
o​ Beta prior
o​ Gaussian prior (dotted line)
o​ Vertical line at MLE (dashed red)
●​ Label axes and add a legend.

Code:
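A minimal sketch following the algorithm above. The input values (N = 100, M = 62, a = b = 2, mu = 0.5, sigma = 0.1) are assumed for illustration; the Gaussian posterior uses the precision-weighted approximation of step 6.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, beta, norm

# Inputs (example values, assumed for illustration)
N, M = 100, 62            # total flips, observed heads
a, b = 2, 2               # Beta prior parameters
mu, sigma = 0.5, 0.1      # Gaussian prior mean and standard deviation

f_vals = np.linspace(0, 1, 1000)
df = f_vals[1] - f_vals[0]

# Binomial likelihood for each candidate f, normalised to peak at 1
likelihood = binom.pmf(M, N, f_vals)
likelihood /= likelihood.max()

# Priors, normalised to unit area over f_vals
beta_prior = beta.pdf(f_vals, a, b)
beta_prior /= beta_prior.sum() * df
gauss_prior = norm.pdf(f_vals, mu, sigma)
gauss_prior /= gauss_prior.sum() * df

# Beta posterior via conjugacy: Beta(a + M, b + N - M)
beta_post = beta.pdf(f_vals, a + M, b + N - M)
beta_post /= beta_post.sum() * df

# Approximate Gaussian posterior: precision-weighted combination of the
# Gaussian prior and a Gaussian approximation to the likelihood
f_hat = M / N                         # maximum likelihood estimate
s2 = f_hat * (1 - f_hat) / N          # approximate likelihood variance
post_var = 1 / (1 / sigma**2 + 1 / s2)
post_mu = post_var * (mu / sigma**2 + f_hat / s2)
gauss_post = norm.pdf(f_vals, post_mu, np.sqrt(post_var))
gauss_post /= gauss_post.sum() * df

plt.plot(f_vals, likelihood, "k--", label="Likelihood (normalised)")
plt.plot(f_vals, beta_prior, label=f"Beta prior B({a}, {b})")
plt.plot(f_vals, gauss_prior, ":", label="Gaussian prior")
plt.plot(f_vals, beta_post, label="Beta posterior")
plt.plot(f_vals, gauss_post, label="Gaussian posterior (approx.)")
plt.axvline(f_hat, color="r", linestyle="--", label=f"MLE f = {f_hat:.2f}")
plt.xlabel("f (probability of heads)")
plt.ylabel("Normalised density")
plt.legend()
plt.show()
```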
Output:
Experiment – 6
Aim:
Regression Analysis and Gradient Descent:
Given a dataset (Xi, Yi), write a code to obtain the parameters of the linear regression equation using the method of least squares, with both constant and variable errors in the dependent variable (Y). Data obtained in a physics lab may be used for this purpose. Also obtain the correlation coefficient and the 90% confidence interval for the regression line.
Make a scatter plot with error bars, overlay the regression line, and show the confidence interval.

Algorithm:
Input:
●​ Arrays of measured values: X, Y
●​ Array of uncertainties in Y: sigma_Y
Output:
●​ Best-fit slope (m) and intercept (c) with uncertainties
●​ Correlation coefficient (r)
●​ Plot of data, fit, and 90% confidence interval
Steps:
1.​ Import required libraries:
o​ numpy, matplotlib.pyplot, scipy.stats, and scipy.optimize.curve_fit

2.​ Define the data:


o​ Input arrays X, Y, and sigma_Y

3.​ Define the linear model:


o Y = mX + c

4.​ Perform weighted linear regression using curve_fit():


o​ Fit linear_model to the data with uncertainties sigma_Y
o​ Retrieve optimized parameters m (slope), c (intercept)
o​ Calculate uncertainties dm, dc from the covariance matrix

5.​ Calculate the Pearson correlation coefficient r between X and Y

6.​ Set confidence level:


o​ Use a 90% confidence level → alpha = 0.10
o​ Degrees of freedom: dof = len(X) - 2
o​ Get t-critical value t_val for the given confidence level

7.​ Create a finer grid X_fit over the range of X for smooth plotting

8.​ Compute the best-fit line Y_fit using the linear model and best-fit parameters

9.​ Estimate standard error s_err of the regression:


o​ Use weighted residuals and degrees of freedom

10.​Calculate the 90% confidence interval:


o For each point in X_fit, compute the confidence bounds using
Y_fit ± t_val · s_err · sqrt(1/n + (X_fit − mean(X))² / Σ(Xi − mean(X))²), where n = len(X).

11. Plot the results:


o​ Error bars on data points
o​ Best-fit line
o​ Shaded region for 90% confidence interval

12.​Print results:
o​ Slope and intercept with uncertainties
o​ Correlation coefficient

Code:
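A minimal sketch following the algorithm above. The arrays X, Y, and sigma_Y are placeholder values (substitute real lab data); for simplicity s_err is computed from unweighted residuals, whereas step 9 of the algorithm calls for a weighted variant.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.optimize import curve_fit

# Placeholder example data (replace with measurements from the lab)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])
sigma_Y = np.array([0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5])

def linear_model(x, m, c):
    return m * x + c

# Weighted least squares; absolute_sigma=True treats sigma_Y as true errors
popt, pcov = curve_fit(linear_model, X, Y, sigma=sigma_Y, absolute_sigma=True)
m, c = popt
dm, dc = np.sqrt(np.diag(pcov))  # parameter uncertainties

r, _ = stats.pearsonr(X, Y)  # correlation coefficient

# 90% confidence interval for the regression line
alpha = 0.10
dof = len(X) - 2
t_val = stats.t.ppf(1 - alpha / 2, dof)

X_fit = np.linspace(X.min(), X.max(), 200)
Y_fit = linear_model(X_fit, m, c)

residuals = Y - linear_model(X, m, c)
s_err = np.sqrt(np.sum(residuals**2) / dof)
ci = t_val * s_err * np.sqrt(1 / len(X) + (X_fit - X.mean())**2
                             / np.sum((X - X.mean())**2))

plt.errorbar(X, Y, yerr=sigma_Y, fmt="o", capsize=3, label="Data")
plt.plot(X_fit, Y_fit, "r-", label=f"Fit: Y = {m:.3f} X + {c:.3f}")
plt.fill_between(X_fit, Y_fit - ci, Y_fit + ci, alpha=0.3,
                 label="90% confidence interval")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()

print(f"Slope     m = {m:.4f} +/- {dm:.4f}")
print(f"Intercept c = {c:.4f} +/- {dc:.4f}")
print(f"Correlation coefficient r = {r:.4f}")
```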

Output:
Experiment – 7
Aim:
Markov Chain
Given that a particle may exist in one of the given energy states (Ei, i = 1, …, 4) and the transition probability matrix T, where Tij gives the probability for the particle to make a transition from energy state Ei to state Ej, determine the long-term probability for the particle to be in state Ef if it was initially in state Ei.

Algorithm:
Input:
●​ A transition matrix T (square matrix, rows sum to 1)
Output:
●​ Stationary distribution (steady-state probabilities)
Steps:
1.​ Import necessary library:
o​ Use numpy for numerical operations

2.​ Define the transition matrix T:


o​ Each row represents the probability of transitioning from one state to
another

3.​ Transpose the matrix T:


o​ Use T.T because eigenvectors of the transpose give the left
eigenvectors (which represent stationary distributions)

4.​ Find eigenvalues and eigenvectors of T.T using np.linalg.eig():


o​ Store eigenvalues in eigvals and corresponding eigenvectors in
eigvecs

5.​ Identify the eigenvector corresponding to eigenvalue 1:


o​ A Markov chain's stationary distribution is the eigenvector
corresponding to eigenvalue 1
o​ Use np.isclose(eigvals, 1) to find where eigenvalue ≈ 1
6.​ Select the corresponding eigenvector:
o​ Extract the real part of that eigenvector (since imaginary parts may
arise due to numerical error)

7.​ Normalize the eigenvector:


o​ Divide by the sum of its components to ensure it represents valid
probabilities (they sum to 1)

8.​ Print the stationary distribution

Code:
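A minimal sketch following the algorithm above. The transition matrix T is an assumed example; any row-stochastic 4×4 matrix works.

```python
import numpy as np

# Example transition matrix (an assumption; each row must sum to 1).
# T[i, j] = probability of a transition from state E_i to state E_j.
T = np.array([[0.5, 0.2, 0.2, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.2, 0.2, 0.5, 0.1],
              [0.1, 0.1, 0.3, 0.5]])

# Right eigenvectors of T.T are the left eigenvectors of T,
# which carry the stationary distributions
eigvals, eigvecs = np.linalg.eig(T.T)

# The stationary distribution corresponds to eigenvalue 1
idx = np.where(np.isclose(eigvals, 1))[0][0]
stationary = np.real(eigvecs[:, idx])  # drop tiny imaginary parts

# Normalise so the components sum to 1 (valid probabilities)
stationary /= stationary.sum()

print("Stationary distribution:", stationary)
```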
Output:
