batch2 ds

Uploaded by

ece apce

1)i. Write a NumPy program to convert a list and tuple into arrays.

Program:
import numpy as np

# Convert a list to a NumPy array

list_data = [1, 2, 3, 4, 5]

array_from_list = np.array(list_data)

print("Array from list:", array_from_list)

# Convert a tuple to a NumPy array

tuple_data = (10, 20, 30, 40, 50)

array_from_tuple = np.array(tuple_data)

print("Array from tuple:", array_from_tuple)

Array from list: [1 2 3 4 5]

Array from tuple: [10 20 30 40 50]
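As a small follow-up sketch, the lines below show what `np.array` infers from its input. The nested tuple and the mixed list are illustrative values that do not appear in the program above.

```python
import numpy as np

# A nested tuple becomes a 2-D array: one row per inner tuple
matrix = np.array(((1, 2, 3), (4, 5, 6)))
print(matrix.shape)   # (2, 3)

# Mixing ints and floats promotes the whole array to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64

# An explicit dtype overrides the inferred one
as_float = np.array((10, 20, 30), dtype=float)
print(as_float)
```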

ii. Write a NumPy program to convert the values of Centigrade degrees into
Fahrenheit degrees and vice versa. Values have to be stored into a NumPy
array.
Program:
import numpy as np

# Function to convert Centigrade to Fahrenheit

def centigrade_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

# Function to convert Fahrenheit to Centigrade

def fahrenheit_to_centigrade(fahrenheit):
    return (fahrenheit - 32) * 5/9

# Create a NumPy array of Centigrade temperatures

centigrade_values = np.array([0, 10, 20, 30, 40, 50])

# Convert Centigrade to Fahrenheit

fahrenheit_values = centigrade_to_fahrenheit(centigrade_values)

# Create a NumPy array of Fahrenheit temperatures


fahrenheit_array = np.array([32, 50, 68, 86, 104, 122])

# Convert Fahrenheit to Centigrade

centigrade_from_fahrenheit = fahrenheit_to_centigrade(fahrenheit_array)

# Print the results

print("Centigrade values:", centigrade_values)

print("Converted Fahrenheit values:", fahrenheit_values)

print("\nFahrenheit values:", fahrenheit_array)

print("Converted Centigrade values:", centigrade_from_fahrenheit)

output:
Centigrade values: [ 0 10 20 30 40 50]

Converted Fahrenheit values: [ 32. 50. 68. 86. 104. 122.]

Fahrenheit values: [ 32 50 68 86 104 122]

Converted Centigrade values: [ 0. 10. 20. 30. 40. 50.]
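One way to check the two conversion functions against each other is a round trip. The sketch below redefines them so it runs on its own; the test temperatures are arbitrary illustrative values.

```python
import numpy as np

def centigrade_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

def fahrenheit_to_centigrade(fahrenheit):
    return (fahrenheit - 32) * 5 / 9

centigrade = np.array([-40.0, 0.0, 36.6, 100.0])
round_trip = fahrenheit_to_centigrade(centigrade_to_fahrenheit(centigrade))

# Floating-point arithmetic may not be bit-exact, so compare with a tolerance
print(np.allclose(round_trip, centigrade))
```

`np.allclose` is used instead of `==` because a value such as 36.6 is not guaranteed to survive the two conversions bit for bit.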

2. i. Write a NumPy program to find the real and imaginary parts of an array of
complex numbers.
Program:
import numpy as np
# Create a NumPy array of complex numbers
complex_array = np.array([2 + 3j, 4 - 5j, -1 + 2j, 3 + 4j])
# Extract the real parts of the complex numbers
real_parts = np.real(complex_array)
# Extract the imaginary parts of the complex numbers
imaginary_parts = np.imag(complex_array)
# Print the results
print("Complex array:", complex_array)
print("Real parts:", real_parts)
print("Imaginary parts:", imaginary_parts)

output:
Complex array: [ 2.+3.j 4.-5.j -1.+2.j 3.+4.j]
Real parts: [ 2. 4. -1. 3.]
Imaginary parts: [ 3. -5. 2. 4.]
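The same parts are also available as the array attributes `.real` and `.imag`. The sketch below uses them, rebuilds the complex values as a check, and adds the magnitude via `np.abs`, which is an extra not asked for in the task.

```python
import numpy as np

complex_array = np.array([2 + 3j, 4 - 5j, -1 + 2j, 3 + 4j])

# .real and .imag are attributes equivalent to np.real() and np.imag()
real_parts = complex_array.real
imaginary_parts = complex_array.imag

# Rebuild the complex values from their parts
rebuilt = real_parts + 1j * imaginary_parts
print(np.array_equal(rebuilt, complex_array))

# Magnitude of each number, e.g. |3 + 4j| = 5
print(np.abs(complex_array))
```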

ii. Write a NumPy program to convert a NumPy array into a csv file
program:
import numpy as np

# Create a NumPy array

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Save the NumPy array into a CSV file

np.savetxt('array_data.csv', array_data, delimiter=',', fmt='%d')

print("Array has been saved to 'array_data.csv'.")

output:
Array has been saved to 'array_data.csv'.

Contents of array_data.csv:
1,2,3
4,5,6
7,8,9
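To confirm the file holds what was written, it can be read back with `np.loadtxt`. This sketch writes to a temporary directory, which is an assumption made here so that no file is left behind; the program above writes to the working directory instead.

```python
import os
import tempfile
import numpy as np

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Write to a temporary directory so the sketch leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'array_data.csv')
    np.savetxt(csv_path, array_data, delimiter=',', fmt='%d')

    # Read the file back; dtype=int matches the '%d' format used when saving
    loaded = np.loadtxt(csv_path, delimiter=',', dtype=int)

print(np.array_equal(loaded, array_data))
```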
3. i. Write a NumPy program to perform the basic arithmetic operations
Program:
import numpy as np

# Create two NumPy arrays

array1 = np.array([10, 20, 30, 40, 50])

array2 = np.array([1, 2, 3, 4, 5])

# Addition

addition_result = array1 + array2

# Subtraction

subtraction_result = array1 - array2

# Multiplication

multiplication_result = array1 * array2

# Division

division_result = array1 / array2

# Exponentiation (array1 raised to the power of array2)

exponentiation_result = array1 ** array2

# Print the results

print("Array 1:", array1)

print("Array 2:", array2)

print("\nAddition (Array1 + Array2):", addition_result)

print("Subtraction (Array1 - Array2):", subtraction_result)

print("Multiplication (Array1 * Array2):", multiplication_result)

print("Division (Array1 / Array2):", division_result)

print("Exponentiation (Array1 ** Array2):", exponentiation_result)

output:
Array 1: [10 20 30 40 50]

Array 2: [1 2 3 4 5]

Addition (Array1 + Array2): [11 22 33 44 55]


Subtraction (Array1 - Array2): [ 9 18 27 36 45]

Multiplication (Array1 * Array2): [ 10 40 90 160 250]

Division (Array1 / Array2): [10. 10. 10. 10. 10.]

Exponentiation (Array1 ** Array2): [ 10 400 27000 2560000 312500000]
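Each operator used above has a NumPy function counterpart, and the arithmetic stays element-wise for floor division, remainder and scalars too. The sketch requests `int64` explicitly, an assumption made here so the powers do not depend on the platform's default integer type.

```python
import numpy as np

# int64 is requested explicitly so the powers below do not depend on the platform default
array1 = np.array([10, 20, 30, 40, 50], dtype=np.int64)
array2 = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# Each operator has an equivalent NumPy function
print(np.array_equal(array1 + array2, np.add(array1, array2)))
print(np.array_equal(array1 ** array2, np.power(array1, array2)))

# Floor division and remainder are also element-wise
print(array1 // array2)
print(array1 % 3)

# A scalar is broadcast across the whole array
print(array1 * 2)
```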

ii.Write a NumPy program to transpose an array.

Program:
import numpy as np

# Create a 2D NumPy array

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Transpose the array

transposed_array = np.transpose(array)

# Alternatively, you can also use the shorthand `.T` to transpose

# transposed_array = array.T

# Print the original and transposed arrays

print("Original Array:")

print(array)

print("\nTransposed Array:")

print(transposed_array)

output:
Original Array:

[[1 2 3]

[4 5 6]

[7 8 9]]

Transposed Array:

[[1 4 7]

[2 5 8]

[3 6 9]]
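For arrays with more than two dimensions, `np.transpose` also accepts an explicit axis order. The 3-D array below is an illustrative example, not part of the task.

```python
import numpy as np

# A 3-D array with shape (2, 3, 4)
cube = np.arange(24).reshape(2, 3, 4)

# With no axes argument the order of the axes is reversed
print(np.transpose(cube).shape)             # (4, 3, 2)

# An explicit permutation moves only the axes you name
swapped = np.transpose(cube, axes=(0, 2, 1))
print(swapped.shape)                        # (2, 4, 3)

# Element [i, j, k] of the original is element [i, k, j] of the result
print(cube[1, 2, 3] == swapped[1, 3, 2])    # True
```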
4) i. Using NumPy, create an array with 5 dimensions and verify that it has 5
dimensions.
Program:
import numpy as np

# Create a 5-dimensional NumPy array with random integers

array_5d = np.random.randint(1, 10, size=(2, 3, 4, 5, 6))

# Verify the number of dimensions using .ndim

print("Array Shape:", array_5d.shape)

print("Number of Dimensions:", array_5d.ndim)

output:
Array Shape: (2, 3, 4, 5, 6)

Number of Dimensions: 5
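Random data is only one way to get five dimensions. Two alternatives, sketched here with illustrative values, are the `ndmin` argument of `np.array` and `reshape`.

```python
import numpy as np

# ndmin pads the shape with leading axes of length 1 until there are 5 dimensions
array_5d = np.array([1, 2, 3, 4], ndmin=5)
print(array_5d.shape)   # (1, 1, 1, 1, 4)
print(array_5d.ndim)    # 5

# reshape is another route: the sizes must multiply to the number of elements
reshaped = np.arange(32).reshape(2, 2, 2, 2, 2)
print(reshaped.ndim)    # 5
```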

ii. Using NumPy, Sort a boolean array.


Program:
import numpy as np

# Create a boolean NumPy array

boolean_array = np.array([True, False, True, False, True, False])

# Sort the boolean array

sorted_array = np.sort(boolean_array)

# Print the original and sorted arrays

print("Original Boolean Array:", boolean_array)

print("Sorted Boolean Array:", sorted_array)

output:
Original Boolean Array: [ True False True False True False]

Sorted Boolean Array: [False False False True True True]
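`np.sort` always sorts ascending, and `False` orders before `True`. A descending order can be had by reversing the result, as in this short sketch.

```python
import numpy as np

boolean_array = np.array([True, False, True, False, True, False])

# np.sort is ascending (False before True); reversing gives True first
descending = np.sort(boolean_array)[::-1]
print(descending)

# Counting the True values explains where the sorted array switches
true_count = np.count_nonzero(boolean_array)
print(true_count)   # 3
```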


5) i. Create your own simple Pandas DataFrame and print its values.
Program:
import pandas as pd

# Create a simple dictionary with data

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

# Create a DataFrame from the dictionary

df = pd.DataFrame(data)

# Print the DataFrame

print(df)

output:
Name Age City
0 Alice 24 New York
1 Bob 27 Los Angeles
2 Charlie 22 Chicago
3 David 32 Houston
4 Eve 29 Phoenix
ii. Create your own DataFrame from a dict of ndarray/list.
Program:
import pandas as pd

import numpy as np

# Create a dictionary with NumPy arrays or lists

data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': np.array([1000, 600, 300, 250, 100]),
    'Stock': np.array([50, 200, 150, 80, 500])
}

# Create a DataFrame from the dictionary

df = pd.DataFrame(data)

# Print the DataFrame

print(df)

output:

Product Price Stock


0 Laptop 1000 50
1 Phone 600 200
2 Tablet 300 150
3 Monitor 250 80
4 Keyboard 100 500
6. Perform appending, slicing, addition and deletion of rows with a Pandas
DataFrame.

Program:
import pandas as pd

# Create a simple DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# Print the original DataFrame

print("Original DataFrame:")

print(df)

# 1. Appending a new row to the DataFrame
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)

new_row = pd.DataFrame([{'Name': 'Eve', 'Age': 29, 'City': 'Phoenix'}])

df = pd.concat([df, new_row], ignore_index=True)

print("\nDataFrame after appending a new row:")

print(df)

# 2. Slicing the DataFrame (selecting specific rows)

sliced_df = df[1:3] # Selecting rows 1 and 2 (indexing starts from 0)

print("\nSliced DataFrame (rows 1 to 2):")

print(sliced_df)

# 3. Adding a new row with 'loc'

df.loc[len(df)] = ['Frank', 30, 'Dallas']

print("\nDataFrame after adding a new row with 'loc':")

print(df)

# 4. Deleting a row (deleting row with index 2)

df = df.drop(2)
print("\nDataFrame after deleting row with index 2:")

print(df)

output:
Original DataFrame:

Name Age City

0 Alice 24 New York

1 Bob 27 Los Angeles

2 Charlie 22 Chicago

3 David 32 Houston

DataFrame after appending a new row:

Name Age City

0 Alice 24 New York

1 Bob 27 Los Angeles

2 Charlie 22 Chicago

3 David 32 Houston

4 Eve 29 Phoenix

Sliced DataFrame (rows 1 to 2):

Name Age City

1 Bob 27 Los Angeles

2 Charlie 22 Chicago

DataFrame after adding a new row with 'loc':

Name Age City

0 Alice 24 New York

1 Bob 27 Los Angeles

2 Charlie 22 Chicago

3 David 32 Houston

4 Eve 29 Phoenix

5 Frank 30 Dallas
DataFrame after deleting row with index 2:

Name Age City

0 Alice 24 New York

1 Bob 27 Los Angeles

3 David 32 Houston

4 Eve 29 Phoenix

5 Frank 30 Dallas
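After the deletion the row labels are 0, 1, 3, 4, 5, so label-based and position-based access no longer agree. The sketch below illustrates the difference on a smaller frame and renumbers the rows with `reset_index`, a step that is not part of the program above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
})

# Dropping label 2 leaves the labels 0, 1, 3
df = df.drop(2)
print(df.index.tolist())           # [0, 1, 3]

# .loc selects by label, .iloc by position
print(df.loc[3, 'Name'])           # David
print(df.iloc[2]['Name'])          # David (third remaining row)

# reset_index(drop=True) renumbers the rows 0..n-1
df = df.reset_index(drop=True)
print(df.index.tolist())           # [0, 1, 2]
```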

7.i. Using Pandas, Create a DataFrame with a list of dictionaries, row indices,
and column indices.
Program:

import pandas as pd

# Create a list of dictionaries

data = [
    {'Name': 'Alice', 'Age': 24, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 27, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 22, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 32, 'City': 'Houston'}
]

# Define custom row indices and column indices

row_indices = ['A', 'B', 'C', 'D']

column_indices = ['Name', 'Age', 'City']

# Create the DataFrame

df = pd.DataFrame(data, index=row_indices, columns=column_indices)

# Print the DataFrame

print(df)
output:
Name Age City

A Alice 24 New York

B Bob 27 Los Angeles

C Charlie 22 Chicago

D David 32 Houston
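When the dictionaries do not all share the same keys, pandas fills the gaps with NaN, and `columns` controls both the selection and the order of the columns. The records below are hypothetical and deliberately incomplete.

```python
import pandas as pd

# Hypothetical records: the second one has no 'City' key
records = [
    {'Name': 'Alice', 'Age': 24, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 27},
]

df = pd.DataFrame(records, index=['A', 'B'], columns=['Name', 'City', 'Age'])
print(df)

# The missing value is stored as NaN
print(df['City'].isna().tolist())   # [False, True]

# columns= also selects and orders: asking for a subset drops the rest
print(pd.DataFrame(records, columns=['Name']).columns.tolist())   # ['Name']
```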

ii. Use index label to delete or drop rows from a Pandas DataFrame.
Program:
import pandas as pd

# Create a simple DataFrame

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# Set custom row indices

df.index = ['A', 'B', 'C', 'D']

# Print the original DataFrame

print("Original DataFrame:")

print(df)

# 1. Drop a row by index label (e.g., drop row with index 'B')

df_dropped = df.drop('B')

print("\nDataFrame after dropping row with index 'B':")

print(df_dropped)

# 2. Drop multiple rows by index labels (e.g., drop rows with index 'A' and 'D')

df_dropped_multiple = df.drop(['A', 'D'])

print("\nDataFrame after dropping rows with index 'A' and 'D':")

print(df_dropped_multiple)
# 3. Drop a row in-place (this will modify the original DataFrame)

df.drop('C', inplace=True)

print("\nDataFrame after dropping row with index 'C' in-place:")

print(df)

output:
Original DataFrame:

Name Age City

A Alice 24 New York

B Bob 27 Los Angeles

C Charlie 22 Chicago

D David 32 Houston

DataFrame after dropping row with index 'B':

Name Age City

A Alice 24 New York

C Charlie 22 Chicago

D David 32 Houston

DataFrame after dropping rows with index 'A' and 'D':

Name Age City

B Bob 27 Los Angeles

C Charlie 22 Chicago

DataFrame after dropping row with index 'C' in-place:

Name Age City

A Alice 24 New York

B Bob 27 Los Angeles

D David 32 Houston
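Two situations the program above does not cover are a label that does not exist and dropping rows by a condition rather than by a known label. A minimal sketch of both, on a smaller illustrative frame:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [24, 27, 22]},
    index=['A', 'B', 'C'],
)

# Dropping a label that does not exist raises KeyError by default
try:
    df.drop('Z')
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' skips missing labels instead
print(df.drop(['A', 'Z'], errors='ignore').index.tolist())   # ['B', 'C']

# To drop by condition, look up the matching labels first
under_25 = df.index[df['Age'] < 25]
print(df.drop(under_25).index.tolist())                      # ['B']
```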
8. Using the Pandas library:
i. Load the iris.csv file.
ii. Convert it into a DataFrame and read it.
iii. Display only the records with species "Iris-setosa".
program:
import pandas as pd

# Step 1: Load the iris CSV file into a Pandas DataFrame

# Replace 'iris.csv' with the correct file path if necessary

df = pd.read_csv('iris.csv')

# Step 2: Display the entire DataFrame or the first few rows to ensure it's loaded correctly

print("First few records of the DataFrame:")

print(df.head())

# Step 3: Display only the records with species 'Iris-setosa'

setosa_df = df[df['species'] == 'Iris-setosa']

# Display the filtered DataFrame

print("\nRecords with species 'Iris-setosa':")

print(setosa_df)

output:
First few records of the DataFrame:

sepal_length sepal_width petal_length petal_width species

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa

Records with species 'Iris-setosa':

sepal_length sepal_width petal_length petal_width species

0 5.1 3.5 1.4 0.2 Iris-setosa


1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa

...
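The filter in Step 3 is a boolean mask. The sketch below shows the mask on its own, using a few hand-typed rows read through `io.StringIO` as a stand-in for iris.csv; the column names are assumed to match the output shown above.

```python
import io
import pandas as pd

# A few hand-typed rows standing in for iris.csv (illustrative, not the full file)
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
4.9,3.0,1.4,0.2,Iris-setosa
6.3,3.3,6.0,2.5,Iris-virginica
"""

df = pd.read_csv(io.StringIO(csv_text))

# The comparison yields one True/False per row; indexing with it keeps the True rows
mask = df['species'] == 'Iris-setosa'
setosa_df = df[mask]
print(setosa_df)

# value_counts shows how many rows each species has
print(df['species'].value_counts())
```

If the real file uses a different header (for example `Species`), the column name in the mask has to be changed accordingly.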

9. Use the diabetes data set from UCI and perform Univariate analysis.
Program:

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Step 1: Load the diabetes dataset (here from a public GitHub mirror)

# You can replace this URL with another copy of the dataset or load it from a local file.

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'

columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',

'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']

df = pd.read_csv(url, names=columns)

# Step 2: Check the first few rows of the dataset

print(df.head())

# Step 3: Summary statistics for numerical features

print("\nSummary Statistics:")

print(df.describe())

# Step 4: Visualizing the distribution of each feature (Univariate Analysis)

# Histograms for all features

df.hist(bins=20, figsize=(15,10))

plt.tight_layout()

plt.show()

# Step 5: Boxplots for all features to check for outliers

plt.figure(figsize=(15, 10))
sns.boxplot(data=df)

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

# Step 6: Checking the distribution of 'Outcome' (Diabetes status)

sns.countplot(x='Outcome', data=df)

plt.title('Distribution of Outcome (Diabetes Status)')

plt.show()

output:
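The box plots in Step 5 mark outliers visually. By default seaborn draws the whiskers at 1.5 times the interquartile range, and the same rule can be applied numerically. The readings below are hypothetical values chosen for illustration and are not taken from the diabetes data.

```python
import pandas as pd

# Hypothetical glucose-like readings; 0 and 300 are planted as extreme values
values = pd.Series([0, 85, 90, 95, 100, 105, 110, 115, 120, 300])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(outliers.tolist())
```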

10. Use the Pima Indians Diabetes data set and perform Bivariate
analysis.
Program:

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

# Load the dataset from a public GitHub mirror or a local file

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'

columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',

'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']

df = pd.read_csv(url, names=columns)

# Display first few rows of the dataset

print(df.head())

# Step 1: Correlation Heatmap to analyze relationships between numerical features

plt.figure(figsize=(10, 8))

correlation_matrix = df.corr()

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)

plt.title('Correlation Heatmap of Diabetes Dataset')

plt.show()

# Step 2: Scatter plots between features and target variable 'Outcome'

plt.figure(figsize=(15, 10))

# Plotting scatter plot for 'Glucose' vs 'Outcome'

plt.subplot(2, 3, 1)

sns.scatterplot(x='Glucose', y='Outcome', data=df)

plt.title('Glucose vs Outcome')

# Plotting scatter plot for 'BMI' vs 'Outcome'

plt.subplot(2, 3, 2)

sns.scatterplot(x='BMI', y='Outcome', data=df)

plt.title('BMI vs Outcome')

# Plotting scatter plot for 'Age' vs 'Outcome'

plt.subplot(2, 3, 3)
sns.scatterplot(x='Age', y='Outcome', data=df)

plt.title('Age vs Outcome')

# Plotting scatter plot for 'Insulin' vs 'Outcome'

plt.subplot(2, 3, 4)

sns.scatterplot(x='Insulin', y='Outcome', data=df)

plt.title('Insulin vs Outcome')

# Plotting scatter plot for 'BloodPressure' vs 'Outcome'

plt.subplot(2, 3, 5)

sns.scatterplot(x='BloodPressure', y='Outcome', data=df)

plt.title('BloodPressure vs Outcome')

# Plotting scatter plot for 'Pregnancies' vs 'Outcome'

plt.subplot(2, 3, 6)

sns.scatterplot(x='Pregnancies', y='Outcome', data=df)

plt.title('Pregnancies vs Outcome')

plt.tight_layout()

plt.show()

# Step 3: Pairplot to visualize the relationships between multiple features and 'Outcome'

sns.pairplot(df, hue='Outcome', diag_kind='hist', markers=["o", "s"])

plt.suptitle('Pairplot of Features with Outcome', y=1.02)

plt.show()
output:
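Because `Outcome` takes only the values 0 and 1, a numeric complement to the scatter plots is to compare a feature's mean within each class. The rows below are hypothetical; only the column names are borrowed from the program above.

```python
import pandas as pd

# Hypothetical rows using two of the column names from the program above
df = pd.DataFrame({
    'Glucose': [90, 100, 110, 150, 160, 170],
    'Outcome': [0, 0, 0, 1, 1, 1],
})

# Mean of the feature within each outcome class
group_means = df.groupby('Outcome')['Glucose'].mean()
print(group_means)

# Pearson correlation between the feature and the 0/1 target
print(round(df['Glucose'].corr(df['Outcome']), 3))
```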

11. Perform Multiple Regression analysis on your own dataset (for example, a
Car dataset with the information Company Name, Model, Volume, Weight, CO2)
with more than one independent variable, to predict a value based on two or
more variables.
Program:
# Import necessary libraries

import pandas as pd

import statsmodels.api as sm

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

import matplotlib.pyplot as plt


# Step 1: Create or Load your Dataset

# Sample data representing car information

data = {
    'Company Name': ['Toyota', 'Honda', 'Ford', 'BMW', 'Audi'],
    'Model': ['Corolla', 'Civic', 'Focus', 'X5', 'A4'],
    'Volume': [1.8, 2.0, 1.5, 3.0, 2.5],  # Engine volume in liters
    'Weight': [1300, 1200, 1400, 2000, 1800],  # Weight in kilograms
    'CO2': [120, 110, 140, 200, 180]  # CO2 emissions in grams per km
}

# Convert to DataFrame

df = pd.DataFrame(data)

# Step 2: Preprocess the Data

# Since we are predicting CO2 based on Volume and Weight, we can drop 'Company Name' and
# 'Model' for now

df = df.drop(columns=['Company Name', 'Model'])

# Independent variables (Volume, Weight)

X = df[['Volume', 'Weight']]

# Dependent variable (CO2)

y = df['CO2']

# Step 3: Add a constant to the independent variables (for intercept)

X = sm.add_constant(X)

# Step 4: Perform Multiple Regression using statsmodels

model = sm.OLS(y, X).fit()

# Step 5: Display the summary of the regression analysis

print("Multiple Regression Analysis Summary (statsmodels):")

print(model.summary())

# Step 6: Perform Multiple Regression using scikit-learn

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(
    df[['Volume', 'Weight']], df['CO2'], test_size=0.2, random_state=42)

# Initialize the Linear Regression model

regressor = LinearRegression()
# Train the model

regressor.fit(X_train, y_train)

# Predict on the test set

y_pred = regressor.predict(X_test)

# Step 7: Evaluate the model

print("\nMultiple Regression Analysis using scikit-learn:")

print(f"Coefficients: {regressor.coef_}")

print(f"Intercept: {regressor.intercept_}")

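# Calculate R-squared value and Mean Squared Error (MSE)
# Note: with only 5 rows, test_size=0.2 leaves a single test sample, so R-squared
# is undefined here (scikit-learn returns NaN with a warning); use more data for a meaningful score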

r2 = r2_score(y_test, y_pred)

mse = mean_squared_error(y_test, y_pred)

print(f"R-squared: {r2}")

print(f"Mean Squared Error: {mse}")

# Step 8: Plotting the results

plt.scatter(y_test, y_pred)

plt.xlabel("Actual CO2")

plt.ylabel("Predicted CO2")

plt.title("Actual vs Predicted CO2")

plt.show()

output:
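The coefficients that statsmodels and scikit-learn report come from a least-squares fit, which can be reproduced with NumPy alone. The sketch below is one way to do that on the same five rows; it fits on all rows, like the statsmodels step, not on the train split.

```python
import numpy as np

# Volume, Weight and CO2 values copied from the sample data above
volume = np.array([1.8, 2.0, 1.5, 3.0, 2.5])
weight = np.array([1300, 1200, 1400, 2000, 1800], dtype=float)
co2 = np.array([120, 110, 140, 200, 180], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(volume), volume, weight])

# Least-squares solution of X @ beta ≈ co2
beta, residuals, rank, _ = np.linalg.lstsq(X, co2, rcond=None)
intercept, coef_volume, coef_weight = beta
print(f"CO2 ≈ {intercept:.2f} + {coef_volume:.2f}*Volume + {coef_weight:.4f}*Weight")

# At the optimum the residual vector is orthogonal to every column of X
residual_vector = co2 - X @ beta
print(np.allclose(X.T @ residual_vector, 0, atol=1e-5))
```

The column of ones plays the same role as `sm.add_constant(X)` in the program above.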
12. Perform Bivariate analysis using a pandas DataFrame that contains
information about two variables: (1) hours spent studying and (2) exam score
received by 20 different students.
Program:
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from scipy.stats import pearsonr

# Step 1: Create the DataFrame

data = {
    'Hours Studying': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'Exam Score': [35, 40, 50, 60, 65, 70, 75, 80, 85, 88, 90, 92, 94, 95, 96, 98, 99, 99, 100, 100]
}

# Convert the dictionary to a pandas DataFrame

df = pd.DataFrame(data)

# Step 2: Descriptive Statistics

print("Descriptive Statistics:")

print(df.describe())

# Step 3: Calculate Correlation

correlation, _ = pearsonr(df['Hours Studying'], df['Exam Score'])

print(f"\nCorrelation between Hours Studying and Exam Score: {correlation:.2f}")

# Step 4: Scatter Plot

plt.figure(figsize=(8, 6))

plt.scatter(df['Hours Studying'], df['Exam Score'], color='blue', label='Data Points')

plt.title('Hours Studying vs Exam Score')

plt.xlabel('Hours Studying')

plt.ylabel('Exam Score')

plt.grid(True)

plt.legend()
plt.show()

# Step 5: Linear Regression Line (Fit a regression line)

sns.regplot(x='Hours Studying', y='Exam Score', data=df,
            scatter_kws={'color': 'blue'}, line_kws={'color': 'red'})

plt.title('Linear Regression Line: Hours Studying vs Exam Score')

plt.xlabel('Hours Studying')

plt.ylabel('Exam Score')

plt.show()

output:
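The value returned by `pearsonr` in Step 3 can be reproduced from the definition of the Pearson coefficient. The sketch below computes it by hand on the same 20 pairs and compares it with `np.corrcoef`.

```python
import numpy as np

hours = np.arange(1, 21, dtype=float)
scores = np.array([35, 40, 50, 60, 65, 70, 75, 80, 85, 88,
                   90, 92, 94, 95, 96, 98, 99, 99, 100, 100], dtype=float)

# Pearson r = sum of co-deviations / sqrt(product of the squared-deviation sums)
dx = hours - hours.mean()
dy = scores - scores.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r_numpy = np.corrcoef(hours, scores)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```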

13. Perform Univariate analysis with the following pandas DataFrame:
'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2]
'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6]
'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]

Program:
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns


# Step 1: Create the DataFrame

data = {
    'points': [1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2],
    'assists': [5, 7, 7, 9, 12, 9, 9, 4, 6, 8, 8, 9, 3, 2, 6],
    'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 6, 6, 7, 8, 7, 9, 15]
}

# Convert the dictionary to a pandas DataFrame

df = pd.DataFrame(data)

# Step 2: Descriptive Statistics for each column

print("Descriptive Statistics:")

print(df.describe())

# Step 3: Visualizing the Distribution of each variable

# Plot histograms for each variable

plt.figure(figsize=(12, 6))

# Histogram for 'points'

plt.subplot(1, 3, 1)

sns.histplot(df['points'], kde=True, color='blue', bins=10)

plt.title('Distribution of Points')

plt.xlabel('Points')

plt.ylabel('Frequency')

# Histogram for 'assists'

plt.subplot(1, 3, 2)

sns.histplot(df['assists'], kde=True, color='green', bins=10)

plt.title('Distribution of Assists')
plt.xlabel('Assists')

plt.ylabel('Frequency')

# Histogram for 'rebounds'

plt.subplot(1, 3, 3)

sns.histplot(df['rebounds'], kde=True, color='red', bins=10)

plt.title('Distribution of Rebounds')

plt.xlabel('Rebounds')

plt.ylabel('Frequency')

plt.tight_layout()

plt.show()

# Step 4: Box plots to visualize outliers

plt.figure(figsize=(12, 6))

# Box plot for 'points'

plt.subplot(1, 3, 1)

sns.boxplot(y=df['points'], color='blue')

plt.title('Boxplot of Points')

# Box plot for 'assists'

plt.subplot(1, 3, 2)

sns.boxplot(y=df['assists'], color='green')

plt.title('Boxplot of Assists')

# Box plot for 'rebounds'

plt.subplot(1, 3, 3)

sns.boxplot(y=df['rebounds'], color='red')

plt.title('Boxplot of Rebounds')
plt.tight_layout()

plt.show()

# Step 5: Skewness and Kurtosis

from scipy.stats import skew, kurtosis

# Skewness and Kurtosis for 'points'

points_skew = skew(df['points'])

points_kurt = kurtosis(df['points'])

# Skewness and Kurtosis for 'assists'

assists_skew = skew(df['assists'])

assists_kurt = kurtosis(df['assists'])

# Skewness and Kurtosis for 'rebounds'

rebounds_skew = skew(df['rebounds'])

rebounds_kurt = kurtosis(df['rebounds'])

print("\nSkewness and Kurtosis:")

print(f"Points: Skewness = {points_skew:.2f}, Kurtosis = {points_kurt:.2f}")

print(f"Assists: Skewness = {assists_skew:.2f}, Kurtosis = {assists_kurt:.2f}")

print(f"Rebounds: Skewness = {rebounds_skew:.2f}, Kurtosis = {rebounds_kurt:.2f}")

output:
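Skewness and kurtosis in Step 5 are ratios of central moments. Assuming SciPy's default settings (`bias=True` for both functions and `fisher=True` for `kurtosis`), the formulas below should correspond to what `skew` and `kurtosis` compute. The helper `central_moment` is a name introduced here for illustration.

```python
import numpy as np

points = np.array([1, 1, 2, 3.5, 4, 4, 4, 5, 5, 6.5, 7, 7.4, 8, 13, 14.2])

def central_moment(values, order):
    """Mean of (x - mean)**order, i.e. the biased central moment."""
    return np.mean((values - values.mean()) ** order)

m2 = central_moment(points, 2)
m3 = central_moment(points, 3)
m4 = central_moment(points, 4)

skewness = m3 / m2 ** 1.5           # > 0 means a longer right tail
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution

print(f"Skewness = {skewness:.2f}, Excess kurtosis = {excess_kurtosis:.2f}")
```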
14. i) Using various functions in numpy library, mathematically calculate the
values for a normal distribution and create Histograms to plot the probability
distribution curve.
Program:
import numpy as np

import matplotlib.pyplot as plt

# Step 1: Parameters for the normal distribution

mu = 0 # Mean of the distribution

sigma = 1 # Standard deviation

size = 10000 # Number of data points to generate

# Step 2: Generate random samples from a normal distribution

data = np.random.normal(mu, sigma, size)

# Step 3: Plot the histogram

plt.figure(figsize=(10, 6))
count, bins, ignored = plt.hist(data, bins=30, density=True, alpha=0.6, color='g')

# Step 4: Calculate the Probability Density Function (PDF)

# Define the normal distribution function

def normal_distribution(x, mu, sigma):
    return (1/np.sqrt(2 * np.pi * sigma**2)) * np.exp(-0.5 * ((x - mu) / sigma)**2)

# Step 5: Generate points for the normal distribution curve

x_values = np.linspace(min(bins), max(bins), 100)

pdf_values = normal_distribution(x_values, mu, sigma)

# Step 6: Plot the PDF curve over the histogram

plt.plot(x_values, pdf_values, 'k', linewidth=2)

plt.title("Normal Distribution with Histogram")

plt.xlabel("Data points")

plt.ylabel("Density")

plt.grid(True)

plt.show()

output:
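Two numeric checks help confirm that the histogram and the curve are on the same scale: the PDF should integrate to about 1, and so should the bars of a histogram built with `density=True`. The sketch below uses a seeded generator (`np.random.default_rng(0)`, a choice made here for reproducibility) in place of `np.random.normal`.

```python
import numpy as np

mu, sigma = 0.0, 1.0

def normal_distribution(x, mu, sigma):
    return (1 / np.sqrt(2 * np.pi * sigma**2)) * np.exp(-0.5 * ((x - mu) / sigma)**2)

# Riemann sum of the PDF over mu ± 6 sigma; the tails beyond that are negligible
x_values = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 2001)
dx = x_values[1] - x_values[0]
area = np.sum(normal_distribution(x_values, mu, sigma)) * dx
print(round(area, 4))

# A seeded generator makes the sample reproducible
rng = np.random.default_rng(0)
sample = rng.normal(mu, sigma, 10000)
print(round(sample.mean(), 2), round(sample.std(), 2))

# np.histogram with density=True returns bar heights whose total area is 1
heights, edges = np.histogram(sample, bins=30, density=True)
print(round(np.sum(heights * np.diff(edges)), 4))
```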
14. ii) Using the plt.contour(), plt.contourf(), plt.imshow(), plt.colorbar() and plt.clabel()
functions, visualize a contour plot.
Program:
import numpy as np

import matplotlib.pyplot as plt

# Create some sample data

x = np.linspace(-3, 3, 100)

y = np.linspace(-3, 3, 100)

X, Y = np.meshgrid(x, y)

Z = np.sin(X**2 + Y**2) / (X**2 + Y**2)

# Create a filled contour plot

plt.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)

# Add a colorbar for the filled contours

plt.colorbar()

# Draw the contour lines once and add labels to them

contour_lines = plt.contour(X, Y, Z, levels=20, colors='k')

plt.clabel(contour_lines, inline=True, fontsize=10)

# Display the plot

plt.show()

output:
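The task also names `plt.imshow()`, which the program above does not call. A minimal sketch of how it could be combined with labelled contour lines follows. It uses the Axes methods (`ax.imshow`, `ax.contour`, `ax.clabel`), which mirror the `plt.` functions; the choice of 5 levels and the font size are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X**2 + Y**2) / (X**2 + Y**2)

fig, ax = plt.subplots()

# imshow draws Z as an image; extent maps pixel indices back to x/y units and
# origin='lower' puts the first row of Z at the bottom, matching contour()
image = ax.imshow(Z, extent=[-3, 3, -3, 3], origin='lower', cmap='viridis')
fig.colorbar(image, ax=ax)

# Overlay a few labelled contour lines on top of the image
contour_lines = ax.contour(X, Y, Z, levels=5, colors='k')
ax.clabel(contour_lines, inline=True, fontsize=8)

# Call plt.show() here when running interactively
```

With 100 points on each axis the grid never lands exactly on (0, 0), so the division in `Z` does not hit zero.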
15. Make a three-dimensional plot with 50 randomly generated data points for x,
y, and z. Set the point color to red and the point size to 50.

Program:
import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import numpy as np

# Generate 50 random data points for x, y, and z

np.random.seed(42) # Set a seed for reproducibility

x = np.random.rand(50) * 10

y = np.random.rand(50) * 10

z = np.random.rand(50) * 10

# Create a 3D plot

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

# Plot the points with specified color and size

ax.scatter(x, y, z, c='red', s=50)

# Set labels for axes

ax.set_xlabel('X')

ax.set_ylabel('Y')

ax.set_zlabel('Z')

# Show the plot

plt.show()

output:
