Dse Unit 3
NumPy provides a wide range of operations for working with arrays, including:
Arithmetic Operations:
Element-wise addition, subtraction, multiplication, and division:
Python
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition
d = a - b # Element-wise subtraction
e = a * b # Element-wise multiplication
f = a / b # Element-wise division
Matrix multiplication:
Python
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
C = A @ B # Matrix product, equivalent to np.matmul(A, B)
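As a minimal sketch of how the matrix product differs from element-wise multiplication, the snippet below reuses the two 2x2 matrices defined above. The variable names elementwise and product are illustrative only.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B   # multiplies values at matching positions
product = A @ B       # row-by-column matrix product, same as np.matmul(A, B)

print(elementwise)    # [[ 5 12]
                      #  [21 32]]
print(product)        # [[19 22]
                      #  [43 50]]
```

For two-dimensional arrays, np.dot(A, B) gives the same result as A @ B.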
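Statistical Functions:
Python
mean = np.mean(a) # Arithmetic mean
median = np.median(a) # Middle value
# NumPy has no mode() function; take the most frequent value via np.unique
values, counts = np.unique(a, return_counts=True)
mode = values[np.argmax(counts)]
std_dev = np.std(a) # Population standard deviation (ddof=0)
variance = np.var(a) # Population variance (ddof=0)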
Shape Manipulation:
Reshaping arrays:
Python
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape(2, 3) # Reshape the six elements into 2 rows and 3 columns
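A minimal sketch of reshaping, assuming the six-element array above. It shows an explicit target shape, the use of -1 to let NumPy infer one dimension, and flattening back to one dimension. The names b, c, and flat are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

b = a.reshape(2, 3)    # 2 rows, 3 columns
c = a.reshape(3, -1)   # -1 lets NumPy infer the missing dimension (2 here)
flat = b.ravel()       # back to one dimension

print(b.shape, c.shape, flat.shape)   # (2, 3) (3, 2) (6,)
```

The total number of elements must stay the same, so a six-element array can become 2x3, 3x2, 1x6, or 6x1, but not 4x2.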
Minimum and maximum:
Python
min_val = np.min(a) # Smallest element
max_val = np.max(a) # Largest element
Generating random numbers:
Python
random_array = np.random.rand(3, 3) # 3x3 array of random floats in the range [0, 1)
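For reproducible results, a seeded generator can be used instead of the module-level function. The sketch below uses np.random.default_rng, the generator interface recommended in recent NumPy versions. The seed value 42 is an arbitrary choice for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily for the example

uniform = rng.random((3, 3))             # floats in [0, 1), like np.random.rand(3, 3)
integers = rng.integers(0, 10, size=5)   # integers from 0 up to, but not including, 10

print(uniform.shape)    # (3, 3)
print(integers.shape)   # (5,)
```

Creating another generator with the same seed reproduces the same sequence of numbers.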
These are just a few examples of the many operations that can be performed using NumPy.
The specific operations you'll need to use will depend on your particular use case.
Transpose Matrix:
The transpose of a matrix is obtained by interchanging the rows and columns of the original
matrix. In other words, the elements of the ith row of the original become the elements of
the ith column of the transposed matrix.
In NumPy, you can easily transpose a matrix using the T attribute:
Python
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = A.T # Transpose: rows become columns
print(B) # Output:
# [[1 4]
# [2 5]
# [3 6]]
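A short sketch that checks the definition on the same 2x3 matrix: the shape is reversed, and the element at row i, column j of the original appears at row j, column i of the transpose.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = A.T

print(A.shape, B.shape)       # (2, 3) (3, 2)
print(B[2, 0] == A[0, 2])     # True
```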
Swapping Axes:
Swapping axes in NumPy involves rearranging the dimensions of an array. This is particularly
useful for higher-dimensional arrays.
You can use the swapaxes() function to swap two specific axes of an array:
Python
import numpy as np
A = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
B = A.swapaxes(0, 2) # Swap the first and third axes
print(B) # Output:
# [[[1 5]
#   [3 7]]
#
#  [[2 6]
#   [4 8]]]
In this example, the first and third axes (axes 0 and 2) are swapped. The shape stays
(2, 2, 2) because every axis has length 2, but the element at position (i, j, k) moves to
position (k, j, i).
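Because every axis in the example above has length 2, the effect on the shape is not visible there. The sketch below uses axes of different lengths to make it clearer. The array contents (the numbers 0 to 23) are an arbitrary choice.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)   # axes have lengths 2, 3 and 4
B = A.swapaxes(0, 2)                 # exchange the first and third axes

print(A.shape)   # (2, 3, 4)
print(B.shape)   # (4, 3, 2)

# The element at (i, j, k) in A is found at (k, j, i) in B
print(A[1, 2, 3] == B[3, 2, 1])   # True
```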
Key points:
Transposing a two-dimensional matrix is equivalent to swapping its two axes; for
higher-dimensional arrays, T reverses the order of all axes.
The T attribute provides a convenient way to transpose matrices.
The swapaxes() function allows you to swap any two axes of an array.
Swapping axes is useful for putting data into the axis order that a later operation expects,
without copying the underlying values.
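The first and third key points can be checked directly. The sketch below also shows that T returns a view of the original data rather than a copy, so writing through the transposed array changes the original. The matrix M and the value 99 are illustrative.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

# For a 2-D array, T and swapaxes(0, 1) give the same result
same = np.array_equal(M.T, M.swapaxes(0, 1))

view = M.T
view[0, 1] = 99          # position (0, 1) of the transpose is position (1, 0) of M

print(same, M[1, 0])     # True 99
```

Call .copy() on the result if an independent array is needed.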
Pandas provides a variety of data structures that are tailored for efficient data manipulation
and analysis. Here are the main types:
1. Series:
A one-dimensional labeled array capable of holding any data type (integers, floats,
strings, objects, etc.).
The labels are called the index.
Can be created from Python lists, NumPy arrays, or dictionaries.
2. DataFrame:
A two-dimensional labeled data structure with rows and columns.
Each column is a Series object.
Can be created from Python dictionaries, lists of lists, NumPy arrays, or other
DataFrames.
Provides various methods for data manipulation, such as filtering, sorting, grouping,
and aggregation.
3. Panel (legacy):
A three-dimensional labeled data structure with panels, rows, and columns.
Each panel is a DataFrame.
Deprecated in pandas 0.20 and removed in pandas 0.25; use a DataFrame with a MultiIndex
instead.
4. Panel4D (legacy):
A four-dimensional labeled data structure with frames, panels, rows, and columns.
Each frame is a Panel.
Also removed from pandas; it is not available in current versions.
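5. Sparse Data Structures:
Optimized for data in which most values are the same, typically missing values or zeros.
The older SparseSeries, SparseDataFrame, and SparsePanel classes have been removed;
current versions use a regular Series or DataFrame whose columns have a sparse dtype
(pd.SparseDtype).
Store only the values that differ from a fill value (NaN or 0, for example) and their
positions, saving memory.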
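To illustrate the Series and DataFrame descriptions above, here is a minimal sketch that builds each from the input types mentioned. All names and values (s_from_list, 'Ann', 'score', and so on) are made up for the example.

```python
import numpy as np
import pandas as pd

# Series from a list, a dictionary, and a NumPy array
s_from_list = pd.Series([10, 20, 30])                 # default index 0, 1, 2
s_from_dict = pd.Series({'a': 1.5, 'b': 2.5})         # dictionary keys become the index
s_from_array = pd.Series(np.array([1, 2, 3]), index=['x', 'y', 'z'])

# DataFrame from a dictionary of columns
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})
col = df['score']          # each column is itself a Series

print(type(col).__name__)  # Series
print(df.shape)            # (2, 2)
```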
Key Points:
Series and DataFrame are the most commonly used data structures in Pandas.
DataFrames are particularly useful for tabular data.
Sparse data structures can be more efficient for large datasets with many missing
values.
Pandas provides various methods for converting between different data structures.
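The last two key points can be sketched as follows, assuming a pandas version with the sparse accessor (pandas 1.0 or later). The example converts a dense Series to a sparse dtype, then converts a Series to a DataFrame and a DataFrame to a NumPy array. The data values and the column name "value" are illustrative.

```python
import pandas as pd

dense = pd.Series([0.0, 0.0, 1.5, 0.0, 2.5])

# Store only the values that differ from the fill value 0.0
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

print(sparse.sparse.npoints)   # 2, the number of explicitly stored values
print(sparse.sparse.density)   # 0.4, the fraction of values that are stored

# Converting between structures
frame = dense.to_frame(name="value")   # Series -> one-column DataFrame
array = frame.to_numpy()               # DataFrame -> NumPy array
```

The memory saving only pays off when the fill value dominates; for mostly distinct values the dense form is usually smaller.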
By understanding these different data structures, you can choose the appropriate one for
your specific data analysis tasks and work efficiently with Pandas.
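Indexing
Series Indexing:
Integer-based indexing: Access elements by their integer position.
Python
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])
value = s.iloc[2] # Access the third element by position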
Label-based indexing: Access elements by their labels.
Python
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
value = s['c'] # Access the element with the label 'c'
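The difference between positions and labels matters most when the labels are themselves integers. The sketch below uses a Series whose integer labels run in reverse order (a made-up example) to show that loc and iloc can return different elements for the same number.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[2, 1, 0])   # integer labels in reverse order

by_label = s.loc[2]       # the element whose label is 2
by_position = s.iloc[2]   # the element in the third position

print(by_label, by_position)   # 10 30
```

Using loc and iloc explicitly avoids having to remember which rule plain square brackets apply.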
DataFrame Indexing:
Row indexing: Access rows using integer-based or label-based indexing.
Python
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
row = df.iloc[1] # Access the second row by integer position
row = df.loc['y'] # Access the row with the label 'y'
Column indexing: Access columns using integer-based or label-based indexing.
Python
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
column = df['A'] # Access the column named 'A'
column = df.iloc[:, 1] # Access the second column by integer index
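A combined sketch of row, column, and single-cell access on one DataFrame. The row labels 'x', 'y', 'z' are an assumption added for the example so that label-based and position-based access can be compared.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

row_by_label = df.loc['y']       # row labeled 'y'
row_by_position = df.iloc[1]     # second row, the same row here
cell = df.loc['z', 'B']          # single value at row 'z', column 'B'
cols = df[['B', 'A']]            # several columns, in the order requested

print(row_by_label.tolist())     # [2, 5]
print(cell)                      # 6
```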
Selection
Boolean indexing: Select rows or columns based on a boolean condition.
Python
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
selected_rows = df[df['A'] > 2] # Select rows where 'A' is greater than 2
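Conditions can also be combined. In the sketch below, & means "and" and | means "or"; each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

both = df[(df['A'] > 1) & (df['B'] < 6)]       # rows meeting both conditions
either = df[(df['A'] == 1) | (df['B'] == 6)]   # rows meeting at least one condition
values_b = df.loc[df['A'] > 1, 'B']            # filter rows, then pick column 'B'

print(both.index.tolist())     # [1]
print(either.index.tolist())   # [0, 2]
print(values_b.tolist())       # [5, 6]
```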
Slicing: Extract a subset of rows or columns using slicing notation.
Python
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
sub_df = df[1:3] # Extract the rows at positions 1 and 2 (position 3 is excluded)
loc and iloc attributes:
o loc selects by label; label slices include both endpoints.
o iloc selects by integer position; slices exclude the end position.
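A minimal sketch of how slicing behaves under each attribute. The row labels 'w' to 'z' are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                  index=['w', 'x', 'y', 'z'])

by_position = df.iloc[1:3]    # positions 1 and 2; position 3 is excluded
by_label = df.loc['x':'z']    # labels 'x' through 'z'; 'z' is included

print(by_position.index.tolist())   # ['x', 'y']
print(by_label.index.tolist())      # ['x', 'y', 'z']
```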
Key points:
Indexing and selection are essential for accessing specific elements or subsets of data
in Pandas.
Use integer-based indexing for positions and label-based indexing for named
elements.
Boolean indexing and slicing provide flexible ways to select data based on conditions
or ranges.
loc and iloc attributes offer different ways to access data based on labels or integer
positions.
By understanding these concepts, you can effectively manipulate and analyze data using
Pandas.