M3-Introduction to Numpy and Pandas
Introduction
NumPy: Numerical Python
Python is a general-purpose programming language that provides a set of modules for different software applications, such as NumPy, pandas, and others.
It is a key language and platform for data analysis (DA) and a useful programming language for AI.
NumPy:
Data can be stored in different formats. Data organized along one or more dimensions are stored as arrays.
Because Python is a dynamically typed language, fixed-type (static) arrays are provided by the NumPy module.
Understanding Data Types in Python
This section looks at how arrays of data are handled in Python and how NumPy improves on this.
As an example, consider summing the integers 0 to 99. In C:
/* C code */
int result = 0;
for(int i=0; i<100;i++){
result +=i;
}
In Python, the equivalent operation could be written this way:
# Python code
result = 0
for i in range(100):
    result += i
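NumPy lets you express the same sum without an explicit Python loop. The sketch below assumes NumPy is installed and imported under the usual alias `np`. It compares the loop above with `np.arange(...).sum()`.

```python
import numpy as np

# Pure-Python loop, as above
result = 0
for i in range(100):
    result += i

# NumPy: build the array 0..99 once, then sum it in compiled code
arr = np.arange(100)
np_result = int(arr.sum())

print(result, np_result)  # 4950 4950
```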
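In C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means a Python variable can be reassigned to a value of a different type, while the same assignment fails in C: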
# Python code
x = 4
x = "four"
/* C code */
int x = 4;
x = "four"; // FAIL
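A NumPy array behaves much like the C variable here, because it has one fixed element type. This is a minimal sketch that assumes NumPy is available. It shows a Python name being rebound to a new type, and then an integer array rejecting a string value.

```python
import numpy as np

# A Python name can be rebound to an object of a different type
x = 4
print(type(x))   # <class 'int'>
x = "four"
print(type(x))   # <class 'str'>

# A NumPy array, like a C variable, has one fixed element type
arr = np.array([1, 2, 3])
try:
    arr[0] = "four"   # "four" cannot be converted to an integer
    rejected = False
except ValueError:
    rejected = True
print(arr.dtype.kind, rejected)  # i True
```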
A Python List
The list is the standard mutable, multi-element container in Python; it can hold many Python objects.
# list of integers:
In[1]: L = list(range(10))
L
Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In[2]: type(L[0])
Out[2]: int
# list of strings:
In[3]: L2 = [str(c) for c in L]
L2
Out[3]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
In[4]: type(L2[0])
Out[4]: str
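In a Python list, every element carries its own type information. A NumPy array stores elements of a single type instead. The sketch below assumes NumPy is available. The mixed list `L3` is a hypothetical example, and the code contrasts it with arrays built by `np.array`.

```python
import numpy as np

# A Python list may mix types; each element carries its own type
L3 = [True, "2", 3.0, 4]
types = [type(item) for item in L3]
print(types)  # [<class 'bool'>, <class 'str'>, <class 'float'>, <class 'int'>]

# np.array builds a fixed-type array from a list
ints = np.array([1, 4, 2, 5, 3])
# If the types do not match, NumPy upcasts where possible (here, to float)
floats = np.array([3.14, 4, 2, 3])
print(ints.dtype.kind, floats.dtype)  # i float64
```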
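Every NumPy array has attributes that describe its memory use. itemsize is the size of each element in bytes, and nbytes is the total size of the array in bytes. For an array x3 with 60 elements of 8 bytes each:
In[5]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
itemsize: 8 bytes
nbytes: 480 bytes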
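The notes do not show how x3 is defined. The sketch below assumes it is a 3 x 4 x 5 array of random 64-bit integers, which matches the 60 elements of 8 bytes shown above. It sets `dtype=np.int64` explicitly so each element is 8 bytes on every platform.

```python
import numpy as np

# Assumption: x3 is a 3 x 4 x 5 integer array (its original definition is not shown)
rng = np.random.default_rng(seed=0)
x3 = rng.integers(10, size=(3, 4, 5), dtype=np.int64)

print("ndim:", x3.ndim)                   # 3
print("shape:", x3.shape)                 # (3, 4, 5)
print("size:", x3.size)                   # 60 elements
print("itemsize:", x3.itemsize, "bytes")  # 8 bytes
print("nbytes:", x3.nbytes, "bytes")      # 480 bytes = 60 * 8
```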
Array Indexing: Accessing Single Elements
In a one-dimensional array, we can access the ith value (counting from zero) by specifying the desired index in square brackets:
x1          # display the array
x1[0]       # first element
x1[4]       # fifth element
To index from the end of the array, use negative indices:
x1[-1]      # last element
x1[-2]      # second-to-last element
In a multidimensional array, access items using a comma-separated tuple of indices:
x2
x2[0, 0]
x2[2, 0]
x2[2, -1]
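We can also modify values using any of the above index notation:
x2[0, 0] = 12
x2
Unlike Python lists, NumPy arrays have a fixed type. If you insert a floating-point value into an integer array, the value is silently truncated:
x1[0] = 3.14159  # this will be truncated!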
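The notes do not define x1 and x2, so the sketch below uses hypothetical integer arrays as stand-ins. It walks through positive, negative and multidimensional indexing, and then shows the truncation described above.

```python
import numpy as np

# Hypothetical example arrays (x1 and x2 are not defined in the notes)
x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x1[0], x1[4])                   # 5 7
print(x1[-1], x1[-2])                 # 9 7
print(x2[0, 0], x2[2, 0], x2[2, -1])  # 3 1 7

x2[0, 0] = 12        # modify a value in place
x1[0] = 3.14159      # the float is truncated to fit the integer array
print(x1[0])         # 3
```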
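Array Slicing: Accessing Subarrays
Subarrays are accessed with slice notation, which is marked by the colon (:) character.
The NumPy slicing syntax is:
x[start:stop:step]
If any of these values are unspecified, they default to start=0, stop=size of the dimension, and step=1. The element at index stop is not included.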
One-dimensional subarrays:
import numpy as np
x = np.arange(10)
x[:5]     # first five elements
x[5:]     # elements from index 5 onward
x[4:7]    # middle subarray
x[::2]    # every other element
x[1::2]   # every other element, starting at index 1
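A potentially confusing case is when the step value is negative. Here the defaults for start and stop are swapped, which makes a negative step a convenient way to reverse an array:
x[::-1]   # all elements, reversed
x[5::-2]  # every other element, going backward from index 5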
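The slices above can be checked directly against `np.arange(10)`. The comments in this sketch give the array each slice produces.

```python
import numpy as np

x = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]
first_five = x[:5]         # [0 1 2 3 4]
from_five = x[5:]          # [5 6 7 8 9]; index 5 is included
middle = x[4:7]            # [4 5 6]
evens = x[::2]             # [0 2 4 6 8]
odds = x[1::2]             # [1 3 5 7 9]
reversed_all = x[::-1]     # [9 8 7 6 5 4 3 2 1 0]
rev_from_five = x[5::-2]   # [5 3 1]
```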
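Multidimensional subarrays
Multidimensional slices work the same way, with one slice per axis separated by commas:
x2
x2[:2, :3]      # two rows, three columns
x2[:3, ::2]     # all rows, every other column
x2[::-1, ::-1]  # rows and columns both reversed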
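The sketch below applies the same three slices to a hypothetical 3 x 4 array, so each result can be read off directly.

```python
import numpy as np

# Hypothetical 3 x 4 array used to illustrate the slices above
x2 = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

sub = x2[:2, :3]           # [[0 1 2], [4 5 6]]
every_other = x2[:3, ::2]  # [[0 2], [4 6], [8 10]]
rev = x2[::-1, ::-1]       # [[11 10 9 8], [7 6 5 4], [3 2 1 0]]
print(sub.shape, every_other.shape, rev[0, 0])  # (2, 3) (3, 2) 11
```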
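Concatenation of Arrays
np.concatenate takes a sequence of arrays as its first argument and joins them:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
array([1, 2, 3, 3, 2, 1])
You can also concatenate more than two arrays at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
[ 1  2  3  3  2  1 99 99 99]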
np.concatenate also works on two-dimensional arrays:
grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid])          # along the first axis (axis=0)
np.concatenate([grid, grid], axis=1)  # along the second axis (axes are zero-indexed)
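For arrays of mixed dimensions, np.vstack (vertical stack) and np.hstack (horizontal stack) can be clearer:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7], [6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])
# horizontally stack the arrays
y = np.array([[99], [99]])
np.hstack([grid, y])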
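Checking the shape of each result is a quick way to see which axis each call joins along. This sketch reuses the arrays above, renaming the second grid to `grid2` so that both grids can exist side by side.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
rows = np.concatenate([grid, grid])          # along axis 0: shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)  # along axis 1: shape (2, 6)

x = np.array([1, 2, 3])
grid2 = np.array([[9, 8, 7],
                  [6, 5, 4]])
v = np.vstack([x, grid2])   # x becomes the first row: shape (3, 3)

y = np.array([[99],
              [99]])
h = np.hstack([grid2, y])   # y becomes the last column: shape (2, 4)

print(rows.shape, cols.shape, v.shape, h.shape)  # (4, 3) (2, 6) (3, 3) (2, 4)
```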
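Splitting of arrays:
The opposite of concatenation is splitting, which is implemented by np.split, np.hsplit, and np.vsplit. Each function takes a list of indices that give the split points, and N split points produce N + 1 subarrays. For example, one call to np.split produced this output:
[1 2 3] [99 99] [3 2 1]
np.vsplit and np.hsplit split a two-dimensional array along its rows and columns, respectively:
grid = np.arange(16).reshape((4, 4))
grid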
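The notes show the output of np.split but not its input. The sketch below assumes the input was the eight-element array `[1, 2, 3, 99, 99, 3, 2, 1]`, split at indices 3 and 5, which reproduces that output. It then splits the 4 x 4 grid by rows and by columns.

```python
import numpy as np

# Assumption: the input is not shown in the notes; this one reproduces the output
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])   # 2 split points give 3 subarrays
print(x1, x2, x3)                  # [1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])   # split rows: two (2, 4) arrays
left, right = np.hsplit(grid, [2])    # split columns: two (4, 2) arrays
print(upper.shape, left.shape)        # (2, 4) (4, 2)
```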
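Pandas Series
import numpy as np
import pandas as pd
obj = pd.Series([4, 7, -5, 3])
obj          # display the Series
obj.values   # the data as a NumPy array
obj.index    # the index labels (here the default integer labels 0 to 3)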
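'b' in obj2   # True if 'b' is one of obj2's index labels (obj2 must have string labels; its definition is not shown)
A Series can also be created from a Python dict. The dict keys become the index:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)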
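The notes never define obj2 before the `'b' in obj2` test. The sketch below introduces a hypothetical labeled Series so that the test makes sense. It also shows that `in` checks index labels, not values, and that the dict keys in sdata become the index of obj3.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
print(obj.values.tolist())   # [4, 7, -5, 3]
print(list(obj.index))       # [0, 1, 2, 3]

# Hypothetical obj2 with string labels (its original definition is not shown)
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print('b' in obj2, 'e' in obj2)   # True False

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)           # dict keys become the index
print(obj3['Texas'])              # 71000
```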
Index objects
index = pd.Index(np.arange(3))
obj2 = pd.Series([1.5, -2.5, 0], index=index)
obj2.index is index   # checks whether the Series reuses the same Index object
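Index objects are immutable, which makes it safe for pandas objects to share them. This sketch shows that an attempt to assign into an Index raises a TypeError.

```python
import numpy as np
import pandas as pd

index = pd.Index(np.arange(3))
obj2 = pd.Series([1.5, -2.5, 0], index=index)

# Index objects cannot be modified in place
try:
    index[1] = 10
    immutable = False
except TypeError:
    immutable = True
print(immutable)   # True
```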
Class            Description
Index            Index object, representing axis labels in a NumPy array of Python objects.
Int64Index       Specialized Index for integer values (removed in pandas 2.0, where a plain Index with int64 dtype is used instead).
MultiIndex       "Hierarchical" index object representing multiple levels of indexing on a single axis.
DatetimeIndex    Stores nanosecond timestamps.
PeriodIndex      Specialized Index for Period data.
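Some of the specialized Index classes in the table are usually created through helper functions rather than directly. The sketch below uses `pd.date_range`, `pd.period_range` and `pd.MultiIndex.from_tuples`. The dates and labels are arbitrary examples.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=3, freq='D')
periods = pd.period_range('2024-01', periods=3, freq='M')
multi = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])

print(type(dates).__name__)    # DatetimeIndex
print(type(periods).__name__)  # PeriodIndex
print(multi.nlevels)           # 2
```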
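Reindexing
The reindex method conforms an object to a new index. For a DataFrame, the columns can be reindexed with the columns keyword:
frame.reindex(columns=states)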
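The notes do not define frame or states. The sketch below uses hypothetical versions of both. Reindexing the columns reorders them to match states, drops columns that are not listed, and fills any new column with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame and states (neither is defined in the notes)
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# 'Ohio' is dropped; 'Utah' did not exist, so it is filled with NaN
result = frame.reindex(columns=states)
print(result)
```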
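Indexing, selection, and filtering
For a Series obj with index labels such as 'a', 'b', 'c', 'd':
obj['b']              # select by label
obj[1]                # select by position (deprecated for labeled indexes in recent pandas; prefer obj.iloc[1])
obj[2:4]              # slice by position
obj[['b', 'a', 'd']]  # select several labels
obj[[1, 3]]           # select several positions (prefer obj.iloc[[1, 3]])
obj[obj < 2]          # boolean filtering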
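The sketch below runs each selection on a hypothetical labeled Series. It uses `.iloc` wherever an integer position is meant, which avoids the deprecated positional fallback of `[]`.

```python
import numpy as np
import pandas as pd

# Hypothetical labeled Series for the selections above
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

by_label = obj['b']               # 1.0
by_position = obj.iloc[1]         # 1.0
pos_slice = obj.iloc[2:4]         # c: 2.0, d: 3.0
by_labels = obj[['b', 'a', 'd']]  # b: 1.0, a: 0.0, d: 3.0
by_positions = obj.iloc[[1, 3]]   # b: 1.0, d: 3.0
filtered = obj[obj < 2]           # a: 0.0, b: 1.0
```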
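Slicing
Slicing with labels differs from normal Python slicing in that the endpoint is inclusive:
obj['b':'c']       # includes both 'b' and 'c'
obj['b':'c'] = 5   # assign 5 to every element in the slice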
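Using the same hypothetical Series, the sketch below compares an inclusive label slice with an exclusive position slice. It then assigns to the label slice, which changes the original Series.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based slicing includes BOTH endpoints; positional slicing does not
print(obj['b':'c'].tolist())   # [1.0, 2.0]
print(obj.iloc[1:2].tolist())  # [1.0]

# Assigning to a label slice modifies the Series in place
obj['b':'c'] = 5
print(obj.tolist())            # [0.0, 5.0, 5.0, 3.0]
```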
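Indexing a DataFrame
data[:2]                  # select the first two rows
data[data['three'] > 5]   # select rows where column 'three' is greater than 5
data < 5                  # boolean DataFrame
data[data < 5] = 0        # set every value below 5 to 0
The .ix indexer from older pandas versions was removed in pandas 1.0. Label-based selection now uses .loc, and position-based selection uses .iloc:
data.loc['Colorado', ['two', 'three']]
data.loc[['Colorado', 'Utah'], data.columns[[3, 0, 1]]]
data.iloc[2]
data.loc[:'Utah', 'two']
data.loc[data['three'] > 5, data.columns[:3]]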
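The notes never show how data is built. The sketch below assumes a hypothetical 4 x 4 DataFrame whose row and column labels match the selections above, and it runs each one with `.loc` and `.iloc`. It does the zeroing step before the `.loc` calls, so those results reflect the zeroed values.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame matching the labels used above
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

first_two = data[:2]                  # rows Ohio and Colorado
big_three = data[data['three'] > 5]   # rows Colorado, Utah, New York
data[data < 5] = 0                    # zero every value below 5

a = data.loc['Colorado', ['two', 'three']]                   # 5, 6
b = data.loc[['Colorado', 'Utah'], data.columns[[3, 0, 1]]]  # columns four, one, two
c = data.iloc[2]                                             # the Utah row
d = data.loc[:'Utah', 'two']                                 # Ohio, Colorado, Utah
e = data.loc[data['three'] > 5, data.columns[:3]]            # columns one, two, three
```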
Type                              Notes
obj[val]                          Select single column or sequence of columns from the DataFrame. Special cases: boolean array (filter rows), slice (slice rows), or boolean DataFrame.
obj.loc[val]                      Select single row or subset of rows from the DataFrame by label.
obj.loc[:, val]                   Select single column or subset of columns by label.
obj.loc[val1, val2]               Select both rows and columns by label.
obj.iloc[...]                     Same selections as .loc, but by integer position.
reindex method                    Conform one or more axes to new indexes.
xs method                         Select single row or column as a Series by label.
obj.iloc[:, j], obj.iloc[i]       Select single column or row, respectively, as a Series by integer location (replaces the removed icol and irow methods).
obj.at[row, col], obj.iat[i, j]   Select or set a single value by label or by integer position (replaces the removed get_value and set_value methods).
Aligning, mapping, and sorting data in pandas
Data alignment
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=['a', 'b', 'c'], index=['SA', 'VIC', 'NSW'])
df1
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=['a', 'b', 'e'], index=['SA', 'VIC', 'NSW', 'ACT'])
df2
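Adding DataFrames
When two DataFrames are added, pandas aligns them on both row and column labels. The result contains the union of the labels, and any position where a label is missing from either DataFrame becomes NaN:
df1 + df2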
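The sketch below checks a few cells of `df1 + df2`. It also shows `df1.add(df2, fill_value=0)`, which treats a label that is missing on only one side as 0. A cell whose row and column labels are missing from both frames still becomes NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3),
                   columns=['a', 'b', 'c'], index=['SA', 'VIC', 'NSW'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   columns=['a', 'b', 'e'], index=['SA', 'VIC', 'NSW', 'ACT'])

total = df1 + df2                     # labels are aligned before adding
filled = df1.add(df2, fill_value=0)   # one-sided missing values count as 0

print(total.loc['NSW', 'b'])    # 14.0 (7 + 7)
print(total.loc['ACT', 'a'])    # nan: 'ACT' exists only in df2
print(filled.loc['ACT', 'a'])   # 9.0
```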
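Mapping functions
apply calls a function on each element of a column. Here df_states is a DataFrame with a 'state' column (its definition is not shown):
f = lambda x: x.upper()
df_states['state'] = df_states['state'].apply(f)
df_states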
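The notes do not show the contents of df_states, so the sketch below uses a hypothetical single-column version. It applies the lambda to each element and then compares the result with the vectorized `.str.upper()` string method.

```python
import pandas as pd

# Hypothetical df_states (its real contents are not shown in the notes)
df_states = pd.DataFrame({'state': ['sa', 'vic', 'nsw']})

f = lambda x: x.upper()
df_states['state'] = df_states['state'].apply(f)   # apply f to each element
print(df_states['state'].tolist())                 # ['SA', 'VIC', 'NSW']

# The vectorized string method gives the same result without a lambda
lower = pd.Series(['sa', 'vic', 'nsw'])
print(lower.str.upper().tolist())                  # ['SA', 'VIC', 'NSW']
```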
Sorting and ranking
df_states.sort_index()         # sort rows by index label
df_states.sort_index(axis=1)   # sort columns by column label
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj.sort_index()               # returns a new Series ordered a, b, c, d
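The sketch below covers both halves of this section. It sorts by index and by value, sorts the rows and columns of a hypothetical DataFrame, and ranks a Series. With the default method, tied values receive the mean of their ranks.

```python
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
by_index = obj.sort_index()    # a: 1, b: 2, c: 3, d: 0
by_value = obj.sort_values()   # d: 0, a: 1, b: 2, c: 3

# Hypothetical DataFrame to show sorting rows vs columns
frame = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7]],
                     index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
rows_sorted = frame.sort_index()         # rows: one, three
cols_sorted = frame.sort_index(axis=1)   # columns: a, b, c, d

# Ranking: ties receive the mean of their ranks by default
ranks = pd.Series([7, -5, 7, 4, 2, 0, 4]).rank()
print(ranks.tolist())   # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
```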