Pandas 730pm
=====================
-->It is one of the most important and commonly used libraries in the data science domain.
-->Pandas is free and open source.
-->Pandas is built on top of NumPy.
-->It allows fast analysis, data cleaning and data preparation.
-->It is very good in terms of both performance and productivity.
-->It can work with data from a wide variety of sources, such as files, etc.
-->By using pandas we can manipulate data very easily, with very little code and in
very little time.
Note:
--------
1.NumPy is a numerical computing library used for data analysis.
2.Matplotlib is a data visualization library.
3.Pandas is both a data analysis and a data visualization library.
4.Pandas data analysis is based on NumPy, whereas its data visualization is based on
matplotlib.
website:https://pandas.pydata.org/
Latest version (at the time of writing): 2.2.3 (Sep 20, 2024)
From Doc:
pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool, built on top of the Python programming language.
How to install:
pip install pandas
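After installing, a quick way to confirm that pandas can be imported and to see which version you have is the sketch below. The `pd` alias is just the usual convention, not a requirement.

```python
# Import pandas under its conventional alias
import pandas as pd

# Print the installed version string, e.g. "2.2.3"
print(pd.__version__)
```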
Important Topics:
--------------------------
Series
DataFrames
Missing Data
GroupBy
Merging, Joining and Concatenating
Operations
Data input and output
etc....
1).Series:
--------------
-->It is one of the key data structures in pandas.
-->It is a one-dimensional labeled array, i.e. a sequence of values associated with
labels.
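A minimal sketch of a Series built from three strings, the case the following notes describe:

```python
import pandas as pd

# A list of 3 strings; pandas generates the index labels 0, 1, 2 automatically
s = pd.Series(['python', 'java', 'DS'])
print(s)
print(s.index.tolist())   # [0, 1, 2]
```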
Note:
--------
1.In a Series object created from the list ['python','java','DS'], the 3 values are associated with
index labels (0,1,2), which are generated automatically by pandas.
2.For string values, the dtype is shown as object.
3.The default index labels are integers starting from 0, but we can define labels of any other
(hashable) type as well.
4.The labels need not be unique.
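Notes 3 and 4 can be checked with a short sketch; the labels used here are arbitrary examples.

```python
import pandas as pd

# String labels instead of the default 0, 1, 2 ...; note that 'A' is repeated
s = pd.Series([10, 20, 30], index=['A', 'B', 'A'])
print(s)

# Selecting a duplicated label returns every matching row as a Series
print(s['A'].tolist())   # [10, 30]
print(s['B'])            # 20
```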
Ex:
import pandas as pd
marks_list = [70,80,90]
s = pd.Series(marks_list)
print(s)
Ex:
salaries_list = [1000.5,2000.6,3000.7]
s = pd.Series(salaries_list)
print(s)
Ex:
hetero_list = [10,'Mahesh',10.5,True]
s = pd.Series(hetero_list)
print(s)
Ex-2
-------
books_dict = {'Book-1':'Python','Book-2':'Django','Book-3':'REST_API'}
s = pd.Series(books_dict)
print(s)
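When a dict is passed, its keys become the index labels and its values become the data, in insertion order. A quick check using the same dict:

```python
import pandas as pd

books_dict = {'Book-1': 'Python', 'Book-2': 'Django', 'Book-3': 'REST_API'}
s = pd.Series(books_dict)

# Keys -> index labels, values -> data (dict insertion order is kept)
print(s.index.tolist())   # ['Book-1', 'Book-2', 'Book-3']
print(s['Book-2'])        # Django
```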
Note:
1.Index labels and values need not be homogeneous.
2.Index labels need not be unique.
Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).
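The NaN-skipping behaviour of the statistical methods can be seen in a small sketch. The `skipna` argument used at the end is the pandas flag that switches this behaviour off.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# mean() skips the NaN by default: (1.0 + 3.0) / 2
print(s.mean())              # 2.0
# With skipna=False the missing value propagates to the result
print(s.mean(skipna=False))  # nan
```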
Operations between Series (+, -, /, *, **) align values based on their
associated index values; they need not be the same length. The result
index will be the sorted union of the two indexes.
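Index alignment in arithmetic can be illustrated with two Series of different lengths; the labels and values are chosen only for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'a'])

# Values are matched by label, not by position.
# 'a' -> 1 + 20, 'b' -> 2 + 10, 'c' exists only in s1 -> NaN
result = s1 + s2
print(result)
```

Because 'c' has no partner in `s2`, its result is NaN, and the whole result is stored as float64 so that NaN can be represented.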
Parameters
----------
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. If data is a dict, argument order is
    maintained.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n - 1) if not provided. If data is dict-like
    and index is None, then the keys in the data are used as the index. If the
    index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be
    inferred from `data`.
    See the dtypes section of the pandas user guide for more usages.
name : Hashable, default None
    The name to give to the Series.
copy : bool, default False
    Copy input data. Only affects Series or 1d ndarray input. See examples.
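A minimal sketch of the `dtype` and `name` parameters; the name 'marks' is just an example.

```python
import pandas as pd

# Force float storage and attach a name to the Series
s = pd.Series([70, 80, 90], dtype='float64', name='marks')
print(s)
print(s.dtype)  # float64
print(s.name)   # marks
```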
Notes
-----
Please refer to the Series section of the pandas User Guide for more information.
Examples
--------
Constructing a Series from a dictionary with an Index specified:
If the keys of the dictionary match the Index values, the values are kept
unchanged, so the Index values have no effect.
If none of the keys match the Index values, the result is all NaN: the Index is
first built from the keys of the dictionary, and then the Series is reindexed
with the given Index values.
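The two dictionary cases described above can be sketched as follows; the dict and labels are illustrative.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Index labels match the dict keys: the values are kept
ser1 = pd.Series(data=d, index=['a', 'b', 'c'])
print(ser1.tolist())      # [1, 2, 3]

# No label matches a key: after reindexing, every value is NaN (dtype float64)
ser2 = pd.Series(data=d, index=['x', 'y', 'z'])
print(ser2.isna().all())  # True
```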
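Constructing a Series from a list with `copy=False`. Because the input is a Python
list, the Series holds its own copy of the data even though `copy=False`, so
modifying the Series leaves `r` unchanged: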
>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0    999
1      2
dtype: int64
1).Data Parameter
---------------------------
The data parameter holds the values that are to be stored inside the
Series object.
books_dict = {'Book-1':'Python',10:20,10.5:20.6,'Book-2':'DS'}
s = pd.Series(data = books_dict)
print(s)
Note:
--------
The following are valid:
import numpy as np
import pandas as pd
s = pd.Series(data = [10,20,30])
s = pd.Series(data = {0:'A',1:'B',2:'C'})
s = pd.Series(data = {'A':'Apple','B':'Ball','C':'Cat'})
s = pd.Series(data = np.array([10,20,30]))
s = pd.Series(data = 10)
s = pd.Series(data = 'Mahesh')
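For the scalar cases, the value is repeated once per index label; without an `index`, the Series has a single element. A minimal sketch:

```python
import pandas as pd

# Scalar data with no index -> one element at label 0
s1 = pd.Series(data=10)
print(len(s1))        # 1

# Scalar data with an index -> the value is broadcast to every label
s2 = pd.Series(data=10, index=['a', 'b', 'c'])
print(s2.tolist())    # [10, 10, 10]
```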
2).index parameter:
-----------------------------
-->We can use index parameter to define our own index values.
-->The values need not be unique.
-->If we do not pass index, pandas generates default index labels, which are
integers starting from 0.
-->The number of index values must be the same as the number of values in the data
parameter (unless data is a dict or a scalar).
Ex:
-----
name_list = ['Sunny','Bunny','Vinny']
s = pd.Series(data = name_list,index=['S','B','C'])
print(s)
Note:
s = pd.Series(data = name_list,index=['S','B'])
ValueError: Length of values (3) does not match length of index (2)
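The length check can be handled in code by catching the exception; a minimal sketch:

```python
import pandas as pd

name_list = ['Sunny', 'Bunny', 'Vinny']
try:
    # 3 values but only 2 index labels -> pandas raises ValueError
    s = pd.Series(data=name_list, index=['S', 'B'])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```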
If the data is a dict, only the keys that match the index labels are taken from the dict
----------------------------------------------------------------------------------------
name_dict = {'S':'Sunny','B':'Bunny','V':'Vinny','C':'Chinny'}
s = pd.Series(data = name_dict,index=['S','B',])
print(s)
Internally, the Series is first built from the dictionary keys and then reindexed with
the given index values, so only 'S' and 'B' are kept; 'V' and 'C' are dropped. An
index label that is not a key of the dict would get NaN as its value.
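The dict-and-index behaviour in this section can be checked directly; 'X' below is an extra label added only for illustration.

```python
import pandas as pd

name_dict = {'S': 'Sunny', 'B': 'Bunny', 'V': 'Vinny', 'C': 'Chinny'}

# Only keys listed in index are kept ('V' and 'C' are dropped);
# 'X' is not a key, so its value is NaN
s = pd.Series(data=name_dict, index=['S', 'B', 'X'])
print(s.index.tolist())   # ['S', 'B', 'X']
print(s.isna().tolist())  # [False, False, True]
```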