
Pandas 730pm

Pandas is a powerful, open-source data analysis and manipulation library built on Python, known for its efficiency in data cleaning and preparation. It supports various data structures, primarily Series and DataFrames, and allows easy data manipulation with minimal code. The library is built on top of Numpy and integrates well with Matplotlib for data visualization.

Uploaded by

kumargpc7

Introduction to Pandas:
=====================
-->It is one of the most important and commonly used libraries in the data science domain.
-->Pandas is free and open source.
-->Pandas is built on top of NumPy.
-->It allows fast analysis, data cleaning and preparation.
-->Performance-wise and productivity-wise, pandas is a good choice.
-->It can work with data from a wide variety of sources such as files.
-->By using pandas we can manipulate data very easily, with very little code and in
very little time.

Note:
--------
1.NumPy is a numerical computing (array) library used for data analysis.
2.Matplotlib is a data visualization library.
3.Pandas is both a data analysis and a data visualization library.
4.Pandas data analysis is based on NumPy, whereas its data visualization is based on
Matplotlib.

website:https://pandas.pydata.org/
Latest version: 2.2.3(Sep 20, 2024)

From Doc:
pandas is a fast, powerful, flexible and easy to use open source data
analysis and manipulation tool, built on top of the Python programming language.

How to install:
pip install pandas

How to check installation:
>>> import pandas as pd
>>> pd.__version__ # shows the installed version, e.g. '2.0.3'

Important Topics:
--------------------------
Series
DataFrames
Missing Data
GroupBy
Merging,Joining and Concatenating
Operations
Data input and output
etc....

1).Series:
--------------
-->It is one of the key data structures in pandas.
-->It is a one-dimensional labeled array, i.e. a sequence of values associated with
labels.

Creation of Series from a Python list:
------------------------------------------------------
import pandas as pd
books_list = ['Python','Java','DataScience']
s = pd.Series(books_list)
print(type(s))
print(s)

Note:
--------
1.In the above Series object, we have 3 values (Python, Java, DataScience) associated with
index labels (0,1,2), which are generated automatically by pandas.
2.For string values, the dtype is shown as object (in pandas 2.x).
3.The default index labels are integers starting from 0, but we can define labels of any
other type as well.
4.The labels need not be unique.
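
Ex:
A minimal sketch of notes 3 and 4 above. The labels 'a','b','a' are made up for illustration and are not part of the original example.

```python
import pandas as pd

books_list = ['Python', 'Java', 'DataScience']

# Default labels: pandas generates 0, 1, 2 automatically.
s_default = pd.Series(books_list)
print(list(s_default.index))      # [0, 1, 2]

# Custom string labels; the label 'a' is deliberately repeated.
s_custom = pd.Series(books_list, index=['a', 'b', 'a'])
print(s_custom.index.is_unique)   # False
print(s_custom['b'])              # Java
```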

Ex:
marks_list = [70,80,90]
s = pd.Series(marks_list)
print(s)

Ex:
salaries_list = [1000.5,2000.6,3000.7]
s = pd.Series(salaries_list)
print(s)

Ex:
hetro_list = [10,'Mahesh',10.5,True]
s = pd.Series(hetro_list)
print(s)

-->The values in a Series can be of any type, even heterogeneous.
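
Ex:
The three lists used above can be compared side by side to see which dtype pandas infers. This is only a sketch; the variable names int_s, float_s and mixed_s are chosen here for illustration.

```python
import pandas as pd

int_s = pd.Series([70, 80, 90])
float_s = pd.Series([1000.5, 2000.6, 3000.7])
mixed_s = pd.Series([10, 'Mahesh', 10.5, True])

print(int_s.dtype)     # int64
print(float_s.dtype)   # float64
print(mixed_s.dtype)   # object

# In the object Series each element keeps its own Python type.
print([type(v).__name__ for v in mixed_s])
```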

Creation of Series from a Python dict:
------------------------------------------------------
Ex-1
-------
books_dict = {0:'Python',1:'Django',2:'REST_API'}
s = pd.Series(books_dict)
print(s)

Ex-2
-------
books_dict = {'Book-1':'Python','Book-2':'Django','Book-3':'REST_API'}
s = pd.Series(books_dict)
print(s)

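Note:
--------
1.Index labels and values need not be homogeneous.
2.Index labels of a Series need not be unique in general. A Python dict cannot hold
duplicate keys, so a Series built from a dict alone always has unique labels.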
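
Ex:
A short sketch showing that the dict keys become the index labels and the dict values become the Series values. It reuses books_dict from Ex-2.

```python
import pandas as pd

books_dict = {'Book-1': 'Python', 'Book-2': 'Django', 'Book-3': 'REST_API'}
s = pd.Series(books_dict)

print(list(s.index))   # ['Book-1', 'Book-2', 'Book-3']
print(s.tolist())      # ['Python', 'Django', 'REST_API']
print(s['Book-2'])     # Django
```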

From the source code of pandas:
-------------------------------------------
# Series class

# error: Cannot override final attribute "ndim" (previously declared in base
# class "NDFrame")
# error: Cannot override final attribute "size" (previously declared in base
# class "NDFrame")
# definition in base class "NDFrame"
class Series(base.IndexOpsMixin, NDFrame): # type: ignore[misc]
"""
One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).

Operations between Series (+, -, /, \\*, \\*\\*) align values based on their
associated index values-- they need not be the same length. The result
index will be the sorted union of the two indexes.

Parameters
----------
data : array-like, Iterable, dict, or scalar value
Contains data stored in Series. If data is a dict, argument order is
maintained.
index : array-like or Index (1d)
Values must be hashable and have the same length as `data`.
Non-unique index values are allowed. Will default to
RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
and index is None, then the keys in the data are used as the index. If the
index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
Data type for the output Series. If not specified, this will be
inferred from `data`.
See the :ref:`user guide <basics.dtypes>` for more usages.
name : Hashable, default None
The name to give to the Series.
copy : bool, default False
Copy input data. Only affects Series or 1d ndarray input. See examples.

Notes
-----
Please reference the :ref:`User Guide <basics.series>` for more information.

Examples
--------
Constructing Series from a dictionary with an Index specified

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a 1
b 2
c 3
dtype: int64

The keys of the dictionary match with the Index values, hence the Index
values have no effect.

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x NaN
y NaN
z NaN
dtype: float64

Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.

Constructing Series from a list with `copy=False`.

>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0 999
1 2
dtype: int64

Due to input data type the Series has a `copy` of
the original data even though `copy=False`, so
the data is unchanged.

Constructing Series from a 1d ndarray with `copy=False`.

>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999, 2])
>>> ser
0 999
1 2
dtype: int64

Due to input data type the Series has a `view` on
the original data, so
the data is changed as well.
"""
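
Ex:
The docstring above states that operations between Series align values on their index labels and that the result index is the sorted union of both indexes. A minimal sketch of that behaviour follows. The Series s1 and s2 and their labels are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['b', 'a', 'c'])
s2 = pd.Series([10, 20], index=['a', 'd'])

# Only label 'a' exists in both Series, so only 'a' gets a real value.
# Labels present on one side only become NaN.
total = s1 + s2
print(list(total.index))   # ['a', 'b', 'c', 'd']
print(total['a'])          # 12.0

# Statistical methods skip the missing values (NaN) by default.
print(total.sum())         # 12.0
```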

The 5 parameters of the Series constructor:
-----------------------------------------------------------
1.data parameter
2.index parameter
3.dtype parameter
4.name parameter
5.copy parameter
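
Ex:
A sketch that passes all five parameters in a single call. The labels, the name 'marks' and the choice of float64 are assumptions made only for this illustration.

```python
import pandas as pd

s = pd.Series(
    data=[70, 80, 90],                  # values to store
    index=['Sunny', 'Bunny', 'Vinny'],  # one label per value
    dtype='float64',                    # force floats instead of inferred int64
    name='marks',                       # name of the Series
    copy=False,                         # only affects Series or 1d ndarray input
)
print(s)
print(s.name)    # marks
print(s.dtype)   # float64
```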

1).data parameter:
---------------------------
The data parameter holds the data to be stored inside the
Series object.

books_dict = {'Book-1':'Python',10:20,10.5:20.6,'Book-2':'DS'}
s = pd.Series(data = books_dict)
print(s)

Note:
--------
The following are valid (np refers to NumPy, i.e. import numpy as np):
s = pd.Series(data = [10,20,30])
s = pd.Series(data = {0:'A',1:'B',2:'C'})
s = pd.Series(data = {'A':'Apple','B':'Ball','C':'Cat'})
s = pd.Series(data = np.array([10,20,30]))
s = pd.Series(data = 10)
s = pd.Series(data = 'Mahesh')
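
Ex:
The last two lines above pass a scalar as data. A small sketch of what that produces: a scalar gives a one-element Series, a string is treated as one value (not as a sequence of characters), and a scalar combined with an index is repeated for every label. The labels 'a','b','c' are made up for illustration.

```python
import pandas as pd

s_one = pd.Series(data=10)
print(len(s_one))       # 1

s_name = pd.Series(data='Mahesh')
print(len(s_name))      # 1

# With an index, the scalar is repeated once per label.
s_rep = pd.Series(data=10, index=['a', 'b', 'c'])
print(s_rep.tolist())   # [10, 10, 10]
```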

2).index parameter:
-----------------------------
-->We can use the index parameter to define our own index labels.
-->The labels need not be unique.
-->If we do not pass index, pandas generates default index labels, which
are integers starting from 0.
-->When data is a list or array, the number of index labels must be the same as the
number of values in data.

Ex:
-----
name_list = ['Sunny','Bunny','Vinny']
s = pd.Series(data = name_list,index=['S','B','C'])
print(s)

Note:
s = pd.Series(data = name_list,index=['S','B'])
ValueError: Length of values (3) does not match length of index (2)

Duplicate index labels are possible:
----------------------------------------------
name_list = ['Sunny','Bunny','Vinny','Binny']
s = pd.Series(data = name_list,index=['S','B','C','B'])
print(s)
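
Ex:
A sketch of what happens when a duplicated label is used for lookup. It reuses the name_list example above: a unique label returns a single value, while a repeated label returns every matching value as a Series.

```python
import pandas as pd

name_list = ['Sunny', 'Bunny', 'Vinny', 'Binny']
s = pd.Series(data=name_list, index=['S', 'B', 'C', 'B'])

print(s['S'])            # Sunny
print(s['B'].tolist())   # ['Bunny', 'Binny']
```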

If the data is a dict, only the keys that match the given index are taken from the dict:
-----------------------------------------------------------------------------------
name_dict = {'S':'Sunny','B':'Bunny','V':'Vinny','C':'Chinny'}
s = pd.Series(data = name_dict,index=['S','B'])
print(s)

Ex: From the pandas source code
-------------------------------------------
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x NaN
y NaN
z NaN
dtype: float64

Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.
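
Ex:
A sketch of the in-between case, where only some of the index labels match the dict keys. The label 'x' is made up for illustration. Matching labels keep their values, the unmatched label becomes NaN, and the dtype changes to float64 because NaN is a float.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'x', 'c'])
print(ser)

print(ser['a'])            # 1.0
print(pd.isna(ser['x']))   # True
print(ser.dtype)           # float64
```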
