0. Introduction to Python for Data Analysis

This document outlines the Summer 2023 course Python for Data Analysis, taught by Chaofan Sun. It covers Jupyter Notebook, Python basics, and libraries such as NumPy, Pandas, and SciKit-Learn. The course combines review units and exams with a focus on data wrangling, visualization, and statistical analysis. It also lists the course textbook and online resources for further learning.


Python for Data Analysis

CSC 430_530_DA_401_501

Summer 2023

Chaofan Sun (sunc@cua.edu)


Textbook
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Edition (Kindle Edition), by Wes McKinney
• ISBN-13: 978-1491957660
• ISBN-10: 1491957662

A PDF version is available on Blackboard.

Outline of this class
1. Review (Chapters 1-4): three or four weeks, then Exam 1
• Jupyter Notebook
• Python (list, tuple, dictionary, loop, and if-elif-else)
• NumPy

2. Pandas (Chapters 5-8 and 10): six weeks, then Exam 2
• Data extraction, parsing, joining, standardizing, and cleaning
• Data consolidation and filtering
• Statistics
• Etc.

3. Visualization (Chapter 9 and online resources): two weeks, no exam
• pandas DataFrame plotting
• matplotlib
• Seaborn

Python Libraries for Data Science
Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn
• Keras

Visualization libraries
• matplotlib
• Seaborn

and many more …

Python Libraries for Data Science

NumPy:
• introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
• provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
• many other Python libraries are built on NumPy

Link: http://www.numpy.org/
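To make the vectorization point concrete, here is a minimal sketch, assuming NumPy is installed; the array values are made up for illustration. It applies arithmetic and statistics to a whole array at once and checks the result against an equivalent explicit Python loop.

```python
import numpy as np

# A 2-D array (matrix) of made-up measurements: 3 rows x 4 columns
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [9.0, 10.0, 11.0, 12.0]])

# Vectorized arithmetic: applied element-wise, no explicit Python loop
scaled = data * 2 + 1

# The same computation written as an explicit loop, for comparison
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * 2 + 1
assert np.array_equal(scaled, looped)

# Statistical operations along an axis
col_means = data.mean(axis=0)  # mean of each column
row_std = data.std(axis=1)     # standard deviation of each row

# Matrix operation: (3x4) times its transpose (4x3) gives a 3x3 matrix
gram = data @ data.T

print(col_means)   # [5. 6. 7. 8.]
print(gram.shape)  # (3, 3)
```

On large arrays the vectorized form is typically much faster than the loop, because the element-wise work runs in NumPy's compiled code rather than in the Python interpreter.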

Python Libraries for Data Science

SciPy:
• collection of algorithms for linear algebra, differential equations, numerical integration, optimization, statistics, and more
• part of the SciPy Stack
• built on NumPy

Link: https://www.scipy.org/scipylib/
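A minimal sketch touching each area listed above, assuming SciPy and NumPy are installed. The equations, function, and sample values are made-up examples chosen so the exact answers are easy to check.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)  # exact answer: x = 2, y = 3

# Differential equation: dy/dt = -y with y(0) = 1, solved for t in [0, 1]
ode = integrate.solve_ivp(lambda t, y: -y, (0.0, 1.0), [1.0])
y_at_1 = ode.y[0, -1]  # should be close to exp(-1)

# Numerical integration: integral of sin(x) from 0 to pi (exact value is 2)
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize f(x) = (x - 3)^2 + 1, whose minimum is at x = 3
opt = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

# Statistics: one-sample t-test of whether the sample mean differs from 5.0
sample = np.array([4.8, 5.1, 5.3, 4.9, 5.2, 5.0])
ttest = stats.ttest_1samp(sample, popmean=5.0)

print(solution, y_at_1, area, opt.x, ttest.pvalue)
```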
Python Libraries for Data Science
Pandas:
• adds data structures and tools designed to work with table-like data (Series and DataFrames, similar to data frames in R)
• provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation, etc.
• allows handling of missing data

Link: http://pandas.pydata.org/
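The manipulation tools listed above can be sketched on a tiny table. This assumes pandas and NumPy are installed; the `sales` and `prices` tables are hypothetical data made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table-like data with one missing value (NaN)
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, 7, np.nan, 4, 6],
})

# Missing data: replace the NaN in "units" with 0
sales["units"] = sales["units"].fillna(0)

# Slicing: keep only the rows for the East region
east = sales[sales["region"] == "East"]

# Aggregation: total units per region
totals = sales.groupby("region")["units"].sum()

# Merging: attach a (hypothetical) price table on the shared "product" column
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.5, 4.0]})
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Sorting: highest revenue first
merged = merged.sort_values("revenue", ascending=False)

# Reshaping: region x product table of total units
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")

print(totals)
print(pivot)
```

Filling with 0 is only one choice; `dropna()` would instead remove the incomplete row, which changes the totals.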

Python Libraries for Data Science

SciKit-Learn:
• provides machine learning algorithms: classification, regression, clustering, model validation, etc.
• built on NumPy, SciPy, and matplotlib

Link: http://scikit-learn.org/
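A minimal sketch of classification, clustering, and model validation, assuming scikit-learn is installed. It uses the small iris dataset bundled with the library; the choice of k-nearest neighbors and k-means is just one illustrative option among many.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled dataset: 150 flowers, 4 numeric features, 3 species labels
X, y = load_iris(return_X_y=True)

# Classification: hold out 25% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)

# Model validation: 5-fold cross-validation on the full dataset
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

# Clustering: group the samples into 3 clusters without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

print(f"test accuracy: {test_acc:.3f}")
print(f"cross-validation mean: {cv_scores.mean():.3f}")
```

Every estimator follows the same `fit` / `predict` pattern, which is why swapping one algorithm for another usually changes only a single line.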

Python Libraries for Data Science

matplotlib:
• Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats
• a set of functionalities similar to those of MATLAB
• line plots, scatter plots, bar charts, histograms, pie charts, etc.
• relatively low-level; some effort is needed to create advanced visualizations

Link: https://matplotlib.org/
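A minimal sketch of a few of the plot types listed above, assuming matplotlib and NumPy are installed. The data is synthetic, and the output file name `basic_plots` is a hypothetical choice; the non-interactive "Agg" backend is selected so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)  # seeded random data for reproducibility

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Line plot with two series and a legend
axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].plot(x, np.cos(x), label="cos(x)", linestyle="--")
axes[0].set_title("Line plot")
axes[0].legend()

# Scatter plot of random points
axes[1].scatter(rng.normal(size=50), rng.normal(size=50), s=15)
axes[1].set_title("Scatter plot")

# Histogram of 1000 normally distributed samples
axes[2].hist(rng.normal(size=1000), bins=30)
axes[2].set_title("Histogram")

fig.tight_layout()

# The same figure saved in a raster (PNG) and a vector (PDF) format
outputs = [Path("basic_plots.png"), Path("basic_plots.pdf")]
for path in outputs:
    fig.savefig(path, dpi=150)
plt.close(fig)
```

The explicit `fig, axes` style shown here takes a little more typing than calling `plt.plot` directly, which reflects the "relatively low-level" point above, but it gives full control over each panel.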
Python Libraries for Data Science

Seaborn:
• based on matplotlib
• provides a high-level interface for drawing attractive statistical graphics
• similar (in style) to the popular ggplot2 library in R

Link: https://seaborn.pydata.org/
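A minimal sketch of Seaborn's high-level interface, assuming Seaborn, pandas, NumPy, and matplotlib are installed. The "control"/"treatment" data is synthetic and made up for illustration, and the output file name is hypothetical. Each plot is one call that maps DataFrame columns to axes and colors, much like aesthetic mappings in ggplot2.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path

# Synthetic data: y depends on x, plus an offset for the "treatment" group
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "group": rng.choice(["control", "treatment"], size=n),
    "x": rng.normal(size=n),
})
df["y"] = (2.0 * df["x"]
           + rng.normal(scale=0.5, size=n)
           + np.where(df["group"] == "treatment", 1.0, 0.0))

sns.set_theme(style="whitegrid")  # apply a Seaborn visual theme

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: column names map to x, y, and color (hue)
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=axes[0])

# Box plot: distribution of y for each group
sns.boxplot(data=df, x="group", y="y", ax=axes[1])

fig.tight_layout()
output = Path("seaborn_example.png")
fig.savefig(output, dpi=150)
plt.close(fig)
```

Axis labels and the legend come from the column names automatically; producing the same figure in plain matplotlib would require grouping the data and labeling each series by hand.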

For Coding: Python in Jupyter Notebook

Python/Pandas: Online Resources
Webpages:
• https://www.w3schools.com/python/python_syntax.asp
• https://www.geeksforgeeks.org/python-programming-language/

Videos:
• https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y
• https://www.youtube.com/watch?v=ZyhVh-qRZPA

GitHub also hosts many example notebooks.

