0. Introduction to Python for Data Analysis
CSC 430_530_DA_401_501
Summer 2023
Outline of this class
1. Review (chapters 1-4): three or four weeks, then Exam 1
• Jupyter Notebook
• Python (list, tuple, dictionary, loop, and if-elif-else)
• NumPy
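A minimal sketch of the Python review topics listed above (list, tuple, dictionary, loop, and if-elif-else). The variable names and sample scores are made up for illustration and are not taken from the course material.

```python
# Hypothetical sample data, made up for illustration.
scores = [72, 88, 95]      # list: ordered and mutable
point = (3, 4)             # tuple: ordered and immutable
grades = {}                # dictionary: maps keys to values

# Loop over the list and pick a letter grade with if-elif-else.
for score in scores:
    if score >= 90:
        grades[score] = "A"
    elif score >= 80:
        grades[score] = "B"
    else:
        grades[score] = "C"

print(grades)
```

Running the cell in a Jupyter notebook prints the dictionary that the loop builds.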
Python Libraries for Data Science
Many popular Python toolboxes/libraries:
• NumPy
• SciPy
• Pandas
• SciKit-Learn
• Keras
Visualization libraries
• matplotlib
• Seaborn
NumPy:
• provides array and matrix objects, along with mathematical and
statistical operations on those objects
• provides vectorization of mathematical
operations on arrays and matrices, which
significantly improves performance compared with explicit Python loops
• many other Python libraries are built on NumPy
Link: http://www.numpy.org/
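A short sketch of what vectorization means in practice, assuming NumPy is installed. The array contents are hypothetical sample values; the point is that one expression operates on every element without an explicit loop.

```python
import numpy as np

# Hypothetical sample data, made up for illustration.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Vectorized: the operation is applied to every element at once.
squared = values ** 2

# Equivalent pure-Python loop, shown only for comparison.
squared_loop = [v ** 2 for v in values.tolist()]

# Statistical operations on the array object.
mean = values.mean()
std = values.std()

print(squared, mean, std)
```

On large arrays the vectorized form is typically much faster than the loop, because the per-element work runs in compiled code rather than in the Python interpreter.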
SciPy:
Link: https://www.scipy.org/scipylib/
Pandas:
• adds data structures and tools designed
to work with table-like data (Series and
DataFrame, similar to data frames in R)
• provides tools for data manipulation:
reshaping, merging, sorting, slicing,
aggregation, etc.
• allows handling missing data
Link: http://pandas.pydata.org/
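A brief sketch of three of the Pandas features named above (handling missing data, sorting, and aggregation), assuming pandas and NumPy are installed. The table, its column names, and its values are hypothetical and made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing temperature.
df = pd.DataFrame({
    "city": ["Boston", "Austin", "Boston", "Austin"],
    "temp": [20.0, np.nan, 24.0, 30.0],
})

# Handle missing data: fill the gap with the column mean.
filled = df.fillna({"temp": df["temp"].mean()})

# Sorting: order rows by temperature, lowest first.
sorted_df = filled.sort_values("temp")

# Aggregation: mean temperature per city.
by_city = filled.groupby("city")["temp"].mean()

print(sorted_df)
print(by_city)
```

Filling with the mean is only one option; dropping the incomplete rows with `dropna()` is another common choice, and which one fits depends on the data.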
SciKit-Learn:
Link: http://scikit-learn.org/
matplotlib:
Link: https://matplotlib.org/
Seaborn:
Link: https://seaborn.pydata.org/
For Coding: Python
Jupyter notebook
Python/Pandas: Online Resources
Webpages:
• https://www.w3schools.com/python/python_syntax.asp
• https://www.geeksforgeeks.org/python-programming-language/
Video:
• https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y
• https://www.youtube.com/watch?v=ZyhVh-qRZPA