
Different Methods of Plotting

The document discusses various methods for data analysis and visualization using Pandas and Matplotlib in Python. These include reading data from files, plotting data, cleaning data through operations like dropping duplicates and columns, grouping and aggregating data, merging DataFrames, and indexing and filtering DataFrames.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("/path") >> read_csv for csv files
df = pd.read_excel("/path") >> read_excel for xlsx files

Different Methods of Plotting


df.plot(kind='line', title='//name', xlabel='//name', ylabel='//name')
df.plot(kind='bar', stacked=True)
df.plot.barh(stacked=True)
df.plot.scatter(x='//name', y='//name', c='//color', s=//size)
df.plot.hist(bins=10) >> bins sets the number of intervals
df.boxplot()
df.plot.area()
df.plot.pie(y='//name', figsize=(//size, //size))
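The calls above can be tried on a small made-up DataFrame (the column names and values here are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data
df = pd.DataFrame({"sales": [3, 7, 5], "returns": [1, 2, 1]},
                  index=["Jan", "Feb", "Mar"])

ax = df.plot(kind="line", title="Monthly totals", xlabel="Month", ylabel="Count")
df.plot(kind="bar", stacked=True)   # stacked vertical bars
df.plot.barh(stacked=True)          # same data, horizontal bars
df["sales"].plot.hist(bins=3)       # histogram with 3 intervals
plt.close("all")
```

Each call returns a matplotlib Axes object, so titles and labels can also be adjusted afterwards.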

Data Cleaning
df.info() >> provides info about the csv/xlsx file (column names, dtypes, non-null counts)

df.shape >> gives the number of rows and columns as a tuple, e.g. (22, 20)

df = df.drop_duplicates() >> drops duplicate rows

df = df.drop(columns="//name of the column") >> drops a specific column

df["//name of the column"].str.strip() >> removes leading and trailing whitespace from each value

df["//name of the column"] = df["//name of the column"].str.strip("/._") >> removes any of the characters /, . and _ from both ends of each value

df["//name of the column"] = df["//name of the column"].str.replace('[^0-9a-zA-Z]', '', regex=True) >> replaces every character that is not a digit or letter; regex=True is required for pattern-based replacement in recent pandas
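A minimal sketch of both cleaning steps, using invented names:

```python
import pandas as pd

# Hypothetical messy values
df = pd.DataFrame({"Name": ["/Ana_", "Ben.", "Cr@uz"]})

# strip("/._") removes any of the characters / . _ from both ends of each value
df["Name"] = df["Name"].str.strip("/._")

# replace every character that is not a digit or letter with nothing;
# regex=True enables pattern matching in recent pandas versions
df["Name"] = df["Name"].str.replace('[^0-9a-zA-Z]', '', regex=True)
```

Note the difference: strip only touches the ends of the string, while the regex replace removes matching characters anywhere inside it.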

df["//name of the column"] = df["//name of the column"].apply(lambda x: str(x)) >> lambda x: str(x) converts each element x to a string using the str() function

df["Phone_Number"] = df["Phone_Number"].apply(lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10]) >> the lambda slices the string x into three parts: the first 3 characters (area code), the next 3 characters (prefix), and the last 4 characters (line number), then joins them with hyphens to format the phone number. Result: "###-###-####"
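Run against two invented 10-digit numbers, the formatting step looks like this:

```python
import pandas as pd

# Hypothetical 10-digit phone numbers already cleaned to plain strings
df = pd.DataFrame({"Phone_Number": ["1234567890", "9876543210"]})

# Slice into area code, prefix, and line number, then join with hyphens
df["Phone_Number"] = df["Phone_Number"].apply(
    lambda x: x[0:3] + '-' + x[3:6] + '-' + x[6:10]
)
```

This assumes every value is already a 10-character string; shorter or non-string values would need the str() conversion step first.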

df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(',', n=2, expand=True) >> str.split(',', n=2, expand=True) splits each "Address" value at the first two commas, and expand=True returns the pieces as three separate columns. The assignment stores those columns in the DataFrame as "Street_Address", "State", and "Zip_Code".
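A small sketch with invented addresses (note that the pieces keep any spaces that followed the commas, so a follow-up strip is often useful):

```python
import pandas as pd

# Hypothetical addresses in "street, state, zip" form
df = pd.DataFrame({"Address": ["12 Main St, TX, 75001",
                               "9 Oak Ave, CA, 90210"]})

# n=2 limits the split to the first two commas, yielding exactly three columns
df[["Street_Address", "State", "Zip_Code"]] = (
    df["Address"].str.split(',', n=2, expand=True)
)
```
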

for x in df.index: >> loops through each row label of the DataFrame
    if df.loc[x, "Do_Not_Contact"] == "Y": >> checks whether the "Do_Not_Contact" value for the current row (x) equals "Y"
        df.drop(x, inplace=True) >> drops the row if its value is "Y"

df = df.reset_index(drop=True) >> resets the index to a default integer index starting from 0 and discards the old index
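The loop and reset together, on a tiny invented contact list:

```python
import pandas as pd

# Hypothetical contacts; Ben has asked not to be contacted
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cruz"],
                   "Do_Not_Contact": ["N", "Y", "N"]})

for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        df.drop(x, inplace=True)      # remove this row in place

df = df.reset_index(drop=True)        # renumber rows 0, 1, ... and drop the old index
```

An equivalent one-liner is `df = df[df["Do_Not_Contact"] != "Y"]`, but the loop makes the row-by-row logic explicit.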

Group by and Aggregating

name_of_the_variable = df.groupby('//name of the column')

df.groupby('//name of the column').mean() >> gets the mean of each numeric column within each group

df.groupby('//name of the column').count() >> counts the non-null values for each column within each group

df.groupby('Base Flavor').sum() >> calculates the sum of numerical values for each column within each group

df.groupby('Base Flavor').describe() >> provides statistics such as count, mean, standard deviation, minimum, maximum, and quartiles

df.groupby(['Base Flavor','Liked']).agg({'Flavor Rating': ['mean','max','count','sum']}) >> applies several aggregation functions to the "Flavor Rating" column within each group
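The agg call can be tried on a small invented ice-cream table (the column names match the notes; the values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Base Flavor": ["Vanilla", "Vanilla", "Chocolate"],
                   "Liked": ["Yes", "Yes", "No"],
                   "Flavor Rating": [8.0, 6.0, 7.0]})

# One row per (Base Flavor, Liked) pair; columns are the requested statistics
agg = df.groupby(['Base Flavor', 'Liked']).agg(
    {'Flavor Rating': ['mean', 'max', 'count', 'sum']}
)
```

The result has a MultiIndex on both axes: rows are keyed by the group pair, and columns are keyed by ("Flavor Rating", statistic).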


Merging

df_inner = df1.merge(df2, how='inner', on=['FellowshipID', 'FirstName']) >> df_inner contains only the rows where the values in both the "FellowshipID" and "FirstName" columns match in df1 and df2

df_outer = df1.merge(df2, how='outer') >> df_outer contains all rows from both df1 and df2, with NaN filled in where data is missing from either DataFrame

df1.merge(df2, how='left') >> keeps every row of df1; where a row of df1 has no match in df2, NaN is filled in for the columns from df2

df_right = df1.merge(df2, how='right') >> a right join keeps all rows of df2 and appends only the matching rows of df1; rows of df2 with no match in df1 get NaN for the columns from df1
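A sketch of the join types on two tiny invented DataFrames (when `on=` is omitted, merge joins on all columns the frames share, here "FellowshipID" and "FirstName"):

```python
import pandas as pd

df1 = pd.DataFrame({"FellowshipID": [1, 2, 3],
                    "FirstName": ["Frodo", "Sam", "Merry"],
                    "Age": [50, 38, 36]})
df2 = pd.DataFrame({"FellowshipID": [1, 2, 4],
                    "FirstName": ["Frodo", "Sam", "Legolas"],
                    "Weapon": ["Sting", "Pan", "Bow"]})

df_inner = df1.merge(df2, how='inner', on=['FellowshipID', 'FirstName'])  # only matches
df_outer = df1.merge(df2, how='outer')  # everyone, NaN where a side is missing
df_left = df1.merge(df2, how='left')    # all of df1; Merry gets NaN for Weapon
```
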

From Exercise 1
data[:4] >> slices the first four rows of the DataFrame
data.head() >> gets the first 5 rows

Indexing Columns
data.director_name[:4]
cols = ["movie_title","director_name"]
data[cols][:5]

Finding info for a specific person (Find Movies by James Cameron)

james = data[data.director_name == 'James Cameron']
show = ["movie_title","director_name"]
james[show][:5]

Sorting
sorted_data = data.sort_values(by="gross", ascending=False) >> sorts the rows by gross, highest first
sorted_data[:5] >> displays the first 5 rows of the sorted result
To show only 2 specific columns (movie title and gross):
sorted_data = data.sort_values(by="gross", ascending=False)
cols = ["movie_title","gross"]
sorted_data[cols][:5]
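The same pattern on a tiny invented movie table (three rows rather than a full dataset):

```python
import pandas as pd

data = pd.DataFrame({"movie_title": ["A", "B", "C"],
                     "gross": [300, 100, 200]})

# Sort by gross, highest first, then keep two columns and the first rows
sorted_data = data.sort_values(by="gross", ascending=False)
cols = ["movie_title", "gross"]
top = sorted_data[cols][:2]
```
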

Top 5 Films of Michael Bay

df = df[df.director_name == 'Michael Bay'] >> keeps only the films directed by Michael Bay
df = df.head(5) >> keeps the first 5 of them (the top 5 if the DataFrame is already sorted)

Challenge 5
sortedData = df2[df2['gross'] == 67344392] >> finds the row(s) whose gross is 67344392, including the actor's name
cols = ["movie_title","gross","actor_1_name"] >> keeps only these 3 specific columns
sortedData[cols] >> displays those columns for the filtered rows

sortedData2 = df3[df3['actor_3_name'] == 'Omar Sy'] >> finds the rows where actor_3_name is Omar Sy
cols = ["movie_title","actor_1_name","actor_3_name","country"] >> keeps only these 4 specific columns
sortedData2[cols] >> displays those columns for the filtered rows

actorOne = df3[df3['actor_3_name'] == 'Omar Sy'] >> filters df3 to rows with "Omar Sy" as actor_3_name
actorThree = df3[df3['actor_1_name'] == 'Bruce Willis'] >> filters df3 to rows with "Bruce Willis" as actor_1_name
mdata = pd.merge(actorOne, actorThree, how='outer') >> merges the two filtered DataFrames, keeping all rows from both
mdata >> displays the merged result
