Different Methods of Plotting
Data Cleaning
df.info() >>provides a summary of the DataFrame loaded from the csv/xlsx file (column names, non-null counts, data types, memory usage)
replacing
>>The lambda function slices the string x into three parts: the first 3 characters (area code), the
next 3 characters (prefix), and the last 4 characters (line number). Then, it concatenates these parts
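A minimal sketch of the slicing described above. The column name "Phone_Number", the sample values, and the dash separator are assumptions made for illustration, not taken from the original exercise.

```python
import pandas as pd

# Hypothetical sample data; "Phone_Number" is an assumed column name.
df = pd.DataFrame({"Phone_Number": ["1234567890", "9876543210"]})

# Slice each 10-character string into area code, prefix, and line number,
# then concatenate the parts (the "-" separator is an assumption).
df["Phone_Number"] = df["Phone_Number"].apply(
    lambda x: x[0:3] + "-" + x[3:6] + "-" + x[6:10]
)

formatted = df["Phone_Number"].tolist()
```

The slices only line up if every value is already a clean 10-character string, so non-digit characters would have to be stripped first.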
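df["Address"].str.split(',', n=2, expand=True): This splits each element in the "Address" column into
substrings using the comma (,) as the delimiter. n=2 limits it to two splits (three pieces), and
expand=True returns the pieces as separate columns. Pass n by keyword, since recent pandas versions
no longer accept it positionally. df[["Street_Address", "State", "Zip_Code"]] = ...:
This assigns the result of the split operation to three new columns in the DataFrame:
"Street_Address", "State", and "Zip_Code".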
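The split and assignment can be sketched as follows. The address values are made up for illustration; only the column names come from the notes.

```python
import pandas as pd

# Hypothetical addresses in "street,state,zip" form.
df = pd.DataFrame({"Address": ["123 Main St,Texas,75001", "45 Oak Ave,Ohio,43004"]})

# n=2 caps the number of splits at two, so each address yields three pieces;
# expand=True returns those pieces as separate columns.
df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(
    ",", n=2, expand=True
)
```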
for x in df.index: >> loops over the row labels of the DataFrame, one row per iteration.
if df.loc[x, "Do_Not_Contact"] == "Y": >>checks if the value in the "Do_Not_Contact" column for
row x is "Y".
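A sketch of how the loop and the check fit together. What happens to a flagged row is not stated in the notes; dropping it is an assumption here, and the sample data is made up.

```python
import pandas as pd

# Hypothetical contact list.
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cy"],
    "Do_Not_Contact": ["N", "Y", "N"],
})

# Row-by-row version: collect the labels of rows flagged "Y", then drop them.
flagged = []
for x in df.index:
    if df.loc[x, "Do_Not_Contact"] == "Y":
        flagged.append(x)
looped = df.drop(index=flagged)

# Vectorized equivalent: keep only the rows whose flag is not "Y".
vectorized = df[df["Do_Not_Contact"] != "Y"]
```

Both versions give the same rows. The boolean mask is usually shorter and faster on large DataFrames.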
df.groupby('Base Flavor').sum() >> groups the rows by "Base Flavor" and calculates the sum of each numerical column within each group
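A small sketch of the grouping. The "Price" and "Liked" columns and all values are made up; numeric_only=True is added here so that text columns are left out of the sum.

```python
import pandas as pd

# Hypothetical ice cream data; only "Base Flavor" comes from the notes.
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla"],
    "Price": [2.0, 3.0, 4.0],
    "Liked": ["Y", "N", "Y"],
})

# One row per flavor, holding the sum of every numeric column in that group.
totals = df.groupby("Base Flavor").sum(numeric_only=True)
```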
contain the rows from df1 and df2 where the values in both "FellowshipID" and "FirstName"
columns match.
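A sketch of an inner merge on both key columns. The exact call used in the exercise is not shown in these notes, so the `on=` list, the extra columns, and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical frames sharing the "FellowshipID" and "FirstName" columns.
df1 = pd.DataFrame({
    "FellowshipID": [1, 2, 3],
    "FirstName": ["Ana", "Ben", "Cara"],
    "Skill": ["Cooking", "Riding", "Maps"],
})
df2 = pd.DataFrame({
    "FellowshipID": [1, 2, 5],
    "FirstName": ["Ana", "Ben", "Dev"],
    "Age": [50, 39, 40],
})

# Keep only rows whose FellowshipID AND FirstName appear in both frames.
df_inner = df1.merge(df2, how="inner", on=["FellowshipID", "FirstName"])
```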
df_outer = df1.merge(df2, how = 'outer') >>df_outer will contain all rows from both df1 and df2,
with NaN values filled in where data is missing from either DataFrame.
df1.merge(df2, how = 'left') >>keeps all rows from df1; if there are no matches in df2 for a particular row in df1, NaN
values are filled in for the columns from df2.
df_right = df1.merge(df2, how = 'right') >>In a right join, all the rows from the right DataFrame (df2) are
retained, and only the matching rows from the left DataFrame (df1) are joined to them. If there are no matching rows
in df1 for a particular row in df2, NaN values are filled in for the columns from df1.
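The three join types can be compared side by side. This is a sketch on made-up frames whose only shared column is "FellowshipID"; when no `on=` is given, merge joins on every column the two frames have in common.

```python
import pandas as pd

# Hypothetical frames: IDs 2 and 3 appear in both, 1 only in df1, 4 only in df2.
df1 = pd.DataFrame({"FellowshipID": [1, 2, 3], "Skill": ["Cooking", "Riding", "Maps"]})
df2 = pd.DataFrame({"FellowshipID": [2, 3, 4], "Age": [39, 28, 61]})

df_outer = df1.merge(df2, how="outer")  # IDs 1, 2, 3, 4
df_left = df1.merge(df2, how="left")    # IDs 1, 2, 3 (Age is NaN for ID 1)
df_right = df1.merge(df2, how="right")  # IDs 2, 3, 4 (Skill is NaN for ID 4)
```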
From Exercise 1
data[:4] >>slices the DataFrame and returns its first 4 rows
data.head() >>returns the first 5 rows of the DataFrame (all columns)
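A quick sketch of both calls on a made-up six-row frame. The titles and numbers are placeholders, not values from the exercise dataset.

```python
import pandas as pd

# Hypothetical movie data with six rows.
data = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"],
    "gross": [100, 300, 200, 50, 400, 10],
})

first_four = data[:4]      # row slice: rows at positions 0 to 3
first_five = data.head()   # head() defaults to n=5
first_two = data.head(2)   # pass n to change how many rows come back
```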
Indexing Columns
data.director_name[:4]
cols = ["movie_title","director_name"]
data[cols][:5]
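The column indexing above can be tried on a small made-up frame. The rows are placeholders; only the column names come from the notes.

```python
import pandas as pd

# Hypothetical movie data with six rows.
data = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"],
    "director_name": ["Dir 1", "Dir 2", "Dir 3", "Dir 4", "Dir 5", "Dir 6"],
    "gross": [100, 300, 200, 50, 400, 10],
})

# Attribute access and bracket access return the same Series.
names = data.director_name[:4]
names_alt = data["director_name"][:4]

# A list of column names returns a DataFrame with just those columns.
cols = ["movie_title", "director_name"]
subset = data[cols][:5]
```

Bracket access is the safer habit, because attribute access fails for column names that contain spaces or clash with DataFrame method names.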
Sorting
sorted_data = data.sort_values(by="gross", ascending=False) >>sorts the rows by "gross", from highest to lowest
sorted_data[:5] >>displays the first 5 rows of the sorted DataFrame
Only 2 specific columns are shown (Movie title and Gross)
sorted_data = data.sort_values(by="gross", ascending=False)
cols = ["movie_title","gross"]
sorted_data[cols][:5]
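A sketch of the sort on made-up data, showing that the original row labels travel with the rows after sorting.

```python
import pandas as pd

# Hypothetical movie data; titles and gross values are placeholders.
data = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C"],
    "director_name": ["Dir 1", "Dir 2", "Dir 3"],
    "gross": [100, 300, 200],
})

# Highest gross first; sort_values returns a new DataFrame.
sorted_data = data.sort_values(by="gross", ascending=False)

cols = ["movie_title", "gross"]
top_two = sorted_data[cols][:2]
```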
Challenge 5
sortedData = df2[(df2['gross'] == 67344392)] >>filters df2 to the rows whose gross equals
67344392, which is how the actor's name is found
cols = ["movie_title","gross","actor_1_name"] >>shows these 3 specific columns only
sortedData[cols] >>displays these columns for the filtered rows
sortedData2 = df3[(df3['actor_3_name'] == 'Omar Sy')] >>filters df3 to the rows where actor_3_name is Omar Sy
cols = ["movie_title","actor_1_name","actor_3_name","country"] >>shows these 4 specific
columns only
sortedData2[cols] >>It displays these columns for the filtered rows.
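Both filters follow the same pattern: build a True/False mask, then index with it. A sketch on made-up rows (the titles and the other actor names are placeholders, not the real dataset):

```python
import pandas as pd

# Hypothetical rows; only the column names and the two filter values come from the notes.
df = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C"],
    "gross": [67344392, 1000, 2000],
    "actor_1_name": ["Actor X", "Actor Y", "Actor Z"],
    "actor_3_name": ["Actor P", "Omar Sy", "Actor Q"],
})

# The comparison produces a boolean Series; indexing with it keeps the True rows.
by_gross = df[df["gross"] == 67344392]
by_actor = df[df["actor_3_name"] == "Omar Sy"]

cols = ["movie_title", "gross", "actor_1_name"]
result = by_gross[cols]
```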
actorOne = df3[df3['actor_3_name'] == 'Omar Sy'] >>filters df3 to the rows where actor_3_name is "Omar Sy"
actorThree = df3[df3['actor_1_name'] == 'Bruce Willis'] >>filters df3 to the rows where actor_1_name is "Bruce
Willis"
mdata = pd.merge(actorOne, actorThree, how='outer') >>outer merge on all shared columns, so mdata keeps every row from either filtered DataFrame
mdata >>displays the merged DataFrame
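A sketch of the merge step on made-up rows. Because both filtered frames come from the same DataFrame, they share every column, so the outer merge acts like a union of their rows. The single-mask version at the end is an alternative added here, not part of the original exercise.

```python
import pandas as pd

# Hypothetical rows; titles and the other actor names are placeholders.
df3 = pd.DataFrame({
    "movie_title": ["Film A", "Film B", "Film C"],
    "actor_1_name": ["Bruce Willis", "Actor Y", "Actor Z"],
    "actor_3_name": ["Actor P", "Omar Sy", "Actor Q"],
})

actorOne = df3[df3["actor_3_name"] == "Omar Sy"]
actorThree = df3[df3["actor_1_name"] == "Bruce Willis"]

# With no on= argument, merge joins on every column the two frames share.
mdata = pd.merge(actorOne, actorThree, how="outer")

# Alternative: one boolean mask combining both conditions with | (or).
either = df3[(df3["actor_3_name"] == "Omar Sy") | (df3["actor_1_name"] == "Bruce Willis")]
```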