Pandas & Mysql

The document discusses various concepts related to SQL and database management systems (DBMS). It defines key terms like: 1. Advantages of using a DBMS like data independence, security, data integrity, etc. 2. Key database concepts like primary keys, foreign keys, alternate keys, candidate keys and composite keys that define relationships between tables. 3. SQL statements like data definition language (DDL) used to define database structure and objects, and data manipulation language (DML) used to work with data in database tables.

Uploaded by

Sahil Ahmad

1. Pandas is a high-level data manipulation tool developed by Wes McKinney.

2. A Series is a one-dimensional labeled array capable of holding data of any type.
3. The values in a Series are mutable, but the size of a Series is immutable.
4. NumPy is a Python package whose name stands for Numerical Python.
5. A DataFrame is a two-dimensional labeled structure holding heterogeneous data, usually represented in tabular format.
6. A DataFrame has two indexes: a row index and a column index.
7. import pandas as pd
s=pd.Series([5,10,15,20,25])
--> A Series object is automatically indexed as 0,1,2,3,4. Write Python code to change the index labels to
a,b,c,d,e
Ans. s.index=['a','b','c','d','e']
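The relabelling in question 7 can be tried out with a short sketch like the one below. The replacement value 11 is only an illustrative assumption, used to show that the values of a Series can be changed after the index has been relabelled.

```python
import pandas as pd

s = pd.Series([5, 10, 15, 20, 25])    # default index is 0, 1, 2, 3, 4
s.index = ['a', 'b', 'c', 'd', 'e']   # relabel in place; the list length must match

s['b'] = 11                           # values are mutable (11 is an illustrative value)
print(s)
print(len(s))                         # the number of elements is unchanged
```

The assignment to s.index replaces the labels only; the data stays in the same order.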

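8. In pandas, s is a Series with the following data:

s=pd.Series([5,10,15,20,25])
Find the output of the following:
i) s[[1,2]]
ii) s[1:3]
Ans- i. 1    10
2    15
dtype: int64
ii. 1    10
2    15
dtype: int64
Note: the labels in (i) must be passed as a list. Writing s[1,2] passes the tuple (1,2) as a single key and raises an error. In (ii) the slice is positional and the end position 3 is excluded.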
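The two selections in question 8 can be compared directly. This is a minimal sketch that assumes the first expression is meant as a list of labels, s[[1, 2]]. The variable names by_labels and by_slice are illustrative only.

```python
import pandas as pd

s = pd.Series([5, 10, 15, 20, 25])

by_labels = s[[1, 2]]   # list of index labels: rows labelled 1 and 2
by_slice = s[1:3]       # positional slice: positions 1 and 2, position 3 is excluded

print(by_labels)
print(by_slice)
```

With the default integer index the two results coincide, because label 1 sits at position 1 and label 2 at position 2.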
9. A dictionary Grade contains the following data
import pandas as pd
grade={'name':['rashmi','harsh','ganesh','priya','ivek'],'grade':['A1','A2','B1','A1','B2']}
 Write the statements to:
a. Create a dataframe called gr.
b. Add a column percentage with the following data: [92,89,None,95,68,None,93]
c. Find the output of gr.iloc[0:5] and gr[0:5]
d. Rearrange the columns as name, percentage, and grade.
e. Add 3 rows with the following data:
Ishan 86 B1
Amrita 97 A1
None None None
f. Drop the column "grade" (by name)
g. Delete the 3rd and 5th rows.
h. What will each of the following do?
a. gr.drop(0,axis=0)
b. gr.drop(0,axis="index")
c. gr.drop([0,1,2,3],axis=0)
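Ans-:
a. gr=pd.DataFrame(grade)
b. gr["percentage"]=[92,89,None,95,68,None,93]
Note: the list must contain exactly one value per row of gr. The dictionary above defines five rows, so a seven-value list raises ValueError unless gr has seven rows.
c. Both commands select the rows at positions 0 to 4, so the output is the same (shown here for the two original columns) –
name grade
0 rashmi A1
1 harsh A2
2 ganesh B1
3 priya A1
4 ivek B2
d. gr=gr[['name','percentage','grade']]
e. tgr=pd.DataFrame({'name':['Ishan','Amrita',None],'percentage':[86,97,None],'grade':
['B1','A1',None]},columns=['name','percentage','grade'])
gr=pd.concat([gr,tgr],ignore_index=True)
Note: gr.append(tgr,ignore_index=True) does the same on older pandas versions, but DataFrame.append was removed in pandas 2.0.

f. gr.drop('grade',axis=1)
g. gr.drop([2,4])
h. a. Returns a copy of gr without the first row (index label 0). gr itself is unchanged unless the result is assigned back.
b. Same as (a): axis="index" is another way of writing axis=0.
c. Returns a copy of gr without the first 4 rows (index labels 0 to 3).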
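The steps of question 9 can be run end to end with the sketch below. Two things here are assumptions rather than part of the question: the percentage list is shortened to five values so that it matches the five rows of the dictionary, and pd.concat is used for adding rows because DataFrame.append is not available in pandas 2.0 and later. The names no_grade and fewer_rows are illustrative.

```python
import pandas as pd

grade = {'name': ['rashmi', 'harsh', 'ganesh', 'priya', 'ivek'],
         'grade': ['A1', 'A2', 'B1', 'A1', 'B2']}

gr = pd.DataFrame(grade)                        # a. build the dataframe
gr['percentage'] = [92, 89, None, 95, 68]       # b. assumed: five values, one per row
gr = gr[['name', 'percentage', 'grade']]        # d. reorder the columns

# e. put the new rows in a second dataframe, then concatenate
tgr = pd.DataFrame({'name': ['Ishan', 'Amrita', None],
                    'percentage': [86, 97, None],
                    'grade': ['B1', 'A1', None]})
gr = pd.concat([gr, tgr], ignore_index=True)

no_grade = gr.drop('grade', axis=1)             # f. returns a copy without the column
fewer_rows = gr.drop([2, 4])                    # g. returns a copy without labels 2 and 4

print(gr)
print(no_grade.columns.tolist())
print(fewer_rows.index.tolist())
```

Because drop returns a new dataframe, gr still has all of its rows and columns after steps f and g.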

10. Suppose the dataframe is given.


import pandas as pd
df=pd.DataFrame({'book_id':['c0001','f0001','t0001','t0002','f0002'],"book_name":['fast cook','the tears','my first c++','c++ brain work','thunderbolt'],"author_name":['lata kapoor','william hopkins','brain & brook','a.w. ross','anna'],"price":[540,550,670,578,750]})

i. print(df.loc[:])
j. print(df['book_name'])
k. print(df[['book_name']])
l. print(df[['book_name','price']])
m. print(df[0:3])
n. print(df[2:4])
o. print(df.iloc[2])
Give the output of the above code. (In statement i, df.loc slices rows, so a column name such as 'book_name' cannot be used as the slice bound on the default integer index; df.loc[:] prints the whole dataframe.)

   book_id       book_name      author_name  price
0    c0001       fast cook      lata kapoor    540
1    f0001       the tears  william hopkins    550
2    t0001    my first c++    brain & brook    670
3    t0002  c++ brain work        a.w. ross    578
4    f0002     thunderbolt             anna    750
>>>
0         fast cook
1         the tears
2      my first c++
3    c++ brain work
4       thunderbolt
Name: book_name, dtype: object
>>>
        book_name
0       fast cook
1       the tears
2    my first c++
3  c++ brain work
4     thunderbolt
>>>
        book_name  price
0       fast cook    540
1       the tears    550
2    my first c++    670
3  c++ brain work    578
4     thunderbolt    750
>>>
   book_id     book_name      author_name  price
0    c0001     fast cook      lata kapoor    540
1    f0001     the tears  william hopkins    550
2    t0001  my first c++    brain & brook    670
>>>
   book_id       book_name    author_name  price
2    t0001    my first c++  brain & brook    670
3    t0002  c++ brain work      a.w. ross    578
>>>
book_id                t0001
book_name       my first c++
author_name    brain & brook
price                    670
Name: 2, dtype: object
>>>
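The outputs above differ in shape because each kind of selection returns a different type of object. The sketch below illustrates this on a shortened three-row copy of the book data; the shortening and the variable names are assumptions made for brevity.

```python
import pandas as pd

df = pd.DataFrame({'book_id': ['c0001', 'f0001', 't0001'],
                   'book_name': ['fast cook', 'the tears', 'my first c++'],
                   'price': [540, 550, 670]})

one_col = df['book_name']        # single column label: a Series
one_col_df = df[['book_name']]   # list of labels: a DataFrame with one column
row = df.iloc[2]                 # single row position: a Series indexed by column names
rows = df[0:2]                   # slice: a DataFrame holding the rows at positions 0 and 1

print(type(one_col).__name__, type(one_col_df).__name__)
print(type(row).__name__, rows.shape)
```

This is why the output of df['book_name'] ends with a "Name: ..., dtype: ..." line, while df[['book_name']] prints a column header instead.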

11. Suppose the dataframe is given.


import pandas as pd
d={"eid":['e01','e02','e03','e04','e05'],"ename":['nikhil','aman','shikhar','shiva','sneha'],"salary":
[14250,14253,25142,12365,14253]}
emp=pd.DataFrame(d,columns=['eid','ename','salary'])

a. #display the first 2 records
b. #set the index as 'eid'
c. #reset the index
d. #display the first 2 columns
e. #add a column address
f. #add new rows with appropriate data
g. #delete a column
h. #delete a row
i. #sort the dataframe by name in ascending order
j. #sort the dataframe by name in descending order
k. #find the sum of the salary column
l. #find the mean of the salary column
m. #sort the dataframe by index
Ans.
a. print(emp.head(2))   # all four commands give the same result
print(emp.loc[0:1])
print(emp.iloc[0:2])
print(emp.loc[[0,1]])
b. emp=emp.set_index('eid')   # set_index returns a new dataframe, so assign the result
c. emp=emp.reset_index()
d. print(emp[['eid','ename']])   # all three commands give the same result
print(emp.iloc[:,0:2])
print(emp.loc[:,['eid','ename']])
e. Any one of the following adds the column (insert raises an error if the column already exists):
emp["address"]=['kanpur','fatehpur','agra','lucknow','agra']
emp.loc[:,"address"]=['kanpur','fatehpur','agra','lucknow','agra']
emp=emp.assign(address=['kanpur','fatehpur','agra','lucknow','agra'])
emp.insert(3,column='address',value=['kanpur','fatehpur','agra','lucknow','agra'])
f.
new={"eid":['e06','e07','e08','e09','e010'],"ename":['nikhil','aman','shikhar','shiva','sneha'],"salary":
[14250,14253,25142,12365,14253]}
empnew=pd.DataFrame(new,columns=['eid','ename','salary'])
emp=pd.concat([emp,empnew],ignore_index=True)
or, on pandas versions before 2.0 only (DataFrame.append was removed in 2.0):
emp=emp.append(empnew,ignore_index=True)
g. print(emp.drop('salary',axis=1)) OR
print(emp.drop(emp.columns[2],axis=1))
h. print(emp.drop([1])) OR
print(emp.drop(emp.index[1]))
i. print(emp.sort_values(by='ename'))
j. print(emp.sort_values(by='ename', ascending=False))   # False must be the boolean, not the string 'False'
k. print(emp.salary.sum()) OR print(emp["salary"].sum())
l. print(emp.salary.mean()) OR print(emp["salary"].mean())
m. print(emp.sort_index())
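A few of the answers to question 11 can be checked together in one short sketch. It uses the emp data from the question; the names by_eid, restored, desc, total and mean are illustrative and not part of the original answers.

```python
import pandas as pd

d = {"eid": ['e01', 'e02', 'e03', 'e04', 'e05'],
     "ename": ['nikhil', 'aman', 'shikhar', 'shiva', 'sneha'],
     "salary": [14250, 14253, 25142, 12365, 14253]}
emp = pd.DataFrame(d, columns=['eid', 'ename', 'salary'])

by_eid = emp.set_index('eid')      # b. returns a new dataframe; emp itself is unchanged
restored = by_eid.reset_index()    # c. 'eid' becomes an ordinary column again

desc = emp.sort_values(by='ename', ascending=False)   # j. boolean False, not a string
total = emp['salary'].sum()        # k.
mean = emp['salary'].mean()        # l.

print(by_eid.head(2))
print(desc['ename'].tolist())
print(total, mean)
```

Methods such as set_index, reset_index, drop and sort_values return new objects by default, which is why the results are assigned to new names here.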

Plotting: Matplotlib. Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using GUI toolkits.

Pylab: Pylab is a package that combines NumPy, SciPy and Matplotlib into a single namespace.

Line plot: A line plot is a type of plot that displays information as a series of data points, called markers, connected by straight
lines. In this type of plot, the measurement points need to be ordered.

matplotlib.pyplot is a plotting library used for 2D graphics in the Python programming language.

A bar chart or bar graph is a chart that presents categorical data with rectangular bars whose height or length is proportional to
the values that they represent.

Histograms are a plot type used to show the frequency of values across a continuous or discrete variable. Histograms are used to show
a distribution, whereas a bar chart is used to compare different entities.
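The difference between a bar chart and a histogram can be seen by drawing both side by side. The sketch below is only an illustration: the subject names, marks and scores are made-up sample data, and the Agg backend is selected so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

subjects = ["maths", "physics", "english"]           # hypothetical categories
marks = [81, 76, 92]                                 # one value per category
scores = [45, 52, 58, 61, 64, 67, 70, 73, 78, 91]    # hypothetical raw measurements

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

bars = ax_bar.bar(subjects, marks)                   # bar height is the value itself
ax_bar.set_title("Bar chart: compare entities")

counts, edges, _ = ax_hist.hist(scores, bins=5)      # bar height is a count per bin
ax_hist.set_title("Histogram: distribution")

print(len(bars), int(counts.sum()), len(edges))
```

The bar chart needs one value for each category, while the histogram takes the raw values and does the counting itself.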

plot() is used to plot points on the graph, connected by lines by default.

show() is used to display the plot.
plt.xlabel() is used to add the label text below the x axis.
plt.ylabel() is used to give the name of the y axis.
legend() is used to show the labels that identify or differentiate one plot from another. Ex: plt.legend(loc="best",
fontsize=12)

plt.title() is used to give the title of the plot.

The plt.axis() method of the pyplot module takes a list [xmin,xmax,ymin,ymax] and specifies the viewport of the axes. These four
parameters set the minimum and maximum limits for the x axis and y axis respectively.
grid() is used to add grid lines to the plot.
savefig() is used to save the plot to a file.

Example:
import matplotlib.pyplot as plt
student=[1,2,3,4,5]
std1=[81,76,82,92,87]
std2=[65,67,76,87,78]
std3=[56,65,76,87,67]
std4=[56,65,45,34,23]
std5=[77,67,55,67,87]
plt.plot(student,std1,label='student1')
plt.plot(student,std2,label='student2')
plt.plot(student,std3,label='student3')
plt.plot(student,std4,label='student4')
plt.plot(student,std5,label='student5')
plt.xlabel("subject",fontsize=12)
plt.ylabel("marks",fontsize=6)
plt.legend(loc="best",fontsize=10)
plt.title("Student marks",fontsize=16)
plt.axis([0,6,0,100])
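plt.grid(which="major",linestyle="-",linewidth=0.5, color="red")
# canvas.set_window_title was removed in Matplotlib 3.6; the title is now set through the figure manager
plt.gcf().canvas.manager.set_window_title("Line Graph")
plt.show()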

Output: a line graph titled "Student marks" with one line per student (figure not reproduced here).
SQL
1. Advantages of DBMS
2. Data independence
3. Primary key
4. Foreign key
5. Alternate key
6. Candidate key
7. Composite key
8. Cartesian product
9. Data definition language (DDL): DDL statements are used to create and modify the structure of tables and other objects in
the database. These statements are: create database, alter database, create table, alter table, drop table, drop
database.
10. Data manipulation language (DML): DML statements are used to work with the data in the tables of an existing database. DML
statements are: select, insert, update, delete, replace. (truncate also removes rows, but MySQL classifies it as a DDL statement.)
11. Utility statements: describe, explain, help, use.
12. Transaction control language (TCL): start transaction, commit, rollback, savepoint, lock tables, unlock tables, set
transaction.
13. Data control language (DCL): these statements are used by the database administrator to manage privileges: grant,
revoke. (deny is available in some other systems such as SQL Server, but not in MySQL.)
14. count(column) counts the non-NULL values in the given column, whereas count(*) counts the total number of
rows, including rows that contain NULL values.
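The statement categories and the two forms of count() can be tried without a MySQL server. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, and the emp table with its three rows is made up for the example. The SQL statements shown are written the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory SQLite database (not MySQL)
cur = conn.cursor()

# DDL: define the structure of the table
cur.execute("CREATE TABLE emp (eid TEXT PRIMARY KEY, ename TEXT, salary INTEGER)")

# DML: insert rows; one salary is deliberately NULL
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("e01", "nikhil", 14250), ("e02", "aman", None), ("e03", "shikhar", 25142)])
conn.commit()                        # TCL: make the inserts permanent

cur.execute("UPDATE emp SET salary = 0")   # DML change that is not committed
conn.rollback()                            # TCL: undo the uncommitted update

total_rows, salaries = cur.execute(
    "SELECT COUNT(*), COUNT(salary) FROM emp").fetchone()
salary_sum = cur.execute("SELECT SUM(salary) FROM emp").fetchone()[0]
print(total_rows, salaries, salary_sum)
conn.close()
```

COUNT(*) counts all three rows, while COUNT(salary) skips the row whose salary is NULL. The sum shows that the rolled-back update left the committed data untouched.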
