pandas-tutorial
[8]: df.head()
choice_description item_price
0 NaN $2.39
1 [Clementine] $3.39
2 [Apple] $3.39
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 4622 non-null int64
1 quantity 4622 non-null int64
2 item_name 4622 non-null object
3 choice_description 3376 non-null object
4 item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
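The summary above reports 3376 non-null values out of 4622 for choice_description, so 1246 rows are missing a value in that column. A minimal sketch of counting missing values directly, using a small hypothetical frame (the rows below are illustrative, not the tutorial's data):

```python
import pandas as pd

# Hypothetical four-row stand-in for the tutorial's DataFrame.
df = pd.DataFrame({
    "item_name": ["Chips", "Izze", "Nantucket Nectar", "Chips"],
    "choice_description": [None, "[Clementine]", "[Apple]", None],
})

missing = df.isna().sum()    # missing values per column
non_null = df.notna().sum()  # same counts that df.info() reports as "Non-Null Count"

print(missing)
print(non_null)
```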
[10]: df.columns # returns a pandas Index object
# to convert it to a plain list, simply call
list(df.columns)
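As a small illustration of the cell above, the sketch below shows that df.columns is an Index object and that list(...) and .tolist() give the same plain list. The frame is a hypothetical miniature; only the column names are taken from the tutorial.

```python
import pandas as pd

# Hypothetical miniature of the tutorial's table (values are illustrative).
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Izze", "Nantucket Nectar", "Chicken Bowl"],
})

cols = df.columns                # a pandas Index, not a Python list
as_list = list(df.columns)       # built-in conversion, as in the cell above
same_list = df.columns.tolist()  # equivalent pandas method

print(type(cols).__name__)
print(as_list)
```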
[11]: df.index # get the index; shows the range of row labels the data spans
df.describe()
24% 458.040000 1.000000
30% 563.000000 1.000000
44% 818.000000 1.000000
50% 926.000000 1.000000
max 1834.000000 15.000000
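The non-default rows 24%, 30% and 44% in the summary above suggest that describe() was called with a custom percentiles argument; that call is not shown here, so this is an assumption. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data; only the column names follow the tutorial.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "quantity": [1, 1, 2, 1, 15],
})

# percentiles takes fractions between 0 and 1; 0.50 is listed explicitly
# so the median row is present regardless of pandas version.
summary = df.describe(percentiles=[0.24, 0.30, 0.44, 0.50])

print(summary)
print(df.index)  # a RangeIndex describing the row labels
```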
choice_description item_price
0 NaN $2.39
1 [Clementine] $3.39
2 [Apple] $3.39
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98
2949 1172 1 Nantucket Nectar
3318 1330 1 Nantucket Nectar
3368 1351 1 Nantucket Nectar
3570 1433 1 Nantucket Nectar
3598 1443 15 Chips and Fresh Tomato Salsa
3845 1541 1 Nantucket Nectar
4019 1609 1 Nantucket Nectar
4078 1632 1 Nantucket Nectar
choice_description item_price
2 [Apple] $3.39
22 [Pomegranate Cherry] $3.39
105 [Pineapple Orange Banana] $3.39
173 [Apple] $3.39
205 [Peach Orange] $3.39
436 [Pomegranate Cherry] $3.39
601 [Pineapple Orange Banana] $6.78
925 [Pomegranate Cherry] $3.39
1356 [Pomegranate Cherry] $3.39
1585 [Peach Orange] $3.39
1626 [Pineapple Orange Banana] $3.39
1706 [Apple] $3.39
2162 [Pineapple Orange Banana] $3.39
2379 [Peach Orange] $6.78
2381 [Apple] $3.39
2430 [Pomegranate Cherry] $3.39
2653 [Pineapple Orange Banana] $3.39
2818 [Apple] $3.39
2838 [Peach Orange] $3.39
2853 [Apple] $3.39
2949 [Peach Orange] $3.39
3318 [Peach Orange] $3.39
3368 [Pineapple Orange Banana] $3.39
3570 [Pineapple Orange Banana] $3.39
3598 NaN $44.25
3845 [Peach Orange] $3.39
4019 [Pineapple Orange Banana] $3.39
4078 [Peach Orange] $3.39
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
[19]: df.loc[(df.quantity == 2) & (df.item_name == "Nantucket Nectar"), ["order_id", "quantity", "item_name"]]
item_price
601 $6.78
2379 $6.78
item_price
601 $6.78
2379 $6.78
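The selections above combine a boolean row mask with a column selector inside df.loc. A minimal sketch of that pattern on a hypothetical four-row frame (the data is illustrative). It also shows that a single column name returns a Series, while a one-element list returns a DataFrame:

```python
import pandas as pd

# Hypothetical sample rows; column names follow the tutorial.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "quantity": [1, 2, 2, 1],
    "item_name": ["Nantucket Nectar", "Nantucket Nectar", "Chicken Bowl", "Izze"],
    "item_price": ["$3.39", "$6.78", "$17.50", "$3.39"],
})

# Each condition needs its own parentheses because & binds tighter than ==.
mask = (df["quantity"] == 2) & (df["item_name"] == "Nantucket Nectar")

subset = df.loc[mask, ["order_id", "quantity", "item_name"]]
prices = df.loc[mask, "item_price"]         # single label -> Series
price_frame = df.loc[mask, ["item_price"]]  # list of labels -> DataFrame

print(subset)
print(type(prices).__name__, type(price_frame).__name__)
```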
choice_description
3 NaN
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans…
[24]: df.iloc[3:5]
choice_description item_price
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98
[25]: df.iloc[3:5, -1] # show only the selected rows (positions 3 and 4) of the column at position -1, i.e. the last column
[25]: 3 $2.39
4 $16.98
Name: item_price, dtype: object
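One detail worth spelling out: iloc slices by position and excludes the end, whereas loc slices by label and includes it. A minimal sketch on hypothetical data with the default integer index:

```python
import pandas as pd

# Hypothetical six-row frame with the default RangeIndex.
df = pd.DataFrame({
    "quantity": [1, 1, 1, 1, 2, 1],
    "item_price": ["$2.39", "$3.39", "$3.39", "$2.39", "$16.98", "$10.98"],
})

by_position = df.iloc[3:5]   # positions 3 and 4; the end position 5 is excluded
last_col = df.iloc[3:5, -1]  # same rows, last column only, returned as a Series
by_label = df.loc[3:5]       # labels 3, 4 and 5; the end label is included

print(by_position)
print(last_col)
print(by_label)
```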
[27]: df.item_price.dtype
[27]: dtype('O')
[37]: df["item_price"].dtype
[37]: dtype('O')
[33]: df.head()
choice_description item_price
0 NaN 2.39
1 [Clementine] 3.39
2 [Apple] 3.39
3 NaN 2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… 16.98
[30]: print(df.dtype)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_25428\910862699.py in ?()
----> 1 print(df.dtype)
C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\generic.py in ?(self, name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
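The AttributeError above arises because a DataFrame has no dtype attribute. A single Series has .dtype, while a DataFrame exposes the plural .dtypes, which gives one dtype per column. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical two-column frame; column names follow the tutorial.
df = pd.DataFrame({
    "quantity": [1, 2],
    "item_price": ["$2.39", "$3.39"],
})

print(df.dtypes)             # per-column dtypes of the DataFrame
print(df["quantity"].dtype)  # dtype of a single column (a Series)

try:
    df.dtype                 # DataFrame has no singular dtype attribute
except AttributeError as exc:
    print("AttributeError:", exc)
```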
[ ]: df["item_price"] = df["item_price"].astype(float)
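Note that astype(float) only succeeds once the "$" prefix has been removed; applied directly to strings such as "$2.39" the conversion would be expected to fail. The stripping step is not shown above, so the sketch below is one possible way to do it, on a hypothetical sample Series:

```python
import pandas as pd

# Hypothetical sample of raw price strings, some with trailing whitespace.
prices = pd.Series(["$2.39 ", "$3.39", "$16.98 "], name="item_price")

cleaned = (
    prices.str.strip()   # drop surrounding whitespace
    .str.lstrip("$")     # drop the leading dollar sign
    .astype(float)       # now the strings parse as numbers
)

print(cleaned)
print(cleaned.dtype)
```

Assigning the result back with df["item_price"] = ... would then give a numeric column that supports sum(), mean() and numeric comparisons.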