Data Analysis and Visualization
Objectives
**Part 1: Clean and Analyze Data**
Scenario/Background
In this lab, you will learn how to use the pandas library to perform the preliminary steps that are
needed before performing any data analysis. These include removing missing values, changing the
format of data, and performing preliminary statistical analysis. After cleaning the data, you will
use matplotlib for data exploration and visualization.
Required Resources
1 PC with Internet access
Raspberry Pi version 2 or higher
Python libraries: datetime, csv, subprocess, pandas, numpy, matplotlib
Datafiles: rpi_data_compact.csv
Load data from the file rpi_data_compact.csv. This file contains measurements of internet
speed, acquired during the lab Internet_Speed_Data_Acquisition. In particular, the focus is on three
quantities: ping time (ms), download speed (Mbit/s), and upload speed (Mbit/s).
# Code Cell 1
import pandas as pd
import numpy as np
In [ ]:
# Code Cell 2
# Import data from csv file, and visualize the first rows
#df_compact =
#df_compact.?()
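One way to fill in the cell above is sketched below. So that the sketch runs anywhere, it reads a tiny inline stand-in for the lab file instead of `rpi_data_compact.csv` (the stand-in rows and the `Unnamed: 0` column name are assumptions; in the lab you would pass the filename directly to `pd.read_csv`):

```python
import io
import pandas as pd

# Tiny stand-in for rpi_data_compact.csv; the real file has the same column
# layout, including an extra index column produced when it was saved
csv_text = """Unnamed: 0,Date,Time,Ping (ms),Download (Mbit/s),Upload (Mbit/s)
0,27/10/2016,14:10:00,20.1,91.2,14.3
1,27/10/2016,14:11:00,19.8,88.7,14.1
"""

# In the lab: df_compact = pd.read_csv('rpi_data_compact.csv')
df_compact = pd.read_csv(io.StringIO(csv_text))

# head() displays the first rows (5 by default)
print(df_compact.head())
```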
As you may have noticed, the dataframe df_compact has an extra column. Use the
command df.drop to remove this column. Look at the Internet_Speed_Data_Acquisition lab for
help.
In [ ]:
# Code Cell 3
# Remove extra index columns
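A minimal sketch of `df.drop` on a synthetic frame; the extra column name `'Unnamed: 0'` is an assumption (it is what `read_csv` typically produces for a saved index), so check your own column names first:

```python
import pandas as pd

# Synthetic frame with the extra index column that read_csv often produces
df_compact = pd.DataFrame({
    'Unnamed: 0': [0, 1],
    'Ping (ms)': [20.1, 19.8],
})

# axis=1 drops a column; axis=0 would drop a row
df_compact = df_compact.drop('Unnamed: 0', axis=1)
print(df_compact.columns.tolist())  # ['Ping (ms)']
```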
c) Remove NaNs.
A common issue that affects data quality is the presence of NaN values. These can make data
analysis functions abruptly terminate the calculation, throw an error, or produce incorrect results.
Typically, NaN values represent a piece of information that should be contained in, but is missing
from, the dataset. In this example, NaN values in df_compact may represent measurements where
the Internet connection dropped, or queries from the Raspberry Pi that the Speedtest.net server failed
to answer. The code below finds out how many NaN values are contained in our dataframe.
First, go to http://pandas.pydata.org/pandas-docs/version/0.14.1/index.html and search
for isnull in the search box.
The documentation of the isnull function is a little difficult to understand. Fortunately, the Jupyter
interactive interface allows us to call this function and quickly examine its output.
In [ ]:
# Code Cell 4
NaNs_in_df = df_compact.isnull()
print(type(NaNs_in_df))
NaNs_in_df.head()
The outcome of the isnull function is a new DataFrame that contains True or False depending on
whether the corresponding element of the original DataFrame is NaN. Applying the function sum to
this DataFrame automatically converts each True into 1 and each False into 0.
In [ ]:
# Code Cell 5
NaNs_per_column = NaNs_in_df.sum()
print(type(NaNs_per_column))
NaNs_per_column.head()
The resulting NaNs_per_column is a pandas Series object, which can be thought of as a single
column of a DataFrame (a DataFrame is actually a dict of Series, where the keys are the column
names). A Series object offers almost all of the functionality of a DataFrame. Use
the sum function on the NaNs_per_column Series and display the outcome.
In [ ]:
# Code Cell 6
NaNs_total = NaNs_per_column.sum()
NaNs_total
It is possible to chain all of these instructions into one line, as follows:
In [ ]:
# Code Cell 7
df_compact.isnull().sum().sum()
Compute the number of missing values as a percentage of all the elements in the dataframe (round
the result to the second decimal
using numpy.round: https://docs.scipy.org/doc/numpy/reference/generated/numpy.round_.html).
Then use the pandas function dropna to remove the NaN values from df_compact.
Use the pandas function dropna to remove NaN values from df_compact.
In [ ]:
# Code Cell 8
NaNs_pct = np.round(df_compact.isnull().sum().sum() /
                    float(len(df_compact) * len(df_compact.columns)) * 100, decimals=2)
print('The DataFrame contains {} NaNs, equal to {}% of the measurements'.format(NaNs_total, NaNs_pct))
The function dropna, when called with only default parameters, removes every row of the DataFrame
in which any value is NaN.
In [ ]:
# Code Cell 9
# Remove NaN values
df_compact_clean = df_compact.dropna()
Compare the length of the values before and after using dropna. Do you notice something odd?
Why?
In [ ]:
# Code Cell 10
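One way to carry out the comparison is sketched below on a small synthetic frame (the values are made up). Note that the row count can shrink by less than the total NaN count, because dropna removes whole rows and several NaNs may sit in the same row:

```python
import numpy as np
import pandas as pd

# Synthetic frame: two NaNs in one row, one NaN in another
df_compact = pd.DataFrame({
    'Ping (ms)': [20.1, np.nan, 19.5, 21.0],
    'Download (Mbit/s)': [91.2, np.nan, np.nan, 89.9],
})

df_compact_clean = df_compact.dropna()

# 3 NaN values, but only 2 rows removed
print(df_compact.isnull().sum().sum())          # 3
print(len(df_compact), len(df_compact_clean))   # 4 2
```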
The columns for ping, upload, and download in the dataframe df_compact contain numerical
values. It is therefore reasonable to expect that they all share one datatype, for example float64.
This, however, is not the case, as can be verified using dtypes:
In [ ]:
# Code Cell 11
# Ping and Download are not floats
df_compact.dtypes
Use the Python function float() to convert a string value into a float.
In [ ]:
# Code Cell 12
str_val = '10.56'
float_val = float(str_val)
print(str_val, type(str_val), float_val, type(float_val))
Now convert all the values of the columns 'Ping (ms)' and 'Download (Mbit/s)' into float. Hint:
use apply and lambda. For help, look at the Internet_Speed_Data_Acquisition lab.
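A minimal sketch of the apply/lambda pattern on a synthetic frame (the values are made up): apply calls the lambda on every element of the Series and returns a new Series, which is stored in a new column.

```python
import pandas as pd

# Synthetic frame where the numeric columns were read as strings
df_compact = pd.DataFrame({
    'Ping (ms)': ['20.1', '19.8'],
    'Download (Mbit/s)': ['91.2', '88.7'],
})

# Convert each string element to float, storing the result in new columns
df_compact['Ping (ms) float'] = df_compact['Ping (ms)'].apply(lambda v: float(v))
df_compact['Download (Mbit/s) float'] = df_compact['Download (Mbit/s)'].apply(lambda v: float(v))

print(df_compact.dtypes)
```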
In [ ]:
# Code Cell 14
# this disables a notebook warning that is not relevant for our use case
pd.options.mode.chained_assignment = None
# Code Cell 15
# Remove the original 'Ping (ms)' and 'Download (Mbit/s)' columns
# Rename the new 'Ping (ms) float' and 'Download (Mbit/s) float' columns
# to 'Ping (ms)' and 'Download (Mbit/s)'
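The drop-and-rename step can be sketched as follows on a one-row synthetic frame (the values are made up); the same pattern applies to the download column:

```python
import pandas as pd

# Synthetic frame holding both the old string column and the new float column
df = pd.DataFrame({
    'Ping (ms)': ['20.1'],
    'Ping (ms) float': [20.1],
})

# Drop the old string column, then rename the float column to the original name
df = df.drop('Ping (ms)', axis=1)
df = df.rename(columns={'Ping (ms) float': 'Ping (ms)'})
print(df.dtypes)
```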
In [ ]:
# Code Cell 16
df_compact_clean.head()
Before saving the DataFrame, it makes sense to reposition Upload as the last column. This can be
achieved using the reindex function.
In [ ]:
# Code Cell 17
df_compact_clean = df_compact_clean.reindex(columns=['Date', 'Time', 'Ping (ms)',
                                                     'Download (Mbit/s)', 'Upload (Mbit/s)'])
df_compact_clean.head()
Now that the dataset is finally clean, store it in a csv file under a new name, and give the DataFrame a shorter variable name.
In [ ]:
# Code Cell 18
# Let's save the new cleaned dataframe to a csv
df_compact_clean.to_csv('./rpi_data_processed.csv', index=False)
df_clean = df_compact_clean
New data requires not only cleaning, but also a good deal of getting used to. When you start a data
analysis project, it is worthwhile to invest effort in exploring and calculating some basic statistical
properties of the data. This entails computing averages, standard deviations, and correlations.
a) Calculate mean and standard deviation using Pandas.
The mean and the standard deviation of all the columns of a DataFrame can be computed
using mean() and std() . Look for them in the pandas library documentation and apply them to
the df_clean DataFrame
Quote the results as quantity = mean ± standard_deviation. Do not forget to include the
units of measurement associated with each quantity.
In [ ]:
# Code Cell 19
# Compute mean and std for all the columns of df_compact
# SOLUTION:
# means = ...
# stands = ...
# Print the mean value ± the standard deviation, including measuring units
print('Average ping time: {} ± {} ms'.format(stats_ping[0],stats_ping[1]))
print('Average download speed: {} ± {} Mbit/s'.format(*stats_download))
print('Average upload speed: {} ± {} Mbit/s'.format(*stats_upload))
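One way to fill in the solution cell above is sketched below on a synthetic stand-in for df_clean (the numbers are made up; the `stats_*` pairing matches the print statements above):

```python
import pandas as pd

# Synthetic stand-in for df_clean, numeric columns only
df_clean = pd.DataFrame({
    'Ping (ms)': [20.0, 22.0, 21.0],
    'Download (Mbit/s)': [90.0, 92.0, 91.0],
    'Upload (Mbit/s)': [14.0, 16.0, 15.0],
})

# mean() and std() return one value per column as a Series
means = df_clean.mean()
stands = df_clean.std()

# Pair mean and standard deviation per quantity
stats_ping = (means['Ping (ms)'], stands['Ping (ms)'])
stats_download = (means['Download (Mbit/s)'], stands['Download (Mbit/s)'])
stats_upload = (means['Upload (Mbit/s)'], stands['Upload (Mbit/s)'])

# Quote each quantity as mean ± standard deviation, with units
print('Average ping time: {:.2f} ± {:.2f} ms'.format(*stats_ping))
print('Average download speed: {:.2f} ± {:.2f} Mbit/s'.format(*stats_download))
print('Average upload speed: {:.2f} ± {:.2f} Mbit/s'.format(*stats_upload))
```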
Search in the pandas library documentation for how to compute the minimum and the maximum
values for all the columns in the DataFrame.
In [ ]:
# Code Cell 23
# Compute max and min for all the columns of df_compact
mins = df_clean.min()
maxs = df_clean.max()
Execute the following line of code. Notice how much time the pandas library can save you with even
a single line of code!
In [ ]:
# Code Cell 24
df_clean.describe()
Using the pandas idxmin and idxmax functions (named argmin and argmax in older pandas
versions), find the dates and times corresponding to the longest and shortest ping time, the lowest
and highest download speed, and the lowest and highest upload speed.
In [ ]:
# Code Cell 25
# Find the min and max ping time
argmin_ping = df_clean['Ping (ms)'].idxmin()  # idxmin returns the row label, usable with .loc
argmax_ping = df_clean['Ping (ms)'].idxmax()
# Code Cell 26
# Create a small DataFrame and access its rows using iloc
# Code Cell 28
#Print the corresponding Date and Time
# print('Ping measure reached minimum on {} at {}'.format(df_clean.loc[...],
#                                                         df_clean.loc[...]))
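A sketch of the whole lookup on a synthetic stand-in for df_clean (the rows are made up). Note that after dropna the row labels may have gaps, which is why idxmin/idxmax (which return labels) pair correctly with .loc, while positional argmin/argmax would not:

```python
import pandas as pd

# Synthetic stand-in for df_clean
df_clean = pd.DataFrame({
    'Date': ['27/10/2016', '27/10/2016', '28/10/2016'],
    'Time': ['14:10:00', '14:11:00', '09:00:00'],
    'Ping (ms)': [20.1, 18.7, 25.3],
})

# idxmin/idxmax return the row label of the extreme value
argmin_ping = df_clean['Ping (ms)'].idxmin()
argmax_ping = df_clean['Ping (ms)'].idxmax()

# .loc[label, column] retrieves the corresponding Date and Time
print('Ping measure reached minimum on {} at {}'.format(
    df_clean.loc[argmin_ping, 'Date'], df_clean.loc[argmin_ping, 'Time']))
print('Ping measure reached maximum on {} at {}'.format(
    df_clean.loc[argmax_ping, 'Date'], df_clean.loc[argmax_ping, 'Time']))
```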
e) Create a correlation matrix.
It is useful to analyze if the speed of download tends to increase and decrease together with the
speed of upload. The reasoning behind this would be that network usage and technical issues ought
to affect download and upload equally.
In this scenario, download and upload speeds would be called positively correlated. This means that
faster download and upload would typically occur together. This would refer to the general trend, but
instances of fast download with slow upload would still be possible.
On the other hand, you may infer that a higher download speed implies a lower upload speed, and
vice-versa. In this case, the argument would be that the internet line can only support a limited
amount of information being exchanged. Download and upload would then compete, and keep each
other 'in check'.
In this scenario, download and upload speeds would be called negatively correlated. This means
that download would typically be faster when upload is slower, and vice-versa. As before, this would
refer to a trend, so that simultaneous good download and upload speeds would still be possible.
To complete the picture, the time of ping may be positively or negatively correlated with either
upload or download. It is then natural to think of a table, where each quantity is compared with all
others. Such tables are well-known mathematical objects, and are dubbed correlation matrices.
Use the pandas function corr to derive the correlation matrix of ping, upload, and download. Store
the result in a variable called df_corr.
In [ ]:
# Code Cell 29
# Are these variables correlated?
df_corr = df_clean.corr()
df_corr
In [ ]:
# Code Cell 30
corr = df_corr.values
print('Correlation coefficient between ping and download: {}'.format(corr[0, 1]))
print('Correlation coefficient between ping and upload: {}'.format(corr[0, 2]))
print('Correlation coefficient between upload and download: {}'.format(corr[2, 1]))
These numbers answer the questions about the 'relationship' between ping, download, and upload.
Perfect positive correlation yields a value of +1, whereas perfect negative correlation yields a value
of -1. Here, the correlations between download and ping, and between download and upload, are
close to zero, and the correlation between upload and ping is small. This leads to the
conclusion that the three quantities are, in fact, mutually uncorrelated.
Python has a comprehensive library for making plots, called Matplotlib. As an additional learning
resource, it is certainly worth taking a look at the official Matplotlib documentation, and in particular
at the numerous examples.
a) Import Matplotlib.
In [ ]:
# Code Cell 31
import matplotlib.pyplot as plt
# The following allows your Jupyter notebook to create plots inside a cell
%matplotlib inline
Based on what you learned in the SF_Crime_Data lab, generate a plot containing three lines: 1. ping
(ms) as a function of time, 2. upload (Mbit/s) as a function of time, and 3. download (Mbit/s) as a
function of time. Use the legend() function to add a legend to your graph, but do not worry about
labelling the axes. We will work out how to do that in a later task.
In [ ]:
# Code Cell 32
# Initialise figure
fig, ax = plt.subplots(figsize=(10, 5))
# Create x-axis
t = pd.to_datetime(df_clean['Time'])
# Insert legend
ax.legend()
plt.show()
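One way to complete the cell above is sketched below on a synthetic stand-in for df_clean (the rows are made up; the `Agg` backend line is only there so the sketch also runs outside Jupyter and is unnecessary in the notebook):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for df_clean
df_clean = pd.DataFrame({
    'Time': ['14:10:00', '14:11:00', '14:12:00'],
    'Ping (ms)': [20.1, 18.7, 25.3],
    'Download (Mbit/s)': [91.2, 88.7, 90.1],
    'Upload (Mbit/s)': [14.3, 14.1, 15.0],
})

# Initialise figure and x-axis
fig, ax = plt.subplots(figsize=(10, 5))
t = pd.to_datetime(df_clean['Time'])

# One line per quantity; the label argument feeds the legend
ax.plot(t, df_clean['Ping (ms)'], label='Ping (ms)')
ax.plot(t, df_clean['Download (Mbit/s)'], label='Download (Mbit/s)')
ax.plot(t, df_clean['Upload (Mbit/s)'], label='Upload (Mbit/s)')

ax.legend()
# In the notebook, plt.show() displays the figure here
```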
Because ping measurements include large and abrupt variations, they are perhaps better visualized
using dots. Within the command ax.plot(...) for displaying ping data, specify that these
measurements are represented as dots. (Most of the code here can be recycled from the previous
task.)
In [ ]:
# Code Cell 33
# Initialise figure
fig, ax = plt.subplots(figsize=(10, 5))
#ax.plot(...
#ax.plot(...
#ax.plot(...
# Insert legend
# Code Cell 35
# Initialise figure
fig, ax = plt.subplots(figsize=(10, 5))
# Insert legend
ax.legend()
Use the 'fivethirtyeight' style context to make the previous graph more visually appealing.
To do this, wrap your plotting code in a with statement, placed before the calls to
the Matplotlib functions.
In [ ]:
# Code Cell 36
# Use a style context
#with ...
# Initialise figure
#fig, ax =
f) Create a Histogram.
A histogram is a graphical representation of the frequency of the values of numerical data. Examine
the code below. An additional level of complexity is the use of subplots to display the histograms
side-by-side.
In [ ]:
# Code Cell 37
with plt.style.context('fivethirtyeight'):
    nbins = 100
    # Initialize figure
    fig, ax = plt.subplots(2, 2, figsize=(10, 10))
    ax[0][0].hist(df_clean['Ping (ms)'], nbins)
    ax[0][0].set_xlabel('Ping (ms)', fontsize=16)
    ax[0][0].tick_params(labelsize=14)
    ax[0][1].hist(df_clean['Upload (Mbit/s)'], nbins)
    ax[0][1].set_xlabel('Upload (Mbit/s)', fontsize=16)
    ax[0][1].tick_params(labelsize=14)
    ax[1][0].hist(df_clean['Download (Mbit/s)'], nbins)
    ax[1][0].set_xlabel('Download (Mbit/s)', fontsize=16)
    ax[1][0].tick_params(labelsize=14)
    ax[1][1].set_visible(False)
Scenario / Background
In statistics, linear regression is a way to model the relationship between a dependent variable y and
an independent variable x. For simple linear regression, this relationship takes the form y = mx + b,
where m is the slope and b is the y-intercept.
In this lab, you will analyze district sales data and perform a simple linear regression to predict
annual net sales based on the number of stores in the district.
Required Resources
1 PC with Internet access
Python libraries: pandas, numpy, scipy, and matplotlib
Datafiles: stores-dist.csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
In [ ]:
# Code Cell 1
In [ ]:
# Code Cell 3
# The district column has no relevance at this time, so it can be dropped.
salesDist = salesDist.rename(columns={'annual net sales': 'sales',
                                      'number of stores in district': 'stores'})
sales = salesDist.drop('district', axis=1)
salesDist.head()
sales.head()
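The correlation coefficient can be computed the same way as in Part 1. A minimal sketch with made-up numbers standing in for the stores-dist.csv data (the real values come from the file; only the renamed column names are taken from the lab):

```python
import pandas as pd

# Synthetic stand-in for the renamed sales data
sales = pd.DataFrame({
    'sales': [231.0, 156.0, 10.0, 519.0, 437.0],
    'stores': [12, 13, 16, 2, 6],
})

# corr() builds the full correlation matrix for the numeric columns
print(sales.corr())
```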
From the correlation coefficient data, what type of correlation did you observe between annual net
sales and the number of stores in the district?
<font color = 'gray'>Type your answer here.</font>
# Create a scatter plot: Number of Stores in the District vs. Annual Net Sales
plt.plot(x, y, 'o', markersize=15)
Step 1: Calculate the slope and y-intercept of the linear regression line.
In [ ]:
# Code Cell 8
# Use numpy polyfit for linear regression to fit the data
# Generate the slope of the line (m)
# Generate the y-intercept (b)
m, b = np.polyfit(x,y,1)
print ('The slope of line is {:.2f}.'.format(m))
print ('The y-intercept is {:.2f}.'.format(b))
print('The best fit simple linear regression line is {:.2f}x + {:.2f}.'.format(m, b))
Step 3: Overlay the regression line and the centroid point on the plot.
In [ ]:
# Code Cell 10
# Create the plot inline
%matplotlib inline
# Create legend
plt.legend(loc = 'upper right', fontsize = 20)
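The overlay can be sketched as follows, with made-up x/y stand-ins for the stores and sales arrays (the `Agg` backend line is only there so the sketch also runs outside Jupyter). A useful property: the regression line always passes through the centroid (mean(x), mean(y)).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for stores (x) and annual net sales (y)
x = np.array([2.0, 6.0, 12.0, 13.0, 16.0])
y = np.array([519.0, 437.0, 231.0, 156.0, 10.0])

# Slope and y-intercept from a degree-1 polynomial fit
m, b = np.polyfit(x, y, 1)

fig = plt.figure(figsize=(10, 5))
plt.plot(x, y, 'o', markersize=15, label='data')
# Regression line evaluated at the observed x values
plt.plot(x, m * x + b, '-', label='regression line')
# Centroid point (mean of x, mean of y)
plt.plot(x.mean(), y.mean(), '*', markersize=20, label='centroid')
plt.legend(loc='upper right', fontsize=20)
# In the notebook, plt.show() displays the figure here
```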
Step 4: Prediction
Using the linear regression line, you can predict the annual net sales based on the number of stores
in the district.
In [ ]:
# Code Cell 11
# Function to predict the net sales from the regression line
def predict(query):
    if query >= 1:
        return m * query + b
    else:
        print("You must have at least 1 store in the district to predict the annual net sales.")
In [ ]:
# Code Cell 12
# Enter the number of stores in the function to generate the net sales prediction.
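A sketch of how the function is called, using made-up slope and intercept values (in the lab, m and b come from np.polyfit on the real data, so your predicted value will differ):

```python
# Made-up coefficients standing in for the np.polyfit results
m, b = -30.0, 600.0

def predict(query):
    if query >= 1:
        return m * query + b
    print("You must have at least 1 store in the district to predict the annual net sales.")

print(predict(4))   # -30*4 + 600 = 480.0
print(predict(0))   # prints the warning and returns None
```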
What is the predicted net sales if there are 4 stores in the district?
<font color = 'gray'>Type your answer here.</font>