
Yulu SRK

Yulu is India's leading micro-mobility service provider focused on shared electric cycles, aiming to reduce traffic congestion. The company has engaged a consulting firm to analyze factors influencing the demand for these cycles in the Indian market, particularly through dataset analysis and statistical testing. Key findings include significant correlations between various weather conditions and rental counts, with a notable impact of favorable weather on demand.

Uploaded by

gsrkrocky
Copyright
© All Rights Reserved

yulu-srk

May 10, 2024

##About Yulu
Yulu is India’s leading micro-mobility service provider, which offers unique vehicles for the daily
commute. Starting off as a mission to eliminate traffic congestion in India, Yulu provides the
safest commute solution through a user-friendly mobile app to enable shared, solo and sustainable
commuting. Yulu zones are located at all the appropriate locations (including metro stations, bus
stands, office spaces, residential areas, corporate offices, etc) to make those first and last miles
smooth, affordable, and convenient! Yulu has recently suffered considerable dips in its revenues.
They have contracted a consulting company to understand the factors that affect the demand for
these shared electric cycles in the Indian market.
##Problem Statement
• Which variables are significant in predicting the demand for shared electric cycles in the
Indian market?
• How well do those variables describe the demand for electric cycles?
[148]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

[149]: df = pd.read_csv("https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/428/original/bike_sharing.csv?1642089089")

##Dataset Analysis
[150]: df.head()

[150]:               datetime  season  holiday  workingday  weather  temp   atemp  humidity  windspeed  casual  registered  count
       0  2011-01-01 00:00:00       1        0           0        1  9.84  14.395        81        0.0       3          13     16
       1  2011-01-01 01:00:00       1        0           0        1  9.02  13.635        80        0.0       8          32     40
       2  2011-01-01 02:00:00       1        0           0        1  9.02  13.635        80        0.0       5          27     32
       3  2011-01-01 03:00:00       1        0           0        1  9.84  14.395        75        0.0       3          10     13
       4  2011-01-01 04:00:00       1        0           0        1  9.84  14.395        75        0.0       0           1      1

[151]: df.describe()

[151]: season holiday workingday weather temp \


count 10886.000000 10886.000000 10886.000000 10886.000000 10886.00000
mean 2.506614 0.028569 0.680875 1.418427 20.23086
std 1.116174 0.166599 0.466159 0.633839 7.79159
min 1.000000 0.000000 0.000000 1.000000 0.82000
25% 2.000000 0.000000 0.000000 1.000000 13.94000
50% 3.000000 0.000000 1.000000 1.000000 20.50000
75% 4.000000 0.000000 1.000000 2.000000 26.24000
max 4.000000 1.000000 1.000000 4.000000 41.00000

atemp humidity windspeed casual registered \


count 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000
mean 23.655084 61.886460 12.799395 36.021955 155.552177
std 8.474601 19.245033 8.164537 49.960477 151.039033
min 0.760000 0.000000 0.000000 0.000000 0.000000
25% 16.665000 47.000000 7.001500 4.000000 36.000000
50% 24.240000 62.000000 12.998000 17.000000 118.000000
75% 31.060000 77.000000 16.997900 49.000000 222.000000
max 45.455000 100.000000 56.996900 367.000000 886.000000

count
count 10886.000000
mean 191.574132
std 181.144454
min 1.000000
25% 42.000000
50% 145.000000
75% 284.000000
max 977.000000

[152]: df.isnull().sum()

[152]: datetime 0
season 0
holiday 0
workingday 0
weather 0
temp 0
atemp 0
humidity 0

windspeed 0
casual 0
registered 0
count 0
dtype: int64

There are no null values.


[153]: df.duplicated().sum()

[153]: 0

There are no duplicated rows.


##Univariate Analysis
[155]: # Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['temp'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Temperature')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.show()

Temperature appears to follow a roughly normal distribution, with a small cliff between 15 and 30 degrees Celsius.

[156]: # Box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['humidity'], color='lightgreen')  # explicit x so the axis label below matches
plt.title('Distribution of Humidity')
plt.xlabel('Humidity')
plt.show()

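[157]: # Bar plot of the average count per weather category
plt.figure(figsize=(10, 6))
# x places one bar per weather type; hue only drives the palette (legend=False needs seaborn >= 0.13)
sns.barplot(x='weather', y='count', hue='weather', data=df, palette='Set1', legend=False)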
plt.title('Average Count Across Weather')
plt.xlabel('Weather')
plt.ylabel('Average Count')
plt.show()

[158]: # Pairwise Pearson correlations between the numeric columns (.corr() already returns a DataFrame)
correlation_matrix = df[["atemp", "temp", "humidity", "windspeed", "casual",
                         "registered", "count"]].corr()

plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True)
plt.show()

High Positive Correlation:
1. Atemp and Temp (0.98): There exists an exceptionally strong correlation between the
actual temperature in Celsius and the perceived temperature in Celsius.
2. Count and Registered (0.97): There is an extremely strong correlation between the total
count of rented bikes (including both casual and registered) and the count of registered users.

Moderate Positive Correlation:


1. Casual and Count (0.69): A moderately positive correlation is observed between the total
count of users and the count of casual users.
2. Casual and Registered (0.5): A moderate positive correlation is found between the count
of casual users and the count of registered users.

Low Positive Correlation:


1. Count vs Atemp & temp: The correlation between the total count of users and the
temperature (both actual and perceived) is weak.
2. Casual/Registered vs Atemp & temp: The counts of casual and registered users each
exhibit a weak correlation with the temperature (both actual and perceived).
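
The grouping of pairs into high, moderate and low correlation can also be done programmatically rather than by reading the heatmap. The following is a minimal sketch, not part of the original notebook: `strong_pairs` is a hypothetical helper, the 0.9 threshold is an arbitrary illustration, and the `demo` frame is synthetic data rather than the Yulu dataset.

```python
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Return unique variable pairs with |correlation| >= threshold, strongest first."""
    cols = list(corr.columns)
    pairs = {}
    for i, a in enumerate(cols):
        # Only look at columns after `a` so each pair is reported once
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs[(a, b)] = r
    return pd.Series(pairs, dtype=float).sort_values(key=abs, ascending=False)

# Synthetic demonstration data (hypothetical, not the Yulu dataset)
demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.1, 5.9, 8.2, 9.9],  # nearly linear in x
    "z": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to x and y
})
result = strong_pairs(demo.corr(), threshold=0.9)
print(result)
```

On the notebook's data the same helper could be called as `strong_pairs(correlation_matrix, 0.5)`, assuming `correlation_matrix` is defined as in the cell above.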
##Bivariate Analysis
[159]: # Scatter plot for Count vs Registered
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['count'], y=df['registered'], color='green')

plt.title('Total Count vs Registered Count')
plt.xlabel('Total Count')
plt.ylabel('Registered Count')
plt.show()

[160]: # Scatter plot for Casual vs Registered


plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['casual'], y=df['registered'], color='purple')
plt.title('Casual Count vs Registered Count')
plt.xlabel('Casual Count')
plt.ylabel('Registered Count')
plt.show()

[161]: # Box plot for Season vs Count
plt.figure(figsize=(10, 6))
sns.boxplot(x='season', y='count', hue='season', data=df, palette='muted', legend=False)  # one box per season
plt.title('Season vs Count')
plt.xlabel('Season')
plt.ylabel('Count')
plt.show()

[162]: # Bar plot for Weather vs Count
plt.figure(figsize=(10, 6))
sns.barplot(x='weather', y='count', hue='weather', data=df, palette='pastel', legend=False)  # one bar per weather type
plt.title('Weather vs Count')
plt.xlabel('Weather')
plt.ylabel('Count')
plt.show()

[163]: # Box plot for Holiday vs Count
plt.figure(figsize=(10, 6))
sns.boxplot(x='holiday', y='count', hue='holiday', data=df, palette='bright', legend=False)  # one box per holiday flag
plt.title('Holiday vs Count')
plt.xlabel('Holiday')
plt.ylabel('Count')
plt.show()

[164]: # Bar plot for Workingday vs Count
plt.figure(figsize=(10, 6))
sns.barplot(x='workingday', y='count', hue='workingday', data=df, palette='Set2', legend=False)  # one bar per flag value
plt.title('Workingday vs Count')
plt.xlabel('Workingday')
plt.ylabel('Count')
plt.show()

##Effect of Working Day on Bikes rented
1. Comparison of Two Groups: The 2-sample t-test is appropriate for analyzing the effect
of categorization into “Working Day” and “Holiday” (any non-working day) on the number of
electric cycles rented. This test specifically compares the average count of rentals between
these two distinct groups.
2. Parametric Test for Means: Given the normality assumption and the numerical nature of the
data, the 2-sample t-test is well-suited for this analysis. It is a parametric method
designed for comparing the means of two independent groups.
• Null Hypothesis (H0): There is no significant difference in the average number of electric
bikes rented between working days and holidays.
– H0: The mean number of rentals on working days equals the mean number of rentals on
holidays.
• Alternate Hypothesis (H1): The average number of bike rentals on working days differs
from that on holidays.
– H1: The mean number of rentals on working days is not equal to the mean number of
rentals on holidays.
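
To make the test statistic behind these hypotheses concrete, here is a minimal sketch of the pooled-variance (Student's) two-sample t statistic, which is what `scipy.stats.ttest_ind` computes with its default `equal_var=True`. The helper name `pooled_t_statistic` and the two small samples are hypothetical illustrations, not values from the Yulu data.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Difference in means divided by its estimated standard error
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical toy samples, chosen only to keep the arithmetic easy to follow
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.4f}")
```

A statistic close to zero means the difference between the group means is small relative to the spread within the groups, which is the situation H0 describes.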
[165]: df['Working_day'] = 'Holiday'
df.loc[df['workingday'] == 1, 'Working_day'] = 'Working Day'

[166]: from scipy.stats import ttest_ind

working_day_counts = df[df['Working_day'] == 'Working Day']['count']
holiday_counts = df[df['Working_day'] == 'Holiday']['count']

# Perform 2-sample t-test
t_val, p_val = ttest_ind(working_day_counts, holiday_counts)

print("2-Sample T-Test Results:")
print(f"T-Statistic: {t_val:.2f}")
print(f"P-Value: {p_val:.2f}")

alpha = 0.05  # 5% significance level (95% confidence)

# Compare the unrounded p-value with alpha; rounding is for display only
if p_val < alpha:
    print("The difference in mean count between Working Days and Holidays is significant.")
else:
    print("There is no significant difference in mean count between Working Days and Holidays.")
    print("We do not have enough evidence to support the hypothesis that Working Day "
          "significantly affects the number of bikes rented compared to holidays.")

2-Sample T-Test Results:


T-Statistic: 1.21
P-Value: 0.23
There is no significant difference in mean count between Working Days and
Holidays.
We do not have enough evidence to support the hypothesis that Working Day
significantly affects the number of bikes rented compared to holidays.
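
The t-test above relies on assumptions (equal variances, roughly normal sample means) that the notebook does not check. As a hedged sketch of one way to check them, the block below runs Levene's test for equal variances and Welch's t-test, which drops the equal-variance assumption. The two groups are synthetic Poisson samples generated purely for illustration; they are stand-ins, not the Yulu counts.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-ins for two groups of hourly rental counts (assumed data)
group_a = rng.poisson(lam=190, size=500)
group_b = rng.poisson(lam=185, size=300)

# Levene's test: H0 is that both groups have equal variances
lev_stat, lev_p = levene(group_a, group_b)

# Welch's t-test: compares means without assuming equal variances
welch_t, welch_p = ttest_ind(group_a, group_b, equal_var=False)

print(f"Levene p-value: {lev_p:.3f}")
print(f"Welch t: {welch_t:.2f}, p-value: {welch_p:.3f}")
```

With the notebook's data, `working_day_counts` and `holiday_counts` could be passed in place of the synthetic groups. If Levene's test rejects equal variances, Welch's version is generally the safer choice.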
##Checking if the mean number of bikes rented is the same across weather conditions
[167]: df.groupby(by = 'weather')['count'].describe()

[167]: count mean std min 25% 50% 75% max


weather
1 7192.0 205.236791 187.959566 1.0 48.0 161.0 305.0 977.0
2 2834.0 178.955540 168.366413 1.0 41.0 134.0 264.0 890.0
3 859.0 118.846333 138.581297 1.0 23.0 71.0 161.0 891.0
4 1.0 164.000000 NaN 164.0 164.0 164.0 164.0 164.0

• Null Hypothesis (H0): There is no significant difference in the demand for bikes across
different weather conditions.
– H0: The mean demand for bikes is the same for all weather conditions.
• Alternate Hypothesis (H1): There is a significant difference in the demand for bikes across
different weather conditions.
– H1: The mean demand for bikes varies across different weather conditions.
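
These hypotheses are tested with a one-way ANOVA, whose F statistic is the ratio of between-group variability to within-group variability. The sketch below computes that ratio by hand to show what the test measures; `one_way_f` is a hypothetical helper and the three small groups are made-up numbers, not weather groups from the dataset.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F statistic: mean square between groups / mean square within groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical toy groups chosen to keep the arithmetic simple
f_stat = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would explain, which is evidence against H0. `scipy.stats.f_oneway` computes the same ratio and also returns the corresponding p-value.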
[168]: sns.boxplot(data=df, hue='weather', y='count', palette="Set3")

[168]: <Axes: ylabel='count'>

[169]: sns.histplot(data = df, x = 'count', hue = 'weather',palette="Set3")

[169]: <Axes: xlabel='count', ylabel='Count'>

Observations:
• There’s a noticeable trend in the median rental counts across different weather types, with
Type 1 showing the highest median count (161), followed by Type 2 (134) and then Type 3 (71).
• This trend suggests that Type 1 weather conditions are associated with the highest demand
for shared electric bikes, likely due to favorable conditions that encourage increased
micro-mobility service usage.
• Types 2 and 3 still attract usage, but at lower levels than Type 1.
• Weather Type 4 has only a single observation in the dataset (a count of 164), so no
conclusion about demand under this condition can be drawn from it.
• Weather Type 4 represents extreme or unfavorable conditions such as heavy rain, ice pellets,
and thunderstorms, which are rare in the data and likely discourage users from using
micro-mobility services.
[170]: from scipy.stats import f_oneway

weather1 = df[df['weather'] == 1]['count']
weather2 = df[df['weather'] == 2]['count']
weather3 = df[df['weather'] == 3]['count']
weather4 = df[df['weather'] == 4]['count']
anova_stat, p_val = f_oneway(weather1, weather2, weather3, weather4)

if p_val < alpha:
    print("There is a significant difference in the demand for bikes across different weather conditions.")
    print("We reject the null hypothesis (H0) that the mean demand for bikes is the same for all weather conditions.")
else:
    print("There is no significant difference in the demand for bikes across different weather conditions.")
    print("We do not have enough evidence to reject the null hypothesis (H0) "
          "that the mean demand for bikes is the same for all weather conditions.")

There is a significant difference in the demand for bikes across different weather conditions.
We reject the null hypothesis (H0) that the mean demand for bikes is the same for all weather conditions.
##Checking if the mean number of bikes rented is the same across seasons
[171]: # Map the numeric season codes to readable names
season_names = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
df['season_name'] = df['season'].map(season_names)

• Null Hypothesis (H0): There is no significant difference in the demand for bikes across
different seasons.
– H0: The mean demand for bikes is the same for all seasons.
• Alternative Hypothesis (H1): There is a significant difference in the demand for bikes
across different seasons.
– H1: The mean demand for bikes varies across different seasons.
[172]: sns.boxplot(data=df, hue='season_name', y='count', palette="Set3")

[172]: <Axes: ylabel='count'>

[173]: sns.kdeplot(data = df, x = 'count', hue = 'season_name',palette="Set3")

[173]: <Axes: xlabel='count', ylabel='Density'>

Observations:
• Fall Popularity: Fall shows the highest median demand, suggesting it is the peak season
for bike rentals, likely due to pleasant weather conditions that encourage cycling.
• Spring’s Low Demand: Spring shows the lowest median demand of the four seasons.
• Spring’s Unpredictability: Spring also shows a large number of outliers, indicating that
some hours see demand far above its typical level.
• Winter and Summer Moderation: Winter and Summer see a moderate level of demand,
possibly influenced by weather extremes that can deter riders in those seasons.
[174]: spring = df[df['season'] == 1]['count']
summer = df[df['season'] == 2]['count']
fall = df[df['season'] == 3]['count']
winter = df[df['season'] == 4]['count']

anova_stat, p_val = f_oneway(spring, summer, fall, winter)

if p_val < alpha:
    print("There is a significant difference in the demand for bikes across different seasons.")
    print("We reject the null hypothesis (H0) that the mean demand for bikes is the same for all seasons.")
else:
    print("There is no significant difference in the demand for bikes across different seasons.")
    print("We do not have enough evidence to reject the null hypothesis (H0) "
          "that the mean demand for bikes is the same for all seasons.")

There is a significant difference in the demand for bikes across different seasons.
We reject the null hypothesis (H0) that the mean demand for bikes is the same for all seasons.
##Relation Between Weather and Season
1. Test for Independence: The chi-square test of independence was chosen to assess the
relationship between two categorical variables: weather and season. By examining whether the
distribution of weather conditions varies significantly across different seasons, this test
evaluates whether there is a statistically significant association between these variables.
2. Expected vs. Observed Frequencies: The chi-square test compares observed frequencies
(actual counts) of weather categories within each season to expected frequencies under the
assumption of independence. By determining whether the observed distribution of weather
across seasons deviates significantly from what would be expected by chance, this analysis
helps identify associations between weather and season categories.
• Null Hypothesis (H0): There is no association between weather and season. The observed
frequencies in the contingency table are consistent with what would be expected by chance alone.
• Alternate Hypothesis (H1): There is a non-random association between weather and season.
The observed frequencies in the contingency table deviate significantly from what would be
expected under independence.
[175]: contingency_table = pd.crosstab(df['weather'], df['season'])
contingency_table

[175]: season 1 2 3 4
weather
1 1759 1801 1930 1702
2 715 708 604 807
3 211 224 199 225
4 1 0 0 0

[176]: from scipy.stats import chi2_contingency

chi_val , p_val , dof , ef = chi2_contingency(contingency_table)
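
To illustrate the "expected vs. observed" comparison described above, the following sketch recomputes the expected frequencies and the chi-square statistic by hand from the contingency table shown in the output of cell [175]. It is an illustration of the calculation, not the notebook's own code; the variable names are hypothetical, and the threshold of 5 for small expected counts is a common rule of thumb rather than something stated in the original.

```python
# Observed counts copied from the contingency table above
# (rows: weather 1-4, columns: season 1-4)
observed = [
    [1759, 1801, 1930, 1702],
    [715, 708, 604, 807],
    [211, 224, 199, 225],
    [1, 0, 0, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Rule-of-thumb check (assumption): count cells whose expected frequency is below 5
small_cells = sum(e < 5 for row in expected for e in row)
print(f"chi2 = {chi2:.2f}, dof = {dof}, cells with expected count < 5: {small_cells}")
```

All four cells in the weather 4 row have expected counts well below 5 because that category contains a single observation. Under the usual rule of thumb this weakens the chi-square approximation, so it may be worth repeating the test with that row dropped or merged into weather 3 before relying on the result.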

[179]: if p_val < alpha:
    print("There is a statistically significant relationship between weather and season.")
else:
    print("There is no statistically significant relationship between weather and season.")

There is a statistically significant relationship between weather and season.


So we reject the null hypothesis that weather and season are independent.
