Yulu SRK
##About Yulu
Yulu is India’s leading micro-mobility service provider, which offers unique vehicles for the daily
commute. Starting off as a mission to eliminate traffic congestion in India, Yulu provides the
safest commute solution through a user-friendly mobile app to enable shared, solo and sustainable
commuting. Yulu zones are located at all the appropriate locations (including metro stations, bus
stands, office spaces, residential areas, corporate offices, etc.) to make those first and last miles
smooth, affordable, and convenient. Yulu has recently suffered considerable dips in its revenues
and has contracted a consulting company to understand the factors on which the demand for
these shared electric cycles depends in the Indian market.
##Problem Statement
• Which variables are significant in predicting the demand for shared electric cycles in the
Indian market?
• How well do those variables describe electric cycle demand?
[148]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
[149]: df = pd.read_csv("https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/428/original/bike_sharing.csv?1642089089")
##Dataset Analysis
[150]: df.head()
   humidity  windspeed  casual  registered  count
1        80        0.0       8          32     40
2        80        0.0       5          27     32
3        75        0.0       3          10     13
4        75        0.0       0           1      1
[151]: df.describe()
count
count 10886.000000
mean 191.574132
std 181.144454
min 1.000000
25% 42.000000
50% 145.000000
75% 284.000000
max 977.000000
[152]: df.isnull().sum()
[152]: datetime 0
season 0
holiday 0
workingday 0
weather 0
temp 0
atemp 0
humidity 0
windspeed 0
casual 0
registered 0
count 0
dtype: int64
[153]: 0
Temperature appears to be roughly normally distributed, apart from a small cliff between 15 and 30 degrees.
[156]: # Box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['humidity'], color='lightgreen')  # x= puts humidity on the axis that is labelled below
plt.title('Distribution of Humidity')
plt.xlabel('Humidity')
plt.show()
[158]: correlation_matrix = df[["atemp", "temp", "humidity", "windspeed", "casual",
                                "registered", "count"]].corr()
correlation_df = pd.DataFrame(correlation_matrix)
plt.figure(figsize = (10, 6))
sns.heatmap(correlation_matrix, annot = True)
plt.show()
High Positive Correlation:
1. Atemp and Temp (0.98): There exists an exceptionally strong correlation between the
actual temperature in Celsius and the perceived temperature in Celsius.
2. Count and Registered (0.97): There is an extremely strong correlation between the total
count of rented bikes (including both casual and registered) and the count of registered users.
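The strongest pairs can also be read off programmatically rather than from the heatmap. The sketch below flattens the upper triangle of a correlation matrix and sorts it by absolute value. It runs on a small synthetic frame; the helper name `top_correlations` and the generated values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def top_correlations(frame, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    cols = corr.columns
    # Indices of the strict upper triangle, so each pair appears exactly once
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(
        corr.to_numpy()[i, j],
        index=pd.MultiIndex.from_arrays([cols[i], cols[j]]),
    )
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Synthetic stand-in for the notebook's columns (illustrative values only)
rng = np.random.default_rng(0)
temp = rng.normal(20, 8, 500)
demo = pd.DataFrame({
    "temp": temp,
    "atemp": temp + rng.normal(0, 1, 500),   # tracks temp closely
    "humidity": rng.uniform(20, 100, 500),   # generated independently of temp
})
top = top_correlations(demo, n=1)
print(top)
```

With the real data, the same helper could be called on the numeric columns selected for `correlation_matrix`.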
plt.title('Total Count vs Registered Count')
plt.xlabel('Total Count')
plt.ylabel('Registered Count')
plt.show()
[161]: # Box plot for Season vs Count
plt.figure(figsize=(10, 6))
sns.boxplot(hue='season', y='count', data=df, palette='muted')
plt.title('Season vs Count')
plt.xlabel('Season')
plt.ylabel('Count')
plt.show()
[162]: # Bar plot for Weather vs Count
plt.figure(figsize=(10, 6))
sns.barplot(hue='weather', y='count', data=df, palette='pastel')
plt.title('Weather vs Count')
plt.xlabel('Weather')
plt.ylabel('Count')
plt.show()
[163]: # Box plot for Holiday vs Count
plt.figure(figsize=(10, 6))
sns.boxplot(hue='holiday', y='count', data=df, palette='bright')
plt.title('Holiday vs Count')
plt.xlabel('Holiday')
plt.ylabel('Count')
plt.show()
[164]: # Bar plot for Workingday vs Count
plt.figure(figsize=(10, 6))
sns.barplot(hue='workingday', y='count', data=df, palette='Set2')
plt.title('Workingday vs Count')
plt.xlabel('Workingday')
plt.ylabel('Count')
plt.show()
##Effect of Working Day on Bikes rented
1. Comparison of Two Groups: The 2-sample t-test is appropriate for analyzing the effect
of categorization into “Working Day” and “Holiday” on the number of electric cycles rented.
This test specifically compares the average count of rentals between these two distinct groups.
2. Parametric Test for Means: Given the normality assumption and the numerical nature of the
data, the 2-sample t-test is well-suited for this analysis. It is a parametric method
designed for comparing the means of two independent groups.
• Null Hypothesis (H0): There is no significant difference in the average number of electric
bikes rented between working days and holidays.
– H0: The mean number of rentals on working days equals the mean number of rentals on
holidays.
• Alternate Hypothesis (H1): The average number of bike rentals on working days differs
from that on holidays.
– H1: The mean number of rentals on working days is not equal to the mean number of
rentals on holidays.
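The standard 2-sample t-test also assumes that both groups have similar variances. As a minimal sketch of how that assumption could be checked before testing the hypotheses above, the example below runs Levene's test and falls back to Welch's t-test when the variances differ. The helper name `compare_means`, the synthetic groups, and the 0.05 significance level are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

def compare_means(group_a, group_b, alpha=0.05):
    """Two-sample t-test that switches to Welch's version if variances differ."""
    # Levene's test: H0 is that both groups have equal variance
    _, levene_p = levene(group_a, group_b)
    equal_var = levene_p >= alpha
    t_stat, p_val = ttest_ind(group_a, group_b, equal_var=equal_var)
    return {
        "equal_var": bool(equal_var),
        "t_stat": float(t_stat),
        "p_val": float(p_val),
        "reject_h0": bool(p_val < alpha),
    }

# Synthetic rental counts (illustrative values only)
rng = np.random.default_rng(42)
working = rng.normal(loc=190, scale=30, size=400)
holiday = rng.normal(loc=190, scale=30, size=400)   # same mean as working
shifted = rng.normal(loc=230, scale=30, size=400)   # clearly higher mean

same = compare_means(working, holiday)
diff = compare_means(working, shifted)
print(same)
print(diff)
```

On the notebook's data, the two inputs would be the `count` values for working days and for holidays.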
[165]: df['Working_day'] = 'Holiday'
df.loc[df['workingday'] == 1, 'Working_day'] = 'Working Day'
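from scipy.stats import ttest_ind

# Split rental counts by day type
working_day_counts = df[df['Working_day'] == 'Working Day']['count']
holiday_counts = df[df['Working_day'] == 'Holiday']['count']
# Perform 2-sample t-test
t_val, p_val = ttest_ind(working_day_counts, holiday_counts)
print("2-Sample T-Test Results:")
print(f"T-Statistic: {round(t_val, 2)}")
print(f"P-Value: {round(p_val, 2)}")
alpha = 0.05  # significance level (assumed)
# Compare the unrounded p-value against alpha
if p_val < alpha:
    print("There is a significant difference in mean count between Working Days and Holidays.")
else:
    print("There is no significant difference in mean count between Working Days and Holidays.")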
• Null Hypothesis (H0): There is no significant difference in the demand for bikes across
different weather conditions.
– H0: The mean demand for bikes is the same for all weather conditions.
• Alternate Hypothesis (H1): There is a significant difference in the demand for bikes across
different weather conditions.
– H1: The mean demand for bikes varies across different weather conditions.
[168]: sns.boxplot(data=df, hue='weather', y='count', palette="Set3")
[168]: <Axes: ylabel='count'>
Observations:
• There’s a noticeable trend in the median rental counts across different weather types, with
Type 1 showing the highest median count, followed by Type 2 and then Type 3.
• This trend suggests that Type 1 weather conditions are associated with the highest demand
for shared electric bikes, likely due to favorable conditions that encourage increased micro-
mobility service usage.
• While Types 2 and 3 still attract moderate usage, it’s not as significant as Type 1.
• Weather Type 4 appears only once in the dataset (see the weather-by-season counts in output
[175]), so its box collapses to a single value and no meaningful median can be read from it.
• Weather Type 4 represents extreme or unfavorable conditions such as heavy rain, ice pellets,
and thunderstorms, which discourage users from using micro-mobility services altogether.
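Medians like the ones discussed above can be tabulated directly instead of being read off the box plot, and the group sizes show how much weight each median deserves. The following is a minimal sketch on made-up values; the helper name `median_by_group` and its `min_obs` threshold are hypothetical.

```python
import pandas as pd

def median_by_group(frame, group_col, value_col, min_obs=2):
    """Median and size per group, flagging groups with fewer than min_obs rows."""
    summary = frame.groupby(group_col)[value_col].agg(["size", "median"])
    summary["too_few"] = summary["size"] < min_obs
    return summary

# Illustrative values only, not taken from the Yulu dataset
demo = pd.DataFrame({
    "weather": [1, 1, 1, 2, 2, 2, 3, 3, 4],
    "count":   [200, 150, 250, 120, 140, 160, 60, 80, 90],
})
summary = median_by_group(demo, "weather", "count")
print(summary)
```

On the notebook's data the call would be `median_by_group(df, "weather", "count")`.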
[170]: from scipy.stats import f_oneway
weather1 = df[df['weather'] == 1]['count']
weather2 = df[df['weather'] == 2]['count']
weather3 = df[df['weather'] == 3]['count']
weather4 = df[df['weather'] == 4]['count']
anova_stat, p_val = f_oneway(weather1, weather2, weather3, weather4)
alpha = 0.05  # significance level (assumed)
if p_val < alpha:
    print("There is a significant difference in the demand for bikes across different weather conditions.")
    print("We reject the null hypothesis (H0) that the mean demand for bikes is the same for all weather conditions.")
else:
    print("There is no significant difference in the demand for bikes across different weather conditions.")
    print("We do not have enough evidence to reject the null hypothesis (H0) that the mean demand for bikes is the same for all weather conditions.")
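One-way ANOVA assumes roughly normal groups with similar variances, which rental counts often violate because they are right-skewed. A common cross-check is the Kruskal-Wallis test, a rank-based alternative. The sketch below runs both tests side by side on synthetic skewed groups; the group scales and the 0.05 threshold are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

rng = np.random.default_rng(7)
# Right-skewed synthetic "rental counts" for three hypothetical groups
groups = [rng.exponential(scale=s, size=300) for s in (200, 180, 110)]

f_stat, f_p = f_oneway(*groups)   # parametric: compares group means
h_stat, h_p = kruskal(*groups)    # rank-based: no normality assumption

alpha = 0.05  # significance level (assumed)
for name, p in (("ANOVA", f_p), ("Kruskal-Wallis", h_p)):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```

When both tests lead to the same decision, the ANOVA conclusion is less likely to be an artifact of the skewed distribution.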
• Null Hypothesis (H0): There is no significant difference in the demand for bikes across
different seasons.
– H0: The mean demand for bikes is the same for all seasons.
• Alternative Hypothesis (H1): There is a significant difference in the demand for bikes
across different seasons.
– H1: The mean demand for bikes varies across different seasons.
[172]: sns.boxplot(data=df, hue='season_name', y='count', palette="Set3")
[173]: sns.kdeplot(data = df, x = 'count', hue = 'season_name',palette="Set3")
Observations:
• Spring Popularity: Spring boasts the highest median demand, suggesting it’s the peak
season for bike rentals on average.
• Spring’s Unpredictability: However, Spring also experiences the most outliers, indicating
larger variations in demand compared to other seasons.
• Fall Favored by Weather: Fall seems to be the overall season with the most rentals, likely
due to pleasant weather conditions that encourage cycling.
• Winter and Summer Moderation: Winter and Summer see a moderate level of demand,
possibly influenced by weather extremes that can deter riders in those seasons.
[174]: spring = df[df['season'] == 1]['count']
summer = df[df['season'] == 2]['count']
fall = df[df['season'] == 3]['count']
winter = df[df['season'] == 4]['count']
anova_stat, p_val = f_oneway(spring, summer, fall, winter)
alpha = 0.05  # significance level (assumed)
if p_val < alpha:
    print("There is a significant difference in the demand for bikes across different seasons.")
    print("We reject the null hypothesis (H0) that the mean demand for bikes is the same for all seasons.")
else:
    print("There is no significant difference in the demand for bikes across different seasons.")
    print("We do not have enough evidence to reject the null hypothesis (H0) that the mean demand for bikes is the same for all seasons.")
[175]: season      1     2     3     4
       weather
       1        1759  1801  1930  1702
       2         715   708   604   807
       3         211   224   199   225
       4           1     0     0     0
else:
    print("There is no statistically significant relationship between weather and season.")
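A relationship between two categorical variables such as weather and season is typically checked with a chi-square test of independence. The sketch below shows one way to run it with SciPy's `chi2_contingency`. The counts are copied from the weather-by-season table above, and `alpha = 0.05` is an assumed significance level.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: weather 1-4, columns: season 1-4 (counts from the table above)
observed = np.array([
    [1759, 1801, 1930, 1702],
    [715, 708, 604, 807],
    [211, 224, 199, 225],
    [1, 0, 0, 0],
])

chi2_stat, p_val, dof, expected = chi2_contingency(observed)
alpha = 0.05  # significance level (assumed)

print(f"Chi-square statistic: {chi2_stat:.2f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_val:.3g}")
if p_val < alpha:
    print("There is a statistically significant relationship between weather and season.")
else:
    print("There is no statistically significant relationship between weather and season.")
```

The Weather 4 row holds a single observation, so its expected counts fall well below the common rule of thumb of 5 per cell. Dropping or merging that row before testing is a reasonable variation.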