Py Dashboard
India
Submitted by:
Md Imran Siddiqui
Discipline of CSE/IT
Date: 12-April-2025
Signature:
Date: 12-April-2025
ACKNOWLEDGMENT
I would like to express my sincere gratitude to Dr. Mrinalini Rana ma’am, my project guide,
for their invaluable support, guidance, and encouragement throughout the development of
this project. Their expert insights and constructive feedback have been instrumental in
shaping the project's outcome.
Lastly, I would like to acknowledge the unwavering support of my family and friends, whose
encouragement has been a source of inspiration throughout this journey.
TABLE OF CONTENTS
Introduction
Source of Dataset
EDA Process
• Introduction
• General Description
• Specific Requirements
• Analysis Results
• Visualization
Conclusion
Future Scope
References
1. INTRODUCTION
In the ever-evolving landscape of modern India, understanding the nature and spread of crime is
not just an academic exercise; it is a vital tool for shaping safer, more informed communities. Crime
data, when seen not as cold numbers but as reflections of real human experiences, offers us a lens
into the social, economic, and political challenges our society continues to face. This report aims to
explore, decode, and visualize crime patterns in India using real statistical data from the National
Crime Records Bureau (NCRB).
We live in an era where data is more powerful than ever. Behind every bar graph and pie chart lies a
story—a reality affecting families, individuals, and entire regions. Whether it's the surge of
cybercrime, the heartbreaking rise in crimes against women, or the persistent shadow of property
thefts and violent offenses, every number represents a need for action. But before we can take
meaningful steps toward justice and reform, we must first understand where we stand. That’s where
this project comes in.
Using the Python programming language and powerful libraries like Pandas, Matplotlib, and
Seaborn, we dove deep into the dataset, cleaned and organized it, and brought it to life through
visual insights. This wasn’t just a coding exercise—it was a journey to uncover truths hidden
between rows and columns. The goal wasn’t just to analyze the data, but to see the heartbeat of the
nation’s crime map, to figure out what’s rising, what’s falling, and what still demands urgent
attention.
In this project, we categorized crimes into various segments such as violent crimes, crimes against
women, and property-related crimes. We explored how these crimes evolved over time, how they
vary across states, and which areas are most affected. From line plots showing trends across years to
heatmaps revealing deep correlations between different types of offenses—every visual we
generated was a piece of the puzzle.
This report is not meant to pass judgment, but to spark awareness. It's meant to show how tech and
data science can be powerful allies in social understanding and governance. As students, we may not
hold policymaking power today, but through knowledge, we contribute to the foundation of a more
aware, responsible, and secure tomorrow. And if even one insight from this analysis leads to a better
decision, a safer environment, or a stronger policy, then this effort will have been worth every line of
code.
2. SOURCE OF DATASET
The dataset used for developing this crime data dashboard has been sourced from the official Indian
government open data portal:
https://www.data.gov.in/catalog/district-wise-crimes-under-various-sections-indian-penal-code-ipc-crimes
This dataset is publicly available and is published by the National Crime Records Bureau (NCRB),
making it a reliable and authoritative source for crime statistics across Indian states and districts.
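The dataset includes district-wise records of various crimes reported under different sections of the
Indian Penal Code (IPC). The columns referenced in this report include STATE/UT, DISTRICT, and
per-offence count columns such as KIDNAPPING AND ABDUCTION OF WOMEN AND GIRLS and
KIDNAPPING AND ABDUCTION OF OTHERS.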
These columns provide detailed insight into the nature and scale of crimes across different regions,
serving as the foundation for all visualizations and analyses in this dashboard.
3. EDA PROCESS
Exploratory Data Analysis (EDA) Process
The EDA process involves systematically examining the dataset to uncover patterns, detect
anomalies, and check assumptions through visual and statistical techniques. It begins with
understanding the structure of the data, identifying missing or inconsistent values, and
summarizing key features using descriptive statistics. Next, we use visualizations like
histograms, bar plots, heatmaps, and box plots to explore distributions, relationships, and
outliers. In our crime dataset, we analyzed the frequency of IPC crimes across districts and
states, enabling meaningful insights. EDA acts as the foundation for deeper analysis, helping
to refine questions, guide model selection, and ensure data quality and readiness.
Dataset Preprocessing
• Initial Examination
o Loaded the raw NCRB crime dataset using Python's Pandas library.
o Inspected the shape, column names, and data types.
o Identified missing values, duplicate records, and irrelevant columns.
• Column Cleanup
o Removed unnecessary or redundant columns that did not contribute to the
objective.
o Renamed headers for clarity and uniformity (e.g., proper casing, no extra
spaces).
• Handling Missing Data
o Checked for NaN values and blanks across all rows.
o Applied appropriate strategies:
▪ For numeric fields: used mean/median imputation or dropped rows
based on significance.
▪ For categorical data: replaced missing values with consistent default
terms (e.g., "Unknown").
• Data Type Corrections
o Ensured that all numerical columns (crime counts) were in int or float format.
o Converted date/time fields (if any) to datetime objects.
• Standardizing Names
o Resolved inconsistencies in district and state names (e.g., “Uttar Pradesh” vs
“U.P.”).
o Handled state bifurcations (e.g., “Telangana” appearing only post-2014) to
avoid duplication.
• Data Filtering
o Removed rows labelled as "TOTAL" or "GRAND TOTAL" to prevent
skewed analysis.
o Focused only on IPC-related crime categories relevant to the study.
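The preprocessing steps above can be sketched in Pandas roughly as follows. This is a minimal illustration on a small made-up table, not the project's actual cleaning script: the sample values, the STATE_ALIASES map, and the helper name preprocess are assumptions introduced here, and median imputation is only one of the strategies listed above.

```python
import pandas as pd

# Made-up sample standing in for the raw file; every value is illustrative.
raw = pd.DataFrame({
    " state/ut ": ["U.P.", "Uttar Pradesh", "Bihar", "Bihar", "TOTAL"],
    "district": ["Agra", "Agra", "Patna", None, "GRAND TOTAL"],
    "murder": [10, 10, 7, None, 17],
    "theft": ["120", "120", "95", "40", "255"],
})

# Hypothetical alias map for inconsistent state spellings.
STATE_ALIASES = {"U.P.": "UTTAR PRADESH"}


def preprocess(df):
    df = df.copy()
    # Column cleanup: trim stray spaces and use one casing for headers.
    df.columns = df.columns.str.strip().str.upper()
    # Standardize state names before any grouping or de-duplication.
    df["STATE/UT"] = df["STATE/UT"].str.strip().replace(STATE_ALIASES).str.upper()
    # Categorical gaps get a consistent default term.
    df["DISTRICT"] = df["DISTRICT"].fillna("Unknown")
    # Drop aggregate rows so totals are not counted twice.
    is_total = (df["STATE/UT"].str.contains("TOTAL")
                | df["DISTRICT"].str.upper().str.contains("TOTAL"))
    df = df[~is_total].copy()
    # Crime counts: force numeric dtype, then fill gaps with the column median.
    for col in df.columns.drop(["STATE/UT", "DISTRICT"]):
        df[col] = pd.to_numeric(df[col], errors="coerce")
        df[col] = df[col].fillna(df[col].median())
    # Remove exact duplicates that remain after standardization.
    return df.drop_duplicates().reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Standardizing names before dropping duplicates matters here: the two Agra rows only become identical once "U.P." and "Uttar Pradesh" map to the same label.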
4. ANALYSIS ON DATASET
The analysis was conducted on the crime dataset to address key objectives regarding crime
patterns and regional distribution. The dataset includes crime categories such as murder,
theft, violence against women, and negligence deaths, among others. The following analysis
presents the results for each objective using grouped aggregations, derived totals, and
visualizations built in Python with Pandas, Matplotlib, and Seaborn.
Objective 1: Top 10 States by Total IPC Crimes
i. General Description:
This analysis aims to uncover which Indian states report the highest total number
of crimes under the Indian Penal Code (IPC). By focusing on state-level
aggregation, we can identify high-crime areas on a broader geographic scale and
help policymakers and law enforcement prioritize resources accordingly.
ii. Specific Requirements:
• Aggregate total crimes from all IPC-related columns across the dataset.
• Group by the STATE/UT column instead of DISTRICT.
• Calculate total crimes per state and select the top 10.
iii. Analysis Results:
After aggregating the data, the analysis reveals that states like Uttar Pradesh,
Maharashtra, and Bihar consistently report the highest number of IPC crimes.
These states have large populations and large urban centers, so they naturally
contribute a greater share of reported offenses; high absolute counts do not by
themselves indicate a higher per-capita crime rate.
iv. Visualization:
The findings are visualized using a horizontal bar plot. Each bar represents a state,
with the length corresponding to the total number of crimes. The 'Reds_d' palette
is used to enhance visibility and convey the intensity of crime numbers
effectively.
Objective 2: Trend of Violent Crimes Across Regions
i. General Description:
This objective examines the pattern of violent crimes across multiple
regions. It aims to uncover whether serious offenses like murder, rape, and riots are
increasing, decreasing, or remaining stable from one region to the next.
iii. Analysis Results:
The analysis shows fluctuating trends. Certain crimes like murder and riots show a
marginal decline in some regions, while others like rape and grievous hurt exhibit
worrying rises. This fluctuation highlights the evolving social and political climate
across different periods.
iv. Visualization:
A line plot is used to illustrate each crime's trend over the years. The multi-line graph
with legends makes it easier to compare how different types of violent crimes evolve
annually.
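A multi-line plot of this kind can be sketched as below. The data are made up, the column names MURDER, RAPE, and RIOTS are assumed spellings of the categories named above, and the group labels G1 to G3 are placeholders for whatever values the Region column actually holds. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative records; several rows can share the same group label.
crime_data = pd.DataFrame({
    "Region": ["G1", "G1", "G2", "G2", "G3"],
    "MURDER": [4, 6, 5, 3, 7],
    "RAPE": [2, 1, 4, 2, 5],
    "RIOTS": [9, 8, 6, 5, 4],
})

violent_types = ["MURDER", "RAPE", "RIOTS"]  # assumed column names

# One row per group, one column per crime type.
trend = crime_data.groupby("Region")[violent_types].sum()

fig, ax = plt.subplots(figsize=(10, 6))
for crime in violent_types:
    # One line per crime type so the legend can tell them apart.
    ax.plot(trend.index, trend[crime], marker="o", label=crime.title())
ax.set_xlabel("Region")
ax.set_ylabel("Number of Cases")
ax.legend()
ax.grid(True)
fig.tight_layout()
```

Summing inside the groupby first keeps the plot to one point per group and crime type, regardless of how many district rows fall into each group.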
Objective 3: Crime Rate in Relation to Women’s Safety
i. General Description:
This analysis examines crimes specifically committed against women, aiming to
identify the states where women are most vulnerable. This insight can guide targeted
safety initiatives and awareness programs.
iii. Analysis Results:
States like Uttar Pradesh and Delhi top the list, reflecting ongoing challenges in
women’s safety. High reported counts may be influenced by socio-economic factors,
reporting efficiency, and public awareness.
iv. Visualization:
The data is visualized with a bar plot using a purple gradient color palette. This helps
convey the severity and draw attention to regions needing urgent policy interventions.
Objective 4: Kidnapping and Abduction by Victim Type
i. General Description:
This objective analyzes kidnapping and abduction incidents based on the victim’s
profile, specifically separating cases involving women and girls from those
involving others. This split helps tailor preventive actions based on the target group.
ii. Specific Requirements:
• Use two columns: KIDNAPPING AND ABDUCTION OF WOMEN AND GIRLS and
KIDNAPPING AND ABDUCTION OF OTHERS.
• Group data by STATE/UT and take the top 10 states with the highest counts.
iii. Analysis Results:
States like Uttar Pradesh, Bihar, and Maharashtra report the highest number of cases,
especially those involving women and girls. This raises alarms about gendered crimes
and the need for stricter law enforcement.
iv. Visualization:
A grouped bar chart is created, comparing the two victim types side-by-side for each
state. The paired bars make it easy to compare proportions and differences in
targeting.
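The table behind such a grouped bar chart can be sketched as follows. The two column names are the ones listed in the requirements; the state labels, the counts, and the choice to rank states by the combined total of both columns are assumptions made for illustration.

```python
import pandas as pd

WOMEN = "KIDNAPPING AND ABDUCTION OF WOMEN AND GIRLS"
OTHERS = "KIDNAPPING AND ABDUCTION OF OTHERS"

# Illustrative district-level rows for three placeholder states.
crime_data = pd.DataFrame({
    "STATE/UT": ["A", "A", "B", "C", "C"],
    WOMEN: [30, 20, 45, 5, 5],
    OTHERS: [10, 5, 25, 2, 3],
})

# State-level totals for each victim group.
kidnapping_stats = crime_data.groupby("STATE/UT")[[WOMEN, OTHERS]].sum()

# Rank states by the combined count and keep the top n (10 in the report).
top_n = 2
order = kidnapping_stats.sum(axis=1).nlargest(top_n).index
kidnapping_top = kidnapping_stats.loc[order]

# Share of each state's cases that involve women and girls.
women_share = (kidnapping_top[WOMEN] / kidnapping_top.sum(axis=1)).round(2)
print(kidnapping_top)
print(women_share)
```

Calling kidnapping_top.plot(kind="bar") on this table draws one pair of bars per state, which is the grouped layout; passing stacked=True would stack the two victim groups instead.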
Objective 5: Share of Property Crimes
i. General Description:
This objective focuses on crimes involving property, which include theft, robbery,
burglary, etc. Understanding the share of each type helps law enforcement agencies
focus more on specific preventive measures.
iii. Analysis Results:
Theft and auto theft emerge as the dominant property crimes. While dacoity has
relatively few cases, robbery and burglary still show a significant presence in certain
areas.
iv. Visualization:
A pie chart is used to show proportional distribution of each crime. Pastel shades
enhance readability and help easily identify the dominant categories.
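The percentages shown on such a pie chart are each category's share of the summed property-crime counts. A minimal sketch with made-up counts follows; the column names are assumed spellings of the categories named above. If the source file reports auto theft as a subset of an overall theft column, only one of the two should enter the pie, otherwise those cases are counted twice.

```python
import pandas as pd

property_related = ["THEFT", "BURGLARY", "ROBBERY", "DACOITY"]  # assumed column names

# Illustrative counts for two records.
crime_data = pd.DataFrame({
    "THEFT": [70, 50],
    "BURGLARY": [30, 20],
    "ROBBERY": [12, 8],
    "DACOITY": [6, 4],
})

# Total per crime type across all rows.
prop_crime_total = crime_data[property_related].sum()

# Each type's share of all property crimes, in percent.
share_pct = (prop_crime_total / prop_crime_total.sum() * 100).round(1)
print(share_pct)
```

Passing prop_crime_total to a pie plot with a percentage label format displays the same shares as share_pct.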
Objective 6: Correlation Between Crime Types
i. General Description:
This analysis checks how different crimes relate statistically to each other. Finding
correlations can reveal whether a rise in one crime type signals a rise in another.
iii. Analysis Results:
Crimes like burglary and theft show high correlation, suggesting that areas prone to
one are also likely to report the other. Meanwhile, violent and property crimes tend to
be less correlated.
iv. Visualization:
A heatmap with color gradients (coolwarm palette) is used. Annotated boxes help in
quickly spotting strong and weak correlations between crime types.
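The matrix behind such a heatmap can be sketched with Pandas alone. The example assumes Pearson correlation, which is the default of DataFrame.corr; the report does not state which coefficient was used. The column names and values are made up so that theft and burglary move together while murder does not.

```python
import numpy as np
import pandas as pd

# Illustrative counts for four records.
crime_data = pd.DataFrame({
    "THEFT": [10, 20, 30, 40],
    "BURGLARY": [12, 22, 28, 41],
    "MURDER": [5, 3, 6, 4],
})

# Pairwise Pearson correlation between the crime-count columns.
corr_matrix = crime_data.corr()

# Hide the diagonal (always 1.0) and find the most strongly related pair.
off_diagonal = corr_matrix.where(~np.eye(len(corr_matrix), dtype=bool))
strongest_pair = off_diagonal.abs().stack().idxmax()

print(corr_matrix.round(2))
print(strongest_pair)
```

The resulting corr_matrix is what an annotated heatmap would display; values near 1 or -1 mark strong relationships, and values near 0 mark weak ones.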
Formula Used and Information Explained to the User
In this analysis, various methods were applied to process the crime data and derive
meaningful insights. Below is a detailed explanation of the key formulas and the reasoning
behind them:
Formula Used:
To calculate the total number of crimes in each record, we sum the values of the
crime columns across each row, that is, all crime data from column 4 onward. This
gives a per-row total (named Total_Reported in the code), which is later summed for
each state.
Explanation:
This step prepares the data for aggregation by state, providing an overall view of crime
distribution across the country. By summing the crime counts, we get a single figure
representing the total number of crimes in each state, which is necessary for
identifying the states with the highest reported crime counts.
Formula Used:
The next step is to group the data by the state (STATE/UT), then calculate the total
crimes reported by summing them for each state. After grouping the data, the states
are sorted to identify the top 10 states with the highest crime counts.
Explanation:
Grouping by state allows us to break down the crime data by region. Sorting the states
based on the sum of crimes allows us to pinpoint which areas are experiencing the
most crime, helping to focus on these areas for policy interventions.
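The two steps described above, a row-wise total followed by a state-level sum and ranking, can be sketched as follows. The data are made up; Total_Reported is the column name used in this report's code, and the assumption that exactly the first three columns are identifiers follows the "column 4 onward" description.

```python
import pandas as pd

# Illustrative frame: three identifier columns, then crime-count columns.
crime_data = pd.DataFrame({
    "STATE/UT": ["A", "A", "B", "C"],
    "DISTRICT": ["a1", "a2", "b1", "c1"],
    "Region": ["R1", "R1", "R1", "R1"],
    "MURDER": [5, 3, 9, 1],
    "THEFT": [50, 30, 20, 10],
})

# Step 1: row-wise total over the crime columns (column 4 onward).
crime_data["Total_Reported"] = crime_data.iloc[:, 3:].sum(axis=1)

# Step 2: sum the row totals per state and keep the largest n (10 in the report).
top_states = crime_data.groupby("STATE/UT")["Total_Reported"].sum().nlargest(2)
print(top_states)
```

The row total should be computed only once: running step 1 a second time would include Total_Reported itself in the slice and double the figures.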
Formula Used:
For this analysis, we select crime categories related to violent crimes such as murder,
rape, and riots, and then group the data by region. The sum of these crimes is
calculated for each region.
Explanation:
Formula Used:
A similar approach is used to aggregate crimes that specifically target women, such as
rape, dowry deaths, and cruelty by husband. The data is grouped by state to identify
where these crimes are most prevalent.
Explanation:
Formula Used:
This formula looks at two specific crime categories: kidnapping and abduction of
women and girls, as well as others. The total counts for both types of crimes are
summed for each state.
Explanation:
Kidnapping and abduction are serious crimes, and this analysis helps break down
where these incidents occur most frequently. By examining both women/girls and
others separately, we can better understand the nature of abductions in different
regions.
6. Share of Property Crimes
Formula Used:
Property-related crimes like theft, auto theft, burglary, and robbery are aggregated to
calculate the total number of property crimes. The sum of these crimes is calculated
for each state to understand the prevalence of property crime.
Explanation:
This analysis highlights the relative frequency of property crimes in different states.
By aggregating the various forms of property crimes, we can see which states have
the highest property crime rates and identify potential areas for preventive measures.
Formula Used:
The pairwise correlation between the crime-count columns is computed and displayed
as an annotated heatmap.
Explanation:
Understanding correlations between different types of crime is crucial for law enforcement
and policymaking. A high correlation between violent and property crimes, for example,
could suggest that certain regions may need additional resources or strategies to address
interconnected crime issues.
Visualization:
• Bar Chart: The bar chart visualizes the top 10 states with the highest number of IPC
(Indian Penal Code) crimes. The x-axis represents the total number of crimes, and the
y-axis lists the states.
• Color Palette: The chart uses a gradient of reds to highlight the severity of crime
rates, with darker shades indicating higher numbers of crimes.
Insights:
• This visualization reveals which states are most affected by IPC crimes, showing that
crime rates are not evenly distributed across the country. Some states exhibit
significantly higher crime numbers, indicating potential areas where law enforcement
resources need to be concentrated.
• States with the highest number of crimes are crucial for targeting crime prevention
programs and policies aimed at reducing crime.
Visualization:
• Line Graph: A line graph is used to show the trend of violent crimes (murder, rape,
riots) over the years. The x-axis represents the Region column, and the y-axis
represents the total number of violent crimes reported for each region.
• Multiple Lines: Each crime category is drawn as its own line, allowing for a
comparison of trends across regions.
Insights:
• This visualization helps identify whether violent crime rates have increased or
decreased over the years. For example, if the line for murders is rising steadily while
rapes or riots decrease, it might indicate that efforts to reduce certain types of violence
have been more effective than others.
• It also helps pinpoint years with significant spikes or dips, which could be linked to
external factors like political events, law enforcement initiatives, or societal changes.
3. Crime Rate in Relation to Women’s Safety: Rape, Dowry Deaths, and Cruelty
Visualization:
• Bar Chart: This bar chart displays the combined number of crimes against
women (rape, dowry deaths, and cruelty by husbands) for the ten states with the
highest totals. The x-axis shows the number of reported cases, and the y-axis lists
the states.
• Colour Coding: A purple gradient palette is used to convey severity across the
ranked states.
Insights:
• This visualization provides a clearer view of how crimes against women are
distributed across states. States with a high incidence of rape or dowry deaths may
need more targeted interventions, such as gender-based violence prevention programs,
legal reforms, and awareness campaigns.
• The chart also emphasizes which states may be facing systemic issues in terms of
women’s safety.
Visualization:
• Grouped Bar Chart: This chart compares kidnappings and abductions of women and
girls with those of others for each of the top 10 states. The x-axis lists the states,
and the y-axis shows the number of cases.
• Colour Coding: Each victim group is drawn in its own colour, so the two bars for a
state can be compared directly.
Insights:
• This chart provides an in-depth look at states with the highest rates of kidnapping and
abduction. The visualization helps identify areas where such crimes are most
prevalent, and where preventive measures, such as public awareness campaigns and
improved law enforcement, could be implemented.
• Notably, regions with high rates of abductions can point to deeper issues like human
trafficking or organized crime.
Visualization:
• Pie Chart: This pie chart visualizes the proportion of property crimes, such as theft,
robbery, and burglary, in the dataset. Each crime type is represented as a segment of
the pie chart.
• Legend: The crime types are clearly labeled with their corresponding percentages.
Insights:
• This visualization shows the relative contribution of each property crime type. For
example, if theft constitutes the largest share, it indicates that theft-related crimes are
a major concern in the country.
• Understanding the proportion of each type of property crime helps law enforcement
agencies prioritize resources and develop strategies to tackle the most common
property crimes.
Visualization:
• Color Coding: Positive correlations are represented in warmer colors (red), and
negative correlations are shown in cooler colors (blue).
Insights:
• This heatmap reveals how different types of crimes are related to one another. For
example, a strong positive correlation between robbery and burglary suggests that
these crimes often occur together, perhaps due to similar motives or criminal
networks.
The visualizations provided above offer a clear and accessible way to analyze crime data,
revealing valuable patterns and trends across different crime categories and regions. By using
visual tools such as bar charts, line graphs, pie charts, and heatmaps, complex crime data is
simplified, making it easier for policymakers, law enforcement, and the general public to
understand the scope of crime in different states and regions.
Each visualization presents specific insights, such as identifying high-crime states, tracking
trends over time, and uncovering correlations between different types of crime. These
insights are crucial for developing targeted crime prevention strategies, optimizing resource
allocation, and enhancing public safety initiatives.
Through these visualizations, it becomes clear that addressing crime effectively requires a
nuanced understanding of regional variations, the nature of different crimes, and trends over
time.
5. CONCLUSION
The analysis of crime data, presented through detailed visualizations and insights, offers valuable
perspectives into the state of law enforcement and public safety in India. By carefully examining
trends in various crime categories, regions, and over time, the study reveals crucial patterns that can
guide future actions and policies aimed at reducing crime rates.
The dataset preprocessing phase highlighted the importance of cleaning and structuring data for
meaningful analysis. By converting categorical data, handling missing values, and ensuring that the
dataset was free of inconsistencies, we created a solid foundation for robust analytical work. This step
was pivotal in transforming raw data into actionable insights.
Through the analysis objectives, we identified key areas where crime rates are notably high, such as
certain states or particular crime categories. By visualizing crime trends over time, we observed
fluctuations and patterns that suggest the effectiveness (or lack thereof) of past crime prevention
efforts. The crime distribution analysis over time, particularly in violent crimes, pointed out the
need for targeted interventions and highlighted years where specific incidents may have led to surges
in crime rates.
The visualizations of crime data helped to simplify complex information, making it more accessible
to a wider audience. Whether it was through bar charts, line graphs, pie charts, or heatmaps, each
visualization served to highlight different aspects of the crime landscape. These insights provide a
more comprehensive understanding of how crime is distributed across regions and categories,
allowing for better-informed decision-making.
In conclusion, the findings from this analysis underscore the need for data-driven strategies to combat
crime. The insights gained here can be utilized to guide law enforcement policies, inform public
safety initiatives, and promote societal changes aimed at reducing crime. By continuously updating
and analyzing crime data, authorities can ensure that resources are allocated effectively, and strategies
remain adaptive to emerging trends. Ultimately, data-driven insights are key to creating safer
communities and improving the effectiveness of law enforcement agencies in India.
The journey through this report demonstrates the power of data analysis and visualization in
addressing societal challenges like crime, and how it can be leveraged for better governance and
public safety.
6. FUTURE SCOPE
The analysis of crime data, as presented in this report, serves as a foundation for future research and
improvements in public safety. While the current study provides insightful findings, there are several
areas where further analysis and enhancements can lead to deeper insights and more effective
interventions. Below are the potential future directions and improvements that could be explored to
extend the scope of this work:
1. Incorporating Real-Time Data: One of the most promising advancements in crime analysis is
the integration of real-time data. As crime incidents continue to be reported daily,
incorporating live data feeds into the system would enable authorities to monitor trends as
they unfold. This would allow law enforcement agencies to react more quickly to emerging
threats, and possibly even predict crime hotspots before they become significant issues.
2. Use of Machine Learning and Predictive Analytics: The application of machine learning
algorithms could be a game-changer in crime analysis. By analyzing historical crime data,
machine learning models could predict future crime trends, identify potential crime
hotspots, and provide actionable insights for law enforcement. These models could be
trained to recognize patterns in the data and forecast crime occurrences with higher
accuracy, enabling proactive crime prevention.
3. Geospatial Analysis and Crime Mapping: Geographic Information Systems (GIS) can be
integrated with the crime dataset to visualize crimes spatially. Mapping crime data on
geographical charts would allow for a more granular understanding of crime patterns in
specific regions, cities, or even neighbourhoods. By combining GIS with temporal data, it
could also provide insights into crime patterns at various times of the day or during specific
events (e.g., holidays, festivals).
4. Incorporating More Data Sources: While the current analysis focuses on crime data, the
inclusion of additional data sources such as weather data, economic indicators, population
demographics, and education levels could provide a more holistic view of the factors
contributing to crime. Understanding these underlying causes can help policymakers and law
enforcement agencies develop strategies that target the root causes of criminal behaviour.
5. Crime Prevention Strategy Evaluation: Future research could focus on evaluating the
effectiveness of crime prevention strategies over time. By analyzing crime data before and
after implementing specific policies or interventions (e.g., increased police presence, public
awareness campaigns), we could assess whether such initiatives are yielding positive results.
This would allow authorities to adjust their strategies based on the success or failure of past
efforts.
6. Focus on Underreported Crimes: The current dataset primarily focuses on reported crimes,
but many crimes, especially those involving domestic violence or sexual assault, often go
underreported. Future work could involve efforts to estimate the scale of underreported
crimes and develop methods for increasing reporting rates through public campaigns,
improved reporting mechanisms, or anonymous tips.
9. Public Awareness and Community Engagement: The future of crime prevention also lies in
community-based efforts. Data-driven insights can be shared with the public to raise
awareness about crime hotspots and promote community engagement. Furthermore,
involving the community in crime prevention efforts, such as neighborhood watch programs,
could help reduce crime rates and enhance public safety.
In conclusion, while the current analysis has provided important insights into crime trends, the
future scope for this work is vast. Leveraging advanced technologies like machine learning, GIS, and
real-time data, along with a collaborative approach involving various government agencies and the
public, can lead to more effective crime prevention and better public safety outcomes. Through
continuous research and innovation, the fight against crime can be significantly enhanced.
1. Screenshot of Objective
Python Code:
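import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
file_name = 'main.csv'
crime_data = pd.read_csv(file_name)

print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
print(crime_data.head())
print(crime_data.tail())
crime_data.info()  # info() prints its summary directly and returns None
# Descriptive statistics
print(crime_data.describe(include='all'))
# Count missing values before they are filled, otherwise every count is zero
print(crime_data.isnull().sum())
print("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
crime_data.fillna(0, inplace=True)

# Row-wise total of the crime-count columns (column 4 onward, as described in the report)
crime_data['Total_Reported'] = crime_data.iloc[:, 3:].sum(axis=1, numeric_only=True)

# Column groups used by the objectives. The header spellings in the first three lists
# are assumptions based on the categories named in the report; adjust them to main.csv.
violent_types = ['MURDER', 'RAPE', 'RIOTS']
women_related = ['RAPE', 'DOWRY DEATHS', 'CRUELTY BY HUSBAND OR HIS RELATIVES']
property_related = ['THEFT', 'AUTO THEFT', 'BURGLARY', 'ROBBERY', 'DACOITY']
kidnapping_cols = ['KIDNAPPING AND ABDUCTION OF WOMEN AND GIRLS',
                   'KIDNAPPING AND ABDUCTION OF OTHERS']
group_col = 'Region'  # grouping column for objective 2

def show_menu():
    # Only "3. Exit" survives in the source; options 1 and 2 are inferred from the main loop.
    print("\n1. Predefined analysis objectives")
    print("2. Explore columns with a custom plot")
    print("3. Exit")
    return input("Enter your choice: ").strip()

def pick_analysis_goal():
    # Menu wording is reconstructed from the six objectives described in the report.
    print("1. Top 10 states by total IPC crimes")
    print("2. Trend of violent crimes")
    print("3. Crimes against women by state")
    print("4. Kidnapping and abduction by victim type")
    print("5. Share of property crimes")
    print("6. Correlation between crime types")
    return input("Choose an objective (1-6): ").strip()

def choose_plot_type():
    print("1. Line plot")
    print("2. Bar plot of the 10 most frequent values")
    return input("Choose a chart type: ").strip()

def plot_insights(option):
    if option == '1':
        top_states = crime_data.groupby('STATE/UT')['Total_Reported'].sum().nlargest(10)
        plt.figure(figsize=(12,6))
        sns.barplot(x=top_states.values, y=top_states.index, palette='Reds_d')
        plt.title('Top 10 States/UTs by Total IPC Crimes')
        plt.xlabel('Total Crimes')
        plt.ylabel('State/UT')
    elif option == '2':
        yearwise_violent = crime_data.groupby(group_col)[violent_types].sum()
        plt.figure(figsize=(10,6))
        # One line per violent crime type
        for crime in violent_types:
            plt.plot(yearwise_violent.index, yearwise_violent[crime], marker='o', label=crime)
        plt.title('Trend of Violent Crimes')
        plt.xlabel(group_col)
        plt.ylabel('Number of Cases')
        plt.legend()
        plt.grid(True)
    elif option == '3':
        statewise_women_crime = (
            crime_data.groupby('STATE/UT')[women_related].sum().sum(axis=1).nlargest(10)
        )
        plt.figure(figsize=(12,6))
        sns.barplot(x=statewise_women_crime.values, y=statewise_women_crime.index,
                    palette='Purples_r')
        plt.title('Top 10 States/UTs by Crimes Against Women')
        plt.xlabel('Number of Cases')
        plt.ylabel('State')
    elif option == '4':
        kidnapping_stats = crime_data.groupby('STATE/UT')[kidnapping_cols].sum()
        # Keep the 10 states with the highest combined count
        top10 = kidnapping_stats.sum(axis=1).nlargest(10).index
        kidnapping_stats.loc[top10].plot(kind='bar', figsize=(12,6))
        plt.title('Kidnapping and Abduction by Victim Type')
        plt.xlabel('State')
        plt.ylabel('Number of Cases')
    elif option == '5':
        prop_crime_total = crime_data[property_related].sum()
        plt.figure(figsize=(8,8))
        plt.pie(prop_crime_total, labels=prop_crime_total.index, autopct='%1.1f%%',
                colors=sns.color_palette('pastel'))
        plt.title('Share of Property Crimes')
    elif option == '6':
        # The columns behind the original heatmap are not recoverable from the source;
        # this uses the crime groups defined above.
        selected = sorted(set(violent_types + women_related + property_related))
        plt.figure(figsize=(12,8))
        sns.heatmap(crime_data[selected].corr(), annot=True, fmt='.2f', cmap='coolwarm')
        plt.title('Correlation Between Crime Types')
    else:
        print("Invalid choice.")
        return
    plt.tight_layout()
    plt.show()

def explore_custom():
    for idx, column in enumerate(crime_data.columns):
        print(f"{idx+1}. {column}")
    try:
        x_col = input("Enter first column name: ").strip()
        y_col = input("Enter second column name (leave blank if not needed): ").strip()
        chart_type = choose_plot_type()
        plt.figure(figsize=(10,6))
        if chart_type == '1':
            if y_col:
                plt.plot(crime_data[x_col], crime_data[y_col])
            else:
                crime_data[x_col].plot()
        elif chart_type == '2':
            top_values = crime_data[x_col].value_counts().head(10)
            top_values.plot(kind='bar')
        else:
            print("Invalid chart type.")
            plt.close()
            return
        plt.tight_layout()
        plt.grid(True)
        plt.show()
    except KeyError as err:
        # Raised when a typed column name does not exist in the data
        plt.close()
        print(f"Column not found: {err}")

# Main interaction loop
while True:
    user_choice = show_menu()
    if user_choice == '1':
        analysis_choice = pick_analysis_goal()
        plot_insights(analysis_choice)
    elif user_choice == '2':
        explore_custom()
    elif user_choice == '3':
        print("Exiting...")
        break
    else:
        print("Invalid choice. Please try again.")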
LinkedIn:
https://www.linkedin.com/posts/imransiddiqui786_dataanalysis-pythonproject-crimedata-activity-7316696737715236864-wIXd?utm_source=share&utm_medium=member_desktop&rcm=ACoAAEWMTNUBooiRV3rh0342V7YEdwkO1xgtyow
Github:
https://github.com/786imran786/District-Level-Crime-Analysis-in-India