Real Time Crime Dashboard
1. Introduction
Project Title: Real-Time Crime Data Analytics Using Spark Streaming
Overview: This project demonstrates the development and deployment of a real-time big data
analytics system for crime data. Such systems have become increasingly important in
modern policing and urban safety initiatives. Apache Kafka handles real-time data ingestion and
Apache Spark Streaming handles distributed processing, so the system can provide
near-real-time insights into crime activity across multiple locations. This enables authorities to
make rapid decisions and adjust strategies accordingly.
Real-time crime analytics involves collecting incident reports from sources such as police control
centers, emergency call data, and field units. These raw data streams are parsed, structured, and
filtered to remove duplicates and erroneous records. After enrichment and classification, the results are stored in
efficient formats like Parquet or Hive tables. Power BI acts as the front-end analytics platform,
allowing users to interact with dynamic visuals and respond to real-time changes in the data.
Key Objectives:
• Capture and process crime events in real time using Apache Kafka.
• Use Spark Streaming to cleanse, transform, and aggregate data in micro-batches.
• Categorize crimes based on type, severity, timestamp, and geolocation.
• Store the processed output in Hive or export as Parquet/CSV for long-term storage.
• Visualize patterns and KPIs through interactive Power BI dashboards.
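The objectives above imply an event record that carries at least a crime type, a timestamp, and a location. A minimal sketch of such a record and its JSON encoding follows. The class name, field names, and sample values are assumptions made for illustration, not the project's actual schema; severity is left out because it is assigned later, during transformation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CrimeEvent:
    """Hypothetical raw crime event as it might be published to Kafka."""
    event_id: str
    crime_type: str
    timestamp: str  # ISO 8601 string with a UTC offset
    city: str
    lat: float
    lon: float


def to_kafka_value(event: CrimeEvent) -> bytes:
    """Serialize an event to UTF-8 JSON bytes, a common Kafka message value format."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def from_kafka_value(value: bytes) -> CrimeEvent:
    """Decode UTF-8 JSON bytes back into a CrimeEvent."""
    return CrimeEvent(**json.loads(value.decode("utf-8")))


# Made-up sample record, for illustration only.
sample = CrimeEvent(
    event_id="evt-0001",
    crime_type="Theft",
    timestamp="2024-01-01T10:15:00+05:30",
    city="Delhi",
    lat=28.6139,
    lon=77.2090,
)
payload = to_kafka_value(sample)
```

A producer would pass `payload` as the message value. Keeping a stable `event_id` on every record is what makes downstream deduplication possible.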
Expected Benefits:
• Apache Kafka: Acts as a real-time distributed event broker. It captures raw crime event
messages and ensures fast delivery to consumers.
• Apache Spark Streaming (PySpark): Performs stream processing by reading
messages from Kafka topics. Applies transformation logic such as deduplication,
classification, and geolocation enrichment.
• Hive/HDFS/Parquet/CSV: Storage layer where the structured and cleaned crime data is
saved for querying and visualization.
• Power BI: Business Intelligence tool used to create dynamic visualizations and KPIs for
decision-makers.
Data Pipeline Overview:
Crime Incident Data → Kafka Topics → Spark Streaming → Hive/Parquet/CSV Output → Power BI Dashboard
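The pipeline above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's actual job: it uses Spark's Structured Streaming API (the current micro-batch API) rather than the legacy DStream-based Spark Streaming API, and the broker address, topic name, paths, and field names are all hypothetical placeholders. Running `start_pipeline` also assumes the Spark Kafka connector package is available to the Spark session.

```python
# Hypothetical placeholders: adjust to the actual cluster and topic.
KAFKA_BOOTSTRAP = "localhost:9092"
KAFKA_TOPIC = "crime-events"
OUTPUT_PATH = "/tmp/crime/parquet"
CHECKPOINT_PATH = "/tmp/crime/checkpoints"


def kafka_source_options(bootstrap: str, topic: str) -> dict:
    """Options for Spark's Kafka source: where to connect and what to read."""
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_pipeline(spark):
    """Read JSON events from Kafka, deduplicate them, and append them to Parquet.

    `spark` is an existing SparkSession. PySpark is imported here so the
    module can be loaded on machines without Spark installed.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Assumed event schema (field names are illustrative).
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("crime_type", StringType()),
        StructField("timestamp", StringType()),
        StructField("city", StringType()),
        StructField("lat", DoubleType()),
        StructField("lon", DoubleType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(KAFKA_BOOTSTRAP, KAFKA_TOPIC))
        .load()
    )

    events = (
        raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
        .select("e.*")
        .withColumn("event_time", F.to_timestamp("timestamp"))
        # The watermark bounds how long deduplication state is retained.
        .withWatermark("event_time", "10 minutes")
        .dropDuplicates(["event_id", "event_time"])
    )

    return (
        events.writeStream.format("parquet")
        .option("path", OUTPUT_PATH)
        .option("checkpointLocation", CHECKPOINT_PATH)
        .outputMode("append")
        .trigger(processingTime="10 seconds")  # micro-batch every few seconds
        .start()
    )
```

The `processingTime` trigger corresponds to the "every few seconds" micro-batch interval; the 10-second and 10-minute values here are arbitrary choices.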
Process Explanation:
• Data Ingestion: Kafka Producers send live crime records, structured in JSON or Avro
format.
• Stream Processing: Spark Streaming jobs run in near real time, grouping incoming crime
records into micro-batches and processing each batch every few seconds.
• Transformation: Includes parsing timestamps, classifying records by city, mapping GPS
coordinates to zones, and tagging severity based on crime type.
• Storage: Data is pushed into Hive for analytical queries or saved in efficient Parquet format
for reporting tools.
• Visualization: Power BI fetches and presents KPIs, trends, and maps.
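The transformation step described above (timestamp parsing, zone mapping, severity tagging) can be sketched in plain Python as below. The severity table, zone bounding boxes, and field names are hypothetical values chosen for illustration. In a Spark job the same logic would typically be expressed with column functions or joins against lookup tables.

```python
from datetime import datetime, timezone

# Hypothetical severity lookup keyed by lower-cased crime type.
SEVERITY_BY_TYPE = {
    "theft": "low",
    "burglary": "medium",
    "robbery": "high",
    "assault": "high",
}

# Hypothetical zones as (lat_min, lat_max, lon_min, lon_max) bounding boxes.
ZONES = {
    "zone-north": (28.70, 28.90, 77.00, 77.30),
    "zone-central": (28.55, 28.70, 77.10, 77.30),
}


def parse_timestamp(raw: str) -> datetime:
    """Parse an ISO 8601 string and normalize it to UTC.

    Timestamps without an offset are assumed to already be in UTC.
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def zone_for(lat: float, lon: float) -> str:
    """Return the first zone whose bounding box contains the point."""
    for name, (lat_min, lat_max, lon_min, lon_max) in ZONES.items():
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            return name
    return "unknown"


def enrich(event: dict) -> dict:
    """Return a copy of the event with event_time, zone, and severity added."""
    enriched = dict(event)
    enriched["event_time"] = parse_timestamp(event["timestamp"])
    enriched["zone"] = zone_for(event["lat"], event["lon"])
    enriched["severity"] = SEVERITY_BY_TYPE.get(event["crime_type"].lower(), "unknown")
    return enriched


# Made-up sample event, for illustration only.
example = enrich({
    "crime_type": "Robbery",
    "timestamp": "2024-01-01T10:15:00+05:30",
    "lat": 28.75,
    "lon": 77.10,
})
```

Unmapped crime types and out-of-zone coordinates fall back to "unknown" rather than raising, so one malformed record does not halt a batch.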
Advantages of the Architecture:
• Offers visual segmentation of crime types reported within the last 24 hours.
o Theft: 35%
o Assault: 25%
o Burglary: 20%
o Robbery: 15%
o Others: 5%
• Helps identify dominant criminal activities in a given time window.
• Bar chart showing top five cities with the most crime cases:
o Delhi – 420 cases
o Mumbai – 390 cases
o Bengaluru – 310 cases
o Hyderabad – 290 cases
o Chennai – 250 cases
• Allows city administrators to focus on targeted regions.
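The two visuals above reduce to simple aggregations over counts: a percentage share per crime type and a ranking of cities by case count. A minimal sketch follows. The function names are assumptions, the city counts are the figures listed above, and the per-type counts are made-up numbers chosen only so that they reproduce the percentages listed above.

```python
from typing import Dict, List, Tuple


def percentage_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts into percentage shares rounded to one decimal place."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


def top_n(counts: Dict[str, int], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n entries with the highest counts, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


# Illustrative per-type counts (made up to match the listed percentages).
type_counts = {"Theft": 70, "Assault": 50, "Burglary": 40, "Robbery": 30, "Others": 10}

# City counts as listed in the bar chart description.
city_counts = {
    "Delhi": 420,
    "Mumbai": 390,
    "Bengaluru": 310,
    "Hyderabad": 290,
    "Chennai": 250,
}

shares = percentage_share(type_counts)
leaders = top_n(city_counts, n=3)
```

In Power BI these aggregations would normally be defined as measures over the stored table. Computing them upstream is an alternative when the dashboard should read a small pre-aggregated dataset.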
3.3 Hourly Crime Trends – Line Chart
• In Power BI Service:
o Schedule a refresh every 30 minutes, or use DirectQuery for near-real-time data
o Set alerts based on thresholds (e.g., more than 100 crimes/hour)
Advanced Tips: