Taxi Trip Analysis Using Hive

This case study requires analyzing a large dataset using Hive for exploratory data analysis. Here’s a
structured approach to tackle the tasks:

1. Setup and Table Creation

Create the Table

Run the following DDL script in Hive to create the table schema for storing the taxi data:

CREATE TABLE IF NOT EXISTS taxidata (
    vendor_id STRING,
    pickup_datetime STRING,
    dropoff_datetime STRING,
    passenger_count INT,
    -- DECIMAL(9,6) keeps 6 decimal places and only 3 integer digits;
    -- values of 1000 or more do not fit and load as NULL
    trip_distance DECIMAL(9,6),
    pickup_longitude DECIMAL(9,6),
    pickup_latitude DECIMAL(9,6),
    rate_code INT,
    store_and_fwd_flag STRING,
    dropoff_longitude DECIMAL(9,6),
    dropoff_latitude DECIMAL(9,6),
    payment_type STRING,
    fare_amount DECIMAL(9,6),
    extra DECIMAL(9,6),
    mta_tax DECIMAL(9,6),
    tip_amount DECIMAL(9,6),
    tolls_amount DECIMAL(9,6),
    total_amount DECIMAL(9,6),
    trip_time_in_secs INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
-- Skip the CSV header row when reading the file
TBLPROPERTIES ("skip.header.line.count"="1");

2. Load the Data

Place the CSV file (2018_Yellow_Taxi_Trip_Data.csv) in Hadoop's file system (HDFS).

Steps to Load Data into HDFS and Hive

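1. Upload the CSV file into HDFS, creating the target directory first if it does not exist:

   hdfs dfs -mkdir -p /user/hive/taxidata/
   hdfs dfs -put 2018_Yellow_Taxi_Trip_Data.csv /user/hive/taxidata/

2. Load the data into the Hive table. LOAD DATA INPATH moves the file from its HDFS location into the table's storage directory:

   LOAD DATA INPATH '/user/hive/taxidata/2018_Yellow_Taxi_Trip_Data.csv'
   INTO TABLE taxidata;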

3. Run Basic Queries to Verify the Data

Run simple queries to validate that the data is loaded correctly:

1. Check Row Count:

   SELECT COUNT(*) FROM taxidata;

2. Preview Data:

   SELECT * FROM taxidata LIMIT 10;
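
A row count and a preview do not show whether individual fields were parsed. With a delimited text table, Hive returns NULL for a field that cannot be read as the declared column type, so counting NULLs in a few typed columns is a quick sanity check. A minimal sketch; the choice of columns is only illustrative:

```sql
-- Count rows whose typed columns failed to parse (they surface as NULL)
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN passenger_count IS NULL THEN 1 ELSE 0 END) AS null_passenger_count,
    SUM(CASE WHEN trip_distance IS NULL THEN 1 ELSE 0 END) AS null_trip_distance,
    SUM(CASE WHEN total_amount IS NULL THEN 1 ELSE 0 END) AS null_total_amount
FROM taxidata;
```

A large NULL count in one column usually points to a column-order or type mismatch between the CSV and the table definition.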

4. Analysis Queries

4.1. Total Number of Trips

SELECT COUNT(*) AS total_trips FROM taxidata;

4.2. Total Revenue Generated

SELECT SUM(total_amount) AS total_revenue FROM taxidata;

4.3. Fraction Paid for Tolls

SELECT SUM(tolls_amount) / SUM(total_amount) AS toll_fraction FROM taxidata;

4.4. Fraction Paid as Tips

SELECT SUM(tip_amount) / SUM(total_amount) AS tip_fraction FROM taxidata;

4.5. Average Trip Amount

SELECT AVG(total_amount) AS average_trip_amount FROM taxidata;

4.6. Average Trip Distance

SELECT AVG(trip_distance) AS average_trip_distance FROM taxidata;
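
Queries 4.1 to 4.6 each scan the whole table. Since they are all plain aggregates over taxidata, they can also be combined so the data is read once. A sketch that reuses the same expressions and aliases as above:

```sql
-- One pass over the table for all scalar metrics from 4.1 to 4.6
SELECT
    COUNT(*) AS total_trips,
    SUM(total_amount) AS total_revenue,
    SUM(tolls_amount) / SUM(total_amount) AS toll_fraction,
    SUM(tip_amount) / SUM(total_amount) AS tip_fraction,
    AVG(total_amount) AS average_trip_amount,
    AVG(trip_distance) AS average_trip_distance
FROM taxidata;
```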

4.7. Number of Different Payment Types

SELECT COUNT(DISTINCT payment_type) AS num_payment_types FROM taxidata;

4.8. Metrics for Each Payment Type

SELECT
    payment_type,
    AVG(total_amount) AS average_fare,
    AVG(tip_amount) AS average_tip,
    AVG(mta_tax) AS average_tax
FROM taxidata
GROUP BY payment_type;
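
Averages per payment type are easier to interpret next to the number of trips behind them. The following sketch adds a trip count and a tip share per payment type; the aliases trips and tip_fraction are illustrative and not part of the original case study:

```sql
-- Trip volume and tip share per payment type, largest groups first
SELECT
    payment_type,
    COUNT(*) AS trips,
    SUM(tip_amount) / SUM(total_amount) AS tip_fraction
FROM taxidata
GROUP BY payment_type
ORDER BY trips DESC;
```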

4.9. Hourly Revenue Analysis

To find the hour of the day with the highest average revenue, extract the hour from pickup_datetime. The query casts the string to a timestamp, which assumes values in yyyy-MM-dd HH:mm:ss format:

SELECT
    HOUR(CAST(pickup_datetime AS TIMESTAMP)) AS hour_of_day,
    AVG(total_amount) AS average_revenue
FROM taxidata
GROUP BY HOUR(CAST(pickup_datetime AS TIMESTAMP))
ORDER BY average_revenue DESC
LIMIT 1;
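
The hourly query depends on pickup_datetime being readable as a timestamp. If the CSV stores it in another layout, the pattern can be given explicitly with UNIX_TIMESTAMP and FROM_UNIXTIME. A sketch, assuming (this is not confirmed by the source) a 12-hour layout such as 01/31/2018 05:45:00 PM:

```sql
-- Parse with an explicit pattern, then extract the hour (0-23).
-- The pattern 'MM/dd/yyyy hh:mm:ss a' is an assumption about the file.
SELECT
    HOUR(FROM_UNIXTIME(UNIX_TIMESTAMP(pickup_datetime, 'MM/dd/yyyy hh:mm:ss a'))) AS hour_of_day,
    COUNT(*) AS trips,
    AVG(total_amount) AS average_revenue
FROM taxidata
GROUP BY HOUR(FROM_UNIXTIME(UNIX_TIMESTAMP(pickup_datetime, 'MM/dd/yyyy hh:mm:ss a')))
ORDER BY average_revenue DESC;
```

Dropping LIMIT 1 lists every hour, which makes it easy to spot a NULL hour_of_day group. That group indicates rows where the pattern did not match.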

5. Notes

• Ensure the file is formatted correctly and accessible in HDFS before loading.

• Verify that the Hadoop and Hive services are running and that you can connect to Hive before running the scripts.

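• TO_TIMESTAMP is not among Hive's standard built-in functions. If it is unavailable, use CAST(pickup_datetime AS TIMESTAMP) for values in yyyy-MM-dd HH:mm:ss format, or UNIX_TIMESTAMP with an explicit pattern for other layouts.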

• When tuning performance, test queries on a smaller subset of the data before running them on the full table.
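
One way to get a smaller dataset for testing is to materialize a random sample of the table. A minimal sketch using Hive's TABLESAMPLE clause; the table name taxidata_sample and the 1-in-100 ratio are illustrative choices:

```sql
-- Keep roughly 1% of the rows, chosen at random, in a separate table
CREATE TABLE taxidata_sample AS
SELECT *
FROM taxidata TABLESAMPLE(BUCKET 1 OUT OF 100 ON rand()) s;
```

Creating the sample still reads the full table once. After that, the analysis queries can be pointed at taxidata_sample while they are being developed.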

This workflow should give you insights into the taxi dataset while using Hive effectively!
