
2. Classification of Digital Data

The document classifies digital data into three categories: structured, semi-structured, and unstructured data, detailing their characteristics and sources. It provides examples of each type of data generated in scenarios such as airports and shopping malls, highlighting how structured data follows predefined schemas, while semi-structured and unstructured data do not. Additionally, it discusses methods for handling unstructured data and prompts readers to consider various data types generated in real-world situations.

Uploaded by Pranshav Patel
Copyright © All Rights Reserved

Classification of Digital Data

Digital Data
Structured data
• When do we say that data is structured?
• Sources of structured data
Working with structured data
• Insert/update/delete
• Indexing
• Transaction processing
• Security
• Scalability
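The operations listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a reference implementation. The `flights` table, its columns, and its rows are hypothetical. The sketch shows insert/update/delete, an index on a column, and a transaction in which two inserts succeed or fail together. Security and scalability are beyond what a short example can show.

```python
import sqlite3

# In-memory database; the table and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flights (
           flight_number TEXT PRIMARY KEY,
           gate_number   TEXT NOT NULL,
           status        TEXT NOT NULL
       )"""
)

# Indexing: lets lookups by gate avoid scanning every row.
conn.execute("CREATE INDEX idx_flights_gate ON flights (gate_number)")

# Insert / update / delete.
conn.execute("INSERT INTO flights VALUES ('AB123', '12', 'Boarding')")
conn.execute("INSERT INTO flights VALUES ('CD456', '7', 'Scheduled')")
conn.execute("UPDATE flights SET gate_number = '9' WHERE flight_number = 'CD456'")
conn.execute("DELETE FROM flights WHERE flight_number = 'AB123'")
conn.commit()

# Transaction processing: both inserts are kept together or neither is.
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO flights VALUES ('EF789', '3', 'Scheduled')")
        conn.execute("INSERT INTO flights VALUES ('CD456', '4', 'Scheduled')")  # duplicate key
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back, so EF789 was not kept

rows = conn.execute(
    "SELECT flight_number, gate_number FROM flights ORDER BY flight_number"
).fetchall()
print(rows)  # [('CD456', '9')]
```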
Semi-structured data
• Self Describing Structure
• It does not conform to the data models that one typically associates
with relational databases or any other form of data tables
• It uses tags to segregate semantic elements
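Here is a minimal sketch of what "self-describing" means in practice, using Python's json and xml.etree.ElementTree modules. The two feedback records are hypothetical. They belong to the same collection yet carry different fields, and each field name travels with its value. A fixed relational schema would not allow this. The XML line shows tags doing the same job of naming each semantic element.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical feedback records: same collection, different fields.
records = json.loads("""
[
  {"FormID": 1, "Rating": 4, "Comment": "Quick billing"},
  {"FormID": 2, "Rating": 2, "Store": "Duty-Free", "Tags": ["price", "queue"]}
]
""")

# The keys travel with the data, so the structure is read from the records themselves.
all_fields = sorted({key for record in records for key in record})
print(all_fields)  # ['Comment', 'FormID', 'Rating', 'Store', 'Tags']

# In XML, tags play the same role: each one names (segregates) a semantic element.
doc = ET.fromstring("<Feedback><Rating>4</Rating><Comment>Quick billing</Comment></Feedback>")
print([(child.tag, child.text) for child in doc])  # [('Rating', '4'), ('Comment', 'Quick billing')]
```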
Sources of semi-structured data
Unstructured data
• Does not conform to any predefined data model
• The structure can be unpredictable.
Sources of unstructured data
How to deal with unstructured data?
In-class exercise
Solution

Email
Let’s Discuss
• Why is the email body in the structured category?
• Where should we put CCTV footage?
You are at a city shopping mall. You see a few people browsing
the items. Some of them are looking for discounts. Some are
filling in feedback forms. A few people are at the billing counter.
You may consider other things and events happening in this
scenario. Think for a while about the different types of data
generated. Mention each of them with proper logic.
Imagine you are at a busy airport. You see passengers checking in at kiosks, waiting in
lounges, shopping at duty-free stores, and interacting with airport staff. There are numerous
flights arriving and departing, announcements being made, and security checks happening.
Consider the different types of data generated in this scenario. Mention each of them with
proper logic.
Location: Airport

Activities Observed:
✓ Passengers checking in
✓ Passengers waiting in lounges
✓ Shopping at duty-free stores
✓ Interacting with airport staff
✓ Announcements being made
✓ Security checks
Types of Data Generated:
❑ Structured Data:
• Flight Information: Flight schedules, gate numbers, departure and arrival times, and
passenger lists are stored in databases with predefined schemas.
• Example: A table in a relational database with columns like Flight_Number,
Departure_Time, Arrival_Time, Gate_Number.
• Passenger Information: Details such as passenger names, ticket numbers, passport
numbers, and seat assignments.
• Example: A database table with columns like Passenger_Name, Ticket_Number,
Passport_Number, Seat_Number.
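The following short sketch shows why the passenger data counts as structured, using Python's sqlite3 module. The column names come from the example above. The column types, the constraints, and the sample passenger values are assumptions made for illustration. The schema is declared before any data arrives, a row that does not fit it is rejected, and every stored row can be queried by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the example table; types and constraints are assumptions.
conn.execute(
    """CREATE TABLE Passenger (
           Passenger_Name  TEXT NOT NULL,
           Ticket_Number   TEXT PRIMARY KEY,
           Passport_Number TEXT NOT NULL,
           Seat_Number     TEXT
       )"""
)
# Hypothetical sample passenger.
conn.execute("INSERT INTO Passenger VALUES ('A. Shah', 'T-001', 'P1234567', '14C')")

# The schema is fixed in advance: a row without a passport number is rejected.
try:
    conn.execute(
        "INSERT INTO Passenger (Passenger_Name, Ticket_Number) VALUES ('B. Rao', 'T-002')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

# Every row has the same named columns, so queries can address them directly.
seat = conn.execute(
    "SELECT Seat_Number FROM Passenger WHERE Ticket_Number = 'T-001'"
).fetchone()
print(seat)  # ('14C',)
```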
❑ Semi-Structured Data:
• Shopping Receipts: Electronic receipts from duty-free stores which might be in XML or
JSON format, containing structured data but not following a strict schema.
• Example: A JSON document containing fields like {"ReceiptID": "12345", "Store":
"Duty-Free", "Items": [{"ItemName": "Perfume", "Price": "50 USD"}]}.
• Check-in Kiosk Data: Logs from check-in kiosks that include structured elements
(timestamps, transaction IDs) along with unstructured elements (error messages, user
inputs).
• Example: An XML file with elements like
<CheckIn><PassengerID>123</PassengerID><Status>Success</Status></CheckIn>.
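Python's standard json and xml.etree.ElementTree modules can read the two semi-structured examples above. This sketch flattens the nested "Items" array into one table-like row per item. That is a common step when loading semi-structured data into a structured store. Splitting "50 USD" on whitespace into an amount and a currency is an assumption about the price format. The receipt's structure does not guarantee it.

```python
import json
import xml.etree.ElementTree as ET

# The receipt and check-in documents from the example above.
receipt = json.loads(
    '{"ReceiptID": "12345", "Store": "Duty-Free", '
    '"Items": [{"ItemName": "Perfume", "Price": "50 USD"}]}'
)
checkin = ET.fromstring(
    "<CheckIn><PassengerID>123</PassengerID><Status>Success</Status></CheckIn>"
)

# Flatten the nested "Items" array into one table-like row per item.
# Splitting "50 USD" into amount and currency assumes this price format.
rows = []
for item in receipt["Items"]:
    amount, currency = item["Price"].split()
    rows.append((receipt["ReceiptID"], item["ItemName"], float(amount), currency))
print(rows)  # [('12345', 'Perfume', 50.0, 'USD')]

# Read the check-in fields by tag name; findtext returns None if a tag is absent.
status = {
    "PassengerID": checkin.findtext("PassengerID"),
    "Status": checkin.findtext("Status"),
}
print(status)  # {'PassengerID': '123', 'Status': 'Success'}
```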
❑ Unstructured Data:
• Announcements: Audio recordings of flight announcements, security warnings, and other public
address system communications.
• Example: An audio file containing the announcement: "Flight AB123 to New York is now
boarding at Gate 12."
• Passenger Interactions: Video recordings from security cameras capturing interactions between
passengers and staff, or passengers waiting in lounges.
• Example: A video file showing a passenger asking for assistance at an information desk.
• Social Media Posts: Tweets or Facebook posts from passengers sharing their travel experiences or
complaints about airport services.
• Example: A tweet saying, "Stuck at the airport due to flight delay #frustrated".
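Unstructured data has no fields to query, so any structure must be inferred from the content. The sketch below assumes the announcement has already been transcribed to text; the speech-to-text step is not shown. Regular expressions extract a flight number, a gate, and hashtags from the examples above. The patterns are illustrative assumptions and cover only these phrasings.

```python
import re

# Text from the examples above; the announcement is assumed to be already transcribed.
announcement = "Flight AB123 to New York is now boarding at Gate 12."
tweet = "Stuck at the airport due to flight delay #frustrated"

# Illustrative patterns: two capital letters plus digits for a flight, "Gate <number>" for a gate.
flight = re.search(r"\bFlight ([A-Z]{2}\d+)\b", announcement)
gate = re.search(r"\bGate (\d+)\b", announcement)
hashtags = re.findall(r"#(\w+)", tweet)

extracted = {
    "Flight_Number": flight.group(1) if flight else None,
    "Gate_Number": gate.group(1) if gate else None,
    "Hashtags": hashtags,
}
print(extracted)  # {'Flight_Number': 'AB123', 'Gate_Number': '12', 'Hashtags': ['frustrated']}
```

An announcement phrased as "Gate twelve" would defeat the gate pattern. This is one practical consequence of the structure being unpredictable.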
❑ Structured Data:
• Baggage Tracking Information
• Employee Schedules
• Parking Lot Occupancy Records
❑ Semi-Structured Data:
• Customer Feedback Forms in JSON format
• Inventory Lists from Duty-Free Stores in XML format
• Email Communications with Passengers
❑ Unstructured Data:
• Photographs taken at the airport
• Handwritten Notes from Staff
You are at the university library. You see a few students browsing the
library catalog at a kiosk. You see librarians and other staff issuing
and returning books, magazines, and journals. A few students are
using the e-library service, too. Which types of data are generated in this
scenario? Support your answer by considering big data.
