Structured and Unstructured Data: Learning Outcomes

This document discusses the differences between structured and unstructured data. Structured data is organized into databases with rows and columns, while unstructured data is not organized in a predefined way. Structured data makes up about 20% of all data and includes things like databases and spreadsheets. Unstructured data accounts for 80% of all data and includes text, images, videos and more. Both types of data are growing, but unstructured data is growing much faster due to new sources like social media.

Uploaded by

Benz Choi

Structured and Unstructured Data

Learning Outcomes:
•Understand structured and unstructured data
•Explain their examples, growth, characteristics, storage techniques, and storage and
management tools
•Define the key differences

ABSTRACTION
WHAT IS STRUCTURED DATA?
Structured data is data that is well organized and consistently formatted. It typically
lives in relational databases (RDBMSs), meaning the information is stored in tables with rows
and columns that are connected to one another. Because structured data is arranged and
recorded this neatly, it can be easily found and processed. As long as data fits within the
structure of an RDBMS, we can easily search for specific information and single out the
relationships between its pieces. Such data can only be used for its intended purpose. On top of
that, structured data doesn’t normally require much storage space.

For analytical purposes, you can use data warehouses (DWs), which are central data stores
used by companies for data analysis and reporting. Relational databases and warehouses are
handled with a dedicated language called SQL, which stands for Structured Query Language and
was developed by IBM in the 1970s.
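To make the idea of connected tables and SQL concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and chosen only for illustration; a production RDBMS would be set up differently.

```python
import sqlite3

# In-memory database; all table names, column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        zip  TEXT
    );
    CREATE TABLE purchases (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL,
        amount      REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "1000"), (2, "Ben", "2000")],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [(1, 1, "ticket", 120.0), (2, 1, "hotel", 80.0), (3, 2, "ticket", 95.5)],
)


def total_spent_per_customer(connection):
    """Join the two tables on the customer id and sum the purchase amounts."""
    return connection.execute("""
        SELECT c.name, SUM(p.amount)
        FROM customers AS c
        JOIN purchases AS p ON p.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


totals = total_spent_per_customer(conn)
```

The join works because every purchase row points back to a customer row through customer_id, which is the kind of relationship between pieces of data that the text describes.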

Structured data examples. Structured data is familiar to most of us. Google Sheets and
Microsoft Office Excel files are the first things that spring to mind as structured data
examples. This data can comprise both text and numbers, such as employee names, contacts,
ZIP codes, addresses, credit card numbers, etc.
A typical structured data example: an Excel spreadsheet that contains information about
customers and purchases.

Pretty much everyone has dealt with booking a ticket via one of the airline reservation
systems or withdrawing cash using an ATM. During these operations, we don’t normally think of
what kind of applications we deal with and what types of data they process. However, these are
the systems that typically use structured data and relational databases as well.

Other Examples of Structured Data:

•Metadata (time and date of creation, file size, author etc.)
•Library Catalogues (date, author, place, subject, etc.)
•Census records (birth, income, employment, place etc.)
•Economic data (GDP, PPI, ASX etc.)
•Facebook like button
•Phone numbers (and the phone book)
•Databases (structuring fields)

WHAT IS UNSTRUCTURED DATA?


It makes sense that if the definition of structured data implies a neat organization of
components in a predetermined manner, the definition of unstructured data will be the
opposite. The pieces of such data aren’t structured in a pre-defined way, meaning data is stored
in its native formats.

The catch with unstructured data is that traditional methods and tools can’t be used to
analyze and process it. One way to manage unstructured data is to opt for non-relational
databases, also known as NoSQL.
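As a rough illustration of the non-relational idea, the sketch below keeps records as free-form documents, each with its own set of fields, and searches them by whatever fields happen to be present. It is a toy in plain Python, not the API of any particular NoSQL product; the sample documents and the find helper are hypothetical.

```python
import json

# Hypothetical documents: each one carries its own fields, with no shared schema.
documents = [
    {"type": "email", "sender": "ana@example.com", "body": "Is the tour still available?"},
    {"type": "post", "author": "travel_agency", "text": "New tours!", "hashtags": ["travel"]},
    {"type": "sensor", "device_id": "m-17", "readings": [21.5, 21.7]},
]

_MISSING = object()  # sentinel so that an absent field never equals a search value


def find(docs, **criteria):
    """Return the documents whose fields match every criterion."""
    return [
        d for d in docs
        if all(d.get(key, _MISSING) == value for key, value in criteria.items())
    ]


# Documents are stored as JSON text and restored without declaring any schema.
serialized = [json.dumps(d) for d in documents]
restored = [json.loads(s) for s in serialized]

posts = find(restored, type="post")
sensor_docs = find(restored, device_id="m-17")
```

Note that nothing had to be declared before a document with new fields was added, which is the main contrast with a relational table.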

If there’s a need to keep data in its raw native formats for further analysis, storage
repositories called data lakes will be the way to go. A data lake is a storage repository or system
meant to store huge volumes of data in its natural/raw formats.

Taking into account the whole variety of file formats of unstructured data, it comes as no
surprise that it makes up more than 80 percent of all data. Given this, companies ignoring
unstructured data are left far behind as they don’t get enough valuable information.

Unstructured data examples. There is a wide array of forms that make up unstructured data
such as email, text files, social media posts, video, images, audio, sensor data, and so on.
The travel agency Facebook post: an example of unstructured data

As an example, we can take the social media posts of a travel agency, or any posts for that
matter. Each post contains some metrics like shares or hashtags that can be quantified and
structured. However, the posts themselves belong to the category of unstructured data. In other
words, it takes time, effort, knowledge, and special software tools to analyze the posts and
collect useful insights. If an agency posts new travel tours and wants to know the audience’s
reactions (comments), it will need to examine the post in its native format (view the post via
a social media app) or use advanced techniques like sentiment analysis.
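The split between the quantifiable parts of a post and its free text can be sketched as follows. The hashtag extraction is straightforward; the sentiment score is only a toy word-count heuristic with hypothetical word lists, far simpler than real sentiment analysis tools, and the sample post and comments are invented.

```python
import re

# Hypothetical word lists for a toy sentiment score; real tools use trained models.
POSITIVE = {"love", "great", "amazing", "beautiful"}
NEGATIVE = {"bad", "expensive", "awful", "boring"}


def extract_hashtags(text):
    """Pull out the structured part of a post: its hashtags."""
    return re.findall(r"#(\w+)", text)


def toy_sentiment(comment):
    """Count positive words minus negative words in a free-text comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


post = "New island tours this summer! #travel #summer"
comments = [
    "Love it, the photos are amazing",
    "Too expensive for me",
    "When does it start?",
]

hashtags = extract_hashtags(post)
scores = [toy_sentiment(c) for c in comments]
```

Even this crude score shows why extra processing is needed: the comments only become comparable numbers after the text has been broken into words and interpreted.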

Other Examples of Unstructured Data:

•Text files (word processing, spreadsheets, presentations etc.)
•Email body
•Social media (data from Facebook, Twitter, LinkedIn)
•Website (YouTube, Instagram, photo sharing sites)
•Mobile data (text messages)
•Communications (chat, IM, phone recordings, collaboration software)
•Media (MP3, digital photos, audio and video files)

GROWTH

•STRUCTURED DATA: 20% OF ALL DATA
•UNSTRUCTURED DATA: 80% OF ALL DATA

Unstructured data is growing at an astronomical pace, many times faster than structured data.
About 20% of the total existing data is structured data; the remaining 80% is unstructured.

SOURCE
With the growth of technology, new sources of data have emerged in the last few years. This
data comes in large volumes and poses a challenge in terms of processing.
The sources of data are divided into two categories:
• Computer or machine-generated
• Human-generated

Computer or machine-generated :
Machine-generated data generally refers to the kind of data that is created by a machine
without human intervention.

Machine-generated structured data sources:
•Sensor data: Radio frequency ID tags, smart meters, medical devices, and Global Positioning
System data all produce machine-generated structured data. Companies are mainly interested in
it for supply chain management and inventory control.
•Web log data: As servers, applications, networks and similar systems operate, they record many
kinds of data about their activity. This adds up to enormous piles of data of diverse kinds.
Based on this data, you can deal with service-level agreements or predict security breaches.
•Point-of-sale data: When a digital transaction takes place over the counter of a shopping
mall, the machine captures a lot of data. This is machine-generated structured data related to
the barcode and other relevant details of the product.
•Financial data: Computer programs are now used far more with financial data, and processes are
automated with their help. Take the case of stock trading: it carries structured data such as
the company symbol and dollar value. Part of this data is machine generated and some of it is
human generated.

Machine-generated unstructured data sources:
•Satellite images: Weather data, or the data that government agencies procure through their
satellite surveillance imagery, is machine-generated unstructured data. Google Earth and
similar services aptly illustrate the point.
•Scientific data: Scientific data such as seismic imagery, atmospheric data and high-energy
physics data is machine-generated unstructured data.
•Photographs and video: When machines capture images and video for the purposes of security,
surveillance and traffic, the data produced is machine-generated unstructured data.
•Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic
profiles.
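Web log data is listed above as machine-generated structured data because every entry follows the same layout. The sketch below assumes, purely for illustration, that the server writes lines in the widely used Common Log Format; the sample line and the field names are hypothetical.

```python
import re

# Assumed layout: Common Log Format (host, identity, user, time, request, status, size).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Turn one log line into a dict of named fields, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '192.0.2.10 - - [10/Oct/2020:13:55:36 +0000] "GET /tours/summer HTTP/1.1" 200 2326'
record = parse_log_line(sample)
```

Once each line is a record with named fields, it can be loaded into a table and queried like any other structured data.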
Human-generated :
This is data that humans, in interaction with computers, supply.

Human-generated structured data sources:
•Input data: When a human user enters input such as name, age, income, or non-free-form survey
responses into a computer, it is human-generated structured data. Companies can find this type
of data quite useful in studying customer behavior.
•Clickstream data: This is the type of data generated when a user clicks a link on a website.
Businesses like this type of data because it allows them to study customer behavior and
purchase patterns.
•Gaming-related data: When a human user makes a move in a game on a virtual platform, it
produces a piece of information. How users navigate a gaming portfolio is a source of a lot of
interesting data.

Human-generated unstructured data sources:
•Text internal to your company: This is the type of data that is restricted to a given company,
such as documents, logs, survey results and emails. Such enterprise information forms a big
part of the unstructured text information in the world.
•Social media data: This kind of data is generated when human users interact with social media
platforms such as Facebook, Twitter, Flickr, YouTube and LinkedIn.
•Mobile data: This type of data includes information such as text messages and location
information.
•Website content: This type of data is derived from a site delivering unstructured content,
such as YouTube, Flickr or Instagram.

CHARACTERISTICS :
Each data type behaves differently when weighed against a set of qualities or characteristics.
When one approaches data from the point of view of different characteristics such as flexibility,
robustness, accessibility etc. one begins to understand how each data type differs.

Since the two data types are distinct by nature, they fare completely differently with respect
to these characteristics. For instance, scaling a DB schema is difficult for structured data,
whereas unstructured data is highly scalable. Hence, unless we understand the different
characteristics and compare the two data types against them, it is not possible to fully grasp
the difference between structured and unstructured data.

Therefore, it is worth looking at how the two data types differ with respect to each of these
characteristics.
Characteristic | Structured data | Unstructured data
Flexibility | Schema dependent; rigorous schema | Absence of schema; very flexible
Scalability | Scaling DB schema is difficult | Highly scalable
Robustness | Robust |
Query performance | Structured query allows complex joins | Only textual query possible
Accessibility | Easy to access | Hard to access
Association | Organized | Scattered and dispersed
Analysis | Efficient to analyze | Additional preprocessing is needed
Appearance | Formally defined | Free-form
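The flexibility contrast in the comparison above can be sketched with Python's built-in sqlite3 module: a table accepts only records that fit its declared columns, while a schema-less collection accepts anything. The employees table, its columns and the sample records are hypothetical, and the column names are interpolated into the SQL string only because this is an illustration with trusted input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rigid schema: only these two columns exist.
conn.execute("CREATE TABLE employees (name TEXT NOT NULL, zip TEXT)")


def insert_employee(connection, record):
    """Try to insert a dict as a row; return False if it does not fit the schema."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    try:
        connection.execute(
            f"INSERT INTO employees ({columns}) VALUES ({placeholders})",
            tuple(record.values()),
        )
        return True
    except sqlite3.OperationalError:
        # Raised by SQLite when the record names a column the table does not have.
        return False


fits = insert_employee(conn, {"name": "Ana", "zip": "1000"})
rejected = insert_employee(conn, {"name": "Ben", "nickname": "B"})

# A schema-less collection stores both records without complaint.
free_form = []
free_form.append({"name": "Ana", "zip": "1000"})
free_form.append({"name": "Ben", "nickname": "B"})
```

The rejected insert is the price of the rigorous schema; in exchange, every row in the table is guaranteed to have the same shape.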

STORAGE TECHNIQUES

Structured data storage technique :


Block storage / block level storage :

This type of data storage is used in storage-area network (SAN) environments. In such
environments, data is stored in volumes, which are also referred to as blocks.

An arbitrary identifier is assigned to every block. It allows the block to be stored and
retrieved, but there is no metadata providing further context.

Virtual machine file system volumes and structured database storage are typical use cases of
block storage.

With block storage, raw storage volumes are created on the device. With the aid of a
server-based system, the volumes are connected and each one is treated as an individual hard
drive.
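The core idea of block storage, data addressed only by an arbitrary identifier with no descriptive metadata, can be sketched in a few lines. This is a toy in-memory model, not how a SAN is implemented; the ToyBlockStore class, the fixed 8-byte block size and the sample payload are assumptions made for illustration.

```python
import uuid

BLOCK_SIZE = 8  # bytes; unrealistically small, chosen so the example splits visibly


class ToyBlockStore:
    """Toy model: blocks are raw bytes looked up by an arbitrary identifier."""

    def __init__(self):
        self._blocks = {}

    def write(self, data):
        """Split data into fixed-size blocks and return their identifiers in order."""
        block_ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = uuid.uuid4().hex  # arbitrary identifier, carries no meaning
            self._blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            block_ids.append(block_id)
        return block_ids

    def read(self, block_ids):
        """Reassemble the original bytes from the ordered list of identifiers."""
        return b"".join(self._blocks[block_id] for block_id in block_ids)


store = ToyBlockStore()
payload = b"row-1|Ana|1000;row-2|Ben|2000"
ids = store.write(payload)
roundtrip = store.read(ids)
```

Nothing in the store says what the bytes mean or which blocks belong together; that knowledge lives in whatever system sits on top, such as a database or a file system.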
Unstructured data storage technique :
Object storage :

Like other storage techniques, object storage is a way of storing, organizing and accessing
data on disk. The difference is that it does so in a more scalable and cost-effective manner.

This kind of storage system makes it possible to retain huge volumes of unstructured data.
When it comes to storing photos on Facebook, songs on Spotify, or files in collaboration
services such as Dropbox, object storage comes into play.

Each object incorporates data, a lot of metadata and a singularly unique identifier. This
kind of storage can be done at different levels such as device level, system level and interface
level.

Since objects are robust, this kind of storage works well for long-term storage of data
archives, analytics data and service provider storage with SLAs linked with data delivery.
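The three parts of an object described above (data, metadata and a unique identifier) can be sketched as a toy in-memory store. The ToyObjectStore class, its method names and the sample metadata keys are hypothetical and do not correspond to any real object storage API.

```python
import uuid


class ToyObjectStore:
    """Toy model: each object bundles data, metadata and a unique identifier."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        """Store the bytes with their metadata and return the new object's identifier."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def get(self, object_id):
        """Return the (data, metadata) pair for an identifier."""
        obj = self._objects[object_id]
        return obj["data"], obj["metadata"]

    def find_by_metadata(self, key, value):
        """Return identifiers of objects whose metadata has the given key and value."""
        return [
            object_id
            for object_id, obj in self._objects.items()
            if obj["metadata"].get(key) == value
        ]


store = ToyObjectStore()
photo_id = store.put(b"\x89PNG...", kind="photo", owner="ana")
song_id = store.put(b"ID3...", kind="song", owner="ben")

data, metadata = store.get(photo_id)
photos = store.find_by_metadata("kind", "photo")
```

Because the metadata travels with the object, the store can answer questions such as "which objects are photos" without opening the data itself.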

DATA STORAGE AND MANAGEMENT TOOL

STRUCTURED DATA STORAGE AND MANAGEMENT TOOL :


Structured Query Language (SQL), a programming language devised for managing and
querying data in relational database management systems, is often used in managing
structured data.

Here’s how you can store and manage data using some of the different tools:
ORACLE RDBMS

•Oracle Database is a widely used object-relational database management system. Oracle
Corporation produces and markets it.
•Oracle is quite secure. It does not occupy a huge amount of space. It is good at supporting
large databases. It also reduces the CPU time needed to process data.

MICROSOFT SQL SERVER

•Microsoft SQL Server is a relational database management system. As the name indicates, it
was created by Microsoft.
•As a database server, it is basically a software product whose primary function is to store and
retrieve data that is requested by other software applications. These applications may run on
the same computer or some other computer on some other network. It could be on the
Internet.

MYSQL

•Unlike the Microsoft product, MySQL is an open-source relational database management system
(RDBMS).
•MySQL is capable of powering intricate and powerful web, e-commerce, SaaS and Online
Transaction Processing (OLTP) applications.

UNSTRUCTURED DATA STORAGE AND MANAGEMENT TOOL :


Unstructured data contains a lot of information that can be leveraged. Businesses can use
the information contained in emails, social media postings etc. to derive operational
intelligence, marketing intelligence etc.

Customer surveys alone are not enough for sentiment analysis, so businesses need to go
beyond them and work out new ways to study customer behavior. Unstructured data can be of
immense help in this regard.

However, bear in mind that unstructured data is fundamentally different and does not fit
into traditional tools like relational databases. Searching it with the existing algorithms is
not really viable.

If it were easy to process, it would effectively be structured data, and deriving actionable
intelligence from it would be just as easy. But that is not the case.

However, there are some tools that you can use to store and manage unstructured data :

HADOOP

•Hadoop is an open-source software framework for distributed storage and distributed
processing. Considering the size and complexity of unstructured data, such a system is quite
important for unstructured data analysis.
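One processing model commonly associated with Hadoop is MapReduce, which the text above does not name explicitly. The sketch below imitates its map, shuffle and reduce phases on a word count in a single Python process; it does not use Hadoop itself, and the function names and sample text chunks are hypothetical. On a real cluster each chunk would be stored and processed on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical chunks of unstructured text, standing in for distributed file blocks.
chunks = ["great tour great guide", "tour was long"]

pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
```

The map step needs only its own chunk, which is why the work can be spread over many machines before the results are brought together.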

AMAZON WEB SERVICES S3

•Amazon Web Services S3 is useful because it provides cloud storage that lets you leverage an
object-storage architecture.

IBM SPECTRUM SCALE

•It is a distributed file system that makes use of an object-based architecture. File metadata
is stored in metadata servers, whereas file data is stored in object storage servers. The file
system client software interacts with these distinct servers and presents a full file system
to users and applications.

APPLICATION:
1. Explain what structured and unstructured data are.
2. Discuss how structured data and unstructured data differ from each other.
3. Give at least 3 examples of unstructured data.
4. Elaborate on at least 1 storage and management tool for structured data.
5. What are the storage techniques for structured and unstructured data?
REFERENCES:
Pickell, P. (2018). Structured vs Unstructured Data – What's the Difference?
https://learn.g2.com/structured-vs-unstructured-data

Derda, M. (2020). Structured vs. Unstructured Data: What’s the Difference?
https://www.altexsoft.com/blog/structured-unstructured-data/

Hollander, G. (2019). What is Structured Data vs. Unstructured Data?
https://resources.m-files.com/blog/what-is-structured-data-vs-unstructured-data-3

Smallcombe, M. (2020). Structured vs Unstructured Data: 5 Key Differences
https://www.xplenty.com/blog/structured-vs-unstructured-data-key-differences/
