Structured and Unstructured Data: Learning Outcomes
Learning Outcomes:
• Understand structured and unstructured data
• Explain the examples, growth, characteristics, storage techniques, and storage and
management tools of each data type
• Define the key differences between the two
ABSTRACTION
WHAT IS STRUCTURED DATA?
Structured data is data that is well organized and consistently formatted. It typically
lives in relational databases (RDBMSs), meaning the information is stored in tables with rows
and columns that are related to one another. Because structured data is arranged and recorded
so neatly, it can be easily found and processed. As long as data fits within the structure of
an RDBMS, we can easily search for specific information and single out the relationships
between its pieces. The trade-off is that such data can only be used for its intended purpose,
because its structure is fixed in advance. On top of that, structured data doesn’t normally
require much storage space.
For analytical purposes, you can use data warehouses (DWs), which are central data stores
used by companies for data analysis and reporting. Relational databases and warehouses are
handled with a special programming language called SQL (Structured Query Language), which was
developed at IBM in the 1970s.
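The idea of related tables queried with SQL can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made up for this example and do not come from any particular system.

```python
import sqlite3

# In-memory database; all table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        zip  TEXT NOT NULL
    );
    CREATE TABLE purchases (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO customers (id, name, zip) VALUES (?, ?, ?)",
    [(1, "Ana Cruz", "1000"), (2, "Ben Reyes", "6000")],
)
conn.executemany(
    "INSERT INTO purchases (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.5)],
)

# The rows of the two tables are connected through customer_id,
# so one query can total the purchases of each customer.
totals = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(totals)  # [('Ana Cruz', 350.0), ('Ben Reyes', 75.5)]
```

Because every row follows the same columns, finding a specific piece of information or a relationship between records is a single query.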
Structured data examples. Structured data is familiar to most of us. Google Sheets and
Microsoft Excel files are the first things that spring to mind as structured data examples.
This data can comprise both text and numbers, such as employee names, contacts, ZIP codes,
addresses, credit card numbers, etc.
A typical structured data example: an Excel spreadsheet that contains information about
customers and purchases.
Pretty much everyone has dealt with booking a ticket via one of the airline reservation
systems or withdrawing cash using an ATM. During these operations, we don’t normally think of
what kind of applications we deal with and what types of data they process. However, these are
the systems that typically use structured data and relational databases as well.
WHAT IS UNSTRUCTURED DATA?
Unstructured data, by contrast, has no predefined format or schema, so traditional
methods and tools can’t be used to analyze and process it. One of the ways to manage
unstructured data is to opt for non-relational databases, also known as NoSQL databases.
If there’s a need to keep data in its raw native formats for further analysis, a data
lake will be the way to go. A data lake is a storage repository or system meant to store huge
volumes of data in their natural/raw formats.
Taking into account the whole variety of file formats of unstructured data, it comes as no
surprise that it makes up more than 80 percent of all data. Given this, companies ignoring
unstructured data are left far behind as they don’t get enough valuable information.
Unstructured data examples. There is a wide array of forms that make up unstructured data
such as email, text files, social media posts, video, images, audio, sensor data, and so on.
The travel agency Facebook post: an example of unstructured data
As an example, we can take the social media posts of a travel agency, or any posts for that
matter. Each post contains some metrics, like shares or hashtags, that can be quantified and
structured. However, the posts themselves belong to the category of unstructured data. It
takes time, effort, knowledge, and special software tools to analyze the posts and collect
useful insights. If an agency posts new travel tours and wants to know the audience’s
reactions (comments), it will need to examine the post in its native format, either by viewing
the post in the social media app or by using advanced techniques like sentiment analysis.
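The split between the quantifiable parts of a post and its free text can be sketched in a few lines of Python. The post content, the word lists, and the naive_sentiment helper are all hypothetical and invented for illustration; counting positive and negative words is only a crude stand-in for real sentiment analysis.

```python
import re

# A made-up post: "shares" is already structured, the rest is free text.
post = {
    "shares": 42,
    "text": "New island-hopping tour this summer! #travel #beach",
    "comments": [
        "Love this, amazing prices!",
        "Too expensive, bad timing.",
        "Amazing! Can't wait.",
    ],
}

# Hashtags can be pulled out of the text and stored as structured values.
hashtags = re.findall(r"#(\w+)", post["text"])

# Tiny hand-picked word lists (an assumption, not a real sentiment lexicon).
POSITIVE = {"love", "amazing", "great"}
NEGATIVE = {"bad", "expensive", "boring"}


def naive_sentiment(comment):
    """Return (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


scores = [naive_sentiment(c) for c in post["comments"]]

print(hashtags)  # ['travel', 'beach']
print(scores)    # [2, -2, 1]
```

The metrics fall out directly, while the comments need an extra processing step before they yield anything that can be counted.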
• Structured data: about 20% of all data
• Unstructured data: about 80% of all data
Unstructured data is growing at an astronomical pace, many times faster than structured
data. Only about 20% of all existing data is structured; the remaining 80% or so is unstructured.
SOURCE
With the growth of technology, new sources of data have emerged in the last few years. This
data arrives in large volumes and poses a challenge in terms of processing it.
The sources of data are divided into two categories:
• Computer or machine-generated
• Human-generated
Computer or machine-generated:
Machine-generated data generally refers to data that is created by a machine without human
intervention.
CHARACTERISTICS:
Each data type behaves differently when weighed against a set of qualities or characteristics
such as flexibility, robustness, and accessibility. Since the two data types are distinct by
nature, they fare completely differently with respect to these characteristics. For instance,
scaling a database schema is difficult for structured data, whereas unstructured data is highly
scalable. Comparing the two data types characteristic by characteristic is therefore the
clearest way to grasp the difference between structured and unstructured data.
The table below shows how the two data types differ on each characteristic.
Characteristic    | Structured data                         | Unstructured data
Flexibility       | Schema dependent, rigorous schema       | Absence of schema, very flexible
Scalability       | Scaling the DB schema is difficult      | Highly scalable
Robustness        | Robust                                  | -
Query performance | Structured queries allow complex joins  | Only textual queries possible
Accessibility     | Easy to access                          | Hard to access
Association       | Organized                               | Scattered and dispersed
Analysis          | Efficient to analyze                    | Additional preprocessing is needed
Appearance        | Formally defined                        | Free-form
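The flexibility row can be illustrated with a small Python sketch. It assumes a hypothetical customers table in SQLite and, for the schema-free side, a list of JSON documents of the kind a NoSQL document store might hold (strictly speaking, JSON is semi-structured, so this only approximates truly unstructured content).

```python
import json
import sqlite3

# Structured side: the schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, zip TEXT)")
conn.execute("INSERT INTO customers (name, zip) VALUES (?, ?)", ("Ana Cruz", "1000"))

# A record with a field the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO customers (name, nickname) VALUES (?, ?)",
        ("Ben Reyes", "Benjo"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schema-free side: each document may carry its own set of fields.
documents = [
    {"name": "Ana Cruz", "zip": "1000"},
    {"name": "Ben Reyes", "nickname": "Benjo", "tags": ["vip"]},
]
stored = [json.dumps(doc) for doc in documents]

print(schema_rejected)  # True
print(len(stored))      # 2
```

Accepting the nickname field on the structured side would require changing the table definition first, which is the kind of schema change the scalability row describes as difficult.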
STORAGE TECHNIQUES
Structured data storage technique:
Block storage:
This type of data storage is used in storage-area network (SAN) environments. In such
environments, data is stored in volumes, which are also referred to as blocks.
An arbitrary identifier is assigned to every block. It allows the block to be stored and
retrieved, but there is no metadata providing further context.
Virtual machine file system volumes and structured database storage are typical use cases
of block storage.
With block storage, raw storage volumes are created on the device. A server-based system
then connects to the volumes and treats each one as an individual hard drive.
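The block idea can be sketched as a toy Python class. BlockVolume, BLOCK_SIZE, and the sample payload are hypothetical names and values chosen for illustration; a real SAN volume works at the device level and is far more involved.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small, for illustration only


class BlockVolume:
    """A raw volume: fixed-size blocks addressed only by a numeric identifier."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write(self, block_id, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad short data with zero bytes so every block has the same size.
        self.blocks[block_id] = data.ljust(BLOCK_SIZE, b"\x00")

    def read(self, block_id):
        return self.blocks[block_id]


volume = BlockVolume(num_blocks=8)
payload = b"hello world"

# The volume keeps no metadata, so the layer above it (a file system or
# a database) must remember which blocks hold which piece of data.
block_ids = []
for offset in range(0, len(payload), BLOCK_SIZE):
    block_id = offset // BLOCK_SIZE
    volume.write(block_id, payload[offset:offset + BLOCK_SIZE])
    block_ids.append(block_id)

restored = b"".join(volume.read(b) for b in block_ids)[:len(payload)]
print(block_ids)  # [0, 1, 2]
print(restored)   # b'hello world'
```

Nothing in the volume itself says what the bytes mean or even how long the original payload was; that context lives entirely in the software using the blocks.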
Unstructured data storage technique:
Object storage:
Like other techniques, object storage is a way of storing, organizing, and accessing data
on disk. The difference is that it does so in a more scalable and cost-effective manner.
This kind of storage system makes it possible to retain huge volumes of unstructured data.
When it comes to storing photos on Facebook, songs on Spotify, or files in collaboration
services such as Dropbox, object storage comes into play.
Each object incorporates data, a lot of metadata, and a unique identifier. This kind of
storage can be implemented at different levels, such as the device level, system level, and
interface level.
Since objects are robust, this kind of storage works well for long-term storage of data
archives, analytics data, and service provider storage with SLAs linked to data delivery.
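The three parts of an object (data, metadata, unique identifier) can be sketched as follows. ObjectStore and its put, get, and find methods are hypothetical names for this illustration and do not correspond to the API of any real object storage service; the byte strings are placeholders, not real media files.

```python
import uuid


class ObjectStore:
    """A flat store of objects, each holding data, metadata, and a unique ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # unique identifier for the object
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, **criteria):
        """Return the IDs of objects whose metadata matches all criteria."""
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj["metadata"].get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
photo_id = store.put(b"placeholder image bytes", content_type="image/png", owner="ana")
song_id = store.put(b"placeholder audio bytes", content_type="audio/mpeg", owner="ben")

print(store.get(photo_id)["metadata"])
print(store.find(content_type="audio/mpeg") == [song_id])  # True
```

Unlike a bare block, each object carries its own descriptive metadata, so it can be located by what it is rather than by where it sits.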
Here’s how you can store and manage structured data using some of the different tools:
ORACLE RDBMS
• Oracle Database is one of the most widely used object-relational database management
systems. Oracle Corporation produces and markets it.
• Oracle is quite secure. It does not occupy a huge amount of space. It is good at supporting
large databases. It also reduces the CPU time needed to process data.
MICROSOFT SQL SERVER
• Microsoft SQL Server is a relational database management system. As the name indicates, it
was created by Microsoft.
• As a database server, it is a software product whose primary function is to store and
retrieve data requested by other software applications. These applications may run on the
same computer or on another computer across a network, including the Internet.
MYSQL
Customer surveys are not enough for sentiment analysis, and businesses need to go
beyond them to work out new ways to study customer behavior. Unstructured data can be of
immense help in this regard.
However, you need to bear in mind that unstructured data is fundamentally different and
does not fit into traditional tools like relational databases. Searching it with existing
algorithms is not a viable exercise.
If it were easy to process, it would effectively be structured data, and deriving
actionable intelligence from it would be just as easy. But that is not the case.
However, there are some tools that you can use to store and manage unstructured data:
HADOOP
• Hadoop is an open-source software framework that provides both distributed storage and
distributed processing. Considering the size and complexity of unstructured data, such a
system is quite important for unstructured data analysis.
• It is basically a distributed file system that makes use of an object-based architecture.
File metadata is stored in metadata servers, whereas file data is stored in object storage
servers. The file system client software interacts with these distinct servers and has them
present a full file system to users and applications.
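The division of labor between metadata servers, object storage servers, and the client can be sketched as a toy model in Python. The classes MetadataServer, StorageServer, and Client, the chunk size, and the round-robin placement are all assumptions made for illustration; this is not Hadoop's API and it leaves out replication, networking, and fault tolerance.

```python
class MetadataServer:
    """Keeps file metadata only: which chunks make up a file and where they are."""

    def __init__(self):
        self.files = {}  # path -> list of (server_index, chunk_key)


class StorageServer:
    """Keeps file data only, as chunks addressed by a key."""

    def __init__(self):
        self.chunks = {}


class Client:
    """Talks to both kinds of server and presents whole files to the caller."""

    def __init__(self, metadata, servers, chunk_size=4):
        self.metadata = metadata
        self.servers = servers
        self.chunk_size = chunk_size

    def write(self, path, data):
        layout = []
        for n, start in enumerate(range(0, len(data), self.chunk_size)):
            server_index = n % len(self.servers)  # simple round-robin placement
            key = f"{path}#{n}"
            self.servers[server_index].chunks[key] = data[start:start + self.chunk_size]
            layout.append((server_index, key))
        self.metadata.files[path] = layout

    def read(self, path):
        layout = self.metadata.files[path]
        return b"".join(self.servers[i].chunks[key] for i, key in layout)


servers = [StorageServer(), StorageServer()]
client = Client(MetadataServer(), servers)
client.write("/logs/a.txt", b"unstructured data")

print(client.read("/logs/a.txt"))        # b'unstructured data'
print([len(s.chunks) for s in servers])  # [3, 2]
```

The caller sees one file, while the bytes are actually spread over several servers and reassembled using the layout held by the metadata server.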
APPLICATION:
1. Explain what structured and unstructured data are.
2. Discuss how structured data and unstructured data differ from each other.
3. Give at least 3 examples of unstructured data.
4. Elaborate on at least 1 storage and management tool for structured data.
5. What are the storage techniques for structured and unstructured data?
REFERENCES:
Pickell, P. (2018). Structured vs Unstructured Data – What's the Difference?
https://learn.g2.com/structured-vs-unstructured-data
https://www.altexsoft.com/blog/structured-unstructured-data/
https://resources.m-files.com/blog/what-is-structured-data-vs-unstructured-data-3
https://www.xplenty.com/blog/structured-vs-unstructured-data-key-differences/