
Data Analytics Iot Unit5 Modified

Uploaded by bjananika17
Copyright © All Rights Reserved

Data and Analytics for IoT

Module 4
 As more and more devices are added to IoT networks, the data generated by these systems becomes overwhelming.
 Traditional data management systems are simply unprepared for the demands of what has come to be known as “big data.”
 The real value of IoT is not just in connecting things but
rather in the data produced by those things, the new services you
can enable via those connected things, and the business
insights that the data can reveal.
 However, to be useful, the data needs to be handled in a
way that is organized and controlled.
 Thus, a new approach to data analytics is needed for IoT.
An Introduction to Data Analytics for IoT
 In the world of IoT, sensors routinely create massive amounts of data, and handling that data is one of the biggest challenges, not only from a transport perspective but also from a data management standpoint.

 Modern jet engines are fitted with thousands of sensors that generate a whopping 10 GB of data per second.
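To put that rate in perspective, here is a quick back-of-the-envelope calculation, assuming decimal units (1 TB = 1,000 GB) and an engine producing data at the quoted rate for one full hour of operation:

```latex
10\,\frac{\text{GB}}{\text{s}} \times 3600\,\frac{\text{s}}{\text{h}} = 36\,000\,\frac{\text{GB}}{\text{h}} = 36\,\frac{\text{TB}}{\text{h}}
```

That is, a single engine at this rate produces roughly 36 TB of sensor data per hour.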

 Analyzing this amount of data in the most efficient manner possible falls under the umbrella of data analytics.
 Not all data is the same; it can be categorized and thus
analyzed in different ways.

 Depending on how data is categorized, various data analytics tools and processing methods can be applied.

 Two important categorizations from an IoT perspective are whether the data is structured or unstructured and whether it is in motion or at rest.
Structured Versus Unstructured Data
 Structured data and unstructured data are important classifications because they typically require different toolsets from a data analytics perspective.
 Structured data means that the data follows a model or
schema that defines how the data is represented or organized,
meaning it fits well with a traditional relational database
management system (RDBMS).
 In many cases you will find structured data in a simple tabular form, such as a spreadsheet in which each data item occupies a specific cell and can be explicitly defined and referenced.
 Structured data can be found in most computing systems and includes everything from banking transactions and invoices to computer log files and router configurations.

 IoT sensor data often uses structured values, such as temperature, pressure, humidity, and so on, which are all sent in a known format.

 Structured data is easily formatted, stored, queried, and processed.


 Because of the highly organizational format of structured
data, a wide array of data analytics tools are readily
available for processing this type of data.

 These tools range from custom scripts to commercial software such as Microsoft Excel and Tableau.
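Because structured sensor readings follow a known schema, they drop straight into a relational table and can be queried with ordinary SQL. The sketch below uses Python's built-in `sqlite3` module as a stand-in RDBMS; the table name, column names, and values are hypothetical and only illustrate the idea.

```python
import sqlite3

# Hypothetical schema: every reading arrives with the same known fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (sensor_id TEXT, ts INTEGER, temperature_c REAL, humidity_pct REAL)"
)

# Illustrative values only, not real measurements.
rows = [
    ("s1", 0, 21.5, 40.0),
    ("s1", 60, 22.0, 41.0),
    ("s2", 0, 19.0, 55.0),
    ("s2", 60, 19.5, 54.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)

# Because the schema is known, a standard SQL query can aggregate the data directly.
avg_by_sensor = conn.execute(
    "SELECT sensor_id, AVG(temperature_c) FROM readings "
    "GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
print(avg_by_sensor)  # [('s1', 21.75), ('s2', 19.25)]
```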
 Unstructured data lacks a logical schema for understanding and decoding the data through traditional programming means.

 Examples of this data type include text, speech, images, and video.

 As a general rule, any data that does not fit neatly into a predefined data model is classified as unstructured data.
 According to some estimates, around 80% of a business’s
data is unstructured.
 Because of this fact, data analytics methods that can be
applied to unstructured data, such as cognitive
computing and machine learning, are deservedly garnering
a lot of attention.
 With machine learning applications, such as natural
language processing (NLP), you can decode
speech.
 With image/facial recognition applications, you can extract critical information from still images and video.
 Smart objects in IoT networks generate both structured and unstructured data.

 Structured data is more easily managed and processed due to its well-defined organization.

 On the other hand, unstructured data can be harder to deal with and typically requires very different analytics tools for processing.
Data in Motion Versus Data at Rest
Data in IoT networks is either in transit (“data in motion”)
or being held or stored (“data at rest”).

 Examples of data in motion include traditional client/server exchanges, such as web browsing, file transfers, and email.

 Data saved to a hard drive, storage array, or USB drive is data at rest.
 From an IoT perspective, the data from smart objects is
considered data in motion as it passes through the network en
route to its final destination.
 This data is often processed at the edge, using fog computing.
 When data is processed at the edge, it may be filtered and deleted
or forwarded on for further processing and possible storage at a
fog node or in the data center.
 Data does not come to rest at the edge.
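The filter-or-forward behavior of an edge node can be sketched as a generator that inspects each reading as it passes through, drops readings in a normal range, and forwards only the unusual ones. The temperature range and function name below are hypothetical; a real fog node would apply whatever rules its application defines.

```python
from typing import Iterable, Iterator

# Hypothetical normal operating range for a temperature sensor (degrees C).
LOW_C, HIGH_C = 10.0, 80.0

def edge_filter(readings: Iterable[float]) -> Iterator[float]:
    """Yield only out-of-range readings; in-range readings are dropped at the edge."""
    for value in readings:
        if value < LOW_C or value > HIGH_C:
            yield value  # forwarded to a fog node or the data center

stream = [25.0, 30.2, 85.1, 29.9, 5.0]  # illustrative values
forwarded = list(edge_filter(stream))
print(forwarded)  # [85.1, 5.0]
```

Because the generator handles one reading at a time, nothing is stored at the edge: each value is either discarded or passed on while still in motion.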
 When data arrives at the data center, it is possible to process it in real time, just as at the edge, while it is still in motion.
 Tools with this sort of capability include Spark, Storm, and Flink.
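Stream processors such as those named above commonly compute results over time windows while data is still arriving. The plain-Python sketch below shows the idea of a tumbling (fixed-size, non-overlapping) window average; it does not use or imitate the actual Spark, Storm, or Flink APIs, and it assumes events arrive in timestamp order.

```python
from typing import Iterable, Iterator, Optional, Tuple

def tumbling_window_avg(
    events: Iterable[Tuple[int, float]], window_s: int
) -> Iterator[Tuple[int, float]]:
    """Average (timestamp, value) events over fixed, non-overlapping windows.

    Emits (window_start, average) as soon as a window closes, so results
    are produced while the data is still in motion.
    """
    current_start: Optional[int] = None
    total, count = 0.0, 0
    for ts, value in events:
        start = ts - ts % window_s  # start of the window this event falls into
        if current_start is not None and start != current_start:
            yield current_start, total / count  # previous window has closed
            total, count = 0.0, 0
        current_start = start
        total += value
        count += 1
    if current_start is not None:
        yield current_start, total / count  # flush the last (possibly partial) window

events = [(0, 20.0), (3, 22.0), (5, 30.0), (9, 32.0), (10, 40.0)]
print(list(tumbling_window_avg(events, window_s=5)))
# [(0, 21.0), (5, 31.0), (10, 40.0)]
```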
 Data at rest in IoT networks is typically found in IoT brokers or in some sort of storage array at the data center.

 Hadoop helps not only with data processing but also with data storage.
IoT Data Analytics Overview
 The true importance of IoT data from smart objects
is realized only when the analysis of the data leads to
actionable business intelligence and insights.

 Data analysis is typically broken down by the types of results that are produced.
Types of Data Analysis Results
Four types of data analysis results:
 Descriptive:
 Descriptive data analysis tells you what is happening,
either now or in the past.
 For example, a thermometer in a truck engine
reports temperature values every second.
 From a descriptive analysis perspective, you can pull this data at
any moment to gain insight into the current operating
condition of the truck engine.
 If the temperature value is too high, then there may
be a cooling problem or the engine may be experiencing
too much load.
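A descriptive analysis of the truck-engine example can be as simple as summarizing the recent readings and checking the current one against a limit. The readings and the 100 °C threshold below are invented for illustration; they are not real engine specifications.

```python
import statistics

# Illustrative per-second engine temperatures in degrees C (invented, not real data).
readings = [88.0, 89.5, 90.0, 91.0, 104.0]
MAX_SAFE_C = 100.0  # hypothetical "too high" threshold

# Descriptive analysis: summarize what is happening now and what happened recently.
summary = {
    "current": readings[-1],
    "mean": statistics.mean(readings),
    "max": max(readings),
    "too_hot": readings[-1] > MAX_SAFE_C,
}
print(summary)
# {'current': 104.0, 'mean': 92.5, 'max': 104.0, 'too_hot': True}
```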
 Diagnostic:
 When you are interested in the “why,” diagnostic data analysis can provide the answer.
 Continuing with the example of the temperature sensor in the
truck engine, you might wonder why the truck engine
failed.
 Diagnostic analysis might show that the temperature
of the engine was too high, and the engine
overheated.
 Applying diagnostic analysis across the data generated by a wide range of smart objects can provide a clear picture of why a problem or an event occurred.
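One simple diagnostic step for the failed engine is to look back through the logged temperatures before the failure time and find when the engine first exceeded a safe limit, which supports (or rules out) the "engine overheated" explanation. The threshold, function name, and log values are hypothetical, and the log is assumed to be sorted by time.

```python
from typing import List, Optional, Tuple

MAX_SAFE_C = 100.0  # hypothetical overheating threshold

def first_overheat_before(
    log: List[Tuple[int, float]], failure_ts: int
) -> Optional[int]:
    """Return the first timestamp before the failure at which the engine exceeded MAX_SAFE_C."""
    for ts, temp in log:
        if ts >= failure_ts:
            break  # only readings taken before the failure are relevant
        if temp > MAX_SAFE_C:
            return ts
    return None  # no overheating found; look for another cause

# Illustrative (timestamp_s, temperature_c) log; the engine failed at t = 50.
log = [(0, 90.0), (10, 95.0), (20, 101.5), (30, 108.0), (40, 115.0), (50, 60.0)]
print(first_overheat_before(log, failure_ts=50))  # 20
```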
 Predictive:
 Predictive analysis aims to foretell problems or issues before they occur.
 For example, with historical temperature values for the truck engine, predictive analysis could provide an estimate of the remaining life of certain components in the engine.
 These components could then be proactively replaced before
failure occurs.
 Or perhaps if temperature values of the truck engine start to
rise slowly over time, this could indicate the need for an oil
change or some other sort of engine cooling
maintenance.
 Prescriptive:
 Prescriptive analysis goes a step beyond predictive analysis and recommends solutions for upcoming problems.
 A prescriptive analysis of the temperature data from a truck engine might calculate various alternatives to cost-effectively maintain the truck.
 These calculations could range from the cost of more frequent oil changes and cooling maintenance to installing new cooling equipment on the engine or upgrading to a lease on a model with a more powerful engine.
 Prescriptive analysis looks at a variety of factors and makes the appropriate recommendation.
 Both predictive and prescriptive analyses are more resource-intensive and increase complexity, but the value they provide is much greater than the value from descriptive and diagnostic analysis.
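A prescriptive step for the truck example can be sketched as comparing the expected cost of each maintenance alternative over a planning horizon and recommending the cheapest. The cost model (upfront cost plus yearly cost plus risk-weighted failure cost) and every number below are hypothetical assumptions chosen only to show the mechanics.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    upfront_cost: float   # one-time cost
    annual_cost: float    # recurring cost per year
    failure_risk: float   # estimated probability of engine failure per year

FAILURE_COST = 20_000.0  # hypothetical cost of one engine failure
YEARS = 3                # hypothetical planning horizon

def expected_cost(opt: Option) -> float:
    """Upfront cost plus yearly costs and risk-weighted failure cost over the horizon."""
    return opt.upfront_cost + YEARS * (opt.annual_cost + opt.failure_risk * FAILURE_COST)

options = [
    Option("more frequent oil changes and cooling maintenance", 0.0, 1_200.0, 0.10),
    Option("install new cooling equipment", 4_000.0, 300.0, 0.03),
    Option("lease a model with a more powerful engine", 0.0, 6_000.0, 0.01),
]
best = min(options, key=expected_cost)
print(best.name)  # install new cooling equipment
```

With these invented inputs the expected costs are 9,600, 6,700, and 18,600, so the new cooling equipment is recommended; different inputs would change the recommendation.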
IoT Data Analytics Challenges
Problems with using an RDBMS for IoT data:

1. Scaling problems (performance issues that are costly to resolve and require more hardware and architecture changes)

2. Volatility of data (changes in the data schema)
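The schema-volatility problem can be illustrated with Python's built-in `sqlite3` module standing in for an RDBMS: when a sensor (hypothetically, after a firmware update) starts sending a new field, the fixed table rejects it until the schema itself is altered. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table designed around what the original (hypothetical) sensor firmware reports.
conn.execute("CREATE TABLE readings (ts INTEGER, temperature_c REAL)")
conn.execute("INSERT INTO readings VALUES (0, 21.5)")

# After a firmware update, the same sensor also reports humidity.
new_reading = {"ts": 60, "temperature_c": 22.0, "humidity_pct": 41.0}
insert_sql = (
    "INSERT INTO readings (ts, temperature_c, humidity_pct) "
    "VALUES (:ts, :temperature_c, :humidity_pct)"
)

rejected = False
try:
    conn.execute(insert_sql, new_reading)
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)  # the fixed schema has no humidity_pct column

# The schema itself has to change before the new field can be stored.
conn.execute("ALTER TABLE readings ADD COLUMN humidity_pct REAL")
conn.execute(insert_sql, new_reading)
rows = conn.execute("SELECT * FROM readings ORDER BY ts").fetchall()
print(rows)  # [(0, 21.5, None), (60, 22.0, 41.0)]
```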


Machine Learning
ML is central to IoT.
 Data collected by smart objects needs to be analyzed, and
intelligent actions need to be taken based on these
analyses.
 Performing this kind of operation manually is almost impossible
(or very, very slow and inefficient).

 Machines are needed to process information fast and react instantly when thresholds are met.
 Examples include advances in self-driving vehicles, abnormal pattern recognition in a crowd, and automated intelligent and machine-assisted decision systems.
Machine Learning Overview
 Machine learning is, in fact, part of a larger set of technologies
commonly grouped under the term artificial intelligence
(AI).

 AI includes any technology that allows a computing system to mimic human intelligence using any technique, from very advanced logic to basic “if-then-else” decision loops.

 Any computer that uses rules to make decisions belongs to this group.
 A simple example is an app that can help you
find your parked car.
 By reading your GPS position at regular intervals, the app calculates your speed.
 A basic threshold system determines whether you are driving (for example, “if speed > 20 mph or 30 km/h, then start calculating speed”).
 When you park and disconnect from the car
Bluetooth system, the app simply records the location
when the disconnection happens.
 This is where your car is parked.
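The parked-car app is purely rule-based, so it can be sketched without any learning at all: one rule marks you as driving when speed exceeds the 20 mph threshold, and another records the last known position when Bluetooth disconnects. To keep the sketch short it assumes positions already projected onto a flat plane in meters (real GPS fixes are latitude/longitude); the class and method names are hypothetical.

```python
import math
from typing import Optional, Tuple

Fix = Tuple[float, float, float]  # (time_s, x_m, y_m): simplified planar position fix

DRIVING_SPEED_MS = 20 * 0.44704  # the "20 mph" rule, converted to meters per second

def speed_ms(a: Fix, b: Fix) -> float:
    """Speed between two consecutive position fixes."""
    dt = b[0] - a[0]
    if dt <= 0:
        return 0.0  # ignore duplicate or out-of-order fixes
    return math.hypot(b[1] - a[1], b[2] - a[2]) / dt

class ParkedCarApp:
    """Static if-then rules only; nothing is learned from data."""

    def __init__(self) -> None:
        self.last_fix: Optional[Fix] = None
        self.driving = False
        self.parked_at: Optional[Tuple[float, float]] = None

    def on_gps_fix(self, fix: Fix) -> None:
        # Rule 1: if speed exceeds the threshold, the user is driving.
        if self.last_fix is not None and speed_ms(self.last_fix, fix) > DRIVING_SPEED_MS:
            self.driving = True
        self.last_fix = fix

    def on_bluetooth_disconnect(self) -> None:
        # Rule 2: when the car's Bluetooth disconnects after driving, record the location.
        if self.driving and self.last_fix is not None:
            self.parked_at = (self.last_fix[1], self.last_fix[2])
            self.driving = False

app = ParkedCarApp()
for fix in [(0, 0, 0), (10, 150, 0), (20, 300, 0), (30, 300, 5)]:
    app.on_gps_fix(fix)
app.on_bluetooth_disconnect()
print(app.parked_at)  # (300, 5)
```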
 In more complex cases, static rules cannot simply be inserted into the program because they require parameters that can change or that are imperfectly understood.
 A typical example is a dictation program that runs on a computer.
 The program is configured to recognize the audio pattern of each word in a dictionary, but it does not know the specifics of your voice: your accent, tone, speed, and so on.
 You need to record a set of predetermined sentences to help the tool match well-known words to the sounds you make when you say the words.
 This process is called machine learning.
 ML is concerned with any process where the
computer needs to receive a set of data that is
processed to help perform a task with more
efficiency.
 ML is a vast field but can be simply divided into two main categories: supervised and unsupervised learning.
Supervised Learning
 In supervised learning, the machine is trained with input for
which there is a known correct answer.
 For example, suppose that you are training a system to recognize
when there is a human in a mine tunnel.
 A sensor equipped with a basic camera can capture shapes and return them to a computing system that is responsible for determining whether the shape is a human or something else (such as a vehicle, a pile of ore, a rock, or a piece of wood).
 With supervised learning techniques, hundreds or thousands
of images are fed into the machine, and each image is
labelled (human or nonhuman in this case).
 This is called the training set.
 An algorithm is used to determine common parameters
and common differences between the images.
 The comparison is usually done at the scale of the entire
image, or pixel by pixel.
 Images are resized to have the same characteristics
(resolution, color depth, position of the central figure, and
so on), and each point is analyzed.
 Each new image is compared to the set of known “good images,” and a deviation is calculated to determine how different the new image is from the average human image and, therefore, the probability that what is shown is a human figure. This process is called classification.

 After training, the machine should be able to recognize human shapes. Before real field deployments, the machine is usually tested with pictures it did not see during training, whose labels are withheld from the machine and used only to score its answers; depending on the ML system used, this is called the validation set or the test set. The test verifies that the recognition level is at acceptable thresholds. If the machine does not reach the expected level of success, more training is needed.
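The "compare each new image to the average image of each class" procedure can be sketched as a nearest-centroid classifier, one simple way to implement it (it returns the closest label rather than a probability). The sketch assumes images are already resized and flattened into equal-length lists of pixel intensities; the toy 2x2 images and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Image = Sequence[float]  # resized, flattened pixel intensities

def centroid(images: List[Image]) -> List[float]:
    """Pixel-by-pixel average of a set of same-sized images."""
    return [sum(px) / len(images) for px in zip(*images)]

def train(training_set: Dict[str, List[Image]]) -> Dict[str, List[float]]:
    """Compute one average image per label."""
    return {label: centroid(imgs) for label, imgs in training_set.items()}

def deviation(a: Image, b: Image) -> float:
    """Euclidean distance between two images, compared pixel by pixel."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(model: Dict[str, List[float]], image: Image) -> str:
    """Return the label whose average image deviates least from the new image."""
    return min(model, key=lambda label: deviation(model[label], image))

# Labeled training set of toy 2x2 images (4 pixels each); values are invented.
training_set = {
    "human": [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1]],
    "nonhuman": [[0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.1, 0.9]],
}
model = train(training_set)

# Held-out test set: labels are hidden from the model and used only for scoring.
test_set = [([0.85, 0.15, 0.75, 0.25], "human"), ([0.15, 0.85, 0.2, 0.7], "nonhuman")]
correct = sum(classify(model, img) == label for img, label in test_set)
accuracy = correct / len(test_set)
print(accuracy)  # 1.0
```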
 In other cases, the learning process is not about classifying in two
or more categories but about finding a correct value.
 For example, the speed of the flow of oil in a pipe is a
function of the size of the pipe, the viscosity of the oil, pressure, and a
few other factors.
 When you train the machine with measured values, the machine can predict the speed of the flow for a new, unmeasured viscosity.
 This process is called regression; regression predicts numeric values, whereas classification predicts categories.
Unsupervised Learning
In some cases, supervised learning is not the best method for a
machine to help with a human decision.
 Suppose that you are processing IoT data from a factory manufacturing small engines.

 You know that, on average, about 0.1% of the engines produced need adjustments to prevent later defects, and your task is to identify them before they get mounted into machines and shipped away from the factory.

 With hundreds of parts, it may be very difficult to detect the potential defects, and it is almost impossible to train a machine to recognize issues that may not be visible.
 However, you can test each engine and record multiple
parameters, such as sound, pressure, temperature of key
parts, and so on.
 Once data is recorded, you can graph these elements in relation to one another (for example, temperature as a function of pressure, or sound versus rotating speed over time).
 You can then input this data into a computer and use
mathematical functions to find groups.
 For example, you may decide to group the engines by the
sound they make at a given temperature.
 A standard function to operate this grouping, K-means clustering,
finds the mean values for a group of engines (for example,
mean value for temperature, mean frequency for sound).
 Grouping the engines this way can quickly reveal several types of
engines that all belong to the same category (for example, small
engine of chainsaw type, medium engine of lawnmower type).
 All engines of the same type produce sounds and temperatures in
the same range as the other members of the same group.
 There will occasionally be an engine in the group that
displays unusual characteristics (slightly out of
expected temperature or sound range).
 This is the engine that you send for manual evaluation.
 The computing process associated with this determination is
called unsupervised learning.
 This type of learning is unsupervised because there is not a
“good” or “bad” answer known in advance.
 It is the variation from a group behavior that allows the computer to learn that something is different.
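The engine-grouping process above can be sketched with a plain k-means implementation: cluster engines by (temperature, sound level), then flag any engine that sits unusually far from its group's mean for manual evaluation. Several choices here are assumptions made for illustration: k = 2 groups, initial centroids picked by hand (real systems typically use a seeding method such as k-means++), both features treated as being on comparable scales, and a hypothetical distance tolerance. The measurements are invented.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (temperature_c, sound_db) measured for one engine

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two engines' measurements."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mean(points: List[Point]) -> Point:
    """Mean temperature and mean sound level of a group of engines."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Alternate assigning points to the nearest centroid and moving each
    centroid to the mean of its assigned points, until nothing changes."""
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = [
            mean([p for p, lab in zip(points, labels) if lab == k]) if k in labels else centroids[k]
            for k in range(len(centroids))
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

engines = [(60.0, 100.0), (61.0, 101.0), (59.0, 99.0), (60.0, 101.0), (68.0, 100.0),
           (80.0, 90.0), (81.0, 91.0), (79.0, 89.0), (80.0, 89.0)]
labels, centroids = kmeans(engines, centroids=[engines[0], engines[5]])

TOLERANCE = 4.0  # hypothetical: how far from its group's mean an engine may sit
flagged = [e for e, lab in zip(engines, labels) if dist(e, centroids[lab]) > TOLERANCE]
print(flagged)  # [(68.0, 100.0)]
```

No engine is labeled "good" or "bad" in advance; the flagged engine stands out only because it deviates from the behavior of its own group.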
