Unit 4 FIoT Notes
HARDWARE IMPLEMENTATION: The system that implements the Internet of Things includes clusters of
hardware components that we are familiar with. Firstly, we need a host such as a personal computer or a mobile
phone that can be used to pass commands to a remotely operable device. As the brain of the system, we use a
Raspberry Pi to control a device and obtain the desired result from it. The "things" that we use
here are everyday objects such as a bulb, a fan, or a washing machine. Our intention is to show the
operation of the Internet of Things in a concise way. Since the Raspberry Pi is essentially a compact computer, it
cannot control "things" directly.
It needs an interface to communicate with them. Fortunately, the Raspberry Pi comes with a 40-pin GPIO header
that can be utilized efficiently to communicate with the "things".
SOFTWARE IMPLEMENTATION: Hardware without proper software is nothing but a brick. When it
comes to the Raspberry Pi, an OS must be installed to control and configure it, and Python scripts are to be coded to
work with the "things". A communications platform for IoT devices, which enables device setup and user
interaction from mobile devices and the web, can be used to accomplish communication between the host device
and the Raspberry Pi.
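For instance, a minimal Python sketch (a hedged illustration assuming the RPi.GPIO library and a device such as an LED or relay wired to BCM pin 18; the pin number is not prescribed by these notes) could switch a "thing" on and off:

    # Minimal sketch: toggling a "thing" (e.g., an LED or relay) from a
    # Python script on the Raspberry Pi. Assumes the RPi.GPIO library
    # and a device wired to BCM pin 18 (illustrative choice).
    import time
    import RPi.GPIO as GPIO

    GPIO.setmode(GPIO.BCM)        # use Broadcom pin numbering
    GPIO.setup(18, GPIO.OUT)      # configure pin 18 as an output

    try:
        GPIO.output(18, GPIO.HIGH)   # switch the device on
        time.sleep(5)                # keep it on for five seconds
        GPIO.output(18, GPIO.LOW)    # switch it off again
    finally:
        GPIO.cleanup()               # release the GPIO pins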
Introduction to SDN
Software-defined networking (SDN) is an architecture designed to make a network more flexible and easier
to manage. SDN centralizes management by abstracting the control plane from the data forwarding function in
the discrete networking devices.
SDN elements
An SDN architecture delivers a centralized, programmable network and consists of the following:
A controller, the core element of an SDN architecture, that enables centralized management and control,
automation, and policy enforcement across physical and virtual network environments
Southbound APIs that relay information between the controller and the individual network devices (such as
switches, access points, routers, and firewalls)
Northbound APIs that relay information between the controller and the applications and policy engines, to
which an SDN looks like a single logical network device
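As an illustration of the northbound side, an application might push a policy to the controller over a REST API. This is a hedged sketch: the controller URL and JSON schema below are hypothetical, as real controllers (e.g., OpenDaylight, ONOS) each define their own northbound interfaces.

    # Illustrative northbound API call: an application hands the
    # controller a high-level policy; the controller translates it into
    # device-level rules via its southbound APIs.
    import requests

    policy = {
        "match": {"dst_port": 80},   # traffic to match (hypothetical schema)
        "action": "allow",           # what the network should do with it
        "priority": 100,
    }

    resp = requests.post("http://controller.example:8181/policies",
                         json=policy, timeout=5)
    resp.raise_for_status()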
A network device operates across three planes:
1. Data plane: All activities that involve end-user data packets belong to this plane, for example the
forwarding of packets from an ingress to an egress port.
2. Control plane: All activities necessary to perform data plane activities but that do not involve end-user data
packets belong to this plane. In other words, this is the brain of the network. The activities of the control plane
include running routing protocols and building the routing and forwarding tables.
3. Management plane: The management plane is used for access and management of our network
devices. For example, accessing our device through Telnet, SSH, or the console port.
When discussing SDN, the control and data planes are the most important to keep in mind.
[Figure: control plane vs. data plane]
SDN Architecture:
SDN attempts to create a network architecture that is simple, inexpensive, scalable, agile, and easy to manage. The
figure below shows the SDN architecture and SDN layers, in which the control plane and data plane are decoupled
and the network controller is centralized. The SDN controller maintains a unified view of the network and makes
configuration, management, and provisioning simpler. The underlying infrastructure in SDN uses simple packet-
forwarding hardware.
Network devices become simple with SDN, as they no longer need to implement a large number of protocols.
Network devices receive instructions from the SDN controller on how to forward packets. These devices can be
simpler and cheaper, as they can be built from standard hardware and software components.
SDN architecture separates the network into three distinguishable layers: applications communicate with the
control layer using the northbound API, and the control layer communicates with the data plane using southbound
APIs. The control layer is considered the brain of SDN. The intelligence of this layer is provided by the centralized
SDN controller software. This controller resides on a server and manages policies and the flow of traffic throughout
the network. The physical switches in the network constitute the infrastructure layer.
[Figure: SDN architecture diagram]
More about OpenFlow Architecture
In an OpenFlow-enabled network, a flow can be represented as a transmission control protocol (TCP) connection.
Flows can also be defined as packets with a matching MAC address or IP address.
The OpenFlow switch has one or more flow tables. A flow table is a set of flow entries. A flow entry is used to
match and process packets. It consists of matching fields to match packets, a set of counters to track
packets, and instructions to apply.
The OpenFlow switch uses an OpenFlow channel to communicate with the OpenFlow controller. The OpenFlow
channel is a secure channel between the OpenFlow switch and the OpenFlow controller. It permits
communication by allowing the control plane to send instructions, receive requests, or exchange information.
Messages are typically encrypted using transport layer security (TLS).
The OpenFlow channel carries three types of messages. Controller/switch messages are initiated by the controller
and may or may not require a response from the switch. Asynchronous messages inform the controller about a packet
arrival, a switch state change, or an error. Symmetric messages can be sent in either direction for other
purposes.
The OpenFlow controller handles the flow tables inside the switch by adding and removing flow entries. It uses the
OpenFlow channel to send and receive information [9]. It can be considered an operating system that serves
the whole network. The OpenFlow protocol is the southbound interface that permits communication between
the OpenFlow controller and the OpenFlow switch via the OpenFlow channel.
The OpenFlow switch may be programmed to:
1. Identify and categorize packets arriving on an ingress port based on various packet header fields;
2. Process the packets in various ways, including modifying the header; and
3. Drop the packets or push them to a particular egress port or to the OpenFlow controller.
The OpenFlow instructions transmitted from an OpenFlow Controller to an OpenFlow switch are structured as
“flows”. Each individual flow contains packet match fields, flow priority, various counters, packet processing
instructions, flow timeouts and a cookie.
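As a hedged illustration of such a flow, the sketch below uses Ryu, an open-source Python OpenFlow controller, to install one flow entry when a packet-in event arrives. The match fields, priority, output port, and timeout are illustrative choices, not values prescribed by these notes.

    # Minimal Ryu app: on packet-in, install a flow entry that forwards
    # TCP port-80 traffic out of port 1 (all values illustrative).
    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class SimpleFlowInstaller(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
        def packet_in_handler(self, ev):
            dp = ev.msg.datapath
            ofp = dp.ofproto
            parser = dp.ofproto_parser
            # Match fields: IPv4 TCP traffic destined for port 80
            match = parser.OFPMatch(eth_type=0x0800, ip_proto=6, tcp_dst=80)
            # Instructions: forward matching packets out of port 1
            actions = [parser.OFPActionOutput(1)]
            inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS,
                                                 actions)]
            # Flow entry = match fields + priority + instructions + timeout
            mod = parser.OFPFlowMod(datapath=dp, priority=100, match=match,
                                    instructions=inst, idle_timeout=30)
            dp.send_msg(mod)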
How SDN is Different from Conventional or Traditional Architecture
In conventional architecture, the control plane and data plane are coupled. The control plane is the part of the
network that carries the signalling and routing message traffic, while the data plane is the part of the network that
carries the payload traffic.
Difference between SDN and Traditional Network:
1. Software Defined Networking uses an open interface, whereas a traditional network uses a closed interface.
2. In SDN, the data plane and control plane are decoupled by software; in a traditional network, the data plane
and control plane are mounted on the same device.
3. SDN can prioritize and block specific network packets; a traditional network forwards all packets in the same
way, with no prioritization support.
4. SDN is easy to program as per need; a traditional network is difficult to reprogram and to replace the existing
program as per use.
5. The cost of SDN is low; the cost of a traditional network is high.
6. The maintenance cost of SDN is lower than that of a traditional network.
Key Elements of SDN
1. Centralized Network Controller:
With decoupled control and data planes and a centralized network controller, network
administrators can rapidly configure the network. SDN applications can be deployed through
programmable open APIs. This speeds up innovation, as network administrators no longer
need to wait for device vendors to embed new features in their proprietary hardware.
Each flow entry contains match fields, counters, and a set of instructions to apply to
matching packets.
Data Handling in the Internet of Things (IoT)
According to Techopedia, IoT "describes a future where everyday physical objects will be
connected to the internet and will be able to identify themselves to other devices."
For example, let us consider sensor devices: sensors are embedded into various devices and
machines and deployed in the field. These sensors transmit the sensed data to remote servers
via the Internet. Continuous data acquisition from mobile equipment, transportation facilities,
public facilities, and home appliances now produces huge volumes of data, and the challenge
is how to handle all the data received from the various devices and how to store it.
Types of Data
Structured data
Data that can be easily organized.
Usually stored in relational databases.
Structured Query Language (SQL) manages structured data in databases, as the short sketch below illustrates.
It accounts for only about 20% of the total data available today in the world.
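As a small illustration (not from the notes above), Python's built-in sqlite3 module shows how SQL manages structured data in a relational table; the table and rows are made up:

    # Structured data fits a fixed schema, so it can live in a
    # relational table and be queried with SQL.
    import sqlite3

    conn = sqlite3.connect(":memory:")   # throwaway in-memory database
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [("temp-01", 21.4), ("temp-02", 19.8)])
    # SQL query over the structured rows
    for row in conn.execute("SELECT sensor, value FROM readings "
                            "WHERE value > 20"):
        print(row)
    conn.close()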
Unstructured data
Information that does not possess any pre-defined model.
Traditional RDBMSs are unable to process unstructured data.
Processing it enhances the ability to draw better insight from huge datasets.
It accounts for about 80% of the total data available today in the world.
Characteristics of Big Data
Volume:
*Refers to the quantity of data that is generated; sources of data are added continuously.
Examples of volume:
1) 30TB of images will be generated every night by the Large Synoptic Survey
Telescope (LSST)
2) 72 hours of video are uploaded to YouTube every minute
Velocity:
*Refers to the speed of generation of data
*Data processing time is decreasing day by day in order to provide real-time services
*Older batch-processing technology is unable to handle the high velocity of data
Examples of velocity:
1) 140 million tweets per day on average (according to a survey conducted in 2011)
2) The New York Stock Exchange captures 1TB of trade information during each trading
session
Variety:
* Refers to the category to which the data belongs
Variability:
*Refers to data whose meaning is constantly changing.
Example:
Language processing, Hashtags, Geo-spatial data,
Multimedia, Sensor events
Veracity:
*Veracity refers to the biases, noise and abnormality in data.
*It is important in programs that involve automated decision-making, or feeding the
data into an unsupervised machine learning algorithm.
*Veracity isn’t just about data quality, it’s about data understandability
Visualization:
*Presentation of data in a pictorial or graphical format.
*Enables decision makers to see analytics presented visually and to identify new patterns.
Value:
*It means extracting useful business information from scattered data.
*Includes a large volume and variety of data
*Easy to access and delivers quality analytics that enables informed decisions
Flow of data
Data flows from generation, through acquisition and storage, to analysis. [Figure: flow of data from generation to analysis]
Enterprise data
Online trading and analysis data.
Production and inventory data.
Sales and other financial data.
IoT data
Data from industry, agriculture, traffic, and transportation.
Medical-care data.
Data from public departments and families.
Bio-medical data
Masses of data generated by gene sequencing.
Data Acquisition:
Data collection
Log files or record files are automatically generated by data sources to record activities for
further analysis. Collected data includes sensory data from devices, such as sound wave, voice,
vibration, automobile, chemical, current, weather, pressure, and temperature readings, as well
as complex and varied data collected through mobile devices, e.g. geographical location, 2D
barcodes, pictures, and videos.
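As a hedged sketch of collection and transmission, the snippet below samples a simulated temperature sensor and posts each reading to a remote server. The read_temperature() helper and the server URL are hypothetical, introduced only for illustration.

    # Sample a (simulated) sensor and transmit readings to a remote
    # collection server; URL and helper are illustrative only.
    import json
    import time
    import random
    import urllib.request

    def read_temperature():
        # Hypothetical sensor read; a real deployment would sample
        # hardware (e.g., over GPIO/I2C) instead of a random value.
        return round(20 + random.random() * 5, 2)

    for _ in range(3):                       # a few samples, for the demo
        reading = {"sensor": "temp-01", "value": read_temperature(),
                   "ts": time.time()}
        req = urllib.request.Request(
            "http://server.example/ingest",  # illustrative URL
            data=json.dumps(reading).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)
        time.sleep(60)                       # sample once per minute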
Data transmission
1. After collection, the data is transferred to a storage system for further processing
and analysis.
2. Data transmission can be categorized as inter-DCN transmission and intra-DCN
transmission.
Data pre-processing
1. Collected datasets suffer from noise, redundancy, inconsistency, etc.; thus,
pre-processing of the data is necessary.
2. Pre-processing of relational data mainly involves integration, cleaning, and redundancy
mitigation.
3. Integration is combining data from various sources and providing users with a uniform
view of the data.
4. Cleaning is identifying inaccurate, incomplete, or unreasonable data, and then
modifying or deleting such data.
5. Redundancy mitigation is eliminating data repetition through detection, filtering, and
compression of the data, to avoid unnecessary transmission.
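A minimal sketch of these three pre-processing steps using the pandas library; the data, column names, and value ranges are made up for illustration:

    import pandas as pd

    # Integration: combine readings from two (illustrative) sources into
    # one uniform view. In practice these might come from pd.read_csv().
    a = pd.DataFrame({"sensor": ["t1", "t2"], "temperature": [21.4, None]})
    b = pd.DataFrame({"sensor": ["t2", "t2"], "temperature": [19.8, 19.8]})
    merged = pd.concat([a, b], ignore_index=True)

    # Cleaning: drop rows with missing values and discard unreasonable
    # readings (here, temperatures outside a plausible range).
    cleaned = merged.dropna()
    cleaned = cleaned[cleaned["temperature"].between(-40, 60)]

    # Redundancy mitigation: remove duplicate records so the same
    # reading is not stored or transmitted twice.
    deduped = cleaned.drop_duplicates()
    print(deduped)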
Data Storage: Data can be stored in file systems or
databases.
File systems
1. Distributed file systems store massive data and ensure consistency, availability,
and fault tolerance of the data.
2. GFS (Google File System) is a notable example of a distributed file system that supports
large-scale storage, though its performance is limited in the case of small files.
3. Hadoop Distributed File System (HDFS) and Kosmos (KFS) are other notable file systems,
derived from the open design of GFS.
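As a hedged sketch, the third-party Python hdfs package can write to and read from HDFS over its WebHDFS interface; the namenode address, user, and file paths below are illustrative assumptions:

    from hdfs import InsecureClient

    # Connect to the namenode's WebHDFS endpoint (address illustrative;
    # the default WebHDFS port differs across Hadoop versions).
    client = InsecureClient("http://namenode.example:9870", user="hadoop")

    # Write a small file; behind the scenes HDFS splits large files
    # into replicated blocks spread across the datanodes.
    client.write("/data/readings.csv",
                 data="sensor,value\ntemp-01,21.4\n", overwrite=True)

    # Read it back.
    with client.read("/data/readings.csv") as reader:
        print(reader.read().decode())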
Databases
1. Non-relational databases (NoSQL) emerged in order to deal with the characteristics
that big data possesses, i.e. unstructured data.
2. NoSQL uses three main types of databases:
1. Key-value databases
2. Column-oriented databases
3. Document-oriented databases
NoSQL does not use the table model; instead, data can be stored in a single document
file.
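As an illustration of a document-oriented database, the sketch below uses pymongo, the Python MongoDB client; the connection URI, database name, and documents are made up:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")   # illustrative URI
    db = client["iot"]

    # No fixed table schema: each reading is a self-contained document,
    # and documents in one collection may differ in structure.
    db.readings.insert_one({"sensor": "temp-01", "value": 21.4,
                            "tags": ["lab", "indoor"]})
    db.readings.insert_one({"sensor": "cam-02", "frame_id": 118})

    # Query by field, much like a key-value lookup.
    print(db.readings.find_one({"sensor": "temp-01"}))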
Data Handling Using Hadoop
Hadoop is an open-source software framework for storing data and running applications on
clusters of commodity hardware. It provides massive storage for any kind of data, enormous
processing power and the ability to handle virtually limitless concurrent tasks or jobs.
HDFS follows a master/slave architecture: the master (NameNode) manages the file system
namespace, while the slave (DataNode):
Serves read and write requests from the file system's clients.
Performs block creation, deletion, and replication as instructed by the NameNode.
YARN:
1. Yet Another Resource Negotiator: as the name implies, YARN helps to manage the
resources across the clusters. In short, it performs scheduling and resource
allocation for the Hadoop system.
2. It consists of three major components, i.e.
• Resource Manager
• Node Manager
• Application Manager
The Resource Manager has the privilege of allocating resources for the applications in
the system, whereas Node Managers work on the allocation of resources such as
CPU, memory, and bandwidth per machine, and later acknowledge the Resource
Manager. The Application Manager works as an interface between the Resource
Manager and the Node Managers and performs negotiations as per the requirements of
the two.
MapReduce:
•By making use of distributed and parallel algorithms, MapReduce makes it possible to
carry the processing logic across the cluster and helps developers write applications that
transform big data sets into manageable ones.
•MapReduce makes use of two functions,
i.e. Map() and Reduce(), whose tasks are:
1. Map() performs sorting and filtering of the data, thereby organizing it in the
form of groups. Map() generates a key-value-pair-based result, which is later processed by
the Reduce() method.
2. Reduce(), as the name suggests, does the summarization by aggregating the
mapped data. In simple terms, Reduce() takes the output generated by Map() as input and
combines those tuples into a smaller set of tuples.
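To make the two-phase logic concrete, here is a word-count sketch in plain Python that mimics the Map(), shuffle, and Reduce() steps; this illustrates the idea only and is not Hadoop's own API:

    from functools import reduce
    from itertools import groupby

    docs = ["big data big insight", "data to insight"]

    # Map: emit a (key, value) pair for every word in every document.
    mapped = [(word, 1) for doc in docs for word in doc.split()]

    # Shuffle: group the pairs by key so each reduce sees one word's values.
    mapped.sort(key=lambda kv: kv[0])
    grouped = {k: [v for _, v in g]
               for k, g in groupby(mapped, key=lambda kv: kv[0])}

    # Reduce: aggregate each word's values into a smaller set of tuples.
    counts = {word: reduce(lambda a, b: a + b, vals)
              for word, vals in grouped.items()}
    print(counts)   # {'big': 2, 'data': 2, 'insight': 2, 'to': 1}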
Data Analytics
What is Data Analytics
“Data analytics (DA) is the process of examining data sets in
order to draw conclusions about the information they contain, increasingly with the
aid of specialized systems and software. Data analytics technologies and techniques
are widely used in commercial industries to enable organizations to make more
informed business decisions and by scientists and researchers to verify or disprove
scientific models, theories and hypotheses.”
1. Allows for the identification of important (and often mission- critical) trends
2. Helps businesses identify performance problems that require some sort of action
3. Can be viewed in a visual manner, which leads to faster and better decisions
4. Better awareness regarding the habits of potential customers
5. It can provide a company with an edge over its competitors
Statistical concepts involved in data analysis include the following (a short sketch of several of them appears after the list):
2. Analysis of variables
3. Data dispersion
4. Analysis of relationships between variables
5. Contingence and correlation
6. Regression analysis
7. Statistical significance
8. Precision
9. Error limits
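As referenced above the list, here is a short sketch of several of these concepts using Python's standard statistics module; the sample data is made up, and correlation and linear_regression require Python 3.10+:

    import statistics

    x = [2.0, 4.0, 6.0, 8.0, 10.0]    # e.g., sensor readings
    y = [1.9, 4.2, 6.1, 7.8, 10.3]    # a second, related variable

    # Analysis of variables / data dispersion.
    print("mean:", statistics.mean(x))
    print("stdev:", statistics.stdev(x))   # dispersion around the mean

    # Relationship between variables: correlation and simple regression.
    print("correlation:", statistics.correlation(x, y))
    slope, intercept = statistics.linear_regression(x, y)
    print("regression: y =", slope, "* x +", intercept)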