Unit 1 BDA
In addition to these three original Vs, three others are often mentioned
in relation to harnessing the power of big data: veracity, variability, and
value.
Veracity:
Big data can be messy, noisy, and error-prone, which makes it
difficult to control the quality and accuracy of the data. Large
datasets can be unwieldy and confusing, while smaller datasets could
present an incomplete picture. The higher the veracity of the data,
the more trustworthy it is.
Variability:
The meaning of collected data is constantly changing, which can lead
to inconsistency over time. These shifts include not only changes in
context and interpretation but also changes in data collection methods,
which depend on the information that companies want to capture and analyze.
Value:
It’s essential to determine the business value of the data you collect.
Big data must contain the right data and then be effectively analyzed
in order to yield insights that can help drive decision-making.
Sources of Big Data
These data come from many sources.
[Figure: sources of big data, including unstructured data]
Types of Big Data
Not all data can be stored in the same way. The methods for data storage
can be accurately evaluated only after the type of data has been identified.
1.Structured data
Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository, typically a
database. It covers all data that can be stored in a database table with
rows and columns. Such data have relational keys and can easily be mapped
into pre-designed fields. Today, structured data is the most commonly
processed kind of data and the simplest to manage. Example: Relational data.
2.Semi-Structured data
Semi-structured data is information that does not reside in a
relational database but has some organizational properties that make
it easier to analyze. With some processing, it can be stored in a
relational database (this can be very hard for some kinds of
semi-structured data), but the semi-structured form exists to save space.
Example: XML data.
3.Unstructured data
Unstructured data is data that is not organized in a predefined
manner or does not have a predefined data model, so it is not a good
fit for a mainstream relational database. Alternative platforms are
therefore used for storing and managing unstructured data. It is
increasingly prevalent in IT systems and is used by organizations in a
variety of business intelligence and analytics applications. Example:
Word, PDF, Text, Media logs.
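The difference between the three types can be made concrete with a small
sketch. It is only an illustration under assumptions: the "customers" table,
the XML snippet, the review text, and the keyword list are all made up for
this example, and only the Python standard library is used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Chennai"), (2, "Ravi", "Mumbai")],
)
structured_rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Chennai",)
).fetchall()
conn.close()

# Semi-structured: tags give some organization, but records need not
# share the same fields (the second customer has no <city>).
xml_text = """
<customers>
  <customer id="1"><name>Asha</name><city>Chennai</city></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(xml_text)
semi_structured = [
    (c.get("id"), c.findtext("name"), c.findtext("city"))
    for c in root.findall("customer")
]

# Unstructured: free text with no data model; structure must be inferred,
# here with a naive keyword scan (hypothetical keywords).
review = "Delivery was late but the bed cover quality is great."
unstructured_hits = [w for w in ("late", "great", "refund") if w in review.lower()]
```

The structured query can rely on the schema, the XML records have to be
checked for missing fields, and the free text yields information only after
some form of analysis.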
In today’s world, there is a lot of data. Big companies use this data
for their business growth. By analyzing it, useful decisions can be
made in various cases, as discussed below:
1.Tracking Customer Spending Habit, Shopping Behavior:
In big retail stores (like Amazon, Walmart, Big Bazar, etc.), the
management team has to keep data on customers’ spending habits (which
products customers spend on, which brands they prefer, how frequently
they spend), shopping behavior, and most-liked products (so that those
products can be kept in the store). Based on which products are searched
for or sold the most, the production or collection rate of those
products is fixed.
The banking sector uses data on its customers’ spending behavior so
that it can offer a particular customer a discount or cashback for buying
a product they like with the bank’s credit or debit card. In this way,
banks can send the right offer to the right person at the right time.
2.Recommendation:
By tracking customers’ spending habits and shopping behavior, big retail
stores provide recommendations to customers. E-commerce sites like
Amazon, Walmart, and Flipkart do product recommendation. They track what
products a customer searches for and, based on that data, recommend that
type of product to the customer.
As an example, suppose a customer searched for bed covers on
Amazon. Amazon then has data that the customer may be interested in buying
a bed cover. The next time that customer goes to a Google page,
advertisements for various bed covers will be shown. Thus, an
advertisement for the right product can be sent to the right customer.
YouTube also shows recommended videos based on the types of videos a
user has previously liked and watched. Relevant advertisements are shown
during a video based on its content. As an example, suppose someone is
watching a tutorial video on big data; an advertisement for some other
big data course will then be shown during that video.
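The idea of recommending "that type of product" from search data can be
sketched in a few lines. This is a deliberately simple illustration, not how
any of the named sites actually works: the catalog, the category names, and
the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical catalog: product name -> category.
CATALOG = {
    "floral bed cover": "bed cover",
    "cotton bed cover": "bed cover",
    "pillow set": "pillow",
    "desk lamp": "lamp",
}

def recommend(search_history, already_seen, top_n=2):
    """Recommend unseen products from the category searched most often."""
    counts = Counter(search_history)
    if not counts:
        return []
    top_category, _ = counts.most_common(1)[0]
    picks = [
        product
        for product, category in CATALOG.items()
        if category == top_category and product not in already_seen
    ]
    return picks[:top_n]

# The customer searched for bed covers twice and has already viewed one.
suggestions = recommend(
    ["bed cover", "lamp", "bed cover"], already_seen={"floral bed cover"}
)
```

Production recommenders combine many more signals (purchases, ratings,
similar users), but the principle of turning tracked behavior into a ranked
suggestion is the same.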
3.Smart Traffic System:
Data about traffic conditions on different roads is collected
through cameras placed beside the roads and at the entry and exit points
of the city, and through GPS devices placed in vehicles (Ola, Uber cabs,
etc.). All such data is analyzed, and jam-free or less congested routes
that take less time are recommended. In this way, a smart traffic system
can be built in the city through big data analysis. One more benefit is
that fuel consumption can be reduced.
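Recommending a less congested route amounts to a shortest-path search over
current travel times. The sketch below uses Dijkstra's algorithm as one
possible technique; the notes do not say which algorithm real systems use,
and the road graph and travel times are invented for illustration.

```python
import heapq

def fastest_route(travel_minutes, start, goal):
    """Dijkstra's algorithm over current travel times (minutes) between junctions."""
    queue = [(0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this junction at least as quickly
        best[node] = cost
        for neighbor, minutes in travel_minutes.get(node, {}).items():
            heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return None  # goal is unreachable

# Hypothetical road graph; the direct A -> B road is jammed (30 minutes).
roads = {
    "A": {"B": 30, "C": 5},
    "B": {"D": 5},
    "C": {"B": 4, "D": 20},
    "D": {},
}
result = fastest_route(roads, "A", "D")
```

Here the route through C avoids the jammed road, so the search returns the
detour A, C, B, D with a total of 14 minutes instead of 35.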
4.Secure Air Traffic System:
Sensors are present at various places on an aircraft (such as the
propeller). These sensors capture data like flight speed, moisture,
temperature, and other environmental conditions. Based on analysis of
this data, the environmental parameters within the aircraft are set and
adjusted.
By analyzing an aircraft’s machine-generated data, it can be estimated
how long a machine can operate flawlessly and when it should be
replaced or repaired.
5.Auto Driving Car:
Big data analysis helps drive a car without human intervention.
Cameras and sensors placed at various spots on the car gather data such as
the size of surrounding cars, obstacles, and the distance from them. This
data is analyzed, and then various calculations are carried out, such as
what angle to turn, what the speed should be, and when to stop. These
calculations help the car take action automatically.
6.Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools (like Siri on
Apple devices, Cortana on Windows, and Google Assistant on Android) answer
the various questions users ask. The tool tracks the user’s location,
local time, season, and other data related to the question asked. By
analyzing all such data, it provides an answer.
As an example, suppose a user asks “Do I need to take an
umbrella?” The tool collects data such as the user’s location and the
season and weather conditions at that location, analyzes this data to
conclude whether there is a chance of rain, and then provides the answer.
7.IoT:
Manufacturing companies install IoT sensors in machines to collect
operational data. By analyzing such data, it can be predicted how long a
machine will work without any problem and when it will require repair, so
that the company can take action before the machine develops many issues
or breaks down completely. Thus, the cost of replacing the whole machine
can be saved.
In the healthcare field, big data is making a significant contribution.
Using big data tools, data regarding patient experience is collected and
used by doctors to give better treatment. IoT devices can sense symptoms
of a probable oncoming disease in the human body so that it can be
prevented with advance treatment. IoT sensors placed near patients and
newborn babies constantly keep track of various health conditions such as
heart rate and blood pressure. Whenever any parameter crosses the safe
limit, an alarm is sent to a doctor so that they can take steps remotely
very quickly.
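The "parameter crosses the safe limit" rule can be sketched as a simple
threshold check on each incoming reading. The parameter names and ranges
below are hypothetical placeholders chosen for the example, not clinical
guidance.

```python
# Hypothetical safe ranges; real limits depend on the patient and the device.
SAFE_LIMITS = {
    "heart_rate_bpm": (50, 120),
    "systolic_bp_mmhg": (90, 140),
}

def check_reading(reading):
    """Return an alarm message for every parameter outside its safe range."""
    alarms = []
    for name, value in reading.items():
        low, high = SAFE_LIMITS[name]
        if not (low <= value <= high):
            alarms.append(f"{name}={value} outside safe range {low}-{high}")
    return alarms

# Two consecutive sensor readings; the second has an elevated heart rate.
stream = [
    {"heart_rate_bpm": 78, "systolic_bp_mmhg": 118},
    {"heart_rate_bpm": 134, "systolic_bp_mmhg": 121},
]
alarms_per_reading = [check_reading(r) for r in stream]
```

In a deployed system, each non-empty alarm list would trigger a notification
to the doctor rather than simply being collected in a list.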
8.Education Sector:
Organizations that conduct online educational courses use big data to
find candidates interested in a course. If someone searches for a
YouTube tutorial video on a subject, online or offline course providers
for that subject send online ads about their courses to that person.
9.Energy Sector:
Smart electric meters read the power consumed every 15 minutes and
send the readings to a server, where the data is analyzed to estimate
the time of day when the power load is lowest throughout the city. Using
this system, manufacturing units and householders are advised to run
their heavy machines at times when the power load is low, such as at
night, to get a lower electricity bill.
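Estimating the low-load time from 15-minute meter readings is a small
aggregation problem: sum the readings per hour and pick the hour with the
smallest total. The function name and the sample readings below are
assumptions made for this sketch.

```python
from collections import defaultdict

def lowest_load_hour(readings):
    """readings: iterable of (hour_of_day, minute, kwh) from 15-minute meter reads."""
    hourly = defaultdict(float)
    for hour, _minute, kwh in readings:
        hourly[hour] += kwh
    return min(hourly, key=hourly.get)

# Hypothetical city-wide totals for three hours, four 15-minute reads each.
readings = [
    (2, 0, 1.0), (2, 15, 1.1), (2, 30, 0.9), (2, 45, 1.0),
    (13, 0, 3.2), (13, 15, 3.0), (13, 30, 3.1), (13, 45, 3.3),
    (20, 0, 4.0), (20, 15, 4.2), (20, 30, 4.1), (20, 45, 3.9),
]
quiet_hour = lowest_load_hour(readings)
```

With this sample data the 2 a.m. hour has the lowest total, which matches the
advice in the text to run heavy machines at night.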
10. Media and Entertainment Sector:
Media and entertainment service providers like Netflix, Amazon Prime,
and Spotify analyze data collected from their users. Data such as what
types of video and music users watch and listen to most and how long
users spend on the site is collected and analyzed to set the next
business strategy.
BIG DATA TECHNOLOGIES
Big data technologies can be categorized into four main types: data
storage, data mining, data analytics, and data visualization [2]. Each of
these is associated with certain tools, and you’ll want to choose the right
tool for your business needs depending on the type of big data technology
required.
1.Data storage
Big data technology that deals with data storage has the capability to fetch,
store, and manage big data. It is made up of infrastructure that allows users
to store the data so that it is convenient to access. Most data storage
platforms are compatible with other programs. Two commonly used tools
are Apache Hadoop and MongoDB.
Apache Hadoop: Hadoop is one of the most widely used big data tools.
It is an open-source software platform that stores and processes big
data in a distributed computing environment across hardware clusters.
This distribution allows for faster data processing. The framework is
designed to be fault-tolerant, be scalable, and process all data
formats.
MongoDB: MongoDB is a NoSQL database that can be used to store
large volumes of data. It stores data as documents made up of
field-value pairs (a basic unit of data) and groups documents into
collections. It is written mainly in C++ and is one of the most popular
big data databases because it can manage and store unstructured data
with ease.
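The document-and-collection model can be pictured with plain Python
dictionaries. The sketch below is only an illustration of the data model: it
does not use or imitate the MongoDB driver API, and the `database` dict, the
"products" collection, and the `find` helper are hypothetical.

```python
# Plain-Python illustration of the data model only (not the MongoDB API).
database = {
    "products": [  # a collection is a group of documents
        {"_id": 1, "name": "bed cover", "price": 499, "tags": ["home", "bedroom"]},
        {"_id": 2, "name": "desk lamp", "price": 899},  # fields may differ per document
    ]
}

def find(collection, **criteria):
    """Return documents whose fields equal all the given criteria."""
    return [
        doc
        for doc in database[collection]
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

matches = find("products", name="desk lamp")
untagged = [doc["_id"] for doc in database["products"] if "tags" not in doc]
```

Because documents in one collection do not have to share the same fields,
this model accommodates data that would not fit a fixed table schema.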
2.Data mining
Data mining extracts the useful patterns and trends from the raw data. Big
data technologies such as Rapidminer and Presto can turn unstructured
and structured data into usable information.
Rapidminer: Rapidminer is a data mining tool that can be used to
build predictive models. Its two strengths are processing and
preparing data, and building machine learning and deep learning
models. The end-to-end model allows both functions to drive impact
across the organization [3].
Presto: Presto is an open-source query engine that was originally
developed by Facebook to run analytic queries against their large
datasets. Now, it is available widely. One query on Presto can
combine data from multiple sources within an organization and
perform analytics on them in a matter of minutes.
3.Data analytics
In big data analytics, technologies are used to clean and transform data
into information that can be used to drive business decisions. This next
step (after data mining) is where users run algorithms, models, and
predictive analytics using tools such as Apache Spark and Splunk.
Apache Spark: Spark is a popular big data tool for data analysis
because it is fast and efficient at running applications. It is often
faster than Hadoop MapReduce because it keeps intermediate data in
random access memory (RAM) instead of writing it to disk between
batch processing steps. Spark supports a wide variety of data
analytics tasks and queries.
Splunk: Splunk is another popular big data analytics tool for deriving
insights from large datasets. It has the ability to generate graphs,
charts, reports, and dashboards. Splunk also enables users to
incorporate artificial intelligence (AI) into data outcomes.
4.Data visualization
Finally, big data technologies can be used to create stunning visualizations
from the data. In data-oriented roles, data visualization is a skill that is
beneficial for presenting recommendations on business profitability and
operations to stakeholders, telling an impactful story with a simple graph.
Tableau: Tableau is a very popular tool in data visualization because
its drag-and-drop interface makes it easy to create pie charts,
bar charts, box plots, Gantt charts, and more. It is a secure platform
that allows users to share visualizations and dashboards in real time.
Looker: Looker is a business intelligence (BI) tool used to make
sense of big data analytics and then share those insights with other
teams. Charts, graphs, and dashboards can be configured with a
query, such as monitoring weekly brand engagement through social
media analytics.
Big data vs. cloud computing:
01. Big data refers to data that is huge in size and is also increasing
rapidly with respect to time. Cloud computing refers to the on-demand
availability of computing resources over the internet.