
BIG DATA

ABSTRACT

A huge repository of terabytes of data is generated each day from
modern information systems and digital technologies such as the
Internet of Things and cloud computing. Analyzing these massive data
requires substantial effort at multiple levels to extract knowledge
for decision making, which makes big data analysis an active area of
research and development. The basic objective of this paper is to
explore the challenges of big data and the various tools associated
with it. To that end, this article provides a platform to explore big
data at its numerous stages.
BIG DATA

Big data is a collection of data from
many different sources. The term refers
to data that is so large, fast, or complex
that it is difficult or impossible to
process using traditional methods. The
practice of accessing and storing large
amounts of information for analytics,
however, has been around for a long time.
TYPES OF BIG DATA

1. Structured Data
2. Unstructured Data
3. Semi-Structured Data
STRUCTURED DATA

Any data that can be stored, accessed, and
processed in a fixed format is termed
'structured' data. Over time, computer
science has achieved great success in
developing techniques for working with
this kind of data (where the format is
known in advance) and in deriving value
from it. Nowadays, however, we are seeing
issues as the size of such data grows to a
huge extent, with typical sizes in the
range of multiple zettabytes.
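
For illustration, a small hypothetical relational table (the names and
figures are made up): every record follows the same fixed schema of
columns and types, which is what makes the data structured.

EmployeeID | Name | Department | Salary
-----------+------+------------+-------
      1001 | Asha | Sales      |  52000
      1002 | Ravi | Finance    |  61000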
UNSTRUCTURED DATA

Any data whose form or structure is unknown
is classified as unstructured data. In addition
to its sheer size, unstructured data poses
multiple challenges when it comes to
processing it to derive value. A typical
example of unstructured data is a
heterogeneous data source containing a
combination of simple text files, images,
videos, etc. Nowadays organizations have a
wealth of data available to them but,
unfortunately, they often don't know how to
derive value from it, since the data is in its
raw, unstructured form.
SEMI-STRUCTURED DATA

Semi-structured data can contain
both of the other forms of data.
We can see semi-structured data
as structured in form, even though
its schema is not actually defined.
A typical example of semi-structured
data is data represented in an
XML file, as illustrated below.
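
For illustration, a minimal hypothetical XML fragment (tag and field
names are made up): each record describes itself through its tags, yet
no fixed schema forces every record to carry the same fields.

<employees>
  <employee id="1001">
    <name>Asha</name>
    <email>asha@example.com</email>
  </employee>
  <employee id="1002">
    <name>Ravi</name>
    <!-- this record has no email field -->
  </employee>
</employees>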
CHARACTERISTICS OF BIG DATA

• VOLUME
• VARIETY
• VELOCITY
• VERACITY
• VALUE

• Volume: the size and amount of big data that
companies manage and analyze.
• Variety: the diversity and range of different data
types, including unstructured data, semi-structured
data, and raw data.
• Velocity: the speed at which companies receive,
store, and manage data, e.g., the number of
social media posts or search queries received within
a day, an hour, or another unit of time.
• Veracity: the "truth" or accuracy of data and
information assets, which often determines
executive-level confidence.
• Value: the most important "V" from the
perspective of the business; the value of big
data usually comes from insight discovery
and pattern recognition that lead to more
effective operations, stronger customer
relationships, and other clear and
quantifiable business benefits.
TOOLS FOR BIG DATA

• Apache Hadoop and MapReduce
• Apache Mahout
• Apache Spark
• Dryad
• Storm
• Apache Drill
• Jaspersoft
• Splunk
APACHE HADOOP

Apache Hadoop is a collection of open-source
software utilities that facilitates using a network of
many computers to solve problems involving
massive amounts of data and computation. It
provides a software framework for distributed storage
and processing of big data using the MapReduce
programming model.
Hadoop is used to efficiently store and process
large datasets ranging in size from gigabytes to
petabytes. Instead of using one large computer to
store and process the data, Hadoop allows
clustering multiple computers to analyze massive
datasets in parallel, and therefore more quickly.
Hadoop consists of four main modules:
• Hadoop Distributed File System (HDFS): a
distributed file system that runs on standard or
low-end hardware. HDFS provides better data
throughput than traditional file systems, in
addition to high fault tolerance and native
support for large datasets.
• Yet Another Resource Negotiator (YARN):
manages and monitors cluster nodes and
resource usage, and schedules jobs and tasks.
• MapReduce: a framework that helps
programs perform parallel computation on data.
The map task takes input data and converts it
into a dataset of key-value pairs; the output of
the map tasks is consumed by reduce tasks,
which aggregate it and produce the desired
result (see the sketch after this list).
• Hadoop Common: provides common Java
libraries that can be used across all modules.
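
To make the map and reduce steps concrete, here is a minimal
word-count sketch against the standard Hadoop MapReduce Java API, the
customary introductory example: the mapper emits (word, 1) pairs and
the reducer sums the counts per word. The class name and the
input/output paths (taken from the command line) are illustrative; in
practice the class is packaged as a JAR and submitted to the cluster.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}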
THANK YOU!
ANY QUERIES?
