Analysis and Processing of Massive Data Based on Hadoop Platform / A Perusal of Big Data Classification and Hadoop Technology
The document discusses big data classification and Hadoop technology. It covers topics such as Hadoop components, limitations of Hadoop, and using a Naive Bayes classifier for sentiment analysis on big data. It also discusses improving the performance of storing massive numbers of small files in Hadoop.
1. Analysis and Processing of Massive Data Based on Hadoop Platform

Big Data is used to describe data sets so huge that they are arduous to analyze in conventional ways. Massive data generally means large amounts of unstructured and semi-structured data, and loading such data into a relational database for analysis costs too much time and money. It is therefore urgent for enterprises to change their traditional architecture and to work out how to analyze these data and make full use of their value. In massive data processing, mining potential value and transformation capability from mass data efficiently and quickly provides a basis for decision making and becomes the core competitiveness of an enterprise, but with ever faster data generation and ever larger data volumes, data processing technology faces more and more challenges.

Based on an analysis of the key technical foundations and of existing research on distributed storage and computation with Hadoop cluster technology, together with business needs and the actual hardware and software capabilities, the paper takes the Hadoop cloud platform as its research platform and proposes a preprocessing model for large volumes of Web log data. It studies the massive data processing performance of the Apriori algorithm based on distributed data mining, effectively improving the cloud platform and contributing to the development of big data processing technology. Massive data analysis and cloud computing are often linked together, because real-time analysis of large data sets relies on the MapReduce framework to assign tens, hundreds or even thousands of jobs to computers. Technologies suited to mass data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases and cloud computing platforms (the paper appeared at the 2018 4th World Conference on Control, Electronics and Computer Engineering, WCCECE 2018).

2. A Perusal of Big Data Classification and Hadoop Technology

Infrastructure: big data storage is concerned with storing and managing data in a scalable way. Distributed file systems: the Hadoop File System (HDFS) can store huge amounts of unstructured data reliably on commodity hardware. NoSQL databases: the most important family of large-scale storage technologies is NoSQL database management systems, alongside NewSQL. Analytics: big data analysis is about making "sense" of huge volumes of multifarious data that in raw form lack a data model to define what every element means in the context of the others. Visualization: a recent class of visualization techniques processes such huge-scale data before handing it to the rendering routines. Infrastructure is the foundation of a Big Data architecture, and possessing the proper tools for storing, processing and analyzing data is vital in any Big Data project: Hadoop, an open-source framework for processing, storing and analyzing data, along with MapReduce, YARN, Spark, NoSQL, massively parallel processing and the cloud.

Security and privacy in big data: as the enormous amount of data being collected continues to grow rapidly, more and more companies are building big data repositories to gather, aggregate and extract meaning from their data. Anonymization is the procedure of altering and masking personal data in such a way that individuals cannot be re-identified. Hadoop accomplishes two tasks, massive data storage and distributed processing. It is a low-cost alternative to conventional data storage options, uses commodity hardware to store huge quantities of data reliably, and protects data and application processing against hardware failure.
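Both summaries above come down to the same mechanics: data lives in HDFS and the processing is expressed as MapReduce jobs. Below is a minimal Hadoop Streaming sketch in Python, not code from either paper: it assumes whitespace-separated web-log lines in roughly the common log format (URL in the seventh field) and simply counts requests per URL, the kind of counting pass that distributed mining such as Apriori builds on. The file name, field index and jar path are assumptions to adjust.

```python
#!/usr/bin/env python3
"""logcount.py -- hypothetical Hadoop Streaming job: count requests per URL.

Submit with something like (jar path and version vary by installation):
  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /logs/raw -output /logs/counts \
      -mapper "logcount.py map" -reducer "logcount.py reduce" -file logcount.py
"""
import sys


def mapper():
    # Emit "url<TAB>1" per request; assumes the URL is the 7th whitespace-separated field.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 6:
            print(f"{fields[6]}\t1")


def reducer():
    # Streaming hands the reducer mapper output sorted by key, so equal URLs are adjacent.
    current_url, count = None, 0
    for line in sys.stdin:
        url, _, value = line.rstrip("\n").partition("\t")
        if url != current_url:
            if current_url is not None:
                print(f"{current_url}\t{count}")
            current_url, count = url, 0
        count += int(value or 0)
    if current_url is not None:
        print(f"{current_url}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1:2] == ["map"] else reducer()
```

The same script can be tested locally with a shell pipeline (cat access.log | ./logcount.py map | sort | ./logcount.py reduce), which mimics the shuffle-and-sort step Hadoop performs between the two phases.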
The Data Cleaning of Hadoop: DataCleaner can help with data quality, data ingestion, and the standardizing and monitoring of data, and it can leverage the computing power of a Hadoop cluster to overcome infrastructure and performance hurdles.

The Components of Hadoop: Hadoop Distributed File System (HDFS), Hadoop YARN, Hadoop MapReduce, Pig, Hive, HBase, Cassandra, HCatalog, Lucene, Hama, Crunch, Avro, Thrift, Drill, Mahout, Ambari, ZooKeeper, Oozie, Sqoop, Flume, Chukwa.

Limitations of Hadoop: Hadoop is an impressive platform for processing massive volumes of data with remarkable speed on low-cost commodity hardware, but it does have some momentous limitations.

3. A Method to Improve the Performance for Storing Massive Small Files in Hadoop

Hadoop is a popular distributed framework whose core is a high-performance distributed computing platform, and HDFS was originally designed to store oversized files. The proposed method consists of three processes: merging the small files, establishing the mapping index, and prefetching. When small files need to be uploaded to HDFS, the client first creates a temporary file in memory and merges the small files into it; when the system later accesses a small file, the client first queries HBase by the small-file name and then locates the data through the merged file name. The merge algorithm runs as small files are uploaded: it computes the size of the current temporary merge file, and if that size exceeds 64 MB, or the file has been held in memory longer than a threshold T1, the merged file is uploaded to HDFS and its index is written to HBase. The algorithm is executed periodically.
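To make the merge-and-index scheme concrete, here is a small local simulation, a sketch under stated assumptions rather than the paper's implementation: it does not call the real HDFS or HBase APIs. Incoming small files are buffered in memory, the buffer is written out as one merged file once it exceeds 64 MB or has waited longer than T1 seconds, and a dictionary mapping each small-file name to (merged file, offset, length) stands in for the HBase index.

```python
import time
from pathlib import Path

BLOCK_LIMIT = 64 * 1024 * 1024   # 64 MB threshold from the paper
T1 = 30.0                        # hypothetical timeout in seconds


class SmallFileMerger:
    """Local stand-in for the merge/index/read path (no real HDFS or HBase calls)."""

    def __init__(self, out_dir="merged"):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(exist_ok=True)
        self.buffer = []            # (name, bytes) pairs waiting to be merged
        self.buffer_size = 0
        self.buffer_started = None  # arrival time of the oldest buffered file
        self.index = {}             # name -> (merged_path, offset, length); plays the HBase role
        self.seq = 0

    def put(self, name, data):
        if self.buffer_started is None:
            self.buffer_started = time.time()
        self.buffer.append((name, data))
        self.buffer_size += len(data)
        # The paper runs this check periodically; here it piggybacks on each upload.
        if self.buffer_size >= BLOCK_LIMIT or time.time() - self.buffer_started > T1:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        merged = self.out_dir / f"merged_{self.seq:05d}.bin"
        offset = 0
        with open(merged, "wb") as out:
            for name, data in self.buffer:
                out.write(data)
                self.index[name] = (str(merged), offset, len(data))
                offset += len(data)
        self.seq += 1
        self.buffer, self.buffer_size, self.buffer_started = [], 0, None

    def get(self, name):
        # Index lookup first, then a ranged read from the merged file, as in the paper.
        merged_path, offset, length = self.index[name]
        with open(merged_path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

Reading follows the same two-step pattern the paper describes: resolve the name through the index, then read only the needed byte range of the merged file, so the NameNode no longer has to track metadata for every tiny file.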
The experiments take 100,000 small files as the data set; the small files range from 1 KB to 50 KB in size and their formats are txt, doc and jpg. Groups of 20,000, 40,000, 60,000, 80,000 and 100,000 files are randomly selected from the file set, and each of the five groups is uploaded with both the improved scheme and the original HDFS to compare uploading speed, NameNode memory usage and reading speed. For reading, each group randomly generates 1,000 access logs, reads the small files named in those logs and records the time taken; each group of the experiment is run three times.

4. Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier

Sentiment classification is useful for the business-to-consumer industry and for online recommendation systems. The important steps of the workflow are: 1) instruct the data parser about the format of the input data and the desired output; 2) transmit the source code to the name node and execute it; 3) trigger the result collector to collect the computing results once they are available on the Hadoop Distributed File System (HDFS). A review's class is then decided by the frequency of each word that appears in the model obtained from the training data set.

Pre-processing the raw data set: the data parser first pre-processes all reviews into a common format. After processing, each review is one line in the data set, prefixed with a document ID and a sentiment label (positive or negative), and all pre-processed reviews are stored on the name node as a repository. The WFC and the data parser work together to prepare the input data sets for all test trials.

Sentiment classification using Hadoop: the classification is the key step in the workflow, and there are only two classes of documents. The workflow consists of a training job, a combining job and a classify job: once the training data and test data are ready in HDFS, the WFC starts the training job to build a model, and the combining job then combines the test data with the model, producing an intermediate table. This automatic scheduling method can easily be applied to other programs with minor changes to the parameters. A virtual Hadoop cluster is a fast and easy way to test a Hadoop program in the cloud, although the performance may be weaker than on a physical cluster. To test the scalability of the Naive Bayes classifier, the size of the data set is varied from one thousand to one million reviews per class, and the reported statistics include classification accuracy, computation time and system throughput.
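As an illustration of the classification step itself, here is a single-machine Naive Bayes sketch, not the paper's Hadoop training/combining/classify jobs: it builds a word-frequency model with add-one (Laplace) smoothing and scores a review by summing log word likelihoods, matching the idea that a review's class is decided by the frequencies of its words in the trained model. The tab-separated "label<TAB>review text" layout is an assumption.

```python
import math
from collections import Counter, defaultdict


def train(lines):
    """lines: iterable of 'label\ttext' strings, label 'positive' or 'negative' (assumed format)."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    doc_counts = Counter()               # label -> number of reviews
    for line in lines:
        label, _, text = line.partition("\t")
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    total_docs = sum(doc_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train([
        "positive\tgreat movie loved the acting",
        "negative\tterrible plot wasted my time",
    ])
    print(classify("loved the acting", *model))   # -> positive
```

Roughly, the training job in the paper corresponds to building these per-class word counts at scale, the combining job to joining test reviews with that model, and the classify job to the final scoring loop.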
The evaluation considers efficiency in terms of processing time and throughput: the system gives efficient results even on larger data sets, and system throughput increases as the data size grows.

CONCLUSION: The authors view this work as only a beginning for employing machine learning technologies on large-scale data sets. Future work will include using the framework for information fusion over imagery and text, distributed robotics applications, and cyber analysis using cloud computing.