420 CHAPTER 9 • DEVELOPING AND ACQUIRING INFORMATION SYSTEMS

CASE 2 Big Data and the Elephant


It may seem obvious that the amount of data in the world keeps getting larger and larger, but what is really meant by Big Data? The physical size of the storage device? The number of records? How do we effectively store it, and how do we approach doing useful things with it?

Big Data can be defined using three attributes: volume, velocity, and variety (often, the term veracity is added to refer to the often unknown origins of the data). Volume refers to traditional measures of size—how many bytes. A byte is an encoding of information using 8 bits—on or off states, like 1s and 0s. Large numbers of bytes are quantified using the same prefixes as metric units: kilo- means "a thousand," mega- "a million," giga- "a billion," and so on. A single e-mail may be a few kilobytes of data, including the message itself and the associated address information. A word processor document may be anywhere from a few kilobytes to a few megabytes, depending on how long the contents are and whether there are any embedded images. A high-definition movie takes up around 5 gigabytes when encoded and stored on a Blu-ray disc. An example of the high velocity of Big Data is the amount of video submitted to YouTube—every minute, more than 500 additional hours of video are uploaded. The variety aspect of Big Data represents all the different types of digital data we encounter—everything from e-mails, to homework assignments, to fitness tracker data, to photos, status updates, and VR videos. Veracity refers to the quality of the data and the ability to generate value from using accurate data. As the Internet of Things expands and more and more devices are connected all the time, the volume, velocity, and variety of data will only continue to increase, and ensuring veracity will be crucial to deriving value from Big Data.

So, how do we store and do something useful with this much data being generated so fast? Traditional mechanisms for storing and searching large data sets don't scale well. In the early 2000s, Google started utilizing a new approach that allowed cheap hardware to be used to easily store and process large data sets. The key to the approach is to expect failure of any given component of the system and to design the system such that it can easily and rapidly recover when such failures do occur. The Google File System (GFS) spreads data across multiple servers and incorporates redundancy such that if any one server goes down, the others can pick up where it left off and no data are lost. To do something useful with this much data, however, requires even more ingenuity. Google developed an approach to split queries into multiple steps that can be distributed across multiple servers, much as the data files themselves get split up in GFS. First, the input data files are filtered and sorted into chunks (referred to as "mapping"). Then, these chunks are distributed across multiple servers for processing. Each chunk gets processed into a smaller output set (the "reduce" step). The algorithm is designed in such a way that these steps can be performed on each chunk on different servers simultaneously. Just like GFS, if any given server fails, the others can smoothly recover and keep processing. These smaller output sets are then combined back together, and the process is repeated until a solution is reached. These steps lead to the algorithm's name: MapReduce.

MapReduce was a proprietary technology that belonged to Google and was a key part of its competitive advantage in the early 2000s. However, the underlying computer science research was published openly and known outside Google. As a result, many other projects implemented a similar approach. One of these projects, an open source effort, became very popular and widely used. The project was named Hadoop after a stuffed elephant belonging to the primary developer's son. Hadoop implements the core functions of GFS as the Hadoop Distributed File System (HDFS) and of MapReduce as Hadoop MapReduce. Because it is open source technology and can be freely incorporated into anyone's software system, it has been widely deployed. In keeping with the elephant name and logo, the suite of tools that grew up around Hadoop is called Mahout (a term for an elephant handler). Mahout and Hadoop were incorporated into the Apache Software Foundation suite of open source technologies.

Google has moved past MapReduce as its primary Big Data processing model, and Mahout has also moved on to more capable processing models that are enabled by, or are improvements on, Hadoop and MapReduce. Apache Pig is a high-level language for developing and implementing Hadoop programs. Hive is a data warehouse platform built on top of Hadoop and HDFS. New approaches to distributed processing beyond MapReduce being pursued by the Apache Foundation include Spark and Flink—both provide resilient distributed data set functionality and implement modern programming architectures using languages like Java and Scala. These technologies are a key part of the behind-the-scenes infrastructure that makes our modern world work. Any platform, like an app store, a fitness app, or a social network, must deal with the challenges of volume, velocity, and variety. Technologies like Hadoop, Mahout, and their kin are what make it all possible.
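The volume and velocity figures in the case can be made concrete with a little arithmetic. The sketch below uses the metric prefixes the case defines (kilo = 10^3, mega = 10^6, giga = 10^9); the 1-gigabyte-per-hour figure for compressed video is a hypothetical assumption for illustration, not a number from the case.

```python
# Decimal (metric) prefixes, as described in the case:
# kilo = a thousand, mega = a million, giga = a billion.
PREFIXES = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def to_bytes(value, prefix):
    """Convert a prefixed size (e.g., 5 GB) to a raw byte count."""
    return value * PREFIXES[prefix]

# Examples from the case:
email = to_bytes(4, "KB")   # a short e-mail: a few kilobytes
movie = to_bytes(5, "GB")   # an HD movie on Blu-ray: ~5 gigabytes

# Velocity: more than 500 hours of video uploaded to YouTube per minute.
# Assuming (hypothetically) ~1 GB per hour of compressed video:
per_minute = 500 * to_bytes(1, "GB")
per_day = per_minute * 60 * 24

print(movie)                      # 5000000000
print(per_day / PREFIXES["TB"])   # 720.0 terabytes per day
```

Even under this conservative per-hour assumption, a single platform ingests hundreds of terabytes of video a day—the scale that motivates distributed systems like GFS and Hadoop.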

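The map, shuffle, and reduce steps the case describes can be sketched on a single machine. This is a minimal illustration of the pattern (the classic word-count example), not Google's or Hadoop's implementation: in a real cluster, the map and reduce calls run in parallel on different servers, and GFS/HDFS-style replication lets the system recover when a server fails.

```python
from collections import defaultdict

def map_phase(document):
    # "Mapping": turn one input chunk into intermediate (key, value) pairs.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does when
    # routing mapper output to the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # "Reduce": collapse one key's values into a smaller output set.
    return key, sum(values)

documents = ["big data big ideas", "big elephants"]
intermediate = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 3, 'data': 1, 'ideas': 1, 'elephants': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases parallelize naturally—which is exactly why the approach scales across many cheap, failure-prone servers.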
Questions
9-43. How much data do you generate on a daily, weekly, monthly, and annual basis? Think about every digital encounter you have and what gets stored.
9-44. What advantages are there to storing and processing Big Data? How can companies and individuals benefit?
9-45. What skills might help you to pursue opportunities in the area of Big Data?

Based on:
Apache Hadoop. (2020, June 20). In Wikipedia, The Free Encyclopedia. Retrieved July 16, 2020, from https://en.wikipedia.org/w/index.php?title=Apache_Hadoop&oldid=963539845
Big Data. (2020, July 1). In Wikipedia, The Free Encyclopedia. Retrieved July 16, 2020, from https://en.wikipedia.org/w/index.php?title=Big_data&oldid=965530277
MapReduce. (2020, June 29). In Wikipedia, The Free Encyclopedia. Retrieved July 16, 2020, from https://en.wikipedia.org/w/index.php?title=MapReduce&oldid=965086280
Smith, K. (2020, February 21). 57 fascinating and incredible YouTube statistics. Brandwatch. Retrieved July 16, 2020, from https://www.brandwatch.com/blog/youtube-stats
