
Big Data

Jeffrey L. Popyack and William M. Mongan

What is Big Data?


Big Data is large, diverse, longitudinal, and/or distributed. It is generated from a variety of sources
including sensors, digital equipment, internet transactions, email, video, click streams, phone calls, and
any digital source available today or in the future. We are in an era of observation, and it all generates
data. Data is additionally generated by medical equipment, telescopes, satellites, environmental
networks, scanners, financial transactions, blogs, Twitter, digital photos, geographic maps, and more.

Big Data can be structured, as in the data stored in databases, or unstructured, as in the data contained
within a wiki or product recommendation site. Data can be temporal, where time is part of the value of
the data; the exact time and date a photo was taken may be critical to its value. Data can be spatial,
such as maps, where geolocation is part of the value of the data. Data can also be dynamic, as in real-time
click streams from large e-commerce sites such as Amazon.

One definition of big data is that the data set is too large to store and query using traditional database
techniques (Wikipedia). These data sets are commonly in the tera- to petabyte range, and they keep
growing larger.

Four V’s
The four V’s of Big Data are volume, velocity, variety, and veracity. A fifth V is sometimes included,
value. This is a useful lens for looking at applications or potential uses of big data. More on this later.
source: NSF Solicitation 12-499, Core Techniques and Technologies for Advancing
Big Data Science & Engineering, http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.pdf

How Big is Big Data?


1 byte is needed to store a single letter, digit, or symbol. The words “Big Data” can be stored in 8 bytes.
1 kilobyte = 1000 bytes (or 1024). The text of Dr. Seuss’ “Green Eggs and Ham” is 3.3 kilobytes in size.
1 megabyte = 1000 kilobytes. An mp3 audio recording of The Beatles’ “I Want to Hold Your Hand” occupies 2.76 megabytes.
1 gigabyte = 1000 megabytes. The text of articles in Encyclopædia Britannica is about 0.2 gigabytes; the text of English Wikipedia articles is ~44 gigabytes.
1 terabyte = 1000 gigabytes. The Library of Congress book collection has been estimated at 10 terabytes.
1 petabyte = 1000 terabytes. As of November 2019, the Library of Congress had collected over 1 petabyte of web archive data since 2000 (https://www.loc.gov/programs/web-archiving/about-this-program/frequently-asked-questions/).
1 exabyte = 1000 petabytes. We’re not there yet. YouTube users upload 48 hours of video per minute, or about 15 terabytes of data per hour. At this rate, 8 years = 1 exabyte.
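
As a rough check on that last figure (taking the 15 terabytes per hour rate as given): 15 TB/hour × 24 hours × 365 days ≈ 131,000 TB, or about 131 petabytes per year, so accumulating 1 exabyte (1,000 petabytes) at that rate would take roughly 1000 / 131 ≈ 7.6 years, consistent with the 8-year estimate above.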
Big Data Surrounds Us
Think about the volume of data represented in social networking posts. We can capture real-time
sentiments in Twitter, determine who is connected to whom with LinkedIn, and play “Six Degrees of
Separation” with almost anyone. According to IBM, 90% of all data was created in the last two years
alone.

In the medical field we are using big data to help improve cancer screening and to find patterns in
disease vectors and genetics.

In finance, big data is providing significant analysis capabilities that were not possible before. We can
now determine whether a credit card transaction is likely to be fraudulent, predict what decisions a
consumer is likely to make, and even analyze the market as a whole.

In security and protection services, big data is providing new methods of screening …

“Every two days now we create as much information as we did from the dawn of civilization up
until 2003.” --Eric Schmidt, Google CEO, Aug. 2010

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even
petabytes—of information.

• Turn 12 terabytes of Tweets created each day into improved product sentiment analysis

• Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data
must be used as it streams into your enterprise to maximize its value.

• Scrutinize 5 million trade events created each day to identify potential fraud

• Analyze 500 million daily call detail records in real-time to predict customer churn faster

Variety: Big data is any type of data, structured or unstructured: text, sensor data, audio, video,
click streams, log files, and more. New insights are found when analyzing these data types
together.

• Monitor hundreds of live video feeds from surveillance cameras to target points of interest

• Exploit the 80% data growth in images, video and documents to improve customer
satisfaction

Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions. How can you
act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the
variety and number of sources grow.

Big Data is Interdisciplinary


Big Data encompasses several fields of computing such as computer architecture, distributed
computing, artificial intelligence, data science, and systems administration. More importantly, Big Data
reaches many other fields such as medicine, social networking, finance, business intelligence and public
safety.

Applications of Big Data


• The FBI is combining data from social media, CCTV cameras, phone calls and texts to track down
criminals and predict the next terrorist attack.

• Supermarkets are combining their loyalty card data with social media information to detect and
leverage changing buying patterns. For example, it is easy for retailers to predict that a woman is
pregnant simply based on the changing buying patterns. This allows them to target pregnant
women with promotions for baby related goods.

• Facebook is using face recognition tools to compare the photos you have uploaded with those
of others to find potential friends of yours (see Bernard Marr’s post on how Facebook is exploiting
your private information using big data tools).

• Politicians are using social media analytics to determine where they must campaign the hardest
to win the next election.
source: Bernard Marr, “Big Data: The Mega-Trend That Will Impact All Our Lives,”
http://www.linkedin.com/today/post/article/20130827231108-64875646-big-data-the-mega-trend-that-will-impact-all-our-lives

How Can We Analyze It?


Data need not be “this large” to qualify for big data processing techniques. The techniques developed
for working on very large data sets can also be applied to smaller, but still large, datasets. The “Single
Instruction, Multiple Data” paradigm for programming gave rise to techniques such as MapReduce,
which processes data sets on a distributed cluster with relative efficiency.

Open research problems include improving this efficiency by reducing the number of network
transactions, caching the data more intelligently, and more. The important concept is scale: we need
big data analysis techniques when one of the V’s described earlier scales beyond what we can handle
on a single computer. For example, if you can purchase a 1 TB hard drive for $100, how would you use
one computer to process 100 PB (100,000 TB) of data?
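
To make the scale concrete: at $100 per 1 TB drive, simply storing 100 PB would take 100,000 drives and about $10 million in disks alone, before any replication for fault tolerance, and long before considering the time needed to read all of that data through a single machine.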

Machine Learning Techniques


Use the power, speed, capacity, and relentlessness of computing to look for patterns in your data.
Correlations in the data are often sufficient to identify trends; for example, certain search terms are
good indicators of flu outbreaks (http://www.google.org/flutrends/).

Supervised Learning can be used to identify incoming email as spam. The user identifies certain email as
spam, then the machine learns to classify others accordingly.
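
As a deliberately tiny illustration of the idea (not how production spam filters are built), the sketch below “trains” on a few labeled messages by counting which words appear in spam versus normal mail, then scores a new message by which set of words it shares more of. The messages, function names, and scoring rule here are all made up for illustration.

// Minimal supervised-learning sketch: learn word counts from labeled examples,
// then classify a new message by comparing its overlap with each class.
function trainWordCounts(messages) {
  var counts = {};
  for (var i = 0; i < messages.length; i++) {
    var words = messages[i].toLowerCase().split(' ');
    for (var j = 0; j < words.length; j++) {
      counts[words[j]] = (counts[words[j]] || 0) + 1;
    }
  }
  return counts;
}

function classify(message, spamCounts, hamCounts) {
  var words = message.toLowerCase().split(' ');
  var spamScore = 0, hamScore = 0;
  for (var i = 0; i < words.length; i++) {
    spamScore += spamCounts[words[i]] || 0;  // how often this word appeared in labeled spam
    hamScore += hamCounts[words[i]] || 0;    // how often it appeared in normal mail
  }
  return spamScore > hamScore ? 'spam' : 'not spam';
}

// Labeled training data supplied by the user (hypothetical examples).
var spamCounts = trainWordCounts(['win a free prize now', 'free money click now']);
var hamCounts  = trainWordCounts(['meeting notes attached', 'lunch at noon tomorrow']);

console.log(classify('claim your free prize', spamCounts, hamCounts));  // "spam"
console.log(classify('notes from the meeting', spamCounts, hamCounts)); // "not spam"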

Unsupervised Learning can be used to find connections we would never have made ourselves. What traits
identify a potentially good NBA player? Yes, they’re tall, but the “average ratio of arms to height in the NBA
is an astounding 1.06. (To put that in context, a ratio of greater than 1.05 is one of the diagnostic criteria
for Marfan syndrome, a disorder of the body's connective tissues that often results in elongated limbs.)”
(David Epstein, The Sports Gene)
MapReduce
MapReduce is used to process large amounts of data with two parts, a mapper and a reducer. The
mapper breaks data into usable segments to be processed and creates key-value pairs. The reducer
consolidates key-value pairs into meaningful data. There can be many reducers to distribute the
workload.

To facilitate distribution of the workload, we shift from storing data in tables to storing it as objects. The
columns of a table become fields of an object. The objects are representable in common notations such
as JSON, and they can be stored in hashtables as key-value pairs.

A table like this one…

zip     city          state
19063   Middletown    PA
19064   Springfield   PA
19065   Media         PA
22125   Occoquan      VA
22134   Quantico      VA
22150   Springfield   VA
22172   Triangle      VA
22191   Woodbridge    VA

Becomes JSON, with the columns as fields of each object…

[
  { "zip": 19063, "city": "Middletown", "state": "PA" },
  { "zip": 19064, "city": "Springfield", "state": "PA" },
  { "zip": 19065, "city": "Media", "state": "PA" },
  { "zip": 22125, "city": "Occoquan", "state": "VA" },
  { "zip": 22134, "city": "Quantico", "state": "VA" },
  { "zip": 22150, "city": "Springfield", "state": "VA" },
  { "zip": 22172, "city": "Triangle", "state": "VA" },
  { "zip": 22191, "city": "Woodbridge", "state": "VA" }
]

…and, grouped by city, becomes key-value pairs:

{
  "Middletown": ["PA"],
  "Springfield": ["PA", "VA"],
  "Media": ["PA"],
  "Occoquan": ["VA"],
  "Quantico": ["VA"],
  "Triangle": ["VA"],
  "Woodbridge": ["VA"]
}
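
As a small illustration of that transformation (a sketch in plain JavaScript; the variable names are ours, and only a few of the rows are shown), the rows can be represented as objects and then folded into the key-value structure above:

// Rows of the table, represented as objects whose fields are the column names.
var rows = [
  { zip: 19063, city: 'Middletown',  state: 'PA' },
  { zip: 19064, city: 'Springfield', state: 'PA' },
  { zip: 19065, city: 'Media',       state: 'PA' },
  { zip: 22150, city: 'Springfield', state: 'VA' },
  { zip: 22191, city: 'Woodbridge',  state: 'VA' }
];

// Group the rows into a hashtable keyed by city, collecting the states seen.
var statesByCity = {};
for (var i = 0; i < rows.length; i++) {
  var city = rows[i].city;
  if (!(city in statesByCity)) {
    statesByCity[city] = [];
  }
  if (statesByCity[city].indexOf(rows[i].state) === -1) {
    statesByCity[city].push(rows[i].state);
  }
}

console.log(statesByCity);
// { Middletown: ['PA'], Springfield: ['PA', 'VA'], Media: ['PA'], Woodbridge: ['VA'] }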

Computing Systems for Handling Big Data


Hadoop is one such system. Developed by Apache as a MapReduce framework, it is distributed and
cluster-based, and it is used to process very large data sets by Facebook, Twitter, Spotify and many
more (hadoop.apache.org).

Data is stored on the Hadoop Distributed Filesystem (HDFS), which is fault-tolerant and runs on
commodity hardware. Hadoop can also use storage services such as Amazon S3, which offers
99.999999999% durability and 99.99% availability of objects over a given year and is designed to sustain
the concurrent loss of data in two facilities; using S3 is elastic and does not require replicating data across
the cluster itself. For example, Netflix uses this arrangement to study video-streaming trends with a 500+
node query cluster, with results reported and visualized via REST web service endpoints, and an execution
service lets Netflix submit jobs over HTTP.

Harnessing Big Data


Big data can come from user-generated data such as social media dumps or Wikipedia dumps. It can
also come from computer-generated data such as weather sensor data, credit card transactions, or HTTP
server logs. It can be historical trend data such as student grade performance data, medical records, or
crime reports.

Other systems and tools include MapReduce, jsmapreduce (which supports Python and JavaScript),
IBM BigInsights, IBM SPSS, Tableau, and R.

Here is how an example from jsmapreduce proceeds:

• The Mapper takes a “shard” of data to process, like a line of text from a web log file.

• This is an offshoot of a common parallel programming paradigm known as “Single Instruction,
Multiple Data” (SIMD), and is well-suited to symmetric multiprocessing (SMP) environments like
multicore machines.

• The data is already broken up across the computing nodes, but this is transparent to the
Mapper, which sees a traditional filesystem.

• The data is moved as needed.

• The best MapReduce algorithms require little movement of data.

• Replication of that data is provided across the nodes because frequent disk failure at this scale is
expected.

• The Mapper’s job is to “emit” data back into the distributed filesystem that will be processed by
the Reducer.

• This could be statistics about the data, or further instructions on how to process the data as a
whole.

• It is typically done as one or more key/value tuples, e.g. { “forbidden” : 5 } to represent 5 HTTP 403
Forbidden responses found in the shard of data.

• The key/value tuples are then “shuffled,” or arranged such that the values are grouped by
common key (a minimal sketch of this grouping step appears just after this list).

• That is, all the “forbidden” counts emitted by the mappers are put together.

• This data is then broken up by key and distributed to the Reducers for processing.

• …the Reducers might have previously been Mappers.

• Like the Mappers, the Reducers take this data and emit key/value tuples into the distributed
filesystem, typically representing aggregations of the values they received for their key.

• For example, { “forbidden” : 5 } and { “forbidden” : 3 } might result in { “forbidden” : 8 }.
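
jsmapreduce performs this shuffle for you, but as a rough sketch of what the grouping step does (a simplified stand-in, not the framework’s actual code), it amounts to collecting emitted (key, value) pairs into a map of lists:

// Simplified shuffle: group a flat list of emitted [key, value] pairs by key.
function shuffle(emitted) {
  var grouped = {};
  for (var i = 0; i < emitted.length; i++) {
    var key = emitted[i][0];
    var value = emitted[i][1];
    if (!(key in grouped)) {
      grouped[key] = [];
    }
    grouped[key].push(value);
  }
  return grouped;
}

// e.g. pairs emitted by mappers processing different shards:
var emitted = [['forbidden', 5], ['ok', 12], ['forbidden', 3]];
console.log(shuffle(emitted));
// { forbidden: [5, 3], ok: [12] }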
JSMapReduce Example
Data:

hope is the thing with feathers


that perches in the soul
and sings the tune without the words
and never stops at all
and sweetest in the gale is heard
and sore must be the storm
that could abash the little bird
that keeps so many warm
Ive heard it in the chillest land
and on the strangest sea
yet never in extremity
it asked a crumb of me

Mapper:
function Mapper(jsmr_context, data) {
  // Count the number of times each word appears in this line of input ...
  var words_list = data.split(' ');
  var word_counts_map = {};
  for (var i = 0; i < words_list.length; i++) {
    var word = words_list[i];
    if (word in word_counts_map) {
      word_counts_map[word]++;
    } else {
      word_counts_map[word] = 1;
    }
  }
  // ... and Emit() each word along with its count (the reducer will sum these).
  for (var word in word_counts_map) {
    var count = word_counts_map[word];
    jsmr_context.Emit(word, count);
  }
}

Mapper Output:
calling mapper with data="hope is the thing with feathers"
mapper: emitted: key=hope , value=1
mapper: emitted: key=is , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=thing , value=1
mapper: emitted: key=with , value=1
mapper: emitted: key=feathers , value=1
calling mapper with data="that perches in the soul"
mapper: emitted: key=that , value=1
mapper: emitted: key=perches , value=1
mapper: emitted: key=in , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=soul , value=1
calling mapper with data="and sings the tune without the words"
mapper: emitted: key=and , value=1
mapper: emitted: key=sings , value=1
mapper: emitted: key=the , value=2
mapper: emitted: key=tune , value=1
mapper: emitted: key=without , value=1
mapper: emitted: key=words , value=1
calling mapper with data="and never stops at all"
mapper: emitted: key=and , value=1
mapper: emitted: key=never , value=1
mapper: emitted: key=stops , value=1
mapper: emitted: key=at , value=1
mapper: emitted: key=all , value=1
calling mapper with data="and sweetest in the gale is heard"
mapper: emitted: key=and , value=1
mapper: emitted: key=sweetest , value=1
mapper: emitted: key=in , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=gale , value=1
mapper: emitted: key=is , value=1
mapper: emitted: key=heard , value=1
calling mapper with data="and sore must be the storm"
mapper: emitted: key=and , value=1
mapper: emitted: key=sore , value=1
mapper: emitted: key=must , value=1
mapper: emitted: key=be , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=storm , value=1
calling mapper with data="that could abash the little bird"
mapper: emitted: key=that , value=1
mapper: emitted: key=could , value=1
mapper: emitted: key=abash , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=little , value=1
mapper: emitted: key=bird , value=1
calling mapper with data="that keeps so many warm"
mapper: emitted: key=that , value=1
mapper: emitted: key=keeps , value=1
mapper: emitted: key=so , value=1
mapper: emitted: key=many , value=1
mapper: emitted: key=warm , value=1
calling mapper with data="Ive heard it in the chillest land"
mapper: emitted: key=Ive , value=1
mapper: emitted: key=heard , value=1
mapper: emitted: key=it , value=1
mapper: emitted: key=in , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=chillest , value=1
mapper: emitted: key=land , value=1
calling mapper with data= "and on the strangest sea"
mapper: emitted: key=and , value=1
mapper: emitted: key=on , value=1
mapper: emitted: key=the , value=1
mapper: emitted: key=strangest , value=1
mapper: emitted: key=sea , value=1
calling mapper with data= "yet never in extremity"
mapper: emitted: key=yet , value=1
mapper: emitted: key=never , value=1
mapper: emitted: key=in , value=1
mapper: emitted: key=extremity , value=1
calling mapper with data= "it asked a crumb of me"
mapper: emitted: key=it , value=1
mapper: emitted: key=asked , value=1
mapper: emitted: key=a , value=1
mapper: emitted: key=crumb , value=1
mapper: emitted: key=of , value=1
mapper: emitted: key=me , value=1

Shuffle Output:
shuffle result : { "hope":[1], "is":[1,1], "the":[1,1,2,1,1,1,1,1],
"thing":[1], "with":[1], "feathers":[1], "that":[1,1,1], "perches":
[1], "in":[1,1,1,1], "soul":[1], "and":[1,1,1,1,1], "sings":[1],
"tune":[1], "without":[1], "words":[1], "never":[1,1], "stops":[1],
"at":[1], "all":[1], "sweetest":[1], "gale":[1], "heard":[1,1],
"sore":[1], "must":[1], "be":[1], "storm":[1], "could":[1], "abash":
[1], "little":[1], "bird":[1], "keeps":[1], "so":[1], "many":[1],
"warm":[1], "Ive":[1], "it":[1,1], "chillest":[1], "land":[1], "on":
[1], "strangest":[1], "sea":[1], "yet":[1], "extremity":[1],
"asked":[1], "a":[1], "crumb":[1], "of":[1], "me":[1] }
Reducer:
function Reducer(jsmr_context, key) {
  // Sum the per-line word counts for this key to get the total for the word.
  var total_count = 0;
  while (jsmr_context.HaveMoreValues()) {
    var value_str = jsmr_context.GetNextValue();
    total_count += parseInt(value_str, 10);
  }
  jsmr_context.Emit(key + ':' + total_count.toString());
}

Reducer Output:
initialized reducer for key: Ive
initialized reducer for key: a
initialized reducer for key: abash
initialized reducer for key: all
initialized reducer for key: and
initialized reducer for key: asked
initialized reducer for key: at
initialized reducer for key: be
initialized reducer for key: bird
initialized reducer for key: chillest
initialized reducer for key: could
initialized reducer for key: crumb
initialized reducer for key: extremity
initialized reducer for key: feathers
initialized reducer for key: gale
initialized reducer for key: heard
initialized reducer for key: hope
initialized reducer for key: in
initialized reducer for key: is
initialized reducer for key: it
initialized reducer for key: keeps
initialized reducer for key: land
initialized reducer for key: little
initialized reducer for key: many
initialized reducer for key: me
initialized reducer for key: must
initialized reducer for key: never
initialized reducer for key: of
initialized reducer for key: on
initialized reducer for key: perches
initialized reducer for key: sea
initialized reducer for key: sings
initialized reducer for key: so
initialized reducer for key: sore
initialized reducer for key: soul
initialized reducer for key: stops
initialized reducer for key: storm
initialized reducer for key: strangest
initialized reducer for key: sweetest
initialized reducer for key: that
initialized reducer for key: the
initialized reducer for key: thing
initialized reducer for key: tune
initialized reducer for key: warm
initialized reducer for key: with
initialized reducer for key: without
initialized reducer for key: words
initialized reducer for key: yet
shuffle complete; initialized 48 reducer phases
(step)
calling reducer with key="Ive"
reducer: reducer fetched value: 1
reducer: emitted: "Ive:1"
(step)
calling reducer with key="a"
reducer: reducer fetched value: 1
reducer: emitted: "a:1"
(step)
calling reducer with key="abash"
reducer: reducer fetched value: 1
reducer: emitted: "abash:1"
(step)
calling reducer with key="all"
reducer: reducer fetched value: 1
reducer: emitted: "all:1"
(step)
calling reducer with key="and"
reducer: reducer fetched value: 1
reducer: reducer fetched value: 1
reducer: reducer fetched value: 1
reducer: reducer fetched value: 1
reducer: reducer fetched value: 1
reducer: emitted: "and:5"
(step)
calling reducer with key="asked"
reducer: reducer fetched value: 1
reducer: emitted: "asked:1"
(step)
calling reducer with key="at"
reducer: reducer fetched value: 1
reducer: emitted: "at:1"
(step)
calling reducer with key="be"
reducer: reducer fetched value: 1
reducer: emitted: "be:1"
(step)
calling reducer with key="bird"
reducer: reducer fetched value: 1
reducer: emitted: "bird:1"
(step)
calling reducer with key="chillest"
reducer: reducer fetched value: 1
reducer: emitted: "chillest:1"
(step)
calling reducer with key="could"
reducer: reducer fetched value: 1
reducer: emitted: "could:1"
(step)
calling reducer with key="crumb"
reducer: reducer fetched value: 1
reducer: emitted: "crumb:1"
(step)
calling reducer with key="extremity"
reducer: reducer fetched value: 1
reducer: emitted: "extremity:1"
(step)
calling reducer with key="feathers"
reducer: reducer fetched value: 1
reducer: emitted: "feathers:1"
(step)
calling reducer with key="gale"
reducer: reducer fetched value: 1
reducer: emitted: "gale:1"
(step)
calling reducer with key="heard"
reducer: reducer fetched value: 1
reducer: reducer fetched value: 1
reducer: emitted: "heard:2"
(step)
(and so on…)
References

• http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html

• http://queue.acm.org/detail.cfm?id=1961297

• http://www-01.ibm.com/software/data/bigdata/

• http://www.cs.kent.edu/~jin/Cloud12Spring/HbaseHivePig.pptx

• http://marianoguerra.github.io/json.human.js/

• http://jsonprettyprint.com/

• http://stackoverflow.com/questions/12628246/how-to-send-oauth-request-with-python-oauth2

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. CNS-
1301171.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the National Science Foundation.

Supported in part by IBM Big Data Faculty Awards 2013
