Big Data 101

Comprehensive guide for Big Data

Uploaded by

anjanasundaram

Big Data 101

An Introduction to Big Data for the Curious Librarian
5.22.2017
Ann Madhavan, MSLIS
Research and Data Coordinator
NNLM Pacific Northwest Region
What will we cover in this presentation?
• What is Big Data?
• How is it different from “small data”?
• How will it impact our lives?
• Is it a good thing?
• How can librarians prepare?
The Information Continuum

Cartoon by David Somerville, based on a two-pane version by Hugh MacLeod


The Scientific Method

© ArchonMagnus
Traditional Research
1. Generate a hypothesis.
2. Assemble a sample population and a control group.
3. Expose the sample population to an intervention (drug, treatment, etc.) and the control group to a placebo or no intervention.
4. Do statistical analysis to identify causal relationships.
5. Rinse and repeat…
©Mark A. Hicks
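Step 4 of the traditional approach can be illustrated with a permutation test, one of several ways to compare an intervention group against a control group. This is a minimal sketch, not the presenter's method: the outcome scores are made up, and `permutation_test` is a hypothetical helper written for this example.

```python
import random
from statistics import mean

def permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples

# Made-up outcome scores, for illustration only.
treated = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
control = [10.2, 10.9, 11.1, 10.5, 11.4, 10.8]

observed, p_value = permutation_test(treated, control)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value says the observed difference would be unusual if the intervention had no effect. The causal reading comes from the study design (a hypothesis stated in advance and a comparable control group), not from the test itself.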
Types of Data
Quantitative Data
• Measurable
• Collected through measuring things that have a fixed reality
• Close ended
Qualitative Data
• Descriptive
• Collected through observation, field work, focus groups, interviews, recording or filming conversations
• Open ended
Big Data
Data that is too large or too complex to be managed using traditional data processing, analysis, and storage techniques.
The 4 V’s of Big Data
• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
Volume: scale of data
• 90% of today’s data has been created in just the last 2 years
• Every day we create 2.5 quintillion bytes of data or enough to fill 10 million Blu-ray discs
• 40 zettabytes (40 trillion gigabytes) of data will be created by 2020, an increase of 300 times from 2005, and the equivalent of 5,200 gigabytes of data for every man, woman and child on Earth
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
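The unit conversions behind these figures can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are chosen for this example only.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption of this sketch).
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

def to_gigabytes(n_bytes):
    """Convert a byte count to gigabytes."""
    return n_bytes / GIGABYTE

# 40 zettabytes expressed in gigabytes.
projected_2020 = 40 * ZETTABYTE
print(f"40 ZB = {to_gigabytes(projected_2020):,.0f} GB")

# World population implied by 5,200 GB per person.
per_person_gb = 5_200
implied_population = to_gigabytes(projected_2020) / per_person_gb
print(f"implied population: {implied_population / 1e9:.1f} billion")

# 100 terabytes expressed in gigabytes.
company_storage = 100 * TERABYTE
print(f"100 TB = {to_gigabytes(company_storage):,.0f} GB")
```

Under these assumptions, 40 zettabytes comes out to 40 trillion gigabytes and 100 terabytes to 100,000 gigabytes, matching the figures on the slide.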
Variety: different forms of data
Velocity: analysis of streaming data
Veracity: trustworthiness of data

• Origin
• Authenticity
• Trustworthiness
• Completeness
• Integrity
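One item on this checklist, completeness, lends itself to a simple automated check. The following is a rough sketch under stated assumptions: the field names, the sample records, and the helpers `completeness` and `audit` are all hypothetical and do not come from the presentation.

```python
# Hypothetical required fields for a sensor reading (illustrative only).
REQUIRED_FIELDS = ("patient_id", "recorded_at", "source", "heart_rate")

def completeness(record, required=REQUIRED_FIELDS):
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in required if record.get(field) not in (None, ""))
    return filled / len(required)

def audit(records, threshold=1.0):
    """Split records into those that meet the completeness threshold and the rest."""
    complete, flagged = [], []
    for record in records:
        if completeness(record) >= threshold:
            complete.append(record)
        else:
            flagged.append(record)
    return complete, flagged

# Made-up records, for illustration only.
records = [
    {"patient_id": "A1", "recorded_at": "2017-05-22T09:00", "source": "wearable", "heart_rate": 72},
    {"patient_id": "A2", "recorded_at": "", "source": "wearable", "heart_rate": 80},
    {"patient_id": "A3", "recorded_at": "2017-05-22T09:05", "source": None, "heart_rate": None},
]

complete, flagged = audit(records)
print(f"{len(complete)} complete, {len(flagged)} flagged for review")
```

A check like this only covers completeness. Origin, authenticity, and integrity usually need provenance information that a record on its own cannot supply.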
Value
The 4 V’s of Big Data
• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
Big Data and Research
Big Data Mining
1. Collect Big Data or obtain access to a repository.
2. Perform data analysis to explore patterns (pattern recognition, predictive analytics).
3. Identify potential correlations.
4. Good enough!
©Rina Piccolo
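Step 3 above, identifying potential correlations, can be sketched with a Pearson correlation coefficient, which is one common measure among many. The data below are made up, and `pearson` is a small helper written for this example rather than a library call.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Made-up wearable data: average daily steps and resting heart rate.
daily_steps = [2000, 4000, 6000, 8000, 10000]
resting_heart_rate = [78, 74, 71, 66, 63]

r = pearson(daily_steps, resting_heart_rate)
print(f"correlation: {r:.3f}")
```

A value near -1 or +1 signals a strong linear relationship, but it does not say why the two measures move together. That gap is the contrast with the hypothesis-driven approach of traditional research.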
Big Data in Health Care
• Faster and cheaper technology and data storage
• Widespread sensing devices
• An increase in “born” digital data
• Greater availability of data via repositories
• Data sharing mandates
Faster and cheaper technology and data storage
The cost to sequence a whole human genome has fallen from more than $100 million to less than $1,000 over the past 15 years.
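To get a feel for how steep that drop is, the endpoints can be turned into an average yearly rate. This is back-of-the-envelope arithmetic that treats $100 million and $1,000 as exact endpoints and assumes a steady decline over 15 years, which the real cost curve did not follow.

```python
from math import log2

# Round-number endpoints taken from the slide (the text gives them as bounds).
start_cost = 100_000_000
end_cost = 1_000
years = 15

# How many times cheaper sequencing became overall.
fold_reduction = start_cost / end_cost

# Average yearly multiplier if the decline had been steady.
annual_factor = (end_cost / start_cost) ** (1 / years)
annual_decline = 1 - annual_factor

# Average time for the cost to halve under the same assumption.
halving_time_years = years / log2(fold_reduction)

print(f"overall reduction: {fold_reduction:,.0f}x")
print(f"average decline per year: {annual_decline:.0%}")
print(f"average halving time: {halving_time_years * 12:.1f} months")
```

Under these assumptions the cost fell by roughly half every eleven months on average.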
Sensing devices
• Smartwatches
• Smart jewelry
• Fitness trackers
• Sport watches
• Smart glasses
• Smart clothing…
An increase in “born” digital data
Data that originates as digital data, rather than being converted or digitized later, is proliferating. Think digital electronic medical records, implanted medical devices, diagnostic imaging technology…
© Alan Levine, © NEC Corporation of America, © Hellerhoff
Greater availability of data via repositories
As of April 2016, the Registry of Research Data Repositories (re3data.org) listed 1,500 research data repositories. Currently 458 are keyworded “medicine.”
Sharing mandates
The number of funders and journals with data sharing policies has grown significantly in the past decade…
The Health Care Big Data Horizon
• Leverage the Electronic Health Record to improve diagnosis and outcomes and to reduce costs
• Integrate patient-generated health data and the Internet of Things (IoT)
• Incorporate environmental and socioeconomic data in patient diagnosis and treatment
• Develop personalized care specific to each patient’s particular needs (Precision Medicine)
Health Disparities: Big Data to the Rescue?
“Big Data” on PubMed
Bar chart: instances of “Big Data” on PubMed by year, 2003 to 2016. Counts stay in the single or low double digits in the early years, then climb steeply, with the tallest bars labeled 201, 463, 723, and 1,196.
Hurdles and Risks
• Unstructured Data (~75% of data in the healthcare environment)
• Data privacy/security (HIPAA Compliance, Patient Confidentiality, Personally Identifiable Information/PII)
• Inconsistent, incomplete, unavailable, poor quality or invalid data
• Poor analysis/analytics leading to erroneous correlations/conclusions
• Misused data
Big Data and Librarians

What role will librarians play in the Big Data revolution?
Do you see yourself playing a part?
How will you prepare yourself?
What resources will you use?

Patricia Brennan, RN, PhD, NLM Director


Resources…
• DataMed https://datamed.org/
• Institute for Health Metrics and Evaluation’s Global Health Data Exchange http://ghdx.healthdata.org/
• NNLM RD3: Resources for Data-Driven Discovery https://nnlm.gov/data/
• NNLM’s YouTube Channel https://www.youtube.com/channel/UCmZqoegBFKJQF69V8d-05Bw
• OHSU’s Big Data to Knowledge https://dmice.ohsu.edu/bd2k/topics.html
• Registry of Research Data Repositories (re3data.org) http://www.re3data.org/
• NIH’s All of Us Program https://allofus.nih.gov/
References
• Borgman, Christine L. Big data, little data, no data: Scholarship in the networked world. MIT Press, 2015.
• Federer, Lisa. Beyond the SEA: Data Science 101: An introduction for librarians. https://www.youtube.com/watch?v=i78ciP1eGxo&t=3s
• Mayer-Schönberger, Viktor, and Kenneth Cukier. Big data: A revolution that will transform how we live, work and think. Houghton Mifflin Harcourt, 2013.
Contact Information

Ann Madhavan, MSLIS


Research and Data Coordinator
NNLM Pacific Northwest Region
Seattle, WA
Email: albm@uw.edu
206-616-7283
NNLM Pacific Northwest Region
https://nnlm.gov/pnr/
