
The Open Mind

Data, Big Data, and Metadata in Anesthesiology


Matthew A. Levin, MD,* Jonathan P. Wanderer, MD, MPhil,†‡ and Jesse M. Ehrenfeld, MD, MPH†‡§∥

The last decade has seen an explosion in the growth of digital data. Since 2005, the total amount of digital data created or replicated on all platforms and devices has been doubling every 2 years, from an estimated 132 exabytes (132 billion gigabytes) in 2005 to 4.4 zettabytes (4.4 trillion gigabytes) in 2013, and a projected 44 zettabytes (44 trillion gigabytes) in 2020.a This growth has been driven in large part by the rise of social media along with more powerful and connected mobile devices, with an estimated 75% of information in the digital universe generated by individuals rather than entities. Transactions and communications including payments, instant messages, Web searches, social media updates, and online posts are all becoming part of a vast pool of data that live "in the cloud" on clusters of servers located in remote data centers. The amount of accumulating data has become so large that it has given rise to the term Big Data. In many ways, Big Data is just a buzzword, a phrase that is often misunderstood and misused to describe any sort of data, no matter the size or complexity. However, there is truth to the assertion that some data sets truly require new management and analysis techniques.

Increasingly, health care data are becoming a part of this continuous stream of digital data, driven in part by the mandates included in the Health Information Technology for Economic and Clinical Health (HITECH) Act, which was part of the larger American Recovery and Reinvestment Act passed in 2009. The HITECH Act established financial incentives and penalties to encourage the "meaningful use" of electronic health records (EHRs). This has undoubtedly increased the volume of electronic health care data by an order of magnitude over the past 5 years. This leads to the question: have health care data become Big Data? If so, can health care Big Data provide new insights that help improve outcomes on both an individual and a population level? In this article, we define Big Data, discuss whether anesthesiology has Big Data, and determine whether there is truly a need to use new infrastructure and analytic techniques to manage data in anesthesiology.

DEFINING BIG DATA

The term Big Data is not new, but its use has only recently become widespread (Fig. 1). Broadly speaking, Big Data is data that are so large and complex, and generated from such a wide variety of sources at such a high rate, that they exceed the ability of traditional tools and infrastructure to capture, store, and analyze them. The defining characteristics of Big Data, as originally put forth by Laney in 2001, are the "3 Vs": Volume, Velocity, and Variety (Table 1).b The combination of these 3 attributes is what makes Big Data so challenging to work with, although the difficulty most often arises more from data volume and velocity than from variety. For example, an average of 500 million "tweets" is posted on Twitter every day, and Google indexes over 20 billion sites daily to process >3.5 billion searches per day.c This volume and velocity of data are simply too great for most conventional database systems.

The 3 Vs are not the only definition of Big Data. Some have suggested that a fourth V, Veracity, is important. If sources cannot be trusted and data are not reliable, they cannot be acted upon. Veracity, however, is a necessary subjective property of all data, regardless of size or complexity, and is not unique to Big Data. A recent review of Big Data definitions found strong ties between Big Data and certain infrastructure, specifically technologies such as "NoSQL" databases that organize and store data using simple key-value pairs rather than the tables and columns used by modern relational databases.1 In fact, the best definition of Big Data might simply be "too big to fit on my computer." Conversely, many technologists would say, "If the data fit in a database, they are not Big Data."d

Another proposed definition of Big Data is "to describe 'big' in terms of the number of useful permutations of sources making useful querying difficult ... and complex interrelationships making purging difficult ... Big Data can be small and not all large datasets are big."e This somewhat contradictory statement captures the concept that, as much as size, what can make Big Data difficult to manage is the lack of structure and the complexity of the relationships among data elements.

From the *Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, New York, New York; Departments of †Anesthesiology, ‡Biomedical Informatics, §Health Policy, and ∥Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee.
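The contrast drawn above between key-value storage and relational tables can be made concrete with a small sketch. The record below is purely illustrative; the field names, keys, and values are our own invention, not from any real AIMS or NoSQL product:

```python
# Illustrative only: one intraoperative heart rate observation,
# represented two ways. All identifiers here are hypothetical.

# Relational style: a fixed-schema row, one value per column.
relational_row = ("case_001", "2014-11-26T09:15:00", "HR", 72)

# NoSQL key-value style: an opaque key mapped to a flexible document.
# Different records may carry entirely different fields (no shared schema).
key_value_store = {
    "case_001:2014-11-26T09:15:00:HR": {"value": 72, "units": "bpm"},
    "case_001:comment:1": {"text": "induction uneventful"},
}

# A key-value lookup is a direct hash access, not a table query.
reading = key_value_store["case_001:2014-11-26T09:15:00:HR"]
print(reading["value"])  # 72
```

The flexibility of the second form is what lets NoSQL systems absorb heterogeneous data quickly; the cost is that ad hoc cross-record queries, which relational tables handle naturally, become the programmer's problem.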
Accepted for publication December 10, 2014.
Funding: No external sources.
Conflict of Interest: See Disclosures at the end of the article.
Reprints will not be available from the authors.
Address correspondence to Matthew A. Levin, MD, Department of Anesthesiology, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, Box 1010, New York, NY 10029. Address e-mail to matthew.levin@mssm.edu.
Copyright © 2015 International Anesthesia Research Society
DOI: 10.1213/ANE.0000000000000716

a http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm. Accessed November 26, 2014.
b http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed November 26, 2014.
c http://www.internetlivestats.com. Accessed October 20, 2014.
d http://ask.slashdot.org/story/14/11/08/0139248/ask-slashdot-choosing-a-data-warehouse-server-system. Accessed November 24, 2014.
e http://mike2.openmethodology.org/wiki/Big_Data_Definition. Accessed July 21, 2014.

December 2015 • Volume 121 • Number 6 www.anesthesia-analgesia.org 1661



Figure 1. Graph showing the exponential increase in Google searches for the term Big Data.

Table 1. Key Attributes of Big Data

Volume — Terabytes or more of new data each day
Velocity — Data are generated at high speed. Analysis needs to be near real-time to remain relevant
Variety — Heterogeneous data, encompassing many different formats and concepts, and coming from a variety of different sources

One issue that has generally been poorly addressed in discussions of Big Data is data quality.f In fact, there is an attitude that "'good enough' is good enough," meaning that some degree of data loss and inaccuracy is an acceptable trade-off for the insight gained from massive data sets.2 Although this may be true for nonclinical applications, adopting this approach in the health care setting can be problematic. Data loss and inaccuracy are clearly not acceptable for granular analyses at the individual patient level. However, utilization of statistical analysis techniques designed to deal with "noisy data" may render Big Data with suboptimal data quality still useful. Using such methods requires early and close collaboration with formally trained statisticians and bioinformaticists because these techniques are mathematically complex and well beyond the level of statistical education of most clinicians.3 The analysis process is no longer highly linear (as in a classic randomized controlled trial), but iterative and even branching. Early collaboration helps ensure success in both the interpretation and presentation of results.

DOES ANESTHESIOLOGY HAVE BIG DATA OR NEED BIG DATA?

Having defined Big Data, we can now ask: does the medical specialty of anesthesiology have Big Data? By looking at each of the Vs in turn, a more detailed answer can be developed. First, however, it is worthwhile to ask, does anesthesiology really need Big Data?

Potential of Big Data

The current generation of perioperative research involves the analysis of tens of thousands of anesthetic cases comprising megabytes of data. The next generation of perioperative research will involve millions of anesthetic cases, a sample size currently restricted to research that uses large administrative data sets.4 Perhaps the biggest argument for Big Data in anesthesiology is that there remain important clinical problems for which we do not have good answers, because we do not have enough power to perform meaningful statistical analyses. This conundrum has been recognized by the anesthesia community for some time.5 Examples of such problems and questions are:

• True root causes of ischemic optic neuropathy (the largest study to date has only looked at dozens of cases).6
• True incidence of and risk factors for postoperative pulmonary complications.7–9 The recently completed PERISCOPE trial, which externally validated a previously developed postoperative pulmonary complications risk score, showed wide variation in predictive power even within a fairly large sample of over 5000 patients drawn from 63 centers in 21 European countries.10 This suggests that even larger and more diverse data are needed to develop globally applicable risk scores.
• The true value of processed electroencephalography monitoring (e.g., Bispectral Index) in preventing awareness. The largest studies thus far have only had a handful of incidences of awareness in each arm.11–13

In the examples above, it is not only that more raw numbers of cases may be needed to answer the question, it is also that more detail per case is likely needed. This can lead to Big Data even if the number of cases remains relatively modest. For example, having access to intraoperative blood pressure waveform recordings might facilitate better understanding of whether rapid and transient blood pressure changes (not recorded by conventional monitoring) play any role in causing ischemic optic neuropathy.

f http://www.techrepublic.com/article/data-quality-the-ugly-duckling-of-big-data/. Accessed November 24, 2014.
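The power problem behind these examples can be illustrated with a back-of-the-envelope sample-size calculation. The sketch below is our own illustration, not taken from any of the cited studies; it uses the standard normal-approximation formula for comparing two proportions, assuming a hypothetical trial that aims to halve an awareness rate of 2 per 1000:

```python
import math

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for detecting a difference
    between two event proportions (normal approximation).
    Illustrative only; a real trial would use exact methods."""
    z_a, z_b = 1.959964, 0.841621  # z for two-sided alpha=0.05, power=0.80
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Detecting a drop in intraoperative awareness from 0.2% to 0.1%:
print(n_per_arm(0.002, 0.001))  # roughly 23,500 patients per arm
```

For events an order of magnitude rarer, such as ischemic optic neuropathy, the same formula pushes the requirement into the millions of cases, which is precisely the scale argument made above.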


Table 2. Example Storage Requirements for Heart Rate (HR) Data for a 2-Hour (120-Minute) Case^a

Sample rate — Storage required per case^a
Every 5 min (ASA standard) — 94 bytes (24 data points)
Every minute (standard AIMS) — 480 bytes (120 data points)
Every 15 s (high-fidelity AIMS) — 1.8 kB (480 data points)
256 Hz ECG waveform sampling (15,360 samples/min), per lead — 7.4 MB (1.8 million data points) per lead
Waveform sampling, 5-lead ECG — 37 MB

AIMS = Anesthesia Information Management System; ECG = electrocardiogram.
^a Assuming each data point stored as a 4-byte integer.

Anesthesiology Big Data—Volume

There are 2 aspects to consider for volume: (1) data for an individual case, and (2) aggregate data across practices (i.e., institutional and/or national level data). Individual case data consist of patient demographics, physiologic data (vital signs), event data, medication data, fluid data, and any associated information describing these data. Most of these elements involve minimal amounts of data, on the order of kilobytes. Table 2 shows an example of how much storage might be required to record one physiologic parameter, in various formats. It is evident that individual anesthesia records generated by the current generation of anesthesia information management systems (AIMSs) are not Big Data. Full waveform capture, however, begins to generate a significant volume of data. As shown in Table 2, waveform data for a 5-lead electrocardiogram for a 2-hour case would generate 37 MB of data. Add capnography, arterial blood pressure and central venous pressure, pulse oximetry, electroencephalograph traces, airway pressure and volume waveforms, and the data volume explodes. For example, Liu et al.14 from the University of Queensland recorded waveform data with 10 millisecond resolution (100 Hz) from 32 patients undergoing anesthesia. This generated approximately 5.5 GB of data, or about 170 MB per case. (The researchers have made this data set freely available.g)

Another perspective from which to consider volume is at the aggregate institutional or national level. A modern AIMS, sampling physiologic data once every minute, and including all other patient and case data, will generate a file approximately 1 MB in size. A large tertiary care center might perform 200 cases per day, generating 200 MB of data. If the center performs 50,000 anesthesia cases per year, 50 GB of data are generated. In the United States in 2010, approximately 51 million inpatient surgical procedures were performed.h If data from all of these cases were captured, this would result in approximately 51 terabytes of anesthesia case data per year. The National Clinical Outcomes Registry (NACOR), a nationwide anesthesia database maintained by the Anesthesia Quality Institute, has the stated goal of capturing data on all anesthetics administered in the United States. Since starting operations in 2010, NACOR has collected data on over 21 million cases through November 2014.i Of these, only an estimated 1 million cases have detailed intraoperative physiologic data. With the use of the estimates above, this is only about 1 terabyte of data, the equivalent of about 400 full-length DVD movies, or 33 movies in the higher-quality Blu-Ray format, an amount that can easily fit onto a modern consumer hard drive. Additionally, by design, NACOR is focused on breadth rather than depth of data capture, with the result that the quality and completeness of case data may vary widely among contributing sites. Further, current data use agreements limit reporting to one's own data and benchmarking, and site and provider identities are masked in the Participant User File (the research extract made available to participants). Probability sampling cannot be done and, therefore, no conclusions can be drawn about incidence. This limits NACOR's current utility for research purposes.

Another possible source of Big Data in anesthesiology is the Multicenter Perioperative Outcomes Group (MPOG).j This initiative, started by the University of Michigan in 2008, aims to aggregate EHR, administrative, and outcome data into a single unified source that can be used for perioperative research. To date, 17 sites from the United States and the Netherlands are actively contributing data to this effort. The MPOG database contains over 2 million patient cases representing 1.4 million unique patients, with over 5 billion vital signs and 125 million laboratory values.k MPOG limits access to contributing members, and use of the data for research projects requires approval from its Perioperative Clinical Research Committee. Access to data is restricted to the subset relevant to each research project, which limits the use of MPOG as a true large-scale data source.

Other possible sources of anesthesia Big Data are national perioperative databases such as the National Surgical Quality Improvement Program (NSQIP) and the Society of Thoracic Surgeons (STS) National Database.15 Both the STS and NSQIP rely on manual data collection, which is time consuming, costly, labor intensive, and inflexible.16 STS and NSQIP data entry is performed via structured forms with prespecified values that do not have the flexibility to allow free text input. Updating the data capture forms requires administrative review and consensus, which cannot be done by individual reporters. Because these registries are surgically oriented, they generally do not contain detailed intraoperative anesthesia data. STS participation is voluntary and only captures cardiothoracic surgical cases, and NSQIP relies on sampling and thus only captures a small fraction of all surgical cases performed in the United States. These registries remain small and cannot be considered Big Data.

Anesthesiology Big Data—Velocity

At first glance, it appears obvious that anesthesiology has high-velocity data. Intraoperative monitoring is continuous, and every minute of every day there are thousands of cases occurring simultaneously across the United States and the world. Yet, the vast majority of these data are never captured because waveform data are not stored. Therefore, in reality, the current velocity of anesthesia data is quite low, with the typical AIMS only recording data once per minute

g http://dropbox.eait.uq.edu.au/uqdliu3/uqvitalsignsdataset/index.html. Accessed November 7, 2014.
h http://www.cdc.gov/nchs/fastats/inpatient-surgery.htm. Accessed June 3, 2014.
i https://www.aqihq.org/introduction-to-nacor.aspx. Accessed October 20, 2014.
j https://mpog.med.umich.edu/.
k Personal correspondence, author MAL, September 1, 2014.
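The waveform rows of Table 2 and the aggregate national estimate follow from simple arithmetic. The sketch below reproduces them under the assumptions stated in the text (4-byte samples, a 2-hour case, roughly 1 MB per AIMS case record):

```python
BYTES_PER_SAMPLE = 4   # each data point stored as a 4-byte integer
CASE_MINUTES = 120     # a 2-hour (120-minute) case

def case_bytes(samples_per_minute, leads=1):
    """Storage for one physiologic parameter over a 2-hour case."""
    return samples_per_minute * CASE_MINUTES * BYTES_PER_SAMPLE * leads

print(case_bytes(1))                 # once per minute: 480 bytes
print(case_bytes(15_360))            # 256 Hz ECG, per lead: 7,372,800 bytes (~7.4 MB)
print(case_bytes(15_360, leads=5))   # 5-lead ECG: 36,864,000 bytes (~37 MB)

# Aggregate national volume: ~51 million cases/year at ~1 MB per case
print(51_000_000 * 1_000_000 / 1e12)  # ~51 terabytes per year
```

The same arithmetic shows why full multi-channel waveform capture, at hundreds of megabytes per case, would multiply national storage requirements by two orders of magnitude.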




(Table 2). Additionally, data are often not available for use in near real time, but are only made available for reporting the next day. This is not true of many of the older, more established AIMS, but it is the case for some of the newer AIMS provided by the large EMR vendors, such as Epic Anesthesia (Epic Systems Corp., Madison, WI). This further decreases the velocity of data.

While many practitioners may not currently benefit from real-time data analysis, there are growing examples of ways in which real-time predictive analysis of intraoperative trends can lead to improved outcomes. For example, at Vanderbilt University Medical Center, real-time data including vital signs and operating room video are streamed wirelessly to mobile devices to allow supervising anesthesiologists to remotely monitor their cases and manage clinical workflow.17 Others have proposed using real-time waveform data analysis to drive clinical decision support around fluid responsiveness and management in both the operating room and intensive care unit.18

Anesthesiology Big Data—Variety

The last dimension of Big Data is variety. There are clearly a variety of data types present in perioperative data. Physiologic data can be continuous or discrete numerical data. Demographic data are numeric, text, and categorical data (e.g., ASA Physical Status Classification). Medication data are a mixture of data types, representing medication name, units, dose, and administration time stamps. Allergy data can be either structured and mapped to a standardized nomenclature, or can be represented as unstructured free text.19,20 Intraoperative events are represented by time stamps or time series. Cases may include imaging data such as transesophageal echocardiographic images, video laryngoscopy images, and intraoperative radiographs. There are also a variety of data sources. Data can come from an AIMS, an EHR, an anesthesia workstation (via an AIMS or directly), a picture-archiving and communication system, an ultrasound device, an intraoperative video-recording device, etc. On a regional or national level, data could come from a wide variety of different institutions (e.g., community hospitals, ambulatory surgical centers, freestanding imaging centers, or tertiary medical centers). The challenge of integrating data from all of these sources and care settings is daunting, particularly given the current lack of widespread interoperability among systems.

Anesthesiology Big Data—Summary

In summary, does the field of anesthesiology really have Big Data? The answer is: not yet. There is definitely variety. There is increasing velocity, especially as the field moves toward more real-time analysis of intraoperative data. The volume of data, however, is modest. Looking toward the near future, if full waveform data were captured and used for real-time signal processing (e.g., heart rate variability, entropy analysis), not only would the volume of data suddenly become very large, the analysis would start to become computationally complex and resource intensive. This might truly push anesthesiology into the realm of Big Data. The payoff of such real-time waveform analysis might be better prediction of impending clinical decompensation (e.g., postoperative hemodynamic instability) before it occurs, in time for preemptive intervention. On a national level, NACOR will become an increasingly important resource for perioperative research, and in time, as more providers begin sending their data to NACOR and the frequency of submission increases, the volume and velocity may begin to approach a scale that could be called Big Data. It must be noted again, however, that as long as NACOR only provides aggregate data to researchers, no studies of incidence can be undertaken using NACOR data.

METADATA

We have described the volume, velocity, and variety of data in anesthesiology and provided information describing each of these 3 attributes. Another important attribute of Big Data (or any data) is metadata, which is defined as data about data. The National Information Standards Organization specifies 3 types of metadata: descriptive, structural, and administrative.21 Descriptive metadata are data points that are used to assist with discovery and identification of data, including elements like content authorship. In the context of an AIMS, for instance, descriptive metadata associated with a case comment would indicate who entered the comment, at what time, and from which device. Structural metadata specify how data are ordered and linked together. For an AIMS, this would include a database schema that describes how a case might be stored as one patient identifier record linked to many physiologic data entry records. Administrative metadata are used to manage data and may specify elements such as data access permissions and data access logs. An example of this is an audit log that records individual provider accesses of AIMS records.

United States law mandates retention of these records. The Health Insurance Portability and Accountability Act of 1996 requires that covered entities, such as hospitals, "implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information" (45 C.F.R. § 164.312). These records must be maintained for 6 years, and a process must be in place to examine these logs and generate compliance reports. The volume of data needed to meet this requirement can be very large. At Vanderbilt University Medical Center, approximately 4.5 million audit records are generated daily for one of the non-AIMS EHR systems, which scales to 10 billion records extrapolated over 6 years. Maintaining these data stores in an accessible fashion can be challenging and may require some techniques associated with Big Data, even without AIMS records.

AIMS records were featured in a case report by Vigoda and Lubarsky in 2006, where an unrecognized failure of the AIMS to record intraoperative vital signs likely led to increased medical liability in a procedure that resulted in patient harm.22 While not the focus of the case report, the authors also noted that the plaintiff's attorney had requested the metadata associated with the AIMS entries. These entries included the attending anesthesiologist's attestation of being present at emergence, which was entered soon after the surgery's start. Thus, metadata revealed the temporal context of the attestation, which undermined the credibility of the anesthesia team. This episode highlighted the importance of metadata to anesthesiologists specifically.
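The three NISO metadata categories can be illustrated with a toy AIMS record. Everything below is invented for illustration; the field names, identifiers, and values do not come from any real system:

```python
# Hypothetical AIMS case comment with its three kinds of metadata.

comment = {"text": "Patient extubated awake."}

descriptive = {   # who/when/where: supports discovery and identification
    "entered_by": "provider_0042",
    "entered_at": "2014-09-01T10:47:03",
    "device": "or7-workstation",
}

structural = {    # how records link together: a schema in miniature
    "case_id": "case_001",
    "parent_table": "cases",
    "child_table": "comments",
}

administrative = {  # access control and the audit trail
    "permissions": ["anesthesia_team", "compliance_auditor"],
    "access_log": [("provider_0042", "2014-09-01T10:47:03", "write")],
}

# The HIPAA audit-retention arithmetic cited above:
records_per_day = 4_500_000
print(records_per_day * 365 * 6)  # 9855000000, i.e., ~10 billion over 6 years
```

The Vigoda and Lubarsky episode turned entirely on the first category: the descriptive timestamp on the attestation, not the attestation's content, is what the plaintiff's attorney used.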


In summary, metadata are an important component of STRENGTHS AND LIMITATIONS OF TRADITIONAL


health care data that provide the context for data generated ANALYTIC TECHNIQUES
in perioperative care, which can be leveraged in nontradi- While waveform or genomic data may necessitate new tools
tional ways to provide insight into health care workflow. and frameworks, the majority of current perioperative data
It has substantial volume and velocity, but does not have sets do not require new analytic techniques or infrastructure.
significant variety. However, there is another form of high- A statistical analysis of postanesthesia care unit staffing per-
volume health care Big Data that may become increasingly formed by Dexter et al.32 in 2001 analyzed approximately 580
important: genomics. billion shift permutations (a Big Data number of permuta-
tions) on a low-power personal computer in approximately
EMERGING BIG DATA: GENOMICS AND 7 hours. Computer hardware continues to follow “Moore’s
ANESTHESIOLOGY Law,” roughly doubling in processing power every 2 years.33
As our understanding of genomics and our ability to deliver In combination with advances in algorithmic optimization
personalized medicine grows, there will be a growing need techniques, this has resulted, in some cases, in a 200 billion
to incorporate genomics data within the context of periopera- factor speedup in processing time over the last 20 years.34 A
tive medicine. The explosive growth in genomics has been modern relational database, with fast disk arrays, adequate
driven by next-generation sequencing machines, which have memory (typically 128 GB or more), properly indexed tables,
the ability perform whole-genome sequencing at an unprec- and an intelligently constructed query that avoids table scans
edented resolution and price point. The cost to sequence the and unrestricted joins, can easily scale to terabytes of data.
entire human genome has fallen from $100 million in 2001 Partitioning, which splits 1 large table into multiple smaller
to about $10,000 in 2014.l Genome-Wide Association Studies, tables, and sharding, which distributes the partitions across
which attempt to link several genes to a single phenotype, multiple servers, are 2 techniques commonly used by mod-
and Phenome-Wide Association Studies, which attempt to ern databases to handle very large data. These technologies
link several phenotypes to a single gene, are ushering in a are available on proprietary (e.g., Microsoft SQLServerm) and
new understanding of biology and medicine, along with free (e.g., MySQLn) databases. Statistical tools such as SAS
unprecedented amounts of clinical data. The complete human (SAS Institute, Cary, NC) and R (R Foundation for Scientific
genome is approximately 3 GB. If even 1% of these data were Computing, Vienna, Austria) also have the ability to handle
used during perioperative management, that would repre- very large data sets with essentially no limitation on file size
sent a 30-fold increase in the amount of perioperative data other than that imposed by the underlying hardware and
potentially generated and stored per patient. software. In combination, these platforms and programs can
Some centers (including the authors’) are already pro- easily handle most perioperative data sets.
spectively genotyping patients and using that information This does not mean that analysis of perioperative data
to provide personalized therapeutics such as initial dosing of is immune to the limitations of traditional tools, especially
clopidogrel.23,24 The shift from population-based to patient- if those tools are not up to date. A recent observational
centered care will require the development of new approaches study of outcome after hip surgery in the United Kingdom
to managing the data that are generated as a part of this new was later found by the authors to have inadvertently
care process. Within anesthesiology, there is great promise as excluded 8 months of data.35 The error was not discovered
we begin to understand the genomic underpinnings of drug until months after publication when the senior author read
metabolism and response, pain susceptibility, and wound an editorial on Big Data and noticed the numerical similar-
healing.25–29 To be successful in this area, investigators will ity between the largest number able to be represented by
need to develop new tools that can combine, manage, and ana- 16 bits (65,536) and the number of patients in their data
lyze the growing genomic and physiologic data that are gen- set (65,535). Further investigation revealed that a very old
erated during the perioperative period that may require using version of Microsoft Excel (Excel 2003) with a 16-bit limit
a Big Data framework. An example of such a new approach on the total number of rows had been used for data analy-
was recently described in which investigators used a high- sis. This resulted in the data set being truncated at 65,535
throughput unbiased next-generation sequencing pipeline to patients (plus 1 header row for 65,536 total rows).36 The
identify leptospira in a cerebrospinal fluid sample from a criti- authors subsequently issued a correction that redefined
cally ill patient in whom all conventional diagnostic workup the time period for the original article to only include
had been negative.30 Over 10 million raw DNA sequence the analyzed cases, leaving the conclusions unaffected.
reads from the patient’s cerebrospinal fluid were compared This highlights the importance of careful data analysis
to over 40 gigabases of reference sequences obtained from the and familiarity with computer science concepts for those
National Center for Biotechnology Information. This was a involved in the analysis of large data sets, as well as in the
massive computational problem. The rapid turnaround time review of any resultant manuscripts.
(within 48 hours) was fast enough to enable clinicians to suc- Use of large data sets can present potential statistical prob-
cessfully treat the infection within the same hospitalization, lems for both researchers and readers. Application of statis-
with a near-complete recovery. The pipeline architecture is tical tests such as the Student t test, for instance, can yield
specifically intended to be cloud deployable.31 It is not hard “statistically significant” results with miniscule P values
to envision a future in which such real-time sequencing will
be routinely used in clinical practice, although its specific role m
http://technet.microsoft.com/en-us/library/ms345599(v=sql.105).aspx.
within anesthesia practice remains to be defined. Accessed November 19, 2014.
n
https://github.com/greenlion/swanhart-tools/blob/master/shard-
l
http://www.genome.gov/sequencingcosts/. Accessed November 7, 2014. query/README.md. Accessed November 26, 2014.

December 2015 • Volume 121 • Number 6 www.anesthesia-analgesia.org 1665


E THE OPEN MIND

When used on large data sets, such tests can reach statistical significance even when the actual differences are clinically insignificant. This requires approaches that establish clinically meaningful differences a priori, and statistical testing that establishes effect sizes of differences with confidence intervals. These approaches are bolstered by carefully planned statistical analyses that are registered with an institutional or governmental research entity before data access, which is the approach currently taken by MPOG's Perioperative Clinical Research Committee. Readers should maintain awareness of these implications for research results and cautiously interpret the significance of small effect sizes.

Other issues with statistical analysis of Big Data are noise accumulation, spurious correlation, and measurement errors.37 "Noise accumulation" refers to the increasing amount of corrupt, missing, or spurious data that become present as the size and dimension (number of variables) of a data set becomes very large. This can decrease the signal-to-noise ratio and make it hard to identify true positives. The high dimensionality of Big Data sets can also lead to spurious correlations, where unrelated random variables appear to be highly and causally related but are in fact not.

Machine-learning and data-mining methods are commonly applied to Big Data sets to help overcome these issues.3 The 2 terms are often conflated, and in many ways overlap, but can be distinguished roughly as follows: machine learning is focused on making predictions about new data, based on known properties learned from existing data, whereas data mining is concerned with discovery of previously unknown properties.o Data mining may sometimes be referred to disparagingly as "fishing," since it usually involves analyzing data without an a priori hypothesis.p There is no doubt, however, that mining can provide valuable insight into massive data sets and may be useful for hypothesis generation. Data mining can encompass summarization, outlier detection, dependency modeling, classification, clustering, and regression fitting.3 These techniques are helpful in addressing the issue of low signal-to-noise ratio mentioned above.

Some of the commonly used machine-learning algorithms are Bayesian networks, cluster analysis, and support vector machines.38,39 In medicine, machine learning has found particular application in genetics and genomics, although it has also been used in the perioperative arena, particularly the intensive care unit.18,40,41 There has been work done in the field of anesthesiology that uses support vector machines for predicting the depth of anesthesia in rats42 and for entropy analysis to discriminate awake versus asleep states in recovery from anesthesia.43 Tighe et al.44 explored the use of a machine-learning classifier to predict the need for femoral nerve block after anterior cruciate ligament repair and found that machine-learning techniques outperformed the more traditional logistic regression.

NEW TECHNOLOGIES FOR BIG DATA
The real limitation of traditional relational databases and analytic tools becomes apparent when the volume of data to be analyzed becomes very large and highly dynamic. Typically, this occurs with Internet-scale data (i.e., search data, social media posts, etc.), where the daily volume of data is billions or even trillions of data points. The fundamental approach to dealing with such data sets is not new: split them into smaller pieces, analyze each piece, and then reassemble the results. What is new over the past decade is applying this approach by using thousands or even millions of machines. There are issues of availability, fault tolerance, and load balancing that are truly challenging. Tools such as MapReduce45,q and Hadoopr were designed to address such problems. These tools provide a programming framework and infrastructure to streamline and automate the use of massively distributed computing clusters for data processing. They typically do not provide a relational framework but act as a more primitive key-value pair store, and are optimized more for tasks such as processing large Web server log files rather than ad hoc queries. It is important to understand that these tools are programmer-intensive and not turnkey solutions. In truth, there are likely no perioperative data sets extant today that require such advanced techniques.

CONCLUSIONS
Anesthesiology is on the threshold of a change in scale that is affecting all of medicine and health care. While at an individual case level we do not have Big Data, the demand for national-level metrics, personalized medicine (genomics), and population-scale outcomes research will be key drivers for the creation of large, collaborative anesthesia data sets. Unless they incorporate waveform data, these data sets may never become big enough to truly be called Big Data. The challenges, however, in standardization, quality control, and linking data across institutions will be considerable and will require a keen understanding of how to manipulate very large data sets. The reward will be new insights that will allow our specialty to remain relevant in the health care ecosystem and improve the care of our patients. E

DISCLOSURES
Name: Matthew A. Levin, MD.
Contribution: This author contributed to manuscript preparation.
Attestation: Matthew A. Levin approved the final manuscript.
Conflicts of Interest: This author declares no conflicts of interest.
Name: Jonathan P. Wanderer, MD, MPhil.
Contribution: This author contributed to manuscript preparation.
Attestation: Jonathan P. Wanderer approved the final manuscript.
Conflicts of Interest: Jonathan P. Wanderer is supported by the Foundation for Anesthesia Education and Research (FAER)'s Mentored Research Training Grant in Health Services Research (MRTG-HSR).
Name: Jesse M. Ehrenfeld, MD, MPH.
Contribution: This author contributed to manuscript preparation.
Attestation: Jesse M. Ehrenfeld approved the final manuscript.
Conflicts of Interest: This author declares no conflicts of interest.
This manuscript was handled by: Franklin Dexter, MD, PhD.

o http://en.wikipedia.org/wiki/Machine_learning. Accessed August 20, 2014.
p http://en.wikipedia.org/wiki/Data_mining. Accessed August 20, 2014.
q http://en.wikipedia.org/wiki/Map_reduce. Accessed November 26, 2014.
r http://hadoop.apache.org/. Accessed November 26, 2014.

REFERENCES
1. Ward JS, Barker A. Undefined by data: a survey of Big Data definitions. arXiv 2013;cs.DB
2. Helland P. If you have too much data, then "good enough" is good enough. Queue 2011;9


3. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer Science & Business Media, 2009
4. Sessler DI, Sigl JC, Manberg PJ, Kelley SD, Schubert A, Chamoun NG. Broadly applicable risk stratification system for predicting duration of hospitalization and mortality. Anesthesiology 2010;113:1026–37
5. Kheterpal S, Woodrum DT, Tremper KK. Too much of a good thing is wonderful: observational data for perioperative research. Anesthesiology 2009;111:1183–4
6. Postoperative Visual Loss Study Group. Risk factors associated with ischemic optic neuropathy after spinal fusion surgery. Anesthesiology 2012;116:15–24
7. Ramachandran SK, Nafiu OO, Ghaferi A, Tremper KK, Shanks A, Kheterpal S. Independent predictors and outcomes of unanticipated early postoperative tracheal intubation after nonemergent, noncardiac surgery. Anesthesiology 2011;115:44–53
8. Brueckmann B, Villa-Uribe JL, Bateman BT, Grosse-Sundrup M, Hess DR, Schlett CL, Eikermann M. Development and validation of a score for prediction of postoperative respiratory complications. Anesthesiology 2013;118:1276–85
9. Canet J, Gallart L. Predicting postoperative pulmonary complications in the general population. Curr Opin Anaesthesiol 2013;26:107–15
10. Mazo V, Sabaté S, Canet J, Gallart L, de Abreu MG, Belda J, Langeron O, Hoeft A, Pelosi P. Prospective external validation of a predictive score for postoperative pulmonary complications. Anesthesiology 2014;121:219–31
11. Myles PS, Leslie K, McNeil J, Forbes A, Chan MT. Bispectral index monitoring to prevent awareness during anaesthesia: the B-Aware randomised controlled trial. Lancet 2004;363:1757–63
12. Avidan MS, Jacobsohn E, Glick D, Burnside BA, Zhang L, Villafranca A, Karl L, Kamal S, Torres B, O'Connor M, Evers AS, Gradwohl S, Lin N, Palanca BJ, Mashour GA; BAG-RECALL Research Group. Prevention of intraoperative awareness in a high-risk surgical population. N Engl J Med 2011;365:591–600
13. Avidan MS, Zhang L, Burnside BA, Finkel KJ, Searleman AC, Selvidge JA, Saager L, Turner MS, Rao S, Bottros M, Hantler C, Jacobsohn E, Evers AS. Anesthesia awareness and the bispectral index. N Engl J Med 2008;358:1097–108
14. Liu D, Görges M, Jenkins SA. University of Queensland vital signs dataset. Anesth Analg 2012;114:584–9
15. Sessler DI. Big Data–and its contributions to peri-operative medicine. Anaesthesia 2014;69:100–5
16. Ramachandran SK, Kheterpal S. Outcomes research using quality improvement databases: evolving opportunities and challenges. Anesthesiol Clin 2011;29:71–81
17. Lane JS, Sandberg WS, Rothman B. Development and implementation of an integrated mobile situational awareness iPhone application VigiVU™ at an academic medical center. Int J Comput Assist Radiol Surg 2012;7:721–35
18. Pinsky MR. Functional haemodynamic monitoring. Curr Opin Crit Care 2014;20:288–93
19. Epstein RH, St Jacques P, Stockin M, Rothman B, Ehrenfeld JM, Denny JC. Automated identification of drug and food allergies entered using non-standard terminology. J Am Med Inform Assoc 2013;20:962–8
20. Levin MA, Krol M, Doshi AM, Reich DL. Extraction and mapping of drug names from free text to a standardized nomenclature. AMIA Annu Symp Proc 2007:438–42
21. National Information Standards Organization. Understanding Metadata. Bethesda, MD: NISO Press, 2004. Available at: http://www.niso.org/publications/press/UnderstandingMetadata.pdf
22. Vigoda MM, Lubarsky DA. Failure to recognize loss of incoming data in an anesthesia record-keeping system may have increased medical liability. Anesth Analg 2006;102:1798–802
23. Pulley JM, Denny JC, Peterson JF, Bernard GR, Vnencak-Jones CL, Ramirez AH, Delaney JT, Bowton E, Brothers K, Johnson K, Crawford DC, Schildcrout J, Masys DR, Dilks HH, Wilke RA, Clayton EW, Shultz E, Laposata M, McPherson J, Jirjis JN, Roden DM. Operational implementation of prospective genotyping for personalized medicine: the design of the Vanderbilt PREDICT project. Clin Pharmacol Ther 2012;92:87–95
24. Gottesman O, Scott SA, Ellis SB, Overby CL, Ludtke A, Hulot JS, Hall J, Chatani K, Myers K, Kannry JL, Bottinger EP. The CLIPMERGE PGx Program: clinical implementation of personalized medicine through electronic health records and genomics-pharmacogenomics. Clin Pharmacol Ther 2013;94:214–7
25. Kitzmiller JP, Groen DK, Phelps MA, Sadee W. Pharmacogenomic testing: relevance in medical practice: why drugs work in some patients but not in others. Cleve Clin J Med 2011;78:243–57
26. Choi EM, Lee MG, Lee SH, Choi KW, Choi SH. Association of ABCB1 polymorphisms with the efficacy of ondansetron for postoperative nausea and vomiting. Anaesthesia 2010;65:996–1000
27. Edwards RR. Genetic predictors of acute and chronic pain. Curr Rheumatol Rep 2006;8:411–7
28. Lötsch J, Geisslinger G. Current evidence for a genetic modulation of the response to analgesics. Pain 2006;121:1–5
29. Candiotti KA, Birnbach DJ, Lubarsky DA, Nhuch F, Kamat A, Koch WH, Nikoloff M, Wu L, Andrews D. The impact of pharmacogenomics on postoperative nausea and vomiting: do CYP2D6 allele copy number and polymorphisms affect the success or failure of ondansetron prophylaxis? Anesthesiology 2005;102:543–9
30. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. Actionable diagnosis of neuroleptospirosis by next-generation sequencing. N Engl J Med 2014;370:2408–17
31. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J Jr, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 2014;24:1180–92
32. Dexter F, Epstein RH, Penning DH. Statistical analysis of postanesthesia care unit staffing at a surgical suite with frequent delays in admission from the operating room–a case study. Anesth Analg 2001;92:947–9
33. Moore GE. Cramming more components onto integrated circuits. Electronics 1965;38:114–7
34. Bertsimas D. Statistics and machine learning via a modern optimization lens. INFORMS Annual Meeting. Catonsville, MD: The Institute for Operations Research and the Management Sciences (INFORMS), 2014. Available at: https://www.informs.org/content/.../2014+Morse+McCord+Lecture.pdf. Accessed March 25, 2015
35. White SM, Moppett IK, Griffiths R. Outcome by mode of anaesthesia for hip fracture surgery. An observational audit of 65 535 patients in a national dataset. Anaesthesia 2014;69:224–30
36. White SM, Moppett IK, Griffiths R. Big data and big numbers. Anaesthesia 2014;69:389–90
37. Fan J, Han F, Liu H. Challenges of Big Data analysis. arXiv 2013;stat.ML
38. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97
39. Bal M, Amasyali MF, Sever H, Kose G, Demirhan A. Performance evaluation of the machine learning algorithms used in inference mechanism of a medical decision support system. Sci World J 2014;2014:1–15
40. Yoo C, Ramirez L, Liuzzi J. Big data analysis using modern statistical and machine learning methods in medicine. Int Neurourol J 2014;18:50–7
41. Pinsky MR, Dubrawski A. Gleaning knowledge from data in the intensive care unit. Am J Respir Crit Care Med 2014;190:606–10
42. Shi L, Li X, Wan H. A predictive model of anesthesia depth based on SVM in the primary visual cortex. Open Biomed Eng J 2013;7:71–80
43. Nicolaou N, Houris S, Alexandrou P, Georgiou J. Entropy measures for discrimination of 'awake' vs 'anaesthetized' state in recovery from general anesthesia. Conf Proc IEEE Eng Med Biol Soc 2011;2011:2598–601
44. Tighe P, Laduzenski S, Edwards D, Ellis N, Boezaart AP, Aygtug H. Use of machine learning theory to predict the need for femoral nerve block following ACL repair. Pain Med 2011;12:1566–75
45. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM 2008;51:107–13

