Psycho-Informatics: Big Data Shaping Modern Psychometrics
Medical Hypotheses
journal homepage: www.elsevier.com/locate/mehy
Article info
Article history: Received 6 September 2013. Accepted 19 November 2013.

Abstract
For the first time in history, it is possible to study human behavior on a great scale and in fine detail simultaneously. Online services and ubiquitous computational devices, such as smartphones and modern cars, record our everyday activity. The resulting Big Data offers unprecedented opportunities for tracking and analyzing behavior. This paper hypothesizes the applicability and impact of Big Data technologies in the context of psychometrics both for research and clinical applications. It first outlines the state of the art, including the severe shortcomings with respect to quality and quantity of the resulting data. It then presents a technological vision, comprising (i) numerous data sources such as mobile devices and sensors, (ii) a central data store, and (iii) an analytical platform, employing techniques from data mining and machine learning. To further illustrate the dramatic benefits of the proposed methodologies, the paper then outlines two current projects, logging and analyzing smartphone usage. One such study attempts to thereby quantify severity of major depression dynamically; the other investigates (mobile) Internet Addiction. Finally, the paper addresses some of the ethical issues inherent to Big Data technologies. In summary, the proposed approach is about to induce the single biggest methodological shift since the beginning of psychology or psychiatry. The resulting range of applications will dramatically shape the daily routines of researchers and medical practitioners alike. Indeed, transferring techniques from computer science to psychiatry and psychology is about to establish Psycho-Informatics, an entire research direction of its own.
© 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.mehy.2013.11.030
0306-9877/© 2013 Elsevier Ltd. All rights reserved.
406 A. Markowetz et al. / Medical Hypotheses 82 (2014) 405–411
and employment status, a single interview quickly costs several hundred Euros. In addition, the necessary appointments impose too great a burden on the participant, especially when the content of the interviews relates only to negative aspects of life such as psychopathological disorders. Including travel, a single interview can consume the better part of a day, a burden that can only be imposed infrequently (in particular with participants pursuing a professional career). Self-reports in the form of diaries do not provide a viable solution either. This method, too, quickly meets a limit of how much time commitment can be expected from a participant. In sum, the constraints (i) reduce the temporal granularity at which data can be gathered, and (ii) pose tremendous problems for longitudinal studies with respect to the amount and completeness of data gathered over an extended time range.

Unfortunately, data collected by the traditional means is also strongly biased. Most notably, it is commonly faulty and distorted, due to poor recollection of the variable of interest. This holds especially for coarse intervals of reporting, and especially for questions regarding interaction with digital devices. Very few people could accurately report how often they have checked their email over the past 10 days (which would be an interesting variable for studying Internet use/addiction). Additionally, reports about variables from other research areas, such as subjective well-being, tend to simply reflect altered psychological states. In particular, it has been shown that people use their momentary affective state for judging how happy and satisfied they are with their lives in general. A depressed patient, for example, will usually see his/her well-being, social functioning, and living conditions as worse than they would appear to an independent observer, or to himself/herself after recovery [2]. Thus, self-reports are affected by the state of mind at the time of reporting, and the social desirability of the reported behavior. Together, these factors introduce significant noise to infrequently recorded data. Clinician-rated psychometric tests entail the risk of being similarly biased, since assessments of experts are not entirely objective. In this context, the term “objective” assessment is misleading and should be replaced by “external”, as this evaluation might reflect the subjective view of the assessor himself [2].

In short, data gathered by traditional means thus captures the situation of a study’s participant or patient rather poorly. It is too coarse to show temporal patterns, and generally lacks dynamics. Additionally, it commonly employs shallow scales, and thus quickly encounters floor effects. Most questionnaires regarding depression, for example, only permit answers on each item on a scale of 0–3 [3–5]. The effects of these coarse measurements are dramatic: novel psychotropic substances frequently become stuck during the development phase because (visible) positive effects cannot be quantified reliably [6,7]. Clearly, innovation in methodology has long been overdue.

As early as the seventies, researchers circulated the idea of actigraphy as a simple and non-invasive method for monitoring human rest and activity cycles. Inter alia, they measured sleep patterns [8,9] and circadian rhythms [10] via specific actimetry sensors, worn on the body of the patient. While this approach overcame some of the obstacles faced by questionnaires, it did not quite hit the mark. Early technology was rather simple, rendering sensors complex, expensive and socially awkward, thus requiring substantial compliance and discipline from the patient. In some areas of research such as neuropharmacology, actigraphy was administered only in very few cases [11]. In recent years, miniaturization of digital devices has given new rise to the methodology. Sensors have become smaller and less power-hungry, and can independently transmit their data. While finally practical, the central obstacle to actigraphy remains: the patient/participant must be coaxed into carrying a sensor for a substantial period of time. In sum, actigraphy has only been used sporadically in most areas of psychiatry and psychology. Due to miniaturization, it is about to enjoy a second lease of life, but will ultimately be made redundant by sensor-less methods of tracking.

In this paper, we propose observing behavior directly on digital devices and services, such as laptops, social networks, or even cars. Specifically, we focus on user interaction with smartphones. Carried on the person, around the clock, and used for a wide range of (informal) communication, these devices constitute a particularly rich and intimate source of information. The gathered data is of highest quality, gathered entirely in the background, and automatically forwarded to a central server. The method thus burdens neither patient/participant nor researcher. Most importantly, it avoids the dominant sources of bias commonly encountered by self-reports and questionnaires.

For several areas of research, the proposed methodologies constitute the only viable solution. Most notably, they constitute the only valid measure for usage and abuse of digital media. Kimberly Young [12] first saw a problem for the human condition when excessively using the Internet, an issue also put forward for the usage of mobile phones [13]. Whether the observed phenomena constitute a ‘new disorder’ is a matter of heated debate [14]. Although excessive use of the Internet is not a distinct disorder in the DSM-V, evidence from psychology, psychiatry and the neurosciences suggests that “Internet addiction” constitutes a substantial challenge [15,16]. While a high daily “dosage” does not qualify as an addiction, a rising number of hours spent with the phone over a certain time could indicate developing tolerance. In any case, such behavior must be recorded directly on the device. Ordinary patients/participants cannot be expected to accurately answer how often they unlock their phone each day (up to 200 times, according to our preliminary experimental findings). The particularly poor recollection in this context arises due to the “virtual” character of phone behavior. Alcohol consumption, for example, is significantly easier to quantify, if only by the number of empty bottles.

The proposed methodology is about to equally revolutionize the work of researchers with more classic research agendas, such as personality or behavior. Recently, Kosinski et al. impressively inferred personality traits from the behavior on the Internet platform Facebook [17]. Yet, such research endeavors only mark the beginning of tight collaboration between psychology/psychiatry and informatics. After all, Facebook usage ‘only’ represents a rather narrow glimpse of people’s lives. By comparison, how much can we learn about the human condition when monitoring mobile phones 24 hours a day, 7 days a week? The socially outgoing (extraverted) person could easily be detected by the number of incoming and outgoing calls, indicating a large active social network. The introverted person in contrast might display longer reading sessions, perhaps using an e-book application. The person being open to new experiences (another of the Big Five Factors of Personality describing human characteristics by McCrae and John [18]) might often install and test new apps. Numerous such dependent variables can be detected by observing humans through their mobile phone interactions. These measures will capture the human condition more precisely than ever. For the first time, psychiatrists and psychologists can observe human behavior on a large scale, in the finest temporal granularity. They can thus assess the course of treatment and disease in a temporal continuum, instead of relying on selective snapshots.

Equally, the proposed methodology is about to revolutionize clinical therapy, a role in which it will affect our everyday lives to an even higher degree. In this scenario, patients track a wide range of personal data, from phones, cars, and fridges. From this raw (and rather cryptic) data, large-scale analysis extracts meaningful indices, such as an “activity index”, or a “social interaction index”. The patient can then self-track his condition. He is reassured that it is not worsening. Or, if a worsening of his health
condition occurs, he could confidently ask for an ad-hoc appointment with his doctor. In addition, he can explore interdependencies between his health condition and his lifestyle, such as staying up late, or working out. Most importantly, he can provide (selected) data access to his coach, therapist or doctor.

For the clinician, this methodology enables an entire range of new options. For the first time, he does not have to rely on the (poor) self-report of his patient. Instead, he receives clear indicators of the patient’s mental state, and changes therein, in a fine-grained temporal resolution. He can thus observe the continuous changes of health parameters over time (to follow the course of a disease, or the progress of therapy). The clinician will also be able to investigate changes of his patient throughout the day, and fine-tune timing and dosage of medication, providing a highly individualized therapy. For example, he could thus match medication doses in a patient suffering from schizophrenia. The clinician can even prescribe a range of dosage, from which the patient can independently choose, according to his or her latest data. The therapist can be automatically alerted when symptom data indicates a critical situation. In this case, he can intervene via phone, video conference, or an ad-hoc appointment. At the same time, regular appointments can be spaced further apart.

Most importantly, the proposed methodology is significantly cheaper than personal interaction with a therapist. This mundane observation has vast implications, opening the application area towards wellness and prevention for large numbers of people. Currently, society focuses its limited therapeutic resources on sick patients. In the future, data-driven early warning systems will enable us to help people a long time before their condition becomes serious or chronic. With red flags raised early, some people might just need to attend a seminar on sustainable usage of digital media, or an extended vacation, or to have the HR department talk to their chaotic manager. Eventually, most corporations will deploy data-driven preventive mental health programs. The ethical perspective (as discussed below) only constitutes a fraction of the challenges these services face. The integration into the processes and structures of large corporations might turn out far more difficult. Yet, occupational doctors can serve as a blueprint for a data-driven occupational mental health service, leading to their widespread deployment much sooner than anticipated.

The remainder of this paper is structured as follows. Next, we outline the underlying technological vision, comprising various data sources, and means to store and analyze the data. Subsequently, we introduce two current studies and respective hypotheses. One study tracks depression, the other investigates the misuse of mobile phones. We then touch upon the ethical aspects of the proposed methodology, a topic we feel very strongly about. The article ends with an outlook on the anticipated changes in research and therapy. As we outline, the proposed methodology will shape, if not revolutionize, psychiatry and psychology. The envisioned shift will be massive, touch every aspect of both sciences, and eventually create its own field of research: Psycho-Informatics.

Underlying technological vision

This paper is based on a single central thesis. The user’s mental state, we claim, affects the way he interacts with a machine. A stressed user may thus generate more typographic errors than ordinarily; a depressed user may communicate less over his phone than previously. Conversely, so the claim continues, changes in his interaction with a machine reflect changes in his mental state. Modern computer science enables us to automatically gather the appropriate data, transfer, and analyze it, all at very little cost. The proposed methodology, so we hypothesize, might outperform traditional methods in both data quality and quantity.

The twenty-first century will likely be remembered as the age of Big Data. Recent advancements in hardware and software enable us to store and analyze massive amounts of data, at surprisingly little cost. Currently, such technology is most prominently employed by search engines, social networks, credit card issuers and insurance companies. These very different businesses gather massive amounts of data, e.g., to seek underlying patterns or assign scores to individual users. Based on these quantitative measures, they automate decision-making processes, such as which advertisement to show, whether to approve a transaction, or to grant a credit line. In the context of psychology and psychiatry, we propose using Big Data to produce psychometric parameters and to trace the course of a disorder.

Unfortunately, terms like “Big Data” or “data mining” are surrounded by a significant amount of buzz. Frequently fed by marketing departments from IT companies, the hype often obfuscates the actual potential and limits. Yet, as this section should illustrate, the potential of these technologies is indeed immense. To complicate the terminology further, there is a significant overlap between the areas and concepts. Terms such as “data mining” and “Big Data” are often used interchangeably, even by experts. We thus next clarify the core concepts, before outlining their applicability in the context of psychiatry and psychology.

Big Data applications commonly comprise a range of various complex components. Data is extracted, collected, cleaned and transformed, stored and managed, analyzed, indexed and searched, as well as visualized. Accordingly, these applications touch many areas of computer science, such as database systems, text retrieval, data mining, machine learning and data visualization. For the scope of this paper, we thus define Big Data as the union over the range of tools and disciplines involved in collecting, storing, and analyzing large amounts of data originating from observing the interaction between users and devices (phones).

Indeed, independent of this paper’s focus, the data in Big Data applications is frequently generated by logging user interaction. Examples range from recording queries from search engines to transactions from online shops. This data is commonly characterized by the three “v”s: its (i) velocity, (ii) volume, and (iii) variety. The first two indicate the speed at which data records arrive, and the large volume they amount to. The last refers to the wide range of different data types and sources commonly involved in even a single application. Consider Google as a prime example of a Big Data use case. The collected data originates from a wide variety of services, such as its search engine, email service, and smartphone offerings (Android). Consequently, this data comes in a wide range of formats. Furthermore, merely recording the queries issued to Google’s search engine generates data tuples at a frightening rate, requiring extensive server farms.

Big Data applications commonly focus on data analysis. Researchers either attempt to (i) detect hitherto unknown trends and patterns, or (ii) “learn” new properties about known entities. The former discipline is commonly referred to as data mining, the latter as machine learning. One common application of data mining is the so-called market basket analysis, generating observations such as “customers who buy bread and butter commonly also buy beer”. Machine learning in contrast attempts to detect hitherto unknown properties of objects. In classification for example, a common machine learning task, the software is shown a reasonably large training set of objects, for which the property of interest is known. By observing this labeled data, the software “learns” a computational model. Later, this model can be used to classify other data items, i.e. automatically assign their labels. A credit card company could for example train a classifier by showing it a large number of transactions which have been labeled as either correct or fraudulent. In this example, features of interest could include the country in which the card was issued, the one in which the
transaction takes place, the type of shop and the amount involved. The resulting model could later be used to (dis-)approve transactions on a large scale.

Generating the computational models for data analysis is inherently labor intensive. In a trial-and-error fashion, data scientists repeatedly test the data for a certain hypothesis, fail, reconsider their assumptions and start over. In particular for machine learning, they have to (i) label a large training set, (ii) identify the most expressive data features to expose to the learning software, and (iii) extract these features from the raw data. Note that only the creation of the initial computational model is labor intensive. Later, this model can be used on a large scale, at little additional cost, thus benefiting from an economy of scale.

In contrast to what the name might suggest, Big Data is not so much characterized by the size of the accumulated data, but by the underlying motivation. Massive amounts of data are commonly collected without an immediate business case, but simply because it is affordable. This data, so it is hoped, will later answer questions, most of which have yet to arise. In addition to not knowing the later application scenario at the time of data collection, scientists also commonly cannot anticipate whether or not the data will display patterns of interest or support a certain hypothesis. In a Big Data context, data analytics thus becomes inherently post-hoc.

The proposed architecture resembles those employed in many other Big Data scenarios. As depicted in Fig. 1, it gathers data from various sources, preferably without requiring user interaction. It forwards this data to a central data storage, where it is analyzed and mined for patterns, trends, and outliers. The resulting knowledge is then used in various application scenarios, such as academic research, therapy, or entirely new products.

Any piece of machinery allowing user interaction constitutes a potential data source. The user’s desktop computer for example allows documenting the number of typographic errors, a potential symptom of stress and tiredness. Yet, modern IT has penetrated our everyday lives to an even higher degree. From coffee machines to cars and fridges, all gadgets in this world of ubiquitous computing (will) have some computational power, as well as the ability to communicate over the Internet. They can unobtrusively track the number of cups we require to get started in the morning, the frequency of out-of-schedule snacks, as well as our propensity to tailgate on the highway. Changes in any such parameter, so we claim, might indicate a shift in e.g., stress or aggression.

The introduction of smart watches and eyeglasses has further enhanced the ability to track human behavior. Additionally, recent advances in miniaturization have made wearable sensors small and affordable enough for everyday use. Only a few years ago, wrist-mounted accelerometers were the size of a bar of soap, and cost more than a desktop PC. Nowadays, the same technology fits into the size of a USB stick, and sells for as little as US$50. While it is thus possible to track the user’s movements in the real world, we must also track online behavior. The average user spends a significant part of his life interacting with Web applications, such as social networks, shopping sites or online games. Most of these services offer a programming interface which straightforwardly allows recording with whom we communicate, for how long, what we buy, at which time, and how much we enjoy bashing orcs with magic swords in computer games. Any such behavior might correlate with traits of personality, while reported changes therein might aid a clinical practitioner in treating a psychiatric disorder.

Smartphones currently represent the foremost source of behavioral data, featuring a constant broadband connection and exceeding the computational power of an early Pentium PC. They are intimate devices, carried on the body throughout the day. We commonly interact with our phone within the last half hour before falling asleep, and no later than 30 min after waking up. (We will report the first exact data on this topic in the near future.) Phones provide our most central platform for communication, professionally as well as with friends and spouses. The operating system allows tracking any phone call or SMS, as well as any interaction with the installed software, the so-called apps. In addition, smartphones feature an entire set of sensors, such as for GPS signals and acceleration. It is thus possible to track the user’s macro movement throughout the day, as well as, e.g., the movement of her phone.

On the central server, the data is subject to various levels of analysis. These can range from simple counting (the number of times someone flicks on his phone), to a complex form of data mining and machine learning, especially for temporal patterns. Computer scientists have generated an entire range of methods for data analysis that have yet to be broadly applied to behavioral data. While far from trivial, one can essentially detect all patterns that are also apparent to a (trained) human eye. Riding on a train next to a passenger who keeps checking his phone, one might deduce that he is nervous, or bored. The human observer might conclude that the passenger is nervous rather than bored if the constant phone checking coincides with trembling hands. The above software architecture might draw similar conclusions, albeit on the scale of thousands of phone users. Frequently, modern machine learning even outperforms the observational powers of humans. Duhigg reports the prominent case of a concerned father complaining to an American retailer about its marketing of pregnancy-related products to his teenage daughter [19]. The company’s marketing department had indeed analyzed users’ shopping behavior, trying to identify women in the second trimester of their pregnancy, who would
soon make baby-related purchases. As it turned out, the girl was indeed pregnant. The retailer’s data-analytics team had outperformed the observational skills of her own father, with whom she was living.

For the academic community, the above methodology yields an entire range of benefits. First, it generates behavioral data of unprecedented quantity. A single cellphone may well produce a thousand data points per day. Combined with the methods of data mining and machine learning, the approach thus enters an uncharted area of behavioral patterns and trends. Second, it is entirely invisible to the user, and does not require any explicit interaction. It can thus be employed to track user behavior for months and years, without overburdening the participant. Third, the approach avoids the dominant sources of bias in studying human behavior. In particular, it does not remind the user that he is participating in a clinical trial. Most notably, collected data measures the objective behavior (i.e., how he actually interacted with the phone), in contrast to subjective self-reports, which commonly suffer from a distorted or faulty recollection. Finally, data is transferred and analyzed automatically, rendering the proposed approach significantly cheaper than traditional paper-based methods. In summary, the approach thus increases data quantity and quality, while simultaneously unburdening participants as well as researchers.

Any conclusion automatically deduced from observations of human behavior can naturally be off. Typographic errors, for example, may be due to stress just as well as to a four-year-old yanking our sleeve. Thus, we do not propose to automatically generate diagnoses, but to quantitatively assist the medical practitioner, akin to a complete blood count. Following a multi-modal approach, simultaneously taking several different sensors into account, can further minimize error rates. For example, the combination of an increasingly monotonous voice and progressing social withdrawal might indicate the worsening of a case of depression. Commonly, it will not be possible to determine absolute thresholds. Instead, one will rather investigate intra-personal change over time. For example, it will not be possible to deduce stress from the rate of typos exceeding x per 100 lines of written text. In contrast, a 50% increase in typos over 6 months could imply a substantial change in the patient’s condition. Research will thus focus on detecting changes over time, the first derivative of the observed signal. Likewise, it will frequently not be possible to detect trends applicable to an entire population of users. Yet, it will be feasible to detect typologies, i.e. classes of people exhibiting similar behavior. Data mining, in the form of clustering and classification, can help find these typologies as well as the characteristic patterns in behavior.

Current research hypotheses in psychiatry and psychology

In two current studies, we monitor smartphones to track (i) the severity and course of depression as well as (ii) conspicuous usage of the Internet and phone. While these studies are decidedly small-scale, at least compared to the above technological vision, they are primarily intended to evaluate the validity as well as practicability of the proposed methodology.

Monitoring depression

In a first study, we currently employ smartphones to monitor depression. In particular we observe (i) app usage, (ii) social interaction, and (iii) macro movement of patients. In the context of depression, our central thesis is that these (and eventually most other) signals show less energy and dynamics, reflecting anhedonia and social withdrawal as central symptoms. To collect the necessary usage data, we have developed an app running on Android phones (version 4.0 and up), comprising a software module for each of the three parameters, and one for encrypted data transfer to the server.

For app usage, we introduce the concept of an app-session, the time interval during which the user interacts with any one particular app. This implies that the app runs in the foreground, and the patient actively interacts with the phone. A session ends when the patient (i) switches to another app, (ii) closes the app, or (iii) stops interacting with the phone. We thus record data tuples of the format: [app name, start-time, end-time]. In this context, we treat both the lock-screen and the home menu as just two additional apps. We thus also record how often the user flicks the phone on (without actually unlocking it), and how much time he spends in the main menu.

Regarding communication patterns, we log incoming and outgoing SMS as well as phone calls. These measures allow us to infer the size and usage of the user’s social network. In this context, we only document with how many contacts the user regularly interacts, and who initiates the communication. The actual content of the calls (or SMS) is of no concern. Also, we are not interested in the actual identity of the communication partner. Hence, we anonymize their phone numbers, using cryptographic hash functions (SHA-512). Akin to app-sessions, we record time-stamped data tuples, namely for calls: [anonymized number, start-time, end-time, in/out], for outgoing SMS: [anon. number, length in characters, time-sent], and for incoming SMS: [anon. number, length in characters, time-received, and time-read]. Additionally, we monitor the phone’s address book, to estimate the total number of contacts, their changes over time, and the fraction with which the user actively communicates.

For macro movement, we regularly record the user’s geographic position. In the context of severe depression, we are commonly concerned whether a user even just leaves the apartment. Every 20 min, we thus estimate his location using GPS, recording the data tuple [time, latitude, longitude, accuracy]. Location estimation is additionally supported through available Wi-Fi signals and triangulation of cell-phone towers. Such functionality is readily provided by the phone’s operating system. The resulting accuracy can be expected to range between 10 and 100 m, enough for macro movement, but not sufficient to track movement inside a building.

We are currently embarking on a study, observing patients suffering from major depressive disorder (MDD) over a period of four months. These patients are treated with either electroconvulsive therapy (ECT), magnetic seizure therapy (MST) or deep brain stimulation (DBS) in the Department of Psychiatry and Psychotherapy at the University Hospital of Bonn. In order to establish a baseline, we install the app 1 month prior to the treatment, and continue measuring over the following 3 months of treatment. For the assessment of the current depressive symptomatology, we apply the Montgomery-Åsberg Depression Rating Scale (MADRS) as well as the Beck Depression Inventory (BDI-II) on a biweekly basis, as clinician- and self-rated questionnaires respectively. Due to drastic improvements commonly caused by the employed treatment methods, we expect to observe a high signal-to-noise ratio. Furthermore, we hypothesize that there is a significant correlation between the data concerning app usage, social interaction, macro movement and the applied questionnaires. We moreover hypothesize that this new method for monitoring the severity and course of depression is more sensitive to change, i.e. improvement of the depressive symptomatology (increase in activity patterns, resumption of social contacts) can be detected earlier.

Internet addiction & online social networks

Our second current study addresses a topic inherently linked to smartphone usage: Internet Addiction. As it does not constitute a distinct disorder in the DSM-V, but can only be found in the
410 A. Markowetz et al. / Medical Hypotheses 82 (2014) 405–411
Appendix, further research efforts are required to better characterize this emerging threat to our mental health. Until now, the diagnosis of Internet Addiction relies heavily on questionnaires and to a lesser extent on structured interviews. Given the mentioned shortcomings of both methods, the detection of behavioral addictions will profit enormously from collaboration with computer science. Indeed, illustrating the validity and practicability of Psycho-Informatics represents one of the most important aims of our research endeavor.

In a pilot study, we currently monitor the mobile phone behavior of N = 100 healthy participants for a duration of 6 weeks. This data thus provides the first reliable longitudinal statistics on diverse facets of smartphone usage and (mobile) Internet Addiction. We are particularly interested in fluctuations in mobile phone behavior across the 6 weeks, but also search for stable (non-changing) facets. Questions to be answered are: Exactly how often is a phone used each day? What are the most commonly used applications? When does a person go online for the first time in the morning? How often do they check their phones, e-mails or news? None of these questions can be answered accurately by just asking participants. Instead, behavior needs to be recorded on the device. Indeed, a central point of the current study is the vast discrepancy between questionnaire-based self-reports and actual phone usage. Additionally, we try to better characterize social activity. In this context, we explicitly do not limit ourselves to online social networks (such as Facebook), but study the size of the actual active social network reflected by phone calls and SMS. Since we also record brain scans (structural MRI and resting state fMRI) and genetic material from participants, we are then able to correlate biological markers with social activity. Ideally, we would find a correlation between the active social network and certain areas of the brain. A recent study addressed a similar question, correlating amygdala volumes with the size of the social network [20]. This work however had to rely on self-reports, whereas we are able to measure the actual social activity, and in much finer detail. Ultimately, correlating Big Data with neuroscientific measures should carry far enough to eventually establish its own research direction of Psychoneuroinformatics.

Ethical aspects and data privacy issues of ‘Big Data’ research

The use of Big Data in research and therapy necessarily raises ethical concerns. Bordering on mass surveillance, it realizes the vision of a “Gläserner Mensch”, a transparent human. Data privacy thus takes on a central role, and the potential for abuse cannot be overestimated. While monitoring depression in a medical scenario fulfills the highest ethical standards, it could equally well be misused by an employer to secretly monitor staff, or by an insurance company to reject at-risk applicants. This research is however not aimed at Digital Taylorism, a strategy that would surely backfire, but at preventing, detecting and curing psychological disorders.

Both medicine and psychology have worked on the vision of a transparent human since their very beginning. And for the same time-frame, both have had to handle sensitive data. There thus exists a proud tradition of confidentiality, whose methods can serve as blueprints for the deployment of Big Data technologies. Scaling and extending these concepts to an entirely new dimension is no mean feat, and will generate a significant amount of work for researchers, practitioners and occupational bodies.

From a different perspective, privacy concerns constitute somewhat of a side effect. Their denial would be entirely unethical. But, as this paper should outline, Big Data holds the potential to facilitate treatment of mental diseases. Hence, it would be equally unethical to bluntly deny its usage due to privacy concerns. Rather, the medical sciences have to follow another of their proud traditions: balancing risks and benefits on a case-by-case basis.

On a practical level, we currently follow a simple two-level privacy model. We only collect usage data (i.e., behavior) that commonly needs to be aggregated to make any significant statement. In contrast, we refrain from collecting one-to-one written conversation, audio recordings, or video captures. Where applicable, text, audio and video are analyzed locally on the phone, and only the resulting markers and numeric values are sent to the server. In a current prototype, for example, we estimate the size of the vocabulary from messenger applications on the phone, and only report the resulting figures to the server.

On a different note, this paper does not propose an automatic diagnosis of psychiatric disorders. Instead, it suggests assisting the medical practitioner with additional information, in order to monitor the course of a disease and treatment. In certain cases, it may point a clinician in a certain direction while making an initial diagnosis. The ultimate responsibility, for false positives as well as negatives, remains with the clinician, as it has for two thousand years. Whether one may be able to fully automatically diagnose mental disorders at some point in the future, we dare not hypothesize. Neither case would render the clinician obsolete, as communicating the findings to a patient, and even whether to do so at all, is a complex matter, not to be left to machinery. The proposed methodology could potentially serve as an early warning sign to a medical practitioner, indicating that a patient might suffer from a condition. In this spirit, the paper compares the proposed methodology to a complete blood count. Neither technique makes a diagnosis, but assists a diagnosis made by a clinical practitioner. The ultimate responsibility, for false positives as well as negatives, thus remains with the clinician, as it has for two thousand years. By providing additional quantitative information, we would hope to reduce the inherent acatalepsy.

Conclusions and vision for the future

This paper introduces Psycho-Informatics, the application of Big Data to psychology and psychiatry. Highly sensitive, the suggested method collects, stores, and analyzes massive amounts of indicative data at little cost and without risks or stress for patients or participants. The paper outlines the technical vision, sketches the signals that can be detected, and illustrates the tremendous benefits over traditional methods of psychometrics. In particular, it suggests tracking user behavior with smartphones, a particularly rich and intimate source of data. This approach underlies two current studies, in the context of (i) depression and (ii) (excessive) usage of smartphones. The proposed methodology outperforms traditional methods in both quality and quantity. Namely, it avoids biased self-reports, and avoids altering the user’s behavior, as the data is collected completely in the background. Furthermore, it collects data at a much finer granularity than conventional questionnaires and enables the search for temporal activity patterns. Additionally, there is no need to collect the data manually, as it is directly available in electronic form. While there are strong ethical concerns, these must not be allowed to evolve into thought-terminating clichés. Instead, they are to be addressed on a detailed level, case-by-case, following a rich tradition in medicine as well as psychology. These concerns being addressed appropriately, Big Data is about to revolutionize both psycho-sciences in research as well as therapy.

In the near future, researchers will embark on numerous projects incorporating simple aspects of human-machine interaction and wearable sensors. In the medium term, however, focus will shift towards data analytics. Once the low-hanging fruit has been picked (e.g., ‘simple’ descriptive statistical data on what is done how often on a smartphone), scientists will need to dig deeper into the data. Simple aggregate functions (e.g., count, or sum)
will no longer suffice, but will be replaced by mining for complex temporal patterns. Eventually, the entire range of methodologies from data mining and machine learning will have to be adapted to behavioral data. The effects on psychology and psychiatry will equal those of the massive change that the life sciences have undergone, and even fundamental research methodologies will have to be revisited. We will frequently hear of Psycho-Informatics and its sub-areas, such as Psycho-Neuro-Informatics. Most importantly, it will be possible, and not uncommon, to make an academic career solely by studying data. Frequently, we will no longer design studies, but subject existing data to deeper analysis. Some of these datasets may have been conceived as a by-product of entirely different (non-academic) applications. In this context, scientists will have to learn to yield control. Research will shift from carefully constructed experiments on small parts of the population in a controlled environment, to massive longitudinal recorded data on tremendously large populations, full of errors and noise. Yet, so we hypothesize, signals will stand out from the noise more clearly than ever, due to the sheer amount of data.

To the same extent, Big Data will affect the daily routines of patients and clinical practitioners alike. The former will collect seemingly unrelated data to share with coaches and therapists. Provided with the necessary toolkits and expertise, the latter will be able to observe the course (and origins) of a disorder as well as the progress of treatment. The resulting picture will be more accurate than previous self-reports, and of such fine granularity as to allow a highly individualized medication. Compared to traditional methods, this data-driven therapy will be cheaper, and consume less time from both therapists and patients. Such technology requires tremendous research efforts. Already the establishment of meaningful metrics (as opposed to raw data tuples) as well as visual data exploration tools will be a laborious and ongoing effort. However, in the medium to long term, this data-driven therapy will become cheaper than traditional methods, consuming less time from both therapists and patients. Most importantly, the approach extends the benefits of psychiatry and psychology far beyond treatment, into systems for early warning and mental wellness.

Conflicts of interest statement

None of the authors reports a conflict related to the work described. The software mentioned is currently developed for research purposes only; no commercial exploitation of it is planned at this stage.

Acknowledgement

This work was funded in part by a grant awarded to C.M. by the DFG (MO-2363/2-1) and an independent investigator grant for the assessment of effects of deep brain stimulation for treatment-resistant depression by Medtronic Inc. to T.S.

References

[1] Costa e Silva JA. Personalized medicine in psychiatry: new technologies and approaches. Metabolism 2013;62(Suppl. 1):S40–4.
[2] Katschnig H. Quality of life in mental disorders: challenges for research and clinical practice. World Psychiatry 2006;5(3):139–45.
[3] Hamilton M. Rating scale for depression. J Neurol Neurosurg Psychiatry 1960;23:56–61.
[4] Hamilton M. HAMA Hamilton Anxiety Scale. In: Guy W, editor. ECDEU assessment manual for psychopharmacology, 193–198. Rockville, Maryland: National Institute of Mental Health; 1976.
[5] Montgomery S, Åsberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry 1979;134:382–9.
[6] Schlaepfer TE et al. The hidden third: improving outcome in treatment-resistant depression. J Psychopharmacol 2012;26(5):587–602.
[7] Della Pasqua O, Santen GW, Danhof M. The missing link between clinical endpoints and drug targets in depression. Trends Pharmacol Sci 2010;31(4):144–52.
[8] Sadeh A. The role and validity of actigraphy in sleep medicine: an update. Sleep Med Rev 2011;15(4):259–67.
[9] Sadeh A, Acebo C. The role of actigraphy in sleep medicine. Sleep Med Rev 2002;6(2):113–24.
[10] Ancoli-Israel S et al. The role of actigraphy in the study of sleep and circadian rhythms. Sleep 2003;26(3):342–92.
[11] Stanley N. Actigraphy in human psychopharmacology: a review. Hum Psychopharmacol 2003;18(1):39–49.
[12] Young K. Internet addiction: the emergence of a new clinical disorder. Cyberpsychol Behav 2009;1(3):237–44.
[13] Bianchi A, Phillips J. Psychological predictors of problem mobile phone use. Cyberpsychol Behav 2005;8(1):39–51.
[14] Shaw M, Black DW. Internet addiction: definition, assessment, epidemiology and clinical management. CNS Drugs 2008;22(5):353–65.
[15] Montag C, Jurkiewicz M, Reuter M. Low self-directedness is a better predictor for problematic internet use than high neuroticism. Comput Hum Behav 2010;26(6):1531–5.
[16] Ko CH et al. The association between Internet addiction and psychiatric disorder: a review of the literature. Eur Psychiatry 2012;27(1):1–8.
[17] Kosinski M, Stillwell D, Graepel T. Private traits and attributes are predictable from digital records of human behavior. Proc Natl Acad Sci USA 2013;110(15):5802–5.
[18] McCrae RR, John OP. An introduction to the five-factor model and its applications. J Pers 1992;60(2):175–215.
[19] Duhigg C. How companies learn your secrets. The New York Times; 2012.
[20] Bickart KC et al. Amygdala volume and social network size in humans. Nat Neurosci 2011;14(2):163–4.
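
As an illustration of the 'simple' descriptive statistics discussed for the pilot study (how often a phone is used each day, and when it is first used in the morning), the following minimal Python sketch aggregates timestamped unlock events per day. It is not the software used in the studies: the function name `daily_usage_summary`, the ISO 8601 log format, and the sample timestamps are assumptions made purely for this example.

```python
from collections import defaultdict
from datetime import datetime


def daily_usage_summary(unlock_times):
    """Return {date: (number_of_unlocks, time_of_first_unlock)}.

    `unlock_times` is an iterable of ISO 8601 timestamp strings, one per
    screen-unlock event (a hypothetical log format, not the study's own).
    """
    per_day = defaultdict(list)
    for stamp in unlock_times:
        moment = datetime.fromisoformat(stamp)
        per_day[moment.date()].append(moment.time())
    # Only the aggregates (count, earliest time) are kept per day.
    return {day: (len(times), min(times)) for day, times in sorted(per_day.items())}


# Invented sample events, for illustration only.
events = [
    "2013-09-02T07:14:00", "2013-09-02T07:52:30", "2013-09-02T12:05:10",
    "2013-09-03T06:58:45", "2013-09-03T21:30:00",
]
summary = daily_usage_summary(events)
for day, (count, first) in summary.items():
    print(day, count, first)
```

Because only per-day counts and first-use times are returned, such a summary could in principle be computed on the device and reported without the raw event log, in the spirit of the two-level privacy model described in the paper.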