Anderson 1991
Published by:
http://www.sagepublications.com
Research Article
REFLECTIONS OF THE
ENVIRONMENT IN MEMORY
John R. Anderson and Lael J. Schooler
Department of Psychology, Carnegie Mellon University
Abstract: Availability of human memories for specific items shows reliable relationships to frequency, recency, and pattern of prior exposures to the item. These relationships have defied a systematic theoretical treatment. A number of environmental sources (New York Times, parental speech, electronic mail) are examined to show that the probability that a memory will be needed also shows reliable relationships to frequency, recency, and pattern of prior exposures. Moreover, the environmental relationships are the same as the memory relationships. It is argued that human memory has the form it does because it is adapted to these environmental relationships. Models for both the environment and human memory are described. Among the memory phenomena addressed are the practice function, the retention function, the effect of spacing of practice, and the relationship between degree of practice and retention.

The title of our paper is inspired by the following remark in Shepard (1990): "We may look into that window on the mind as through a glass darkly, but what we are beginning to discern there looks very much like a reflection of the world" (p. 213). He was commenting on how the principles of perception are exquisitely tuned to the features of the environment in which we live. Basically, Shepard's thesis is that perception has been optimized through evolution to make the best possible inferences about the world given the perceptual input. Recently, Anderson (1989, 1990) has suggested that the same might be true about human memory.

Many people hold the bias that human memory is anything but optimal. They point to the many frustrating failures of memory. However, these criticisms fail to appreciate the task before human memory, which is to try to manage a huge stockpile of memories. In any system responsible for managing a vast data base there must be failures of retrieval. It is just too expensive to maintain access to an unbounded number of items.

Given the initial bias against human memory, it would be particularly compelling if we could show that human memory were optimal. How does a system behave optimally when it is faced with a huge data base of items and cannot make all of them instantaneously available? It would be behaving optimally if it made most available those items that were most likely to be needed.

In this paper we explore the issue of whether human memory is behaving optimally with respect to the pattern of past information presentation. Each item in memory has had some history of past use. For instance, our memory for one person's name may not have been used in the past month but might have been used five times in the month previous to that. What is the probability that the memory will be needed (used) during the current day? Memory would be behaving optimally if it made this memory less available than memories that were more likely to be used but made it more available than less likely memories.

In this paper we examine a number of environmental sources to determine how probability of a memory being needed varies with pattern of past use. However, we first review how availability in human memory varies with pattern of past use. Some aspects of this problem have been extensively studied in empirical studies of human memory.

FORM OF THE MEMORY FUNCTIONS

Two of the most basic statistics we might gather about pattern of past use are how often a memory has been practiced and how long it has been since it was last practiced. Learning functions and retention functions to describe these two aspects of human memory have been collected since the original experiments of Ebbinghaus (1885/1964). Figure 1 shows the retention function and practice function obtained by Ebbinghaus.

The Retention Function

Ebbinghaus measured retention in terms of the percent savings in relearning a list of nonsense syllables. The function shows the classic negative acceleration typical of such retention functions. In order to be able to compare this memory function to the environment, we need to decide how to characterize the forgetting function. Some (e.g., Loftus, 1985) have suggested that these functions satisfy an exponential formula:

$P = A e^{-bT}$  (1)

where P is the performance measure, T is the delay time, and A and b are parameters of the model. The intuitive appeal of an exponential function probably explains why it is so often suggested. It implies that during each unit of time, the memory loses a constant fraction of what is left. This process evokes images of radioactive decay, an analogy often used to describe forgetting. One can investigate whether this function holds by performing a log transformation of the performance scale. If the underlying relationship is exponential, a linear relationship should obtain between log performance and time:

$\log P = \log A - bT$  (2)

A precondition to performing an adequate test of such a function is that we have a large manipulation of the time scale.
396 Copyright © 1991 American Psychological Society VOL. 2, NO.6, NOVEMBER 1991
PSYCHOLOGICAL SCIENCE
[Figure 1. Panel (a): Ebbinghaus's Retention Data. Panel (b): Ebbinghaus's Practice Data.]
Fig. 1. (a) Ebbinghaus's (1885/1964) retention function showing percent savings as a function of delay.
Ebbinghaus used delays from 20 minutes to 31 days. (b) Ebbinghaus's practice data showing total number of
trials to master a set of lists as a function of number of days of practice.
Ebbinghaus's data certainly satisfy this precondition, as he varied retention intervals from 20 minutes to 31 days.

Figure 2a illustrates the Ebbinghaus data with the performance scale transformed. As may be observed, the resulting function is anything but linear. Thus, despite its popularity, the hypothesis of an exponential forgetting function is not supported. Wickelgren (1976), using a d' memory measure and delays from 2 minutes to 14 days, found evidence for a power function relating delay to retention.[1] A power function has the form:

$P = A T^{-b}$  (3)

1. Actually, Wickelgren's theory also had an exponential component that would dominate the power component at very long delays.
[Figure 2. Panel (a): x-axis "Hours of Delay". Panel (b): x-axis "Log Hours of Delay".]
Fig. 2. The retention data from Figure 1 with (a) the performance measure transformed according to a
logarithmic function and (b) both performance and delay scales transformed according to a logarithmic
function.
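
The two transformations in Figure 2 amount to two straight-line tests: log P against T for the exponential form, and log P against log T for the power form. A minimal sketch of that comparison follows. It does not use Ebbinghaus's raw data; the savings values are synthetic, generated from the power function reported in the text (P = 47.56 T^-.126), and the grid of delays is an assumption chosen to span 20 minutes to 31 days. The helper name `linear_fit` is hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Assumed grid of delays in hours, spanning 20 minutes to 31 days.
delays_hours = [1 / 3, 1, 9, 24, 48, 144, 744]

# Synthetic savings generated from P = 47.56 * T**-0.126 (not raw data).
savings = [47.56 * t ** -0.126 for t in delays_hours]

log_p = [math.log(p) for p in savings]
log_t = [math.log(t) for t in delays_hours]

# Exponential hypothesis: log P should be linear in T.
exp_fit = linear_fit(delays_hours, log_p)

# Power hypothesis: log P should be linear in log T.
pow_fit = linear_fit(log_t, log_p)

print("exponential test (log P vs T):     R^2 =", round(exp_fit[2], 3))
print("power test       (log P vs log T): R^2 =", round(pow_fit[2], 3))
print("recovered A =", round(math.exp(pow_fit[0]), 2), " b =", round(-pow_fit[1], 3))
```

Because the synthetic data are an exact power function, the log-log fit is perfect by construction; the point of the sketch is only that the same data look clearly nonlinear under the semi-log transformation when the range of delays is wide.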
This can produce a very slowly decaying memory function. If one performs log transformations of both the performance measure and the time measure, one obtains a linear relationship:

$\log P = \log A - b \log T$  (4)

Figure 2b illustrates the Ebbinghaus data with both scales log transformed. As can be seen, one gets a very good approximation to a linear relationship in these log scales with log A = 3.862 and b = .126. If we go back to the original scales, we get a relationship of the form:

$P = 47.56\, T^{-.126}$  (5)

The exponent .126 can be taken as the forgetting rate.

A power function implies that the performance measure will go to infinity as time goes to zero. In contrast, an exponential function implies a bound on how good performance can be at T = 0. Although we never realize a true delay of zero, we still can fail to find power functions if we use scales with an upper bound. Probability of recall is such a scale. Ebbinghaus's percent savings is another scale, but even at the 20-minute delay in Ebbinghaus's experiment there was only 58% savings, so the ceiling was not approached. Power functions for forgetting tend to be obtained when we use measures that do not have upper bounds or do not approach their upper bounds. The d' measure of Wickelgren is a scale that does not have an artificial upper bound. Later we will also advocate recall odds rather than recall probability, since odds vary from zero to infinity. Recall time is another measure that ranges from zero to infinity and tends to yield power functions for retention, although in this case we have to switch the sign of the exponent since recall time increases with delay.

One of our goals is to explain why retention functions tend to satisfy a power relationship. Given that people have preferred an exponential function on an intuitive basis, such an explanation would be a nontrivial result. Power functions seem to describe memory performance from a few seconds to years. As Wickelgren (1974) has argued, there does not seem to be any discontinuity that would be associated with a shift from short-term memory to long-term memory. It will be a significant result if we can find a reason for predicting a power function (in contrast to an exponential function) from an analysis of the environment.

The Practice Function

We can ask the same thing about the practice functions: are they better fit by an exponential form or a power form? The measure used in Ebbinghaus's Figure 1b is appropriate for addressing this question. Plotted there are the number of trials to learn a list of 36 nonsense syllables to a criterion of one correct anticipation. Ebbinghaus practiced these lists each successive day and so we see the improvement across days with practice.

Figure 3 compares how well exponential and power functions fit these data. The range of practice (1 to 6 days) is not large enough to enable a clear discrimination among the functions, although the power function produces a somewhat better fit. This practice function has been explored over much larger ranges of practice and a power function typically provides a better fit (Newell & Rosenbloom, 1981), although there has
[Figure 3. Panel (a): Ebbinghaus's Practice Data with Log Transformation of the Performance Scale; fitted line log Trials = 4.24 - 0.50 Days, R^2 = 0.949; x-axis "Days of Practice". Panel (b): Ebbinghaus's Practice Data with Log Transformations of Both Scales; fitted line log Trials = 4.08 - 1.44 log Days, R^2 = 0.996; x-axis "Log Days of Practice".]
Fig. 3. The practice data from Figure 1b with (a) the performance measure transformed according to a
logarithmic function and (b) both performance measures transformed according to a logarithmic function.
again been a history of initial preference for the exponential function (Mazur & Hastie, 1975; Restle & Greeno, 1970). Another goal we have is to provide an environmental explanation for why there is this ubiquitous practice function. Again this result is not trivial given the initial beliefs that the learning function should be exponential in form.

The power function that corresponds to the data in Figure 3 is:

$P = 513\, S^{-1.24}$  (6)

where S is the number of days of study. The size of this exponent can be interpreted as the learning rate.

Implications of Power Functions

Note that in Figures 1-3 we are measuring retention by a savings measure, where larger numbers are better, while we are measuring practice by a trials-to-relearn measure, where large numbers are worse. Throughout the literature one can find a variety of performance scales, some of which have a positive valence like savings and others of which have a negative valence like trials to relearn. Later we have more to say about percent correct, the most common positive valence scale, and reaction time, the most common negative valence scale. Generally, power functions are found whatever scale is used (provided it is not a scale with an upper bound, or if it is, the upper bound is not approached). Forgetting functions display a negative slope on positive valence scales and a positive slope on negative valence scales. This relation is reversed for practice functions. It might seem curious that power functions appear for different performance scales, but the power relationship is a strong one and will be approximately maintained by many transformations of scale. As a final comment, we should say we have no investment in the claim that these empirical functions are best modeled or correctly modeled as power functions. For our purposes, it is enough to note that power functions give remarkably good approximations. Our goal is to show that these remarkably good approximations are implied by the structure of the environmental input to memory.

A number of recent theories are capable of accounting for power-law learning (Anderson, 1982; Lewis, 1978; Logan, 1988;

[Figure 4. Results from Glenberg (1976); separate curves are labeled by retention interval: 2 events, 8 events, and 32 events.]

Thus, it is fair to say that there is no theory of human memory that adequately predicts both the practice and forgetting functions. This is a pretty startling result since it has been a field of constant research and theorizing for over 100 years.

The Spacing Effect

One other effect that we would like to note creates even greater stress on theories of memory: the spacing effect (Bahrick, 1979; Glenberg, 1976). It is found that the spacing between successive repetitions of an item affects how well the item is remembered. Moreover, this effect interacts with the delay between the last study of an item and the test. Figure 4 displays the results from Glenberg (1976). In this experiment there were two studies of an item followed by a test. The data are organized according to the lag between the two studies and the lag between the second study and the test. At short test lags, recall is better the shorter the study lag. This can be seen as derivative from what we have seen about the retention curve. The longer the study lag, the greater the retention interval from the first study to the final test. However, when the test lag is long, there is better recall the longer the study lag. This result contradicts what we would extrapolate from the retention curve alone. The spacing effects might be characterized as showing greatest recall when study lag matches test lag. Whether this conclusion is correct or not is unclear, but there is abundant evidence for an interaction of the sort illustrated in Figure 4 between study lag and test lag.

No theory of human memory, including Anderson (1982), has been able to account adequately for practice effects, retention effects, and the spacing effect. The reason should be ap-
parent, as the three effects would seem to be somewhat in contradiction. Holding test lag to last presentation constant, the advantage of each presentation should diminish as they are spaced further apart because we are increasing the retention intervals from the earlier presentations. However, the spacing effect tells us that this is not always true. One should not think, however, that the spacing effect eliminates the retention effect. The biggest effect in Figure 4 is the retention effect, which is reflected in how far apart the separate curves are. One way of characterizing what is going on is that there is a large effect of delay since last presentation but that the other delays have a less clear effect.

A number of theories have been able to predict simultaneously a practice function, a retention function, and a spacing effect (Estes, 1955; Landauer, 1975; Glenberg, 1976). However, it does not seem that they can predict the power-function form that these functions appear to take. These theories assume that memories get associated to contexts that gradually change over time. The practice function simply results from the increased associations to context with repetition. The retention function results because with time, the test context changes from the learning context. The spacing effects result because at long lags memories are likely to be associated to different contexts. This results in increased probability that the test context will overlap with one of the study contexts. Such a model might well be given an expression that would produce the parametric form of the three effects. However, we and others have been frustrated in our attempts to find such an expression.[2]

2. Wickelgren (1972) produced a mathematical theory that was tailored to the form of the retention function but does not address the form of practice function. It mispredicts the spacing effect in that it claims that the utility of later presentations is a function of how distant they are from the first. It has no role for the lag among these later presentations.

AN ENVIRONMENTAL EXPLANATION

Given that there have been no successful mechanistic explanations for practice, retention, and spacing phenomena, it becomes all the more interesting to see whether we can explain these phenomena from the assumption that the memory system is adapted to the structure of the environment. The basic idea is that at any point in time, memories vary in how likely they are to be needed and the memory system tries to make available those memories that are most likely to be useful. The memory system can use the past history of use of a memory to estimate whether the memory is likely to be needed now. This view sees human memory in some sense as making a statistical inference. However, it does not imply that memory is explicitly engaged in statistical computations. Rather, the claim is that whatever memory is doing parallels a correct statistical inference.

What memory is inferring is something we call the need probability, which is the probability that we will need a particular memory trace now. The basic assumption developed in Anderson (1990) is that memories are considered in order of their need probabilities until the need probability is so low that it no longer is worth considering any more. If we let p be the need probability, C be the cost of considering a memory, and G be the gain associated with a successful retrieval, one should stop when C > pG.

Despite the description of this process in terms that evoke images of memories being considered one at a time, there are equivalent parallel processes. We prefer a parallel model in which different memories are allocated different resources according to their need probability. However, for current purposes we simply note that this analysis does not imply a commitment as to the mechanism of retrieval.

Relationship between Need Odds and Behavioral Measures

This analysis does allow predictions to be derived about the relationship between need probability and the dependent measures of recall latency and recall accuracy. With respect to recall latency, the critical assumption is that there is a distribution of memories in terms of their estimated need probabilities. The reasonable assumption is that there will be a mass of need probabilities near zero with a tail of a few higher probability memories; that is to say, the distribution of memories will be J-shaped or highly skewed. It is more convenient to think about the shape of such a distribution in terms of need odds. If p is need probability, then q = p/(1 - p) will be need odds. An odds measure has the advantage of varying from zero to infinity. Thus, the expectation is that most memories will have near-zero odds and a rapidly diminishing few will have higher odds.

A great many phenomena show such J-shaped distributions, including distributions of scientists by number of publications, words by frequency, and firms by size. Simon and Ijiri (1977) present the following density as characterizing such distributions:

$f(x) = a x^{-k}$  (7)

where f is the frequency of an item of measure x (e.g., word frequency, firm size, or need odds) and a and k are constants. If we assume that memories are examined in order of odds, then the time to examine a memory with odds q will be proportional to the number of memories with odds greater than q. This can be calculated as:

$\int_q^{\infty} a x^{-k}\,dx = b\,q^{-(k-1)}$  (8)

where b = a/(k - 1). Thus, we see that time is related to need odds as a power function with exponent (k - 1). Thus, if odds were related to retention interval or practice as a power relation with exponent c, then time would be related to retention interval or practice with exponent c(k - 1). The force of this analysis is that power functions in need probability imply power functions in time, although not necessarily with the same exponent. If k = 2, the exponent will be the same. Simon and Ijiri report that values of k = 2 are common.

The above was an analysis of time. Anderson and Milson (1989) can be consulted for a similar analysis of recall probability. The basic assumption there is that recall will stop before retrieving the target item if its need probability is too low. This
might seem to imply a step function in which all items above a certain need probability are recalled and all below are not recalled. However, there has to be some noise in the process such that the distance between an item's need probability and the threshold varies. A natural scale on which to try to model this variation is log need odds, which varies from minus infinity to infinity. If we assume that there is a normal distribution of estimated log need odds around true need odds, we predict a sigmoidal function rather than a step function relating need odds to recall odds. Anderson and Milson show that this relation implies a power relationship between need odds and recall odds. Thus, as in the case of time, we see that the natural prediction is that a power function in need odds implies a power function in the observed behavior. Again, the exponent need not be the same.

These considerations about recall odds and reaction time greatly simplify our research program. They mean that these dependent measures should directly reflect the functional form and ordinal relationships displayed by need odds. Thus, we can look to see whether need odds functions are power functions like the behavioral functions. It is not necessary that they have the same parameters such as exponent or scale constant for the power function. For instance, it is reasonable to suppose recall odds will be much greater than the corresponding need odds, but they should have the same functional forms.

INFORMATION ABOUT ENVIRONMENTAL STRUCTURE

What we need to find out is how past history of usage of information predicts the probability that the knowledge will be used in the next time interval. Anderson and Milson (1989) developed a theory based on mathematical models that were developed to explain library borrowings and accesses to files in computer systems. While this approach has some strengths, it has two considerable weaknesses that we hope to redress in this paper. First, while these are examples of systems that have to retrieve information, they are not systems facing human retrieval demands and so we are left with an argument by analogy. Second, while a formal model has some analytic advantages, it obscures the very direct relationship being proposed between the environment and memory, leading some (e.g., Simon, in press) to claim that the predictions rest on the auxiliary assumptions in the environmental model. Quite the contrary, it is the case that the predictions are a direct reflection of the structure of the environment.

Ideally, we would like to follow people about determining when demands are being made on their memory to retrieve a

[Figure 5. Patterns of Word Usage (New York Times); words shown include "of", "reagan", and "american".]

article to retrieve information about the referent of that word to decide whether this is an article that the reader might want to read.

2. We have looked at the subset of the CHILDES data base of MacWhinney and Snow (1990) looking at children's verbal interactions. Every time someone says a word to a child, this is a demand on the child to retrieve the word's meaning.

3. We have looked at the electronic mail messages the first author (J.A.) received from March 1985 to December 1989. Here we have analyzed the senders of the messages. The assumption here is that every time J.A. receives a message from a certain person, that is another demand to retrieve some information from J.A.'s memory about the sender.

Figure 5 illustrates the pattern of usage of some words over a 100-day period for the New York Times. The question of interest is how does this pattern of use over the 100 days predict the probability of use on the 101st day? In addressing this question we can look at the relationship between various statistics describing the past 100 days and probability of occurring on the 101st day. For instance, "Reagan" occurs 52 times in that 100-day period. We can look at need probability on the 101st day. This would be representative of an item that has had 52 practices in an experiment and we are looking at its recall. It turns out in this case "Reagan" actually appeared in the headlines on day 101 but aggregating over items that appeared 52 times in a 100-day window, some will appear on day 101 and some will not. We can use the empirical proportion as an estimate of the probability that an item used 52 times in 100 days will be needed on day 101.

The Practice Function

Figure 6a shows the relationship between the number of previous days on which a word has appeared during the past 100
[Figure 6. Panels (a)-(c): x-axes "Frequency in Past 100 Days" (a), "Frequency in Past 100 Utterances" (b), and "Frequency in Past 100 Days" (c). Panels (d)-(f): x-axis "Log Frequency".]
Fig. 6. (a) Probability of a word occurring in a headline of the New York Times on Day 101 as a function of
the number of times it occurred in the previous 100 days; (b) probability of a word occurring in the 101st
utterance from a parent as a function of the number of times it occurred in the previous 100 days; (c)
probability of receiving a message on the 101st day from a source as a function of the number of times
messages were received from that source in the previous 100 days. Panels (d-f) provide transformations of
(a-c) plotting log need odds against log frequency.
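
The frequency analysis behind Figure 6 can be sketched in a few lines: group items by how often they occurred in a window, then take the proportion of each group that occurs again at the next time step as the estimate of need probability. The sketch below is a simplified illustration, not the authors' procedure. It assumes a single fixed window rather than a window moved across the whole data base, and the function names, the data layout (a dict from item to the set of time indices on which it occurred), and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def need_probability_by_frequency(occurrences, window=100):
    """Estimate need probability at time window+1, grouped by frequency.

    occurrences maps each item to the set of 1-based time indices on
    which it occurred. Returns {frequency: proportion recurring}.
    """
    counts = defaultdict(lambda: [0, 0])  # frequency -> [items seen, items recurring]
    for times in occurrences.values():
        freq = sum(1 for t in times if 1 <= t <= window)
        if freq == 0:
            continue  # item never used in the window
        counts[freq][0] += 1
        counts[freq][1] += (window + 1) in times
    return {f: hits / seen for f, (seen, hits) in sorted(counts.items())}


def need_odds(p):
    """Convert need probability p to need odds q = p / (1 - p)."""
    return math.inf if p >= 1.0 else p / (1.0 - p)


# Toy data (invented for illustration), with a 5-step window.
toy_log = {
    "a": {1, 2, 6},
    "b": {3, 4},
    "c": {5},
    "d": {2, 6},
    "e": {6},
}

by_frequency = need_probability_by_frequency(toy_log, window=5)
for freq, p in by_frequency.items():
    print(freq, p, need_odds(p))
```

Plotting log need odds against log frequency, as in panels (d-f), would then only require taking `math.log` of both quantities for the cells with probability strictly between 0 and 1.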
days and the probability it will appear in the current day. We have plotted probability of occurrence on the 101st day against number of uses in the previous 100 days. This analysis reveals a particularly straightforward relationship. In this data base, future probability of use perfectly reflects the proportion of past use in the data base.

Figure 6b shows a similar analysis for the child language data base. Here we are looking at the probability of a word occurring in the 101st utterance to the child as a function of the number of times it appeared in the previous 100 utterances to the child. Again we have plotted probability of use against number of prior utterances. The relationship is again linear, although we find that past proportion overestimates future use. Basically, if an item has occurred in a proportion P of the past 100 utterances, it has a probability .76P of occurring in the next utterance.

Finally, Figure 6c shows a similar analysis for the electronic mail data. Again a linear relationship is found, but this time the function is .9P.

Simon (1955) noted that the probability of an item being repeated was proportional to its past frequency of usage in a number of sources. We have just replicated this result. The constant of proportionality (1.0 for New York Times, .76 for child language, and .9 for mail messages) reflects the rate at which new terms are appearing. One minus this constant is the probability that the next item is a new term.

In Figures 6a-c we have plotted the relationship between need probability and frequency. Our prediction is that there should be a power relationship between need odds and frequency or a linear relationship between log need odds and log frequency. Figures 6d-f plot log odds rather than log probability. Generally, there is a strong correlation between log need odds and log frequency but systematic deviations appear for frequencies over 50. We have estimated best-fitting linear functions for frequencies under 50 and the results are every bit as good as in the original Figures 6a-c. We are not bothered by deviations for frequencies over 50 because these represent very few items. In the case of the New York Times, they are a few
functor words. In the case of electronic mail, they are two individuals. There are no such items in the case of child language. These few items do represent extremes that are not realized in memory experiments that produce power functions. They are items that occur nearly every day of our lives and no memory experiment comes close to creating that ubiquitous a learning experience.

The Retention Function

We also used a window of 100 days in analyzing the New York Times for an analog of the retention function. Here we look at probability of recall on the 101st day as a function of how many days have elapsed since the item last occurred in that 100-day window. Figure 7a shows this relationship with an untransformed scale, and Figure 7d shows the relationship plotting log need odds against log time. As can be seen, the data in Figure 7a show the typical negative acceleration of a retention curve, and the data in Figure 7d show that this satisfies a power function with exponent .73. Figures 7b and e show the comparable analysis from the child language data. Here we plotted probability that the word would appear in the 101st utterance to the child as a function of where last it appeared in the last 100 utterances. Figure 7e shows another power relationship, this time with exponent .77. Figures 7c and f show the data for the mail messages. Again a linear relationship appears in the case of the log transformed data in Figure 7f, implying a power relationship. In this case the exponent is .83. Although we have not bothered to include the plots, in each case the data do not satisfy an exponential relationship.

Spacing Effects

We tried to find an analog of the Glenberg study in the environment. For the New York Times, we selected cases where a word occurred exactly twice in the past 100 days and considered the probability of its occurring on day 101. We analyzed this probability of occurrence as a function of the lag between the two occurrences (the analog of study lag) and the lag between the second occurrence and test (the analog of test lag).
[Figure 7. Panels (a)-(c): x-axes "Days since Last Occurrence" (a), "Utterances since Last Occurrence" (b), and "Days since Last Occurrence" (c). Panels (d)-(f): (d) New York Times Retention, (e) Parental Speech Retention, (f) Mail Sources Retention, with x-axes "Log Days" (d), "Log Utterances" (e), and "Log Days" (f). The fitted lines of log odds against log time have slopes of -0.73 (d), -0.77 (e), and -0.83 (f), with R^2 = 0.993, 0.984, and 0.986.]
Fig. 7. (a) Probability of a word occurring in a headline in the New York Times on day 101 as a function of
how long it has been since the word previously occurred; (b) probability of word occurring in the 101st
utterance from a parent as a function of how many utterances it has been since the word previously occurred;
(c) probability of receiving a mail message from a source as a function of how many days it has been since
a message was last received from that source. Panels (d-f) provide transformation of (a-c) plotting log need
odds against log frequency.
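The log-log transformation behind panels (d-f) can be illustrated with a short least-squares sketch. It assumes natural logarithms and ordinary least squares, neither of which is stated in the text, and the function names are hypothetical. The probabilities below are synthetic, generated from a line with intercept -1.95 and slope -0.73 rather than taken from the New York Times data.

```python
import math


def log_odds(p):
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_retention(days, probs):
    """Regress log need odds on log time; returns (intercept, slope)."""
    xs = [math.log(d) for d in days]
    ys = [log_odds(p) for p in probs]
    return fit_line(xs, ys)


def synthetic_probs(days, intercept, slope):
    """Probabilities that lie exactly on a given log-odds line."""
    probs = []
    for d in days:
        odds = math.exp(intercept + slope * math.log(d))
        probs.append(odds / (1.0 + odds))
    return probs


days = [1, 2, 5, 10, 20, 50, 100]
probs = synthetic_probs(days, intercept=-1.95, slope=-0.73)
intercept, slope = fit_retention(days, probs)
```

A slope of -0.73 on these axes corresponds to need odds falling off as a power function of time with exponent .73, which is the sense in which the retention analog is a power function.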
[Figure 8 plots. (a) New York Times Spacing: curves for 5, 20, and 40 days unmentioned against number of days between the two mentionings; (b) curves by number of utterances unmentioned against number of utterances between the two mentionings; (c) curves for 5, 20, and 40 days since last message against number of days between messages.]
Fig. 8. Analog of the Glenberg study (Fig. 4) in the (a) New York Times, (b) child language data source, and
(c) electronic mail data source.
[Figure 9 plots. (a) New York Times Retention Interaction, (b) Parental Speech Retention Interaction, (c) Mail Sources Retention Interaction; x-axes: log days, log utterances, log days.]
Fig. 9. Retention function for items occurring twice in the previous 100 time units: (a) New York Times, (b)
child language, and (c) electronic mail messages. Separate functions are plotted for items for whom the two
occurrences were less than 10 units apart (short lag) and for whom the two occurrences were at least 10 units
apart (long lag).
quite plausible. Evidence for their nontriviality can be seen in the fact that these conclusions have been reached with some reluctance and controversy in psychology, to the extent that we can consider these conclusions established. Finally, there is an interaction between spacing and retention such that retention functions are steeper for more massed practice.

What are we to make of this parallelism between memory and environment? Certainly we can go away with the conclusion that the functioning of memory is remarkably well adapted to the structure of the environment. We also believe that there is a causal link here: that memory has the structure it has because the environment has the structure it has. However, it is possible to hold out for the hypothesis of an accidental correlation between the two.

Formulating the Effects of Practice and Retention

There remains the question of what memory mechanism would actually produce the practice and retention functions we saw. One can aspire to address this question at different levels. One level would be the underlying processes that produce these results. We believe that such an explanation would have to be at the neural level in terms of the physical changes that underlie learning. Short of this, one could aspire to have a mathematical description of how memory would respond to various presentation schedules. There has not been a satisfactory mathematical description to date. However, as a consequence of the analyses we have developed in this paper, we think we are now in possession of such a formulation.
[Figure 10 plots. (a) Hellyer Data: curves for 1, 2, 4, and 8 presentations against log seconds; (b) curves for 100%, 150%, and 200% learning against log days; (c) curves for 1 day and 7 days retention against log trials of learning.]
Fig. 10. (a) Forgetting curves at four practice levels from Hellyer (1962); (b) forgetting curves at four practice
levels from Krueger (1929); (c) effects of practice at two retention intervals from Underwood and Keppel
(1963).
[Figure 11 plots: curves for 1-6 occurrences and 7-12 occurrences against (a) log days and (b) log utterances.]
Fig. 11. (a) Retention effects in the New York Times for a word with different frequencies of occurrence; (b)
retention effects in the child language data base for items with different frequencies of occurrence.
Before providing a mathematical formulation, we would like to state the basic assumptions behind the model:

0. Strength of a trace provides an encoding of its need odds memory performance.
1. The strengths from individual presentations sum to produce a total strength.
2. Strengths of individual presentations decay as a power function of the time.
3. The exponent of the power function for decay of each presentation decreases as a function of time since previous presentation.

We now give an equation to formalize each of the assumptions 1-3.
[Figure 12 plots: log strength against log days, panels (a) and (b).]
Fig. 12. Practice functions generated by the mathematical model for (a) d_1 = .125 and (b) d_1 = 1.000.
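The three model assumptions (summed strengths, power-function decay, and a decay exponent that shrinks with the time since the previous presentation) can be sketched numerically as below. This is only an illustration under stated assumptions: each presentation is taken to contribute t^(-d), and the rule used for the exponent, max(d_1, b * lag^(-d_1)), is one illustrative choice that decreases with lag, not a formula taken from this section. The function names and the scale parameter `b` are hypothetical.

```python
def decay_exponents(times, d1, b=1.0):
    """Per-presentation decay exponents.

    times: presentation times in ascending order.
    The first presentation uses d1; later ones use an ILLUSTRATIVE rule
    (assumption, not from the text) that decreases with the lag since
    the previous presentation and never drops below d1.
    """
    exponents = [d1]
    for previous, current in zip(times, times[1:]):
        lag = current - previous
        exponents.append(max(d1, b * lag ** -d1))
    return exponents


def total_strength(times, now, d1, A=1.0, b=1.0):
    """Sum of power-decaying strengths, scaled by A.

    Assumes each presentation of age t contributes t ** -d (assumption 2)
    and that contributions add (assumption 1). Requires now > max(times).
    """
    exponents = decay_exponents(times, d1, b)
    return A * sum((now - t) ** -d for t, d in zip(times, exponents))


# Toy schedules, invented for illustration.
massed = decay_exponents([0, 1, 50], d1=0.125)
single = total_strength([0.0], now=4.0, d1=0.5)
two = total_strength([0, 10], now=20, d1=0.5)
one = total_strength([10], now=20, d1=0.5)
```

In this sketch a presentation that follows closely on the previous one receives a larger exponent and so decays faster, which is the direction assumption 3 describes.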
$$S = A \sum_{i=1}^{n} s(t_i) \qquad (9)$$
[Figure plot: curves for Test Lag 2 and Test Lag 32.]

where d_i is an exponent that can be different for each presentation i. In the case of the first presentation, d_1 is a parameter of the experiment. It may vary with the type of material. Corre-
[Figure 14 plot: log strength against log seconds.]

Fig. 14. Simulation of the Hellyer (1962) data (Fig. 10a) by the mathematical model.

This is not a particularly obscure model of the environmental properties of memories. Nonetheless, it turns out these simple assumptions have led to memory characteristics that have confounded psychologists since Ebbinghaus.

Acknowledgments: This research was supported by Grant BNS-8705811 from the National Science Foundation and Contract N00014-90-J-1489 from the Office of Naval Research. We would like to thank Ching-Fan Sheu for his comments on this paper. The second author was supported by Training Grant 1-T32-MH19102-01 from the National Institute of Mental Health.

REFERENCES

Anderson, J.R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
Anderson, J.R. (1989). A rational analysis of human memory. In H.L. Roedinger & F.I.M. Craik (Eds.), Varieties of memory and consciousness (pp. 195-210). Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, J.R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
Ebbinghaus, H. (1964, 1885). Memory: A contribution to experimental psychology. Mineola, NY: Dover Publications.
Estes, W.K. (1955). Statistical theory of distributional phenomena in learning. Psychological Review, 62, 369-377.
Glenberg, A.M. (1976). Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior, 15, 1-16.
Hellyer, S. (1962). Frequency of stimulus presentation and short-term decrement in recall. Journal of Experimental Psychology, 64, 650.
Krueger, W.C.F. (1929). The effects of overlearning on retention. Journal of Experimental Psychology, 12, 71-78.
Landauer, T.K. (1975). Memory without organization: Properties of a model with random storage and undirected retrieval. Cognitive Psychology, 7, 495-531.
Lewis, C.H. (1978). Production system models of practice effects. Unpublished doctoral dissertation, University of Michigan, Ann Arbor.
Loftus, G.R. (1985). Evaluating forgetting curves. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 397-406.
Logan, G.D. (1988). Toward an instance theory of automatization. Psychological Review, 95, 492-527.
MacWhinney, B., & Snow, C. (1990). The child language data exchange system: an update. Journal of Child Language, 17, 457-472.
Mazur, J.E., & Hastie, R. (1975). Learning as accumulation: A reexamination of the learning curve. Psychological Bulletin, 85, 1256-1274.
McKay, D.G. (1988). The problems of flexibility, fluency, and speed-accuracy trade-off in skilled behavior. Psychological Review, 89, 483-506.
Newell, A., & Rosenbloom, P. (1981). Mechanisms of skill acquisition and the law of practice. In J.R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1-55). Hillsdale, NJ: Lawrence Erlbaum Associates.
Restle, F., & Greeno, J.G. (1970). Introduction to mathematical psychology. Reading, MA: Addison-Wesley.
Schooler, L.J., & Anderson, J.R. (unpublished). Environmental demands in memory: Statistical analogs to learning and forgetting curves.
Shepard, R.N. (1990). Mind sights. New York: Freeman.
Shrager, J.C., Hogg, T., & Huberman, B.A. (1988). A dynamical theory of the power-law learning in problem-solving. In Proceedings of the tenth annual conference of the Cognitive Science Society (pp. 468-474). Hillsdale, NJ: Cognitive Science Society.
Simon, H.A. (1955). On a class of skew distribution functions. Biometrika, 52, 425-440.
Simon, H.A. (in press). Cognitive architectures and rational analysis: Comment. In K. Van Lehn (Ed.), Architectures for intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates.
Simon, H.A., & Ijiri, Y. (1977). Skew distributions and the sizes of business firms. New York: Elsevier/North Holland.
Slamecka, N.J., & McElree, B. (1983). Normal forgetting of verbal lists as a function of their degree of learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 384-397.
Underwood, B.J., & Keppel, G. (1963). Retention as a function of degree of learning and letter-sequence interference. Psychological Monographs, 77 (1, Whole No. 567).
Wickelgren, W.A. (1972). Trace resistance and the decay of long-term memory. Journal of Mathematical Psychology, 9, 418-455.
Wickelgren, W.A. (1974). Single-trace fragility theory of memory dynamics. Memory and Cognition, 2, 775-780.
Wickelgren, W.A. (1976). Memory storage dynamics. In W.K. Estes (Ed.), Handbook of learning and cognitive processes (pp. 321-361). Hillsdale, NJ: Lawrence Erlbaum Associates.

(RECEIVED 1/18/91; REVISION ACCEPTED 7/1/91)