Chapter 6 Learning
Behaviorism
John B. Watson is considered the founder of behaviorism. In stark contrast with Freud,
who considered the reasons for behavior to be hidden in the unconscious, Watson
championed the idea that all behavior can be studied as a simple stimulus-response reaction,
without regard for internal processes. Watson argued that in order for psychology to become
a legitimate science, it must shift its concern away from internal mental processes because
mental processes cannot be seen or measured. Instead, he asserted that psychology must
focus on outward observable behavior that can be measured.
Watson’s ideas were influenced by Pavlov’s work. According to Watson, human behavior,
just like animal behavior, is primarily the result of conditioned responses. Whereas Pavlov’s
work with dogs involved the conditioning of reflexes, Watson believed the same principles
could be extended to the conditioning of human emotions.
His experiment: Little Albert was not afraid of rats, rabbits, and other furry things (neutral
stimuli). In the experiment, each time Little Albert touched a rat, the touch was paired with
the loud noise of a hammer banged against a metal bar behind him (unconditioned stimulus),
which made him cry (unconditioned response). Albert then began to cry (conditioned
response) whenever any furry thing (conditioned stimulus) was presented to him (stimulus
generalization).
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that
are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He
proposed a theory about how such behaviors come about. Skinner believed that behavior is
motivated by the consequences we receive for the behavior: the reinforcements and
punishments. His idea that learning is the result of consequences is based on the law of
effect, which was first proposed by psychologist Edward Thorndike. According to the law of
effect, behaviors that are followed by consequences that are satisfying to the organism are
more likely to be repeated, and behaviors that are followed by unpleasant consequences are
less likely to be repeated.
Shaping
1. Reinforce any response that resembles the desired behavior.
2. Then reinforce the response that more closely resembles the desired behavior. You
will no longer reinforce the previously reinforced response.
3. Next, begin to reinforce the response that even more closely resembles the desired
behavior.
4. Continue to reinforce closer and closer approximations of the desired behavior.
5. Finally, only reinforce the desired behavior.
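As an illustrative aside (not from the chapter), the successive-approximation loop above can be caricatured in a few lines of Python. The sketch makes the simplifying assumption that "behavior" is a single number, that responses vary randomly around the learner's current baseline, and that reinforcing a response shifts the baseline to it; the names `shape`, `target`, and `trials` are invented for the sketch.

```python
import random

def shape(target, trials=300, seed=1):
    """Toy model of shaping: reinforce any response that is a closer
    approximation of the target behavior than the current baseline."""
    rng = random.Random(seed)
    behavior = 0.0  # the learner's current typical response
    for _ in range(trials):
        # Behavior varies randomly around the current baseline.
        response = behavior + rng.uniform(-1.0, 1.0)
        # Reinforce only closer approximations; previously reinforced,
        # looser responses are no longer reinforced (steps 2-5 above).
        if abs(response - target) < abs(behavior - target):
            behavior = response  # the reinforced response becomes the new baseline
    return behavior

print(round(shape(target=5.0), 2))  # ends close to the target behavior
```

Because only ever-closer approximations are reinforced, the baseline drifts steadily toward the target, mirroring how the criterion for reinforcement tightens over the five steps.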
Shaping is often used in teaching a complex behavior or chain of behaviors. An important
part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond
to the tone of a bell, and not to similar tones or sounds. This discrimination is also important
in operant conditioning and in shaping behavior.
Ex. Remember the lecturer's dog: teaching the dog to turn around was an example of
shaping.
Reinforcements
Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of
reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are
primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive
for these things. For most people, jumping in a cool lake on a very hot day would be
reinforcing and the cool lake would be innately reinforcing—the water would cool the person
off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when
linked with a primary reinforcer. Money, for example, is only worth something when you
can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all
primary reinforcers) or other secondary reinforcers. If you were on a remote island and you
had stacks of money, the money would not be useful if you could not spend it.
Reinforcement Schedules
When an organism receives a reinforcer each time it displays a behavior, it is called
continuous reinforcement. This reinforcement schedule is the quickest way to teach
someone a behavior, and it is especially effective in training a new behavior. Timing is
important when teaching a dog to sit: you will be most successful if you present the
reinforcer immediately after he sits, so that he can make an association between the target
behavior (sitting) and the consequence (getting a treat).
In partial reinforcement, also referred to as intermittent reinforcement, the person or
animal does not get reinforced every time they perform the desired behavior.
A fixed interval reinforcement schedule is when behavior is rewarded after a set amount
of time.
For example, a patient receives painkillers only once per hour, provided the behavior
(requesting relief) is exhibited. Since the reward (pain relief) only occurs on a fixed interval,
there is no point in exhibiting the behavior at times when it will not be rewarded.
With a variable interval reinforcement schedule, the person or animal gets the
reinforcement based on varying amounts of time, which are unpredictable.
A restaurant gets checked on its hygiene every once in a while, but the restaurant never
knows when the check is coming. Staff must keep the restaurant clean at all times, so that
they keep their job when the check comes up.
With a fixed ratio reinforcement schedule, there are a set number of responses that must
occur before the behavior is rewarded.
Earning a bonus every time you sell the customer something. The quality of what you sell
does not matter; as long as you sell something, you receive a reward.
Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in
which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule, the number of responses needed for a reward
varies. This is the most powerful partial reinforcement schedule.
Gambling. You never know when you will receive your reward, so you keep going until you
finally get it. Because the reinforcement schedule in most types of gambling has a variable
ratio schedule, people keep trying and hoping that the next time they will win big. This is one
of the reasons that gambling is so addictive—and so resistant to extinction.
In operant conditioning, extinction of a reinforced behavior occurs at some point after
reinforcement stops, and the speed at which this happens depends on the reinforcement
schedule. Fixed interval is the least productive and the easiest to extinguish.
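As an illustrative aside, the two ratio schedules can be expressed as small Python classes that decide, response by response, whether a reinforcer is delivered; the interval schedules are analogous but check elapsed time instead of a response count. The class names and the numbers used are invented for this sketch.

```python
import random

class FixedRatio:
    """Deliver a reinforcer after every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True   # reinforcer delivered
        return False

class VariableRatio:
    """Deliver a reinforcer after a varying, unpredictable number of
    responses (averaging roughly n), as in gambling."""
    def __init__(self, n, seed=0):
        self.rng = random.Random(seed)
        self.n = n
        self.required = self.rng.randint(1, 2 * n - 1)
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            # Draw a new, unpredictable requirement for the next reinforcer.
            self.required = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

fr = FixedRatio(3)
print([fr.respond() for _ in range(6)])  # every third response is reinforced
```

The variable-ratio class shows why that schedule resists extinction: the responder can never tell from a run of unreinforced responses whether the next one will pay off.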
Cognition and Latent Learning
Skinner was such a staunch believer that cognition didn't matter that his ideas were
considered radical behaviorism. Skinner considered the mind a "black box"—something
completely unknowable—and, therefore, something not to be studied. However, another
behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats
demonstrated that organisms can learn even if they do not receive immediate reinforcement.
This finding was in conflict with the prevailing idea at the time that reinforcement must be
immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way
through it. He also studied a comparison group that was rewarded with food at the end of the
maze. As the unreinforced rats explored the maze, they developed a cognitive map: a
mental picture of the layout of the maze. After 10 sessions in the maze without
reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats
became aware of the food, they were able to find their way through the maze quickly, just as
quickly as the comparison group, which had been rewarded with food all along. This is
known as latent learning: learning that is not observable until there is a reason to
demonstrate it.
The Lecture
Learning and conditioning
Non-associative learning
Snails have very big neurons, which makes them easy animals to study. Researchers use
sea slugs, which are very big, so that everything is easy to see. A sea slug can learn, for
example, that if you keep pressing on its skin it is not in danger, and it will therefore stop
responding.
Habituation: An organism decreases or ceases to respond to a stimulus after repeated
presentation.
Sensory habituation: a stimulus in, for example, vision or hearing 'disappears' after a while
when it is presented for a long time.
Sensory habituation is not the same as habituation as learning, though. Habituation in
learning means a decrease in response to a stimulus that stays the same over time, due to
changes in neurons in the central nervous system.
Sensitization: the opposite of habituation. The underlying process is likely that cellular
receptors become more likely to respond to a certain stimulus.
Strengthening of synaptic signals: long-term potentiation.
For example, in PTSD you become scared of intense noises because you had negative
experiences with those noises in the past.
Non-associative learning can occur very early on in life, even in the womb.
Babies show a systematic preference for things that smell like their mother, then for things
that smell like their father, and only then for other things.
Associative learning
Two types of associative learning (they often appear together):
Classical conditioning: there is a stimulus and a response, and you associate them with
each other. It is a learning process in which a previously neutral stimulus becomes
associated with another stimulus through repeated pairing with that stimulus.
Operant or instrumental conditioning: you do something, which appears to lead to
something else, so you learn the consequences of your actions. It involves learning the
relationship between voluntary responses and their outcomes.
Classical conditioning
Extinction: when you stop reinforcing associations (e.g., you stop giving your dog treats),
the conditioning will go extinct. The conditioned response (CR) gradually diminishes if the
unconditioned stimulus (US) is omitted.
After extinction, there can be spontaneous recovery: the animal won't forget the behavior
and will still expect the reward or consequence it has seen before.
Stimulus discrimination: the adaptive ability to react to differences when there is a
negative association with an aspect of a stimulus, such as differentiating between a high
and a low tone.
Stimulus generalisation: the adaptive ability to react to a new stimulus that is similar to
the familiar one by generalising the response. Instead of having to learn a behavior for
every stimulus, you can generalize across stimuli. If you like strawberries, every other red
berry is probably tasty too.
Advertisements often use classical conditioning: if you see a funny advertisement, you will
associate this positive feeling with the brand.
One form of unconscious learning that appears to be due to classical conditioning is drug
tolerance. Drug-taking behavior (such as using a needle or even opening a bottle of beer)
functions as a signal or CS that predicts the introduction of the drug into the body. Eventually
the act of drug taking triggers an anticipatory response: the secretion of drug antagonists
that help eliminate the drug from the body.
(Note that drug tolerance is also an example of habituation – a physiological tolerance to a
drug resulting from repeated use. Thus, multiple types of learning can co-occur.)
Second-order conditioning: it is possible to condition a participant to produce a CR to a
novel stimulus by repeatedly pairing the novel stimulus with the CS, even though the novel
stimulus is never paired with the US.
Conditioning and fear: a CS leads to a CR because it predicts the occurrence of a certain
US; this is also true for emotional reactions.
You can condition someone out of their phobia. If you are afraid of spiders, you can be
exposed to a spider: you become desensitized if you are exposed to the stimulus for a
longer time and find out there is no danger.