Chapter 7: Reinforcement and Behavior Control
A. Traditional Theories of Reinforcement
Pavlov regarded conditioning as the
connection of "biologically meaningful" stimuli. Conditioning allowed greater
adaptation and survival.
Thorndike considered that satisfaction/annoyance
"stamped in" the associations between the situation and the successful
response.
Clark Hull proposed intervening variables
that occurred between the S and the R -- Drives such as thirst, hunger,
etc. Besides Primary Drives, he also included Acquired Drives (conditioned
stimuli associated with unconditioned reinforcers). He proposed that these
Primary and Acquired Drives were REDUCED by the presentation of the reinforcers.
Behavior was motivated by these drives, which acted to MULTIPLY the learned
behavior. Thus, when there is NO Drive, there is NO Behavior (behavior
is multiplied by zero). Hull had problems, though, with data provided by
others (such as Sheffield and Roby) that showed learning even when Drives
went unfulfilled (e.g. copulation without orgasm and thus without drive
reduction). His solution to this problem was to claim that it was the reduction
of the Drive Stimulus that was the reinforcer. I have, by the way, never
understood this point...
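The multiplicative idea in Hull's account can be sketched in a few lines. This is only an illustration of "drive MULTIPLIES learned behavior" as stated in these notes; the function name, the parameter names, and the 0-to-1 scaling are assumptions, not Hull's own notation.

```python
def reaction_strength(drive: float, learned_strength: float) -> float:
    """Multiplicative rule from the notes: drive multiplies learned behavior.

    Both parameter names and the 0-to-1 scaling are illustrative assumptions.
    """
    if drive < 0 or learned_strength < 0:
        raise ValueError("drive and learned_strength must be non-negative")
    return drive * learned_strength


# A well-learned response (0.9) under three hypothetical drive levels.
for drive in (0.0, 0.5, 1.0):
    print(drive, reaction_strength(drive, 0.9))
```

With drive at zero the product is zero no matter how well the response was learned, which is the "NO Drive, NO Behavior" point.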
Skinner proposed in 1938 that we should
wait for physiology to tell us about why reinforcers were reinforcing.
Instead of psychologists "making up" things inside, they should spend their
time working out the behavioral relations of reinforcers and punishers.
He therefore offered the Functional Definition of Reinforcement and Punishment
as shown in Fig. 7.1
[Fig. 7.1: Functional definition of reinforcement and punishment. Surviving labels: Rate of Response; Withdrawn or Omitted.]
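A functional definition classifies a consequence by what it does to the rate of response, not by what it "is." The sketch below uses the usual two-by-two reading of such a figure. Since Fig. 7.1 itself did not survive here, the four cell names and the key strings are assumptions based on standard usage rather than a copy of the figure.

```python
# Keys: (what happens to the stimulus after the response,
#        what then happens to the rate of response).
# Cell names follow common textbook usage and are an assumption here.
FUNCTIONAL_DEFINITIONS = {
    ("presented", "increases"): "positive reinforcement",
    ("withdrawn or omitted", "increases"): "negative reinforcement",
    ("presented", "decreases"): "positive punishment",
    ("withdrawn or omitted", "decreases"): "negative punishment",
}


def classify(stimulus_change: str, rate_change: str) -> str:
    """Name the operation from its observed effect on the rate of response."""
    return FUNCTIONAL_DEFINITIONS[(stimulus_change, rate_change)]


print(classify("presented", "increases"))
print(classify("withdrawn or omitted", "decreases"))
```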
B. Schedules of Operant Reinforcers
C. Contemporary Theories of Reinforcement
Continuous reinforcement (CRF)
Schedules that shape a high rate of responding
(DRH reinforces a burst of responses) or a low rate of responding (DRL reinforces
pausing between responses).
Intermittent Reinforcement - Switch between
EXT and CRF according to a "rule." Intermittent reinforcement often SHAPES
a particular kind of responding (e.g., fast or slow).
Ratio Schedules determine reinforcement
by the number of responses.
Fixed (e.g., FR 25) - Pause and Run (post-reinforcement
pause followed by rapid responding in a steady run). Example: stairway,
sales commission, piece work.
Variable (e.g., VR 25) - Steady, rapid
responding. Example: Slot Machine.
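The ratio rules above can be written as small counters. This is a minimal sketch, not a model of behavior: it only decides which responses earn a reinforcer. The function names are hypothetical, and drawing the VR requirement uniformly from 1 to 2n-1 (so that it averages n) is an assumption; other distributions are used in practice.

```python
import random


def fixed_ratio(n):
    """FR n: every n-th response is reinforced."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """VR mean_n: the requirement is redrawn after each reinforcer.

    Uniform draw on 1..2*mean_n-1 is an illustrative assumption.
    """
    count = 0
    required = rng.randint(1, 2 * mean_n - 1)

    def respond():
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


fr25 = fixed_ratio(25)
outcomes = [fr25() for _ in range(100)]
print([i + 1 for i, hit in enumerate(outcomes) if hit])  # [25, 50, 75, 100]
```

On FR the next reinforcer is perfectly predictable from the count; on VR any response might be the one that pays off.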
Interval Schedules determine reinforcement
by the passage of time.
Fixed (e.g., FI 2 min) - Pause followed
by a slow acceleration of responding (scalloping) until the time arrives
when the final response produces the reinforcer. Example: Checking the
mail box to see if a delivery has been made -- postal worker has regular
Variable (e.g., VI 2 min) - Steady, slow
responding. Example: Fishing.
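Interval rules can be sketched the same way, with time as the input instead of a response count. The class and helper names below are hypothetical, and drawing VI intervals from an exponential distribution is an assumption made for illustration.

```python
import random


class IntervalSchedule:
    """The first response after the current interval has elapsed is reinforced."""

    def __init__(self, next_interval):
        self._next_interval = next_interval  # callable returning seconds
        self._available_at = next_interval()

    def respond(self, t):
        """Response at time t (seconds). Returns True if it is reinforced."""
        if t < self._available_at:
            return False
        self._available_at = t + self._next_interval()
        return True


def fixed_interval(seconds):
    return IntervalSchedule(lambda: seconds)


def variable_interval(mean_seconds, rng):
    # Exponential intervals are an assumption, not part of the definition.
    return IntervalSchedule(lambda: rng.expovariate(1.0 / mean_seconds))


def reinforcers_earned(schedule, response_times):
    return sum(schedule.respond(t) for t in response_times)


# FI 2 min over a 10-minute session: responding every second or every
# 10 seconds earns the same number of reinforcers.
fast = reinforcers_earned(fixed_interval(120), range(1, 601))
slow = reinforcers_earned(fixed_interval(120), range(10, 601, 10))
print(fast, slow)  # 5 5
```

Because extra responses during the interval earn nothing, interval schedules support slower responding than ratio schedules.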
Pattern and rate of responding are shown
in Cumulative Records that allow you to assess the rate of responding (i.e.,
probability of responding) at each point in time (and also mark where reinforcers
were delivered).
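A cumulative record is just a running count of responses plotted against time, so the slope at any point is the local rate of responding. A minimal sketch, with hypothetical function names and made-up response times:

```python
from bisect import bisect_right


def cumulative_record(response_times, session_s, step_s):
    """Return (time, cumulative responses so far) sampled every step_s seconds."""
    times = sorted(response_times)
    n_steps = session_s // step_s
    return [(i * step_s, bisect_right(times, i * step_s)) for i in range(n_steps + 1)]


def local_rates(record):
    """Slope of the record between successive points (responses per second)."""
    return [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(record, record[1:])
    ]


# Hypothetical session: slow responding at first, faster later.
presses = [1, 2, 3, 11, 12, 13, 14, 15, 16]
record = cumulative_record(presses, session_s=20, step_s=10)
print(record)               # [(0, 0), (10, 3), (20, 9)]
print(local_rates(record))  # [0.3, 0.6]
```

A steeper stretch of the record means faster responding; a flat stretch is a pause.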
Log Response Rate = f(Log Reinforcement
Rate) [Fig. 7.4]
The Partial Reinforcement Effect [PRE]
(including Frustration Theory -- learning to persist in the face of frustration)
Intermittent Reinforcement and Everyday
Life -- Pay schedules, feedback from others, etc. often shape our behavior
through intermittent reinforcement.
Access to a higher probability activity
(HPA) will reinforce a lower probability activity (LPA). That is, if an
LPA is REQUIRED to gain access to an HPA, then the LPA will INCREASE in
probability.
Measure of probability: percent of time
spent on an activity when access is free. Two or more kinds of activity can be
compared to judge their relative probability.
Examples: running and drinking; eating
candy and playing pinball; studying and watching soaps.
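The probability measure described above can be computed directly from free-access observations. In this sketch the function names and the observation times are hypothetical; the rule itself is the one stated in the notes (the more probable activity can reinforce the less probable one).

```python
def free_access_probabilities(seconds_by_activity):
    """Share of free-access time spent on each activity."""
    total = sum(seconds_by_activity.values())
    if total <= 0:
        raise ValueError("need some observed time")
    return {name: secs / total for name, secs in seconds_by_activity.items()}


def can_reinforce(contingent, instrumental, probabilities):
    """True if access to `contingent` should reinforce `instrumental`."""
    return probabilities[contingent] > probabilities[instrumental]


# Hypothetical 10-minute free-access session for a thirsty rat.
probs = free_access_probabilities({"drinking": 480, "running": 120})
print(probs)                                       # {'drinking': 0.8, 'running': 0.2}
print(can_reinforce("drinking", "running", probs))  # True
print(can_reinforce("running", "drinking", probs))  # False
```

If deprivation conditions change the free-access times, the same two activities can swap roles.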
Note: These probabilities can change with
deprivation and other conditions. Thus, the "reinforcement relationship
Biological Basis of Reinforcement
Olds and Milner observed that rats returned
to the place where they had received electrical stimulation of the brain (ESB)
delivered to the medial forebrain bundle (MFB) or mesotelencephalic dopamine
system. They later showed that rats would learn to press a bar in order to receive ESB.
The study by Anderson and colleagues (1992)
demonstrates the strength of these reinforcers --
Two conditions alternated for their rats:
1. When a light was on, every ten bar presses led to a reinforcer (FR 10
condition). This condition lasted 90 sec.
2. When a tone was on, they had
to delay their bar press by 30 sec in order to return to the first condition
(DRL condition). During DRL, pressing the bar before 30 sec had passed caused
the 30 sec clock to restart. This condition lasted at least 30 sec but
could continue much longer (until the rat stopped pressing).
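The resetting clock in the DRL condition can be sketched as follows. This assumes the tone period ends as soon as 30 sec pass without a press, which is one reading of the description above; the function name and the sample press times are hypothetical.

```python
def drl_phase_end(press_times, pause_s=30.0):
    """Time (sec from tone onset) at which the DRL requirement is met.

    Each press made before pause_s has elapsed restarts the clock.
    Assumption: the phase ends once pause_s passes with no press.
    """
    clock_start = 0.0
    for t in sorted(press_times):
        if t - clock_start >= pause_s:
            break  # the pause was already completed before this press
        clock_start = t  # early press: restart the clock
    return clock_start + pause_s


print(drl_phase_end([]))           # 30.0  (rat waits from the start)
print(drl_phase_end([5, 20, 45]))  # 75.0  (three early presses, three restarts)
```

Every early press pushes the return to the FR 10 condition further away, so persistent pressing keeps the rat in the tone period.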
The rats were trained with food as the
reinforcer but then ESB was substituted. Whether Food or ESB would be earned
during the light-on period was signalled.
With continued training, the rats came
to respond vigorously to obtain the ESB reinforcers but ceased (!) to respond
to earn the food reinforcers. Negative contrast occurred: The food LOST
its ability to reinforce bar pressing when it alternated with ESB. Thus
the power of the food to reinforce depended on the context (behavioral
contrast, negative contrast).
Barker notes that following the first
few ESBs, the rats' performance during the DRL was disrupted (excitement?).
With continued training, however, the rats again performed well during
the DRL condition.
An extreme example of the power of ESB
is shown in the Routtenberg & Lindy (1965) study where rats had two
levers to press. One produced ESB and the other produced their daily food
ration. They spent their time pressing for ESB and some died of starvation!
Thus reinforcers are not always adaptive.
Harriman studied rats whose adrenal
glands had been removed (adrenalectomized). These rats excrete too much salt and
therefore need to eat extra salt to stay alive. They will do so EXCEPT
when a very sweet, sugary diet is also available. In that case some rats
switched to the sugary diet and died of salt depletion.
Conclusion: We often choose wisely between
alternatives -- selecting the best option. We and other animals, however,
do not always choose what is best for us. One way to look at this is that
evolution has not prepared us to choose wisely between certain alternatives.
Social Reinforcement Theory -- Albert
Bandura. I see three themes woven into this viewpoint.
Many of our powerful reinforcers involve
interactions with other individuals -- i.e., they are social. Are these
Primary reinforcers? Acquired (secondary) reinforcers? "Social approval"
is clearly (a) very powerful for most individuals but (b) not universal
(e.g., the sociopath). Perhaps social reinforcers are something in between
primary and secondary.
Imitation (i.e., observational learning)
is a very common way to encourage someone else to emit a behavior. "Please,
do this..." Once emitted, the behavior can then be influenced by its consequences.
Vicarious reinforcement and punishment
(i.e., seeing the outcome of someone else's behavior) influences our behavior.
If they are reinforced, we also can be reinforced... when they are punished,
we too are punished. (For enemies this may be reversed.)
D. Issues of Behavioral Control
Note: Though these seem less "biological"
at one level, I would propose that they also involve activities of the
nervous system -- and are therefore just as biological as other reinforcers.
Skinner's Viewpoint: The current context
and your past reinforcement history in contexts like this determine what
you will do.
Discriminative Stimuli (S+, S-) provide
Stimulus Control since in the past they have predicted operant response
contingencies (reinforcement, punishment).
Discriminated operant -- operant behavior
under stimulus control.
Verbal statements others make can be discriminative
stimuli (e.g., "No! You can't have it.").
Occasion setting is a higher-order kind
of stimulus control. Another name for this is "instructional control."
Such an instructional stimulus determines the effect of other stimuli --
e.g., "The presence of the light means that the tone is an S+; the absence
of the light means that the tone is an S-." The context can take
on the role of an occasion setter.
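The light-and-tone example can be written as a two-level rule: the light does not call for a response itself, it decides what the tone means. A minimal sketch with hypothetical function names:

```python
def tone_role(light_on: bool) -> str:
    """The occasion setter (light) determines whether the tone is an S+ or an S-."""
    return "S+" if light_on else "S-"


def response_reinforced(light_on: bool, tone_on: bool, responded: bool) -> bool:
    """A response pays off only when the tone is present AND currently an S+."""
    return responded and tone_on and tone_role(light_on) == "S+"


print(response_reinforced(light_on=True, tone_on=True, responded=True))   # True
print(response_reinforced(light_on=False, tone_on=True, responded=True))  # False
```

The same tone and the same response lead to different outcomes depending on the light, which is what makes this a higher-order kind of stimulus control.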
Sequences of behaviors that accomplish
a goal can be considered to be a chain of operants, with each step providing
a discriminative stimulus for the next response. For example, tying
Backwards chaining: Training a chain often
goes better if you start at the end and add successive steps backwards.
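The order of training in backwards chaining can be laid out mechanically: each stage practices the already-trained end of the chain plus one new earlier step. The function name and the example steps below are hypothetical.

```python
def backward_chaining_stages(chain):
    """Stage k trains the last k steps of the chain, in their normal order."""
    return [chain[-k:] for k in range(1, len(chain) + 1)]


# Hypothetical three-step chain ending in the reinforcer.
steps = ["pull cord", "press lever", "eat from food cup"]
for stage in backward_chaining_stages(steps):
    print(stage)
# ['eat from food cup']
# ['press lever', 'eat from food cup']
# ['pull cord', 'press lever', 'eat from food cup']
```

Each newly added step leads straight into a sequence that already ends in reinforcement, so the learner is never far from a familiar discriminative stimulus.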
A conversation also can be an example
where each person makes statements that have the role of discriminative
stimuli that influence the verbal statements of the other person.
Since each statement provides discriminative stimuli for the next response,
this can be considered to be a chain of events.