The two important components in determining learning in the Clavier model are the model's response function and the schedule's reinforcement function. Mathematically, we can designate these as follows. First, the response:
Output Rule ($y_i$ indicates whether unit $i$ bursts):
$$y_i = \begin{cases} 1 & \text{iff } x_i > \theta_i \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
where $x_i \sim U[0,1]$ and $0 < \theta_i < 1$.
Each emitter unit, $i$, consists of a threshold, $\theta_i$, and a random variable, $x_i$, drawn from a Uniform distribution over the unit interval. An emitter unit signals whenever the value of the random variable exceeds the threshold. Otherwise, there is no signal. The output variable, $y_i$, indicates whether unit $i$ signals ($y_i = 1$) or not ($y_i = 0$).
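The output rule can be sketched in a few lines of Python. This is only an illustration: the function name `emit`, the list representation of the thresholds, and the example threshold values are assumptions made here, not part of the original model.

```python
import random

def emit(thresholds, rng=random):
    """Return one output per unit: 1 if a fresh uniform draw exceeds the threshold, else 0."""
    outputs = []
    for theta in thresholds:
        x = rng.random()                      # x_i drawn uniformly from [0, 1)
        outputs.append(1 if x > theta else 0)
    return outputs

# Hypothetical thresholds: a low threshold signals often, a high one rarely.
example_thresholds = [0.1, 0.5, 0.9]
example_outputs = emit(example_thresholds, random.Random(0))
```

Because the draw is uniform, a unit whose threshold is 0.9 signals on roughly 10% of trials, and one whose threshold is 0.1 on roughly 90%.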
Next, reinforcement:
$$\rho(t) = \begin{cases} 1 & \text{iff } +S \text{ is delivered at time } t \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
The variable, $\rho$, indicates whether or not the reinforcement schedule delivers a reinforcer at a given time, $t$. The equation determining the specific function for $\rho$ will, of course, depend upon the reinforcement schedule used. For the percentile reinforcement schedule, with parameters $p$ and $m$, Platt (1973) gives us the following:
Let $Y(t)$ be the network's output at time $t$, consisting of the set of all the unit outputs at that time.
$$Y(t) = \{\, y_i(t) \,\}, \quad t \in \mathbb{N} \tag{3.1}$$
where $\mathbb{N}$ is the set of natural numbers (i.e., the non-negative integers) and $t = 0$ indicates the start of the simulation.
We assume that there is an ordinal measure of responding, $v$, such that $v(Y)$ reaches a maximum when the response output matches some target criterion towards which the shaping is directed. In the case of the Police Artist simulation,
$$v(Y) = n - h(Y_o, Y) \tag{3.2}$$
where $n$ is the number of units in the network, and $h$ is the Hamming distance between the target, $Y_o$, and the response output, $Y$.
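As an illustration of the Police Artist measure, here is a small Python sketch. The function names and the example patterns are invented for this example; outputs are assumed to be equal-length lists of 0s and 1s.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(1 for u, w in zip(a, b) if u != w)

def response_value(target, output):
    """v(Y) = n - h(Yo, Y): the number of units whose output matches the target."""
    return len(target) - hamming(target, output)

target = [1, 0, 1, 1]                               # hypothetical target pattern
partial = response_value(target, [1, 1, 1, 0])      # two units match, two differ
perfect = response_value(target, target)            # every unit matches
```

The measure peaks at n when the output reproduces the target exactly and drops by one for every unit that disagrees with it.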
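Let $z$ be the count of all responses amongst the last $m$ preceding the current response that have lower values than the current response:
$$z(t) = c(Z(t)) \tag{4.1}$$
$$Z(t) = \{\, Y(t^*) \mid v(Y(t^*)) < v(Y(t)) \,\} \tag{4.2}$$
$$(t - 1) \ge t^* \ge (t - m) \tag{4.3}$$
where $c(\cdot)$ counts the members of $Z(t)$, one for each qualifying trial $t^*$.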
Reinforcement is delivered according to the following rule:
$$\Pr(\rho(t) = 1) = \begin{cases} 1 & \text{iff } z > k_1 \\ 1 - (k - k_1) & \text{iff } z = k_1 \\ 0 & \text{otherwise} \end{cases} \tag{5.1}$$
where
$$k = (m + 1)(1 - p) \tag{5.2}$$
and
$$k_1 = \mathrm{int}(k) \tag{5.3}$$
(that is, $k_1$ is the largest integer less than or equal to $k$).
This rule ensures that the overall probability of reinforcement is $p$ and that the probability of reinforcement is determined with respect to the last $m$ responses, evaluated in terms of the measure according to which the behavior is to be shaped.
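The percentile rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than Platt's or the model's actual code: the function names are invented here, the rank z is assumed to be tie-free and equally likely to take any value from 0 to m, and the probability used at z = k1 is 1 - (k - k1), which is the value that makes the average rate come out at p under that assumption.

```python
import math

def count_lower(current_value, recent_values):
    """z: how many of the recent response values are strictly below the current one."""
    return sum(1 for v in recent_values if v < current_value)

def reinforcement_probability(z, m, p):
    """Probability of reinforcement for rank z under a percentile schedule (m, p).

    Assumed reading: certain reinforcement above k1, probability 1 - (k - k1)
    exactly at k1, and none below.
    """
    k = (m + 1) * (1 - p)
    k1 = math.floor(k)            # largest integer less than or equal to k
    if z > k1:
        return 1.0
    if z == k1:
        return 1.0 - (k - k1)
    return 0.0

def overall_rate(m, p):
    """Average reinforcement probability when z is equally likely to be 0, 1, ..., m."""
    return sum(reinforcement_probability(z, m, p) for z in range(m + 1)) / (m + 1)
```

For example, with m = 10 and p = 0.25 we get k = 8.25 and k1 = 8, so ranks 9 and 10 are always reinforced, rank 8 is reinforced with probability 0.75, and the average over the eleven equally likely ranks is 2.75 / 11 = 0.25. When many responses tie on the measure, as can happen with a Hamming-distance score, the ranks are no longer equally likely and the realised rate may differ from p.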
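With the rules for responding and reinforcement determined, we can specify the learning function, that is, how the threshold for each unit, $\theta_i$, changes over time.
$$\theta_i(t+1) = \theta_i(t) + \left[\, \beta\,\bar{\rho}(t) - \alpha\,\rho(t) \,\right] y_i(t)\,\delta_i(t) \tag{6.1}$$
where $y_i$ and $\rho(t)$ are defined in Equ.s (1) and (2) above, and
$$\bar{\rho}(t) = 1 - \rho(t) \tag{6.2}$$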
The constant, $\alpha$, specifies the relative size of the decrease in threshold for each reinforced signal and the constant, $\beta$, specifies the relative size of the increase in threshold with each non-reinforced signal. In order to tune the network to learn with maximum efficiency, given a percentile reinforcement schedule with parameter, $p$, set
$$\beta / \alpha = p / (1 - p) \tag{7.1}$$
One simple implementation is to specify a stepsize parameter, $\eta$, and set $\alpha$ and $\beta$ accordingly, viz.,
$$\alpha = (1 - p)\,\eta \tag{7.2}$$
$$\beta = p\,\eta \tag{7.3}$$
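A short worked example may make the tuning rule concrete. Writing $\alpha$ for the decrease constant, $\beta$ for the increase constant, and $\eta$ for the step size, and taking the illustrative values $p = 0.25$ and $\eta = 0.1$ (chosen here for the arithmetic, not taken from the model):

```latex
\begin{align*}
\alpha &= (1 - p)\,\eta = 0.75 \times 0.1 = 0.075, \\
\beta  &= p\,\eta = 0.25 \times 0.1 = 0.025, \\
\frac{\beta}{\alpha} &= \frac{0.025}{0.075} = \frac{1}{3} = \frac{p}{1 - p}, \\
p\,\alpha &= (1 - p)\,\beta = 0.01875 .
\end{align*}
```

The last line shows why this ratio is a natural choice: if a signal is reinforced with probability $p$, the expected downward step ($p\,\alpha$) and the expected upward step ($(1 - p)\,\beta$) cancel, so the threshold has no systematic drift when the unit is reinforced at exactly the scheduled rate.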
Finally, $\delta_i$ is a damping factor à la Bush & Mosteller (1951) that prevents the threshold from moving outside the (0,1) interval[1]:
$$\delta_i = \min(\theta_i,\; 1 - \theta_i) \tag{8.1}$$
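Putting the pieces together, one learning step for a single unit might look like the Python sketch below. It rests on assumptions that should be read as such: the threshold is taken to change only on trials where the unit signals, moving down by a fraction of the damping factor after a reinforced signal and up after a non-reinforced one, and the two constants are derived from a step size as (1 - p) times the step and p times the step. The function and parameter names are invented for this example.

```python
def damping(theta):
    """Equ. (8.1): distance from the threshold to the nearer end of the unit interval."""
    return min(theta, 1.0 - theta)

def update_threshold(theta, y, reinforced, p, step):
    """One learning step for a single unit (assumed form of the learning function).

    theta: current threshold in (0, 1); y: 1 if the unit signalled, else 0;
    reinforced: 1 if a reinforcer was delivered, else 0;
    p: percentile-schedule parameter; step: step-size parameter.
    """
    alpha = (1.0 - p) * step      # relative decrease after a reinforced signal
    beta = p * step               # relative increase after a non-reinforced signal
    change = beta * (1 - reinforced) - alpha * reinforced
    return theta + y * change * damping(theta)

lowered = update_threshold(0.5, 1, 1, 0.25, 0.1)   # reinforced signal
raised = update_threshold(0.5, 1, 0, 0.25, 0.1)    # non-reinforced signal
```

Because each step moves the threshold by only a fraction of its distance to the nearer boundary, repeated updates approach 0 or 1 without reaching them, provided both constants are below 1.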
Substituting a truncation rule for the damping factor allows the use of a
linear learning rule, but the damping factor appears to improve speed. In
either case, the learning rule appears to generate unstable performance, at
least for the Police Artist task. Other damping factors are also possible. (A
neurologically plausible damping factor would be a decided improvement.) Any
damping factor should be symmetrical about a maximum at .5 and should not exceed the value of $\delta_i$ given above in Equ. (8.1) at any point.
Another possible damping factor would be
$$\delta_i = \theta_i\,(1 - \theta_i) \tag{8.2}$$
This factor has the advantage of being both smooth and differentiable.
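The two damping factors can be compared numerically. The Python sketch below, with function names and an evaluation grid chosen purely for illustration, checks on a grid of thresholds that the product form never exceeds the min form, is symmetrical about .5, and peaks there.

```python
def damping_min(theta):
    """Equ. (8.1): min(theta, 1 - theta)."""
    return min(theta, 1.0 - theta)

def damping_product(theta):
    """Equ. (8.2): theta * (1 - theta)."""
    return theta * (1.0 - theta)

# Evaluate both factors on a grid of thresholds strictly inside (0, 1).
grid = [i / 100 for i in range(1, 100)]
never_exceeds = all(damping_product(t) <= damping_min(t) for t in grid)
symmetric = all(abs(damping_product(t) - damping_product(1.0 - t)) < 1e-12 for t in grid)
peak = max(grid, key=damping_product)
```

The product form reaches only 0.25 at its peak, against 0.5 for the min form, so for the same constants it produces smaller steps everywhere. The min form has a kink at .5, which is the point where it fails to be differentiable.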
Bush, R. R. & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313-323.
Platt, J. R. (1973). Percentile reinforcement: Paradigms for experimental analysis of response shaping. In G. H. Bower (Ed.), The Psychology of Learning and Motivation, Volume VII: Advances in research and theory (pp. 271-296). New York: Academic Press.