Activationist Learning
in Selectionist Neural Networks:
Computational models of R-S learning

Appendix C:
Algorithms for Clavier and Police Artist.

Steven M. Kemp

University of North Carolina
at Chapel Hill




The two components that determine learning in the Clavier model are the model's response function and the schedule's reinforcement function. Mathematically, they can be written as follows. First, the response:

Output Rule ($y_i$ indicates whether unit $i$ bursts):

$$
y_i = \begin{cases} 1 & \text{iff } x_i > \tau_i \\ 0 & \text{otherwise} \end{cases} \tag{1}
$$

where $x_i \sim U[0,1]$ and $0 < \tau_i < 1$.

Each emitter unit, $i$, consists of a threshold, $\tau_i$, and a random variable, $x_i$, drawn from a uniform distribution over the unit interval. The unit signals whenever the value of the random variable exceeds the threshold; otherwise, there is no signal. The output variable, $y_i$, indicates whether unit $i$ signals ($y_i = 1$) or not ($y_i = 0$). Since $\Pr(x_i > \tau_i) = 1 - \tau_i$, lowering a unit's threshold raises its probability of signaling.
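The output rule can be sketched in a few lines of Python. This is a minimal illustration, not the original simulation code: it assumes the network is simply a list of thresholds, and the function name `emit` and the seeded `random.Random` generator are hypothetical choices made for reproducibility. Python's `random()` draws from the half-open interval [0, 1), which differs from $U[0,1]$ only on an event of probability zero.

```python
import random


def emit(thresholds, rng):
    """Output rule, Eq. (1): unit i bursts (y_i = 1) iff x_i > tau_i.

    `thresholds` is a list of tau_i values in (0, 1); a fresh uniform
    draw x_i is taken for every unit on every call.
    """
    outputs = []
    for tau in thresholds:
        x = rng.random()                 # x_i drawn uniformly from [0, 1)
        outputs.append(1 if x > tau else 0)
    return outputs


rng = random.Random(0)                   # seeded only for reproducibility
Y = emit([0.2, 0.5, 0.9], rng)           # one output vector Y(t)
print(Y)
```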

Next, reinforcement:

$$
\omega(t) = \begin{cases} 1 & \text{iff } +S \text{ is delivered at time } t \\ 0 & \text{otherwise} \end{cases} \tag{2}
$$

The variable $\omega$ indicates whether or not the reinforcement schedule delivers a reinforcer at a given time, $t$. The specific function for $\omega$ will, of course, depend on the reinforcement schedule used. For the percentile reinforcement schedule, with parameters $p$ and $m$, Platt (1973) gives the following:

Let $Y(t)$ be the network's output at time $t$, consisting of the outputs of all $n$ units at that time:

$$
Y(t) = \{\, y_i(t) \mid i = 1, \dots, n \,\}, \qquad t \in \mathbb{N} \tag{3.1}
$$

where $\mathbb{N}$ is the set of natural numbers (here, the non-negative integers) and $t = 0$ indicates the start of the simulation.

We assume that there is an ordinal measure of responding, $v$, such that $v(Y)$ reaches a maximum when the response output matches some target criterion toward which the shaping is directed. In the case of the Police Artist simulation,

$$
v(Y) = n - h(Y_o, Y) \tag{3.2}
$$

where $n$ is the number of units in the network and $h$ is the Hamming distance between the target, $Y_o$, and the response output, $Y$. The maximum, $v = n$, is reached exactly when every unit's output matches the target.
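A sketch of the Police Artist measure, assuming that responses and the target are equal-length lists of 0/1 unit outputs; `hamming` and `v` are illustrative names, not taken from the original code.

```python
def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def v(Y, target):
    """Police Artist response measure, Eq. (3.2): v(Y) = n - h(Y_o, Y)."""
    return len(target) - hamming(target, Y)


target = [1, 0, 1, 1]
print(v([1, 0, 1, 1], target))   # 4: exact match, the maximum
print(v([0, 0, 1, 0], target))   # 2: two units differ from the target
```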

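Let $z$ be the number of responses, among the $m$ responses immediately preceding the current one, whose value of $v$ is lower than that of the current response:

$$
z(t) = c\big(Z(t)\big) \tag{4.1}
$$

$$
Z(t) = \{\, Y(t^*) \mid v(Y(t^*)) < v(Y(t)) \,\} \tag{4.2}
$$

$$
t - m \le t^* \le t - 1 \tag{4.3}
$$

where $c(\cdot)$ counts the responses in $Z(t)$, one for each qualifying time $t^*$.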
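The count in Eqs. (4.1)-(4.3) can be computed directly from a record of past values of $v$. In this sketch (hypothetical names), `values[s]` stores $v(Y(s))$, and ties are not counted because Eq. (4.2) uses a strict inequality. How the original simulation handled the first $m$ time steps is not stated, so the truncated window used here is an assumption.

```python
def count_lower(values, t, m):
    """z(t), Eqs. (4.1)-(4.3): how many of the m responses preceding time t
    have a strictly lower value of v than the response at time t.

    `values[s]` holds v(Y(s)) for s = 0, 1, ..., t. Near the start of a run
    fewer than m earlier responses exist; this sketch (an assumption) simply
    uses those that do.
    """
    current = values[t]
    window = values[max(0, t - m):t]     # times t-m .. t-1
    return sum(1 for w in window if w < current)


values = [2, 5, 3, 5, 4]
print(count_lower(values, 4, 3))         # window [5, 3, 5]: only 3 < 4, so 1
```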



Reinforcement is delivered according to the following rule:

$$
\Pr(\omega = 1) = \begin{cases} 1 & \text{iff } z > k_1 \\ 1 - (k - k_1) & \text{iff } z = k_1 \\ 0 & \text{otherwise} \end{cases} \tag{5.1}
$$

where

$$
k = (m + 1)(1 - p) \tag{5.2}
$$

and

$$
k_1 = \lfloor k \rfloor \tag{5.3}
$$

(that is, $k_1$ is the largest integer less than or equal to $k$).

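Provided responses vary at random, so that the current response is equally likely to hold any rank among itself and the $m$ preceding responses (that is, $z$ is uniform on $0, \dots, m$, ignoring ties), this rule ensures that the overall probability of reinforcement is $p$. Whether a given response is reinforced is determined with respect to the last $m$ responses, evaluated on the measure $v$ according to which the behavior is to be shaped.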
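The percentile rule can be sketched as a function of $z$. This is an illustration under stated assumptions, with hypothetical function names: a response at the boundary rank, $z = k_1$, is reinforced with probability $1 - (k - k_1)$, which is the value that makes the long-run reinforcement rate come out to exactly $p$ when $z$ is uniform on $0, \dots, m$ (random responding, no ties). The last lines check that average.

```python
import math
import random


def reinforcement_probability(z, m, p):
    """Pr(omega = 1) under the percentile rule, Eqs. (5.1)-(5.3)."""
    k = (m + 1) * (1 - p)        # Eq. (5.2)
    k1 = math.floor(k)           # Eq. (5.3): largest integer <= k
    if z > k1:
        return 1.0
    if z == k1:
        return 1.0 - (k - k1)    # partial reinforcement at the boundary rank
    return 0.0


def reinforce(z, m, p, rng):
    """Sample omega(t) in {0, 1} for a response with count z."""
    return 1 if rng.random() < reinforcement_probability(z, m, p) else 0


# Under random responding z is uniform on 0..m, so this mean equals p.
m, p = 9, 0.25
print(sum(reinforcement_probability(z, m, p) for z in range(m + 1)) / (m + 1))  # 0.25
```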

With the rules for responding and reinforcement determined, we can specify the learning function, that is, how the threshold for each unit, $\tau_i$, changes over time.

$$
\begin{aligned}
\tau_i(t+1) &= \tau_i(t) + \Delta\tau_i(t) \\
            &= \tau_i(t) + y_i(t)\,\big[-\omega(t)\,\lambda + \bar{\omega}(t)\,\delta\big]\,\eta_i(t)
\end{aligned}
\tag{6.1}
$$

where $y_i$ and $\omega(t)$ are defined in Eqs. (1) and (2) above, and

$$
\bar{\omega}(t) = 1 - \omega(t) \tag{6.2}
$$
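One step of the learning rule for a single unit can be sketched as follows. The sketch assumes $\lambda$ and $\delta$ are positive magnitudes, with the sign supplied by the rule so that a reinforced signal lowers the threshold and a non-reinforced signal raises it, as the text describes for these constants. It uses the damping factor of Eq. (8.1), defined later in this appendix; the function names are illustrative.

```python
def eta_min(tau):
    """Damping factor of Eq. (8.1)."""
    return min(tau, 1 - tau)


def update_threshold(tau, y, omega, lam, delta, eta=eta_min):
    """One step of the learning rule, Eqs. (6.1)-(6.2), for one unit.

    A reinforced burst (y = 1, omega = 1) lowers tau by lam * eta(tau);
    an unreinforced burst (y = 1, omega = 0) raises it by delta * eta(tau);
    a silent unit (y = 0) keeps its threshold.
    """
    omega_bar = 1 - omega                                   # Eq. (6.2)
    change = y * (-omega * lam + omega_bar * delta) * eta(tau)
    return tau + change


print(update_threshold(0.5, 1, 1, 0.1, 0.1))   # 0.45: lowered by 0.1 * 0.5
```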

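The constant $\lambda$ specifies the relative size of the decrease in threshold with each reinforced signal, and the constant $\delta$ specifies the relative size of the increase in threshold with each non-reinforced signal. To tune the network to learn with maximum efficiency under a percentile reinforcement schedule with parameter $p$, set

$$
\frac{\delta}{\lambda} = \frac{p}{1 - p} \tag{7.1}
$$

With this ratio, $p\lambda = (1 - p)\delta$, so a unit whose signals are reinforced at exactly the base rate $p$ has zero expected change in threshold; only units whose signals are reinforced more often than $p$ drift downward (signaling more), and those reinforced less often drift upward.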

One simple implementation is to specify a step-size parameter, $\sigma$, and set $\lambda$ and $\delta$ accordingly, viz.,

$$
\lambda = (1 - p)\,\sigma \tag{7.2}
$$

$$
\delta = p\,\sigma \tag{7.3}
$$
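A quick numerical check of Eqs. (7.2)-(7.3), using the hypothetical values $p = 0.25$ and $\sigma = 0.1$. The damping factor $\eta_i$ multiplies both terms of the update, so it drops out of the balance between decreases and increases and is omitted here; `step_sizes` is an illustrative name.

```python
def step_sizes(p, sigma):
    """Eqs. (7.2)-(7.3): derive lam (decrease) and delta (increase) from sigma."""
    lam = (1 - p) * sigma
    delta = p * sigma
    return lam, delta


p, sigma = 0.25, 0.1
lam, delta = step_sizes(p, sigma)
# Expected per-signal drift (in units of eta_i) for a unit whose signals are
# reinforced with probability exactly p: a decrease of lam with probability p,
# an increase of delta with probability 1 - p.
drift = -p * lam + (1 - p) * delta
print(lam, delta, drift)   # drift is zero up to floating-point rounding
```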

Finally, $\eta_i$ is a damping factor, à la Bush & Mosteller (1951), that prevents the threshold from moving outside the interval $(0, 1)$[1]:

$$
\eta_i = \min(\tau_i,\ 1 - \tau_i) \tag{8.1}
$$

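Substituting a truncation rule for the damping factor allows the use of a linear learning rule, but the damping factor appears to improve speed. In either case, the learning rule appears to generate unstable performance, at least on the Police Artist task. Other damping factors are also possible. (A neurologically plausible damping factor would be a decided improvement.) Any damping factor should be symmetric about $\tau_i = 0.5$, where it attains its maximum, and should nowhere exceed the value of $\eta_i$ given in Eq. (8.1). That bound is what keeps the threshold inside $(0, 1)$: provided $\lambda, \delta < 1$, a reinforced signal lowers $\tau_i$ by at most $\lambda\tau_i < \tau_i$, and a non-reinforced signal raises it by at most $\delta(1 - \tau_i) < 1 - \tau_i$.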
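To see how the pieces fit together, here is a hypothetical end-to-end sketch of Clavier on a Police Artist target, for illustration only. It is not the author's simulation: the initial thresholds of 0.5, the default parameters, the handling of the first $m$ steps, and the seeded generator are all assumptions, and the boundary reinforcement probability at $z = k_1$ is taken as $1 - (k - k_1)$, which yields an overall rate of $p$ under random responding. It uses the damping factor of Eq. (8.1).

```python
import math
import random


def simulate_police_artist(target, p=0.25, m=9, sigma=0.1, steps=2000, seed=0):
    """Hypothetical end-to-end Clavier run on a Police Artist target.

    Combines Eqs. (1), (3.2), (4.1)-(4.3), (5.1)-(5.3), (6.1)-(6.2),
    (7.2)-(7.3) and (8.1). Returns the final thresholds and the history
    of v(Y(t)). Initial thresholds of 0.5 and all defaults are assumptions.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    rng = random.Random(seed)
    n = len(target)
    tau = [0.5] * n
    lam, delta = (1 - p) * sigma, p * sigma          # Eqs. (7.2)-(7.3)
    k = (m + 1) * (1 - p)                            # Eq. (5.2)
    k1 = math.floor(k)                               # Eq. (5.3)
    history = []
    for _ in range(steps):
        # Eq. (1): unit i bursts iff a fresh uniform draw exceeds tau_i.
        y = [1 if rng.random() > tau_i else 0 for tau_i in tau]
        # Eq. (3.2): v = n minus the Hamming distance to the target.
        value = n - sum(1 for a, b in zip(target, y) if a != b)
        # Eq. (4.1): responses among the last m with lower v. Early in the
        # run fewer than m exist; this sketch just uses what is available.
        z = sum(1 for w in history[-m:] if w < value)
        # Eq. (5.1), with boundary probability 1 - (k - k1) at z == k1.
        if z > k1:
            prob = 1.0
        elif z == k1:
            prob = 1.0 - (k - k1)
        else:
            prob = 0.0
        omega = 1 if rng.random() < prob else 0
        # Eq. (6.1) with the damping factor of Eq. (8.1).
        for i in range(n):
            eta = min(tau[i], 1 - tau[i])
            tau[i] += y[i] * (-omega * lam + (1 - omega) * delta) * eta
        history.append(value)
    return tau, history


target = [1, 0] * 8                  # hypothetical 16-unit target pattern
tau, history = simulate_police_artist(target)
print(sum(history[:200]) / 200, sum(history[-200:]) / 200)
```

Comparing mean $v$ early and late in a run gives a rough view of learning; given the unstable performance reported above, the sketch makes no claim that $v$ rises reliably.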

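Another possible damping factor would be

$$
\eta_i = \tau_i\,(1 - \tau_i) \tag{8.2}
$$

This factor has the advantage of being smooth: unlike Eq. (8.1), which has a kink at $\tau_i = 0.5$, it is differentiable everywhere. It also meets the requirements stated above, since $\tau_i(1 - \tau_i) \le \min(\tau_i, 1 - \tau_i)$ and it is symmetric about its maximum of 0.25 at $\tau_i = 0.5$.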
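The two damping factors can be compared numerically. This sketch (illustrative function names) checks on a grid that Eq. (8.2) never exceeds Eq. (8.1), and shows that, because each factor is at most $\tau_i$, even a long run of reinforced signals with a large hypothetical step $\lambda = 0.9$ cannot push the threshold to zero.

```python
def eta_min(tau):
    """Eq. (8.1): piecewise-linear damping with a kink at tau = 0.5."""
    return min(tau, 1 - tau)


def eta_product(tau):
    """Eq. (8.2): smooth damping with maximum 0.25 at tau = 0.5."""
    return tau * (1 - tau)


def repeated_decrease(tau, lam, eta, steps):
    """Threshold after `steps` reinforced bursts in a row (no increases)."""
    for _ in range(steps):
        tau -= lam * eta(tau)
    return tau


grid = [i / 1000 for i in range(1, 1000)]
# Eq. (8.2) never exceeds Eq. (8.1) anywhere on the grid.
print(all(eta_product(t) <= eta_min(t) for t in grid))    # True
# Even a large step (lam = 0.9) leaves tau positive under either factor.
print(repeated_decrease(0.5, 0.9, eta_min, 50) > 0)        # True
print(repeated_decrease(0.5, 0.9, eta_product, 50) > 0)    # True
```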


References

Bush, R. R. & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313-323.

Platt, J. R. (1973). Percentile reinforcement: Paradigms for experimental analysis of response shaping. In G. H. Bower (Ed.), The Psychology of Learning and Motivation, Volume VII: Advances in research and theory (pp. 271-296). New York: Academic Press.



[1]Thanks to R. Duncan Luce for suggesting this solution for controlling the range of tau.

