
Activationist Learning in Selectionist Neural Networks:
Computational models of R-S learning

Steven M. Kemp

University of North Carolina
at Chapel Hill


Abstract

The first example of a novel type of neural network with a learning algorithm based upon the neurophysiological research of In-Vitro Reinforcement is presented and described in detail. Various tasks already demonstrated by the network or proposed for future research are described. Three architectures are offered: a basic model called Clavier capable of being shaped to search spaces of binary strings, a smoothed version called Orchestrion capable of searching spaces of integer vectors, and a damped version called Piano-Forte capable of pattern recognition. Details are offered for all three models.





This document is intended to describe ongoing research into a very novel type of artificial neural network. In order to reach the widest possible audience, both background tutorials and technical appendices are provided. There is also a comprehensive glossary of terms. At present, this paper is only available as hypertext. The printed version, downloadable as PostScript or RTF, will be available soon. The hypertext version uses html and can be read using any graphics-enabled http-capable browser such as Netscape Navigator. A non-graphical browser such as lynx will give you access to some, but not all, of the document. All the files for the hypertext version are contained in the directory named "ActLearn". The hypertext can be read online, or all of the files from that directory can be downloaded together.

This is a working document. If you find that anything is unclear or incomplete, please email me at steve_kemp@unc.edu with comments and I will see what I can do. Thanks.

There are several novel features to the neural networks discussed here. Novel aspects of simulations already completed include:

Novel aspects of the proposed research include:

The remainder of this document will include descriptions of:

  1. the artificial neural network currently implemented, relating it to the relevant neurophysiological, computational, and psychological research.
  2. completed research on shaping the network in two search tasks: simple search of a space of binary strings and a variant of the "lawn-mower problem," a standard task for evaluating Genetic Algorithm models (Koza, 1994).
  3. ongoing research into (a) producing stable convergence on standard computational search tasks and (b) performance on additional cognitive tasks, including video game play and mazes.
  4. proposed research to simulate the use of the method of successive approximations as commonly used to train hearing-impaired persons to improve phonemic pronunciation.
  5. proposed research to simulate standard categorization/pattern recognition tasks of the sort well-handled by LTP-based neural networks such as backpropagation nets in order to evaluate Rescorla's hierarchical model of stimuli modulating response-outcome relations for complex cognitive tasks.

Artificial Neural Networks (ANNs) are computer programs whose structure resembles brains and whose function resembles mind and/or behavior. At the level of the computer code, the elements of the program, called units and connections, resemble neurons and their synaptic interconnections. At the level of the inputs and outputs, the goal of an artificial neural network is to simulate or emulate the performance of tasks of which animals or humans are capable.

Many neural networks are adaptive, that is, they are designed to learn new tasks. This means that some aspect of the structure of the network must alter over time during training. Like other aspects of the system, the learning algorithm finds its basis in neurophysiology. In the case of the activationist neural networks described here, the specific neurophysiological phenomenon that provides the basis for the learning algorithm is In-Vitro Reinforcement (IVR).

In the neurophysiological laboratory, IVR has been demonstrated only at the level of individual neurons. Activationist neural networks demonstrate that interconnected systems of individual units, each of which adapts according to a logic analogous to IVR, can learn to perform complex tasks. Thus, the biology of IVR is linked to the psychology of learning.

Details of IVR and a proposed molecular mechanism as to how it may work are given in Appendix B. Details of the learning algorithm based on IVR are given in Appendix C.


Completed Research

At present, only one of the three planned neural networks is operational. This first, simplest network model is called Clavier. Clavier consists of a rectangular grid of units representing the neurons. These units are not interconnected. The output of each unit is either zero or one on each cycle. Thus, the output of the entire network is a rectangular array of ones and zeros, usually displayed on the screen as black and white squares. This allows the network to represent any binary sequence with length equal to the number of units.


Successful Search

The usual task for Clavier is a search task. A particular binary sequence is selected by the user and the network, guided by a training system, learns to duplicate the target sequence. The only information provided to the network by the training system is a single binary value, one for reinforcement (reward) and zero for no reinforcement (non-reward). On each cycle, the training system examines the output of the network, compares it to the target, and calculates whether or not to reinforce the entire network on that cycle.

This architecture is distinct from most neural networks in that there is no input layer. Other than the reinforcement, there is no stimulus information provided to the network. The simulation is called Police Artist, because the task for the neural network is to draw a picture of a target it never sees.

Each unit in the Clavier network, called an emitter unit, is a variant on the standard McCulloch-Pitts cell with a threshold between zero and one and a random activation, also between zero and one. If the activation exceeds the threshold, a signal (one) is output by that emitter unit. (When a unit emits a one, we say that the unit has fired. When it emits a zero, we say that it did not fire.) Depending upon which signal (one or zero) is output by that unit and what reinforcement signal (also one or zero) is provided to the entire network by the training system, the threshold of that unit is either raised or lowered by a calculated amount. This makes each unit in Clavier a Variable-Structure Stochastic Learning Automaton (VSLA).

For each unit, on each cycle, there are four possibilities. Either the unit signalled or it did not. Either the network was reinforced or it was not. The learning rule for altering the threshold can thus be specified by a two-by-two table (see Figure 1).


Output Rule ($y_i$ indicates whether unit $i$ bursts):

$$
y_i = \begin{cases} 1 & \text{iff } x_i > \tau_i \\ 0 & \text{otherwise} \end{cases}
\qquad x_i \sim U[0,1], \quad 0 < \tau_i < 1
$$

      where $x_i$ is the activation of unit $i$ and $\tau_i$ is its threshold.


Learning Rule ($\Delta\tau_i$ is the change in the threshold for unit $i$):

$$
\Delta\tau_i =
\begin{array}{c|cc}
 & S^{R+} & \sim S^{R+} \\ \hline
y_i & -\lambda & +\delta \\
\sim y_i & 0 & 0
\end{array}
$$

      where $\lambda$ is the learning increment (set to .01)

      and $\delta$ is the "decay" increment (set to .0043).


Figure 1. Algorithms for Clavier.
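
The rules in Figure 1 can be sketched in Python as follows. This is a minimal sketch rather than the program described in this paper: the function names and the clamping constant EPSILON are assumptions introduced here, the latter so that the threshold stays strictly between zero and one as the output rule requires.

```python
import random

LEARNING_INCREMENT = 0.01   # lambda in Figure 1
DECAY_INCREMENT = 0.0043    # delta in Figure 1
EPSILON = 1e-6              # assumption: keeps the threshold strictly inside (0, 1)


def emit(threshold, rng=random):
    """Output rule: draw x ~ U[0, 1] and fire (1) iff x exceeds the threshold."""
    activation = rng.random()
    return 1 if activation > threshold else 0


def update_threshold(threshold, fired, reinforced):
    """Learning rule: only a unit that fired changes its threshold."""
    if fired and reinforced:
        threshold -= LEARNING_INCREMENT   # firing becomes more likely
    elif fired:
        threshold += DECAY_INCREMENT      # firing becomes less likely
    # Assumption: clamp so that 0 < tau < 1 continues to hold.
    return min(1.0 - EPSILON, max(EPSILON, threshold))
```

A unit that did not fire keeps its threshold whatever the reinforcement signal, which corresponds to the bottom row of the table.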

Research thus far has demonstrated that the Clavier network can be trained to duplicate any binary sequence. Two ingredients make this possible. The first is a training system called percentile reinforcement (Platt, 1973; Galbicka, 1994), taken from research in training humans and animals to perform complex tasks via shaping (Skinner, 1951). The second is a learning rule derived from research on the In-Vitro Reinforcement of CA1 hippocampal neurons of the rat. Thus, the Clavier network has successfully linked a real training procedure from the psychological laboratory and clinic to a particular sort of neural plasticity found in the neurophysiological laboratory.

The search task proceeds as follows: On each cycle, Clavier outputs a string of ones and zeros that is matched to the black and white target pattern (see Figure 2). The difference between the two pictures is interpreted as a numerical distance (a.k.a. similarity). Because Clavier only accepts binary (yes/no) feedback on each cycle, the percentile reinforcement calculation is required to convert the latest distance measure into a binary signal by comparing how good or bad the current match is to the most recent matches on previous cycles. Clavier alters its thresholds according to the feedback and then outputs another pattern. The simulation stops when Clavier duplicates the target pattern exactly, successfully completing the search.
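
The conversion from a distance measure to a binary reinforcement signal can be illustrated with a short sketch. The exact percentile calculation is given in Appendix C and is not reproduced here; the rule below (reinforce when the current distance is strictly smaller than at least `must_beat` of the last `memory` distances) is an assumption chosen for illustration, as are all of the names.

```python
from collections import deque


def hamming_distance(output, target):
    """Number of positions at which two equal-length binary sequences differ."""
    return sum(o != t for o, t in zip(output, target))


class PercentileReinforcer:
    """Reinforce when the current distance beats enough of the recent ones.

    Assumptions: the last `memory` distances are remembered, and a cycle is
    reinforced when its distance is strictly smaller than at least `must_beat`
    of them. No cycle is reinforced until enough history has accumulated.
    """

    def __init__(self, memory=4, must_beat=3):
        self.recent = deque(maxlen=memory)
        self.must_beat = must_beat

    def reinforce(self, distance):
        beaten = sum(distance < past for past in self.recent)
        signal = 1 if beaten >= self.must_beat else 0
        self.recent.append(distance)   # the oldest distance drops out automatically
        return signal
```

Because the comparison is always against recent performance rather than against a fixed standard, the criterion for reinforcement tightens as the network improves.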

Over and above an initial demonstration that the Clavier network could duplicate various patterns selected by users, Monte Carlo tests were performed to evaluate how long it took Clavier to discover randomly selected target patterns. All targets were found, both by undamped and damped versions of the Clavier algorithm. The undamped version tended to be slower and tended to find patterns with more zeros than ones more easily than patterns with more ones than zeros.

Mathematical details of both the training procedure and the network are given in Appendix C. Details of the results of completed Police Artist simulations are given in Appendix D. A version of the program, rewritten in C, is available for downloading.


The Lawnmower Problem

Koza (1994) offers the lawnmower problem as a challenge to learning systems. Since the Clavier system has certain features similar to neural networks and other features similar to genetic programming systems and simulated annealing, simulations were conducted to determine if the Clavier system could be shaped, using percentile reinforcement, to a good solution to the lawnmower problem.

The Lawnmower problem consists of a programmable lawnmower and a rectangular lawn. In this version of the lawnmower problem, the programs that instruct the lawnmower consist of a sequence of two types of commands, TURN-LEFT (symbolized with a zero) and MOW-FORWARD (symbolized with a one). After all of the commands are executed in order, the lawnmower has mowed a certain proportion of the lawn.

The sequence of ones and zeros is generated by Clavier in the same way as in the search task. The amount of lawn mowed by the lawnmower using the program generated by Clavier determines the fitness measure of the outputs. The fitness measure is analogous to the measure of the similarity to the target in the Police Artist task. As before, the percentile reinforcement schedule is used to convert the scalar measure into a binary reinforcement signal. On each cycle, Clavier generates a new lawnmower program. The best program is saved.
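
A fitness measure of this kind might be computed as in the sketch below. Details of the task are given in Appendix E; the choices made here (a lawn that wraps around at its edges, a mower starting at the corner facing east, and a square counting as mowed only when the mower moves onto it) are assumptions for illustration, as is the function name.

```python
def mowed_fraction(program, width=8, height=8):
    """Run a 0/1 lawnmower program and return the fraction of the lawn mowed.

    Assumptions: the lawn wraps around at its edges, the mower starts at (0, 0)
    facing east, and a square is mowed only when the mower moves onto it.
    """
    # Headings in counter-clockwise order, so TURN-LEFT selects the next one.
    headings = [(1, 0), (0, -1), (-1, 0), (0, 1)]  # east, north, west, south
    x, y, facing = 0, 0, 0
    mowed = set()
    for command in program:
        if command == 0:                      # TURN-LEFT
            facing = (facing + 1) % 4
        else:                                 # MOW-FORWARD
            dx, dy = headings[facing]
            x, y = (x + dx) % width, (y + dy) % height
            mowed.add((x, y))
    return len(mowed) / (width * height)
```

The returned fraction plays the role that the distance to the target plays in the Police Artist task, except that larger values are better.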

Results were mixed. Solutions were found, but analysis of the changing parameters of the system indicated that learning was not occurring. Research on this task is continuing.

Appendix E describes the Lawnmower task in detail and gives preliminary results of these simulations.


Planned Research

At present, a number of simulations are planned for IVR-based neural networks. These can be broken down in terms of the specific network architecture to be used (Clavier, Orchestrion, Piano-Forte) and the task to be accomplished.


Clavier Simulations

There are three areas of planned research for Clavier. (1) Continued research on the search task. (2) Establishing convergence and schedule control. (3) Traversing a maze.

More search tasks. Two features of the completed research on search tasks are to be modified in the planned research: similarity measure and match criterion.

The first feature is the similarity measure. In comparing the target picture to any single picture output by the network, there are four possibilities for every pixel. The target pixel may be black (zero) or white (one) and the output pixel may be black (zero) or white (one). In the completed research, the Hamming distance was used. The Hamming distance is the number of mismatches (black/white or white/black) between the target and output.

A large number of statistical measures, called measures of association, can all be used as distance/similarity measures between two sets of binary variables. At least four of these, chi-squared, phi, kappa, and lambda, are planned for future simulations. (See SAS, 1990, ch. 23, The FREQ Procedure, pp. 864-877; Liebetrau, 1983.) The research question is whether the additional information available from these statistics, beyond the limited information provided by the Hamming measure, will improve the speed of the search.
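
Two of these measures, phi and chi-squared, can be computed from the same two-by-two table of pixel counts that underlies the Hamming distance. The sketch below uses the standard textbook formulas; the function names and the convention of returning zero when phi is undefined are assumptions made here.

```python
import math


def contingency(output, target):
    """2x2 counts: a = both one, b = output one only, c = target one only, d = both zero."""
    pairs = list(zip(output, target))
    a = sum(o == 1 and t == 1 for o, t in pairs)
    b = sum(o == 1 and t == 0 for o, t in pairs)
    c = sum(o == 0 and t == 1 for o, t in pairs)
    d = sum(o == 0 and t == 0 for o, t in pairs)
    return a, b, c, d


def phi_coefficient(output, target):
    """Phi: +1 for a perfect match, -1 for a perfect negative image."""
    a, b, c, d = contingency(output, target)
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # Assumption: return 0.0 when a row or column is empty and phi is undefined.
    return (a * d - b * c) / denominator if denominator else 0.0


def chi_squared(output, target):
    """Pearson chi-squared for a 2x2 table, which equals N * phi**2."""
    return len(output) * phi_coefficient(output, target) ** 2
```

In these terms the Hamming distance is simply b + c. Phi additionally distinguishes an output that is the negative image of the target from one that is unrelated to it, which the Hamming distance alone treats as points on a single scale.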

The current criterion for success in the search task is a perfect match. This demonstrates the power of the combination of the Clavier algorithm and the percentile reinforcement schedule to generate precise binary sequences with minimal feedback. With respect to many animal learning tasks, however, this criterion is overly strict. Often, in animal learning studies, the task to which the animal is initially shaped is kept very simple in order that subsequent performance in repetition of the task can be more easily studied.

In order to match the animal studies more closely, looser criteria than exact match are needed. Three have been suggested. The first two criteria were suggested by J. E. R. Staddon: (1) Treat any output pattern with the same proportion of ones to zeros as the target pattern as a match. Obviously, there are many patterns with exactly the same number of zeros as the target, as opposed to the single exactly matching pattern. (2) Treat as a match any output pattern with as many ones as the target pattern, or more. This is an even looser criterion than the first. A third criterion, suggested by James Beluzzi, is that a certain number of mismatched pixels be allowed for the Hamming distance. Thus, any near miss would count as a match.
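
The three looser criteria can be written as simple predicates. This is a sketch only: the function names and the default tolerance of two pixels are hypothetical, since no specific tolerance has been chosen.

```python
def same_count_match(output, target):
    """Criterion 1: the same number of ones (hence the same proportion) as the target."""
    return sum(output) == sum(target)


def at_least_as_many_ones_match(output, target):
    """Criterion 2: at least as many ones as the target."""
    return sum(output) >= sum(target)


def near_miss_match(output, target, tolerance=2):
    """Criterion 3: Hamming distance no greater than `tolerance` (hypothetical default)."""
    mismatches = sum(o != t for o, t in zip(output, target))
    return mismatches <= tolerance
```

Every pattern that satisfies the first criterion also satisfies the second, which is the sense in which the second is looser.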

The first research question is to see if parameters can be scaled to match various tasks in the animal learning literature. Is there a variant of one of the above criteria that will produce performance by the Clavier network matching that of laboratory animals on the simpler tasks for which they are typically trained? The second research question is to see how repeated performance differs across tasks of varying difficulty. This aspect of repeated performance is dealt with in terms of convergence and schedule control.

Establishing convergence and schedule control. Within computer science, convergence refers to the performance on a search task after the initial match or, in the absence of an initial match, performance in approaching the region of the match. Ideally, once the system has moved to within a certain distance of the correct answer, it should stay in that area, like a pendulum winding down to its resting position. There are a number of mathematical criteria for convergence given by Narendra & Thathachar (1989, ch. 5). Since the simulations analyzed thus far have always stopped with the first match, the general rates of convergence have not been determined to date. (However, see Appendix D for a report of a single instance of apparent convergence on the simple search task.) Simply by extending the time of the simulation past the initial match, convergence rates can be measured with respect to various parameters, various similarity measures, and various match criteria, etc.

Within the study of animal learning, once the animal has been shaped to perform the task in question, various rules for the delivery of food, water, or other reinforcers are used to sustain repeated performance of the task. These rules are called reinforcement schedules. Different reinforcement schedules produce typical temporal patterns of behavior across species. At least three of these schedules, called continuous reinforcement, extinction, and fixed ratio, are planned as tests of Clavier.

Continuous reinforcement means that every time the organism (or, in this case, the network) performs the task, a reinforcer is given, but not otherwise. Under extinction, no reinforcement is delivered and animals usually cease to respond. Under fixed ratio, every Nth performance is reinforced for some number N. The research question is to see whether schedule performance by the Clavier network resembles animal performance on those same schedules or not.
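
The three schedules can be sketched as functions that map each performance of the task to a reinforcement signal. The representation used here (a sequence of 0/1 flags marking the cycles on which the task was performed, and a running count of performances) is an assumption for illustration, as are the names.

```python
def continuous_reinforcement(performed, count):
    """Reinforce every performance of the task, and nothing else."""
    return 1 if performed else 0


def extinction(performed, count):
    """Never reinforce."""
    return 0


def fixed_ratio(n):
    """Return a schedule that reinforces every n-th performance."""
    def schedule(performed, count):
        return 1 if performed and count % n == 0 else 0
    return schedule


def run_schedule(schedule, performances):
    """Apply a schedule to a sequence of 0/1 flags marking task performance."""
    count, signals = 0, []
    for performed in performances:
        count += performed              # running number of performances so far
        signals.append(schedule(performed, count))
    return signals
```

For example, `run_schedule(fixed_ratio(3), flags)` reinforces the third, sixth, and ninth performances in `flags` and no others.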

Traversing a maze. The two commands, TURN-LEFT and MOW-FORWARD, taken from the lawnmower task can be modified into TURN-LEFT and MOVE-FORWARD to allow traversal of any two dimensional space. Simulations are planned wherein a maze will be drawn on the computer screen and Clavier will output a binary sequence that will be interpreted as a set of instructions for traversing the maze. How far the animat (synthetic animal) gets toward the goal box will determine the similarity measure. The difference between maze traversal and mowing the lawn is that the MOVE-FORWARD command will have no effect when the animat is facing one of the walls of the maze, but only when it is facing an open corridor of the maze.
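
The rule that MOVE-FORWARD has no effect against a wall can be sketched as follows. The maze encoding (a list of strings with "#" for walls), the treatment of the outer boundary as a wall, the starting direction, and the use of the final Manhattan distance to the goal as the measure of progress are all assumptions made for illustration.

```python
def distance_to_goal(program, maze, start, goal):
    """Run a 0/1 program in a grid maze; return the final Manhattan distance to the goal.

    Assumptions: "#" marks a wall, the outer boundary is also a wall, and the
    animat starts facing east. Coordinates are (x, y) with y increasing downward.
    """
    headings = [(1, 0), (0, -1), (-1, 0), (0, 1)]  # east, north, west, south
    (x, y), facing = start, 0
    for command in program:
        if command == 0:                           # TURN-LEFT
            facing = (facing + 1) % 4
            continue
        dx, dy = headings[facing]                  # MOVE-FORWARD
        nx, ny = x + dx, y + dy
        inside = 0 <= ny < len(maze) and 0 <= nx < len(maze[ny])
        if inside and maze[ny][nx] != "#":
            x, y = nx, ny                          # open corridor: move
        # otherwise the animat is facing a wall and the command has no effect
    return abs(goal[0] - x) + abs(goal[1] - y)
```

A distance of zero means the animat ended in the goal box; smaller values would be treated as better matches by the percentile reinforcement schedule.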

The first research question will be to see if Clavier can be shaped by the percentile reinforcement schedule to traverse the maze successfully from the start box to the goal box. The second research question will be to see if errors made in learning the maze resemble those made by animals in learning similar mazes.


Orchestrion

The next version of an IVR-based neural network planned is called Orchestrion. Two variants of Orchestrion are planned, one using a threshold output function and the other using a simple sum. The threshold version can be used to solve the same tasks as Clavier and this is currently being tested. The simple-sum version can produce integer outputs and is planned for three tasks: a vocal shaping task, a submarine hunt task, and an interactive version of the Police Artist task.

Like Clavier, Orchestrion has a basic R-S architecture in that there is no model of sensory inputs, but only response outputs and reinforcement. The basic unit with its variable threshold, called the emitter unit, is identical to the Clavier unit. A second type of unit, called a collector unit, that does not learn, is used in a second layer. The activation rule for the collector unit depends upon the task the network is designed to solve, but the basic architecture involves the notion of a cluster of emitter units feeding into a single collector unit.

The easiest way to think of Orchestrion is as a variant of Clavier that can produce positive integers rather than just binary values. Imagine a group of emitter units (called a "cluster") with various thresholds. On any cycle, some will fire (i.e., emit ones) and the others will not fire (they emit zeros). The sum of the outputs will be an integer between zero and the total number of units in the group. The distribution of those integers over cycles will depend upon the distribution of the thresholds.

A standard technique for implementing a numerical sum in a connectionist network is to have the outputs of a group of units connected to another unit, called here a collector unit. The activation level of the collector unit is a function of the sum of the outputs of all of the emitter units in the cluster. In Orchestrion, that function is just the sum itself.

Thus, Orchestrion is composed of two layers, a layer of emitter units that resembles Clavier and a layer of collector units where each collector unit takes the output of a number of emitter units as its input. Different versions of Orchestrion have different output functions of those inputs. Two variants will be considered here: one using a threshold function for output and the other just passing along the simple sum.

Threshold-Orchestrion. Consider a design such as that illustrated in Figure 3. Emitter units are triangular. Collector units are circular. The column of emitter units directly below each collector unit is the cluster feeding that collector unit. The layer of collector units is identical to the eight-by-eight emitter output array of Clavier seen in Figure 2. It is capable of the same range of outputs. The output function of each collector is a fixed threshold set somewhere between zero and the number of emitter units in the cluster connected to that collector unit. The collector unit outputs a zero when the number of active emitter units in the cluster is less than or equal to the threshold and outputs a one (i.e., fires) otherwise.
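
The two collector variants described above can be sketched in a few lines. The function names are assumptions; the emitter rule is the Clavier output rule, and the threshold collector follows the description just given (zero when the count is less than or equal to the fixed threshold, one otherwise).

```python
import random


def emitter_outputs(thresholds, rng=random):
    """Each emitter fires (1) iff a fresh U[0, 1] draw exceeds its threshold."""
    return [1 if rng.random() > tau else 0 for tau in thresholds]


def simple_sum_collector(cluster_outputs):
    """Simple-sum variant: pass along the number of emitters that fired."""
    return sum(cluster_outputs)


def threshold_collector(cluster_outputs, fixed_threshold):
    """Threshold variant: fire iff the number of active emitters exceeds the fixed threshold."""
    return 1 if sum(cluster_outputs) > fixed_threshold else 0
```

Only the emitter thresholds change during learning; both collector functions are fixed.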

The only difference between the output of threshold-Orchestrion and the output of Clavier is the distribution of the outputs over time. The underlying distribution of activation values for Clavier is uniform. Thus the output is a binomial process with the probability of a one determined by the variable threshold. The underlying distribution of activation values for a collector unit in Orchestrion is a sum of binomial processes. Thus the variance of the underlying distribution can vary and, under ordinary circumstances, will tend to be lower than that for Clavier.
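
The distribution of the cluster sum can be made explicit. Since each activation is uniform on [0, 1], an emitter unit with threshold $\tau_i$ fires with probability $1 - \tau_i$. Assuming, as a simplification, that the emitter units in a cluster $K$ fire independently of one another within a cycle, the sum $S$ arriving at the collector unit has mean and variance

$$
P(y_i = 1) = 1 - \tau_i, \qquad
E[S] = \sum_{i \in K} (1 - \tau_i), \qquad
\mathrm{Var}[S] = \sum_{i \in K} \tau_i (1 - \tau_i).
$$

Each term of the variance is largest when $\tau_i = 1/2$ and shrinks toward zero as $\tau_i$ approaches either zero or one, so the spread of the sum depends on how the thresholds in the cluster are distributed.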

In the Police Artist task and the Lawnmower task, the network searches through a space of binary strings attempting to locate a string that best satisfies some criterion. For some search tasks, a high autocorrelation between successive attempts at solution is useful. For other tasks, a low autocorrelation is useful. In other words, some searches are better conducted by smooth, gradual movement through the space and others better with a certain amount of jumping about. If the reinforcement algorithm can be designed so as to produce higher or lower variance in the clusters for different searches, this may produce better performance by threshold-Orchestrion over Clavier for some tasks.

At present, tests are being conducted to see if threshold-Orchestrion can solve the Police Artist task.

Simple-sum Orchestrion. A number of common search tasks involve a search of a vector-space. Using multiple clusters, Simple-sum Orchestrion can be used to search a space of integer vectors. In fact, if the search-space can be approximated by a space of integer vectors, then Simple-sum Orchestrion should be able to conduct the search. In this variant of Orchestrion, there is one collector unit for every dimension of the vector space. The output of each collector unit is just the sum of the outputs of the emitter units in its cluster. The number of emitter units in the cluster determines the range of values on the associated dimension.

Simple-sum Orchestrion has not yet been tested. It is planned for three search tasks: a vocal shaping task, a submarine hunt task, and an interactive police artist task.

Vocal Shaping Task. Klatt (1980) offers a digitized formant synthesizer that produces phonemes from integer inputs. The integer outputs of Simple-Sum Orchestrion can be input into the Klatt synthesizer, producing an artificial phoneme. The target would correspond to a correctly-pronounced phoneme. Ideally, shaping would proceed in a manner analogous to Clavier on the Police Artist task. Initially, the phonemes generated would be random babble. Over time, the phonemes would come to match the target phoneme exactly. This task, should it succeed, would demonstrate the ability of the Simple-Sum Orchestrion to be shaped in a manner analogous to Clavier and also replace the somewhat arbitrary output of Clavier with a more humanlike task of learning correct phonemic pronunciation.

Submarine Hunt Task. Crone-Todd & Pear (1997) have investigated human performance on a submarine hunt task. The human subject sits before a computer screen with a mouse. The screen displays a rectangle. The subject uses the mouse to select points within the rectangle. One point in the rectangle has been randomly assigned to be the location of the submarine. Successful search is completed when the subject clicks on the submarine location. With each click of the mouse, the subject receives auditory feedback (beep or no beep). The beep indicates that the latest selected location represents an improvement over past attempts. The precise rule for what constitutes improvement depends upon the feedback rule. One feedback rule used in Todd's experiments has been the percentile reinforcement schedule, based on the distance of the selection point from the submarine.

Todd's research provides a unique opportunity to compare human performance on a laboratory task with neural network performance on that same task. The goal of the Orchestrion research on the submarine search will be, first, to see if Simple-sum Orchestrion can solve the problem and, second, to see how closely Simple-sum Orchestrion's performance can be made to match human performance on the same task, not only in terms of time to solution, but also in terms of geometric characteristics of the sequence of points on the rectangle.

Interactive Police Artist Task. Dewdney (1986) discusses a computer program that allows parameterization of information from a photograph of a human face and generates a line drawing of that face. Using an early version of a morphing algorithm, the program can then produce a caricature by distorting the parameters.

Suppose that a person was a victim of a crime, but no photograph of the criminal's face could be found. Normally, the person would work with a police artist, often assisted by use of a computerized identikit, providing information as to the size, shape and location of the various facial characteristics. Suppose instead that Simple-sum Orchestrion was used to produce a sequence of four faces. The crime victim could then select the one picture of the four that best matched the criminal. (This is mathematically equivalent to percentile reinforcement with p=25% and m=4.) The earliest of the four pictures would be replaced with a new picture and the process would continue until a good match was generated by Orchestrion.

The advantage of using Orchestrion over other neural networks or computer systems capable of outputting pseudorandom integer vectors is that Orchestrion only requires binary feedback. The crime victim need make no judgements as to how much more one picture looks like the criminal than another, but only which picture looks more like the criminal. Further, due to the stochastic nature of the Orchestrion algorithm, it is very forgiving of occasional errors on the part of the human operator. Thus, Orchestrion can accept feedback of the sort that a human operator is able to generate most effectively.

Note on overlapping clusters. It is important to note that there is no requirement for any variant of Orchestrion that the clusters of emitter units be disjoint. In fact, there may be important advantages to using overlapping clusters of emitter units. Consider the case of Threshold-Orchestrion. Suppose that there are ten emitter units per cluster and that there are two collector units whose clusters overlap, sharing three emitter units. Suppose that the fixed threshold of each of the collector units is set to two. If the thresholds on the three shared emitter units drop to very low values, so that on almost every cycle, those three emitter units fire, then the activation for both of the collector units will exceed the fixed threshold ( = 2) irrespective of whether the other emitter units in the two clusters fire or not. This leaves those fourteen other emitter units free to vary so as to produce the correct outputs on other collector units to whose clusters they may belong.
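
The arithmetic of this example can be checked by brute force. The sketch below numbers the seventeen emitter units arbitrarily (an assumption for illustration), fixes the three shared units as firing, and tries every firing pattern of the other fourteen.

```python
from itertools import product

SHARED = [0, 1, 2]                          # emitters belonging to both clusters
CLUSTER_A = SHARED + list(range(3, 10))     # ten emitters
CLUSTER_B = SHARED + list(range(10, 17))    # ten emitters
OTHERS = [i for i in range(17) if i not in SHARED]
FIXED_THRESHOLD = 2


def collector_fires(cluster, outputs):
    """Threshold collector: fires iff the active count exceeds the fixed threshold."""
    return sum(outputs[i] for i in cluster) > FIXED_THRESHOLD


def both_fire_for_all_patterns():
    """True if both collectors fire whatever the non-shared emitters do."""
    for pattern in product([0, 1], repeat=len(OTHERS)):
        outputs = {i: 1 for i in SHARED}    # the three shared emitters always fire
        outputs.update(zip(OTHERS, pattern))
        if not (collector_fires(CLUSTER_A, outputs)
                and collector_fires(CLUSTER_B, outputs)):
            return False
    return True
```

The check covers all 2^14 patterns of the remaining units, confirming that three shared units are enough to hold both collectors on when the fixed threshold is two.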


Piano-Forte

The Piano-Forte network is an extension of the Orchestrion network that includes a module for modeling the effects of sensory inputs. Unlike standard neural networks, where sensory inputs increase the likelihood of appropriate response outputs, the net effect of the Piano-Forte's sensory module is to decrease the likelihood of inappropriate response outputs. The psychological basis for this design is Rescorla's (1991) model of stimuli as modulating the R-S relation between behavior and consequent reinforcing outcomes.

Rescorla (1991; Colwill & Rescorla, 1990) argues that response-outcome relations are the foundation of instrumental conditioning and that stimulus effects are due to a hierarchical relation between stimuli and the response-outcome relation. He hypothesizes that the effect of stimuli is to modulate the relation between responses and reinforcing outcomes. By "modulate," instrumentalists appear to mean either the amplification or the diminution of reinforcing effects. In the Piano-Forte network, however, modulation is meant in its more usual sense of differential attenuation (decrease only).

In Table 1, we see that the threshold for any emitter unit that does not signal will not change on that cycle. If an emitter unit's signal is suppressed, the status of the threshold remains unchanged. (This is the model's analog of memory.) In Clavier, an emitter unit signals if and only if the value of the random variable exceeds the threshold. With Piano-Forte, an emitter unit signals if and only if the value of the random variable exceeds the threshold and no sensory signal arrives at that emitter unit.
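
The difference between the Clavier and Piano-Forte emitter rules can be sketched as follows. The function names are assumptions, and the increments are simply those quoted for Clavier; whether Piano-Forte will use the same values is not stated here.

```python
import random


def piano_forte_emit(threshold, inhibited, rng=random):
    """Signal iff the U[0, 1] draw exceeds the threshold and no sensory signal arrives."""
    if inhibited:
        return 0                      # a sensory signal suppresses the unit
    return 1 if rng.random() > threshold else 0


def piano_forte_update(threshold, signalled, reinforced,
                       learning=0.01, decay=0.0043):
    """Only a unit that signalled learns; a suppressed or silent unit is unchanged.

    Assumption: the increments are the Clavier values and may differ in practice.
    """
    if not signalled:
        return threshold
    return threshold - learning if reinforced else threshold + decay
```

Because a suppressed unit never signals, its threshold is carried unchanged through any number of cycles, reinforced or not.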

This architecture has a neurophysiological basis. Two special types of cells have axons that terminate in Type II GABAergic (presumably inhibitory) synapses on pyramidal cells, located so as to have maximum effect on pyramidal activity (Crick & Asanuma, 1986). The chandelier cells terminate on the axon hillock and the basket cells terminate on the pyramidal cell body. Signals from chandelier cells and basket cells may inhibit pyramidal activity. The architecture of Piano-Forte is based on the hypothesis that, for at least some IVR-susceptible pyramidal cells, the chandelier cells (or basket cells) are transmitters of sensory signals.

Consider an example of how Rescorla's hierarchical logic might be exhibited in behavior. Suppose that a human subject has high propensities for both giving formal toasts and standing on his head. On Saturday night, at a formal dinner, the subject initiates a toast to general approbation and thereafter proceeds to offer several more toasts and is praised for each in turn. The subject's toasting has been positively reinforced. According to Thorndike's Law of Effect, the likelihood of toasting should increase. However, as Bateson (1979, ch.2, no. 8) points out, the subject has also been rewarded for not standing on his head. Why then, on Sunday morning, when the subject arrives in Yoga class and the instructor claps his hands, does the subject promptly stand on his head as always?

The solution is that it is behavior that is reinforced, not the subject. On Saturday night, no head-standing behavior occurred, so its likelihood was unaffected. How can this principle be implemented in a neural network? Something about the sensory information from the formal dining hall protects head-standing from emission and thus from extinction. The usual hypothesis that stimuli increase the likelihood of appropriate behavior will not serve. Instead, the present hierarchical model uses the hypothesis that stimuli decrease the likelihood of inappropriate behavior. That hypothesis provides the basis for the architecture of Piano-Forte.

Thus the underlying architecture of Piano-Forte, like that of Clavier and Orchestrion, remains R-S, rather than S-R.

It is unclear whether there will be any effective differences in performance of a connectionist network using modulation, rather than stimulation, as its model of sensory input function. Piano-Forte will provide a first attempt to generate data from an R-S network on tasks typically solved by feedforward (S-R) networks such as backpropagation networks (Rumelhart, Hinton & Williams, 1986). Such simulations should provide a basis for comparison of these different network architectures.

The basic architecture of Piano-Forte is illustrated in Figure 4. There are four layers, an input module with two layers, a hidden layer of emitter units, and an output layer of collector units. All connections are of fixed weights. Connections between the second input layer and the hidden layer as well as connections between the hidden layer and the output layer are random. Inputs are binary strings of length m. Outputs are binary strings of length n. The second input layer has 2*m units, half containing a copy of the inputs and the other half a negative image of the inputs. The hidden layer contains N units.

A binary vector modeling the sensory inputs and the negative image of that vector are both transmitted from the second input layer to the emitter layer. (The addition of the negative image ensures that exactly half of the second-layer input units are active, thus fixing the overall level of inhibitory suppression of the emitter module: there are always exactly m inhibitory signals output from the second input layer. This is a mathematical convenience.) Each second layer input unit inhibits a random sample of emitter units according to the standard McCulloch-Pitts rule of one inhibitory link overriding all excitatory outputs. The output of each emitter unit has an excitatory link to a random sample of collector units.
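The input stage just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Piano-Forte implementation: the function names, the fixed random seed, and the values of m, N, and k are all hypothetical, and the sketch covers only the inhibitory veto, not the emitter thresholds or the collector layer.

```python
import random

def second_input_layer(inputs):
    """Return the 2*m-unit second input layer: the inputs, then their negative image."""
    return list(inputs) + [1 - bit for bit in inputs]

def surviving_emitters(inputs, inhibits, num_emitters):
    """Apply the McCulloch-Pitts veto: an emitter unit stays eligible only if
    no active second-layer unit has an inhibitory link to it."""
    layer = second_input_layer(inputs)
    vetoed = set()
    for unit, active in enumerate(layer):
        if active:
            vetoed.update(inhibits[unit])
    return [e for e in range(num_emitters) if e not in vetoed]

rng = random.Random(0)      # fixed seed, for repeatability only
m, N, k = 4, 200, 30        # illustrative values, not taken from the text

# Fixed random wiring: each of the 2*m second-layer units inhibits k emitter units.
inhibits = [rng.sample(range(N), k) for _ in range(2 * m)]

pattern = [1, 0, 1, 1]
layer = second_input_layer(pattern)
survivors = surviving_emitters(pattern, inhibits, N)
print(sum(layer), len(survivors))
```

Whatever the input pattern, exactly m of the 2*m second-layer units are active, so the number of inhibitory signals reaching the emitter layer is constant. The emitter units left unvetoed are only eligible to signal; whether they do so depends on their thresholds.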

The various parameters of the network are calculated as follows:

(1) Three parameters are fixed by the problem: m, the dimension of the inputs (i.e., the number of units in the first input layer); n, the dimension of the outputs (i.e., the number of units in the output layer); and C, the number of output categories into which the inputs are to be classified. (Obviously, one constraint is that C < 2^n, i.e., 2 raised to the power n, since the output layer must be able to represent all of the output categories.)

There are three free parameters for the network:

(2) The first free parameter is r*, which is the number of emitter units per cluster. This should be set based on performance by the Threshold variant of Orchestrion. Presumably, larger r* will lead to more rapid convergence to local minima, but further simulations with Orchestrion are necessary to determine good values for r*.

(3) After the inhibitory signals damp out the emitter units in the hidden layer, only a small proportion of the N emitter units will be left active. Call this proportion p*. This is more or less a free parameter, but there are two constraints on p*. The first is that, to the degree that the output patterns representing the output categories are arbitrary and not similar to one another, there must be disjoint pools of emitter units shaped to generate each output symbol. Thus, p* should approximately equal 1/C. A smaller value allows for more independent pools and greater discrimination between stimuli. A larger value should produce greater generalization.

(4) The final free parameter is N, the number of emitter units. Once a preliminary value is set for p*, the second constraint on p* can be used to determine N. Since N x p* will be the number of emitter units active on any cycle and r* emitter units are required for each of the n collector units, if N x p* = n x r*, then there will be enough active emitter units to have n non-overlapping emitter clusters. (See the note under Orchestrion for details about overlapping versus non-overlapping clusters. In general, the less overlap, the more power the network will have to distinguish between similar inputs. The more overlap, the more efficient the storage by the neural network.) Thus, N should be set such that N < (n x r*)/p*, with smaller N producing more overlap.

With all of the free parameters set, the values for constructing the network can be determined.

(5) Let k be the number of random inhibitory connections from each of the input units in the second input layer. (A small degree of additional neurodevelopmental plausibility can be obtained by connecting each input unit to an average of k randomly selected hidden units, rather than exactly k units. See Edelman, 1992, for a discussion of neurodevelopment.) Set k = N x (1 - p*^(1/m)) where "^" means "raised to the power of." (Thanks to Dan Hunter for this calculation.)

(6) Let r be the number of emitter units connected to each collector unit. (Again, additional neurodevelopmental plausibility can be obtained using an average of r.) Set r = r*/p*.

(7) One last fixed parameter is the initial threshold for the emitter units. Since these thresholds vary, network performance is generally insensitive to this parameter. A good value for the initial emitter threshold is 1/10. The fixed threshold for each collector unit, call it q, should be set as follows: q = (1/10) x r*.
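The calculations in steps (1) through (7) can be collected into a short Python sketch. The function name and the example values are hypothetical. Two choices are assumptions rather than prescriptions from the text: p* defaults to 1/C, and N is set to the boundary value (n x r*)/p*, which corresponds to non-overlapping clusters (a smaller N would give more overlap). The formula for k can be read as follows: if each active input unit inhibits k of the N emitter units at random and independently, an emitter unit escapes all m active inputs with probability (1 - k/N)^m, and setting that equal to p* gives the value in step (5).

```python
def piano_forte_parameters(m, n, C, r_star, p_star=None):
    """Derive network-construction values from the fixed and free parameters.

    m, n, C are fixed by the problem; r_star and p_star are free parameters.
    """
    if C >= 2 ** n:
        raise ValueError("n output units cannot represent C categories")
    if p_star is None:
        p_star = 1.0 / C                    # assumption: p* = 1/C
    N = round(n * r_star / p_star)          # assumption: boundary (no-overlap) value
    k = N * (1 - p_star ** (1.0 / m))       # inhibitory links per second-layer unit
    r = r_star / p_star                     # emitter units wired to each collector
    q = 0.1 * r_star                        # fixed collector threshold
    return {"p_star": p_star, "N": N, "k": k, "r": r, "q": q,
            "initial_emitter_threshold": 0.1}

params = piano_forte_parameters(m=8, n=4, C=4, r_star=5)
for name, value in params.items():
    print(name, value)
```

In practice k and r would be rounded to whole numbers of connections, or treated as averages, as the parenthetical remarks in steps (5) and (6) suggest.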

Until successful search has been demonstrated by the threshold variant of Orchestrion, effective parameters cannot be determined for Piano-Forte. Planned research for Piano-Forte is to take standard classification tasks used for backpropagation and other feedforward networks and determine how Piano-Forte compares to these other networks on these standard tasks.


Conclusion

Obviously, there is a great deal of work to be done to demonstrate the viability and utility of IVR-based neural networks. We are interested in collaborations not only with programmers, but with psychologists and behavior analysts interested in using IVR-based networks to model behavior, cognition, and learning, as well as neuroscientists interested in improving the match between IVR-based neural networks and the neurophysiology of IVR itself.

Ultimately, hybrid neural networks with multiple modes of reinforcement will probably become the norm. In the meantime, it is important to demonstrate that reinforcement mechanisms other than LTP can provide a basis for effective computational models of learning. Any persons interested in contributing to that effort are encouraged to contact us.


References

Barto, A. G. & Anandan, P. (1985). Pattern recognizing stochastic learning automata. IEEE Transactions on Systems, Man, and Cybernetics, 15, 360-374.

Barto, A. G., Sutton, R. S., & Brouwer, P. S. (1981). Associative search network: A reinforcement learning associative memory. Biological Cybernetics, 40, 201-211.

Bateson, G. (1979). Mind and Nature: A necessary unity. New York: Dutton.

Bliss, T. V. P. & Collingridge, G. L. (1993). A synaptic model of memory: Long-term potentiation in the hippocampus. Nature, 361, 31-39.

Bliss, T. V. P. & Lømø, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology, 232, 331-356.

Bush, R. R. & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313-323.

Colwill, R. M. & Rescorla, R. A. (1990). Evidence for the hierarchical structure of instrumental learning. Animal Learning & Behavior, 18, 71-82.

Crick, F. & Asanuma, C. (1986). Certain aspects of the anatomy and physiology of the cerebral cortex. In J. L. McClelland and D. E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the microstructure of cognition, Volume II: Psychological and biological models (ch. 20, pp. 333-371). Cambridge, MA: Bradford Books/MIT Press.

Crone-Todd, D. E., & Pear, J. J. (1997, May). A comparison of Effective Percentile Schedules of Reinforcement with Modified Hillclimbing Shaping Parameters. Paper presented at the 23rd Annual Convention of the Association for Behavior Analysis, Chicago, IL.

Dewdney, A. K. (1986, October). The compleat computer caricaturist and a whimsical tour of face space. Scientific American, 255, 20-22, 24, 27-28.

Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.

Edelman, G. M. (1992). Bright air, brilliant fire: On the matter of the mind. New York: Basic Books.

Galbicka, G. (1994). Shaping in the 21st Century: Moving percentile schedules into applied settings. Journal of Applied Behavior Analysis, 27, 739-760.

Goodman, L. A. & Kruskal, W. H. (1963). Measures of association for cross-classification III. Journal of the American Statistical Association, 58, 310-364.

Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 62, 213-257. Reprinted in S. Grossberg, Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control (Boston Studies in the Philosophy of Science, Volume 70), ch. 8, pp. 332-378. Dordrecht: D. Reidel.

Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3, 671-692.

Hebb, D. O. (1949). The organization of behavior. New York: John Wiley and Sons.

Johnson-Laird, P. N. (1988). The computer and the mind: An introduction to cognitive science. Cambridge, MA: Harvard University Press.

Kauffman, S. (1995). At home in the universe: The search for the laws of self-organization and complexity. New York: Oxford University Press.

Kelso, S. R., Ganong, A. H., & Brown, T. H. (1986). Hebbian synapses in hippocampus. Proceedings of the National Academy of Sciences, 83, 5326-5330.

Kemp, S. M. (1995, November). The art of shaping and the shaping of art: Search in Hamming space using an IVR-based neural network. Poster session presented at the annual meeting of the South Eastern Association for Behavior Analysis, Charleston, SC.

Kemp, S. M. (1997). R-S and S(-O)-R: Alternative designs for neural networks. Journal of the Experimental Analysis of Behavior, 67, 229-231.

Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971-995.

Koza, J. R. (1994). Genetic Programming II: Automatic discovery of reusable programs. Cambridge, MA: MIT Press.

Liebetrau, A. M. (1983). Measures of association (Quantitative Applications in the Social Sciences, No. 32). Newbury Park, CA: SAGE Publications.

McClelland, J. L., Rumelhart, D. E., & The PDP Research Group. (1986). Parallel Distributed Processing: Explorations in the microstructure of cognition, Volume II: Psychological and biological models. Cambridge, MA: Bradford Books/MIT Press.

McCulloch, W. S. & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.

Meyer, M. F. (1911). The fundamental laws of human behavior. Boston: Richard G. Badger.

Minsky, M. L. (1967). Computation: Finite and infinite machines. Englewood Cliffs, NJ: Prentice-Hall.

Minsky, M. L. & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.

Narendra, K. S. & Thathachar, M. A. L. (1989). Learning Automata: An introduction. Englewood Cliffs, NJ: Prentice-Hall.

Phillips, A. G. & Fibiger, H. C. (1989). Neuroanatomical bases of intracranial self-stimulation: Untangling the Gordian knot. In J. M. Liebman and S. J. Cooper (Eds.), The neuropharmacological basis of reward (pp. 66-105). Oxford: Clarendon Press.

Platt, J. R. (1973). Percentile reinforcement: Paradigms for experimental analysis of response shaping. In G. H. Bower (Ed.), The Psychology of Learning and Motivation, Volume VII: Advances in research and theory (pp. 271-296). New York: Academic Press.

Rescorla, R. A. (1991). Associative relations in instrumental learning: The Eighteenth Bartlett Memorial Lecture. The Quarterly Journal of Experimental Psychology, 43B, 1-23.

Roberts, G. W. (1990). Schizophrenia: The cellular biology of a functional psychosis. Trends in Neurosciences, 13, 207-211.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the microstructure of cognition, Volume I: Foundations (ch. 8, pp. 318-362). Cambridge, MA: Bradford Books/MIT Press.

Rumelhart, D. E., McClelland, J. L., & The PDP Research Group. (1986). Parallel Distributed Processing: Explorations in the microstructure of cognition, Volume I: Foundations. Cambridge, MA: Bradford Books/MIT Press.

SAS Institute. (1990). SAS/STAT® User's Guide, Volume 1, ACECLUS-FREQ. Cary, NC: Author.

Skinner, B. F. (1938). The Behavior of Organisms: An experimental analysis. New York: Appleton-Century-Crofts.

Skinner, B. F. (1951, December). How to teach animals. Scientific American, 185, 26-29.

Smith, E. E. & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

Snyder, S. H. (1976). The dopamine hypothesis of schizophrenia: Focus on the dopamine receptor. American Journal of Psychiatry, 133, 197-202.

Snyder, S. H. (1978). Neuroleptic drugs and neurotransmitter receptors. Journal of Clinical and Experimental Psychiatry, 133, 21-31.

Staddon, J. E. R. & Zhang, Y. (1991). On the assignment-of-credit problem in operant learning. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Neural network models of conditioning and action: A volume in the Quantitative Analyses of Behavior series (ch. 11, pp. 279-293). Hillsdale, NJ: Lawrence Erlbaum.

Stein, L. (1994). In-vitro reinforcement of hippocampal bursting: Possible cellular and molecular mechanism of drug reward. Regulatory Peptides, 54, 285-286.

Stein, L. (1995, May). Skinner's Behavioral Atom: A cellular analogue of operant conditioning and its implications. Paper presented at the 21st annual meeting of the Association for Behavior Analysis, Washington, DC.

Stein, L. (1997). Biological substrates of operant conditioning and the operant-respondent distinction. Journal of the Experimental Analysis of Behavior, 67, 246-253.

Stein, L. & Belluzzi, J. D. (1982). Beyond the reflex arc: A neuronal model of operant conditioning. In A. R. Morrison & P. L. Strick (Eds.), Changing concepts of the nervous system (pp. 651-665). New York: Academic Press.

Stein, L. & Belluzzi, J. D. (1988). Operant conditioning of individual neurons. In M. L. Commons, R. M. Church, J. R. Stellar, & A. R. Wagner (Eds.), Quantitative Analyses of Behavior, Vol. VII (pp. 249-264). Hillsdale, NJ: Lawrence Erlbaum Associates.

Stein, L. & Belluzzi, J. D. (1989). Cellular investigations of behavioral reinforcement. Neuroscience and Biobehavioral Reviews, 13, 69-80.

Stein, L., Xue, B. G., & Belluzzi, J. D. (1993). A cellular analogue of operant conditioning. Journal of the Experimental Analysis of Behavior, 60, 41-53.

Stein, L., Xue, B. G., & Belluzzi, J. D. (1994). In vitro reinforcement of hippocampal bursting: A search for Skinner's atoms of behavior. Journal of the Experimental Analysis of Behavior, 61, 155-168.

Stevens, C. F. (1991). New recruit to the magnificent seven. Current Biology, 1, 20-22.

Sutton, R. S. & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press/Bradford.

Thorndike, E. L. (1911). Animal Intelligence: Experimental studies. New York: MacMillan.

Wise, R. A. & Rompre, P.-P. (1989). Brain dopamine and reward. Annual Review of Psychology, 40, 191-225.



Appendix A.

Glossary.

Activation function: In a neural network, the mathematical function that collects all the inputs to a unit and calculates a single value, the activation value, for that unit at that moment in time.

Activationist Neural Network: A type of neural network, as distinguished from connectionist neural networks and selectionist neural networks. Technically, activationist neural networks are neural networks that adapt by modification of the activation function, while connectionist neural networks adapt by modification of the connection weights. Modeling learning by modification of the activation function was first proposed by McCulloch & Pitts (1943), but has only recently been implemented in a neural network. The activationist neural networks proposed here are also selectionist and also use an R-S architecture.

Activation value: The value calculated by the activation function for a particular unit in a connectionist model. Typically, the activation value is a weighted sum of the inputs to that unit, where the weights correspond to the values of the connections from the inputs to that unit. Usually, the output of a unit in a neural network is calculated from the activation value on that cycle via the output function. Thus, on each cycle, for each unit, the steps in the calculation are: (1) multiply each input from the previous layer by the connection weight for the connection over which it travels, (2) sum the weighted inputs to produce the activation value, and (3) transform the activation value into an output value via the output function. The output of that unit then constitutes the input to connected units in the next layer.
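The three steps can be illustrated for a single unit with a minimal Python sketch. The function names, the example weights, and the choice of a fixed-threshold output function are illustrative assumptions only.

```python
def unit_output(inputs, weights, output_function):
    """Compute one unit's activation value and output for one cycle."""
    weighted = [x * w for x, w in zip(inputs, weights)]   # step 1: weight each input
    activation = sum(weighted)                            # step 2: sum to get the activation value
    return activation, output_function(activation)        # step 3: apply the output function

def threshold_output(threshold):
    """Build a simple output function that emits 1 at or above the threshold."""
    return lambda activation: 1 if activation >= threshold else 0

activation, output = unit_output([1, 0, 1], [0.5, 0.9, 0.25], threshold_output(0.6))
print(activation, output)
```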

Adaptive Neural Network: An adaptive neural network is an artificial neural network where learning is modeled by modification of some portion of the structure of the neural network.

Usually, adaptive neural networks are classified in terms of their architecture as either auto-associative or feedforward and in terms of the amount of feedback provided from the environment or training system to guide learning as either unsupervised, reinforcement, or supervised learning.

Artificial Intelligence: An engineering discipline, a subdiscipline of computer science, the results of which are purported to have implications for both science and the philosophy of mind. The basic notion of Artificial Intelligence (or AI as its adherents call it) is that, with the acceptance of a monist/materialist conception of mind, the relationship between mind and brain has been reduced from a philosophical problem to an engineering problem.

AI researchers build computer programs intended to solve problems that are too ill-defined to be approached analytically and which give the appearance of being the sort of problem that humans do solve. Some AI research is geared toward making computers perform tasks that humans perform readily, though how humans perform them is not well understood. Other AI research is geared toward making computers perform tasks too complex for humans, but which appear to have the character of the simpler tasks that humans do perform and computers (to date) cannot.

The field of Artificial Intelligence is renowned for its wildly optimistic predictions of its own success and its relatively modest accomplishments over the last forty-plus years. The underlying philosophy is extremely idealistic and materialistic. Pragmatic and social concerns are usually neglected. Since the mid-1980s, with the re-emergence of neural network research after a hiatus of over twenty years, abandonment of the strictly symbolic approach to AI has been correlated with important engineering successes. The anticipated scientific and philosophical benefits are still in the planning stages.

Artificial Neural Network (ANN): See neural network.

Auto-Associative: An architecture for a connectionist model. (See Figure A-1.) An auto-associative architecture is distinctive first and foremost because there is no structural distinction between the input units and the output units. A unit is defined as input or output functionally, depending upon whether information is written onto the unit or read off the unit and that definition can change from cycle to cycle. Because all of the units are connected to each other by two-way connections, the typical auto-associative network is said to be recurrent.

One type of learning algorithm used in auto-associative networks is a Hebbian learning algorithm where connections that are most often used grow strongest.

Backpropagation: One of the most popular of the learning algorithms used in Feedforward networks. An iterative, nonlinear variant of multiple regression, backpropagation requires a real-valued vector as feedback on every cycle in order to determine how far the actual output was from the desired output. The neurophysiological basis of backprop (as it is called) is LTP. The neurophysiological architecture has been questioned. Because of the complex nature of the feedback required, it is generally assumed that the source of feedback to a backpropagation module is not sensation (such as blood sugar or pain, or vision, or sound) but rather an error vector calculated by a module in some other part of the brain. How this error vector is calculated and to what purpose is unclear.

Importantly, it was the development of the backpropagation algorithm that allowed connectionist models with hidden units to learn, thus allowing connectionist models that were general computational systems without the limitations of perceptrons.

Binary Sequence: (Also known as a binary string.) A string of ones and zeros. Each position in the string may represent the value of some binary variable. Numbers can be represented using binary sequences called binary numerals. Inside a computer, everything is ultimately represented as a binary sequence, that is, as a sequence of bits.

Binary Variable: (Also known as a Boolean variable, binary digit, or bit.) A numerical variable that takes on only one of two values: one or zero. The state of an on/off switch can be represented by a binary variable. Since almost everything inside a computer is equivalent to a bunch of on/off switches, binary variables come in very handy in computer models and simulations.

Clavier: The first and simplest of the IVR-inspired neural networks. First and foremost, Clavier demonstrates that units whose thresholds vary with reinforcement according to the logic derived from neurophysiological research on In-Vitro Reinforcement can be coordinated into a network that solves computational tasks. Thus far, the only task successfully completed is a simple search of a space of binary strings. Secondarily, Clavier alters its responding using only a binary reinforcement signal without any model of sensory stimulation. This architecture is intended to test Skinner's hypothesis that useful action can be learned as a product of freely emitted responses, rather than of responses elicited by antecedent stimuli.

The name "Clavier" is rumored to have been a backwards derivation from "Piano-Forte," the name of a more complex IVR-based neural network that was designed first.

Details of Clavier's learning algorithm are given in Appendix C.

Cognitive Psychology: A subdiscipline within psychology. At present, psychology is bifurcated into a professional discipline and a scholarly discipline. The scholarly discipline is one of the more methodologically regulated of the social sciences. Its subdisciplines are sometimes linked to characteristics of the person, such as developmental and clinical psychology, occasionally linked to the methods employed, as in experimental and biological psychology and psychometrics, and sometimes linked to characteristics of the behavior, as in social and cognitive psychology.

Cognitive psychology is ostensibly demarcated by a focus on behavior that is a product of the so-called higher mental faculties, traditionally construed. However, only explorations of these behaviors using mainstream stimulus-response methodologies are deemed part of cognitive psychology. Research using older methods, such as factor analysis (part of psychometrics) and newer methods, such as behavior analysis (part of experimental psychology) are not included. Research on specialized populations is included, but usually published in venues connected with the population-based subdiscipline, such as developmental or clinical psychology.

Historically, cognitive psychology arose out of experimental psychology in the mid-1950s due to none of these three distinctions, neither facultative, population-based, nor methodological. Instead, cognitive psychology derived from a dissatisfaction with the then-current metatheoretical assumptions of a "black-box" stimulus-response conception of human behavior. The provenance of cognitive psychology was part of the "cognitive revolution" of the mid-fifties, along with revolutions in linguistics and applied mathematics, as well as the birth of artificial intelligence and computational simulation as a method in the study of cognition. Initially, at least some of the new metatheoretical assumptions were made explicit. From the early sixties until the early eighties, however, exploration and even declaration of these assumptions was anathema within psychology. (Cf. cognitive science.)

Cognitive Science: A multidisciplinary enterprise named in the mid-1980s, but claiming roots in the "cognitive revolution" of the mid-1950s. As distinct from cognitive psychology, cognitive science includes engineering subdisciplines such as robotics and artificial intelligence, social sciences such as anthropology and linguistics, philosophical disciplines such as epistemology, philosophy of mind, and philosophy of science, "hard" sciences such as physics and biology, as well as cognitive, developmental, and social psychology.

The philosophical roots of cognitive science do, indeed, derive from the cognitive revolution, both in terms of the more implicit assumptions of idealism and individualism as well as (now) explicit assumptions of materialism, computationalism, and representationalism. From the perspective of cognitive psychology, the changes in the 1980s (aside from the confraternal multidisciplinary convocation) included the first explicit admission that representationalism constituted an assumption of cognitive psychology; the legitimation of computer simulation, long a feature of artificial intelligence as well as harder sciences including ecology and economics, as an acceptable method within mainstream psychology; and the re-emergence of neural networks as a popular computational model.

Three sources of this second cognitive revolution may be considered:

(1) A generational change within cognitive psychology to researchers who had early experience actually programming computers and saw little purpose in the strict distinction drawn by older cognitive psychologists between the legitimacy of computation as a metaphor and of computer simulation as a method.

(2) The winding down of the Cold War, which radically reduced research funding for physicists (who moved into biology and, to a lesser extent, economics) and for artificial intelligence researchers (who moved into economics and, via cognitive science, into psychology), all of whom were far more comfortable and far more facile with mathematics and computer programming than were the cognitive psychologists.

(3) The explosion of new data about the brain derived from new technologies including brain scans (PET, MRI, CAT, computer-interpreted EEG, etc.) as well as in-vivo single-cell recording of neurons.

Collector Unit: A type of unit used in the Orchestrion and Piano-Forte networks to accumulate multiple outputs of emitter units. The activation function is the sum of the inputs. Any of a number of standard output functions may be used. Two variants planned at present are (1) outputs that are a fixed threshold function of the activations and (2) outputs that are identical to the respective activations (i.e., a simple sum of the inputs from the emitter units). Each is used in a distinct version of Orchestrion.

Connection: Associative component of a connectionist model corresponding to axon, synapse, and dendrite. Graphically, connections are usually drawn as lines or arrows connecting the units or nodes of the connectionist network. Signals pass from unit to unit over the connections. Mathematically, units normally correspond to coordinates of some matrix. Each connection has a numerical value called a connection weight (q.v.). Ordinarily, the values of outputs passing from one unit through the connection to input into another unit are modified by being multiplied by the connection weight. Learning can be modeled by modification of the connection weights. (See also Unit.)

Connectionist Models: Computational models of psychological phenomena composed of networks of interconnected units. The units are intended to compute solutions after the manner that neurons are assumed to transform neural signals. In general, solutions are calculated in steps called cycles, with each unit in the network taking on a new activation value with each step. At appropriate points in the calculation, values are read off certain of the units (normally called output units). These output values constitute the calculated solutions. The logic is a form of parallel processing.

As distinct from other computational models used in Artificial Intelligence, Robotics, Cognitive Science, and Cognitive Psychology, connectionist models can perform calculations in parallel, with individual units calculating portions of the solution more or less independently of other units. Most commonly, each unit receives signals from the outputs of other units via connections that modify those signals. The modified signals are input to the unit and combined to determine the unit's activation. An output function modifies the activation value and the result is output along other connections to other units. When learning is also modeled, the model is called adaptive. Learning is modeled by modification of the connection weights. An important type of connectionist model is the neural network.

There are numerous introductions to connectionist modeling. Recommendations here are limited first and foremost by the small number that the present author has reviewed personally. Johnson-Laird (1988, ch. 10) offers possibly the best brief introduction to the topic. Chapters one through four of volume one of the PDP "Bible" (Rumelhart, McClelland, & The PDP Research Group, 1986) give the initial presentation by the founders. Smolensky (1988) offers a unique philosophical perspective, and numerous commentators take a shot at it.

Connectionist Neural Network: Within the context of this paper, a connectionist neural network is an adaptive neural network where learning is modeled by the modification of connection weights as opposed to the modification of some other part of the network structure (e.g., the threshold, as in activationist neural networks). Connectionist learning is by far the most common technique for modeling learning in a neural network.

Connection weight: A numerical value associated with each connection in a neural network. The inputs to one layer of the network are the outputs from the previous layer, each multiplied by the connection weight for the connection over which it travels. In terms of matrix algebra, the outputs are a vector, the inputs are a second vector, and the connection weights are a matrix (with one number for every connected output-input pair). The inputs to the second layer are the product of the matrix times the vector of outputs.
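The matrix form can be shown for a whole layer at once. The following Python sketch uses plain lists rather than a matrix library; the function name, the weight values, and the convention that each row of the matrix belongs to one receiving unit are assumptions made for illustration.

```python
def layer_inputs(weights, outputs):
    """Multiply the weight matrix by the output vector of the previous layer.

    weights[j][i] is the weight on the connection from sending unit i
    to receiving unit j; a weight of 0.0 stands for "no connection".
    """
    return [sum(w * o for w, o in zip(row, outputs)) for row in weights]

weights = [[0.5, 0.0, 0.25],    # connections into receiving unit 0
           [1.0, -1.0, 0.0]]    # connections into receiving unit 1
outputs = [1, 0, 1]             # outputs of the three sending units

inputs = layer_inputs(weights, outputs)
print(inputs)
```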

Cycle: The usual term for a minimum time interval in a neural network simulation. The brain may be assumed to operate asynchronously in that the precise time that one neuron fires is not necessarily synchronized with the firing of other neurons at more or less that same time. Most artificial neural networks, however, simulate the activity of the entire system by updating each and every unit in the network synchronously in a series of discrete time-steps. Each time-step is called a cycle.

Distance Measure: See similarity space.

Distributed representation: A novel type of representation used in some connectionist models, wherein the information determining what is being represented is spread across the units in a particular layer of the network.

Traditional Artificial Intelligence and Cognitive Psychology representations are purely symbolic in that similarities between the form of the symbols are unrelated to similarities between the things represented. The visible forms of the words "MEN" and "WOMEN" on two bathroom doors are purely symbolic. Alternatively, the silhouettes of male and female figures used for the same purpose are iconic in that they differ from one another in a manner analogous to how the forms of men and women differ.

Within the computer, where everything is represented as a binary string of ones and zeros, symbolic representations are easily handled by arbitrarily assigning different strings to different events/objects. Constructing iconic representations inside the computer is not such an obvious matter. Distributed representations are one way of handling this problem.

For representations of external events/objects, different parts of the string are assigned different values corresponding to different properties of the thing represented. In that way, representations of entire events/objects can be composed from the substrings referring to the relevant properties. For instance, the representation for apple would include substrings for red, round, smooth, thin-skinned, unsegmented, crunchy, etc. The representation for pear, containing substrings for green, pear-shaped, smooth, thin-skinned, unsegmented and crunchy would be more similar to apple than the representation for orange, containing substrings for orange, round, dimpled, thick-skinned, segmented, and juicy, etc. Such representations are called distributed because the features that determine which thing is being represented are spread over the various units comprising the substrings. Often, intermediate representations in the hidden layer are distributed across the units, but contain no discrete features in substrings.

Edge: The mathematical name for a connection in a network. Also called an arc.

Emitter Unit: The specialized unit constructed specifically to model the adaptive process of In-Vitro Reinforcement. The basic structure is that of a McCulloch-Pitts cell, differing from the standard cell in that, as originally proposed by McCulloch and Pitts, learning is modeled by a variable threshold. The rule for altering the threshold makes the emitter unit a variable-structure stochastic learning automaton. When the unit signals (i.e., emits a one) and the network is reinforced (i.e., receives a one), the threshold is lowered. When the unit signals and no reinforcement is received, the threshold is raised. When the unit does not signal, the threshold is preserved (a memory function).
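The three-case threshold rule can be sketched as follows. This is a minimal illustration, not the implementation used in the simulations: the class name EmitterUnit, the fixed step size, the clamping of the threshold to the range 0 to 1, and the way the unit decides to fire (a uniform random draw compared against the threshold) are all assumptions made for the sake of a runnable example.

```python
import random


class EmitterUnit:
    """Hypothetical emitter unit with a variable threshold."""

    def __init__(self, threshold=0.5, step=0.05, rng=None):
        self.threshold = threshold      # assumed to lie between 0 and 1
        self.step = step                # assumed fixed adjustment size
        self.rng = rng or random.Random()

    def emit(self):
        # Assumption: the unit fires when a uniform random "activation"
        # exceeds the current threshold, so a lower threshold means more firing.
        return 1 if self.rng.random() > self.threshold else 0

    def learn(self, signal, reinforced):
        if signal == 1 and reinforced:
            # Signaled and reinforced: lower the threshold.
            self.threshold = max(0.0, self.threshold - self.step)
        elif signal == 1:
            # Signaled without reinforcement: raise the threshold.
            self.threshold = min(1.0, self.threshold + self.step)
        # Did not signal: the threshold is preserved (the memory function).


unit = EmitterUnit(threshold=0.5, step=0.1)
signal = unit.emit()
unit.learn(signal, reinforced=True)
```

Under these assumptions, a unit whose firing is regularly followed by reinforcement fires more often over time, which is the qualitative pattern reported for In-Vitro Reinforcement.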

Feedback: A general term for information about ongoing processing that is used to improve the processing. For example, when sailing towards a dock, seeing the dock drift to the left of the prow of the boat causes the sailor to push the rudder handle to the right (starboard), moving the rudder itself to the left, and steering the boat back towards the dock. The drift of the image of the dock relative to the prow informs the sailor that the boat is veering off course.

In engineering terms, a feedback system is one where information about how the current input is being processed is used to determine how to process that same input as processing proceeds. This is in contrast to Feedforward systems, where the information is not used until the next input (or later).

Feedforward: A technical engineering term for a system where information about how previous inputs were processed is used to determine how to process new inputs. This is in contrast to Feedback.

Feedforward Network: An architecture for a connectionist model. (See Figure A-2.) Signals are input on the left, move through the hidden layer(s), and output on the right. Typically, the pattern of activation values over the input units represents the stimulus and the pattern over the output units represents the response. Learning, when modeled, is often due to reinforcement signals made proportional to the vector difference between the outputs and the desired target value and sent backwards through the network. (The learning in a Feedforward network is most often supervised learning, but may also be reinforcement learning.) These reinforcement signals are used to alter the function implemented between the input and output layers. Feedforward architecture is often used in conjunction with backpropagation learning.

Curiously enough, the feedforward architecture includes feedback, at least in the general sense. The reinforcement signals after any input is processed can be used to alter the ongoing process. If we think of the series of inputs as part of an ongoing process, like successive images of the road ahead while we drive, then the vector difference is feedback. If we think of every input as being processed separately, as in the case of sonar images that have to be classified as being images of rocks or submarines, then this is technically a feedforward system, because the vector difference for each input is not available to alter the system until that input has been completely processed.

The term feedforward also distinguishes an aspect of the structure of the network. The only way that information about the processing of the current input can be used to alter processing of that same input is if the network is recurrent. The standard feedforward architecture is non-recurrent.

Firing: A shorthand term for when a unit, capable of only binary output, outputs a one rather than a zero.

Genetic Algorithm: A computational approach to learning modeled on the process of Darwinian evolution. Developed by John Holland in the early 1960s, the genetic algorithm shows very different strengths (and weaknesses) than connectionist models which are based on differentiable functions. Whereas connectionist models tend to converge rapidly to a solution, certain sorts of difficulties in representing problems, known as local minima, may cause connectionist models to give a wrong answer. In principle, the genetic algorithm and related models converge more slowly, but are less vulnerable to local minima.

Note that a Genetic Algorithm is not a connectionist model or a neural network. Strictly speaking, it is a part of Machine Learning.

Hamming distance/similarity: A distance/similarity measure comparing any two binary sequences. The two values of each variable in the sequence are compared and a result sequence is generated. If both values are zero or both values are one, then the result value is zero. If one value is zero and the other is one, then the result value is one. The Hamming distance is the sum of all the result values. (This is the equivalent of taking the exclusive-OR of the two binary sequences and summing the result.) Thus, the Hamming distance is the total number of mismatches between the two sequences. The Hamming similarity, by contrast, is the total number of matches.
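The definition above translates directly into a few lines of code. The function names and the example sequences below are illustrative only.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # x ^ y is the exclusive-OR: 0 when the bits match, 1 when they differ.
    return sum(x ^ y for x, y in zip(a, b))


def hamming_similarity(a, b):
    """Number of positions at which the two sequences match."""
    return len(a) - hamming_distance(a, b)


target = [1, 0, 1, 1, 0]
response = [1, 1, 0, 1, 0]
print(hamming_distance(target, response))    # 2
print(hamming_similarity(target, response))  # 3
```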

Hidden layer: A component of various connectionist architectures, especially feedforward architectures (cf. Figure A-2). Technically, a hidden layer is any layer of units that are not directly connected to either the inputs or the outputs.

The first working neural networks, developed in the 1950s and called perceptrons, included an adaptive algorithm to model learning, but only input and output layers with no hidden layer. In 1969, Minsky & Papert authored an infamous book around a series of mathematical proofs that demonstrated certain computational limitations of perceptrons. Briefly, near the end, the authors acknowledged that, by the simple addition of a hidden layer, a connectionist model would have no computational limitations whatsoever. They pointed out, however, that, at the time, there was no known learning algorithm that would pass the corrective feedback signal back through the hidden layer.

With the development of the backpropagation algorithm, connectionist models with hidden layers could learn. This eliminated the limitations of connectionist models and mooted Minsky and Papert's argument. Intriguingly, as Hutchison notes, the requirement of a hidden layer in order to make a connectionist network entirely computationally general only holds for feedforward architectures. By reconnecting parts of the output layer to the input layer, the need for a hidden layer can be eliminated. From the perspective of a connectionist model, the old-fashioned behaviorist assumption that individuals can perceive their own responses is equivalent to the cognitivist assumption of representations. This equivalence is a specific example of the more general principle that Hempel discussed in his paper, The Theoretician's Dilemma.

Input layer: The layer of units in a connectionist model connected up to the inputs of the simulation. The term input layer is not often used with auto-associative architectures, but is common with feedforward architectures and other S-R architectures. The activation values of the input layer generally represent stimuli.

Iterative Learning: Like traditional statistical techniques such as Regression and Factor Analysis, the learning algorithms for adaptive neural networks can be considered as calculation techniques for problem-solving. Unlike traditional statistical techniques, neural network learning is iterative in the sense that interim solutions are available at every step in the calculation, not just after all the data is input.

In-Vitro Reinforcement (IVR): Results of a standard experimental procedure (Stein, 1997; Stein, Xue, & Belluzzi, 1993; 1994; Stein & Belluzzi, 1989) that demonstrates changing neural activity due to the infusion of the neuromodulator, dopamine. A neuron that exhibits a characteristic multi-spike burst is monitored in-vitro. Whenever a burst is detected, dopamine is injected around the cell via pipette. The burst rate is observed to increase. The basic notion is that initially random activity, when regularly followed by a biologically important event, will come to occur more often. This idea is credited to Skinner (1951).

Details of a proposed molecular mechanism that may underlie IVR are given in Appendix B.

Lawnmower problem: An especially challenging problem recommended by Koza as a task for evaluating genetic algorithms and other genetic programming techniques. The lawnmower problem has a number of interesting features. The important feature for our purposes is that it contains many local minima, traps which many connectionist models find especially difficult.

The basic notion of the lawnmower problem is a rectangular lawn that begins unmown. (See Figure E-1.) A programmable lawnmower takes in a list of instructions like TURN-LEFT and MOW-FORWARD and JUMP-TO-LOCATION-X-Y. After the lawnmower executes the entire list of instructions in order, a certain proportion of the lawn has been mowed. A better program is a list of instructions that mows more of the lawn using fewer instructions. The best program mows the entire lawn using the minimum number of instructions.

A learning system that sets out to solve the lawnmower problem must output programs for the lawnmower. Each program output by the learning system can then be placed in the lawnmower and evaluated as to how much of the lawn gets mown in how many steps. The value of each program can then be input back into the learning system in order to provide feedback that the learning system can use to generate more (and better) lawnmowing programs.
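The evaluation step described above (run a program, then report how much lawn was mown in how many instructions) might be sketched as follows. Many details here are assumptions made only so that the example runs: the lawn is treated as a grid of unit squares that wraps around at the edges, the mower starts at square (0, 0) facing "north" without mowing its starting square, MOW-FORWARD moves one square and mows it, and the jump instruction takes absolute coordinates and mows the destination. The function name run_mower and the tuple encoding of instructions are likewise hypothetical.

```python
def run_mower(program, width=8, height=8):
    """Execute a list of instructions; return (proportion mown, instruction count)."""
    mown = set()
    x, y = 0, 0          # assumed starting square
    dx, dy = 0, 1        # assumed starting direction ("north")
    for instruction in program:
        op = instruction[0]
        if op == "TURN-LEFT":
            dx, dy = -dy, dx                          # rotate 90 degrees counterclockwise
        elif op == "MOW-FORWARD":
            x, y = (x + dx) % width, (y + dy) % height  # assumed wrap-around lawn
            mown.add((x, y))
        elif op == "JUMP-TO-LOCATION":
            x, y = instruction[1] % width, instruction[2] % height
            mown.add((x, y))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return len(mown) / (width * height), len(program)


program = [("MOW-FORWARD",)] * 3 + [("TURN-LEFT",), ("MOW-FORWARD",)]
print(run_mower(program, width=4, height=4))  # (0.25, 5)
```

The pair returned by the evaluator is the kind of value that would be fed back into the learning system; how the two numbers are combined into a single feedback signal is left open here.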

Long-Term Potentiation (LTP): Results of a standard experimental procedure (Bliss & Lømø, 1973; Kelso, Ganong, & Brown, 1986; Bliss & Collingridge, 1993) that demonstrates changing interactions between neurons said to underlie memory and learning. Two neurons connected by a synapse are identified either in vivo or in vitro. The presynaptic neuron is stimulated and the output of the postsynaptic neuron is observed. Several seconds of intense high-frequency current is applied to the presynaptic neuron. Later stimulation of the presynaptic neuron produces increased output on the part of the postsynaptic neuron. The increased sensitivity of the postsynaptic neuron can continue for weeks if multiple bursts of current are applied. The basic notion is that the more often a neural pathway is used, the easier it is to use again in the future. This idea dates back at least to Meyer, 1911, p.86-87. The more elaborate notion that two interconnected neurons whose firing is correlated will come to be more highly correlated in the future is credited to Hebb (1949) and is the basis for a learning rule used in auto-associative neural networks.

Machine Learning: The subdiscipline of Artificial Intelligence concerned with artificially intelligent systems that learn.

McCulloch-Pitts cells: The models of neurons constituting the units in the very first neural networks. Proposed by McCulloch & Pitts (1943) and discussed extensively by Minsky (1967), all connection weights were fixed at one, excitatory signals were summed, one inhibitory signal would override any excitation, and the activation function was a threshold function. McCulloch and Pitts demonstrated that networks of such units could compute any computable function. They also proposed that learning be implemented using variable thresholds, but never implemented this themselves.

Neural Networks: Also called Artificial Neural Networks (ANNs). These are connectionist models of psychological phenomena that attempt to take more seriously the neurophysiological data regarding CNS activity than do most connectionist models.

Neural Plasticity: A term from neuroscience indicating changes in the brain that alter brain function so as to constitute learning. It is assumed that with the structure of the brain fixed, responding will not change. Learning is presumed to be due to changes in the neural structure, hence, neural plasticity.

Node: The mathematical name for a unit in a network.

Non-recurrent: A feature of a network architecture. A network is said to be non-recurrent if there is no path via the connections where a signal can pass from any unit back to itself. Otherwise, the network is said to be recurrent.

Orchestrion: The second of the IVR-inspired neural networks. The difference between Orchestrion and Clavier is that, in Clavier, the emitter units are connected one-to-one with the outputs, while, in Orchestrion, the emitter units are grouped into clusters and the outputs of all emitter units in the clusters are input into the collector units.

At present, only the threshold variant of Orchestrion has been tested and that unsuccessfully. Successful testing of the threshold variant is a necessary prerequisite to determining the correct parameters for the next more complex type of IVR-based network, Piano-Forte.

Output function: The mathematical function of the activation value that determines the value output by a particular unit in a connectionist model. Just as the activation value is typically a weighted sum of the inputs, the output value is some function of the activation value, hence the term output function.

Typical functions used for output functions are: (1) sigmoidal functions, in part because such functions match measurable aspects of neuronal activity and in part because quasi-linear variants of multiple regression, such as the backpropagation algorithm, work well with sigmoidal transformations. (2) threshold functions, as in the case of McCulloch-Pitts cells, because simple logical functions can be calculated using threshold functions, making the analysis of the network's performance subject to logical proof, and because real neurons often fire in an all-or-nothing fashion typical of threshold-governed systems.

Output layer: The layer of units in a connectionist model connected up to the outputs of the simulation. The term output layer is not often used with auto-associative architectures, but is common with feedforward architectures and other S-R architectures. The activation values of the output layer generally represent responses.

In feedforward architectures, during training, the difference between the output on any cycle and the desired target, measured using some similarity function, determines the value of the reinforcement feedback that is used to alter the connection weights in the network.

Parallel Distributed Processing (PDP) model: Term used to describe connectionist models by Rumelhart, McClelland, & The PDP Research Group (1986; McClelland, Rumelhart, & The PDP Research Group, 1986). Their focus is on parallel processing, distributed representations, and models of psychological function more than neurological plausibility. Many of the basic distinctions, terminology, and paradigmatic problems are detailed in their two-volume "bible," which is also a good guide to translating the lingo.

Parallel processing: A term used to describe the computational strategy for connectionist models as well as for certain types of computers. Most generally, parallel processing is contrasted with the older, serial processing still used in most computers. Computers, of course, perform millions of calculations per second. Serial processing computers perform one calculation at a time. Parallel processing computers can perform more than one calculation at a time. To the degree that the plan for performing some complex calculation can be broken down into calculations that can be performed without waiting for certain other calculations to be completed, parallel processing can be much faster than serial processing.

The vast majority of computers are still more-or-less serial processors and connectionist models are usually simulated on standard, serial computers. Thus, the parallel design of the connectionist algorithm usually produces no real benefit in actual speed, although in principle, were the connectionist model to be simulated on a parallel machine, it would run much faster than it does.

The reason for using parallel processing in connectionist models and neural networks is not principally for speed. The idea is that the architecture of the central nervous system is better understood in terms of the logic of parallel, rather than serial, processing. This is a major conceptual difference between connectionist models and traditional Artificial Intelligence models, where computational details like the difference between parallel and serial implementations are not considered important.

The actual logic of connectionist models illustrates how the use of units and connections can enable parallel processing. Given a large number of units organized into layers, the following logic can be used to calculate a surprising number of functions: On each cycle, for each unit, the steps in the calculation are: (1) multiply each input from the previous layer by the connection weight for the connection over which it travels, (2) sum the weighted inputs to produce the activation value, and (3) transform the activation value into an output value via the output function. The output of that unit then constitutes input to connected units in the next layer. On each cycle, the input pattern represented over the input layer gradually propagates forward layer by layer until the output is calculated for that cycle. In principle, a computer (or brain) where each unit was an individual processing system (or neuron) could calculate its little portion of the solution at the same time as every other unit in the same layer, hence the term "parallel."
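The three steps can be sketched for a small feedforward network as follows. The function names, the particular weights, and the choice of a threshold output function that fires when the activation reaches one are assumptions chosen for illustration; the example network computes the exclusive-OR of two binary inputs using one hidden layer.

```python
def threshold(activation, theta=1.0):
    """Threshold output function: fire (1) when the activation reaches theta."""
    return 1 if activation >= theta else 0


def unit_output(inputs, weights, output_fn):
    # Steps (1) and (2): weight each input and sum to get the activation value.
    activation = sum(w * x for w, x in zip(weights, inputs))
    # Step (3): transform the activation into the unit's output.
    return output_fn(activation)


def forward(pattern, layers, output_fn=threshold):
    """Propagate an input pattern forward, one layer at a time.

    layers is a list of weight matrices; each row holds the weights
    into one unit of that layer.
    """
    signal = list(pattern)
    for layer in layers:
        # Units within a layer do not depend on one another, so in
        # principle these outputs could all be computed in parallel.
        signal = [unit_output(signal, weights, output_fn) for weights in layer]
    return signal


hidden = [[1, -1], [-1, 1]]   # unit 0: first input only; unit 1: second input only
output = [[1, 1]]             # fires if either hidden unit fires
for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pattern, forward(pattern, [hidden, output]))
```

On a serial machine the list comprehension visits the units one after another, which is the situation described above for most simulations.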

Percentile Reinforcement: A reinforcement schedule (that is, a mathematical rule for providing rewards to someone in order to facilitate a certain type of learning) used to generate novel behaviors in both human and non-human animals. Percentile reinforcement is a formal rule for implementing shaping, a common technique used both for skills training with the developmentally disabled and for training animals to do complex tricks.

The basic notion is that of successive approximations. It is likely that, at the onset of training, a poodle does nothing like balancing a ball on its nose while jumping through a hoop of fire. There is, however, always variation in what the animal is currently doing. Inevitably, some of these activities are more propitious with respect to the desired complex behavior than others. Shaping consists in rewarding the activities that are closest to the desired behavior. Over time, the various activities will contain a higher and higher proportion of behavior similar to the target behavior. The criterion is shifted as the behavior changes such that only the best activities are reinforced over time.

For instance, the poodle might be reinforced just for standing still, then for raising a front paw, then for raising both paws, then for rearing, then for staying upright for a longer period of time, then for keeping its nose up while standing, etc., until the poodle regularly and reliably balances the ball on its nose and jumps through the flaming hoop.

Percentile reinforcement is a formalized technique for implementing (and automating) shaping. It includes two parameters, p and m. Percentile reinforcement means ranking each new action with respect to the last m actions emitted and reinforcing the new action if and only if it ranks higher than all but p percent of those previous actions. p is the constant percentage of responses that will be reinforced over time as the criterion shifts. m is the number of previous responses to which the current response is compared. All possible responses are scaled with respect to similarity to the desired behavior. The current response is compared to the last m responses. If it is closer than all but p proportion of the last m responses, then it is reinforced.
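A minimal sketch of the decision rule follows, assuming each response has already been scored by its distance from the target (smaller is closer). The class name PercentileSchedule is hypothetical, p is given as a proportion between 0 and 1, and two details not settled by the description above are filled in by assumption: ties do not count as "closer," and no response is reinforced until m responses have been observed.

```python
from collections import deque


class PercentileSchedule:
    """Hypothetical percentile reinforcement rule with parameters m and p."""

    def __init__(self, m, p):
        self.m = m                      # number of previous responses compared
        self.p = p                      # proportion of responses to reinforce
        self.recent = deque(maxlen=m)   # distances of the last m responses

    def reinforce(self, distance):
        """Return True if the response at this distance earns reinforcement."""
        if len(self.recent) < self.m:
            decision = False            # assumption: wait for m responses first
        else:
            # Count the previous responses that the current one is closer than.
            beaten = sum(1 for d in self.recent if distance < d)
            # Reinforce if it is closer than all but a proportion p of them.
            decision = beaten >= (1 - self.p) * self.m
        self.recent.append(distance)
        return decision


schedule = PercentileSchedule(m=4, p=0.25)
for distance in [8, 7, 6, 5, 4, 6, 4]:
    print(distance, schedule.reinforce(distance))
```

Because the comparison set is always the most recent m responses, the criterion shifts automatically as responding moves toward the target.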

In sum, percentile reinforcement consists in estimating the changing distribution of activity by the organism with respect to some measure (here, the distance from the target). As the distribution shifts towards the target, the percentile reinforcement schedule guarantees that the proportion of reinforced to non-reinforced actions remains constant (See Figure A-3). For certain ranges of m and p, effective shaping of various actions can be demonstrated in a number of species, including humans. See Galbicka (1994) and Platt (1973) for details.

Piano-Forte: The third and most complex of the IVR-inspired neural networks. Piano-Forte differs from the simpler versions such as Orchestrion in that a module of sensory inputs is included. Piano-Forte is thus designed to handle stimulus-response (S-R) relations like feedforward networks, such as backpropagation networks. If successful, Piano-Forte will be the first neural network to model such relations using a response-reinforcer (R-S) architecture.

In Piano-Forte, the input layer generates inhibitory signals to a large fixed proportion of randomly selected emitter units. This ensures that no stimulus ever elicits any response. (While not especially neurally plausible, successful simulation will constitute a sufficiency proof that categorization tasks can be performed by systems using emitted responses as well as elicited responses.) Whatever emitter units remain active are shaped to produce the correct response via collector units as in Orchestrion.

Pixel: (from "picture element"). The single dot from which digitized pictures (as on a TV screen) are composed. In color pictures, each pixel is red, green, or blue, with some degree of intensity. In black and white pictures, each pixel ranges from white through shades of lighter to darker grey to black. Black and white pictures can also be composed of sequences of binary values, where each pixel is either white (one) or black (zero).

Police Artist: The name given for the demonstration version of the simulation of the search task for Clavier, wherein an exact match to a target string is shaped using the percentile reinforcement schedule. In the initial Police Artist simulations, the Hamming distance was used as the measure of similarity. The name, Police Artist, was suggested by the fact that Clavier duplicates the target string exactly with no sensory/stimulus information about the string itself.

Recurrent: A feature of a network architecture. A network is said to be recurrent if there is at least one unit (node) where a signal can pass from the unit back to itself. This can happen in a few different ways. First, a node can have an edge connecting back to itself. Second, if the connections are non-directional, that is, signals can travel both ways across a connection, then any connection from one unit to another also connects from the unit back to itself. Third, even with directional connections, if there is a path from a unit through other units back to the first unit, the network is said to be recurrent. Otherwise, the network is said to be non-recurrent.
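Whether a given set of connections is recurrent can be checked mechanically by searching for a path from any unit back to itself. The sketch below uses a standard depth-first search for cycles in a directed graph; the function name is_recurrent and the dictionary format are illustrative assumptions. A non-directional connection is represented as a pair of directed connections, one in each direction.

```python
def is_recurrent(connections):
    """Return True if some unit can pass a signal back to itself.

    connections maps each unit to the list of units it sends signals to.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(unit):
        state[unit] = IN_PROGRESS
        for target in connections.get(unit, ()):
            status = state.get(target, UNSEEN)
            if status == IN_PROGRESS:
                return True             # found a path leading back to a unit
            if status == UNSEEN and visit(target):
                return True
        state[unit] = DONE
        return False

    return any(state.get(u, UNSEEN) == UNSEEN and visit(u) for u in list(connections))


feedforward = {"in1": ["hid"], "in2": ["hid"], "hid": ["out"], "out": []}
print(is_recurrent(feedforward))                          # False
print(is_recurrent({"a": ["a"]}))                         # True: self-connection
print(is_recurrent({"a": ["b"], "b": ["a"]}))             # True: two-way connection
print(is_recurrent({"a": ["b"], "b": ["c"], "c": ["a"]})) # True: longer path
```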

Recurrent networks are considered to be mathematically more powerful than non-recurrent ones. (They are also more difficult to analyze.) Interestingly, a non-recurrent network can perform the same mathematical functions as a recurrent one if parts of the outputs are copied back to the inputs at the end of a cycle. The network itself is non-recurrent, but signals are calculated in the same fashion as in a recurrent network, because they are passed back to the earlier units outside the network (not via the connections). (Analyzing the behavior of such a network is just as difficult as analyzing a recurrent network, because it is the information returning to the node that makes it so, not the fact of whether the information passes through the network or around it.)

Reinforcement Learning: A type of learning midway between supervised and unsupervised learning wherein certain types of artificial neural networks (or other learning systems) adapt to a task with environmental feedback provided in the form of a unidimensional scalar. In contemporary neural network research, the scalar is most often a real-valued number, but in the case of learning automata (Narendra & Thathachar, 1989) a binary number, indicating either reinforcement or non-reinforcement, is often used. Binary feedback provides only the minimum amount of information from the environment to the network on each cycle. In IVR-based neural networks, binary feedback is used.

The problem of determining which portions of the network to modify on each trial (called the credit assignment problem, Staddon & Zhang, 1991) is more difficult for reinforcement learning than for supervised learning. As with supervised and unsupervised techniques, reinforcement learning is iterative.

It is important to note that the "reinforcement" of reinforcement learning may or may not correspond precisely to the notion of "reinforcement" found in the animal learning literature. Different schools of animal learning define reinforcement differently and different learning models may model it differently.

Technically, reinforcement learning is defined not in terms of the system that learns (using only scalar feedback), but in terms of the environmental task that provides that type of feedback, as well as having other specific characteristics (Sutton & Barto, 1998). A learning system that performs reinforcement learning is one that can successfully negotiate a reinforcement learning task.

Representation: In order for a computer simulation model to be a model of some real-world phenomenon, each input of the simulation must represent some antecedent event in the real world and each output must represent some consequent event. While not necessary, artificial intelligence simulations often are designed such that intermediate states represent putative interceding events.

Of course, in the case of psychological models, the interceding events might be brain events and/or objectively unobservable "mental" events. A representation of an external, observable, measurable event is simply a unique symbol that is used each and every time the computer program simulates a part of the process that involves that event. A representation of an objectively unobservable event is rather more complex. (Cf. representationalism.)

Representationalism: A central doctrine of cognitive science. It is an empirical question whether the activity of the mind/brain that causes behavior is partitionable into discrete causal events. Brain activity might be like a thunderstorm: a lot of chaos with no distinguishable bits that might precede discrete external effects such as lightning. It is an empirical question whether any discrete internal events correspond to the proper causes of distinguishable actions. Discrete causes might exist for clenching fists, but not for grabbing objects. It might just be that the chaos causes clenching when objects are felt, but that there is no discrete event corresponding to the intention or plan to grasp the object. It is an empirical question whether any discrete brain events that are involved in the production of action correspond to the subjectively observable events we call beliefs, desires, intentions, moods, etc. The proper scientific classifications for these internal events might be as different from the folk classification of beliefs, desires, etc., as the proper separate scientific classifications of bears and koalas are from the folk classification of teddy bear or the proper scientific classification of mollusks is from the separate folk classifications of snails, slugs and clams.

Empirical evidence for these claims is scant. To some extent, the a priori specification of what would constitute evidence of some of these claims has not yet been determined. Nonetheless, if you accept all of the above claims, plus the final claim that the process that coordinates these internal representations guarantees that they are meaningful in the same sense that words and phrases and other linguistic entities are theorized to be meaningful by the mainstream philosophers of language, then you are a representationalist.

Robotics: The engineering study of robots. While intrinsically related to Artificial Intelligence, the two disciplines have a surprising shortage of intercommunication. The study of virtual robots within the computer, called Artificial Life, helps fill in the gap to some extent.

Arguments to the effect that artificial intelligence (AI) cannot work without robotics date back at least to the early 1960s. Robotics has made much slower progress than AI, but, inevitably, the grand failures of AI have had their roots in problems necessary to robotics. Intriguingly, neural networks are the first type of system to show progress in managing certain of these problems (called pattern recognition problems). Neural networks have thus moved forward both Artificial Intelligence and Robotics research far more rapidly than previously.

R-S Architecture: A term in learning theory intended to contrast with the notion of S-S and S-R learning. Since S-S and S-R learning both have their correlates in both statistical procedures and adaptive neural networks, it may be that R-S architectures do as well. The basic notion of R-S learning is that responses are emitted by the organism with no particular prior eliciting stimulus and that the learning process that causes responses to become more and more useful is due to the differential reinforcement of more useful responses.

Computational learning systems with analogies to R-S learning include Variable-structure Stochastic Learning Automata and the Genetic Algorithm. Statistical procedures with analogies to R-S learning include Monte Carlo techniques and Tuning.

Search task: A task derived from the fields of Artificial Intelligence and Machine Learning wherein a space of numerical elements is simulated on the computer with one element designated as the target position in the space. The computational learning system begins at a random point in the space and uses feedback regarding the distance between the current position and the target to locate the target. Various criteria for concluding the search are given by Narendra & Thathachar (1989).

Selectionist Neural Network: A type of adaptive neural network, which may be either connectionist or activationist. A selectionist neural network is designed according to the principle that learning is analogous to the process of evolution via natural selection. The specifics of the implementation vary widely. Broadly, part of the structure of the network is treated as a pool of elements, randomly altered on each cycle, where newly altered elements operate to determine later outputs only on the basis of previous reinforcement. The analogy to Darwinian evolution is often not as clear as in the Genetic Algorithm.

Self Organizing: A type of system that alters its behavior without an explicit teaching or training signal. Input patterns are added to the system and the functioning of the entire system changes so as to characterize the structure of each input with respect to the totality of all of the inputs and the relations between them. Self-organizing systems are characterized as modeling unsupervised learning when the alteration of the system's behavior is intended to model learning.

Shaping: Also known as the method of successive approximations, shaping is a technique originally developed by B. F. Skinner (1951) for training an animal to perform novel tasks. Shaping has proved valuable with humans as well as with non-human animals. Currently, shaping is used both for skills training with the developmentally disabled and for training animals to do complex tricks. The animal is observed and, presuming that the desired behavior is not seen, the activities most similar to the desired behavior are reinforced (that is, a reward such as food is offered contingent upon occurrence of the behavior). Gradually, the various activities will come to resemble the desired behavior. The criterion for reinforcement is gradually shifted so that only a proportion of the behavior most similar to the desired behavior is reinforced as the overall variety of behavior shifts toward the target. Extremely complex performances, such as circus tricks, can be produced using this technique.
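
The shifting criterion can be illustrated with a toy simulation. In this sketch, behavior is a single number drawn around a current mean, only the responses that fall among the closest quarter of recent behavior are reinforced, and each reinforced response pulls the mean toward itself. The Gaussian response model, the window of twenty responses, and every parameter value are assumptions made for the illustration. None of them comes from Skinner's procedure or from the networks described in this document.

```python
import random

def shape(target, start=0.0, spread=1.0, trials=2000,
          proportion=0.25, window=20, step=0.2, seed=0):
    """Toy shaping loop with a moving reinforcement criterion (sketch)."""
    rng = random.Random(seed)
    mean = start   # centre of the current variety of behavior
    recent = []    # distances from target of the most recent responses
    for _ in range(trials):
        response = rng.gauss(mean, spread)
        dist = abs(response - target)
        recent = (recent + [dist])[-window:]
        # Criterion: the distance reached by the closest `proportion`
        # of recent responses. It tightens as behavior nears the target.
        k = max(1, int(len(recent) * proportion))
        criterion = sorted(recent)[k - 1]
        if dist <= criterion:
            # Reinforced: behavior shifts toward the reinforced response.
            mean += step * (response - mean)
    return mean
```

Starting from 0.0, a call such as shape(10.0) should end with the mean close to the target, even though no early response comes anywhere near it.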

Similarity Space: A technique used in statistics, in Artificial Intelligence, and in connectionist models for determining measures, particularly measures of feedback. A set of elements, usually stimuli or responses, is modeled in terms of measurable features. Each element is represented as a point in a multidimensional space. The distance between points, according to some mathematical measure, is used to model the similarity between the elements.

For instance, in reinforcement and supervised models, the distance between the target response and the response produced is used to determine the feedback. In stimulus-response (S-R) models, the distance between the stimuli determines how much the feedback for the response to one stimulus affects the later classification of another stimulus. In stimulus-stimulus (S-S) models, the stimulus space is transformed into a space of representations that capture essential aspects of shared properties of the stimuli. In Auto-associative models, that representation space is used to reconstruct missing portions of input patterns dependent upon similarities of the previous input patterns.
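
A minimal sketch of the first case follows, assuming Euclidean distance as the measure and a feedback value that falls off as 1 / (1 + distance). Both choices are illustrative, since the definition above leaves the mathematical measure open.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as feature sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def feedback(response, target):
    """Feedback that grows as the response nears the target.

    The 1 / (1 + d) form is an assumption made for this sketch.
    """
    return 1.0 / (1.0 + distance(response, target))
```

Under this choice, a response identical to the target receives the maximum feedback of 1.0, and responses farther away in the space receive progressively less.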

S-R Architecture: A term in learning theory intended to contrast with the notion of S-S learning. Especially in the study of Pavlovian/Classical Conditioning, there has been a long-standing dispute as to which is more important in determining learning: changes in the relationships between stimuli, or changes in the relationships between stimuli and responses. While this debate has languished as basic assumptions have evolved, both S-S and S-R learning have correlates in both statistical procedures and adaptive neural networks. The basic notion of S-R learning is that responses are elicited by the stimuli presented to the organism. The organism is thus just a reactive automaton, following Descartes' conception of animal behavior. Learning is presumed to consist in the strengthening and weakening of stimulus-response connections due to differential reinforcement. The S-R conception contains no explanation of novel responding except as a new composite of basic, elemental responses.

Computational learning systems with analogies to S-R learning include Feedforward neural networks and production systems such as those used in adaptive Expert Systems in Artificial Intelligence. Statistical procedures with analogies to S-R learning include multiple regression and analysis of variance.

S-S Architecture: A term in learning theory intended to contrast with the notion of S-R learning. Especially in the study of Pavlovian/Classical Conditioning, there has been a long-standing dispute as to which is more important in determining learning: changes in the relationships between stimuli, or changes in the relationships between stimuli and responses. While this debate has languished as basic assumptions have evolved, both S-S and S-R learning have correlates in both statistical procedures and adaptive neural networks. The basic notion of S-S learning is that correlations between stimuli in the world come to be mirrored by connection strengths between internal representations of those stimuli. Responses innately elicited by biologically relevant stimuli thus come to be elicited earlier by arbitrary stimuli that regularly predict the occurrence of the biologically relevant stimuli. The S-S conception contains no explanation of novel responding whatsoever.

Computational learning systems with analogies to S-S learning include Auto-associative neural networks such as those using Hebbian learning and Simulated Annealing. Statistical procedures with analogies to S-S learning include factor analysis, canonical correlation, and principal components analysis.

Supervised Learning: A type of learning wherein certain types of artificial neural networks adapt to a task with environmental feedback provided in the form of a multidimensional vector. Usually the vector is real-valued and provides detailed information as to how different the output generated on a given trial was from the desired output. (In general, all possible outputs are represented in a multidimensional output space where the distance between points indicates the similarity between outputs.) Such networks operate after the fashion of Regression analysis, generating a single function that best captures a large number of input-output (stimulus-response) pairings. Most learning algorithms for supervised learning differ from statistical techniques such as Multiple Regression Analysis in that supervised learning is usually quasi-linear or non-linear and the learning technique is iterative. (Cf. back-propagation.) (See also Unsupervised Learning and Reinforcement Learning.)
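
The iterative, error-driven character of supervised learning can be seen in the simplest possible case: a single linear unit with one input and one output, trained by the delta (least-mean-squares) rule. This is a deliberately reduced sketch. The feedback vector has only one dimension here, the unit is linear rather than quasi-linear, and the function name and parameter values are illustrative assumptions.

```python
def train_linear_unit(pairs, rate=0.05, epochs=2000):
    """Fit output = w * x + b to (input, desired output) pairs.

    Uses the delta rule: after each trial the weights move in
    proportion to the signed error supplied as feedback.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, desired in pairs:
            output = w * x + b
            error = desired - output  # how far the output missed, and in which direction
            w += rate * error * x
            b += rate * error
    return w, b
```

Trained on pairs generated by desired = 2x + 1, such as [(0, 1), (1, 3), (2, 5), (3, 7)], the unit should approach w = 2 and b = 1, the same line that a regression analysis would find in a single step.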

Threshold: A mathematical device very popular in psychological theorizing. The threshold is either a fixed or a slowly changing numerical value. A more rapidly moving value indicating the amount of some psychological property is compared periodically to the threshold. Output is one (yes) if the value exceeds the threshold and zero (no) otherwise. Originally developed as a model of perception in psychophysics, where a certain amount of light or sound, etc., is necessary for a person to report the presence of a stimulus, thresholds have been used extensively for many psychological phenomena.

Threshold function: A type of mathematical function used as an activation function in the original McCulloch-Pitts cells. Less commonly used now. The output of a threshold function is either zero or one. The threshold is a specific numerical value. If the sum of the inputs exceeds the threshold, the output is one. Otherwise, it is zero.
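
The rule just stated translates directly into code. In the sketch below the function name is hypothetical, the inputs are taken to be unweighted, and "exceeds" is read as strictly greater than.

```python
def threshold_unit(inputs, threshold):
    """Return 1 if the summed input exceeds the threshold, else 0."""
    return 1 if sum(inputs) > threshold else 0
```

With two binary inputs, a threshold of 1.5 makes the unit compute logical AND, and a threshold of 0.5 makes it compute logical OR.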

Unit: Local component of a connectionist model corresponding to a neuron. Mathematically, units normally correspond to coordinates of some vector. Each unit includes a numerical value called an activation value. Usually, there is an