Learning as Evolution

McDowell's evolutionary theory of behavior dynamics (ETBD; McDowell, 2004) instantiates the Skinnerian analogy between positive reinforcement and natural selection (Skinner, 1981).

In a nutshell, behavior is viewed as a tool that operates on the environment, with more or less success. Behavioral strategies that prove successful tend to be repeated; those that do not, tend to be abandoned. 

ETBD arranges a range of integers, that represent all behaviors that can be exhibited by an agent, its behavioral repertoire. The range can be separated into target classes, groups of integers that are occasionally reinforced.

POINTS

22

0   ...      471 ... 511                  512 ... 552     ...    4000 

Target Class 1

Target Class 2

Continuous-choice environment used to examine preference patterns in humans (top) and in ETBD (bottom; Popa, 2013). Two sources of reinforcement, Button 1 and Button 2, were programmed to deliver points according to certain rules (i.e., schedules of reinforcement). 

ETBD functionality 

A typical experiment begins with a random population (generation 1), chosen from the available range (with replacement). From the population of behaviors available "now", one is randomly chosen and emitted.

The emission initiates local rules of selection, recombination, and mutation that transform the existing population into a new one (generation 2). From it, one behavior is randomly emitted, a third generation is created, and so on. The sequence of responses that emerges from the reiteration of Darwinian rules is analyzed as if they were produced by a biological organism.

Generation n

EMISSION

DARWINIAN CYCLE

Generation n+1

time

ETBD

Student

t1

t2

t3

t4

t5

t6

t7

t8

t9

516

511

548

533

500

480

475

476

499

522

522

B2

B1

B2

B2

B1

B1

B1

B1

B1

B2

B2

SELECTION


Parents for the next generation are always chosen with replacement from the existing population. If the emission was not reinforced, parents are chosen at random. If it was reinforced, a probability density function is used to assign greater chances of becoming parents to behaviors that are closer to the reinforced emission.

Selection and reinforcer magnitude

The strength of selection events depends on the mean of the function (µ) and was shown to be computationally equivalent to the reinforcer magnitude (McDowell et. al., 2008; Popa & McDowell, 2016) , with strong selection events corresponding to high reinforcer magnitude.

RECOMBINATION

 

Two parents produce a child behavior by recombining their "genotypes", the corresponding binary form of their integer. The method used here is called bit-string recombination: each bit in the child’s binary string has a 50% chance of coming from one parent or the other. Children produced this way tend to resemble their parents.

MUTATION

 

Introduces a small amount of spontaneous variation in the population, by altering the genotypes of some child-behaviors. If a behavior is affected, one bit in its binary string is chosen at random and flipped - from 0 to 1 or from 1 to 0. 

Mutation rate and the 

Default Mode Network (DMN) 

 

The probability to be affected by mutation is referred to as mutation rate. This value is set in the beginning of the experiment and can take values between 0% (no mutants) and 100% (all mutants). The rate of mutation was hypothesized to correspond to the magnitude of spontaneous fluctuations of the brain's DMN (Popa & McDowell, 2016). This network of interconnected regions shows strong spontaneous activation at rest (Buckner, Andrews-Hanna, & Schacter, 2008; Raichle et al., 2001) and failure to suppress it during tasks that require sustained attention or response inhibition was associated with increased levels of behavioral variability (Weissman, Roberts, Visscher, & Woldorff, 2006; Feige, Biscaldi, Saville, Kluckert, & Bender, 2013).   

References

Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain’s default network: anatomy, function

and relevance to disease. Annals of the New York Academy of Sciences, 1124, 1–38. doi: 10.1196/annals.1440.011

McDowell, J. J (2004). A computational model of selection by consequences. Journal of the Experimental Analysis of Behavior, 81, 297–317. doi: 10.1901/jeab.2004.81-297

Popa, A., & McDowell, J, J. (2016). Behavioral Variability in an Evolutionary Theory of Behavior Dynamics. The Journal of the Experimental Analysis of Behavior, 105, 270-290.

Skinner, B. F. (1981). Selection by consequences. Science213, 501–504.


Skinner, B. F. (1984). Selection by consequences. Behavioral and Brain Sciences, 7, 477–510.

Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences, 98, 676–682. doi: 10.1073/pnas.98.2.676

Waslick, B., & Greenhill, L. L. . (2004). Attention-deficit/hyperactivity disorder. In J. M. Wiener & M. K. Dulcan (Eds.), The American psychiatric publishing textbook of child and adolescent psychiatry (3 ed., pp. 485-507). Washington DC: American Psychiatric Publishing Inc.