Research Methodology
Numerous research efforts have been made to establish the relative suitability of various selection techniques that can be applied in genetic programming. A major problem in any attempt at ranking performance has been the apparent bias of different operators to different tasks ([5], [6]). Although the genetic programming paradigm has been available since 1990, there is still some debate in this area [16].
The purpose of this research was to evaluate the performance of the most commonly employed selection methods when applied to a difficult, unstable, nonlinear control problem: the cart-pole problem with difficult starting conditions. After quantifying the performance of these methods, a comparison of relative utility was conducted to identify any significant variation in their suitability to this problem domain.
This chapter describes the experimental and analytical methods that were adopted for this research. Where it was considered necessary, the reasoning behind the choices that were made is also included.
The main factors under investigation were the selection techniques. The selection strategies that were tested against the cart-pole problem are noted in the following sections. The genetic operators and their parameters, excluding selection operators, remained constant throughout the experiment. By convention, the main features of the program are represented in the form of a tableau (table 2). Additional parameters and further explanations are covered in section 3.2.1.
In the standard genetic programming tableau, if automatic function definition is to be used, a small section of the tableau is dedicated to the parameters that are required for this technique (such as separate function and terminal sets). Much of the reason for this inclusion is to allow the reader to identify the extent of additional domain knowledge the author has afforded the genetic program.
In the design of this experiment, the decision was made that the genetic programming paradigm should be applied to the cart-pole problem without automatic function definition. A minimum of domain knowledge can then be supplied to the system. Thus, not only is the difficulty of the problem maintained, but also the performance of the system can be attributed to the selection technique used at the time of each run.
Table 2: Genetic Program Tableau

Objective  To evolve a solution to the cart-pole problem that will balance the pole for a period of 1,000 s without exceeding the limits outlined by Geva and Sitte [25]. The solution should also centre the cart.
Terminal Set  The state variables from the simulation and randomly generated floating point constants.
Function Set  Addition (+), Subtraction (−), Multiplication (*), a modified Division operator (%), and an operator to Reverse the sign of its argument (ANTI).
Raw Fitness  The number of time steps the solution managed to conform to the parameters of the simulation (i.e., time to failure / 0.02). An additional bonus is awarded for centring the cart (maximum of 10 raw fitness points).
Wrapper  A Spinglass ‘if’ function that returns 1 if its argument is greater than zero; otherwise it returns −1.
Parameters  Population = 4,000; Total Generations = 51.
Termination Criterion  Terminates only upon completion of all generations.
There are no formally recommended criteria for the selection of an optimal range of parameters in either genetic algorithms or genetic programming. According to Koza, “The problem of optimal parameter selection is an interesting and important issue affecting both string-based algorithms and hierarchical genetic algorithms (as well as other machine learning paradigms). The problem of optimal parameter selection is deserving of future work” [16, 15]. At present, the selection of these parameters is a subjective judgement made by the writer of the genetic program/algorithm.
The parameters for the genetic operators that were used throughout this investigation have been included in table 3.
Table 3: Parameters for the Genetic Operators

Parameter  Value
Population Size  4,000
Number of Generations  51
Runs per Implementation  50
Crossover Rate  90%
Fitness Proportionate Reproduction  10%
Mutation Rate  0%
Maximum Length after Crossover  15 genes
Minimum Length in Generation Zero  3 genes
Number of Trials  10 per Selection Method
Koza identifies the two major parameters in the algorithm as population size and the number of generations [17]. For this experiment, the complexity of the problem suggests a large population size for two main reasons. First, a large population is necessary to maintain a store of genetic material capable of solving the problem over the course of evolution. Secondly, larger populations are less likely to suffer a premature convergence over an unsatisfactory solution [17].
Although the relatively large population size that was used for the runs may seem inconsistent with the findings of De Jong (mentioned below), populations of this size are not inconsistent with much of the research already conducted in genetic programming ([12], [17], [18], [28]). In addition, a large population size will eliminate the need for fitness scaling of the individuals within that population. The reason for scaling is to reduce the effect of fitness outliers and so restrict convergence. Convergence is mostly an issue in smaller populations, because as population size increases so too does the natural diversity of the gene pool.
The crossover and mutation rates selected were based on the findings of De Jong. As stated by Goldberg, “in De Jong’s (1975) study of genetic algorithms in function optimization, a series of parametric studies across a five-function suite of problems suggested that good GA performance requires the choice of a high crossover probability, a low mutation probability (inversely proportional to the population size), and a moderate population size” [10, 71].
To account for the effect of initially random populations, to allow a sufficient opportunity of success for each selection method, and to accumulate adequate data for a distribution of the ‘learning curves’, each of the selection methods was given ten trials of 50 runs.
At the beginning of each trial, prior to the generation of the initial random population (generation zero), the random number generator was seeded with a new value and over 1,000 constants were randomly generated. These randomly generated constants were tabulated and made available to the system, in the generation of the initial random population, for the duration of each trial. In this way, over the course of the 10 trials, the system had access to over 10,000 floating point constants to generate a solution.
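The per-trial seeding and constant-table generation described above can be sketched as follows. This is a minimal Python illustration (the thesis implementation was in C++); the function name, the exact table size, and the use of the trial index as the seed are all assumptions made for the example:

```python
import random

def make_constant_table(seed, n=1000, lo=-10.0, hi=10.0):
    """Build the per-trial table of random floating point constants
    (over 1,000 values per trial, drawn from the RANDFLOAT range)."""
    rng = random.Random(seed)      # reseeded at the start of each trial
    return [rng.uniform(lo, hi) for _ in range(n)]

# One table per trial; ten trials give the system access to some
# 10,000 floating point constants over the course of the experiment.
tables = [make_constant_table(seed=trial) for trial in range(10)]
```

Because each table is a pure function of its seed, any individual trial can be replicated exactly, which mirrors the replication argument made for the random number generator below.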
3.2.2 Populating Generation Zero
The main criteria for generating the initial population are as follows:
Part of the reason for using Park and Miller’s random number generator was that, unlike the trials made by Koza, there is no guaranteed uniqueness of individuals [17]. Another reason for this implementation was to allow the replication of results. Because random number generators vary greatly from compiler to compiler, the source code for this pseudorandom number generator has been provided in Appendix V.
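For reference, Park and Miller’s ‘minimal standard’ generator is small enough to sketch here. The multiplier a = 16807 and modulus 2^31 − 1 are the published constants; the Python rendering below is an illustration and not the C++ source given in Appendix V:

```python
class ParkMiller:
    """Park and Miller's 'minimal standard' multiplicative congruential
    generator.  A fixed, published generator lets runs be replicated
    exactly regardless of compiler or platform."""
    A = 16807
    M = 2147483647          # 2**31 - 1, a Mersenne prime

    def __init__(self, seed=1):
        assert 0 < seed < self.M
        self.state = seed

    def next_int(self):
        # state := (a * state) mod m
        self.state = (self.A * self.state) % self.M
        return self.state

    def next_float(self):
        # Uniform value in (0, 1)
        return self.next_int() / self.M
```

A standard self-check from Park and Miller’s paper is that a seed of 1 yields the state 1043618065 after 10,000 calls, which makes it easy to verify a port of the generator.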
3.2.3 Devising a Fitness Function
A fitness function to balance the pole for a given period can be defined as dependent on the period that both the cart position and pole angle are within predefined limits. This function would be of the form:
Equation 4: Standard Fitness Function
A surprising situation arose from the research conducted by Randall et al. [25]. The fitness function used by the genetic algorithm in that case was similar to that noted above. Interestingly, although the controller was not explicitly told to centre the cart, the results of that research indicate that the controller did so anyway. For this reason, the centring of the cart was extraneous to the fitness of the chromosomes and is deserving of further investigation.
The fitness function for the genetic program should explicitly require the cart to be centred. This condition can be achieved by including a nominal bonus for the cart’s proximity to the centre of the track.
The fitness function, adapted to centre the cart is as follows:
Equation 5: Explicit Centring in Fitness Function
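A minimal Python sketch of the fitness computation in equations 4 and 5. The text specifies only that the bonus for proximity to the centre has a maximum of 10 raw fitness points; the linear shape of the bonus used here, and the parameter names, are assumptions for illustration:

```python
def raw_fitness(time_to_failure, final_x, track_half=1.2, tau=0.02,
                max_bonus=10.0):
    """Raw fitness per equations 4 and 5: surviving time steps plus a
    nominal centring bonus.  track_half is half the 2.4 m track."""
    steps = time_to_failure / tau                     # equation 4
    # Assumed linear bonus: full 10 points at centre, none at the edge.
    bonus = max_bonus * max(0.0, 1.0 - abs(final_x) / track_half)
    return steps + bonus                              # equation 5
```

Under this shape, a controller that balances for the full 1,000 s with the cart at the centre scores 50,000 time steps plus the full 10-point bonus.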
There are, presently, numerous approaches that can be used to scale the fitness of the population. The purpose of each of these measures is the same: to reduce the effect of fitness outliers in the population. In other words, scaling is intended to avoid the domination of a population by some ‘super’ individual in the span of a few generations [10]. This kind of domination is analogous to premature convergence.
Alternatively, the process of scaling is intended “to maintain selective pressure as the overall performance rises within the population” [5, 195]. In this way, the scaling mechanism is used to distinguish the performance of chromosomes when improvement among the population is minute.
For the purpose of this experiment, since part of the investigation is based on convergence, fitness scaling has not been applied. In this way, the effect of outliers upon the runs was determined solely by the selection technique being used.
3.2.4 Devising a Function Set and Terminal Set
The function set and terminal set have a major effect upon the potential performance of the genetic program. To devise the function set and terminal set, the programmer must have a reasonable understanding of the problem area to ensure that all the necessary operators, variables, and range of numerical values are available in the ‘gene pool’. On the other hand, extraneous members in the function and terminal sets will only reduce the efficiency of the program’s search.
The function set (FSET) represents the operators that are available in the gene pool. The following operators have been selected for the cart-pole problem:

FSET = { +, −, *, %, ANTI }

where:

+  = addition operator.
−  = subtraction operator.
*  = multiplication operator.
%  = a modified (safe) division operator.
ANTI  = a function to reverse the sign of its argument.
The terminal set is a set of numerical values or operators without arguments. As with the function set, the terminals in the set must be adequate for the system to find a solution. The terminals identified in the cart-pole problem are the simulation state variables (x, v, θ, ω) and the randomly generated floating point constants (RANDFLOAT) within the range of −10.0 to 10.0. This terminal set (TSET) is (conventionally) represented as:

TSET = { x, v, θ, ω, RANDFLOAT }
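To illustrate how an individual built from FSET and TSET is evaluated, the following Python sketch uses nested prefix tuples as a stand-in for the program tree. The tuple encoding, the safe-division fallback value of 1.0, and the example individual are all assumptions for illustration, not the thesis’s representation:

```python
def safe_div(a, b):
    # The modified division operator (%): falls back to 1.0 on division
    # by zero (a common GP convention; the exact fallback is assumed).
    return a / b if b != 0.0 else 1.0

def anti(a):
    # Reverse the sign of the argument.
    return -a

def wrapper(y):
    # The tableau's wrapper: +1 if the program output is positive, else -1.
    return 1 if y > 0.0 else -1

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '%': safe_div, 'ANTI': anti}

def evaluate(expr, state):
    # Strings are state variables from TSET; bare numbers are RANDFLOAT
    # constants; tuples are (operator, argument...) subtrees.
    if isinstance(expr, str):
        return state[expr]
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    return OPS[op](*(evaluate(a, state) for a in args))

# A hypothetical individual: 2.5 * theta - x, evaluated at the
# initial conditions of the simulation.
state = {'x': 1.0, 'v': 1.0, 'theta': 0.17, 'omega': 0.18}
individual = ('+', ('*', 2.5, 'theta'), ('ANTI', 'x'))
```

The wrapper then converts the program’s raw output into the bang-bang control decision applied to the cart.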
3.2.5 Criterion for Termination
The criterion for termination was the specified maximum number of generations. The main reasons to terminate a genetic programming run prior to the allowed generations are to reduce processor cost and to develop a solution in the shortest period of time. The benefits of early termination for these reasons are machine and language dependent, so were not of interest in this baseline research.
Early termination assumes that the optimal control strategy is known and the genetic program’s fitness function reflects a performance rating of that optimal strategy. Alternatively, a nominal ‘satisfactory’ fitness could be defined as the termination criterion.
Additional termination criteria would have been likely to bias the data produced by the experiment towards those techniques (or even runs) that performed well at online learning. However, offline learning was also to be investigated from the results. Hence, a predefined generation span, as the sole termination criterion, was selected as an equitable parameter for the genetic program in the investigation of both online and offline performance.
3.2.6 Selection of Genetic Operators
The genetic operators that were investigated in this research are the selection methods. To maintain consistency for the comparison of these methods, all additional operators will be used with the same parameters as those shown in table 3. The parameters in table 3 represent the most fundamental implementation of the genetic programming paradigm.
The main selection methods under investigation are:
The simulation of the cart-pole problem follows the dynamic equation [9, 42]:
Equation 6: Simulation of the Cart-Pole Problem
The proportional mass of the pole to the system (μ_p) and the acceleration (a) are given, respectively, by:
where:
From equation 6 above, the position and velocity of the cart, respectively, at any time are given by Euler’s approximate integration:
Equation 7: Cart Position
and the angular position and angular velocity, respectively, at any time are given by:
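The Euler integration described above can be sketched in Python using the commonly cited form of the cart-pole dynamics (cf. Barto, Sutton and Anderson) with the parameters from table 4. This is an illustration under those assumptions, not the thesis’s C++ implementation:

```python
from math import sin, cos

# Standard parameters from table 4; L is the pole half-length (lambda).
G, M_CART, M_POLE, L, F_MAG, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def euler_step(x, v, theta, omega, push):
    """One Euler step of the cart-pole dynamics.
    push is the wrapper output, +1 or -1, selecting the force direction."""
    force = F_MAG * push
    total_mass = M_CART + M_POLE
    pml = M_POLE * L                       # pole mass times half-length
    tmp = (force + pml * omega ** 2 * sin(theta)) / total_mass
    # Angular acceleration, then cart acceleration (a).
    theta_acc = (G * sin(theta) - cos(theta) * tmp) / \
                (L * (4.0 / 3.0 - M_POLE * cos(theta) ** 2 / total_mass))
    x_acc = tmp - pml * theta_acc * cos(theta) / total_mass
    # Euler's approximate integration with the 0.02 s time step.
    return (x + TAU * v, v + TAU * x_acc,
            theta + TAU * omega, omega + TAU * theta_acc)
```

Starting from the upright rest state, a push to the right accelerates the cart in that direction while the pole begins to tip the opposite way, which is the behaviour the evolved controllers must counteract.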
The parameters for the simulation will follow those used by Geva and Sitte, described as “the most often used version of the pole balancing problem” [7, 40].
Table 4: Standard Parameters for Cart-Pole Simulation

Parameter  Symbol  Value
Track Length  l_t  2.4 m
Cart Mass  m_c  1.0 kg
Pole Mass  m_p  0.1 kg
Pole Length  2λ  1 m
Gravity  g  9.8 m/s²
Force  F  10 N
Failure Angles  ±0.21 rad
Discrete Time Step  0.02 s
The initial starting positions for the simulation (table 5) are those identified by Geva and Sitte as being of appropriate difficulty to minimise the effect of random solutions (table 1) and so reduce the probability of deriving a solution based on random search of the problem space [7]:
Table 5: Initial Conditions for Simulation
Parameter  Value
Cart Position (x)  1 m
Cart Velocity (v)  1 m/s
Pole Angle (θ)  0.17 rad
Angular Velocity (ω)  0.18 rad/s
Balance Time  1,000 s
The final implementation of this simulation is written in C++ and the simulation runs on a Sun SparcStation/4. The main reasons C++ was chosen as the implementation language are:
3.2.9 Performance Specifications
The simulation uses the parameters recommended by Geva and Sitte to ensure both a minimal random effect and a suitable complexity of the task.
In order to describe the performance of the controller, Geva and Sitte suggest the following specifications:
For the purpose of this experiment, performance specifications 1, 2, and 3 are adequate to describe the performance of the controllers; specifications 4, 5, and 6 were considered either inappropriate or inapplicable for this experiment.
To comply with specification 4 it may, at face value, seem reasonable to sum the ‘time to failure’ of the best individuals of each generation prior to the successful generation to measure the total training time of the system. There are, however, two main characteristics of genetic systems that would make this measurement inaccurate.
First, because genetic systems explore numerous points (up to population size) in the problem state space, the extent to which the best chromosomes of prior generations actually contributed to the ‘hill-climb’ of the final solution is not known.
Secondly, even if the level of contribution of a solution’s ancestors could be identified, the ‘inherent parallelism’ displayed by genetic systems (i.e., the exploration of ‘unseen Boolean hyperplanes’) means that the system’s learning is not just a function of a chromosome’s ancestors. In fact, as the learning in genetic systems extends beyond the population, for a true measure of the ‘effort’ to learn, the extent of the contribution made by chromosomes not present in the population would also need to be identified (section 2.1) [13].
As an alternative to measuring the ‘training time’ of the successful controllers, the generation in which success was achieved is used as the gauge of ‘effort’ by the system to derive a solution. The statistical comparison of these generations, to identify any significant difference, is described in detail in section 3.3.
Specifications 5 and 6 above are intended to describe how ‘robust’ the solutions are in the performance of pole balancing and cart centring. However, the purpose of this research was not to identify the best possible solution to the cart-pole problem. Rather, it was to compare the performance of the selection methods used when genetic programming is applied to this difficult dynamic control problem. Once the genetic program had performed the task it was explicitly asked, through the fitness function, to perform, any additional abilities of the controllers were incidental and extraneous.
3.2.10 Raw Data Output of the Simulation
In order to conduct the comparison of selection methods, and to establish that the genetic programming paradigm can be successfully applied to the cart-pole problem, the following details were recorded for each of the runs:
Based on the data collected from the experiment outlined above (section 3.2.10), the purpose of the statistical analysis in this research project was to rate the relative performance of each selection method. Using the raw data recorded during the runs, a ‘learning curve’ could be analysed for each selection method.
The selection of a statistical technique to rate the performance of learning is problematic. According to Davis, “Repetitions of any design problem using any search suite method generally show variation in results owing to random effects. Statistical methods are necessary to permit sound conclusions on effectiveness in the presence of this statistical noise. We establish confidence in assessments of the relative effectiveness of these methods by analysing experimental results with the t test” [5, 119]. Unfortunately, however, the fitness of chromosomes is often scaled throughout the run, which has the effect of normalising the data. Fitness scaling was not applied in this experiment (section 3.2.3.1).
According to Koza, a “generally improving (but not necessarily monotonic) trend in average fitness is typical of genetic algorithms” (chart 12) [16, 69]. While the data from this experiment did follow a general trend, a standard parametric t test as suggested by Davis (above) was not appropriate owing to the distinctly non-normal distributions of the results (chart 7). In particular, a uniform transformation of the raw data across each run of a selection method was elusive because of the unique distributions of the data for each respective selection technique. The generational data, however, did show a natural normal distribution, which allowed the use of distribution analysis by confidence interval estimate as an alternative.
In the findings of Randall et al., a single, successful run displays the typical learning behaviour of the genetic algorithm [25]. This behaviour conforms with that found in this experiment, where the results show little improvement in the fitness of individual chromosomes until a sudden convergence of high fitness in the population occurs in the span of one or two generations. These findings display consistently low fitness for much of the learning, with some scatter, and uniformly high fitness thereafter (charts 17 to 19).
These results indicate that distinctly different possible outcomes, and therefore distributions of variances in the data, can be expected in any strategy involving the genetic algorithm. As the distribution of the data dictates the appropriate statistical technique for any comparison, it was not possible to select any one particular technique to be used. Rather, numerous tests were employed to measure various aspects of the system’s learning, and to identify the significance of any performance variance.
The analysis of learning concentrated on four aspects of learning: the rate of learning of successful controllers; the overall online performance of each selection technique; the overall offline performance of each selection technique; and the significance of any differences in overall performance as a result of the selection technique.
3.3.1 De Jong’s Performance Measures
In 1975, according to Goldberg, “De Jong completed his important and pivotal dissertation, ‘An Analysis of the Behaviour of a Class of Genetic Adaptive Systems’. De Jong’s study still stands as a milestone in the development of genetic algorithms ... To quantify the effectiveness of different genetic algorithms, De Jong devised two measures, one to gauge convergence and the other to gauge ongoing performance. He called these measures offline (convergence) and online (ongoing) performance respectively” [10, 107].
Although these comparison methods are not conventional statistical measures, the statement made above by Goldberg suggests that these measures are both well known and respected in the field of genetic algorithms. As noted in section 3.3, the trends and distributions of the raw data accumulated in this research parallel those found in genetic algorithms. This parallel was to be expected, as the basic algorithm of genetic programming is the same as that for genetic algorithms. For these reasons, De Jong’s performance measures have been applied in this research.
Online performance is a measure designed to determine a genetic system’s ability to perform well in online applications. The major requirement of online applications is that the adaptive system is expected to find a ‘satisfactory’ solution rapidly. When a solution is found, the system may (or may not) continue its search for superior solutions. The main characteristic of an online application is that learning is conducted ‘online’. In other words, the system is trained ‘on-the-job’.
De Jong’s definition of online performance is shown in equation 9 below.
Equation 9: De Jong's On-Line Performance
This equation states that the online performance of a strategy on a given environment is the average fitness of all trials up to the current trial inclusively.
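As described, the on-line measure is simply a running mean of every fitness evaluation so far. A minimal Python sketch (the function name and list-based interface are illustrative):

```python
def online_performance(fitnesses):
    """De Jong's on-line performance: at each trial t, the average of
    all fitness values observed up to and including trial t."""
    curve, total = [], 0.0
    for t, f in enumerate(fitnesses, start=1):
        total += f
        curve.append(total / t)   # running mean after t trials
    return curve
```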
Offline performance is a measure of convergence. It is intended to indicate the expected performance of a genetic system’s ability when applied in offline applications. As the name implies, in offline learning there is no expectation of the system to learn rapidly. Rather, the system conducts its learning until reaching some termination criterion, independent of the quality of found solutions (e.g. the number of generations). In an offline learning environment, the system is afforded the luxury of saving the best solutions found so far, so that they can be produced later upon satisfaction of its termination criterion.
The offline performance of a strategy is represented by the equation:
Equation 10: De Jong's Off-Line Performance
Thus, the offline performance of a strategy is a cumulative average of the fitness of the best individuals from that strategy up to and including the current trial.
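The off-line measure differs only in that it averages the best fitness found so far rather than every evaluation. A minimal Python sketch under the same illustrative interface as the on-line measure:

```python
def offline_performance(fitnesses):
    """De Jong's off-line performance: at each trial t, the average of
    the best-so-far fitness values over trials 1..t."""
    curve, best, total = [], float('-inf'), 0.0
    for t, f in enumerate(fitnesses, start=1):
        best = max(best, f)       # best individual seen so far
        total += best
        curve.append(total / t)
    return curve
```

Because the best-so-far sequence is monotonic, the off-line curve reflects convergence, while the on-line curve penalises time spent evaluating poor individuals.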
3.3.2 Distribution Analysis of Performance
Distribution analysis was applied to the observed average of the best performance and to the observed average of the mean performance in order to identify any significant difference in the results over the course of the runs. The analysis for both measures used a confidence interval estimate of 95 per cent.
The equation applied to determine the boundaries of the 95 per cent confidence intervals is shown below.
Equation 11: Confidence Interval Estimate for Performance Measures
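A sketch of a 95 per cent confidence interval estimate for a sample mean. Since equation 11 itself is not reproduced here, the use of the normal critical value z = 1.96 (rather than a t value) is an assumption made for the example:

```python
from math import sqrt

def confidence_interval(sample, z=1.96):
    """Two-sided confidence interval for the mean: mean +/- z * s / sqrt(n),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    half = z * sqrt(var / n)                               # half-width
    return mean - half, mean + half
```

Non-overlapping intervals for two selection methods would indicate a significant difference in their observed average performance at the 95 per cent level.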
The rate of learning of successful controllers required the use of an unusual statistical test owing to the small and unequal sample sizes. In order to test for significant differences, the generation in which the controller was found was tabulated for each selection technique, the common median was found, and ‘the extension of the median test’ was performed.
The hypotheses tested were:

H0: The rate of learning of the genetic program is independent of the selection technique, i.e., there is no significant difference in the learning rates.

H1: The rate of learning is not independent of the selection technique.

To afford confidence in the conclusions, the level of significance for this test was α = 0.05.
To determine the value of χ², the extension of the median test follows the standard formula for the χ² test for independent samples:
Equation 12: The Extension of the Median Test
In other words, the extension of the median test is the sum, over all cells, of the squared difference between the observed number of cases below (or above) the common median and the expected number of cases, divided by the expected number of cases in that cell. In light of the small sample size, and in consideration of the insensitivity of this test to sample variance, only an extreme departure from independence would have been identified. For this reason, an analysis of variance was also performed on the same data.
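The extension of the median test described above can be sketched in Python. The 2 × k contingency table of counts above and at-or-below the grand median is built from the tabulated success generations; the tie-handling convention (ties counted as at-or-below) is an assumption for the example:

```python
def median_test_chi2(groups):
    """Chi-squared statistic for the extension of the median test over
    k independent samples (e.g. success generations per selection method)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    median = (pooled[(n - 1) // 2] + pooled[n // 2]) / 2   # grand median
    above = [sum(1 for x in g if x > median) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]    # at or below
    total_above, total_below = sum(above), sum(below)
    chi2 = 0.0
    for a, b, g in zip(above, below, groups):
        for observed, row_total in ((a, total_above), (b, total_below)):
            # Expected count under independence: row total scaled by
            # the group's share of all observations.
            expected = row_total * len(g) / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2
```

Identical groups produce a statistic of zero, while groups lying entirely on opposite sides of the grand median maximise it, which matches the intuition that only strong separations register with this test.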