Research Methodology

3.1 Outline

Numerous research efforts have been made to establish the relative suitability of various selection techniques that can be applied in genetic programming. A major problem in any attempt at ranking performance has been the apparent bias of different operators to different tasks ([5], [6]). Although the genetic programming paradigm has been available since 1990, there is still some debate in this area [16].

The purpose of this research was to evaluate the performance of the most commonly employed selection methods when applied to a difficult, unstable, non-linear control problem: the cart-pole problem with difficult starting conditions. After quantifying the performance of these methods, a comparison of relative utility was conducted to identify any significant variation in their suitability to this problem domain.

The remainder of this chapter describes the experimental and analytical methods that were adopted for this research. Where it was considered necessary, the reasoning behind the choices made is also included.

3.2 Experimental Method

The main factors under investigation were the selection techniques. The selection strategies that were tested against the cart-pole problem are noted in the following sections. The genetic operators and their parameters, excluding selection operators, remained constant throughout the experiment. By convention, the main features of the program are represented in the form of a tableau (table 2). Additional parameters and further explanations are covered in section 3.2.1.

In the standard genetic programming tableau, if automatic function definition is to be used, a small section of the tableau is dedicated to the parameters that are required for this technique (such as separate function and terminal sets). Much of the reason for this inclusion is to allow the reader to identify the extent of additional domain knowledge the author has afforded the genetic program.

In the design of this experiment, the decision was made that the genetic programming paradigm should be applied to the cart-pole problem without automatic function definition. A minimum of domain knowledge can then be supplied to the system. Thus, not only is the difficulty of the problem maintained, but the performance of the system can also be attributed to the selection technique used in each run.

Table 2: Genetic Program Tableau

Objective: To evolve a solution to the cart-pole problem that will balance the pole for a period of 1,000 s without exceeding the limits outlined by Geva and Sitte [25]. The solution should also centre the cart.
Terminal Set: The state variables from the simulation and randomly generated floating point constants.
Function Set: Addition (+), Subtraction (-), Multiplication (*), a modified Division operator (%), and an operator to Reverse the sign of its argument (ANTI).
Raw Fitness: The number of time steps the solution managed to conform to the parameters of the simulation (i.e., time to failure / 0.02). An additional bonus is awarded for centring the cart (maximum of 10 raw fitness points).
Wrapper: A Spinglass 'if' function that returns 1 if its argument is greater than zero, otherwise it returns -1.
Parameters: Population = 4,000; Total Generations = 51.
Termination Criterion: Terminates only upon completion of all generations.

3.2.1 Parameter Settings

There are no formally recommended criteria for the selection of an optimal range of parameters in either genetic algorithms or genetic programming. According to Koza, “The problem of optimal parameter selection is an interesting and important issue affecting both string-based algorithms and hierarchical genetic algorithms (as well as other machine learning paradigms). The problem of optimal parameter selection is deserving of future work” [16, 15]. At present, the selection of these parameters is a subjective judgement made by the writer of the genetic program/algorithm.

The parameters for the genetic operators that were used throughout this investigation have been included in table 3.
 

Table 3: Program Parameters
Parameter Value
Population Size   4,000
Number of Generations  51
Runs per Implementation 50
Crossover Rate 90%
Fitness Proportionate Reproduction 10%
Mutation Rate 0%
Maximum Length after Crossover  15 genes
Minimum Length in Generation Zero 3 genes
Number of Trials 10 per Selection Method
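
For illustration, the parameters of table 3 might be collected into a single configuration structure, as in the following C++ sketch; the structure and its names are illustrative only and do not come from the implementation described in section 3.2.8.

    // Illustrative only: one way to hold the table 3 parameters in C++.
    // These names do not appear in the original implementation.
    struct RunParameters {
        int    populationSize          = 4000;  // individuals per generation
        int    numberOfGenerations     = 51;    // total generations per run
        int    runsPerImplementation   = 50;    // runs for each selection method
        double crossoverRate           = 0.90;  // offspring created by crossover
        double reproductionRate        = 0.10;  // fitness proportionate reproduction
        double mutationRate            = 0.0;   // mutation disabled
        int    maxLengthAfterCrossover = 15;    // genes
        int    minLengthGenerationZero = 3;     // genes
        int    trialsPerSelectionMethod = 10;
    };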

Koza identifies the two major parameters in the algorithm as population size and the number of generations [17]. For this experiment, the complexity of the problem suggests a large population size for two main reasons. First, a large population is necessary to maintain a store of genetic material capable of solving the problem over the course of evolution. Secondly, larger populations are less likely to suffer premature convergence on an unsatisfactory solution [17].

Although the relatively large population size that was used for the runs may seem inconsistent with the findings of De Jong (mentioned below), populations of this size are not inconsistent with much of the research already conducted in genetic programming ([12], [17], [18], [28]). In addition, a large population size will eliminate the need for fitness scaling of the individuals within that population. The reason for scaling is to reduce the effect of fitness outliers and so restrict convergence. Convergence is mostly an issue in smaller populations, because as population size increases so too does the natural diversity of the gene pool.

The crossover and mutation rates selected were based on the findings of De Jong. As stated by Goldberg, “in De Jong’s (1975) study of genetic algorithms in function optimization, a series of parametric studies across a five-function suite of problems suggested that good GA performance requires the choice of a high crossover probability, a low mutation probability (inversely proportional to the population size), and a moderate population size” [10, 71].

To account for the effect of initially random populations, to allow each selection method a sufficient opportunity for success, and to accumulate adequate data for a distribution of the 'learning curves', each of the selection methods was given ten trials of 50 runs.

At the beginning of each trial, prior to the generation of the initial random population (generation zero), the random number generator was seeded with a new value and over 1,000 constants were randomly generated. These randomly generated constants were tabulated and made available to the system, in the generation of the initial random population, for the duration of each trial. In this way, over the course of the 10 trials, the system had access to over 10,000 floating point constants to generate a solution.

3.2.2 Populating Generation Zero

The main criteria for generating the initial population are as follows:


Part of the reason for using Park and Miller's random number generator was that, unlike the trials conducted by Koza, there was no guarantee of uniqueness among the individuals in the initial population [17]. Another reason for this implementation was to allow the replication of results. Because random number generators vary greatly from compiler to compiler, the source code for this pseudo-random number generator has been provided at Appendix V.
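
Park and Miller's 'minimal standard' generator is well documented, and the sketch below shows the standard algorithm (multiplier 16807, modulus 2^31 - 1, computed with Schrage's factorisation) together with one way the per-trial table of constants in the range -10.0 to 10.0 described in section 3.2.1 might be built. The table-building code is illustrative and is not the listing given in Appendix V.

    #include <vector>

    // Park and Miller's 'minimal standard' pseudo-random number generator:
    // seed(n+1) = 16807 * seed(n) mod (2^31 - 1), computed with Schrage's
    // factorisation so the product never overflows a 32-bit signed integer.
    class ParkMiller {
    public:
        explicit ParkMiller(long seed) : seed_(seed) {}

        double next() {                       // uniform value in (0, 1)
            const long a = 16807, m = 2147483647, q = 127773, r = 2836;
            long hi = seed_ / q;
            long lo = seed_ % q;
            seed_ = a * lo - r * hi;
            if (seed_ <= 0) seed_ += m;
            return static_cast<double>(seed_) / m;
        }
    private:
        long seed_;
    };

    // Illustrative only: build the per-trial table of random floating point
    // constants in the range -10.0 to 10.0 (section 3.2.1).
    std::vector<double> buildConstantTable(long seed, int count = 1000) {
        ParkMiller rng(seed);
        std::vector<double> table(count);
        for (double& c : table)
            c = -10.0 + 20.0 * rng.next();
        return table;
    }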
 

3.2.3 Devising a Fitness Function

A fitness function to balance the pole for a given period can be defined as a function of the period for which both the cart position and the pole angle remain within predefined limits. This function would be of the form:

Equation 4: Standard Fitness Function

    fitness = t_f / Δt

where t_f is the time to failure (the period for which both the cart position and the pole angle remain within their predefined limits) and Δt is the discrete time step of 0.02 s.

A surprising situation arose from the research conducted by Randall et al. [25]. The fitness function used by the genetic algorithm in that case was similar to that noted above. Interestingly, although the controller was not explicitly told to centre the cart, the results of that research indicate that the controller did so anyway. In that research, therefore, the centring of the cart was extraneous to the fitness of the chromosomes, and this emergent behaviour is deserving of further investigation.

The fitness function for the genetic program should explicitly require the cart to be centred. This condition can be achieved by including a nominal bonus for the cart’s proximity to the centre of the track.

The fitness function, adapted to centre the cart, is as follows:

Equation 5: Explicit Centring in Fitness Function

    fitness = t_f / Δt + b(x)

where b(x) is a nominal bonus of at most 10 raw fitness points that increases with the cart's proximity to the centre of the track.
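
As a concrete illustration, the fitness evaluation might take the following form in C++. The interfaces for the evolved controller and the simulation step are hypothetical, the exact shape of the centring bonus is an assumption (the tableau specifies only a nominal bonus of up to 10 raw fitness points), and it is assumed that the wrapper's output of 1 or -1 selects the direction of the 10 N force.

    #include <cmath>
    #include <functional>

    struct State { double x, v, theta, omega; };

    // Wrapper: map the program's numeric output to a bang-bang force of +/-10 N.
    double wrapper(double programOutput) {
        return programOutput > 0.0 ? 10.0 : -10.0;
    }

    // Raw fitness: the number of 0.02 s time steps survived within the limits,
    // plus a centring bonus of at most 10 points (the linear form is an
    // assumption). 'controller' evaluates the evolved expression tree and
    // 'step' advances the simulation by one time step; both are supplied by
    // the rest of the system.
    double rawFitness(const std::function<double(const State&)>& controller,
                      const std::function<State(const State&, double)>& step) {
        const int    maxSteps   = 50000;   // 1,000 s / 0.02 s
        const double xLimit     = 1.2;     // half of the 2.4 m track
        const double thetaLimit = 0.21;    // rad
        State s{1.0, 1.0, 0.17, 0.18};     // initial conditions from table 5

        int steps = 0;
        while (steps < maxSteps &&
               std::fabs(s.x) < xLimit && std::fabs(s.theta) < thetaLimit) {
            s = step(s, wrapper(controller(s)));
            ++steps;
        }
        return steps + 10.0 * (1.0 - std::fabs(s.x) / xLimit);  // assumed bonus form
    }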

3.2.3.1 Fitness Scaling

There are, presently, numerous approaches that can be used to scale the fitness of the population. The purpose of each of these measures is the same: to reduce the effect of fitness outliers in the population. In other words, scaling is intended to avoid the domination of a population by some 'super' individual in the span of a few generations [10]. This kind of domination is analogous to premature convergence.

Alternatively, the process of scaling is intended “to maintain selective pressure as the overall performance rises within the population” [5, 195]. In this way, the scaling mechanism is used to distinguish the performance of chromosomes when improvement among the population is minute.

For the purpose of this experiment, since part of the investigation is based on convergence, fitness scaling has not been applied. In this way, the effect of outliers upon the runs was determined solely by the selection technique being used.

3.2.4 Devising a Function Set and Terminal Set

The function set and terminal set have a major effect upon the potential performance of the genetic program. To devise the function set and terminal set, the programmer must have a reasonable understanding of the problem area to ensure that all the necessary operators, variables, and range of numerical values are available in the ‘gene pool’. On the other hand, extraneous members in the function and terminal sets will only reduce the efficiency of the program’s search.

3.2.4.1 The Function Set

The function set (FSET) represents the operators that are available in the gene pool. The following operators have been selected for the cart-pole problem:

FSET = { +, -, *, %, ANTI}
where:
+ = addition operator.
- = subtraction operator.
* = multiplication operator.
% = a modified (safe) division operator.
ANTI = a function to reverse the sign of its argument.
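
The behaviour of the modified division operator on a zero divisor is not specified in the text; the sketch below assumes the common genetic programming convention of returning 1 in that case, and implements ANTI as simple negation.

    // Function set primitives for the cart-pole problem. The behaviour of '%'
    // on a zero divisor is assumed to follow the common genetic programming
    // convention of returning 1.0.
    double gpAdd(double a, double b)  { return a + b; }
    double gpSub(double a, double b)  { return a - b; }
    double gpMul(double a, double b)  { return a * b; }
    double gpDiv(double a, double b)  { return b == 0.0 ? 1.0 : a / b; }  // protected division (%)
    double gpAnti(double a)           { return -a; }                      // reverse the sign (ANTI)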

3.2.4.2 The Terminal Set

The terminal set is a set of numerical values or operators without arguments. As with the function set, the terminals in the set must be adequate for the system to find a solution. The terminals identified in the cart-pole problem are the simulation state variables (x, v, θ, ω), and the randomly generated floating point constants (RANDFLOAT) within the range of -10.0 to 10.0. This terminal set (TSET) is (conventionally) represented as:

TSET = {x, v, θ, ω, RANDFLOAT}

3.2.5 Criterion for Termination

The criterion for termination was the specified maximum number of generations. The main reasons to terminate a genetic programming run prior to the allowed generations are to reduce processor cost and to develop a solution in the shortest period of time. The benefits of early termination for these reasons are machine and language dependent, so were not of interest in this baseline research.

Early termination assumes that the optimal control strategy is known and the genetic program’s fitness function reflects a performance rating of that optimal strategy. Alternatively, a nominal ‘satisfactory’ fitness could be defined as the termination criterion.

Additional termination criteria would have been likely to bias the data produced by the experiment towards those techniques (or even runs) that performed well at on-line learning. However, off-line learning was also to be investigated from the results. Hence, a predefined generation span, as the sole termination criterion, was selected as an equitable parameter for the genetic program in the investigation of both on-line and off-line performance.

3.2.6 Selection of Genetic Operators

The genetic operators that were investigated in this research are the selection methods. To maintain consistency for the comparison of these methods, all additional operators will be used with the same parameters as those shown in table 3. The parameters in table 3 represent the most fundamental implementation of the genetic programming paradigm.

The main selection methods under investigation are:

3.2.7 Simulation of the Cart-Pole Problem

The simulation of the Cart-Pole problem follows the dynamic equation [9, 42]:

Equation 6: Simulation of The Cart-Pole Problem

The proportional mass of the pole to the system ( µp ) and acceleration (a) are given, respectively, by:

where:

From equation 6 above, the position and velocity of the cart, respectively, at any time are given by Euler’s approximate integration:

Equation 7: Cart Position

    x(t + Δt) = x(t) + Δt · v(t)
    v(t + Δt) = v(t) + Δt · a(t)

and the angular position and angular velocity, respectively, at any time are given by:

Equation 8: Pole Position

    θ(t + Δt) = θ(t) + Δt · ω(t)
    ω(t + Δt) = ω(t) + Δt · α(t)

where α(t) is the angular acceleration of the pole obtained from equation 6.

The parameters for the simulation will follow those used by Geva and Sitte described as “the most often used version of the pole balancing problem” [7, 40].

Table 4: Standard Parameters for Cart-Pole Simulation

Parameter               Value
Track Length (l)        2.4 m
Cart Mass (mc)          1.0 kg
Pole Mass (mp)          0.1 kg
Pole Length (2λ)        1 m
Gravity                 9.8 m/s²
Force (F)               10 N
Failure Angles          0.21 rad
Discrete Time Step      0.02 s
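
For reference, a single Euler-integration time step of the simulation might be implemented as follows. The angular and linear accelerations in this sketch use the widely cited cart-pole dynamics formulation (as in Barto, Sutton and Anderson); the exact algebraic form of equation 6 in the source cited at [9] may differ in detail, so this is a sketch rather than a transcription of the implementation.

    #include <cmath>

    struct State { double x, v, theta, omega; };

    // One Euler step of the cart-pole simulation using the parameters of
    // table 4. The dynamics follow the commonly used formulation of the
    // problem and may differ in detail from equation 6 in the cited source.
    State stepSimulation(const State& s, double force) {
        const double g  = 9.8;    // gravity, m/s^2
        const double mc = 1.0;    // cart mass, kg
        const double mp = 0.1;    // pole mass, kg
        const double l  = 0.5;    // pole half-length, m (full length 1 m)
        const double dt = 0.02;   // discrete time step, s

        double totalMass = mc + mp;
        double sinT = std::sin(s.theta), cosT = std::cos(s.theta);
        double temp = (force + mp * l * s.omega * s.omega * sinT) / totalMass;
        double thetaAcc = (g * sinT - cosT * temp) /
                          (l * (4.0 / 3.0 - mp * cosT * cosT / totalMass));
        double xAcc = temp - mp * l * thetaAcc * cosT / totalMass;

        // Euler's approximate integration (equations 7 and 8).
        return State{ s.x + dt * s.v,
                      s.v + dt * xAcc,
                      s.theta + dt * s.omega,
                      s.omega + dt * thetaAcc };
    }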

The initial starting positions for the simulation (table 5) are those identified by Geva and Sitte as being of appropriate difficulty to minimise the effect of random solutions (table 1) and so reduce the probability of deriving a solution based on random search of the problem space [7]:

Table 5: Initial Conditions for Simulation

Parameter               Value
Cart Position (x)       1 m
Cart Velocity (v)       1 m/s
Pole Angle (θ)          0.17 rad
Angular Velocity (ω)    0.18 rad/s
Balance Time            1,000 s

3.2.8 Implementation Language

The final implementation of this simulation is written in C++ and the simulation runs on a Sun SparcStation/4. The main reasons C++ was chosen as the implementation language are:
 


3.2.9 Performance Specifications

The simulation uses the parameters recommended by Geva and Sitte to ensure both a minimal random effect and a suitable complexity of the task.

In order to describe the performance of the controller, Geva and Sitte suggest the following specifications:
 


For the purpose of this experiment, the performance specifications 1, 2, and 3 are adequate to describe the performance of the controllers: specifications 4, 5, and 6 were considered either inappropriate or inapplicable for this experiment.

To comply with specification 4 it may, at face value, seem reasonable to sum the ‘time to failure’ of the best individuals of each generation prior to the successful generation to measure the total training time of the system. There are, however, two main characteristics of genetic systems that would make this measurement inaccurate.

First, because genetic systems explore numerous points (up to population size) in the problem state space, the extent to which the best chromosomes of prior generations actually contributed to the ‘hill-climb’ of the final solution is not known.

Secondly, even if the level of contribution of a solution’s ancestors could be identified, the ‘inherent parallelism’ displayed by genetic systems (i.e., the exploration of ‘unseen Boolean hyperplanes’) means that the system’s learning is not just a function of a chromosome’s ancestors. In fact, as the learning in genetic systems extends beyond the population, for a true measure of the ‘effort’ to learn, the extent of the contribution made by chromosomes not present in the population would also need to be identified (section 2.1) [13].

As an alternative to measuring the ‘training time’ of the successful controllers, the generation in which success was achieved is used as the gauge of ‘effort’ by the system to derive a solution. The statistical comparison of these generations, to identify any significant difference, is described in detail in section 3.3.

Specifications 5 and 6 above are intended to describe how 'robust' the solutions are in the performance of pole balancing and cart centring. However, the purpose of this research was not to identify the best possible solution to the cart-pole problem. Rather, the purpose of this research was to compare the performance of the selection methods used when genetic programming is applied to this difficult dynamic control problem. Once the genetic program has performed the task it was explicitly asked to perform through the fitness function, any additional abilities of the controllers are incidental and extraneous to this comparison.

3.2.10 Raw Data Output of the Simulation

In order to conduct the comparison of selection methods and to establish that the genetic programming paradigm can be successfully applied to the cart-pole problem, the following details were recorded for each of the runs:

3.3 Statistical Techniques

Based on the data collected from the experiment outlined above (section 3.2.10), the purpose of the statistical analysis in this research project was to rate the relative performance of each selection method. Using the raw data recorded during the runs, a ‘learning curve’ could be analysed for each selection method.

The selection of a statistical technique to rate the performance of learning is problematic. According to Davis, “Repetitions of any design problem using any search suite method generally show variation in results owing to random effects. Statistical methods are necessary to permit sound conclusions on effectiveness in the presence of this statistical noise. We establish confidence in assessments of the relative effectiveness of these methods by analysing experimental results with the t test” [5, 119]. Unfortunately, however, the fitness of chromosomes is often scaled throughout the run, which has the effect of normalising the data. Fitness scaling was not applied in this experiment (section 3.2.3.1).

According to Koza, a “generally improving (but not necessarily monotonic) trend in average fitness is typical of genetic algorithms” (chart 12) [16, 69]. While the data from this experiment did follow a general trend, a standard parametric t test as suggested by Davis (above) was not appropriate owing to the distinctly non-normal distributions of the results (chart 7). In particular, a uniform transformation of the raw data across each run of a selection method was elusive because of the unique distributions of the data for each respective selection technique. The generational data, however, did show a natural normal distribution, which allowed the use of distribution analysis by confidence interval estimate as an alternative.

The findings of Randall et al. show the typical learning behaviour of the genetic algorithm on a single, successful run [25]. This behaviour conforms to that found in this experiment, where the results show little improvement in the fitness of individual chromosomes until a sudden convergence of high fitness in the population occurs in the span of one or two generations. These findings display consistently low fitness for much of the learning, with some scatter, and uniform high fitness thereafter (charts 17 - 19).

These results indicate that distinctly different possible outcomes, and therefore distributions of variances in the data, can be expected in any strategy involving the genetic algorithm. As the distribution of the data dictates the appropriate statistical technique for any comparison, it was not possible to select any one particular technique to be used. Rather, numerous tests were employed to measure various aspects of the system’s learning, and to identify the significance of any performance variance.

The analysis concentrated on four aspects of learning: the rate of learning of successful controllers; the overall on-line performance of each selection technique; the overall off-line performance of each selection technique; and the significance of any differences in overall performance as a result of the selection technique.
 
 

3.3.1 De Jong’s Performance Measures

In 1975, according to Goldberg, “De Jong completed his important and pivotal dissertation, ‘An Analysis of the Behaviour of a Class of Genetic Adaptive Systems’. De Jong’s study still stands as a milestone in the development of genetic algorithms ... To quantify the effectiveness of different genetic algorithms, De Jong devised two measures, one to gauge convergence and the other to gauge on-going performance. He called these measures off-line (convergence) and on-line (on-going) performance respectively” [10, 107].

Although these comparison methods are not conventional statistical measures, the statement made above by Goldberg suggests that these measures are both well known and respected in the field of genetic algorithms. As noted in section 3.3, the trends and distributions of the raw data accumulated in this research parallel those found in genetic algorithms. This parallel was to be expected, as the basic algorithm of genetic programming is the same as that for genetic algorithms. For these reasons, De Jong’s performance measures have been applied in this research.

3.3.1.1 On-Line Performance

On-line performance is a measure designed to determine a genetic system’s ability to perform well in on-line applications. The major requirement of on-line applications is that the adaptive system is expected to find a ‘satisfactory’ solution rapidly. When a solution is found, the system may (or may not) continue its search for superior solutions. The main characteristic of an on-line application is that learning is conducted ‘on-line’. In other words, the system is trained ‘on-the-job’.

De Jong’s definition of on-line performance is shown in the equation below.

Equation 9: De Jong's On-Line Performance

    x_e(s) = (1/T) Σ_{t=1..T} f_e(t)

where f_e(t) is the fitness recorded at trial t of strategy s in environment e, and T is the current trial.

This equation states that the on-line performance of a strategy on a given environment is the average fitness of all trials up to the current trial inclusively.

3.3.1.2 Off-Line Performance

Off-line performance is a measure of convergence. It is intended to indicate the expected performance of a genetic system’s ability when applied in off-line applications. As the name implies, in off-line learning there is no expectation of the system to learn rapidly. Rather, the system conducts its learning until reaching some termination criterion, independent of the quality of found solutions (e.g. the number of generations). In an off-line learning environment, the system is afforded the luxury of saving the best solutions found so far, so that they can be produced later upon satisfaction of its termination criterion.

The off-line performance of a strategy is represented by the equation:

Equation 10: De Jong's Off-Line Performance

    x*_e(s) = (1/T) Σ_{t=1..T} f*_e(t)

where f*_e(t) is the best fitness achieved by strategy s in environment e up to and including trial t.

Thus, the off-line performance of a strategy is a cumulative average of the fitness of the best individuals from that strategy up to and including the current trial.
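
Both measures reduce to running averages over the fitness values recorded so far. The following C++ sketch shows the bookkeeping, treating each recorded fitness value as one trial; the class is illustrative and not part of the original implementation.

    #include <algorithm>
    #include <vector>

    // De Jong's performance measures over a sequence of fitness values.
    // onLine():  average of every fitness value recorded so far (equation 9).
    // offLine(): average of the best-so-far fitness at each step (equation 10).
    class DeJongMeasures {
    public:
        void record(double fitness) {
            best_ = history_.empty() ? fitness : std::max(best_, fitness);
            history_.push_back(fitness);
            bestHistory_.push_back(best_);
        }
        double onLine() const  { return average(history_); }
        double offLine() const { return average(bestHistory_); }
    private:
        static double average(const std::vector<double>& v) {
            double sum = 0.0;
            for (double f : v) sum += f;
            return v.empty() ? 0.0 : sum / v.size();
        }
        std::vector<double> history_, bestHistory_;
        double best_ = 0.0;
    };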

3.3.2 Distribution Analysis of Performance

Distribution analysis was applied to the observed average of the best performance and to the observed average of the mean performance in order to identify any significant difference in the results over the course of the runs. The analysis for both measures used a confidence interval estimate of 95 per cent.

The equation applied to determine the boundaries of the 95 per cent confidence intervals is shown below.

Equation 11: Confidence Interval Estimate for Performance Measures

    x̄ ± 1.96 (s / √n)

where x̄ is the observed mean of the performance measure, s is the sample standard deviation, n is the number of observations, and 1.96 is the critical value of the standard normal distribution for a 95 per cent confidence interval.
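
The interval reduces to the sample mean plus or minus a critical value times the standard error. The sketch below assumes the large-sample (normal) critical value of 1.96 for a 95 per cent interval and a sample of at least two observations; a t critical value could be substituted for small samples.

    #include <cmath>
    #include <vector>

    // 95 per cent confidence interval for the mean of a sample of performance
    // values, using the large-sample critical value z = 1.96 (an assumption).
    struct Interval { double lower, upper; };

    Interval confidenceInterval95(const std::vector<double>& sample) {
        double n = static_cast<double>(sample.size());   // assumes n > 1
        double mean = 0.0;
        for (double x : sample) mean += x;
        mean /= n;

        double variance = 0.0;
        for (double x : sample) variance += (x - mean) * (x - mean);
        variance /= (n - 1.0);                           // sample variance

        double halfWidth = 1.96 * std::sqrt(variance / n);  // z * standard error
        return Interval{ mean - halfWidth, mean + halfWidth };
    }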

3.3.3 The Rate of Learning

The rate of learning of successful controllers required the use of an unusual statistical test owing to the small and unequal sample sizes. In order to test for significant differences, the generation in which the controller was found was tabulated for each selection technique, the common median was found, and 'the extension of the median test' was performed.

The Hypothesis tested was:

H0 - The rate of learning of the genetic program is independent of the selection technique, i.e., there is no significant difference in the learning rates.

H1 - The rate of learning is not independent of the selection technique.

To afford confidence in the conclusions, the level of significance for this test was α = 0.05.

To determine the value of χ², the extension of the median test follows the standard formula for the χ² test for independent samples:

Equation 12: The Extension of the Median Test

    χ² = Σ_i Σ_j (O_ij - E_ij)² / E_ij

where O_ij is the observed number of cases in cell (i, j) of the table of counts above and below the common median, and E_ij is the expected number of cases in that cell.

In other words, the extension of the median test is the sum, over all cells, of the squared difference between the observed number of cases below (or above) the common median and the expected number of cases, divided by the expected number of cases at that cell.

In light of the small sample size, and in consideration of the insensitivity of this test to sample variance, only extreme independence would have been identified. For this reason, an analysis of variance was also performed on the same data.
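
The statistic is therefore the ordinary chi-square computed over a 2 x k table of counts above and below the common median. A minimal C++ sketch follows; the grouping of the success generations by selection method is assumed to be supplied by the caller, and every expected count is assumed to be non-zero.

    #include <algorithm>
    #include <vector>

    // Extension of the median test: pool all observations, find the common
    // median, count how many observations in each group fall above and below
    // it, and compute the chi-square statistic over the resulting 2 x k table.
    double extensionOfMedianTest(const std::vector<std::vector<double>>& groups) {
        std::vector<double> pooled;
        for (const auto& g : groups) pooled.insert(pooled.end(), g.begin(), g.end());
        std::sort(pooled.begin(), pooled.end());
        double median = pooled[pooled.size() / 2];   // common median (simple form)

        std::vector<double> above(groups.size()), below(groups.size());
        double totalAbove = 0.0, total = pooled.size();
        for (size_t k = 0; k < groups.size(); ++k) {
            for (double v : groups[k]) (v > median ? above[k] : below[k]) += 1.0;
            totalAbove += above[k];
        }

        double chiSquare = 0.0;
        for (size_t k = 0; k < groups.size(); ++k) {
            double nk = above[k] + below[k];
            double expAbove = nk * totalAbove / total;           // expected counts
            double expBelow = nk * (total - totalAbove) / total; // (assumed non-zero)
            chiSquare += (above[k] - expAbove) * (above[k] - expAbove) / expAbove
                       + (below[k] - expBelow) * (below[k] - expBelow) / expBelow;
        }
        return chiSquare;
    }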
 

