Results

This chapter contains the results obtained from the genetic programming runs. The raw data comprised the generational information for 500 runs for each of the selection strategies. Data recording the state changes of the simulation were also accumulated for those controllers that managed to balance the pole for the full 1,000 seconds.

The collected data allow the successful controllers to be analysed against the performance specifications mentioned in section 3.2.9.

The analysis of selection techniques was based on a comparison of their respective performance in on-line learning, off-line learning and the rate of learning.

A number of limitations have been identified that may have had an effect on the results that follow and on any repetition of this experiment. These limitations fall into two areas of concern: simulation errors and time constraints.

Computer simulation errors can be the result of poor logic (within the domain of the programmer’s control), or may result from the mathematical accuracy (rounding error) of the chosen programming language. While logic errors can be avoided by desk checking the algorithm by hand, rounding error can result in simulation noise that makes the reproduction of identical results unlikely. Efforts were made in the programming of this simulation to minimise the effect of this noise by converting fractional values to base data types; however, the simulation formula itself relies heavily upon floating-point arithmetic involving some extremely small values.
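A short illustration of the kind of rounding residue involved: in binary floating point, even a simple decimal value such as 0.1 has no exact representation, so repeated operations accumulate a small error.

```python
# Accumulated floating-point rounding error: summing 0.1 ten times
# does not yield exactly 1.0, because 0.1 has no exact binary form.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)       # False
print(abs(total - 1.0))   # a residue on the order of 1e-16
```

Errors of this magnitude are individually negligible, but over a million or more integration steps they can cause two otherwise identical simulations to diverge.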

As an example of the effect this type of rounding error can produce, in validating the accuracy of the simulation used in this experiment, known values were tested against the simulation. The known values were those given by Geva and Sitte [9].

The controller that was given has the form:

F = Fmax sgn[0.02*x* + 0.06*v* + 0.94*Ø* + 0.33*w*].

The results reported from this controller stated that the value of RMS *x* after 1,000 seconds should be 0.21m. The RMS amplitude of *x* that resulted from the same controller applied to the simulation used in this experiment for 1,000 seconds was 0.20086m.
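For reference, the validation controller can be sketched in Python. This is an illustrative transcription only, not the Appendix V implementation; the value of Fmax is an assumption (it is not restated in this excerpt).

```python
def sgn(a):
    """Sign function: +1 for positive, -1 for negative, 0 at zero."""
    return (a > 0) - (a < 0)

def control_force(x, v, theta, w, f_max=10.0):
    """Bang-bang controller of Geva and Sitte [9]:
    F = Fmax * sgn(0.02x + 0.06v + 0.94*theta + 0.33w).
    f_max = 10.0 N is an assumed placeholder, not a value from the thesis."""
    return f_max * sgn(0.02 * x + 0.06 * v + 0.94 * theta + 0.33 * w)
```

The controller always applies the full force Fmax, switching only its direction; the weighted sum acts as a switching surface over the four state variables.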

As the purpose of this experiment was to measure the performance of selection strategies, any noise resulting from the simulation would have been consistent for each of these strategies. Thus, any comparison would have been based on a uniform problem environment. Readers should keep this in mind when comparing these results against their own research should their programming environment differ from that used in this experiment. For this reason, the source code for the experiment has been supplied in Appendix V to this thesis should the reader wish to reproduce the conditions of this experiment.

The other limitation, that of time constraint, affects the quantity of data collected for analysis. Unfortunately, this analysis is based on a relatively small sample size, with respect to both the complexity of the problem and the effect of the initial random population on the final output of the genetic program. Although the experimental runs were made on a Sun SparcStation 4 and could be made concurrently, computational resources were limited, particularly in light of the duration of each run and the intensive processor requirements of both genetic programming and the cart-pole simulation. These factors made the accumulation of further samples impractical.

**4.3 Performance of Controllers**

This section is concerned with identifying the extent to which the controllers produced through this experiment satisfy the performance specifications (where applicable) outlined in section 3.2.9. The following sections display the performance of the best controllers found by each selection method.

*Table 6: The Best Controller found by Roulette Wheel Selection*

| Evolved Controller | (+ Ø, (% (+ w, (% (+ x, (% v, (+ Ø, 1.922))), 1.922)), 7.230)) |
| Minimised | Ø + (w + (x + v/(Ø + 1.922))/1.922)/7.23 |
| Raw Fitness | 50,009 |
| Occurred In Run | 401 |
| Occurred In Generation | 42 |
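In the evolved expressions, `%` denotes protected division. The convention assumed here is Koza’s, in which division by zero returns 1 so that every tree remains evaluable; a Python sketch transcribing the Table 6 tree and checking it against its minimised form:

```python
def pdiv(a, b):
    # Protected division ('%' in the evolved trees): assumed to follow
    # Koza's convention of returning 1.0 when the divisor is zero.
    return a / b if b != 0 else 1.0

def roulette_controller(x, v, theta, w):
    # Direct transcription of the Table 6 tree:
    # (+ theta, (% (+ w, (% (+ x, (% v, (+ theta, 1.922))), 1.922)), 7.230))
    return theta + pdiv(w + pdiv(x + pdiv(v, theta + 1.922), 1.922), 7.230)

# The minimised form agrees wherever no divisor is zero:
x, v, theta, w = 0.1, -0.2, 0.05, 0.3
minimised = theta + (w + (x + v / (theta + 1.922)) / 1.922) / 7.23
assert abs(roulette_controller(x, v, theta, w) - minimised) < 1e-12
```

The sign of the evaluated expression would then determine the direction of the applied force, as with the validation controller above.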

*Table 7: The Best Controller found by Tournament Selection*

| Evolved Controller | (+ w, (+ (+ (+ x, (* 9.905, (* 2.171, Ø))), (* w, 2.171)), Ø)) |
| Minimised | x + 22.503755Ø + 3.171w |
| Raw Fitness | 50,009 |
| Occurred In Run | 395 |
| Occurred In Generation | 22 |

*Table 8: The Best Controller found by Expected Value Model Selection*

| Evolved Controller | (+ Ø, (+ (+ v, (* Ø, (+ w, 7.206))), (* w, (* w, w)))) |
| Minimised | v + 8.206Ø + Øw + w³ |
| Raw Fitness | 50,004 |
| Occurred In Run | 294 |
| Occurred In Generation | 25 |

**4.3.1 Specification 1(a): Pole Angle of Controllers**

A plot of the pole angle for the controller for the first 20 seconds of the simulation shows the strategy adopted for balancing the pole. The plots for the best solutions derived by each respective selection technique are offered for inspection.

A plot of the state changes caused by the controller evolved through roulette wheel selection is displayed below.

*Chart 1: Roulette Wheel - Pole Angle V Time*

The next plot shows the state changes of the pole angle for the controller found through tournament selection.

*Chart 2: Tournament - Pole Angle V Time*

The plot below describes the behaviour of the pole angle that resulted from the best controller found through the application of the expected value model selection technique.

*Chart 3: Expected Value Model - Pole Angle V Time*

**4.3.2 Specification 1(b): Cart Position of Controllers**

The diagrams included in this section plot the position of the cart relative to the centre of the track for the first 20 seconds of the simulation. This complies with the second performance measure included in specification 1 at section 3.2.9.

The first chart, shown below, plots the behaviour of the cart (i.e., its position) when controlled by the solution evolved while the genetic program used the roulette wheel selection method.

*Chart 4: Roulette Wheel - Cart Position V Time*

The plot showing the cart position state changes for the best controller derived by tournament selection is displayed below.

*Chart 5: Tournament - Cart Position V Time*

The chart below represents the strategy used by the controller that was evolved through the expected value model selection technique.

*Chart 6: Expected Value Model - Cart Position V Time*

**4.3.3 Specifications 2 and 3: RMS Amplitudes**

The RMS amplitudes of both the pole angle and the cart position are measures intended to describe the extent of the controller’s ability to centre the cart and stabilise the system over the course of the simulation (i.e., 1,000 seconds). For both of the RMS amplitudes described below, the performance of the controller improves as the value approaches zero.
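The RMS amplitude of a recorded state variable can be sketched as follows (an illustrative computation only; the sampling interval used in the experiment is not restated here):

```python
import math

def rms_amplitude(samples):
    """Root-mean-square amplitude of a sampled state variable:
    the square root of the mean of the squared samples. Values
    near zero indicate the controller holds the system close to
    its set point over the whole simulation."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A symmetric oscillation of +/-0.2 m has an RMS amplitude of 0.2 m.
print(rms_amplitude([0.2, -0.2, 0.2, -0.2]))  # ≈ 0.2
```

Because the samples are squared, excursions to either side of centre contribute equally, which is why the measure suits both the cart position and the pole angle.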

*Table 9: RMS Amplitudes of Cart Position and Pole Angle*

| State Variable | Roulette Wheel | Tournament | Expected Value Model |
| *x* (metres) | 0.054220 | 0.074286 | 1.049271 |
| *Ø* (radians) | 0.000081 | 0.000092 | 0.000085 |

**4.4 Performance of Selection Techniques**

Although only three controllers were described in the previous sections, the genetic programming paradigm managed to evolve a total of 13 controllers that could balance the pole for the specified 1,000 seconds. As the details in section 4.3 show, each of the selection techniques tested managed to evolve such controllers within the parameters of the simulation.

The ‘learning curves’ for each of the runs that produced the best controllers described in section 4.3 are shown on chart 7 below.

*Chart 7: Log of the Observed Learning Curves of Best Controllers*

A log scale was used in the chart above to highlight the subtle fitness changes in the best of each generation shown. As mentioned earlier (section 3.3), the learning behaviour in the runs shown in chart 7 (of generally low to suddenly high fitness in a few generations) can be expected in genetic algorithms.

While the best fitness of individual runs seems to follow a sporadic trend in improved performance, the average fitness does display the overall improvement mentioned by Koza [16, 19]. This trend (shown in chart 8) is the foundation of De Jong’s on-line performance measure (section 4.4.1).

*Chart 8: Observed Mean Fitness of each Selection Technique*

A plot of the average of the best observed fitness within each generation has been included below (chart 9), as it is the overall relationship of these trends that forms the basis of De Jong’s off-line performance measure (section 4.4.2).

*Chart 9: Average of Best Observed Fitness of each Selection Technique*

The trend noted by Koza is not confined to the average fitness. A glance at the plot above indicates that a similar trend can also be expected in the overall average fitness of the best individuals in each generation. The data used to produce chart 9 were an average, at each generation, of the best performance within that generation over the course of 500 runs per selection method.

**4.4.1 On-Line Performance**

The measure of on-line performance was developed by De Jong as a means to gauge a genetic system’s on-going performance. In other words, the on-line performance of a strategy describes how well that strategy can be expected to perform in rapidly finding a satisfactory solution. A detailed description of on-line learning is included in section 3.3.2.1.

While many problems in genetic algorithms use a cost minimisation function to rate fitness, the cart-pole problem has a natural utility maximisation fitness function (time to failure). For this reason, improvement in the performance of each selection technique is reflected by a positive trend in the plotted line shown below (chart 10).

*Chart 10: On-Line Performance*

**4.4.2 Off-Line Performance**

De Jong’s off-line performance measure aims to identify the extent of a strategy’s convergence within its environment (the problem domain). The purpose of this measure is to describe, based on the characteristic of observed best fitness, a strategy’s suitability in learning for off-line applications. This measure is described in some detail in section 3.3.1.2.
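By way of illustration, one common formulation of the off-line measure tracks the best fitness observed so far and averages it over time (a sketch assuming maximisation, matching the time-to-failure fitness used here):

```python
def offline_performance(generation_bests):
    """De Jong's off-line performance at each generation: the running
    average of the best fitness observed so far (maximisation form)."""
    curve, best_so_far, total = [], float("-inf"), 0.0
    for t, best in enumerate(generation_bests, start=1):
        best_so_far = max(best_so_far, best)  # best never decreases
        total += best_so_far
        curve.append(total / t)
    return curve

print(offline_performance([3.0, 5.0, 10.0]))  # [3.0, 4.0, 6.0]
```

In contrast to the on-line measure, only the best individuals matter, so a strategy may score well off-line while spending most evaluations on exploration.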

The results of applying De Jong’s measure to the observed runs are shown in chart 11 below:

*Chart 11: Off-Line Performance*

**4.4.3
Distribution of the Observed Means**

As the basis of the on-line performance test is the mean of the samples for each strategy, a linear distribution test was performed on the average generation mean over all runs to identify any significant difference between selection techniques.

The thin lines in the following plots represent the 95 per cent confidence interval estimates for the observed average mean of the generations over the runs. The actual observed mean is shown by the thicker line. The first chart, chart 12 below, displays the observed means and respective confidence interval estimates from the data recorded from the runs using roulette wheel and tournament selections.

*Chart 12: Distribution of Means - Roulette Wheel and Tournament Selection*

Chart 13, below, shows the means and confidence interval estimates for roulette wheel and expected value model selection strategies.

*Chart 13: Distribution of Means - Roulette Wheel and Expected Value Model Selection*

The final relationship is that between the expected value model and the tournament selection strategies shown in chart 14.

*Chart 14: Distribution of Means - Expected Value Model and Tournament Selection*

**4.4.4 Distribution of the Observed Best**

The purpose of distribution analysis on the average observed best performance of strategies is to identify the significant difference, if any, between those strategies. In particular, the data used to form these curves has a direct relationship with the off-line performance measure (section 4.4.2).

The conventions applied in section 4.4.3, above, have been adhered to in the charts that follow. The first of these charts (chart 15) describes the observed results of best performance within each generation, with the 95 per cent confidence interval estimates for the roulette wheel and tournament selection strategies.

*Chart 15: Distribution of Best - Roulette Wheel and Tournament Selection*

The following chart, chart 16, displays the observed data with confidence interval estimates for roulette wheel and the expected value model selection strategies.

*Chart 16: Distribution of Best - Roulette and Expected Value Model Selection*

Chart 17 below is intended to aid in identifying any significant difference between the average of the best observed fitness of tournament and the expected value model selection strategies.

*Chart 17: Distribution of Best - Tournament and Expected Value Model Selection*

**4.4.5 Performance in Learning Rate**

For the purpose of gauging the rate of learning (or the speed with which a selection technique finds a solution), the successful runs were isolated from the balance of the raw data. Success was considered to be a controller’s ability to balance the pole for the specified 1,000 seconds.

In order to show the learning rates observed, the best individual in each generation was plotted, up to and including the final generation of that run. The plot of each selection strategy is represented on its own chart, and a log scale has been applied to the fitness values to more clearly identify the course of evolution.

Using the extension of the median test, the data displayed in the charts below formed the samples for the tests of independence and variance. In this way, any significant difference in the rate of learning (the time to success) of the three techniques might be identified.

4.4.5.1 Samples of Successful Controllers

The chart that follows, chart 18, comprises the observed learning curves of the six successful controllers evolved using roulette wheel selection.

*Chart 18: Successful Runs of Roulette Wheel Selection*

Chart 19, below, shows the same observations for the four controllers identified by the tournament selection strategy. Of note in this chart, in particular the line that follows the learning in run 42, is a clear indication of the complexity of this problem.

In generation eight of run 42, the chromosome that achieved the extreme fitness must have been awarded a high probability of selection. However, no selection of that chromosome included reproduction; instead, every selection of it involved crossover. Regardless of how often this crossover occurred, the change in the character of its progeny in the next generation was sufficient to result in a condition of severe devolution. This situation is a trait of those systems that do not employ an elitist strategy of guaranteed survival of the fittest.
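The guard that was absent here can be sketched as follows. This is an illustrative fragment only, not the Appendix V system: the names `next_generation` and `breed` are hypothetical, and the thesis’s runs deliberately did not use elitism.

```python
def next_generation(population, fitness, breed, elite_count=1):
    """Generational replacement with elitism: the elite_count fittest
    individuals survive unchanged into the next generation, so the
    best-so-far fitness can never fall between generations. `breed`
    stands in for the selection-and-crossover step (hypothetical name)."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_count]
    offspring = [breed(population) for _ in range(len(population) - elite_count)]
    return elites + offspring
```

With such a guard, the collapse seen after generation eight of run 42 could not occur: the extreme-fitness chromosome would have been copied forward even if every one of its selections for breeding produced inferior progeny.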

*Chart 19: Successful Runs of Tournament Selection*

The following chart displays the three successful controllers evolved through the use of the expected value model selection strategy.

*Chart 20: Successful Runs of Expected Value Model Selection*

4.4.5.2 The Extension of the Median Test

As is evident from charts 18 - 20, the samples available are both small and of varying size. While this is not an obstacle to the calculation of the test statistic, it does result in a very conservative estimate of independence. A more detailed description of this statistical test can be found in section 3.3.3.

Since the groups of selection techniques are independent of each other and the sample sizes for each group are non-uniform, a significance test for k independent samples is in order. Since the generation in which the selection technique managed to balance the pole constitutes at best an ordinal measure of the rate of learning, the extension of the median test is appropriate for testing the hypothesis concerning differences in central tendencies.

The hypothesis tested was:

H0 - The rate of learning of the genetic program is independent of the selection technique, i.e. there is no significant difference in the learning rates.

H1 - The rate of learning of the genetic program is not independent of the selection technique.

Level of Significance: Let α = 0.05, N = 12, *d* = 2

Reject H0 if χ² > 5.991; otherwise do not reject H0.

*Table 10: Tabulated data for the Extension of the Median Test*

| | Roulette | Tournament | Expected Value Model | Total |
| Generation of Observed Success | 24 | 8 | 10 | 42 |
| | 25 | 18 | 14 | 57 |
| | 36 | 19 | 26 | 81 |
| | 37 | 43 | | 80 |
| | 37 | | | 37 |
| Total | 159 | 88 | 50 | 297 |

Common Median = 24.5

| Observed | Roulette | Tournament | Expected Value Model | Total |
| > than common median | 4 | 1 | 1 | 6 |
| < than common median | 1 | 3 | 2 | 6 |
| Total | 5 | 4 | 3 | 12 |

| Expected | Roulette | Tournament | Expected Value Model | Total |
| > than common median | 2.5 | 2 | 1.5 | 6 |
| < than common median | 2.5 | 2 | 1.5 | 6 |
| Total | 5 | 4 | 3 | 12 |

| Difference | Roulette | Tournament | Expected Value Model | Total |
| > than common median | 1.5 | -1 | -0.5 | 0 |
| < than common median | -1.5 | 1 | 0.5 | 0 |
| Total | 0 | 0 | 0 | 0 |
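The χ² statistic follows directly from the observed and expected counts tabulated above; a short Python sketch of that computation:

```python
# Chi-square statistic for the extension of the median test, from the
# contingency counts above (rows: above/below the common median;
# columns: roulette, tournament, expected value model).
observed = [[4, 1, 1],
            [1, 3, 2]]
expected = [[2.5, 2.0, 1.5],
            [2.5, 2.0, 1.5]]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(round(chi2, 2))  # 3.13
```

Each cell contributes its squared deviation from expectation, scaled by the expected count; the small expected counts (below 5) are what make the test so conservative here.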

By the computation of the data described above, it was determined that χ² = 3.13. Since this is less than the critical value of 5.991 at the stated level of significance (α = 0.05), the decision must be not to reject the null hypothesis. Based on this data, the rate of learning of the genetic program is independent of the selection technique employed.

As the sample sizes are small, and the extension of the median test does not take the variance of the data into account, an analysis of variance was also conducted. While a more sensitive measure of independence, its result remains questionable because of the small sample size.

The table showing the results of this analysis is displayed below.

*Table 11: Analysis of Variance*

| Source | SS | df | MS | F | Significance |
| Method | 474.783 | 2 | 237.392 | 2.181 | 0.169 |
| Residual | 979.469 | 9 | 108.830 | | |
| Total | 1454.25 | 11 | | | |

The result of this analysis supports the findings of the extension of the median test (section 4.4.5.2) that there is no significant difference in the rate of learning of the selection methods.
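As a cross-check, the figures in table 11 can be recomputed from the generation-of-success samples of table 10 (an illustrative sketch; the grouping follows the tabulated data above):

```python
# One-way analysis of variance on the generation-of-success samples,
# reproducing the sums of squares and F statistic of Table 11.
groups = [[24, 25, 36, 37, 37],   # roulette wheel
          [8, 18, 19, 43],        # tournament
          [10, 14, 26]]           # expected value model

all_obs = [g for grp in groups for g in grp]
grand_mean = sum(all_obs) / len(all_obs)

# Total variation, and the part explained by the selection method.
ss_total = sum((g - grand_mean) ** 2 for g in all_obs)
ss_method = sum(len(grp) * (sum(grp) / len(grp) - grand_mean) ** 2
                for grp in groups)
ss_residual = ss_total - ss_method

df_method = len(groups) - 1               # 2
df_residual = len(all_obs) - len(groups)  # 9
f_stat = (ss_method / df_method) / (ss_residual / df_residual)

print(round(ss_method, 3), round(f_stat, 3))  # 474.783 2.181
```

The recomputed sums of squares and F value agree with table 11, and F = 2.181 falls well short of the level needed for significance at α = 0.05, consistent with the conclusion above.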