Results

4.1 Introduction

This Chapter contains the results obtained from the genetic programming runs. The raw data obtained from the runs comprised the generational information for 500 runs for each of the selection strategies. The data recording the state changes of the simulation were also accumulated for those controllers that managed to balance the pole for the 1,000 seconds.

Based on the collected data, the simulation results allow for an analysis of the successful controllers with regard to their performance, i.e., those mentioned in section 3.2.9.

The analysis of selection techniques was based on a comparison of their respective performance in on-line learning, off-line learning and the rate of learning.

4.2 Limitations

A number of limitations have been identified that may have had an effect on results that follow and the repetition of this experiment. These limitations may be divided into two areas of concern: simulation errors; and time constraints.

Computer simulation errors can be the result of poor logic (within the domain of the programmer’s control), or may result from the mathematical accuracy (the rounding error) of the chosen programming language. While logic errors can be avoided by desk checking the algorithm (by hand), rounding error can result in simulation noise that makes the reproduction of identical results unlikely. Efforts were made in the programming of this simulation to minimise the effect of this noise by converting fractional values to base data types; however, the simulation formula itself is intensely reliant upon floating point arithmetic with some extremely small values.

As an example of the effect this type of rounding error can produce, in validating the accuracy of the simulation used in this experiment, known values were tested against the simulation. The known values where those given by Geva and Sitte [9].

The controller that was given has the form:

F = Fmax sgn[ 0.02 x + 0.06v + 0.94Ø + 0.33w].

The results reported from this controller stated that the value of RMS x after 1,000 seconds should be 0.21m. The RMS amplitude of x that resulted from the same controller applied to the simulation used in this experiment for 1,000 seconds was 0.20086m.

As the purpose of this experiment was to measure the performance of selection strategies, any noise resulting from the simulation would have been consistent for each of these strategies. Thus, any comparison would have been based on a uniform problem environment. Readers should keep this in mind if comparing the results found from their own research should their programming environment be inconsistent with that used in this experiment. For this reason, the source code for the experiment has been supplied at Appendix V to this thesis should the reader wish to reproduce the conditions of this experiment.

The other limitation, that of time constraint, affects the quantity of data collected for analysis. Unfortunately, this analysis is based on a relatively small sample size, with respect to both the complexity of the problem and the effect of the initial random population on the final output of the genetic program. Although the experimental runs were made on a Sun/SparcStation 4 and those runs could be made concurrently, computational resources were limited. Particularly in light of the duration of each run and the intensive processor requirements of both genetic programming and the cart-pole simulation. These factors made the accumulation of further samples impractical.

4.3 Performance of Controllers

This section is concerned with identifying the extent to which the controllers produced through this experiment satisfy the performance specifications (where applicable) outlined in section 3.2.9. The following sections display the performance of the best controllers found by each selection method.

Table 6: The Best Controller found by Roulette Wheel Selection
 
 
Evolved Controller  (+ Ø, (% (+ w , (% (+ x, (% v , (+ Ø, 1.922))), 1.922)), 7.230))
Minimised  Ø + (w ( x + v/(Ø + 1.922 )) / 1.922) / 7.23
Raw Fitness  50,009
Occurred In Run  401
Occurred In Generation  42

Table 7: The Best Controller found by Tournament Selection
 
 
Evolved Controller  (+w, (+ (+ (+ x, (* 9.905, (* 2.171, Ø))), (* w, 2.171)), Ø))
Minimised x + 22.503755Ø + 3.171w
Raw Fitness  50,009
Occurred In Run  395
Occurred In Generation  22

Table 8: The Best Controller found by Expected Value Model Selection
 
Evolved Controller  (+ Ø, (+ (+ v, (* Ø, (+ w, 7.206))), (* w, (* w, w))))
Minimised  v + 8.206Ø + Øw + w3
Raw Fitness  50,004
Occurred In Run  294
Occurred In Generation  25

 

4.3.1 Specification 1(a): Pole Angle of Controllers

A plot of the pole angle for the controller for the first 20 seconds of the simulation shows the strategy adopted for balancing the pole. The plots for the best solutions derived by each respective selection technique are offered for inspection.

A plot of the state changes caused by the controller evolved through roulette wheel selection is displayed below.

Chart 1: Roulette Wheel - Pole Angle V Time

The next plot shows the state changes of the pole angle for the controller found through tournament selection.

Chart 2: Tournament - Pole Angle V Time

The plot below describes the behaviour of the pole angle that resulted from the best controller found through the application of the expected value model selection technique.

Chart 3: Expected Value Model - Pole Angle V Time

4.3.2 Specification 1(b): Cart Position of Controllers

The diagrams included in this section plot the position of the cart relative to the centre of the track for the first 20 seconds of the simulation. This complies with the second performance measure included in specification 1 at section 3.2.9.

The first chart, shown below, plots the behaviour of the cart (i.e., its position) when controlled by the solution evolved while the genetic program used the roulette wheel selection method.

Chart 4: Roulette Wheel - Cart Position V Time

The plot showing the cart position state changes for the best controller derived by tournament selection is displayed below.

Chart 5: Tournament - Cart Position V Time

The chart below represents the strategy used by the controller that was evolved through the expected value model selection technique.

Chart 6: Expected Value Model - Cart Position V Time

4.3.3 Specifications 2 and 3: RMS Amplitudes

The RMS amplitudes of both the pole angle and the cart position are measures intended to describe the extent of the controller’s ability to centre the cart and stabilise the system over the course of the simulation (i.e., 1,000 seconds). For both of the RMS amplitudes described below, the performance of the controller improves as the value approaches zero.

Table 9: RMS Amplitudes of Cart Position and Pole Angle
State Variable  Roulette Wheel  Tournament Expected Value Model
x(metres) 0.054220  0.074286 1.049271
Ø(radians)  0.000081 0.000092 0.000085

4.4 Performance of Selection Techniques

Although only three controllers were described in the previous sections, the genetic programming paradigm managed to evolve a total of 13 controllers that could balance the pole for the specified 1,000 seconds. As is evident from the details shown in section 4.3, each of the selection techniques tested managed to evolve controllers that could balance the pole, within the parameters of the simulation, for the specified 1,000 seconds.

The ‘learning curves’ for each of the runs that produced the best controllers described in section 4.3 are shown on chart 7 below.

Chart 7: Log of the Observed Learning Curves of Best Controllers

A log scale was used in the chart above to highlight the subtle fitness changes in the best of each generation shown. As mentioned earlier (section 3.3), the learning behaviour in the runs shown in chart 7 (of generally low to suddenly high fitness in a few generations) can be expected in genetic algorithms.

While the best fitness of individual runs seems to follow a sporadic trend in improved performance, the average fitness does display the overall improvement mentioned by Koza [16, 19]. This trend (shown in chart 8) is the foundation of De Jong’s on-line performance measure (section 4.4.1).

Chart 8: Observed Mean Fitness of each Selection Technique

A plot of the average of the best observed fitness within each generation has been included below (chart 9), as it is the overall relationship of these trends that form the basis of De Jong’s off-line performance measure (section 4.4.2).

Chart 9: Average of Best Observed Fitness of each Selection Technique

Not only the average fitness displays the trend noted by Koza, above. A summary glance at the plot of results shown above indicate a similar trend in the overall average fitness of the best individuals in each generation can also be expected. The data used to produce chart 9 was an average, at each generation, of the best performance within that generation over the course of 500 runs per selection method.

4.4.1 On-Line Performance

The measure of on-line performance was developed by De Jong as a means to gauge a genetic system’s on-going performance. In other words, the on-line performance of a strategy describes how well that strategy can be expected to perform in rapidly finding a satisfactory solution. A detailed description of on-line learning is included in section 3.3.2.1.

While many problems in genetic algorithms use a cost minimisation function to rate fitness, the cart-pole problem has a natural utility maximisation fitness function (time to failure). For this reason, improvement in the performance of each selection technique is reflected by a positive trend in the plotted line shown below (Chart 10 ).

Chart 10: On-Line Performance

4.4.2 Off-Line Performance

De Jong’s off-line performance measure aims to identify the extent of a strategy’s convergence within its environment (the problem domain). The purpose of this measure is to describe, based on the characteristic of observed best fitness, a strategy’s suitability in learning for off-line applications. This measure is described in some detail in section 3.3.1.2.

The results of applying De Jong’s measure to the observed runs is shown in the chart 11 below:

Chart 11: Off-Line Performance

4.4.3 Distribution of the Observed Means

As the basis of the on-line performance test is the mean of the samples for each strategy, a linear distribution test was performed on the average generation mean over all runs to identify any significant difference between selection techniques.

The thin lines in the following plots represent the 95 per cent confidence interval estimates for the observed average mean of the generations over the runs. The actual observed mean is shown by the thicker line.The first chart, chart 12 shown below, displays the observed means and respective confidence interval estimates from the data recorded from the runs using roulette wheel and tournament selections.

Chart 12: Distribution of Means - Roulette Wheel and Tournament Selection

Chart 13, below, shows the means and confidence interval estimates for roulette wheel and expected value model selection strategies.

Chart 13: Distribution of Means - Roulette Wheel and Expected Value Model Selection

The final relationship is that between the expected value model and the tournament selection strategies shown in chart 14.

Chart 14: Distribution of Means - Expected Value Model and Tournament Selection

 

4.4.4 Distribution of the Observed Best

The purpose of distribution analysis on the average observed best performance of strategies is to identify the significant difference, if any, between those strategies. In particular, the data used to form these curves has a direct relationship with the off-line performance measure (section 4.4.2).

The conventions applied in section 4.4.3, above, have been adhered to in the charts that follow. The first of these charts (chart 15) describes the observed results of best performance within each generation, with the 95 per cent confidence interval estimates for the roulette wheel and tournament selection strategies.

Chart 15: Distribution of Best - Roulette Wheel and Tournament Selection

The following chart, chart 16, displays the observed data with confidence interval estimates for roulette wheel and the expected value model selection strategies.

Chart 16: Distribution of Best - Roulette and Expected Value Model Selection

Chart 17 below is intended to aid in identifying any significant difference between the average of the best observed fitness of tournament and the expected value model selection strategies.

Chart 17: Distribution of Best - Tournament and Expected Value Model Selection

4.4.5 Performance in Learning Rate

For the purpose of gauging the rate of learning (or the speed with which a selection technique finds a solution), the successful runs were isolated from the balance of the raw data. Success was considered to be a controllers ability to balance the pole for the specified 1,000 seconds.

In order to show the learning rates observed, the best individual in each generation was plotted, up to and the final generation of that run. The plot of each selection strategy is represented on its own chart and a log scale has been applied to the fitness values to more clearly identify the course of evolution.

Using the extension of the median test, the data displayed in the charts below formed the samples for the tests of independence and variance. In this way, any significant difference in the rate of learning, the time to success, of the three techniques might be identified.

4.4.5.1 Samples of Successful Controllers

The chart that follows, chart 18, comprises the observed learning curves of the six successful controllers evolved using roulette wheel selection.

Chart 18: Successful Runs of Roulette Wheel Selection

Chart 19, represented below shows the same observations for the four controllers identified by the tournament selection strategy. Of note in this next chart, in particular the line that follows the learning in run 42, is a clear indication of the complexity of this problem.

In generation eight of run 42 the chromosome that achieved the extreme fitness must have been awarded a high probability of selection. However, no selection of that chromosome included reproduction. Instead, any selection of that chromosome involved its crossover. Regardless of how often this crossover occurred, the change in the character of its progeny in the next generation was sufficient to result in a condition of severe devolution. This situation is a trait of those systems that do not employ an elitist strategy of guaranteed survival of the fittest.

Chart 19: Successful Runs of Tournament Selection

The following chart displays the three successful controllers evolved through the use of the expected value model selection strategy.

Chart 20: Successful Runs of Expected Value Model Selection

 

4.4.5.2 The Extension of the Median Test

As is evident from charts 18 - 20, the samples available are both small and of varying size. While this is not an obstacle in the calculation of this value, it does result in a very conservative estimate of independence. A more detailed description of this statistical test can be found in section 3.3.3.

Since the groups of selection techniques are independent of each other and the sample sizes for each group is non-uniform, a significance test for k independent samples is in order. Since the generation in which the selection technique managed to balance the cart constitutes at best an ordinal measure of the rate of learning, the extension of the median test is appropriate for testing the hypothesis concerning differences in central tendencies.

The hypothesis tested was:

H0 - The rate of learning of the genetic program is independent of the selection technique, i.e. there is no significant difference in the learning rates.

H1 - The rate of learning of the genetic program is not independent of the selection technique.

Level of Significance: Let = 0.05, N = 12, d = 2

Reject H 0 if 2 > 5.991 otherwise do not reject H 0

Table 10: Tabulated data for the Extension of the Median Test
Roulette Tournament Expected Value Model Total
Generation of Observed Success 24 8 10 42
25 18 14 57
36 19 26 81
37 43 80
37 37
Total 110 62 26 198
Common Median = 24.5
Observed Roulette Tournament Expected Value Model Total
> than common median 4 1 1 6
< than common median 1 3 2 6
Total 5 4 3 12
Expected
> than common median 2.5 2 1.5 6
< than common median 2.5 2 1.5 6
Total 5 4 3 12
Difference
> than common median 1.5 -1 -0.5 0
< than common median -1.5 1 0.5 0
Total 0 0 0 0

By the computation of the data described above, it was determined that X2= 3.13. Since this is greater than the stated level of significance ( = 0.05), the decision must be not to reject the null hypothesis. Based on this data, the rate of learning of the genetic program is independent of the selection technique employed.

4.4.5.3 Analysis of Variance

As the sample sizes are small, and the extension of the median test does not take into account the variance in the data, an analysis of variance test was conducted. While a more sensitive measure of independence, it is still questionable because of the small sample size.

The table showing the results of this analysis is displayed below.

Table 11: Analysis of Variance
Source SS df MS F Significance
Method 474.783 2 237.392 2.181 0.169
Residual  979.469 9 108.830
Total 1454.25 11

The result of this analysis supports the findings of the extension of the median test (section 4.4.5.2) that there is no significant difference in the rate of learning of the selection methods.

[Chapter 3] [Chapter 5]
[Beer]