5.1 Introduction

This Chapter contains an interpretation of the results shown in Chapter 4. The following discussion describes the results of this experiment and gives an insight into the consequences of this research. In order to address the issues, this Chapter has been divided into sections that reflect the motivation of the dissertation.

5.2 Performance of the Genetic Programming Paradigm

In the problem of pole balancing, the genetic programming paradigm proved successful in deriving controllers within the constraints of the simulation. The genetic programming paradigm managed to evolve 13 controllers that could successfully perform the task of pole balancing for 1,000 seconds.

To state conclusively that the genetic programming paradigm was successful in balancing the pole and centring the cart requires the fulfilment of two conditions: first, that the genetic programming paradigm performed better than a random search; and secondly, that the state of the cart-pole was considered centred and stable.

Previous research, discussed below, has shown that the problem solving strategy of genetic systems is a better than random strategy when applied to non-trivial problems. The main difference between the random search’s strategy and the genetic program’s strategy is the genetic program’s goal to preserve better solutions as they are found, in other words survival of the fittest.

That the genetic algorithm conducts a directed search through a problem space was mathematically proven by John Holland in 1975 [13]. The genetic programming paradigm employs the genetic algorithm to direct its search. In addition, the results from research in both genetic algorithms ([5], [10], [25]) and genetic programming ([1], [14], [16]) support Holland’s findings that, in non-trivial applications, genetic systems perform a better than random search.

The research performed by De Jong on genetic algorithm performance shows the characteristics of a random search’s on-line and off-line performance. The resulting performance of the random search is displayed by a line drawn parallel to the horizontal axis [10]. In other words, the random search does not learn. That each of the selection strategies displayed learning typical of genetic systems (charts 10 and 11) and that the cart-pole problem is a non-trivial problem suggests the performance of the genetic programming paradigm was better than random.

In answering the question, was a cart centred, the question is: How centred is centred? Section 4.3 shows the performance characteristics of the controllers that qualified as the best in this experiment. Charts 1, 2 and 3 display the pole angles for these controllers and indicate the stability of the pole. In this case, two controllers (charts 1 and 2) display very little oscillation around angle zero. These two controllers also display a definite centring trend of the cart with reduced oscillation around the centre of the track over time (charts 4 and 5).

In the charts in section 4.3, the controllers show distinctly different solutions to the cart-pole problem. These results indicate numerous potential solutions to this problem. Then, a comparison of the results of previous research within this domain, in which alternate solutions to those mentioned here were found, would indicate the relative performance of these controllers.

Unfortunately, although the ‘benchmark’ version cart-pole problem used in this research was recommended to allow a comparison of results based on performance, this version is a recent development. As a result, the amount of research available for comparison is small. One comparison that can be made is with the results of Randall et al . [25].

In Randall et al .’s comparison of a variety of adaptive techniques, the best controller found was evolved by a genetic algorithm. The final states of that controller are shown in table 12 with the final states of the best solutions found by each selection strategy in this research.

Table 12: Comparison of Centring With Genetic Algorithm
Strategy Time To Failure (s.) RMS x (m.)  RMS (r.)
Genetic Algorithm (Randall et al .) 1,000 0.37 0.01 
Genetic Program (Roulette Wheel) 1,000  0.05422 0.000081 
Genetic Program (Tournament) 1,000 0.074286 0.000092 
Genetic Program (Expected Value Model) 1,000 1.049271  0.000085 

Based on table 12 above, the relative performance of two of the controllers evolved by the genetic programming paradigm displayed better centring and stability of the simulation than the best solution found by the genetic algorithm in the research of Randall et al. [25]. The conclusion is reasonable therefore, that the centring and stability characteristics of the two best controllers found by the genetic program are comparable or even better than those found by previously reported adaptive techniques.

5.3 Performance of Selection Techniques

This section is concerned with the interpretation of the findings shown in section 4.4. Based on these findings, the following section considers the observed effect, if any, of the various selection techniques upon the performance of the genetic programming paradigm in its application to the cart-pole problem.

5.3.1 On-Line Performance

The on-line performance of the selection techniques is represented in chart 10. This performance reflects the application’s ability, or likelihood, to find solutions to the problem rapidly. Based on the analysis in section 4.4.1, the best on-line performance observed in this experiment was achieved by the expected value model.

While the results shown in chart 10 appear inconsistent with the quantity (section and quality (section 4.3) of the controllers actually found, these results do conform to the findings in De Jong’s 1975 study [10]. A discussion of the effect of population size is conducted in Chapter seven; however, for the population size used, this research suggests that, when applied to the cart-pole problem, the expected value model selection strategy is most likely to produce a satisfactory controller in the shortest time.

However, that observation is made with some reservation. The on-line performance measure is based on the cumulative average of the means of the populations tested (chart 8), and so a significant difference in on-line performance should be reflected in a significant difference in the distribution of the observed means (section 4.4.3).

A review of the results in section 4.4.3 indicate that the gradient of the mean improvement for roulette wheel is steeper in the later generations of the run than those observed for either the expected value model or the tournament selection methods. This observation is confirmed by the significant difference in the distribution of the means that occurs in favour of roulette wheel when compared to tournament selection (chart 12). While there is a significant difference favouring the expected value model over tournament selection (chart 14), that difference is not as pronounced as the one between roulette wheel and tournament. And, although there is no significant difference between roulette wheel and expected value model selection techniques (chart 13) this analysis does suggest that, if further generations had been allowed, roulette wheel selection may have proven more effective at on-line performance.

5.3.2 Off-Line Performance

On the basis of the analysis of off-line performance conducted in section 4.4.2, the roulette wheel selection method proved the most effective strategy at the 50 generation mark. The expected value model, on the other hand, showed the best off-line performance for most of the run. According to these results, tournament selection, again, displayed the poorest performance (chart 11).

The off-line performance observed is this experiment must be interpreted with some caution. The off-line performance of a strategy is based on the cumulative mean fitness of the best individuals observed to some point (chart 9). An analysis of the distribution of the mean of these observed fittest individuals should indicate the extent of the difference in off-line performance between these strategies. The details of that analysis can be found in section 4.4.4.

The outcome of the analysis in section 4.4.4 indicates that there is no significant difference in the distribution of the mean of the fittest observed individual for each selection technique (charts 15 - 17). Then, the actual level of relative off-line performance cannot be confidently stated.

5.3.3 Rate of Learning

Given a sample of successful controllers (section 4.4.5), an attempt was made to determine whether any of the selection techniques analysed in these runs were more efficient at deriving solutions. For this reason, a test of independence was performed based on the distribution of the generation in which success occurred. The most appropriate test for this analysis appeared to be the extension of the median test (section The results of that analysis suggests that there was no significant difference in the samples tested.

Unfortunately, this analysis was severely hampered by the small number of successful controllers for each of the selection methods. With this restriction in mind, an analysis of variance was performed with the same data (section The result of that analysis was consistent with the previous results: there was no significant difference in the distribution of the generation in which success occurred.

Although this finding does not conform with the observed on-line performance of each selection technique (section 4.4.2), the limitations placed on the analyses in sections and suggest that their results cannot be considered conclusive. The analyses were included, however, to recommend a method identifying this trait in future research.

[Chapter 4] [Chapter 6]