This thesis takes an in-depth look at the Soccer Pools (also known as the 6 from 38 Pools, or The Pools) and seeks to determine methods that will allow a player to choose numbers that are more likely to win, or more likely to win `big' if they do win. The main purpose of this research is to understand the Soccer Pools' market and to assess whether any inefficiency can be exploited, thus potentially reducing expected losses, or creating an expected profit, for an individual playing the Soccer Pools. A secondary purpose is to assess the adequacy of the Poisson distribution in modelling soccer scores.
There are three main elements considered in the study that could possibly be exploited in the Soccer Pools. The first element, covered in Chapters 2 and 3, involves modelling soccer results and applying the model to the Soccer Pools. While a great deal of literature exists on modelling soccer results, such as that written by Maher (1982), Karlis and Ntzoufras (2003), and many others, applying such models to playing (and winning) the Soccer Pools is an untouched area. Results of soccer games are modelled using the Poisson distribution. The model is assessed in quite different ways, using the 2003/04 English Premier League results and the bookmakers' odds offered during the Euro 2004 soccer tournament.
Since it is unreasonable to assume that scores in any two matches have the same Poisson parameters, the finding of `extra-Poisson' variation in the overall distribution of goals scored in a season is not surprising. Such extra-Poisson variation appears to be reduced or eliminated when looking at the scores of individual teams. An overview of the between-team variation is obtained by fitting a log-linear model to the 2003/04 English Premier League results. The final aspect of comparing a Poisson model to data is an analysis of bookmakers' odds for specific matches. This analysis suggests that bookmakers do use the Poisson model in offering odds for soccer matches in the Euro 2004 competition.
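The source of such extra-Poisson variation can be illustrated with the law of total variance: if each match has its own Poisson rate, the pooled goal counts have a variance equal to the mean rate plus the variance of the rates, which exceeds the Poisson benchmark of variance equal to mean. The following is a minimal sketch only; the function name and the example rates are hypothetical and are not taken from the thesis data.

```python
from statistics import fmean, pvariance


def pooled_mean_variance(rates):
    """Mean and variance of a goal count drawn from an equal-weight
    mixture of Poisson distributions with the given rates.

    Law of total variance: Var(X) = E[lambda] + Var(lambda).
    """
    mean_rate = fmean(rates)
    between_rate_variance = pvariance(rates)
    return mean_rate, mean_rate + between_rate_variance


# Hypothetical scoring rates for three matches (illustrative only).
mean, variance = pooled_mean_variance([0.5, 1.5, 2.5])
dispersion_index = variance / mean  # equals 1 when all rates are identical
```

A dispersion index above 1 for the pooled counts is therefore expected whenever the rates differ between matches, even if every individual match is exactly Poisson.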
In relation to the Soccer Pools, the main purpose in using a Poisson model is to predict score draws, since it is (mainly) on the basis of score draw results that the winning numbers are chosen. While the (uncorrelated) Poisson model, with match-specific parameters estimated using a log-linear model, successfully predicts the number of score draws, prediction of which matches will result in score draws is almost a complete failure, as Chapter 3 demonstrates. This suggests that the Poisson model in this (independent bivariate) form is inadequate, and has no worthwhile application to the Soccer Pools. More complex models, such as dependent bivariate Poisson models, require consideration where it is demonstrated that there is (some) improvement in predicting which matches result in score draws.
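To make the independent bivariate form concrete, the sketch below computes the probability of a score draw when the home and away goal counts are independent Poisson variables. It assumes that a score draw means a drawn match in which both teams score (1-1, 2-2, and so on), so that 0-0 is excluded. The function names, the truncation at 20 goals, and the example rates are illustrative assumptions, not the estimates or the code used in the thesis.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_draw_probability(lam_home, lam_away, max_goals=20):
    """P(home goals == away goals >= 1) under independent Poisson scores.

    The infinite sum is truncated at max_goals, which is negligible
    for realistic soccer scoring rates.
    """
    return sum(
        poisson_pmf(k, lam_home) * poisson_pmf(k, lam_away)
        for k in range(1, max_goals + 1)
    )


def no_score_draw_probability(lam_home, lam_away):
    """P(0-0) under independent Poisson scores."""
    return math.exp(-(lam_home + lam_away))


# Hypothetical match-specific rates (illustrative only).
p_score_draw = score_draw_probability(1.0, 1.0)
p_any_draw = p_score_draw + no_score_draw_probability(1.0, 1.0)
```

Summing such probabilities over a set of fixtures gives the expected number of score draws, whereas ranking fixtures by the same probabilities is the harder task of predicting which matches will be score draws.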
The second element examines the systematic method by which the winning numbers in the Soccer Pools are chosen. The investigation in Chapter 4 looks at empirical data from the Soccer Pools and the `theory' behind the drawing of numbers, for which there is no published prior investigation. Since the tiebreak rule favours the higher-numbered score draw matches, these numbers feature more often as winning numbers. An individual playing the Soccer Pools therefore increases their chances of winning by picking larger numbers.
Chapter 5 contains the third element of this thesis, which invokes the concept of `lucky' numbers. Many people choose numbers they consider to be `lucky' in some way or another. These numbers are often not unique to individual players of the Pools, and should these numbers win, the prize must be shared among many people. Fewer people pick `unlucky' numbers, so if these numbers win, there will be fewer people with whom to share the prize. This is looked at in relation to Lotto by Cunynghame (2004), and has also been the subject of earlier investigations in regard to Lotto; however, such an analysis has not previously been conducted on the Soccer Pools. Multiple linear regression and stepwise selection techniques were used to consider many predictors, based on the winning numbers and pattern combinations of these winning numbers, for predicting the size of the division 3 prize. The `lucky' numbers models were able to predict the size of the division 3 prize with only moderate accuracy. They have some value for prediction, and may be used to gain a slight advantage in choosing numbers, and number patterns, for winning the Soccer Pools.
The overall results from the study gave no conclusive methods by which one can take advantage of inefficiencies in the Soccer Pools' market. Slight exploitation is possible by choosing large numbers and by using the results of the regression, but nothing more specific. It does appear, however, that exploitation of these `elements' can lead to a reduction in expected losses when playing the Soccer Pools.