Background ideas for "infera"


    ____      ____               
   /  _/___  / __/__  _________ _
   / // __ \/ /_/ _ \/ ___/ __ `/
 _/ // / / / __/  __/ /  / /_/ / 
/___/_/ /_/_/  \___/_/   \__,_/

Infera is a tool for figuring out how good teams are from the results of a limited number of games between those teams.

It is intended to be useful for ranking teams in single tournaments and for integrating results from multiple tournaments.

It is also expected to make new tournament formats possible. For example, it can infer a ranking of the teams in a pool even when not every pair of teams in the pool has played.

Infera is also able to offer predictions of outcomes of games.

Infera is expected to work best if the scores of all games are supplied, rather than just who won each game. Scores are better because they let the program distinguish close games from games with a big goal difference, which gives more accurate rankings. Also, if one team easily beats several teams, then the margin by which it wins in each case gives useful information about how good those lesser teams are *relative to each other*.

Infera takes into account only information from actual game results and completely ignores supposed `final placings' generated by standard tournament formats. Infera is very different from any sports ranking system I have come across.

Teams playing in events for which Infera is being used as originally intended should be informed that every point in every game counts towards their inferred ranking. The user may, however, use Infera in a variety of other ways. You can designate certain games alone as being games that will be included in the analysis. You can alter the "weight" attached to each game. You can feed Infera just the win/lose outcomes if you want.

Disclaimer

Usual disclaimers apply.

Assumptions

Any system that infers the underlying quality of a team from outcomes of games must do so on the basis of a set of assumptions. Some of these assumptions are unavoidable, and some are clearly arbitrary, and subject to change.

Unavoidable assumptions

Transitivity

If we want to rank a set of teams, then it is unavoidable that they end up in an order, such that if A is `better' than B and B is `better' than C then A is better than C. However, we might really believe that there could be three teams such that A consistently beats B, B beats C, and C beats A. This possibility is incompatible with making a ranking. It is inevitable that we must assume that how good each team is can be summarised in a single number.

I call this number the `potential' of the team.

Noisiness

Any reasonable ranking system will take into account the fact that the performance of a team in terms of score achieved is not absolutely constant. Now and then a better team will lose to a worse team.

Assumptions that could be made differently but which are fixed in Infera

Constancy of a team's potential

We could assume that the potential of a team fluctuates during a tournament, but this is a can of worms that I choose to avoid. Infera assumes that a team has a constant potential, and that the only variability in scores comes from the intrinsically statistical nature of the game.

Infera thus may be harsh on teams that are inconsistent, sometimes losing games badly that they could easily win `on a good day'. If this is viewed as a problem then the easiest hack is to modify the data that is fed to Infera in some way; for example, the user may choose to omit each team's worst result and compute rankings based on the results of the other games. I'd say, "tough, if a team can't perform consistently then they don't deserve to be ranked as high as their best performance".
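The "omit each team's worst result" hack can be sketched as a preprocessing step on the data, before it is fed to Infera. This is only an illustration: the `Game` record and the function name are hypothetical (Infera's real input format is not described here), and "worst" is assumed to mean the loss with the largest point deficit.

```python
from collections import namedtuple

# Hypothetical record type, not Infera's actual input format.
Game = namedtuple("Game", "team_a team_b score_a score_b")

def drop_worst_results(games):
    """Return the games left after removing each team's worst result.

    ASSUMPTION: a team's worst result is its loss with the largest
    point deficit. A team that never lost keeps all its games.
    """
    worst = {}  # team -> (margin, index of that game)
    for i, g in enumerate(games):
        margins = ((g.team_a, g.score_a - g.score_b),
                   (g.team_b, g.score_b - g.score_a))
        for team, margin in margins:
            if margin < 0 and (team not in worst or margin < worst[team][0]):
                worst[team] = (margin, i)
    dropped = {i for _, i in worst.values()}
    return [g for i, g in enumerate(games) if i not in dropped]

games = [
    Game("A", "B", 15, 3),    # B's worst result
    Game("A", "B", 15, 12),
    Game("A", "C", 15, 10),   # C's worst result
    Game("B", "C", 15, 13),
]
kept = drop_worst_results(games)
```

Note that removing a game removes it from the opponent's record too, so this filter also discards some teams' best wins.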

Obviously, if a tournament director knows who is playing on a given team and has prior knowledge of their skills, they will be able to give better predictions of teams' performance. The Infera program effectively assumes that the team's roster is unchanging.

Relationship between potential and score

We have to assume a probabilistic relationship between the potentials of two teams and the score that results in their game. We then *invert* this relationship in order to infer what the potentials of all the teams are given all the game results. Various assumptions could be made about this probabilistic relationship.

Infera is built on the assumption that the probability distribution of the score depends only on the *difference* in potential between the two teams. I may add more details here if people want to know.

If, on using Infera for a time, it becomes evident that the assumed relationship between potentials and scores is too simple, then I could code up more advanced inference engines.
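The inversion described above can be sketched as a maximum-likelihood fit. This is only an illustration and not Infera's actual inference engine. It assumes that each point is won independently by team A with probability 1/(1+exp(-d)), where d is the potential difference, and it adds a weak Gaussian prior so that unbeaten teams keep finite potentials. The function name, the tuple layout and the prior are all hypothetical.

```python
import math

def infer_potentials(games, steps=2000, prior_precision=0.1):
    """Fit one potential per team by gradient ascent on the log-posterior.

    games: list of (team_a, team_b, points_a, points_b, weight).
    ASSUMPTION: A wins each point independently with probability
    1 / (1 + exp(-(x[a] - x[b]))); this model is not taken from Infera.
    """
    teams = sorted({t for g in games for t in g[:2]})
    x = {t: 0.0 for t in teams}
    total = sum(w * (sa + sb) for _, _, sa, sb, w in games)
    rate = 1.0 / (total + prior_precision)  # conservative step size
    for _ in range(steps):
        # Gradient of the Gaussian prior pulls every potential towards 0.
        grad = {t: -prior_precision * x[t] for t in teams}
        for a, b, sa, sb, w in games:
            p = 1.0 / (1.0 + math.exp(-(x[a] - x[b])))
            g = w * (sa - (sa + sb) * p)  # d(log-likelihood)/d(x[a])
            grad[a] += g
            grad[b] -= g
        for t in teams:
            x[t] += rate * grad[t]
    return x

# A and C never meet, yet all three teams can still be ordered.
games = [("A", "B", 15, 12, 1.0), ("B", "C", 15, 10, 1.0)]
potentials = infer_potentials(games)
ranking = sorted(potentials, key=potentials.get, reverse=True)
```

Because the likelihood depends only on differences, the potentials are defined only up to an additive constant; here the prior pins their mean at zero.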

Examples of relationship between potential difference and score

I think it's a good idea to give a feel for what scores are likely to arise for a given potential difference given my assumed model.

Let's consider a game of first-to-fifteen.

If team A's potential is 0.1 greater than team B's, then the probability that A will win is 0.61, with the most probable score being 15:12 (A:B). Thus teams with a difference of 0.1 in potential are expected to have close games, with the better team winning about 3 games out of 5. Potential differences smaller than 0.1 correspond to teams that are very hard to distinguish in just one game.

If team A's potential is 0.3 greater than team B's, then the probability that A will win is 0.79, with the most probable score being 15:10.

If team A's potential is 0.5 greater, then the probability that A will win is 0.91, with the most probable score being 15:8.

If team A's potential is 1.0 greater, then the probability that A will win is 0.996, with the most probable score being 15:5.

If the potential difference is 2.0, then the most probable score is 15:1, and the probability that team B will score more than 4 points is 0.07.
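Figures of this kind can be explored with a small calculation. The sketch below assumes a particular model that the text above does not spell out: each point is won independently by A with probability 1/(1+exp(-d)), where d is the potential difference, and the game stops as soon as one team reaches 15. Under that assumption the computed values come out close to the figures quoted above, but the model and all function names here are illustrative guesses, not Infera's code.

```python
import math

def point_probability(d):
    """ASSUMED model: chance that A wins any single point."""
    return 1.0 / (1.0 + math.exp(-d))

def score_distribution(d, target=15):
    """Return {(a_points, b_points): probability} for first-to-target."""
    p = point_probability(d)
    q = 1.0 - p
    dist = {}
    for k in range(target):
        # The winner takes the last point; the loser's k points can fall
        # anywhere among the preceding target - 1 + k points.
        ways = math.comb(target - 1 + k, k)
        dist[(target, k)] = ways * p**target * q**k
        dist[(k, target)] = ways * q**target * p**k
    return dist

def summarise(d, target=15):
    """Return (probability that A wins, most probable score)."""
    dist = score_distribution(d, target)
    p_win = sum(pr for (a, b), pr in dist.items() if a > b)
    mode = max(dist, key=dist.get)
    return p_win, mode

for d in (0.1, 0.3, 0.5, 1.0, 2.0):
    p_win, mode = summarise(d)
    print(f"d={d}: P(A wins)={p_win:.3f}, most probable score {mode[0]}:{mode[1]}")
```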

Assumptions that are clearly arbitrary

The user can obviously decide which results to include in the analysis. If you want to know how good a team has been all season, you should include all games with equal "weight". If on the other hand you really believe that each team's potential changes slowly through the year and you want to infer the current potentials then the cheap and dirty way of doing this is to "weight" old results with a smaller weight (e.g. 0.25, 0.5) than recent results (weight 1).

Another example: maybe you'd like the final results in a tournament to depend only weakly on the first day's results (when teams are finding their feet) and more strongly on the last day's results (which are the crucial games in many standard tournaments). If so, you can give lesser weight to the early games.
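A weighting scheme of this kind can be sketched as a simple step function of a game's age. The thresholds (30 and 180 days), the function name and the tuple layout are arbitrary, hypothetical choices for illustration; only the weights 0.25, 0.5 and 1 come from the text above.

```python
def age_weight(age_days, recent_days=30, old_days=180):
    """Weight 1 for recent games, 0.5 for older ones, 0.25 for the oldest.

    ASSUMPTION: the cut-offs are illustrative; choose your own.
    """
    if age_days <= recent_days:
        return 1.0
    if age_days <= old_days:
        return 0.5
    return 0.25

# Each result: (team_a, team_b, points_a, points_b, age in days).
results = [("A", "B", 15, 12, 10), ("A", "B", 9, 15, 200)]
weighted = [(a, b, sa, sb, age_weight(age)) for a, b, sa, sb, age in results]
```

The same idea works within a single tournament: replace the age in days by the day of play and give the first day's games the smaller weight.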

How Infera combines multiple games between a pair of teams

If two teams have played several games then Infera effectively treats all those games as a single larger game with score equal to the sum of the scores. So to end up ahead on the basis of those games alone, a team must simply ensure it has a positive goal difference in those games. The use of "weights" on games simply multiplies the scores by the corresponding weights. So a game with a score of 8-4 that is included with a weight of 0.5 is equivalent to a 4-2 win with weight 1.
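The pooling rule described above can be sketched as follows. The function name and tuple layout are hypothetical; only the rule itself (multiply each score by its weight, then sum per pair of teams) is taken from the text.

```python
from collections import defaultdict

def combine_games(games):
    """Pool all games between each pair of teams into one weighted score.

    games: iterable of (team_a, team_b, points_a, points_b, weight).
    Returns {(team_x, team_y): [points_x, points_y]} with team_x < team_y.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for a, b, sa, sb, w in games:
        if a > b:  # store each pair in a canonical order
            a, b, sa, sb = b, a, sb, sa
        totals[(a, b)][0] += w * sa
        totals[(a, b)][1] += w * sb
    return dict(totals)

# An 8-4 win with weight 0.5 pools to the same totals as a 4-2 win with weight 1.
half = combine_games([("A", "B", 8, 4, 0.5)])
full = combine_games([("A", "B", 4, 2, 1.0)])

# Two games between the same teams: A wins 15-10, then B wins 15-12.
combined = combine_games([("A", "B", 15, 10, 1.0), ("B", "A", 15, 12, 1.0)])
```

In the last example A ends up ahead on those games alone, 27 to 25, even though each team won once.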


Disclaimer

The author disclaims all warranties with regard to this software, including all implied warranties of merchantability and fitness. In no event shall the author be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of this software.

The author makes no representations about the suitability of this software for any purpose. It is provided `as is' without express or implied warranty and without technical support.

David MacKay <mackay@mrao.cam.ac.uk>

Last modified: Tue Feb 3 22:43:54 1998