Southeast Regional Outdoor Ultimate Championships
Cambridge, November 1-2 1997 

            Team             infera rating
          1 UTI              11.2 +/- 0.2
          2 Red Shift        10.9 +/- 0.2
          3 First Touch      10.8 +/- 0.2
          4 Slowhawks        10.2 +/- 0.2
          5 Doh'hawks        10.1 +/- 0.2
          6 Strange Blue 2    9.7 +/- 0.2
          7 Skunks 1          9.5 +/- 0.2
          8 Strange Blue 1    9.2 +/- 0.2
          9 Skunks 2          8.3 +/- 0.2

Spirit of the game: Doh'hawks

These results are shown graphically in a PostScript file.

The start of the 97 regionals featured thick fog - fog thicker than the winds of the 96 regionals were windy. But all the teams managed occasionally to find the disc, each other, and the endzones during their initial games. The sun burst through after an hour or two, and the weather from then on was pleasant and gentle.

The Spirit in the tournament was really good, and I think everyone had a good time. Skunks and Mohawks led the Halloween partying at Darwin College, and UTI and First Touch were able to return to London for their Halloween parties without adverse effects on their games.

More than half of the teams had a substantial number of beginners, and I think that they really enjoyed learning from playing with the top teams of the region.

Many thanks to everyone who came. Thank you for playing your games on time, and for being clean and tidy guests. I think the Cambridge Rugby Club will be willing to have us all again. Maybe a summer tournament next year?


There now follow:

(A) Lost property.
(B) Discussion of the 'infera' software used to process the scores
	and spit out the above ranking and ratings.
(C) Scores of all games.
(D) Photos from the tournament.

(A)  =====================================================================
Lost property:	found --
		one key on chain with hospice medallion
		one black hat, one pair black gloves
		three shirts
		one pair black tracksuit trousers

(B)  =====================================================================
Discussion of Infera's ranks
0) Background: Infera is a program which infers the most probable ranking, by ability, of a set of teams based on the scores of any games they have played. It is applicable to any tournament format. Teams may have played different numbers of games, the games may have been of different durations, and teams need not have been arranged in equal strength pools.

A description of the program can be found on the web here: http://www.inference.org.uk/ultimate/infera/

The basic idea is that the score of any one game provides information about the relative rank of the two teams. A game does not provide concrete information though; if two teams are close in ability, the game might go either way. So a close win for team A over team B does not show for certain that A is better than B; it's just more probable. The longer the game, the more information it gives about the teams' relative abilities. Infera uses probability theory to figure out the most probable ranking.
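
To make the "more probable, not certain" point concrete, here is a minimal sketch. It is not infera's own model: it assumes that team A wins each point independently with some unknown probability p, puts a flat prior on p, and asks how probable it is that p exceeds one half given a score. The function name prob_a_stronger and the 12-10 score are made up for the illustration; 6-5 and 15-3 are real scores from this tournament.

```python
def prob_a_stronger(a_points, b_points, steps=20000):
    """P(p > 0.5 | score), where p is A's chance of winning any one point.

    ASSUMPTION (not infera's model): points are independent and p has a
    flat prior on [0, 1]. The integrals are done with the midpoint rule.
    """
    width = 1.0 / steps
    total = 0.0
    upper = 0.0
    for k in range(steps):
        p = (k + 0.5) * width                          # midpoint of slice k
        weight = p ** a_points * (1 - p) ** b_points   # likelihood of the score
        total += weight
        if p > 0.5:
            upper += weight
    return upper / total


close_game = prob_a_stronger(6, 5)      # the SK1/SB2 score
longer_game = prob_a_stronger(12, 10)   # hypothetical: same ratio, twice as long
big_win = prob_a_stronger(15, 3)        # the final

print(f"6-5   -> {close_game:.3f}")     # about 0.613
print(f"12-10 -> {longer_game:.3f}")    # about 0.661
print(f"15-3  -> {big_win:.3f}")        # about 0.998
```

Under these assumptions a 6-5 win leaves the question wide open, the same ratio over twice as many points is a little more convincing, and only a lopsided score comes close to certainty.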

Infera was first used for real at the 1997 Southeast Regionals in Cambridge. The tournament format was in fact a round robin, so it is possible to compare infera's ranking with the rankings given by more traditional methods which can be applied to round robins.

                        infera score   games won   goal difference
1 UTI              UTI  11.2 +/- 0.2       8             67
2 Red Shift         RS  10.9 +/- 0.2       7             44
3 First Touch       FT  10.8 +/- 0.2       6             48
4 Slowhawks         M1  10.2 +/- 0.2       5              8
5 Doh'hawks         M2  10.1 +/- 0.2       4              1
6 Strange Blue 2   SB2   9.7 +/- 0.2       2            -21
7 Skunks 1         SK1   9.5 +/- 0.2       3            -34
8 Strange Blue 1   SB1   9.2 +/- 0.2       1            -43
9 Skunks 2         SK2   8.3 +/- 0.2       0            -70
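
The 'games won' and 'goal difference' columns can be recomputed from the scores listed in section (C). The sketch below does just that; the names SCORES and tally are made up for the illustration, and the scores are copied by hand from section (C), with the final last.

```python
from collections import defaultdict

# All 36 games from section (C), written "team score team score".
SCORES = """
RS 7 M1 3; RS 13 SB1 1; RS 8 FT 6; RS 9 M2 4; FT 13 SB1 3; SB1 11 SK2 4;
UTI 13 SB1 3; SK2 2 M2 10; SB2 13 SK2 1; FT 13 SK2 0; UTI 13 M1 7;
M1 13 SB2 6; SB2 8 FT 10; SK1 6 SB2 5; SK1 6 M1 11; UTI 13 SK1 3;
M2 13 SK1 4; UTI 13 M2 5; UTI 9 FT 5; RS 13 SB2 1; M2 8 SB1 0; M2 7 SB2 4;
RS 13 SK2 3; M1 10 SB1 5; SK1 6 SB1 4; SK2 1 M1 9; RS 13 SK1 2;
UTI 8 SK2 2; FT 13 M2 2; UTI 13 SB2 2; FT 13 M1 3; SK1 9 SK2 3; M1 8 M2 5;
SB1 6 SB2 9; FT 13 SK1 5; UTI 15 RS 3
"""


def tally(text):
    """Return (games won, goal difference) per team from the score text."""
    wins = defaultdict(int)
    goal_diff = defaultdict(int)
    for item in text.replace("\n", " ").split(";"):
        a, sa, b, sb = item.split()
        sa, sb = int(sa), int(sb)
        goal_diff[a] += sa - sb
        goal_diff[b] += sb - sa
        wins[a] += sa > sb   # True counts as 1, False as 0
        wins[b] += sb > sa
    return dict(wins), dict(goal_diff)


wins, goal_diff = tally(SCORES)
by_wins = sorted(wins, key=wins.get, reverse=True)
by_goal_diff = sorted(goal_diff, key=goal_diff.get, reverse=True)

for team in by_goal_diff:
    print(f"{team:4s} won {wins[team]}   goal difference {goal_diff[team]:+d}")
```

The two sorted lists differ in exactly the places discussed below: by_goal_diff puts FT ahead of RS, and by_wins puts SK1 ahead of SB2.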

1) Comparison with goal difference

After a round robin, one possible way to rank teams is by goal difference. In this tournie, it turned out that infera's rankings were almost the same as the rankings you would get from goal difference, except that Red Shift and First Touch (who have goal differences 44 and 48) came out switched. Maybe the reason that Red Shift had a slightly poorer goal difference than FT is that UTI really pulled out the stops in their last game, against RS. And this final game was 33% longer than every other game in the tournament, so it has a slightly disproportionate effect on the ranking by goal difference.

2) Comparison with traditional (win/lose) rankings.

Another traditional (and rather crude) performance measure for a round robin is number of games won. In this tournament, it gives a clean ranking of the teams, but a _different_ one from infera's. Skunks 1 come ahead of SB2 by 'games won', because the SK1/SB2 result was SK1 6 SB2 5. But Infera gave SB2 a score of 9.7 +/- 0.2, and SK1 a score of 9.5 +/- 0.2.

So, why did infera put SB2 slightly ahead of SK1? The SB2/SK1 result was a close result, obviously. (Only a draw could have been closer, and the hooter happened to go during an odd point.) So to rank SB2 relative to SK1, Infera takes into account not only the SK1/SB2 result, but also the results against other teams, and the ranks of those other teams.

So let's look at the other scores of SK1 and SB2 ...

SK1 3  UTI 13    SB2 2  UTI 13
SK1 2  RS 13     SB2 1  RS 13
SK1 5  FT 13     SB2 8  FT 10   <<<<<<<<<<<<<<<<
SK1 6  M1 11     SB2 6  M1 13
SK1 4  M2 13     SB2 4  M2 7    <<<<<<<<<
SK1 6  SB1 4     SB2 9  SB1 6
SK1 9  SK2 3     SB2 13 SK2 1   <

Clearly, SB2 did much better against FT and against M2.

It's because of these strong results that SB2 was ranked a tiny bit higher.

Which is the fairer ranking? Infera reports what it reckons is the _most_probable_ ranking, and it takes into account more than just the simple win/lose outcome. Its estimate is that it is more probable, given all the results, that SB2 was stronger than SK1, and that the SK1/SB2 game happened to go the other way, rather than the alternative hypothesis, that SK1 is better than SB2, but SB2 managed by fluke to get a much better result against FT. What do you think?
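
One crude way to see the size of the effect, without running infera, is to add up the points SK1 and SB2 each scored and conceded against their seven common opponents. This is only a sanity check on the table of scores in this section, not the calculation infera does; the name COMMON and the row layout are made up for the illustration.

```python
# One row per common opponent:
# (opponent, SK1 points, opponent's points v SK1, SB2 points, opponent's points v SB2)
COMMON = [
    ("UTI", 3, 13, 2, 13),
    ("RS",  2, 13, 1, 13),
    ("FT",  5, 13, 8, 10),
    ("M1",  6, 11, 6, 13),
    ("M2",  4, 13, 4, 7),
    ("SB1", 6, 4,  9, 6),
    ("SK2", 9, 3, 13, 1),
]

sk1_for = sum(row[1] for row in COMMON)
sk1_against = sum(row[2] for row in COMMON)
sb2_for = sum(row[3] for row in COMMON)
sb2_against = sum(row[4] for row in COMMON)

sk1_share = sk1_for / (sk1_for + sk1_against)
sb2_share = sb2_for / (sb2_for + sb2_against)

print(f"SK1: {sk1_for} for, {sk1_against} against, share {sk1_share:.3f}")  # 35, 70, 0.333
print(f"SB2: {sb2_for} for, {sb2_against} against, share {sb2_share:.3f}")  # 43, 63, 0.406
```

Against the same seven opponents SB2 won about 41% of the points played and SK1 about 33%, which is the evidence that has to be weighed against the single 6-5 result between them.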


3) Counterfactuals concerning the final:

Just as the close outcome of the single SB2/SK1 game was overruled by evidence from other games, the ranking by infera of the number 1 and 2 teams isn't simply determined by the result of the "final" game that they play against each other.

So people might be interested to know:

What would have happened if the score in the final had been closer? I have plugged in a few alternative scores (the true score was UTI 15, RS 3) to see how big a win the finalists needed to guarantee the number 1 ranking. (RS went into the final with a slightly higher ranking than UTI, on the basis of the previous 35 games.)

If the score had been UTI 15, RS 14 then the ranks would have come out:

1         RS  11.08
2        UTI  11.04
3         FT  10.82
4         M1  10.19
5         M2  10.14
6        SB2   9.68
7        SK1   9.52
8        SB1   9.18
9        SK2   8.35

So winning by one point would not have been enough for UTI to be ranked number 1 (though it must be emphasised that the difference between 11.08 and 11.04 is utterly tiny - and a sensible idea would be to have the option of calling the overall outcome a tie when the differences are so small relative to the remaining uncertainty).

The critical score in this case is 15-13. If UTI won by more than two points, then they got the number 1 ranking from infera.

We could ask, why this difference? Why did Red Shift come into the final with a head start in the rankings? There are two simple explanations: [i] UTI accidentally turned up late for their game with Mohawks 1 (M1), and generously conceded five points, making the final score 13-7 instead of 13-2. [ii] UTI played a friendly game with Skunks 2 (with Nick Haslam switching sides) which ended with a score of 8-2. If we 'correct' these two exceptional events, by entering the Mohawks game as a 13-2 result, and, say, omitting the UTI/Skunks2 game from the data, we find that it is now _UTI_ who enter the final with the highest rank, and Red Shift would have had to beat them better than 15-13 for infera to be persuaded that Red Shift were the number 1 team.
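
The what-if experiments in this section can be imitated with any model that turns scores into ratings. Here is a minimal sketch using a stand-in, NOT infera itself: a Bradley-Terry model in which every point is an independent contest, fitted with the standard iterative (minorise-maximise) update. Because the model is different, its ratings, and the score at which the top two swap, need not agree with the infera numbers quoted above. The names parse, bradley_terry and ranking are made up for the illustration; the scores are copied from section (C).

```python
import math
from collections import defaultdict

SCORES = """
RS 7 M1 3; RS 13 SB1 1; RS 8 FT 6; RS 9 M2 4; FT 13 SB1 3; SB1 11 SK2 4;
UTI 13 SB1 3; SK2 2 M2 10; SB2 13 SK2 1; FT 13 SK2 0; UTI 13 M1 7;
M1 13 SB2 6; SB2 8 FT 10; SK1 6 SB2 5; SK1 6 M1 11; UTI 13 SK1 3;
M2 13 SK1 4; UTI 13 M2 5; UTI 9 FT 5; RS 13 SB2 1; M2 8 SB1 0; M2 7 SB2 4;
RS 13 SK2 3; M1 10 SB1 5; SK1 6 SB1 4; SK2 1 M1 9; RS 13 SK1 2;
UTI 8 SK2 2; FT 13 M2 2; UTI 13 SB2 2; FT 13 M1 3; SK1 9 SK2 3; M1 8 M2 5;
SB1 6 SB2 9; FT 13 SK1 5; UTI 15 RS 3
"""


def parse(text):
    """Turn the score text into (team, score, team, score) tuples."""
    games = []
    for item in text.replace("\n", " ").split(";"):
        a, sa, b, sb = item.split()
        games.append((a, int(sa), b, int(sb)))
    return games


def bradley_terry(games, iterations=500):
    """Log-strengths under a points-level Bradley-Terry model.

    ASSUMPTION: a stand-in for illustration, not infera's model.
    Needs every team to have scored at least one point.
    """
    won = defaultdict(int)      # points scored by each team
    played = defaultdict(int)   # points played between each pair of teams
    for a, sa, b, sb in games:
        won[a] += sa
        won[b] += sb
        played[a, b] += sa + sb
        played[b, a] += sa + sb
    teams = sorted(won)
    strength = {t: 1.0 for t in teams}
    for _ in range(iterations):
        new = {}
        for i in teams:
            denom = sum(played[i, j] / (strength[i] + strength[j])
                        for j in teams if j != i)
            new[i] = won[i] / denom
        # Rescale so the geometric mean is 1 (only ratios matter).
        mean_log = sum(math.log(s) for s in new.values()) / len(new)
        strength = {t: s / math.exp(mean_log) for t, s in new.items()}
    return {t: math.log(s) for t, s in strength.items()}


def ranking(games):
    ratings = bradley_terry(games)
    return sorted(ratings, key=ratings.get, reverse=True)


games = parse(SCORES)
before_final = games[:-1]
close_final = before_final + [("UTI", 15, "RS", 14)]
# The two 'corrections': score the late game 13-2, drop the friendly.
corrected = [("UTI", 13, "M1", 2) if g == ("UTI", 13, "M1", 7) else g
             for g in before_final if g != ("UTI", 8, "SK2", 2)]

for label, data in [("actual results", games),
                    ("final 15-14 instead", close_final),
                    ("before the final", before_final),
                    ("before the final, corrected", corrected)]:
    print(f"{label:28s}", " ".join(ranking(data)))
```

Swapping in other hypothetical final scores is then a one-line change to close_final.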

In conclusion,

(1) I think infera worked just fine and gave rankings that made complete sense. I'll put in data from other tournaments if people send it to me in the right format.

(2) When close hypothetical scores (eg 15-13 or 13-15) are put in for a game between two teams (eg the final), the outcomes of other games could sometimes overrule the outcome of that game. To ensure that these effects are not spurious, I would recommend that when infera is used, teams should not have penalties put onto their score for turning up late to games, or totally failing to show up; this mucking with the scores is the sort of thing which might on rare occasions cause infera to be confused. It should be easy to find other ways to penalise late teams, if necessary! Incidentally, I didn't include any penalties in the tournament rules, and almost all the games at the 97 Cambridge tournament ran on time.

(3) It might be good to declare two teams to have equal rank, if their infera ratings are closer than, say, 0.1.

(4) If you have any comments on infera, I may well have responded to them already on the web pages I mentioned above. There is a huge number of ways you can use infera; for example, if you want to tell it only the win/lose/draw outcomes, instead of the actual scores, you may. My chief reason for recommending it is that it allows you to choose arbitrary tournament formats no matter how many teams turn up and what games are played, and still easily get rankings out at the end of the day. But you could use it, for example, to rank teams in a long term league of several tournaments (even if some teams have attended different numbers of tournaments). Alternatively you could use infera to determine the number of tour points allocated to teams for their performance at a tournament. Infera gives each team a rating such that teams judged to have been very close end up with similar ratings, and teams that are far better than the others get proportionally bigger ratings. Instead of using some arbitrary fixed numbers like 1st=200, 2nd=120, 3rd=80, the infera ratings would return numbers that reflect how close the number 2 team came to the number 1 team, etc.
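
Suggestion (3) is easy to try out. The sketch below shows one possible reading of it, assuming that a team shares the rank of the team immediately above it whenever their ratings differ by less than the threshold (so a chain of close teams would all share one rank). The function name ranks_with_ties is made up; the ratings are the hypothetical ones listed in section 3 for a 15-14 final.

```python
def ranks_with_ties(ratings, threshold=0.1):
    """Rank teams by rating, sharing a rank when adjacent ratings are close.

    ASSUMPTION: 'close' means within `threshold` of the team just above.
    """
    ordered = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    previous = None
    for position, (team, rating) in enumerate(ordered, start=1):
        if previous is not None and previous - rating < threshold:
            rank = result[-1][1]     # tie with the team above
        else:
            rank = position
        result.append((team, rank))
        previous = rating
    return result


hypothetical = {"RS": 11.08, "UTI": 11.04, "FT": 10.82, "M1": 10.19,
                "M2": 10.14, "SB2": 9.68, "SK1": 9.52, "SB1": 9.18,
                "SK2": 8.35}

tied = ranks_with_ties(hypothetical)
for team, rank in tied:
    print(rank, team)
```

With a threshold of 0.1, RS and UTI share first place and M1 and M2 share fourth; every other team keeps its own position.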

(C)  =====================================================================
Scores of all games
The tournament schedule can be seen here:
Here are the scores:
# saturday nov 1 97 #
RS 7 M1 3
RS 13 SB1 1
RS 8 FT 6
RS 9 M2 4
FT 13 SB1 3
SB1 11 SK2 4
UTI 13 SB1 3
SK2 2 M2 10
SB2 13 SK2 1
FT 13 SK2 0
UTI 13 M1 7
# note: M1 were given 5 points by UTI as an apology for being late
M1 13 SB2 6
SB2 8 FT 10
SK1 6 SB2 5
SK1 6 M1 11
UTI 13 SK1 3
M2 13 SK1 4
UTI 13 M2 5
# sunday
UTI 9 FT 5
RS 13 SB2 1
M2 8 SB1 0
M2 7 SB2 4
RS 13 SK2 3
M1 10 SB1 5
SK1 6 SB1 4
SK2 1 M1 9
RS 13 SK1 2
UTI 8 SK2 2
FT 13 M2 2
UTI 13 SB2 2
FT 13 M1 3
SK1 9 SK2 3
M1 8 M2 5
SB1 6 SB2 9
FT 13 SK1 5
# Final ( to 15 points )
UTI 15 RS 3

(D) =====================================================================
A few photos of fog, discs, teams and human pyramids are on the web here:


David MacKay <mackay@mrao.cam.ac.uk>
Last modified: Mon Nov 3 12:01:21 1997