Seems to me to be working as intended. Games aren't given a score based on how good they are. They're given a score relative to the average score of that type of game and the best score. Only the best game from any one player counts.
So in the case you cite, there are four games from two players, but only the top two count, so the average score and the best score are very close. That gives any game even a little below the average a very bad score. And a player's second game must almost by necessity sit relatively far from the average, since the average is calculated from both players' best games.
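To put rough numbers on that, here's a quick sketch using the four finish dates from the Epic table. I don't know the actual scoring formula, so this does not compute points. It only works out the two reference values as I understand them (the best game, and the average of each player's best game) and then shows how far each game sits from that average, measured in units of the average-to-best gap. The names `best_per_player` and `gap`, and treating "Jul 2010 AD" as plain 2010, are my own assumptions for illustration.

```python
from statistics import mean

# Finish years from the Huge/Deity/Diplomatic/Epic table (earlier is better).
# "Jul 2010 AD" is simplified to 2010 here (an assumption).
games = [
    ("WastinTime", 1304),
    ("Misotu", 1325),
    ("Misotu", 1375),
    ("WastinTime", 2010),
]


def best_per_player(games):
    """Keep only each player's earliest finish year."""
    best = {}
    for player, year in games:
        if player not in best or year < best[player]:
            best[player] = year
    return best


counted = best_per_player(games)   # only these feed the average
average = mean(counted.values())   # average of the counted games
best = min(counted.values())       # best game overall
gap = average - best               # distance between average and best

for player, year in games:
    # Distance past the average, in units of the average-to-best gap.
    # NOT the real scoring formula, just a way to see the spread.
    print(player, year, round((year - average) / gap, 2))
```

With only 1304 and 1325 counted, the average is 1314.5 and the gap to the best game is just 10.5 years, so the 1375 game ends up almost six gap-widths past the average even though it looks close to #1 in absolute terms.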
I don't like this system. I think that the number of games that other games get weighed against is much too small, especially at higher levels and for larger maps. I think there's a lot of opportunity for someone who plays under two accounts to game the system. But it IS designed that way.
I still think there is something wrong in this case:
1  WastinTime  1304 AD      Huge  Deity  Diplomatic  Epic  Ancient
2  Misotu      1325 AD      Huge  Deity  Diplomatic  Epic  Ancient
3  Misotu      1375 AD      Huge  Deity  Diplomatic  Epic  Ancient
4  WastinTime  Jul 2010 AD  Huge  Deity  Diplomatic  Epic  Future
Especially since there is a really, really bad game (Jul 2010 AD), which should bring down the average, and the 2nd and 3rd are very close to #1 (but score 9 pts and 0 pts respectively).
Compare that to the quick speed, same table:
1  WastinTime  1250 AD  Huge  Deity  Diplomatic  Quick  Ancient
2  Misotu      1710 AD  Huge  Deity  Diplomatic  Quick  Renaissance
3  Misotu      1730 AD  Huge  Deity  Diplomatic  Quick  Renaissance
4  Bram        1750 AD  Huge  Deity  Diplomatic  Quick  Renaissance
This one appears to work. 2nd and 3rd place are not very good, and are very close to last place, but they still get ~25 points (instead of 0.001 in the first example).
Even Bram gets 22 points.