Nice work!
One problem with using only the fastest time is that all we can say is that the top-ranking players have posted faster times relative to the required gold time. Strictly speaking, we don't know if that's because the game has gotten easier, because the fastest players have become better, or because they are simply trying harder.
Also, we know nothing about how it looks beneath that #1 time, which is really important to know for two reasons:
1. Is the trend the same for the larger group as it is for the top players? If the game has gotten easier, then there should be improvements across the entire range.
2. How big is the sample of players?
For GT5 there are a total of 40 players posting their times on that database, while for GT6 there are currently 68. For GT4 there are 320 players posting times. For GT3 there's no complete list, but there seem to be about 100-150 players in each test. The bigger the sample, the higher the probability of finding a "spike", i.e. someone who is much better than everyone else, as shown by the normal distribution curve:
Measuring only the fastest time captures just the one individual who is furthest to the right in the curve above. Everything behind that is unknown.
It's not very likely that the difference in sample size matters much when comparing GT3 and GT4 against GT5 and GT6, because the samples are actually larger in GT3 and GT4, so the probability of finding a spike is much higher there. However, when it comes to determining which of GT3 and GT4 is harder, and which of GT5 and GT6 is harder, it very much comes into play. GT3 has only about a third to a half of GT4's sample size, and GT5 has around three fifths of GT6's.
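To put a rough number on the sample size effect, here is a minimal simulation sketch. It assumes player skill follows a standard normal distribution, which is my assumption and not something measured from the time databases, and it uses 125 for GT3 as the midpoint of the 100-150 estimate. The function name `expected_best` is made up for this example.

```python
import random
import statistics

def expected_best(sample_size, trials=2000, seed=1):
    """Average skill of the best player in a random sample.

    Skill is modelled as a standard normal variable (mean 0, SD 1);
    that distribution is an assumption, not something measured.
    """
    rng = random.Random(seed)
    bests = [
        max(rng.gauss(0.0, 1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.mean(bests)

# Sample sizes quoted above; GT3 uses the midpoint of 100-150.
SAMPLE_SIZES = {"GT5": 40, "GT6": 68, "GT3": 125, "GT4": 320}

results = {game: expected_best(n) for game, n in SAMPLE_SIZES.items()}
for game, best in results.items():
    n = SAMPLE_SIZES[game]
    print(f"{game}: n={n:3d}, best player ~ {best:.2f} SD above average")
```

Under that assumption the best player in a bigger group sits further out in the tail on average, so a #1 time from 320 players is not directly comparable to a #1 time from 40, even if the game and the average player are exactly the same.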
There is one last thing that we also need to check for: Changes in the group of players.
- Has their skill level changed?
- Has their gaming equipment changed? (wheels/pads/monitors/seats)
Both of these changes could result in better times even if the game is static in terms of difficulty.
So, how do we actually test this?
1. Make a list of people who have GT3-GT6 (or even better, GT1-GT6 if we want to test all games).
2. Pick a random sample, say 10 people.
3. Have these people perform license tests in GT1-GT6 under similar conditions and with similar equipment. Maybe each player could perform each license test 3 times. If that's too much (it is too much, let's face it: GT4 alone is like 80 tests or something), perhaps we could pick a random sample of 5-10 different license tests per game.
4. Collect their times and do the math.
5. Interview the players to get their perspective of the difficulty level of each game and to get a bit of background knowledge of their previous experience of each game.
6. Write a report.
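Steps 2 to 4 could be sketched roughly like this. Everything here is a hypothetical illustration: the helper names (`pick_sample`, `relative_times`, `summarise`), the player names, the test label and all the times are invented placeholders just to show the shape of the data, not real results.

```python
import random
import statistics

def pick_sample(items, k, seed):
    """Draw k items at random without replacement (steps 2 and 3)."""
    return random.Random(seed).sample(items, k)

def relative_times(times, gold_times):
    """Express each time as a fraction of that test's gold time.

    times:      {(player, game, test): best time in seconds}
    gold_times: {(game, test): gold time in seconds}
    Returns {game: {player: mean of time / gold time}}.
    """
    per_game = {}
    for (player, game, test), t in times.items():
        ratio = t / gold_times[(game, test)]
        per_game.setdefault(game, {}).setdefault(player, []).append(ratio)
    return {
        game: {p: statistics.mean(r) for p, r in players.items()}
        for game, players in per_game.items()
    }

def summarise(per_game):
    """Median across players of their mean time/gold ratio, per game."""
    return {g: statistics.median(p.values()) for g, p in per_game.items()}

# Placeholder data, invented purely to show the shapes involved.
volunteers = [f"player{i}" for i in range(1, 31)]
panel = pick_sample(volunteers, 10, seed=42)

gold = {("GT3", "B-1"): 30.0, ("GT4", "B-1"): 40.0}
times = {
    (panel[0], "GT3", "B-1"): 27.0,
    (panel[0], "GT4", "B-1"): 38.0,
    (panel[1], "GT3", "B-1"): 28.5,
    (panel[1], "GT4", "B-1"): 39.0,
}
summary = summarise(relative_times(times, gold))
print(summary)
```

A ratio below 1 means faster than gold. Because the same people drive every game, a difference between games in the median ratio can't be explained by a different or bigger group of players, which is the whole point of the exercise. Using the median instead of the fastest ratio also keeps one "spike" player from deciding the result.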