Now that a first iteration of the GT Sophy AI has been released to the public, it may be interesting to know how it actually works. I've seen quite a few misconceptions, so I've read the paper (1), and the following is a short summary that I hope is easy to understand for anyone, with or without a background in machine learning.
Basically, GT Sophy is a program that, 10 times per second, gets all relevant information from a Gran Turismo race (coordinates of the center line and boundaries of the track, plus the coordinates of all other cars), and replies with appropriate controller inputs (throttle/brake, left/right steering).
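To make that loop concrete, here is a minimal sketch of what such a perceive-and-act cycle could look like. All of the names (get_race_state, send_controller_input, policy) are hypothetical stand-ins; the actual interface between Sophy and the game is not public.

```python
import time

TICK = 0.1  # one decision every 100 ms, i.e. 10 times per second

def drive(policy, get_race_state, send_controller_input):
    """Query the game, ask the trained policy for controls, send them back."""
    while True:
        start = time.monotonic()

        # Observation: track geometry around the car plus the other cars' positions
        state = get_race_state()

        # The trained policy maps the observation to continuous controls,
        # e.g. throttle/brake and steering, each in [-1, 1]
        throttle_brake, steering = policy(state)
        send_controller_input(throttle_brake, steering)

        # Wait out the rest of the 100 ms budget
        # (the forward pass itself takes roughly 25 ms)
        time.sleep(max(0.0, TICK - (time.monotonic() - start)))
```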
It was trained using reinforcement learning (similar to DeepMind's AlphaZero chess engine), meaning that it has no hardcoded knowledge about racing, like where to brake before a corner, or how to maximize exit speed. Rather, it has learned to drive on its own, being given rewards (for fast lap times or overtakes) and penalties (for track excursions or collisions), and then optimizing its own behavior. After a while, it would manage to complete a lap, and after tens of thousands of hours of simulated racing, it would become strong enough to beat the best human sim racers.
During training, Sophy is penalized for any contact between cars - regardless of who is at fault, which is often hard to determine objectively. As a result, it obeys basic racing etiquette, and tends to leave room for opponents in tight situations, rather than cutting them off.
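As an illustration of how such rewards and penalties might be combined per time step, here is a toy reward function. The terms and weights are invented for clarity; the actual reward in the paper has more components and carefully tuned coefficients.

```python
def step_reward(progress_m, overtakes, off_track, collided):
    """Toy per-step reward: encourage progress and overtakes, punish mistakes."""
    reward = 1.0 * progress_m      # metres gained along the track this step
    reward += 5.0 * overtakes      # bonus for each completed overtake
    if off_track:
        reward -= 10.0             # track excursion
    if collided:
        reward -= 20.0             # any car-to-car contact, regardless of fault
    return reward
```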
It's interesting to note that it didn't learn exclusively by racing against itself, but was also given specific training scenarios to help it deal with human imperfection. This includes taking corners in traffic (where the driver in front may brake too early) and crowded grid starts (where humans may drive more erratically).
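One way to picture this is as a weighted mix of training episodes: mostly ordinary self-play racing, plus hand-crafted situations. The scenario names and proportions below are pure assumptions for illustration; the paper does not publish the exact mix.

```python
import random

SCENARIOS = [
    ("self_play_race",     0.70),  # full races against copies of itself
    ("corner_in_traffic",  0.15),  # the car ahead may brake earlier than optimal
    ("crowded_grid_start", 0.15),  # packed field, more erratic early moves
]

def sample_training_scenario():
    names, weights = zip(*SCENARIOS)
    return random.choices(names, weights=weights, k=1)[0]
```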
Still, GT Sophy is different from a human driver in a couple of ways. For example, it won't use throttle and brake at the same time. It can determine the position of cars behind it as precisely as the position of cars in front of it, but doesn't know the actual dimensions of the cars. Also, at 100 ms per distinct action (which takes around 25 ms to compute), it acts and reacts more slowly than human drivers - but at the same time, it is more precise and doesn't make mistakes.
There are several aspects of the game that Sophy, as of now, ignores, like manual gear changes, traction control or brake balance. And while it has learned advanced racing techniques (like exploiting slipstream for overtakes, or driving defensive lines), it hasn't been trained on overall race strategy (like fuel saving or tire management). It also doesn't seem to have been exposed to wet or mixed conditions. Of course, all of this could be added in the future.
During training, Sophy is a computer program receiving information from, and sending information to, a cluster of PlayStations, continually collecting rewards and penalties and updating its driving model. What has been deployed for the current "Race Together" event is simply a version of this model. It receives the race state 10 times per second, and replies with the optimal controller input. It's a static copy of the model, so regardless of how you drive against it, it does not learn from you.
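Conceptually, the deployed version is a frozen network that is only ever run forward, never updated. The sketch below uses PyTorch with a placeholder architecture and input size; it is not the actual Sophy network, just a way to show that no learning happens at race time.

```python
import torch
import torch.nn as nn

# Placeholder policy network: observation vector in, two controls out
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2), nn.Tanh())
policy.eval()                   # inference mode
for p in policy.parameters():
    p.requires_grad_(False)     # weights are fixed; nothing is updated at race time

def act(race_state):
    """Pure forward pass: the same observation always yields the same controls."""
    with torch.no_grad():
        obs = torch.as_tensor(race_state, dtype=torch.float32)
        return policy(obs)      # e.g. [throttle_brake, steering], both in [-1, 1]
```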
As far as I'm aware, there is no precise information about the different color variations of GT Sophy. If they are different (beyond the performance of the cars), they could represent different stages of learning, or be optimized for slightly different scenarios. It is also unknown whether Sophy has driven all of the available tracks (possible) and all of the available cars (unlikely), so it may not (yet) be an expert at, say, driving a Suzuki Escudo on an off-road course.
(1) Outracing champion Gran Turismo drivers with deep reinforcement learning, Nature, 2022