This sounds easier than it is. Basically Sophy has 3 inputs: steer, accelerate and brake. It checks every millisecond what the best input would be.
If you had it pick the best input, say, 95% of the time, it wouldn't be able to drive in a straight line.
I don't mean for it to pick the best input 95% of the time and a totally random one the other 5%.
I'm going to tie the response in with part of Imari's quote below.
Nice idea, but if you've ever played with a dodgy controller or a loose wheel, then you know that random variation on your inputs both slows you down and turns you into a completely unpredictable menace. It would probably work to slow Sophy down, but it would undo a lot of what makes her racecraft great. Instead of being able to go safely side by side with Sophy, low level players would be getting rammed off the road.
You might say that's accurate to the low level racing experience with real humans, but that's exactly why a slow but clean AI is desirable. Slow drivers don't like being rammed off by the AI any more than fast drivers do.
I don't mean that every conclusion the AI has come up with only has a 90% chance of being used, and 10% of the time a random number is thrown in.
For reference, Imari, I do understand how a neural network works. I was using the term 'understanding of the physics' purely for brevity of discussion. I appreciate that any perceived 'understanding' is an emergent behaviour of the neural network, like an ant colony 'deciding' on the location for a new nest.
What I mean is that, seemingly at a 10Hz interval, Sophy takes the inputs from the game data (car location, speed, angle, tyre condition, weight, opponent proximity / position etc.). These factors go through the black box, and out of that black box comes a conclusion in the simple form of 3 commanded inputs to the game: a commanded steering angle, a commanded throttle % and a commanded brake %.
Obviously I am simplifying this.
My suggestion is merely that at, say, 'Beginner' level, a variation of random magnitude is applied to each calculation, across a range of +/- 10%. (10% may be a bit much, but it makes the maths easy for demonstrating what I mean...)
For utter simplicity, the 'randomiser' is a whole number between 0 and 20. It is re-rolled as part of each 10Hz calculation and used to pick the % variation within the range (+/- 10% for Beginner): 0 means -10%, 10 means no variation and 20 means +10%.
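To make the mapping concrete, here is a minimal Python sketch of that randomiser. The function names and the `max_pct` parameter are mine, purely for illustration; nothing here is how Sophy is actually built.

```python
import random

def roll_to_variation(roll: int, max_pct: float = 10.0) -> float:
    """Map a roll of 0..20 onto a variation of -max_pct..+max_pct (in %)."""
    if not 0 <= roll <= 20:
        raise ValueError("roll must be between 0 and 20")
    # 0 -> -max_pct, 10 -> 0 (the midpoint), 20 -> +max_pct
    return (roll - 10) * max_pct / 10

def random_variation(rng: random.Random, max_pct: float = 10.0) -> float:
    """Roll the randomiser once, as would happen on each 10Hz calculation."""
    return roll_to_variation(rng.randint(0, 20), max_pct)
```

With the default of 10%, each step on the roll is worth one percentage point, which is why the numbers in the example that follows are so easy to read off.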
So Sophy's calculations have concluded that a steering angle of 30* is optimal, with 0% brake and 80% throttle.
The randomiser has landed on 10. For our range of +/- 10%, 10 corresponds to the midpoint, so 0 variation.
Sophy inputs exactly as requested, 30* of steering, 0% brake and 80% throttle.
At the next 10Hz interval, the calculation happens again. It's a constant radius corner, and Sophy's black box again concludes 30* steering, 0% brake and 80% throttle.
The randomiser has landed on 13. So we will apply a +3% variation to what Sophy has commanded.
Thus, due to not being perfect, Sophy ends up inputting 30* + 3% steering, so 30.9*; 0% + 3% braking, so still 0% (3% of 0 is still 0...); and 80% + 3% throttle, so 82.4%.
This doesn't mean Sophy randomly spears off into the trees. It means that for 1 tenth of a second she has input about 1* (0.9*, to be exact) more steering than is absolutely optimal, and a tiny bit more throttle than is optimal. For 1 tenth of a second.
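The same arithmetic as a rough Python sketch, for anyone who wants to play with the numbers. The `Inputs` class, the `apply_variation` name and the clamping of throttle and brake to 0-100% are my own assumptions for illustration, not anything known about Sophy's real interface.

```python
from dataclasses import dataclass

@dataclass
class Inputs:
    steering_deg: float   # commanded steering angle
    throttle_pct: float   # commanded throttle, 0-100
    brake_pct: float      # commanded brake, 0-100

def apply_variation(cmd: Inputs, variation_pct: float) -> Inputs:
    """Scale every commanded input by the same percentage variation."""
    scale = 1.0 + variation_pct / 100.0

    def clamp(value: float) -> float:
        # Assumption: pedals cannot go below 0% or above 100%.
        return max(0.0, min(100.0, value))

    return Inputs(
        steering_deg=cmd.steering_deg * scale,
        throttle_pct=clamp(cmd.throttle_pct * scale),
        brake_pct=clamp(cmd.brake_pct * scale),
    )

# The roll of 13 from the example: +3% on 30* steering, 80% throttle, 0% brake.
optimal = Inputs(steering_deg=30.0, throttle_pct=80.0, brake_pct=0.0)
actual = apply_variation(optimal, 3.0)
```

Here `actual` comes out at roughly 30.9* of steering and 82.4% throttle, with the brake still at 0%, matching the figures worked out by hand.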
At the next 10Hz interval, the calculation will start again, taking all of this new position etc. into account.
That is in no way the same as a dodgy controller, or being unable to drive straight, or randomly swerving all over the place. Because the variation is a percentage of the optimal input, it will have almost no impact at all in a straight line or anywhere else the inputs are small. The larger swings at high input values won't matter too much either, because of how the throttle and brakes work: Sophy uses ABS anyway, and tiny percentage adjustments to throttle opening at high openings don't make much difference to the power output.
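To put numbers on the "percentage of the optimal input" point, this small Python sketch works out the worst-case offset for a few steering angles. The `worst_case_offset` helper is hypothetical and only restates the +/- 10% Beginner range used above.

```python
def worst_case_offset(commanded: float, max_pct: float = 10.0) -> float:
    """Largest absolute deviation a +/- max_pct variation can add to an input."""
    return abs(commanded) * max_pct / 100.0

# Steering angles in degrees: dead straight, a slight correction, a real corner.
offsets = {angle: worst_case_offset(angle) for angle in (0.0, 1.0, 30.0)}
```

Dead straight gives an offset of exactly 0, a 1* correction can be off by at most 0.1*, and only the 30* corner sees the full 3*. A dodgy controller, by contrast, adds its noise regardless of what you asked for.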
Hopefully that explains it better. My suggestion is to leave the black box calculating the optimal inputs every tenth of a second, and just add some fuzziness to the implementation of that optimal input to mimic the slight fallibility of human attempts at hitting exactly the right amount of brake input every time etc. etc.