It's possible that the Course Maker maps just aren't as efficient as premade maps. I recall that Imari lamented pop-in on these courses, though this mostly seems to happen on objects miles away, like treelines. At least that has been my experience on my Course Maker tracks, and I have a big five-mile-plus one.
I have also noticed on this monster track of mine that there is frame stutter around a certain turn that doesn't have an inordinate amount of trackside clutter, and it happens every time, whether there are other bot cars there or not. Maybe it's a loading "seam" point, just a guess. So I'm thinking that, at least in Course Maker 1, the game has to work a little harder with these generated assets, and PD is working to bring Course Maker 2 to performance parity with the prebaked tracks.
I think the size of the track would be limited by what you could store in cache, or share amongst friends, or maybe just generate in a reasonable time-frame (assuming the terrain is generated in-game this time). Draw distance itself isn't an issue, as SSRX first demonstrated.
What is an issue is how to manage the scene detail dynamically when you don't know what's going to be in the scene. The level of detail we have in games is only possible by reducing the apparent detail of things as they become less important, be they off-centre from the "stage" being set, hidden (occluded) by other objects, or so far into the distance that their details can't actually be resolved within their pixel footprint. Reducing detail on things that can't be appreciated, or removing them outright if they can't even be seen, allows other things to be more detailed, given a finite rendering budget.
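To make the "pixel footprint" idea concrete, here's a minimal sketch of distance-based LoD selection. The function name, the pinhole projection, and the pixel-height thresholds are all my own assumptions for illustration, not anything from PD's engine:

```python
import math

def lod_level(object_radius, distance, fov_y, screen_height,
              thresholds=(200.0, 50.0, 10.0)):
    """Pick a level of detail from an object's approximate on-screen size.

    thresholds are hypothetical pixel-height cutoffs: at 200 px or more
    use LOD 0 (full detail), 50 px LOD 1, 10 px LOD 2; anything smaller
    can't be resolved by its pixel footprint, so cull it (return None).
    """
    if distance <= 0.0:
        return 0  # camera is inside the object: full detail
    # Project the bounding-sphere diameter onto the screen (pinhole model).
    pixels = (2.0 * object_radius / distance) * \
             (screen_height / (2.0 * math.tan(fov_y / 2.0)))
    for level, cutoff in enumerate(thresholds):
        if pixels >= cutoff:
            return level
    return None  # too small to see: drop it from the scene
```

So a 5 m-radius tree at 20 m gets full detail, at 200 m it drops to a low-poly imposter, and at two kilometres it vanishes entirely, which is exactly the pop-in on distant treelines that dynamic tuning has to hide.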
A lot of that, for static scenery, can be pre-computed for maximal efficiency: in racing games, the classical technique is to define a visibility set that depends on your location on the track. For tracks made in the game, either you do that intensive visibility testing before you can drive on the track (which could take forever), or you use something a little more flexible as you drive. There are plenty of realtime technologies for occlusion culling, distance-based LoD scaling and so on, but they all imply a scene-dependent overhead, can be difficult to tune, and all have failure states and idiosyncrasies. Clearly, just rendering everything within a fixed distance (as in GT5's course maker) isn't good enough.
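The classical location-dependent visibility set boils down to a cheap table lookup at runtime. A toy sketch, with an invented track-sector table standing in for the precomputed data (the sector ids and object names are purely hypothetical):

```python
# Hypothetical precomputed visibility data: for each track sector, the
# set of scenery ids that can possibly be seen from anywhere in that
# sector. Building this table is the expensive offline step.
PVS = {
    0: {"pit_wall", "grandstand", "treeline_a"},
    1: {"grandstand", "treeline_a", "bridge"},
    2: {"bridge", "treeline_b"},
}

def visible_objects(sector, all_objects):
    """Cheap per-frame lookup: anything not in the current sector's set
    is skipped before any per-object frustum or occlusion test runs.
    With no data for the sector, fall back to drawing everything."""
    allowed = PVS.get(sector, all_objects)
    return [obj for obj in all_objects if obj in allowed]
```

The runtime cost is trivial, which is why prebaked tracks can afford dense scenery; the catch is exactly the one above, that the table has to be computed before you can drive, and a freshly generated course hasn't had that done.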
I've thought for a long time that Ronda, for instance, isn't open yet because that dynamic scene management isn't ready, and the "classical" method is intractable with branching roads and everything visible from everywhere (a single loop typically has very limited locations it can view itself from, and you can usually build "tunnels" of visibility). Because of the overhead in a dynamic system, it would interfere with anything else squeezed into the scheduling on the SPUs (which is all done manually in code). For example, a new sound synthesis method, or a new virtual sound source selection and LoD method (more sound sources within the same memory footprint), would both be SPU-based and scheduling-critical (buffer swapping in particular, with sounds), and must be carefully integrated with any changes to the renderer and scene setup.
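A toy model of why each new SPU workload forces a rebalance: fixed-cost jobs have to be packed into a hard per-frame time budget, and adding one job changes what else fits. The job names, costs and greedy priority packing are all my invention, not how PD's scheduler works:

```python
FRAME_BUDGET_MS = 16.6  # one frame at 60 fps

def schedule(jobs, budget_ms=FRAME_BUDGET_MS):
    """Pack (name, worst_case_ms, priority) jobs into the frame budget,
    highest priority (lowest number) first. Returns (scheduled, dropped).
    Adding any new job, say a fancier audio LoD pass, shifts the
    packing, which is why every added feature means rebalancing."""
    scheduled, dropped, used = [], [], 0.0
    for name, cost, priority in sorted(jobs, key=lambda j: j[2]):
        if used + cost <= budget_ms:
            scheduled.append(name)
            used += cost
        else:
            dropped.append(name)
    return scheduled, dropped
```

With hand scheduling there's no such fallback: a job that no longer fits isn't gracefully dropped, it breaks the frame, so the balancing has to be redone by hand every time the mix of work changes.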
It's like you said: you can add these features incrementally, but you've got to redo that SPU balancing every single time, and each rebalance risks breaking something. Best to get it all working as one package: roll on Spec II.