This feature was introduced to make sure that, for instance, stage2-g4 is built from a stage1-ppc and not a stage1-g4. But this also means stage3-g4 will start from a stage2-ppc, unless a different stage3 spec is written for each subarch. This can be solved easily by introducing variables into the catalyst spec format: source_subpath for a stage3 would be profile-bla-bla/stage2-${subarch}-bla instead of hardcoded. I think a lot of architectures didn't realise this and have built their optimized stage3 from an unoptimized stage2. An easy way to circumvent this would be to add optimized stage1s.
Same side effect for GRP: a G4 GRP would be built against a generic stage3 instead of a G4 stage3. Catalyst users need to override source_subpath to prevent that from happening.
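To illustrate the ${subarch} idea suggested above, here is a minimal sketch of how such a variable could be expanded before a spec is handed to catalyst. It is not something catalyst does itself: the render_spec helper and the template are hypothetical, and the path is the same placeholder used above, not a real one. Python's string.Template happens to use the same ${name} syntax.

```python
from string import Template

# Hypothetical stage3 spec template. The source_subpath value is the
# placeholder path from the comment above, not a real snapshot path.
SPEC_TEMPLATE = Template(
    "subarch: ${subarch}\n"
    "target: stage3\n"
    "source_subpath: profile-bla-bla/stage2-${subarch}-bla\n"
)


def render_spec(subarch: str) -> str:
    """Return the spec text with every ${subarch} replaced (hypothetical helper)."""
    return SPEC_TEMPLATE.substitute(subarch=subarch)


if __name__ == "__main__":
    # One template yields a matching spec per subarch, so a g4 stage3
    # points at a g4 stage2 rather than at the generic ppc one.
    for subarch in ("ppc", "g4"):
        print(render_spec(subarch))
```

With this approach a single template replaces the per-subarch spec files, at the cost of an extra preprocessing step outside catalyst.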
This is the intended behaviour. I don't see a problem with having a different spec file for each subarch.