I looked into parallel processing extensively while building my multiband processing preset pack, so here are a few more tips to add to yours.
Some amps draw a lot more CPU power than others. So how many amps you can add depends on the amps+cabs you choose.
In my 4-band presets using the Brit Trem Jump, Brit Plexi Jump, Solo Lead OD, and German Mahadeva, it wasn't possible to have four Amp+Cabs without running out of CPU. A good workaround is to use two separate Amps and feed them into one Cab. You lose the ability to tweak each cab individually, but often it's the amps that benefit the most from tweaking anyway.
This doesn't apply to your situation, where you're feeding the Guitar and Aux In to get stereo, but for a given number of blocks, a parallel path within a single chain will run out of CPU faster than a single path in each of two chains set to the same input (e.g., Host in Helix Native).
One of my favorite aspects of Helix is how you can get really great stereo imaging out of it. This isn't always relevant to live use, but for the Native version, it's incredible to get stereo imaging without having to use delays or other workarounds :)