The official recommendation from Arm is an optimal trace bandwidth of 2 bit * core frequency, i.e. 2 bits of trace per core clock cycle.
For example, if the core is running at 500 MHz with a trace clock of 80 MHz (DDR, so each trace line carries 160 Mbit/s), the number of trace lines needed is:
optimal bandwidth / bandwidth of 1 line = (2 * 500 MHz) / 160 MHz = 6.25 trace lines
So with a single core running at 500 MHz and an 80 MHz trace clock, TPIU.PortSize 8 should be fine.
With 2 cores you need 2 * the bandwidth, which gives 12.5 trace lines, so you need TPIU.PortSize 16.
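
The same arithmetic can be sketched as a small Python helper, useful for trying other core and trace clock combinations. This is only an illustration of the rule of thumb above: the function names and the list of candidate port sizes are assumptions made for the example, not TRACE32 commands, so check which widths your TPIU and probe actually support.

```python
# Candidate port widths, assumed for illustration only.
CANDIDATE_PORT_SIZES = (1, 2, 4, 8, 16, 32)


def required_trace_lines(core_mhz, trace_clock_mhz, cores=1, ddr=True):
    """Trace lines suggested by the '2 bit * core frequency' rule of thumb."""
    # Optimal bandwidth in Mbit/s: 2 bits per core clock cycle, per core.
    optimal_mbit_s = 2 * core_mhz * cores
    # One trace line moves 1 bit per clock edge; DDR uses both edges.
    line_mbit_s = trace_clock_mhz * (2 if ddr else 1)
    return optimal_mbit_s / line_mbit_s


def smallest_port_size(lines, candidates=CANDIDATE_PORT_SIZES):
    """Smallest candidate port width that covers the required number of lines."""
    for size in sorted(candidates):
        if size >= lines:
            return size
    raise ValueError("no candidate port size provides enough trace lines")


if __name__ == "__main__":
    for cores in (1, 2):
        lines = required_trace_lines(500, 80, cores=cores)
        print(cores, "core(s):", lines, "lines -> port size", smallest_port_size(lines))
```

With the values from the example (500 MHz core, 80 MHz DDR trace clock), this reproduces 6.25 lines for one core and 12.5 lines for two cores, rounded up to port sizes 8 and 16.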