Add either a new example or an enhancement to the existing perf example that exercises and characterizes the bandwidth of HBM and DDR.
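
As a rough illustration of the kind of measurement this could report, here is a minimal host-side sketch that times a buffer copy across several sizes and converts the result to GB/s. It is not tied to the perf example or to any particular platform. The function names `measure_copy_bandwidth` and `sweep` are hypothetical, and the buffers here live in ordinary host memory. Placing them in HBM or DDR would depend on the platform's allocation API, which is not shown.

```python
import time


def measure_copy_bandwidth(size_bytes, repeats=5):
    """Return the best observed copy bandwidth in GB/s for one buffer size.

    Hypothetical helper: buffers are plain host memory, not HBM or DDR banks.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one full read of src plus one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-12)  # guard against a zero reading from a coarse timer
    # Count both the bytes read and the bytes written.
    return (2 * size_bytes) / best / 1e9


def sweep(sizes, repeats=5):
    """Measure each buffer size and return a {size_bytes: GB/s} mapping."""
    return {size: measure_copy_bandwidth(size, repeats) for size in sizes}


if __name__ == "__main__":
    for size, gbps in sweep([1 << 20, 1 << 24, 1 << 26]).items():
        print(f"{size / (1 << 20):8.0f} MiB  {gbps:8.2f} GB/s")
```

Taking the best of several repeats reduces noise from the first touch of each buffer. A real example would presumably run the same sweep once per memory type so the HBM and DDR numbers can be compared directly.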