Your calculation looks healthy. I see no issue: stress and forces are gradually decreasing.
I do not think you need 2 k-points in the out-of-plane direction, i.e., nc can probably be set to 1. I would also reduce the number of k-points in the other directions, as your in-plane unit cell looks relatively large. I would first test basic convergence with respect to k-point sampling using a GGA functional, which is much cheaper, and repeat it for several k-point grids to see how much the sampling affects the results. You could also do the HSE optimization with fewer k-points and then increase the grid density when computing the band structure/band gap.
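To make the convergence test concrete, here is a minimal sketch in plain Python. It is not tied to any particular DFT code: `energy_of` is a hypothetical callback that you would write yourself to run a cheap GGA single-point calculation for a given (na, nb, nc) grid and return the total energy, and the tolerance is an assumption you should choose for your own system.

```python
from typing import Callable, Optional, Sequence, Tuple

Grid = Tuple[int, int, int]  # (na, nb, nc) k-point subdivisions


def first_converged_grid(
    grids: Sequence[Grid],
    energy_of: Callable[[Grid], float],
    tol: float,
) -> Optional[Grid]:
    """Return the coarsest grid whose energy changes by less than `tol`
    when going to the next denser grid in `grids`.

    `grids` must be ordered from coarse to dense. `energy_of` is a
    hypothetical user-supplied function (e.g. a wrapper around a GGA
    single-point run); `tol` is in the same unit as its return value.
    Returns None if no consecutive pair of grids agrees within `tol`.
    """
    # One calculation per grid, in the order given.
    energies = [energy_of(grid) for grid in grids]
    # Compare each grid with the next denser one.
    for i in range(len(grids) - 1):
        if abs(energies[i + 1] - energies[i]) < tol:
            return grids[i]
    return None


# Candidate grids with nc fixed to 1, as suggested above.
candidate_grids = [(n, n, 1) for n in (2, 4, 6, 8, 10)]
```

You would call `first_converged_grid(candidate_grids, my_gga_energy, tol=...)` with your own `my_gga_energy` function. The same loop works for forces, stress, or the band gap if the callback returns that quantity instead of the total energy.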
Is there a band gap in this structure when using GGA? You could also check whether your GGA-optimized structure and the current HSE-optimized structure have similar GGA (or HSE) gaps, or similar band structures in general. That would let you check whether HSE is actually important for the relaxation or whether GGA works well enough.