The systems I have done this on before were cases with 3x3 k-points, so it hasn't been a real problem for me. But I can see that 10 k-points changes things a lot.
The calculation should scale virtually linearly with the number of nodes, so using more nodes would be one way. Conversely, the calculation time is also linear in the number of k-points. So with half the k-points and 10 times the nodes, you should get roughly a 20x speedup (give or take). Also, maybe you don't need such high resolution or such a long energy range.
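
To make the arithmetic explicit, here is a rough sketch of that estimate. It assumes "nodes" means compute nodes, that the time is proportional to the number of k-points, and that it is inversely proportional to the number of nodes. The function name and the `efficiency` parameter are made up for illustration and are not part of any code discussed here.

```python
def estimated_speedup(kpoint_factor, node_factor, efficiency=1.0):
    """Idealized speedup estimate (hypothetical helper, not from any package).

    kpoint_factor: factor by which the number of k-points is reduced (2 = half as many).
    node_factor:   factor by which the number of compute nodes is increased.
    efficiency:    assumed parallel efficiency between 0 and 1 (1.0 = perfectly linear).
    """
    if kpoint_factor <= 0 or node_factor <= 0:
        raise ValueError("factors must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Time ~ k-points / nodes, so the two factors simply multiply.
    return kpoint_factor * node_factor * efficiency


if __name__ == "__main__":
    # Half the k-points, 10 times the nodes, ideal scaling.
    print(estimated_speedup(2, 10))
    # Same change, but assuming only 80% parallel efficiency.
    print(estimated_speedup(2, 10, efficiency=0.8))
```

In practice the scaling is rarely perfectly linear, so the real number will probably land somewhat below the ideal 20x.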