In general, the cost of these calculations grows cubically with the number of atoms (orbitals), so 5x more atoms (5^3 = 125) could quite easily take ~100x longer per step, meaning 2-3 hours in your case.
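To make the estimate concrete, here is a minimal sketch of the arithmetic, assuming pure cubic scaling and the 90 s per step quoted for the small system. The function name `scaled_step_time` is only illustrative, and real timings will deviate from the ideal exponent.

```python
def scaled_step_time(base_seconds, size_factor, exponent=3):
    """Estimate the time per step after growing the system by size_factor,
    assuming the cost scales as (system size) ** exponent."""
    return base_seconds * size_factor ** exponent


small_step = 90.0                               # seconds per step, small system
large_step = scaled_step_time(small_step, 5)    # 5x more atoms
print(f"{large_step:.0f} s per step, about {large_step / 3600:.1f} hours")
# prints "11250 s per step, about 3.1 hours"
```

The exact factor of 125 lands slightly above the rounded ~100x, which is why the estimate is best read as a range.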
So the real question is rather how you can reduce the 90 s for the small system! The standard answer is MPI parallelization, which is not controlled in the input file but from the Job Manager or the command line, depending on how you submit the job. I assume your nice machine has something like 14-20 cores, so running in parallel should give a substantial benefit, but I cannot tell whether you are already doing that.
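As a rough guide to what parallel execution could buy you, the sketch below checks how many cores the machine reports and computes a best-case time per step. It assumes perfect linear scaling with the number of MPI processes, which is an upper bound on the benefit and not something a real run will reach; `ideal_parallel_time` is a hypothetical helper, not part of any package.

```python
import os


def ideal_parallel_time(serial_seconds, n_processes):
    """Best-case time per step, assuming perfect linear scaling
    (an optimistic assumption; real speedups are smaller)."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    return serial_seconds / n_processes


# os.cpu_count() reports logical cores, which may be twice the number of
# physical cores when hyper-threading is enabled; it can also return None.
cores = os.cpu_count() or 1
print(f"Logical cores reported: {cores}")
print(f"Best-case time per step on {cores} processes: "
      f"{ideal_parallel_time(90.0, cores):.1f} s")
```

If the time you measure is already close to the best-case value, the job is most likely running in parallel already.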