The performance gain from running in parallel depends on many factors. In your case, it also matters a lot whether the cores are distributed among several MPI nodes or are all on the same machine. In the latter case you would also face a large memory overhead; in fact, you probably cannot run 800 atoms on a single node at all unless it has about 32 GB of RAM.
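To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the total footprint (taken here as the ~32 GB mentioned above, which is only an order-of-magnitude figure) splits evenly across nodes, apart from an optional part that every node must hold in full. The function names and the `replicated_gb` parameter are hypothetical, and a real code will not distribute its data this cleanly.

```python
def per_node_memory_gb(total_gb, n_nodes, replicated_gb=0.0):
    """Estimate the memory each node needs, in GB.

    Assumption (not from any specific code): `replicated_gb` is held in
    full on every node, and the rest is divided evenly among the nodes.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    if not 0.0 <= replicated_gb <= total_gb:
        raise ValueError("replicated_gb must be between 0 and total_gb")
    distributed_gb = total_gb - replicated_gb
    return replicated_gb + distributed_gb / n_nodes


def fits_on_node(total_gb, n_nodes, ram_per_node_gb, replicated_gb=0.0):
    """Return True if the estimated per-node share fits in one node's RAM."""
    return per_node_memory_gb(total_gb, n_nodes, replicated_gb) <= ram_per_node_gb


if __name__ == "__main__":
    total = 32.0  # rough total for the 800-atom run, taken from the text above
    for nodes in (1, 2, 4, 8):
        need = per_node_memory_gb(total, nodes)
        print(f"{nodes} node(s): about {need:.1f} GB per node")
```

Under these assumptions, a machine with 16 GB of RAM could not hold the job on its own, while four such nodes would each need only about 8 GB. Treat the numbers as a guide to the scaling, not as a prediction for your system.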